[OBG] Nature of Race Full
First of all, the set of 315 Fst values that I calculated using VCFtools (which employs the Weir and Cockerham Fst formula) on 1000 Genomes phase 3 data for 26 populations can be seen here (https://docs.google.com/spreadsheets/d/1n-C061ZAVCjtN_D9RZLCZJYuur-DTSIVP2xUx1HCA2w/edit?usp=sharing ). I report Fst for the 1st and 21st chromosomes (columns C and D). They are practically identical (r=0.995), so either can be used to represent the whole genome. Note that these include SNPs and indels. If you use these Fst values in your paper, please cite my last article (http://dx.doi.org/10.6084/m9.figshare.1393160 ) because they are in the supplementary material there.

THERE IS INDEED MUCH CONFUSION ON INTERPRETING FST AS RELATIVE BETWEEN POPULATION VARIANCE.
It appears that the expected BETWEEN population variance should be 2*Fst, after correcting for the inbreeding coefficient.


Davide,

Would it be possible for you to partition global variance into between continental race, between individual within race, and within individual variance?

See table 4 here for an example.

"To measure the differentiation between populations, the widely used statistic FST [17] and its unbiased estimator [18] were used. FST estimates were averaged over all loci, and 95% confidence intervals (CIs) of the average FST were calculated by bootstrap resampling with 10000 replications...Along with FST, variance components were estimated to reflect intra-individual, inter-individual and inter-population differences in genetic variation."

There appear to be programs which allow for this -- but no one does it. If needed, I will write to Nishiyama et al. about their method and statistical program.

Also, link rot: http://dx.doi.org/10.6084/m9.figshare.1393160


The links work; there were just issues with the parentheses. My paper is here: http://dx.doi.org/10.6084/m9.figshare.1393160

and Fst values are here: https://docs.google.com/spreadsheets/d/1n-C061ZAVCjtN_D9RZLCZJYuur-DTSIVP2xUx1HCA2w/edit?usp=sharing

VCFtools uses Weir and Cockerham's 1984 formula, which actually includes within-individual variance. So it appears that Sarich and Miele's contention that Fst values have to be multiplied by 2 is wrong, as Fst calculations (at least in the formula provided by Weir and Cockerham) already account for diploidy. So just use the Fst values reported in my Excel table to indicate relative between-population differentiation. As you can see from Weir and Cockerham's paper attached, the formula (1) uses three components of variance: Ea=between populations variance; Eb=between individuals within populations; Ec=between gametes within individuals.
VCFtools directly outputs the Fst values and I cannot see the variance components in the output files. However, it should be possible to get them.
I will have to figure out if it's possible to retrieve the individual variance components from the VCFtools output and I'll get back to you. However, since VCFtools already calculates Fst automatically using these 3 variance components, is there a specific reason why you need to know their values?
VCFtools uses Weir and Cockerham's 1984 formula, which actually includes within-individual variance. So it appears that Sarich and Miele's contention that Fst values have to be multiplied by 2 is wrong, as Fst calculations (at least in the formula provided by Weir and Cockerham) already account for diploidy.


Ok, but look at the equation on page 1359 (the right hand side)

Their θ value, which is said (on page 1358, bottom right) to be ~ Wright's FST is

a/ (a + b + c),

where

a = between populations
b = between individuals within populations
c = between gametes within individuals

But Sarich's point was that the relevant value should be:

a/ (a+b)

because when between group trait comparisons are made no one typically includes intra-individual variance (though one could).

so you would need to find a way to exclude c

-- but, ya, one of the program lines should include the a/b/c values.
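
A minimal sketch of the two ratios being contrasted above, assuming the three components are available as plain numbers. The function names and the component values are hypothetical, chosen only for illustration; they are not VCFtools output.

```python
def theta_all_components(a, b, c):
    """Between-population share of the total a + b + c (the ratio labeled ~Fst above)."""
    return a / (a + b + c)


def between_share_without_c(a, b):
    """Between-population share once the within-individual component c is excluded."""
    return a / (a + b)


# Hypothetical variance components (illustration only, not real data):
# a = between populations, b = between individuals within populations,
# c = between gametes within individuals.
a, b, c = 0.10, 0.05, 0.85

print(f"a / (a + b + c) = {theta_all_components(a, b, c):.3f}")
print(f"a / (a + b)     = {between_share_without_c(a, b):.3f}")
```

The point of the comparison is only that dropping c from the denominator can change the ratio a great deal, so which ratio is reported needs to be stated explicitly.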

....


I will try to find a way to locate the a/b/c values in the VCFtools output (no guarantee I will be successful, but I may also ask around). However, I am a bit puzzled by Sarich's argument. I think that if Fst includes within-individual variance, it is for a reason. I guess what he's trying to say is that when you compare phenotypes you ignore within-individual variability, as phenotypes do not vary within individuals. E.g., a person cannot have both black and blonde hair, although they can carry alleles for both black and blonde hair. But if we are considering genetic distances alone, then I think it's right to include between-gametes within-individuals variance, so the Fst values should be used as they are.
then I think it's right to include between gametes within-individuals variance, so the Fst values should be used as they are.


Ok, but some respectable population geneticists say that e.g., 85% to 95% of the variance is between individuals. Whatever we deem to be the legitimate genetic differentiation value, I hope we can agree that such a claim is factually incorrect. I emailed Guido Barbujani years back (2011):

...

"I recently came across your paper "Human genome diversity: frequently asked questions" and noticed the following passage, concerning the amount of genetic variation between individuals of different population:

"Moving to the second question, differences between populations are often summarized by another popular figure, FST = 0.15 (Box 2), and this means that they account for roughly 15% of the species’ genetic variance [17–19]. The remaining 85% represents the average difference between members of the same population. One way to envisage these figures is to say that the expected genetic difference between unrelated individuals from distant continents exceeds by 15% the expected difference between members of the same community [20]"

...So, it's incorrect to say "The remaining 85% represents the average difference between members of the same population." This is not a trivial point, since some argue that the variance between individuals of different populations is "too small" for significant mean phenotypic differences (medical, etc.). In reality, if I'm reading this right, around 1/3 of the total genetic variance between individuals is between individuals of different populations."

...

He replied:

"thanks for your mail and for your interest in our work. I am not sure I completely agree with your argument. When it comes to judging whether population A differs from population B, does it really count whether variation within A (and B) is distributed mostly within or between individuals? The point, I think, is that the two distributions broadly overlap and their averages are close to each other....This is the crucial point; human populations are, on average, internally so variable, that each of their members happens to resemble some members of distant populations more than some other members of the same population. This has major practical implications, for instance in the unsubstantiated claim that some form of racial medicine is possible or useful."

...

He has since gone on to repeat the claim and to use it to argue against the possibility of medically relevant and other genetic differences. Sarich's position, unlike Barbujani's, was at least not factually incorrect. Sarich just noted that one could partition out intra-individual variance. Whatever the case -- I'm interested in divergence in neutral traits -- which is the real issue. And we still haven't resolved that question. I think we only re-established -- based on a firmer understanding -- that some genetic variance is within individuals.

When it comes to quantitative differences, contra Barbujani, it might "really count whether variation within A (and B) is distributed mostly within or between individuals."

2*Fst or 1*Fst.

Anyways, thanks for helping me think this through.


Yes, I agree it's factually incorrect to say that 85% of the variance is between individuals. The real variance between individuals is about half of that (a bit higher than half if we account for inbreeding).

I will try to find out what the a,b,c values for the 1000 Genomes data are.
Pith: My deduction is that, for a typical trait that varies due to drift, phenotypic variation is roughly quantitative genetic variance * h^2, where quantitative genetic variance is roughly 2 * SNP Fst; thus, assuming a typical narrow heritability of 0.5, one gets phenotypic variation ~ SNP Fst. And this is why the (phenotypic) craniometric continental eta-squared is ~ 0.12.


The rewrite was:

As for the soundness of the argument:

A-1. When it comes to discussions of genetic contributions to phenotypic differences between groups, what is of relevance are the differences in the specific genes associated with specific traits, not the average genetic differences between groups. Regarding racial groups, genetic variation at a typical locus will have no functional consequence since a typical locus is selectively neutral. As such, average genetic variation will tend to measure neutral mutations and so index the time of divergence and the degree of isolation between populations (Sarich and Miele, 2004). The upshot is that the average genetic variation across loci does not allow one to predict the amount of differentiation in loci that were under selection – the very ones that are typically relevant when it comes to discussions of behavioral genetic and many other socially significant differences, ones which presumably were and are subject to selective pressure. With regard to these, one must look at differentiation in specific genetic regions (for example: Wu and Zhang, 2011) and the specific genes that code for differences. To give a concrete example in which overall genetic differentiation is not indicative of differentiation with respect to a specific trait which has been under selection: at their extremes, northern and southern Europeans differ in height by approximately one standard deviation (Turchin et al., 2012; supplementary data). These height differences are substantially genetically determined (Turchin et al., 2012). Yet average European interpopulation SNP Fst values are trivial at 0.001 to 0.01 (Tian et al., 2009).

A-2. Measures of genetic differentiation based on fixation (e.g., FST and ΦST) are often poor measures of true genetic differentiation. As Bird et al. (2011) remind us: "Using fixation indices will systematically underestimate genetic differentiation, especially when using highly polymorphic markers such as microsatellites (Hedrick 1999)." This is because maximum Fst values are limited by heterozygosity. To provide an example of this underestimation, Long and Kittles (2003) found a between-population microsatellite Fst of 0.11 based on a sample of human populations; when they added chimpanzees to the set of human populations, the between-population Fst rose only to 0.18. There were several reasons for the results, one of which was that the maximum possible Fst value, given the markers used, was well below the theoretical maximum of 1 both in the case of the human comparisons and in the case of the human and primate comparisons. The takeaway is that, as Mountain and Risch (2004) noted after citing this example in relation to their discussion of genetic contributions to phenotypic differences among ethnic and racial groups, “a low Fst estimate implies little about the degree to which genes contribute to between-group differences.”

A-3. That said, genetic variability as indexed by Fst and Fst analogs can, with caution, be used (and often is) to index the expected variation in quantitative traits owing to genetic drift (neutral variation) (Leinonen et al., 2013). That is, researchers sometimes make comparisons between Fst and an index of heritable quantitative trait variability called Qst. This comparison is made to help evaluate whether the quantitative trait variation found between populations is larger or smaller than would be expected based on neutral variation (i.e., whether the traits in question were under selection). Whitlock (2008) explains the measure Qst:

The calculation of QST for a trait requires two quantities: the additive genetic variance of the trait within a population (V_A,within) and the genetic variance among populations (V_G,among). For diploids, QST is calculated as

Qst = V_G,among / (V_G,among + 2 V_A,within)

For haploids, the same equation applies, but without the '2' in the denominator. [That '2' for the diploid case comes from the fact that the quantitative genetic variance among populations is proportional to two times FST (Wright 1951).]


What is particularly relevant to the present discussion is Whitlock's last statement, since we are interested in predicting quantitative genetic variance from Fst, not comparing Qst to Fst. Amongst diploid populations, the predicted quantitative genetic trait variance is equal to 2FST/(1 + FST) (Leinonen et al., 2013). The 2 in the equation comes from the fact that roughly half of the genetic variation within diploid populations is within individuals. (Note 68: Cole et al. (1986), alternatively, note: "Wright (1943, 1951) showed, using a model of additive gene effects at a single locus, that variation among populations in the value of a selectively neutral quantitative character is, in expectation, σ²_B = 2 Fst σ²_0, where σ²_0 is the genetic variance expected under panmixia with the same gene frequencies, and FST is the correlation among uniting gametes relative to the total population.")
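
As a quick numerical check of the expression above, the sketch below computes 2FST/(1 + FST) and compares it with the between-population share obtained by treating an assumed one half of the within-population variance as intra-individual and dropping it. The function names and the one-half default are assumptions made for illustration.

```python
def drift_quantitative_share(fst):
    """Predicted between-population share of quantitative genetic variance: 2F / (1 + F)."""
    return 2 * fst / (1 + fst)


def share_excluding_intra_individual(fst, intra_fraction=0.5):
    """Between-population share after removing the intra-individual part
    of the within-population variance (assumed to be one half by default)."""
    within_population = 1 - fst
    between_individuals = within_population * (1 - intra_fraction)
    return fst / (fst + between_individuals)


for fst in (0.05, 0.12, 0.20):
    lhs = drift_quantitative_share(fst)
    rhs = share_excluding_intra_individual(fst)
    print(f"Fst = {fst:.2f}: 2F/(1+F) = {lhs:.4f}, F/(F + (1-F)/2) = {rhs:.4f}")
```

With the one-half assumption the two columns agree for every Fst, which is just the algebraic identity F/(F + (1 - F)/2) = 2F/(1 + F).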

A-4. The point immediately above is often missed even by respected population geneticists, so it is worth elaborating on. It is not infrequently claimed, erroneously, that in the context of human races 85 to 95% of the total genetic variance is between individuals within populations (e.g., Barbujani and Colonna, 2010). This is incorrect since among diploids a large chunk of the total genetic variance is captured within individuals (Harpending, 2002; Sarich and Miele, 2004). The between-population variance given by Fst and Fst analogs is expressed relative to the summed variance between populations, between individuals within populations, and within individuals (e.g., Weir and Cockerham, 1984). Roughly half of the total variance for diploid populations is expected to be intra-individual. As such, the ratio of genetic variance between populations to that between individuals and populations (but not within individuals) is Fst/(Fst + 1/2(1 - Fst)), which is mathematically equivalent to 2FST/(1 + FST), the predicted amount of quantitative genetic trait variance owing to drift. Nishiyama et al. (2012) give an example which illustrates the flaw in deducing from low between-population Fst values that the overwhelming portion of variance is between individuals. The authors decomposed the SNP genetic variance for various Japanese populations into inter-subpopulational, inter-individual, and intra-individual variance. They found that between 96.7 and 99.6% of the variance was located within individuals. When intra-individual variance was partitioned out, roughly as much of the remaining genetic variance was located between subpopulations as between individuals within subpopulations. The decomposition is shown in Table 4.11 below.

Of course, most of the variance was still “inter-individual” in the sense of inter- plus intra-individual (i.e., intrapopulational). In the same way, of course, most diversity, in general, is “inter-racial” in the sense of inter-racial plus inter-individual plus intra-individual. It just was not mostly inter-individual in the sense of exclusively between individuals. Does this matter? Well, it casts the oft-referenced genetic variance ratios in a different light. And it is relevant if one's argument is that phenotypic differences between individuals of different groups cannot be substantially congenitally conditioned because there is "too little" between-group genetic variation (relative to that between individuals within groups).

Table 4.11. Total genetic variance partitioned into variance between subpopulations, among individuals within subpopulations, and within individuals in a Japanese sample

Comparison            Measure                  Between population  Among individuals within subpopulations  Within individuals
Amami vs. Mainland    Variance component       0.03                0.02                                     2.13
                      Relative proportion (%)  1.2                 1.1                                      97.7
Okinawa vs. Mainland  Variance component       0.04                0.03                                     2.12
                      Relative proportion (%)  1.9                 1.4                                      96.7
Amami vs. Okinawa     Variance component       <0.01               <0.01                                    2.2
                      Relative proportion (%)  0.2                 0.2                                      99.6

(Based on Table 4 in Nishiyama et al. (2012).)
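
To make the "partitioned out" comparison concrete, here is a small sketch that takes the relative proportions quoted in Table 4.11 and recomputes the between-population share once the within-individual column is dropped. The dictionary layout and function name are hypothetical; the numbers are the rounded percentages from the table, so the results are only approximate.

```python
# Relative proportions (%) from Table 4.11:
# (between population, among individuals within subpopulations, within individuals)
TABLE_4_11 = {
    "Amami vs. Mainland": (1.2, 1.1, 97.7),
    "Okinawa vs. Mainland": (1.9, 1.4, 96.7),
    "Amami vs. Okinawa": (0.2, 0.2, 99.6),
}


def between_share_without_intra(between, among_individuals):
    """Share of the non-intra-individual variance that lies between populations."""
    return between / (between + among_individuals)


for comparison, (between, among, _within) in TABLE_4_11.items():
    share = between_share_without_intra(between, among)
    print(f"{comparison}: {100 * share:.0f}% between populations, "
          f"{100 * (1 - share):.0f}% among individuals")
```

On these rounded inputs the between-population share comes out at roughly half or a little more in each comparison, which is the pattern described in A-4.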

A-5. Getting back to the main point, if we wish to estimate expected quantitative genetic trait variation it is often advised to avoid using high mutation rate genetic markers such as microsatellites, which, as discussed, have high Hs values and thus necessarily exhibit low Fst values. It is often advised instead to use SNPs, both because these markers do not tend to have very high Hs values and because SNP variation codes for typical quantitative trait variation (Edelaar and Björklund, 2011). Another way to look at this is to consider that the magnitude of (fixation index estimated) genetic differentiation varies by the class of loci analyzed, with part of this variation being attributable to variation among loci in Hs (Jakobsson et al., 2013); for example, for humans, continental microsatellite, SNP, and mtDNA Fst values are typically around 0.05, 0.12, and 0.20, respectively. Were one to try to infer the magnitude of genetically conditioned phenotypic variation from typical indices of fixation (e.g., Fst values), it would make sense to use the class of loci that most likely underpins the relevant trait variation. For example, since variation in single-nucleotide polymorphisms (SNPs) explains variation in many interesting polygenic traits such as height and intelligence (for example: Yang et al., 2010; Davies et al., 2011), it would make more sense to attempt to infer magnitudes of genetic differentiation in these traits from SNP Fst values than from microsatellite or mtDNA ones.

Now, these five considerations set up the problem for the “too little variance” argument, with its implicit premise that the ratio of genetically mediated phenotypic variability in socially significant traits is roughly concordant with the ratio of average genetic variability. As will be seen, the argument lends itself to the opposite of the conclusion drawn by biological race antagonists. This is for the following reasons:

B-1. The magnitude of average genetic differentiation depends on the biological divisions in question. It makes no sense to argue that differences between regional biological races (e.g., Europeans and West Africans) cannot be genetically conditioned on account of supposedly small differences between continental races (e.g., Caucasoids and Negroids). The magnitudes of the genetic differentiation in SNPs between some regional races are shown below in Table 4.10.

Table 4.10. Intercontinental autosomal genetic distance based on SNPs for 1000 Genomes (below diagonal) and HapMap3 (above diagonal)

                 YRI (Yoruba)   CHB (Chinese)   CEU (European)
YRI (Yoruba)     -              0.183           0.156
CHB (Chinese)    0.161          -               0.11
CEU (European)   0.139          0.106           -

(From: Bhatia et al. (2013), Table 2. Based on recommended ratio to average method.)

B-2. The magnitude of SNP differentiation, as indexed by Fst, is not small, even between continental races, according to population genetic and social scientific standards. The median continental race SNP Fst value is said to be around 0.12 (Li et al., 2008; Campbell and Tishkoff, 2008; Elhaik, 2012; Bhatia et al., 2013), with the estimated magnitudes varying somewhat due to the choice of specific loci, the method of aggregation employed, the Fst estimators used, and so on; see Bhatia et al. (2013). With regard to population genetic standards, the difference would be moderate by Sewall Wright’s (1978) not infrequently cited scale. By this:

0 to 0.05 indicates little genetic differentiation; 0.05 to 0.15 indicates moderate genetic differentiation; 0.15 to 0.25 indicates great genetic differentiation; above 0.25 indicates very great genetic differentiation.

Now, with regard to social scientific standards, if we naively treat our Fst statistic as indexing the proportion of the total genetic variance lying between groups, we can interpret it in terms of eta-squared. A between-group variance of 0.12 would be moderate. Alternatively, treating the SNP Fst = 0.12 as an index of between-group variance, one can convert the value into standardized differences, a metric in which, in the social sciences, group comparisons are often made. The formula is shown below:

Given the law of total variance:
z = 2 * sqrt(a / w)
z = between-group standardized difference; a = variance between populations;
w = variance within populations (so a/w is the ratio of between- to within-population variance).

If one assumes normality and equal variances, a 12% between-population variation is equivalent to a d-value of ~0.74, which is typically said to be “medium” to "large." For illustration, the relationships between percent variance between populations and various statistics are shown in Table 4.12 (from Cohen (1988)).

Table 4.12. Interpreting and comparing effect sizes in the social sciences

Size of effect   Cohen's f [1]   % variance (eta-squared) [2]   Cohen's d [3]   Pearson's r
Small            0.1             1                              0.2             0.1
Medium           0.25            6                              0.51            0.25
Large            0.4             14                             0.81            0.38

[1] Cohen's f = square root of [eta-squared / (1 - eta-squared)]. [2] Eta-squared is interpretable as the variance that lies between groups relative to the total variance (Cahan and Gamliel, 2011). [3] Cohen's d is the mean difference between populations divided by the pooled standard deviation.
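
The conversions behind the d-value and Table 4.12 can be sketched as follows, assuming two equally sized groups with equal within-group variance (the normality and equal-variance assumption stated above). The function names are hypothetical.

```python
import math


def cohens_f_from_eta2(eta2):
    """Cohen's f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta2 / (1 - eta2))


def cohens_d_from_eta2(eta2):
    """Standardized difference for two equal-sized groups: d = 2 * sqrt(a / w),
    where a / w = eta^2 / (1 - eta^2)."""
    return 2 * cohens_f_from_eta2(eta2)


for label, eta2 in (("Small", 0.01), ("Medium", 0.06), ("Large", 0.14), ("Fst-sized", 0.12)):
    print(f"{label:10s} eta^2 = {eta2:.2f}  f = {cohens_f_from_eta2(eta2):.2f}  "
          f"d = {cohens_d_from_eta2(eta2):.2f}")
```

The two-group, equal-size assumption matters: with more groups or unequal sizes the mapping from eta-squared to a single d-value is no longer this simple.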

Of course, as noted, contrary to what is sometimes stated (e.g., Fish, 2013), unlike eta-squared, Fst values rarely have a maximum of 1 in practice. For example, in their Tables 1 and 2, Xu et al. (2008) give expected heterozygosity (Hs) values for Japanese (JPT), Chinese (CHB), Uyghurs (UIG), Europeans (CEU), and Yoruba (YRI) based on 20177 SNPs. The average Hs came out to about 0.30, meaning that the maximum possible SNP Fst value – the value that would be found if populations had no alleles in common – would be 0.7. Treating Fst as something akin to eta-squared is therefore problematic.

B-3. But all of this neglects the points made in A-3 and A-4. As said, our Fst value is relative to total variance, which includes the intra-individual variance that is irrelevant in this context. Instead of the expected (simply owing to neutral variation) between-group quantitative genetic variation being proportional to Fst, it is proportional to roughly 2(Fst). Solving Fst = V_G,among / (V_G,among + 2 V_A,within) for a between-group quantitative genetic variance value (V_G,among), when Fst = 0.12, gives us V_G,among = 0.22. This value can then be entered back into the equation given in B-2. When this is done, we find that more than a little quantitative genetic variance is expected to lie between groups.
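
A rough sketch of the calculation described in B-3, assuming the among- and within-population variances are scaled to sum to 1 (an assumption not stated in the text) and two equally sized groups. Under that scaling the solution comes out near 0.21, in the neighborhood of the 0.22 quoted above. The function names are hypothetical.

```python
import math


def solve_v_among(fst):
    """Solve fst = V / (V + 2 * (1 - V)) for V, assuming V_among + V_within = 1.
    Rearranging gives V = 2 * fst / (1 + fst)."""
    return 2 * fst / (1 + fst)


def standardized_difference(v_among):
    """z = 2 * sqrt(a / w) with a = v_among and w = 1 - v_among (two equal groups)."""
    return 2 * math.sqrt(v_among / (1 - v_among))


fst = 0.12
v_among = solve_v_among(fst)

# Plug the solution back into the Qst-style expression as a consistency check.
recovered_fst = v_among / (v_among + 2 * (1 - v_among))

print(f"V_G,among            = {v_among:.3f}")
print(f"recovered Fst        = {recovered_fst:.3f}")
print(f"standardized z value = {standardized_difference(v_among):.2f}")
```

The final line simply feeds the solved variance share into the B-2 formula; it inherits every assumption of that formula.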

Can the “too little variance” argument be salvaged? It cannot. To avoid a racial-hereditarian conclusion, it must be discarded — but how? One could, citing the points made in A-2, argue that, in general, there is little correlation between average genetic variability and genetically mediated phenotypic variability. But this is not the case at least with regard to many classes of character differences. Relethford (2009), for example, notes:

Several studies have looked at estimates of FST based on the global craniometric dataset originally collected by Howells (1973, 1989, 1995, 1996)… Using an average heritability of 0.55, Relethford (1994, 2002) found that estimates of FST based on all 57 traits ranged from 0.11 to 0.14 depending on the number of geographic regions sampled. These FST values are similar in magnitude to those estimated in a number of studies of classical genetic markers and DNA markers.

Relethford's (2009) "craniometric Fst" values approximate Qst ones and, in this case, they also happen to be roughly equivalent to human (genetic) Fst ones. When we grouped Howells' 28 populations into six major continental races (West Eurasians, East Eurasians, Australoids, Negroids, Amerindians, and Pacific Islanders) and ran an ANOVA using the first principal component, we found a (phenotypic craniometric) eta-squared of between 0.10 and 0.14 (depending on the specific method used), which is in line with Relethford's (2009) findings. Assuming a heritability of about 0.55, as Relethford does, we get a craniometric quantitative genetic variance value of around 0.22, as predicted based on the considerations in B-3. (Note 72: "The phenotypic variance between populations is lower than the quantitative genetic variance in proportion with the heritability (i.e., variance in phenotype owing to genes). When h^2 = 1, the phenotypic variance is equal to the quantitative genetic variance.")
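
The heritability adjustment in the paragraph above can be sketched as a simple proportional scaling, following Note 72. This is only the back-of-the-envelope relation used in the text, not a full Qst estimate, and the function name is hypothetical.

```python
def quantitative_genetic_share(phenotypic_eta2, h2):
    """Scale a phenotypic between-group share up by the narrow heritability h^2,
    per the proportionality described in Note 72."""
    return phenotypic_eta2 / h2


h2 = 0.55  # average heritability assumed by Relethford, as quoted above
for eta2 in (0.10, 0.12, 0.14):
    print(f"phenotypic eta^2 = {eta2:.2f} -> "
          f"quantitative genetic share = {quantitative_genetic_share(eta2, h2):.3f}")
```

With h^2 = 1 the function returns the phenotypic value unchanged, matching the limiting case mentioned in the note.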

Similar results have been reported for dental traits (e.g., Hanihara, 2008). Taken together, the theory and evidence suggest that, between continental races, Fst values are roughly half the size of the between-group quantitative genetic ones (at least for traits that vary due to drift) and are roughly proportional to the phenotypic variance values when the narrow heritability of the traits is modest (e.g., 0.5). Yet, as we noted above, Fst values are medium to large as judged by social scientific standards. So if one grants the premise of the "too little variance" argument, that between-population genetic variance indexes variance in behavioral genetic or other socially relevant traits, one is left with medium to large genetically conditioned differences. Instead of Lewontin's conclusion, one is left with the early Franz Boas's deduction:

It does not seem probable that the minds of races which show variations in their anatomical structure should act in exactly the same manner. Differences of structure must be accompanied by differences of function, physiological as well as psychological; and, as we found clear evidence of difference in structure between the races, so we must anticipate that differences in mental characteristics will be found. [...] (Boas, 1974).

Of course, one could try to argue that differentiation in the genes that underlie interesting physiological and neurological functions is trivial, but the empirical evidence speaks against such an argument. As an example of such evidence, in the context of regional (European, East Asian, West African) population differences, Wu and Zhang (2011) conclude:

[W]e find that genes involved in osteoblast development, hair follicles development, pigmentation, spermatid, nervous system and organ development, and some metabolic pathways have higher levels of population differentiation. Surprisingly, we find that Mendelian-disease genes appear to have a significant excessive of SNPs with high levels of population differentiation, possibly because the incidence and susceptibility of these diseases show differences among populations.

Another way to escape the reverse of the “too little variation” argument would be to reiterate a version of Loring Brace's (1999) argument. According to this, since each population has an equal ability to use language and to develop culture, they must not differ in behavioral traits such as intelligence. This type of argument, of course, is (logically) ridiculous when applied to normally distributed traits, because within each and every population innumerable subpopulations exist which do differ in these traits. If these subpopulations which differ can exist then populations which exist can differ. Worse, it is already known that human populations do differ in the said traits; the debate is over “why,” not “whether.”

Ultimately, the way around the early Boas' deduction is to reiterate our point A-1. However, numerous philosophers of science and population geneticists have deemed the "too little variance" argument to be valid (for example: Kitcher, 2007; Brown and Armelagos, 2001; Barbujani and Colonna, 2010), so perhaps its reversed version cannot be so easily dismissed. Such an argument could never establish a genetic basis for specific differences; but, perhaps, as suggested by Boas and others, it provides probabilistic support (a baseline expectation) for the existence of some behavioral genetic differences. Whatever the case, to the extent that the "too little variance" argument is deemed to be valid, it clearly fails to support the position in defense of which it has been enlisted.
I post here a review by Kevin MacDonald, recommending publication: https://drive.google.com/file/d/0B7hcznd4DKKQSnpEYUdPNm9DMWVBcUM3ZkhBMHhjUlFCS3RF/view?usp=sharing


Davide,

With three approvals the paper is publishable.

I uploaded the "final" edition. I set the publication date for June 2. (I will look it over one more time tomorrow.)
As I feared, table 4.4 hasn't changed, even though I have asked you a million times to change it...

And I even provided a table model. If a journal accepts something like table 4.4, that's a shame.

Furthermore, it seems that many things have changed since the last time I read it. I would prefer that you hold it for a little while (say, 2 days maximum) so that I can read it again, although it's unlikely I would ask you to change anything more.


Dear Meng Hu,

I tried to fix table 4.4 as you requested; however, I was unable to.

And yes, I made a few edits. Specifically, I:

Edited the whole of section I
Completely rewrote section I-I
Rewrote the discussion at the end of II-A on the early subspecies concepts
Added a table to section III-B (on old race/subspecies definitions)
Added to the end of IV-F (a figure and discussion on ancient admixture)
Added section IV-J (on potential human species)
Rewrote section IV-K (on genetic differences)
Added to some of the tables in IV-K and M (i.e., 4.12 and 4.14)
Heavily edited section V-B (particularly the critique of Templeton)
Reorganized and proofed sections VI-C to F

There were no particularly substantive changes as far as I am concerned. I made the changes because new material became available -- for example, some translations of French and Latin texts which I did not have before and some material on phylogenetic concepts.

But, yes, I guess that we will hold off on publication until you look things over again.

Thanks for the interest, by the way.

(I really just could not figure out the table 4.4)


Chuck, I ask you to hold for a couple of days. I have worked hard and spent a lot of time trying to get VCFtools to provide variance components (I had to rewrite part of the code) and I am almost done, so I'd like you to have a look at those before you publish this paper. Is that OK?
Admin
John,

Your tables are ugly because you insert them as images. I don't know why you do this. You need to insert them as tables.

In LibreOffice it works like this. Supposing you have the data in a spreadsheet (Calc), select the region you want and copy it. Then use Paste Special in Writer and choose to paste as rich text format (RTF). Now you have a proper table in Writer.


Davide,

Sorry, I forgot that you were still working on that.

Emil,

Some of the figures are made by hand using PicPick; so they are naturally inserted as images. Only one table could be formatted otherwise; that is the one which MH mentioned.

I do not know how to use LibreOffice and my file is in Word 2007. I would need a solution using Word. Also, the tables only turn "ugly" when Word --> PDF.
Admin
LibreOffice can open Word files (.doc or .docx).

The solution is presumably similar in Word. Paste special to insert a table as a table. I haven't used Word since I discovered Open Office ~10 years ago, so I don't know exactly.


OK, I redid table 4.4 (and Figure 1.3).


I am almost done. I am running the code on all 10 unique pairwise comparisons for the five 1000 Genomes races. I will put the results into an Excel spreadsheet, then add the VCFtools + R code and the output files to OSF (hopefully they will not be too big, even after compression). I will post the results here soon.
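
For readers following along, here is a minimal sketch of what the ten pairwise runs might look like. It is not Davide's actual code (that was to be posted on OSF). It assumes the five groups are the 1000 Genomes super-populations (AFR, AMR, EAS, EUR, SAS), and the VCF file name and the per-group sample list files (`AFR.txt` and so on) are hypothetical placeholders. It only builds the command lines and does not run them.

```python
from itertools import combinations

# Assumed group labels: the five 1000 Genomes super-population codes.
GROUPS = ["AFR", "AMR", "EAS", "EUR", "SAS"]


def pairwise_fst_commands(groups, vcf_path="chr1.vcf.gz"):
    """Build one vcftools command per unordered pair of groups.

    Assumes a hypothetical file '<GROUP>.txt' listing the sample IDs
    of each group, one ID per line.
    """
    commands = []
    for pop_a, pop_b in combinations(groups, 2):
        commands.append([
            "vcftools", "--gzvcf", vcf_path,
            "--weir-fst-pop", f"{pop_a}.txt",
            "--weir-fst-pop", f"{pop_b}.txt",
            "--out", f"fst_{pop_a}_{pop_b}",
        ])
    return commands


commands = pairwise_fst_commands(GROUPS)
for cmd in commands:
    print(" ".join(cmd))
```

Five groups give 5 * 4 / 2 = 10 unordered pairs, which matches the count mentioned above.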


I thank Kevin for his considerate review and for taking the time to read this lengthy and complex paper.

I would like to address two points which he made and to solicit feedback since this may allow me to clarify my arguments and presentation.

Point # 1.

Kevin: "In seeking a useful, non-arbitrary concept of race, he emphasizes that “the underlying natural divisions exhibit small discontinuous jumps in genetic distance between them (Rosenberg et al., 2005).” Such “clusters can be identified objectively at a given grain of focus. These clusters can then be used to infer objectively delineated races” (45). This would yield formal definitions of race as naturally occurring descent groups.

Whether such discontinuous jumps in genetic distance do occur between people separated by distinct evolutionary histories is entirely an empirical matter. He argues cogently against the common complaint that because different, even arbitrary divisions can be made, races don’t really exist (45).
Fuerst addresses the issue of clines which are often used to argue against the concept of race. It is indeed possible to delineate races based only on clines:
Intuitively, it would seem that without significant discontinuities or at least sharp increases in the slope of the cline, racial designations would be invidiously arbitrary. However, Fuerst solves this by distinguishing between formal races, which do show discontinuities (citing Rosenberg, 2011), and informal races."

................

John: I have distinguished between races as natural (i.e., genealogical/genomic) divisions and clines as character gradients. The two phenomena are fundamentally different. The issue of clines, as originally defined, is largely irrelevant to that of races. Races index overall ancestry and genetic kinship; clines do not.

I have noted that races as natural divisions can be cut out of genomic and population continua just as types of radiation (e.g., visible light) can be cut out of the electromagnetic spectrum. I have pointed out that, historically, intraspecific races were conceived as falling in a population continuum. It was their continuous nature which evidenced that they were not species (e.g., Blumenbach, Prichard, Darwin). I have also pointed out that, under contemporary conventions, evolutionary and some phylogenetic taxa subspecies (i.e., formally recognized races) can be cut out in the same way (e.g., Mayr and Ashlock, 1991; Groves, 2004).

There is no justification for requiring races, in general, to exhibit discontinuities.

I have made the point, in line with Darwin, that even if races were biologically arbitrarily delineated, they would still make for natural divisions. They would still cut out groups whose members are arranged by genetic relationship. I did note that traditional human divisions nonetheless exhibit genomic discontinuities; thus they can be identified by cluster analysis. I also noted that genomic discontinuities provide support, given contemporaneous conventions, for the position that traditional continental races represent taxa subspecies.

Importantly, I think that "biologically arbitrarily" delineated natural divisions are practically useful, just as "electromagnetically arbitrary" divisions are. In the former case, individuals are arranged by overall genetic similarity, which is what is of importance, both empirically and from the perspective of EGI, to the extent that the latter matters to some. Also, I fail to see how cutting out biologically arbitrarily delineated natural divisions would be invidious. Would defining "family" in a way that acknowledged continuous relations -- nuclear to extended kin -- be likewise? I make this point because, in personal communications, Jiannbin Lee Shiao made a similar case (about invidiousness) -- one which seems to be popular among sociologists; he, however, did not reply to my detailed response. I wish someone could explain so that I might better understand the argument.

Point #2

Kevin: "I suspect that race differences in IQ are influenced genetically, not only for the reasons listed by Fuerst, but also for several other reasons, as outlined by Rushton and Jensen (the worldwide distribution of test scores, g factor of mental ability, heritability, brain size and cognitive ability, transracial adoption, racial admixture, regression, related life-history traits, human origins research, and hypothesized environmental variables)"

.....

John: I do not find Jensen and Rushton's evidence compelling. I have myself replicated and extended many of their results, and in discussing those I have noted the problems. The brain size data are dubious; for example, a recent large multi-country Latin American study (n > 7000) showed no association between head circumference and racial genomic ancestry. The regression to the mean and Spearman's hypothesis evidence is compatible with a shared environmental explanation -- I have demonstrated this. The published transracial adoption data show mixed results and are underpowered; unpublished studies, by myself and Jason Malloy, show ambiguous results. The racial gaps in the UK -- and perhaps in the Netherlands -- have greatly diminished. World-wide scores are often found to be measurement non-invariant. Racial admixture data, which again I have written about elsewhere, could in theory be explained by a shared environmental hypothesis. Thus, in the paper, I noted that the evidence is largely indirect and that the issue is still undetermined. Those who feel that Whites -- or e.g., Han Chinese in Southeast Asia -- are being unjustly accused of racism and/or discriminated against should try to establish or find more evidence in support of the position that differences have an evolutionary basis.

Again, I thank Kevin for his thoughtful review. I hope my responses help to clarify some issues.

Sincerely,

John
There are many grammar errors, but I don't want to wait until I finish with those details, so I re-approve and will send the corrections to Chuck by mail.

Concerning the modified/added content, I don't see anything I really think is wrong, and I also appreciate that the idea that subspecies (unlike intraspecific races) were thought of as involving taxonomic categories has been expressed in a better and more insistent way (so, in a sense, Krom's comments have been useful in this regard). The response to Pigliucci & Kaplan (2003) in section 5 has also been improved.
Publication of this paper is currently on hold until we solve the Fst issue (related to the a, b, c components). It is our (Chuck's and my) belief that this is an important point that cannot be ignored. I am working on a solution (I am almost done, with just one small last issue to solve), and in the meantime Chuck can make the necessary corrections to grammar errors.
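
For context on the a, b, c components: in the Weir and Cockerham framework that VCFtools follows, a is the variance component between populations, b between individuals within populations, and c between the two alleles within individuals. Fst is then a / (a + b + c), and a multi-locus estimate is usually taken as a ratio of sums over loci. The sketch below only illustrates that arithmetic. The function name and the component values are hypothetical, and it is not the modified VCFtools code discussed in this thread.

```python
def wc_partition(loci):
    """Combine per-locus (a, b, c) variance components into shares and F-statistics.

    a: between populations
    b: between individuals within populations
    c: between alleles within individuals
    Multi-locus values are ratios of sums over loci, not means of per-locus ratios.
    """
    loci = list(loci)
    sum_a = sum(a for a, _, _ in loci)
    sum_b = sum(b for _, b, _ in loci)
    sum_c = sum(c for _, _, c in loci)
    total = sum_a + sum_b + sum_c
    return {
        "between_populations": sum_a / total,  # equals Fst
        "between_individuals": sum_b / total,
        "within_individuals": sum_c / total,
        "Fst": sum_a / total,
        "Fit": 1 - sum_c / total,
        "Fis": 1 - sum_c / (sum_b + sum_c),
    }


# Made-up component values for two loci, for illustration only.
example = wc_partition([(0.02, 0.01, 0.17), (0.01, 0.00, 0.19)])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The three shares sum to one, which is the kind of three-way partition (between groups, between individuals within groups, within individuals) that Chuck asked about earlier in the thread.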