Novel findings about recent selection on stature and the universality of genetic effects on phenotype.
Line 52 needs the link fixed.
Table 4 needs some rearranging so that numbers are not unnecessarily split over lines.
Table 5 refers to "top 10", but there are only 4 listed (?).
The negative loading of the Chinese study may be because the decreasing alleles instead of the increasing ones are mentioned. A data error not a sampling problem. Have you contacted the authors to make sure that they are correct?
Interesting combination of Allen's rule and Lynn's theory (rule?). The correlation inside populations might be due to assortative mating for height, given its usefulness in physical combat. Or the association of height and g is due to mutational load decreasing both, hence creating the genetic correlation.
The mutational load-height link you mentioned is interesting and would provide a good explanation of the "paradox", as mutational load does not vary substantially by race. I double-checked on SNPedia and it seems that their results are correct. However, they have only 5 significant hits, compared to 10 for each of the other 5 studies, so the weight should be half. I suppose the polygenic scores should consist of the same number of alleles (10 for each of the 6 studies) in order to prevent a few "bad" alleles from skewing the results. However, I tried running the PCA without Hao et al.'s polygenic score and the results are practically identical.
Good data, sloppy presentation.
1. Almost half the references that are called out in the text are missing from the reference list.
2. The last page seems to be missing, so that Table 5 is truncated and Table 6 is missing entirely.
3. Line 14-15: Do the frequencies of all 46 alleles differ across populations? With so many SNPs, my guess is that only a certain proportion of them load on the first principal component. This is inevitable because it is extremely unlikely that all 46 SNPs replicate. The GWAS literature is full of associations that nobody can replicate. Suggestion: Expand Table 5 to present all 46 SNPs, with their PC loadings and the p value reported in the original study. What is the correlation between PC loading and p value? If SNPs with higher PC loadings tend to have lower p values, this would support the suspicion that many of those that don't load on the PC are false positives. If there is no such correlation, perhaps we need not one PC but 2 or 3. Also, somewhere you should mention that these 46 SNPs are not strongly linked and therefore can be considered independent in a statistical sense.
4. Line 30: Better allele frequencies instead of gene frequencies.
5. Line 30-31: Statistically significant in what comparison? Did you relate average height in the population to the PC for height-increasing alleles?
6. Line 33: The "focusing on a single study" comes out of the blue. Better: 2 sentences earlier, include the information that the earlier study used results from only one study. What study was this?
7. Line 51: The "et al" is missing from some of the references cited here, and the Hao reference is not in the list.
8. Spell out PCA.
9. Line 60-61: Was this a PCA without rotation? Was the unrotated first component the only component worth studying, or were there other components as well? Best: Present the scree plot, so readers can see whether a single component fits best or if there is something else that needs to be explored either in this study or in subsequent studies. Generally: Don't simplify too much and don't gloss over those findings that don't fit your theory or anybody else's theories. Data that don't fit into the theoretical mold are exactly what we need to guide future research.
10. For the same reason, include in the discussion the seeming anomaly of high prevalence of height-increasing alleles in African hunter-gatherers. These people (bushmen, pygmies) are not exactly known for their tall stature. This means there must be genetic determinants of height that are different from those that have been picked up in the association studies, or at least from those that form your first principal component. For future research you could propose, for example, that association studies for height should be performed in admixed populations such as Bantu-Pygmy. Such admixed populations are found in Africa. Perhaps some minor component that you find in PCA of all 46 SNPs with varimax rotation proves to be predictive of height differences between African hunter-gatherers and other Africans. Another kind of research that you can propose for the future: look at the functions of the genes in which the height SNPs are located. As far as I know there is a gene ontology data base somewhere, and perhaps you can group the genes that seem to be involved in height into different categories such as connective tissue proteins and growth factor receptors or signaling.
10. You discuss cold winters theory and Allen's rule. Another factor that might be important is selection for lower energy expenditure. Reducing body size is a brute-force approach through which natural selection can reduce energy expenditure. Perhaps East Asians are short because of cold winters, and pygmies and bushmen are short because of undernutrition in marginal habitats. In that case you cannot expect that the same shortness alleles were selected in the different populations.
I had computed the correlation between p value and PC loading, but was unsure whether to report it because it was not significant (r = 0.224, p = 0.111). However, it's in the expected direction (lower p values associated with higher PC loadings), which suggests there are some false positives.
All the PCAs are WITHOUT rotation because they produced only 1 PC which accounted for the vast majority of the variance.
Which correlation type was tried on the vectors of p-values and PC loading? Perhaps try Spearman's rho as well.
Tried it too, but got a very similar result.
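For readers who want to repeat the comparison discussed above, here is a minimal sketch of both correlation types in plain Python. The p values and PC loadings are made-up placeholders, not the paper's data, and the helper names (`pearson`, `ranks`, `spearman`) are only illustrative. The p values are converted to -log10(p) first, so that a positive correlation means "lower p value, higher loading".

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up illustrative values, NOT the data from the paper.
p_values = [1e-12, 3e-9, 5e-8, 2e-10, 4e-8, 1e-15]
loadings = [0.81, 0.40, 0.35, 0.62, 0.20, 0.90]

neg_log_p = [-math.log10(p) for p in p_values]
r = pearson(neg_log_p, loadings)
rho = spearman(neg_log_p, loadings)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because Spearman's rho depends only on ranks, it is the same whether the raw p values (with the sign flipped) or their logarithms are used; only the Pearson coefficient changes with the transformation.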
The author also needs to attach the dataset before it can be published.
I will, but so far only 1 reviewer has posted his comments.
Perhaps more would comment if the datasets were attached. For instance, then I could have run the Spearman correlations myself and reported them (although I'm not a reviewer for this journal). Attaching datasets is a requirement per the rules for submission. Not just a requirement for 'just before publication'.
That's a highly contentious issue. Many authors do not want to share their data set before an article is published, for obvious reasons.
I noticed your thing days ago. But I was too busy.
So I read it now. That was interesting. I see no particular problems with it. Some comments anyway.
1- You forgot a "g" in polygenic in table 1.
2- For your ANOVA, I suggest you either display the table with the independent variable and a short note below it that specifies the dependent variable, or simply state it in the text. That would greatly ease interpretation, I think.
3- Still with regard to ANOVA, I know that it is recommended to conduct the Tukey post-hoc test after ANOVA (not before), but could you state the reason in your text? Some people who are not familiar with statistics may not know it. More precision is always welcome. In your text, you write that you are conducting ANOVA, then you report the numbers, and right after that you say you use the Tukey test and that it indicates which groups differ significantly from which. That's too abrupt. If you don't state the goal of the Tukey test, readers who don't understand ANOVA and Tukey cannot follow your reasoning or what you're trying to do. The Tukey test, if my memory is correct, helps you determine which groups differ from which, i.e., it assesses the difference within each pair of groups, whereas ANOVA only tells you that the groups differ from each other. With more than 2 groups, ANOVA alone cannot tell you where the difference lies, and that is the utility of the Tukey test. But you must state it.
4- Finally, I do not understand the following:
These findings suggest that the selective pressures on stature across populations were opposite to those for IQ or educational attainment. This contrasts the positive correlation found within populations, where taller people have a small IQ advantage over shorter people (Pearce et al., 2005; Humphreys, Davey & Park, 1985). This correlation is in part due to common genetic factors (Marioni et al., 2014) and assortative mating (Beauchamp et al., 2011).
I don't see the link between the first and second sentences. They don't seem to be about the same thing? Also, concerning the last sentence, "This correlation is in part due to common genetic...", are you referring to the second sentence? It's not clear to me.
P.S.: Personally, I see no problem with publication if you post the data after the paper is officially published.
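To make the ANOVA point in comment 3 concrete, here is a minimal sketch of a one-way ANOVA in plain Python, using made-up numbers rather than the paper's data. The function names are illustrative. The F statistic only says whether the group means differ somewhere; the pairwise differences listed afterwards are what a post-hoc test such as Tukey's then evaluates. The Tukey significance test itself needs the studentized range distribution, which is not in the standard library, so this sketch stops short of it (a statistics package such as SciPy, which offers `scipy.stats.tukey_hsd` in recent versions, would be the usual route).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def pairwise_mean_differences(groups, names):
    """Difference in means for every pair of groups (first minus second)."""
    means = [sum(g) / len(g) for g in groups]
    return {
        (names[i], names[j]): means[i] - means[j]
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    }

# Made-up illustrative groups, NOT the data from the paper.
names = ["A", "B", "C"]
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, df_b, df_w = one_way_anova(groups)
diffs = pairwise_mean_differences(groups, names)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
for pair, diff in diffs.items():
    print(pair, diff)
```

The degrees of freedom depend on the number of observations and groups: with N = 46 alleles in 3 groups, as mentioned later in this thread, the same function would report 2 and 43.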
Could the authors make the appropriate corrections and then post a new draft? Both GM and MH have offered suggestions so far. I will review the paper after the mentioned corrections are made.
Work in progress
Response to GM and updated paper with additional analyses attached.
1. Fixed but I could find only 1 reference missing, not “almost half”.
2. Fixed
3. It’s not possible to carry out PCA on 46 alleles with only 14 populations (1000 genomes) or 9 ethnic groups (ALFRED). That’s why I used polygenic scores instead (each polygenic score being the average of the frequencies of 10 SNPs from each GWAS). As shown in table 2, these polygenic scores all load very highly (with the exception of one) on a single principal component.
There is a weak correlation between p values and PC loadings, but it is not statistically significant. However, given that there is a correlation, I will report it.
4. Fixed
5. Fixed
6. Fixed
7. Fixed
8. Fixed
9. I have opted for not using rotation, as the solution was more clearly interpretable and it’s simpler than using rotation. There is evidence that the low stature of Pygmies is due to only a few genes and thus cannot be captured by common polygenic variation. I have no clue what the height of the San is; I could find no information on the internet. You seem to imply they’re short. Are you aware of any studies on San height?
10. There are many different explanations for the inverse correlation between height and IQ at a cross-racial level. As we currently have no data to test these hypotheses, I won’t go into too much detail and will reserve this for a later work on the evolutionary origins of this phenomenon. For the purposes of this paper, I will just report the observed correlation and propose what seems to me the most plausible explanation.
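The polygenic score described in reply 3 above (the average frequency of the height-increasing alleles from one GWAS) can be sketched as follows. The SNP identifiers, population labels, and frequencies are hypothetical placeholders, not values from 1000 Genomes or ALFRED, and `polygenic_score` is only an illustrative name.

```python
def polygenic_score(freqs_by_snp, population):
    """Mean frequency of the height-increasing alleles in one population."""
    values = [freqs[population] for freqs in freqs_by_snp.values()]
    return sum(values) / len(values)

# Hypothetical SNP ids and allele frequencies, for illustration only.
gwas_hits = {
    "rs_hypothetical_1": {"POP_A": 0.25, "POP_B": 0.125},
    "rs_hypothetical_2": {"POP_A": 0.50, "POP_B": 0.25},
    "rs_hypothetical_3": {"POP_A": 0.75, "POP_B": 0.375},
}

scores = {pop: polygenic_score(gwas_hits, pop) for pop in ("POP_A", "POP_B")}
print(scores)
```

Averaging rather than summing keeps scores on the same 0 to 1 scale even when a study contributes fewer SNPs than the others, although a score built from fewer SNPs is still more sensitive to any single allele.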
G. Meisenberg has sent me his review by email and on top of that he's also done some formatting. I attach the review here with my replies. I have also slightly changed my paper to accommodate his comments. The new version is attached!
1) Check the sign of the principal component in the Hao et al study….
Height increasing alleles should load positively on the PC. None of the SNPs in this study shows up in the others. I think the study was just underpowered or not very good at all.
2) What creates the second component is the misfit of Rs1787200A. Somehow, this SNP does not fit the pattern. In what way is this SNP different from the others? Perhaps this could be investigated by looking at gene ontology. What are the functions of the genes (if any) that are most likely affected by these 6 SNPs? Perhaps this could be a subject for the next study.
This allele shows up in the 2 GWAS carried out on Africans. So it’s possible that it has height-increasing effects only among Africans.
3) How did you handle the p values of those SNPs that were multiple hits? For example, if an SNP came up with a significance of p = 10^-8 in one study and again with a p = 10^-7 in another study, did you compute this into a p of 10^-15? Also, I would not correlate with the p values themselves but with their logarithm. Look at the skewness of the p values and their logarithms. It is often best to use the measure with the lower skewness.
No, I just selected one of the two p values at random. I used the exponent (e.g. 10). I obtained a positive correlation, that is, a higher exponent (lower p) associated with higher PC loadings. Another reviewer has performed a Spearman rank-order correlation and obtained essentially identical results.
4) How was this calculated? With a sample size of 3, I wouldn’t expect any statistical significance.
Sample size is N=46, which is the number of alleles. 3 is the number of groups.
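On point 3 above, two things can be sketched in plain Python, with the caveat that neither is what was done in the paper (the author picked one p value at random). First, simply multiplying two p values does not give a valid combined p value; one standard alternative, assuming the two studies are independent, is Fisher's method, which for two p values has a simple closed form. Second, the skewness of the raw p values and of their logarithms can be compared directly. The p values in the skewness example are made up, and the function names are illustrative.

```python
import math

def fisher_combined_p(p1, p2):
    """Fisher's method for two independent p values.

    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with 4
    degrees of freedom under the null hypothesis; its survival function
    has the closed form exp(-x/2) * (1 + x/2).
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# The reviewer's example: p = 10^-8 in one study, 10^-7 in another.
combined = fisher_combined_p(1e-8, 1e-7)
print(f"naive product = {1e-8 * 1e-7:.2e}, Fisher combined = {combined:.2e}")

# Made-up illustrative p values, NOT the data from the paper.
p_values = [5e-8, 3e-8, 1e-9, 2e-10, 1e-12, 4e-8]
neg_log_p = [-math.log10(p) for p in p_values]
print(f"skewness of p: {skewness(p_values):.3f}")
print(f"skewness of -log10(p): {skewness(neg_log_p):.3f}")
```

The Fisher result is somewhat larger than the naive product, because the product of two p values is not itself distributed like a p value under the null hypothesis.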
I noticed a few (possible) grammatical/typo/wording problems; review attached; make alterations as you deem fit. I approve publication.
I corrected the typos that Chuck has indicated. Attached is the updated version. 2 reviewers (Meng Hu and Chuck) have approved publication; if GM approves it too, this paper can be published.
Files are attached.
More dataset files...
Thanks for the data, I will look at it. But next time, would it be possible to compress all of the files/data into one file, such as a .ZIP or .RAR? That's what Emil did here. It's much easier when you have a lot of files to share.