
Divergent selection on height and cognitive ability: evidence from Fst and polygenic scores

Submission status
Published

Submission Editor
Emil O. W. Kirkegaard

Author
Davide Piffer

Title
Divergent selection on height and cognitive ability: evidence from Fst and polygenic scores

Abstract

Tests of selection based on population differentiation were performed on two highly polygenic traits important for success and satisfaction in life: height and educational attainment (EA).

Polygenic scores (PGS) of EA and height, computed across three public genomic databases (1000 Genomes, HGDP, gnomAD), revealed differences between populations that matched the average IQ and height of ethnic groups (r ≈ 0.9).

A moderately strong correlation between latitude and EA PGS (r = 0.68), significantly deviating from the correlations of random SNPs, suggests a role for climate (seasonality or winter temperature) in selection on cognitive abilities.

The global Fst index revealed population differentiation at height and EA loci that deviated significantly from that of random SNPs.

Substantial linkage disequilibrium (LD) decay between Africans and Europeans was found (r = 0.6), but there was no correlation between LD decay and population differences in EA polygenic scores (r = 0.015, p = 0.45), implying that LD decay does not bias the EA polygenic scores of non-European populations. For height, LD decay slightly inflated the PGS difference (r = -0.04, p = 0.0315). Finally, it is shown that PGS differences are more sensitive than Fst to SNP significance, reflecting the major limitations of Fst as an index of selection.
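As a rough illustration of the kind of differentiation statistic the abstract refers to, the sketch below computes Nei's multi-population Fst for biallelic SNPs from per-population allele frequencies and combines SNPs as a ratio of averages. This is an assumption for illustration only: the abstract does not say which Fst estimator was used, this version applies no sample-size correction, and the function names are hypothetical. A selection test of the kind described would compare the value obtained at GWAS loci with values from sets of random SNPs.

```python
from typing import Sequence


def nei_fst_terms(freqs: Sequence[float]) -> tuple[float, float]:
    """Numerator and denominator of Nei's Fst for one biallelic SNP.

    `freqs` holds the frequency of the same allele in each population.
    Populations are weighted equally and no sample-size correction is applied.
    """
    k = len(freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                  # heterozygosity of the pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    return h_t - h_s, h_t


def global_fst(snp_freqs: Sequence[Sequence[float]]) -> float:
    """Combine SNPs as a ratio of averages: sum(H_T - H_S) / sum(H_T)."""
    num = den = 0.0
    for freqs in snp_freqs:
        n, d = nei_fst_terms(freqs)
        num += n
        den += d
    if den == 0:
        raise ValueError("all SNPs are monomorphic in the pooled sample")
    return num / den


# Hypothetical frequencies for two SNPs in three populations.
print(global_fst([[0.1, 0.5, 0.9], [0.4, 0.5, 0.6]]))
```

The ratio of averages is used because averaging per-SNP ratios gives unstable weight to SNPs with very low pooled heterozygosity.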

 

 

Keywords
IQ, education, polygenic selection, GWAS, polygenic scores

Supplemental materials link
https://osf.io/6dvfc/


Reviewers ( 0 / 0 / 3 )
Emund the Old: Accept
Gerhard Meisenberg: Accept
Dr. g: Accept

Sun 31 Jan 2021 09:39

Reviewer

Very nice study, important results, well worth publishing.

General suggestions: Try to elaborate/explicate more in the results section. I think it would serve you well to have an interested member of the public in mind as a reader, so spell out a lot of the genetic jargon. Regardless of that, the results section would benefit from being more cohesive, avoiding one-sentence statements that summarize a whole body of results (typically one for each figure). I think the implications of each (busy) figure require quite a lot more unpacking in the text. A good rule of thumb is that the essence of the results should be conveyed in the text, so that the reader can understand the gist of it all without the figures (and tables). I have lots and lots of detailed suggestions and comments in a Word file that I will try to upload.

Author
Replying to


Thanks for the comments. I have uploaded a new version of the manuscript in which the LD-decay analysis is also performed for height; there I found an effect of LD decay on population differences in PGS. Looking forward to receiving your detailed suggestions and comments!

Author
Replying to


I have read your detailed comments and suggestions and revised my manuscript accordingly. I also expanded the LD decay analysis and found a larger European-African PGS difference for education, but a smaller one for height, after selecting the SNPs with the lowest LD decay (r > 0.8). I also computed LD decay for the Chinese sample; it is somewhat smaller than for Africans. Selecting the SNPs with lower LD decay also results in a larger Chinese-European gap in EDU PGS. I ran a multiple linear regression of PGS with latitude and super-population as independent variables, and found that the effect of latitude on PGS was reduced after taking super-population into account, but it remained significant.
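A minimal sketch of the regression described above, assuming one population-level PGS value, one latitude, and one categorical super-population label per population; the function name and the group labels "A", "B", "C" are hypothetical. One super-population is omitted as the reference category, so each group coefficient is a difference from that reference at equal latitude, and the latitude coefficient is the slope of PGS on latitude within groups. Standard errors and p-values are not computed here.

```python
import numpy as np


def fit_pgs_on_latitude(pgs, latitude, groups, reference):
    """OLS of population PGS on latitude plus dummy-coded groups.

    The `reference` group gets no dummy column, so each group coefficient is
    the expected PGS difference from the reference group at equal latitude.
    """
    if reference not in set(groups):
        raise ValueError("reference group must occur in `groups`")
    y = np.asarray(pgs, dtype=float)
    lat = np.asarray(latitude, dtype=float)
    levels = sorted(set(groups) - {reference})
    columns = [np.ones_like(lat), lat]
    names = ["intercept", "latitude"]
    for level in levels:
        # 1.0 where the population belongs to this group, else 0.0
        columns.append(np.array([g == level for g in groups], dtype=float))
        names.append(f"group[{level}]")
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {name: float(c) for name, c in zip(names, coef)}
```

Comparing the latitude coefficient from this model with the slope from a latitude-only regression shows how much of the latitude effect is absorbed by the group dummies. Changing `reference` changes the group coefficients but not the latitude coefficient.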

Reviewer

I note that somewhere around half of my comments have been acted upon, and that those that have not been acted upon are mainly about somewhat bigger changes, such as expanding or clarifying sections of the text. I will not repeat any of my suggestions here, but I still think many of them would substantially improve the presentation. One may disagree on that, of course, and that is fine. But I would still want to stress that this is an important study that deserves to be conveniently communicated to as broad a readership as possible. In particular, I would recommend expanding upon the results section, as mentioned before, and also looking over the sometimes very short one-sentence paragraphs. Finally, I found some new details that are commented in the PDF.

Reviewer

Sorry it took so long for me to review this manuscript. Here are my suggestions for improving this manuscript:

  • In the introduction, I am surprised that the Kong et al. (2017) study was not cited as another example (this time in Iceland) for modern directional selection on educational attainment. Please incorporate this into the manuscript.
  • The following sentence is too vague: "The existence of correlations with opposite sign to intelligence makes it problematic to use EA as a proxy for IQ." What correlations are you talking about? How strong are they? What are the exact variables you're referring to?
  • Please refer to the Lee et al. (2018) EA PGS as "EA3" to distinguish it from earlier polygenic scores for educational attainment (and any that may come in the future).
  • Page 3: n = 456,837 is not "somewhat smaller" than n = 700,000. Remove the word "somewhat".
  • I'm not a fan of using Wikipedia as a source in scholarly articles. On the other hand, I can see that it's probably a more comprehensive list of average national heights than any single scholarly source would be. If you can find a better source, please use that. Otherwise, I'll let it slide. For countries that have more than one entry (e.g., Russia), please explain how you chose a preferred height measure.
  • The latitude scatterplots seem to be missing some of the populations (e.g., Utah Whites, US Blacks). Please clarify why these populations are missing. (My guess is that this is because the DNA data were not collected from groups that were indigenous to the parts of the world where they live. But it would be nice to know for sure.)
  • Because some of these correlations are based on a very small number of populations (e.g., the gnomAD data), I think the author should report 95% confidence intervals around each correlation so that nobody interprets a greater level of exactness in these results than actually exists.
  • Page 15: Specify that the difference in EA3 PGSs between CEU and YRI was "statistically significant." Don't just use "significant" by itself because people would be more likely to interpret this to mean "important." Also, report SDs for both average EA3 PGSs for the CEU and YRI groups.
  • On p. 15, always say "EA3 PGS" instead of just "PGS" because this paper does make use of more than one PGS. (You will never regret being more exact than less exact in a scholarly paper.)
  • On p. 15, for the paragraph that begins, "There was no correlation between LD decay . . ." add a sentence that interprets this result so that non-experts can know what this result means.
  • Figure 15 needs clearer explanations so that readers can interpret the correlation table more easily. I figured out that "LD decay" in the title and "LD_decay" in the figure both refer to linkage disequilibrium decay. The "delta" variables seem to correspond to PGS differences, and FST = Fst, of course. The "abs_delta" variables seem to be the absolute allele frequency difference. Thus, I seem to have figured out the scatterplot, but it took too long and distracted me from reading the main text.
  • Add a sentence or two of interpretation on p. 20 at the end of the section entitled "Difference in correlation coefficient between GWAS significant and quasi-random SNPs." (This is also a place where confidence intervals would help with interpretation. They would make it clear whether pairs of correlations are statistically significantly different or not.)
  • Report the mean and SD of the PGSs that are analyzed on p. 21, preferably for each population (though descriptive statistics at the continent level would be acceptable).
  • Page 22: "Another interpretation is that the tag variants represent random noise which does not bias the population means in one direction." This is an important, plausible interpretation that has implications that are very different from the interpretation in the rest of the paragraph. This sentence should be a topic sentence of a new paragraph that explores this interpretation.
  • Page 22: Rewrite ". . . this reult rarely occurs by chance . . ." to ". . . this result would be highly unlikely under a model of random SNPs with similar minor allele frequency . . ."
  • Optional edit to p. 23, paragraph 2: A reminder to the reader that traits can be under selection pressure at the same time for a variety of reasons would be helpful. What springs to mind for me is that this can occur because (a) they are genetically linked and pleiotropic alleles are selected for because one trait is beneficial for passing on an organism's genes and the other trait(s) are selected for by chance, (b) the same environment selects for multiple traits at the same time, or (c) different environmental selection pressures happen to act simultaneously on multiple traits. What I'm trying to say is that it would be nice to remind the reader that we don't know exactly why both EDU and skin pigmentation might be selected for in the ancestors of high-latitude populations.
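For the confidence intervals requested above, a common approach is the Fisher z-transformation. The sketch below computes an interval for a single Pearson correlation and an approximate test for the difference between two correlations; the function names are illustrative. Both functions assume independent observations, so with spatially autocorrelated population-level data the intervals are likely too narrow and the p-values too small. The comparison also assumes the two correlations come from independent samples, which does not hold when both are computed over the same populations; it should be read as a rough guide only.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a Pearson r via the Fisher z-transformation.

    Assumes n independent observations; autocorrelated data make it too narrow.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def compare_independent_r(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-sided p-value for H0: rho1 == rho2, assuming independent samples."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return 2.0 * (1.0 - NormalDist().cdf(abs(diff) / se))


# Hypothetical example: r = 0.68 over 25 independent units.
print(fisher_ci(0.68, 25))
```

The interval is built on the z scale, where the sampling distribution is close to normal, and then mapped back with `tanh`; that is why it is asymmetric around r.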

Good luck with revisions.

Author
Replying to

Please see more comments attached

Thanks for the additional comments. I will incorporate them in my next re-submission.

Author
Replying to


Thanks for your useful suggestions and comments! I submitted the revised paper. Below, you will find replies to your comments.

  • In the introduction, I am surprised that the Kong et al. (2017) study was not cited as another example (this time in Iceland) for modern directional selection on educational attainment. Please incorporate this into the manuscript. Done

  • The following sentence is too vague: "The existence of correlations with opposite sign to intelligence makes it problematic to use EA as a proxy for IQ." What correlations are you talking about? How strong are they? What are the exact variables you're referring to? Added explanation.

  • Please refer to the Lee et al. (2018) EA PGS as "EA3" to distinguish it from earlier polygenic scores for educational attainment (and any that may come in the future). Done

  • Page 3: n = 456,837 is not "somewhat smaller" than n = 700,000. Remove the word "somewhat". Done

  • I'm not a fan of using Wikipedia as a source in scholarly articles. On the other hand, I can see that it's probably a more comprehensive list of average national heights than any single scholarly source would be. If you can find a better source, please use that. Otherwise, I'll let it slide. For countries that have more than one entry (e.g., Russia), please explain how you chose a preferred height measure. Added explanation.

  • The latitude scatterplots seem to be missing some of the populations (e.g., Utah Whites, US Blacks). Please clarify why these populations are missing. (My guess is that this is because the DNA data were not collected from groups that were indigenous to the parts of the world where they live. But it would be nice to know for sure.) The latitude x PGS scatterplots include only HGDP samples. The samples in the other datasets (Ashkenazi, Amish, Utah Whites, etc.) were collected from non-indigenous groups, as you correctly inferred.

  • Because some of these correlations are based on a very small number of populations (e.g., the gnomAD data), I think the author should report 95% confidence intervals around each correlation so that nobody interprets a greater level of exactness in these results than actually exists. A choice was made not to report significance levels or confidence intervals for the correlations, as these would be misleading. There is a large amount of spatial autocorrelation in the data, and traditional statistical tests treat the data points as independent, which yields p values that are too small and confidence intervals that are too narrow. In previous papers (Piffer, 2015, 2019), I showed that the correlations survive controls for autocorrelation, and since the correlations are even stronger here, that analysis does not need to be repeated. For latitude I used a different approach, a simulation using random SNPs, which gives an empirical estimate of how significant the result is.

  • Page 15: Specify that the difference in EA3 PGSs between CEU and YRI was "statistically significant." Don't just use "significant" by itself because people would be more likely to interpret this to mean "important." Also, report SDs for both average EA3 PGSs for the CEU and YRI groups. Done. I reported the confidence interval instead of the SD because the differences in percentage points are so small that the SD is not intuitive in this instance.

  • On p. 15, always say "EA3 PGS" instead of just "PGS" because this paper does make use of more than one PGS. (You will never regret being more exact than less exact in a scholarly paper.) OK

  • On p. 15, for the paragraph that begins, "There was no correlation between LD decay . . ." add a sentence that interprets this result so that non-experts can know what this result means. Done.

  • Figure 15 needs clearer explanations so that readers can interpret the correlation table more easily. I figured out that "LD decay" in the title and "LD_decay" in the figure both refer to linkage disequilibrium decay. The "delta" variables seem to correspond to PGS differences, and FST = Fst, of course. The "abs_delta" variables seem to be the absolute allele frequency difference. Thus, I seem to have figured out the scatterplot, but it took too long and distracted me from reading the main text. OK

  • Add a sentence or two of interpretation on p. 20 at the end of the section entitled "Difference in correlation coefficient between GWAS significant and quasi-random SNPs." (This is also a place where confidence intervals would help with interpretation. They would make it clear whether pairs of correlations are statistically significantly different or not.) Added sentence and confidence intervals.

  • Report the mean and SD of the PGSs that are analyzed on p. 21, preferably for each population (though descriptive statistics at the continent level would be acceptable). The PGS are standardized (Z scores) and centered around the mean for the whole sample.

  • Page 22: "Another interpretation is that the tag variants represent random noise which does not bias the population means in one direction." This is an important, plausible interpretation that has implications that are very different from the interpretation in the rest of the paragraph. This sentence should be a topic sentence of a new paragraph that explores this interpretation.

  • Page 22: Rewrite ". . . this reult rarely occurs by chance . . ." to ". . . this result would be highly unlikely under a model of random SNPs with similar minor allele frequency . . .". Done

  • Optional edit to p. 23, paragraph 2: A reminder to the reader that traits can be under selection pressure at the same time for a variety of reasons would be helpful. What springs to mind for me is that this can occur because (a) they are genetically linked and pleiotropic alleles are selected for because one trait is beneficial for passing on an organism's genes and the other trait(s) are selected for by chance, (b) the same environment selects for multiple traits at the same time, or (c) different environmental selection pressures happen to act simultaneously on multiple traits. What I'm trying to say is that it would be nice to remind the reader that we don't know exactly why both EDU and skin pigmentation might be selected for in the ancestors of high-latitude populations. OK
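The random-SNP comparison mentioned in these replies can be sketched as an empirical test: draw many sets of background SNPs matched to the trait SNPs on minor allele frequency (MAF), recompute the statistic for each set, and report the fraction of draws at least as extreme as the observed value. The MAF binning, the sampling with replacement, and all names below are assumptions for illustration, not the paper's exact procedure; `statistic` stands for whatever quantity is being tested, for example a latitude correlation computed from the drawn SNPs.

```python
import random
from collections import defaultdict
from typing import Callable, Sequence


def empirical_pvalue(
    observed: float,
    trait_mafs: Sequence[float],
    pool: Sequence[tuple[float, object]],
    statistic: Callable[[list], float],
    n_draws: int = 1000,
    bin_width: float = 0.05,
    seed: int = 0,
) -> float:
    """One-sided empirical p-value against MAF-matched random SNP sets.

    pool: (maf, snp_data) pairs for candidate background SNPs.
    statistic: maps a list of snp_data to a number (larger = more extreme).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for maf, snp in pool:
        bins[int(maf / bin_width)].append(snp)
    needed = [int(m / bin_width) for m in trait_mafs]
    empty = sorted({b for b in needed if not bins.get(b)})
    if empty:
        raise ValueError(f"no background SNPs in MAF bins {empty}")
    exceed = 0
    for _ in range(n_draws):
        # One matched background SNP per trait SNP, drawn with replacement.
        draw = [rng.choice(bins[b]) for b in needed]
        if statistic(draw) >= observed:
            exceed += 1
    # The +1 terms count the observed set itself, so p is never exactly 0.
    return (exceed + 1) / (n_draws + 1)
```

Matching on MAF matters because allele-frequency differences between populations depend strongly on MAF, so unmatched random SNPs would give a miscalibrated null.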


Reviewer

Most of my concerns have been met, with the following exceptions:

  • I cannot find where you explicitly state in the manuscript that non-indigenous populations are not included in the latitude scatterplots or analyses. This is important information that should be included.
  • This comment from before seems not to have received any response or led to any changes in the manuscript:
    • Page 22: "Another interpretation is that the tag variants represent random noise which does not bias the population means in one direction." This is an important, plausible interpretation that has implications that are very different from the interpretation in the rest of the paragraph. This sentence should be a topic sentence of a new paragraph that explores this interpretation.

Once these changes have been made, I will find the manuscript acceptable for publication.

Author
Replying to


  • I cannot find where you explicitly state in the manuscript that non-indigenous populations are not included in the latitude scatterplots or analyses. This is important information that should be included. Added to page 4.
  • This comment from before seems not to have received any response or led to any changes in the manuscript:
    • Page 22: "Another interpretation is that the tag variants represent random noise which does not bias the population means in one direction." This is an important, plausible interpretation that has implications that are very different from the interpretation in the rest of the paragraph. This sentence should be a topic sentence of a new paragraph that explores this interpretation. Expanded the discussion on this topic. I also added a new method to compute PGS, based on the allele frequency weighted by the LD decay for each SNP, and added a table with PGS for YRI and CHB (p. 18).
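The frequency-based scores mentioned here can be sketched as a sum of GWAS effect sizes times effect-allele frequencies, optionally multiplied by a per-SNP weight, followed by standardization to z-scores around the mean of all populations, as described in the earlier reply about standardized PGS. The exact LD-decay weighting used in the paper is not described in this thread, so `weights` is a hypothetical placeholder; using the population SD rather than the sample SD is also an assumption, and the population labels are illustrative.

```python
from statistics import mean, pstdev
from typing import Mapping, Optional, Sequence


def frequency_pgs(
    freqs: Sequence[float],
    betas: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Population-level score: sum over SNPs of weight * beta * effect-allele frequency.

    freqs[j] must be the frequency of the allele to which betas[j] refers.
    weights defaults to 1 for every SNP; any other weighting is a placeholder.
    """
    if weights is None:
        weights = [1.0] * len(betas)
    if not len(freqs) == len(betas) == len(weights):
        raise ValueError("freqs, betas and weights must have equal length")
    return sum(w * b * p for p, b, w in zip(freqs, betas, weights))


def standardize(scores: Mapping[str, float]) -> dict[str, float]:
    """Convert population scores to z-scores around the mean of all populations."""
    values = list(scores.values())
    m, s = mean(values), pstdev(values)  # population SD; sample SD is an alternative
    if s == 0:
        raise ValueError("all scores are equal")
    return {pop: (v - m) / s for pop, v in scores.items()}


# Hypothetical usage: three populations, two SNPs.
raw = {
    "pop1": frequency_pgs([0.5, 0.2], [0.1, -0.2]),
    "pop2": frequency_pgs([0.6, 0.3], [0.1, -0.2]),
    "pop3": frequency_pgs([0.4, 0.1], [0.1, -0.2]),
}
print(standardize(raw))
```

Because the z-scores are relative to the pooled mean, adding or removing populations changes every standardized value, which is worth keeping in mind when comparing tables across datasets.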

 

Reviewer

My concerns have been adequately addressed, and I believe that this manuscript is acceptable for publication.

Admin

Looks good, I just have a few questions. 

  • On page 4 you mention "Polygenic scores were computed using the three largest publicly available population genetics datasets: 1000 Genomes (“1KG”), gnomAD (https://gnomad.broadinstitute.org/) and the Human Genome Diversity Project (HGDP).", and later write "The HGDP sample". Where would I find this sample? Perhaps I've overlooked something, but looking at your references I can't seem to find it. If I search for it online I get this page: https://www.hagsc.org/hgdp/
  • On page 28 in your references, you mention "The American Journal of Human Genetics, Volume 0, Issue 0". I can't seem to find it. When I search online https://www.sciencedirect.com/journal/the-american-journal-of-human-genetics/issues?page=2 the oldest volume I'm able to find is 61. Do you have a link, doi or a hard copy available? 
  • Perhaps I've overlooked the use of "The American Journal of Human Genetics, Volume 0, Issue 0", because I can only find it in your references, but not in the paper itself. On what page and which line should I look at to find it?

Author
Replying to Mon 22 Mar 2021 22:57


Thanks for your comments. I have added a link to the HGDP files. At the moment it's not working for some reason, but that's the link I used to download them from. 

Sorry about the "The American Journal of Human Genetics, Volume 0, Issue 0" reference. That was an error. The correct reference is "Stern, Aaron J. et al. (2021). Disentangling selection on genetically correlated polygenic traits via whole-genome genealogies. American Journal of Human Genetics, 108:219-239. doi: 10.1016/j.ajhg.2020.12.005".

I have updated the paper.

Bot

Authors have updated the submission to version #12

Reviewer

Good work. Some specifics:

Abstract, 5th paragraph:

Substantial Linkage Equilibrium (LD) Decay between Africans and Europeans …

This should be linkage disequilibrium, not linkage equilibrium.


 

Last sentence of abstract:

Finally, it is shown that PGS differences are more sensitive to the significance of GWAS loci than Fst,…

Better: Finally, it is shown that PGS differences are more sensitive than Fst to the significance of GWAS loci

Page 2, last sentence of first complete paragraph: …, which can inflate signals of selection due to co-variance between genes and the environment. Is this the current or the ancestral environment with which the genes covary?

Table 1: Are these coefficients from regression models with African as the omitted control? This should be mentioned.

Figure 6: I notice that EA polygenic scores for the African American and Barbados samples are almost as low as those of the West African groups, although they have 20% to 25% European admixture. If this is not a fluke, it could mean that most of the admixture-related gains were eaten up by dysgenics. Perhaps something along these lines could be mentioned in the discussion.

Figure 15 is missing a proper heading.

Discussion: You find that European-African polygenic scores for educational attainment differ by more than 1.5 standard deviations. Some readers may find that strange because the actual difference between black and white Americans today is barely one standard deviation, and differences in educational attainment are even less. Such apparent discrepancies could possibly lead people to conclude that therefore there must be something wrong with the genetic results. Can you offer an explanation for this apparent inconsistency? What looks most obvious to me is that the polygenic scores include only the effects of common polymorphisms but not those of rare variants that are mostly subject to mutation-selection balance (aka “genetic garbage”). While polygenic scores reflect the cumulative effect of selection over more than 50,000 years, the genetic garbage is of more recent origin. If recent selection for educational attainment (last few centuries) was the same in different populations, the genetic garbage will be about the same. Therefore, the total genetic difference will be less, and possibly far less, than the 1.5 SD difference in polygenic scores. So, the large size of the PGS difference would indicate that genetic population differences originated through early selection rather than very recent selection.
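The expectation behind the Figure 6 comment can be made explicit for a score that is a linear function of allele frequencies. This is a sketch under stated assumptions: the score is taken to be frequency-based, the admixed group is modeled as a two-way mixture with ancestry fraction $a$ from source 2, and no drift or selection is assumed after admixture. Under these assumptions, the admixed group's allele frequencies are mixtures of the source frequencies, so its expected score is the same mixture of the source scores:

```latex
\bar S_{\text{adm}}
  = \sum_j \beta_j\, p_{\text{adm},j}
  = \sum_j \beta_j \big[(1-a)\, p_{1,j} + a\, p_{2,j}\big]
  = (1-a)\, \bar S_1 + a\, \bar S_2 .
```

With $a$ between 0.20 and 0.25, the expected score lies 20% to 25% of the way from the source-1 score toward the source-2 score; because z-scoring is an affine transform, the same holds for standardized PGS. An observed value closer to source 1 than this could reflect sampling error, the choice of proxy source populations, drift, or selection, and the identity alone does not distinguish among these.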


 

Admin
Replying to Davide Piffer


Thank you for the update. I especially appreciate the ftp link to the HGDP files. 

Author
Replying to Reviewer 2

Good work. Some specifics:

Abstract, 5th paragraph:

Substantial Linkage Equilibrium (LD) Decay between Africans and Europeans …

This should be linkage disequilibrium, not linkage equilibrium.


 

Last sentence of abstract:

Finally, it is shown that PGS differences are more sensitive to the significance of GWAS loci than Fst,…

Better: Finally, it is shown that PGS differences are more sensitive than Fst to the significance of GWAS loci

Page 2, last sentence of first complete paragraph: …, which can inflate signals of selection due to co-variance between genes and the environment. Is this the current or the ancestral environment with which the genes covary?

Table 1: Are these coefficients from regression models with Africans as the omitted reference category? This should be mentioned.

Figure 6: I notice that the EA polygenic scores of the African American and Barbados samples are almost as low as those of the West African groups, although these samples have 20-25% European admixture. If this is not a fluke, it could mean that most of the admixture-related gains were eaten up by dysgenics. Perhaps something along these lines could be mentioned in the discussion.

Figure 15 is missing a proper heading.

1. Changed to Linkage Disequilibrium.

2. Sentence changed.

3. The current environment.

4. Africans are the (omitted) reference level. Added to the paper.

5. Added header to figure.

6. I added a lollipop chart in Figure 2. It shows that African Caribbeans and US Blacks have slightly higher PGS than native Africans. As you point out, the difference is very small. However, the 1KG group geographically closest to the ancestors of African Americans is the Mende from Sierra Leone, in the broad region (including Liberia) where the slaves came from. The Mende sample has the lowest PGS in 1KG. This probably explains why, despite European admixture, the African American PGS is only slightly above that of Nigerians.
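The admixture reasoning in this exchange can be made explicit with a simple linear mixing model. This is a sketch that assumes an additive score, so that an admixed group's expected mean PGS is the ancestry-weighted average of its source populations. The symbols are introduced here for illustration. With European ancestry proportion $\alpha$:

$$
\bar{S}_{\mathrm{adm}} = (1-\alpha)\,\bar{S}_{\mathrm{src}} + \alpha\,\bar{S}_{\mathrm{EUR}}
\quad\Longrightarrow\quad
\bar{S}_{\mathrm{adm}} - \bar{S}_{\mathrm{src}} = \alpha\,\bigl(\bar{S}_{\mathrm{EUR}} - \bar{S}_{\mathrm{src}}\bigr).
$$

The figures quoted above are a European-African gap of about 1.5 SD and $\alpha \approx 0.20$ to $0.25$. With these, the expected gain over the African source population is roughly $0.20 \times 1.5 = 0.30$ to $0.25 \times 1.5 \approx 0.38$ SD of the PGS. The model also shows why the choice of source population matters. If the relevant source (for example, the Sierra Leone sample) has a lower $\bar{S}_{\mathrm{src}}$ than the Nigerian samples, the admixed mean can sit only slightly above the Nigerian mean even after this gain.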

Bot

Authors have updated the submission to version #13

Bot

Authors have updated the submission to version #14

Reviewer
Replying to Forum Bot

Good work. Should be accepted after some very minor revisions.
