[OQSPS] Does sub-European genomic ancestry predict outcomes for US states?
Admin
Title:
Does sub-European genomic ancestry predict outcomes for US states?

Authors:
Emil O. W. Kirkegaard

Abstract:
Estimates of sub-European ancestry among European Americans by US state were obtained from a recent study of customers of the personal genomics company 23andme (N=148,789). The ancestry estimates were used to attempt to predict cognitive ability and socioeconomic performance across US states (N=50). However, results indicated that they had no reliable predictive validity.

Keywords:
ancestry, admixture, European, White, United States, general socioeconomic factor, S-factor, NAEP, cognitive ability, intelligence, 23andme, LASSO regression

Files:
https://osf.io/3nghx/
Admin
A well-done, simple analysis of the relationship between sub-European genomic ancestry and social outcomes. Before approving the submission for publication, please address the following points:

1. It is not exactly clear whether the estimates presented in Tables 1 and 2 are from one multivariate model or multiple bivariate models. I assume the former?

2. It is not clear how one interprets the estimates presented in Tables 1 and 2. Are the dependent variables standardised and the independent variables unstandardised (i.e., given in percentage point units)?

3. If "yes" to the above question, consider providing standardised betas as well.

4. Unless I am mistaken, some sub-European ancestries are more closely related than others (e.g., Italian and Iberian are more closely related than Italian and Finnish). Might it therefore be worth combining closely-related ancestries in an attempt to achieve greater predictive power? For example, rather than just estimating the association between percentage Iberian and cognitive ability, consider also estimating the association between (say) percentage Southern European and cognitive ability.
Admin
Thanks for reviewing.

A well-done, simple analysis of the relationship between sub-European genomic ancestry and social outcomes. Before approving the submission for publication, please address the following points:

1. It is not exactly clear whether the estimates presented in Tables 1 and 2 are from one multivariate model or multiple bivariate models. I assume the former?


It is the former. I have rephrased the text in (3) to be:

The data were analyzed using multiple regression as done in the previous studies (Fuerst & Kirkegaard, 2016a, 2016b; Kirkegaard & Fuerst, 2016).

This should make it clear that it is one model.

2. It is not clear how one interprets the estimates presented in Tables 1 and 2. Are the dependent variables standardised and the independent variables unstandardised (i.e., given in percentage point units)?

3. If "yes" to the above question, consider providing standardised betas as well.


These are standardized betas. I have changed the caption to be:

Table 1: OLS regression standardized betas for cognitive ability among European American US states. N=50. R² = .31. Cross-validated R² = -3.16.¹ CI = 95% analytic confidence intervals.
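
A negative cross-validated R2 may look odd next to an in-sample R2 of .31. A minimal sketch of why it can happen, assuming the common definition 1 - SSE/SST computed on held-out cases (the paper's exact procedure is not stated in this thread; the function name `cv_r2` and the toy numbers are hypothetical):

```python
def cv_r2(y_true, y_pred):
    """Out-of-sample R2: 1 - SSE/SST, with SST taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - sse / sst

held_out = [1.0, 2.0, 3.0]                 # toy held-out outcomes
print(cv_r2(held_out, [1.0, 2.0, 3.0]))    # perfect predictions: 1.0
print(cv_r2(held_out, [2.0, 2.0, 2.0]))    # predicting the mean: 0.0
print(cv_r2(held_out, [4.0, 0.0, 5.0]))    # worse than the mean: -7.5
```

Any model whose held-out predictions do worse than simply guessing the mean gets a value below zero, which is the usual signature of overfitting.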

4. Unless I am mistaken, some sub-European ancestries are more closely related than others (e.g., Italian and Iberian are more closely related than Italian and Finnish). Might it therefore be worth combining closely-related ancestries in an attempt to achieve greater predictive power? For example, rather than just estimating the association between percentage Iberian and cognitive ability, consider also estimating the association between (say) percentage Southern European and cognitive ability.


One could do this in an attempt to boost power. (Every addition of a new predictor in general increases the standard error of the already included predictors. The more correlated they are, the larger the effect.)
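
The point about correlated predictors can be illustrated with the variance inflation factor. A rough sketch for the two-predictor case, where the standard error of one coefficient grows by 1/sqrt(1 - r^2) relative to an uncorrelated second predictor, holding residual variance fixed (the function name `se_inflation` is hypothetical, and the correlations are made-up values, not estimates from the paper):

```python
import math

def se_inflation(r):
    """Square root of the variance inflation factor for two predictors
    that correlate at r: 1 / sqrt(1 - r^2)."""
    return 1 / math.sqrt(1 - r ** 2)

for r in (0.0, 0.6, 0.8):
    # r = 0.0 gives no inflation; 0.6 gives about 1.25; 0.8 gives about 1.67
    print(r, round(se_inflation(r), 2))
```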

The question is how to add them up. Perhaps:

Southern non-Eastern Europe = Iberian + Sardinian + Italian.
Nordic = Scandinavian + Finnish.

Then I reran the models. Same results. Added a new subsection about this.
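
The grouping described above amounts to summing ancestry percentages within each group before refitting. A minimal sketch, assuming each state is a mapping from ancestry name to percentage (the helper `combine_ancestries`, the `GROUPS` dict, and the example row are hypothetical; the numbers are invented, not taken from the 23andme data):

```python
GROUPS = {
    "Southern non-Eastern Europe": ["Iberian", "Sardinian", "Italian"],
    "Nordic": ["Scandinavian", "Finnish"],
}

def combine_ancestries(state_row, groups=GROUPS):
    """Replace the member columns of each group with their sum;
    ancestries not named in any group are passed through unchanged."""
    grouped = {name: sum(state_row[c] for c in members)
               for name, members in groups.items()}
    used = {c for members in groups.values() for c in members}
    rest = {k: v for k, v in state_row.items() if k not in used}
    return {**rest, **grouped}

# Made-up percentages for one state
example = {"Iberian": 2.0, "Sardinian": 0.5, "Italian": 4.0,
           "Scandinavian": 3.0, "Finnish": 0.5, "British/Irish": 40.0}
print(combine_ancestries(example))
```

Changing which ancestries belong to a group is a one-line edit to `GROUPS`, after which the same regression can be rerun on the combined columns.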

---

Updated files on OSF. https://osf.io/3nghx/files/
Admin
At the moment I am satisfied, and therefore approve the paper for publication.
Southern non-Eastern Europe = Iberian + Sardinian + Italian.
Nordic = Scandinavian + Finnish.

Then I reran the models. Same results. Added a new subsection about this.

---


Sardinia is a genetic outlier. You cannot group it with other Southern European populations. Besides, it's more similar to Middle Eastern ones.
Admin
Davide is right.

http://www.nature.com/ncomms/journal/v3/n2/full/ncomms1701.html


Instead I formed the Southern group based on just Iberian and Italian, but this made no difference to the results. Still no detectable validity of ancestries.

I have updated the paper to reflect these small changes + some other minor edits.
Admin
I asked AJ Figueredo if he has time to be an external reviewer for this paper. AJF previously commented on a similar paper that was a target article in Mankind Quarterly.
Admin
I have asked Meisenberg to review this paper because AJ was not replying.
The paper is completely a-theoretical (which I'm ok with, but many out there seem not). Why bother studying whether sub-European ancestry predicts U.S. state IQ? What were you expecting to find / what motivated the analyses beyond having the data?
Admin
No particular hypotheses were proposed aside from the one based on known group means in cognitive ability. This is a weak prediction because immigrant selection affects the mean cognitive ability level of the immigrants who enter the country. Still, it was worth checking out because the data was available and it was a known limitation of our larger earlier admixture analyses. Would you like me to add a few words about this in the introduction?
No particular hypotheses were proposed aside from the one based on known group means in cognitive ability. This is a weak prediction because immigrant selection affects the mean cognitive ability level of the immigrants who enter the country. Still, it was worth checking out because the data was available and it was a known limitation of our larger earlier admixture analyses. Would you like me to add a few words about this in the introduction?


Yes, I think that would improve the paper. For the same reason, the abstract seemed to end abruptly. I was expecting another sentence stating why there were no effects. Perhaps mention the sampling bias issue (23andme customers have higher IQs with a narrower range)?

Maybe it's for other papers, but does temperature/latitude correlate with sub-group ancestry? I was also wondering about religiosity. You don't have to include these; just curious.

Is 50 really a small sample size, if each data point is derived from 1000s of cases?

Footnote 5, change "data is" to "data are".

I think it would be useful to also include the simple correlation matrix with IQ, S, and the sub-group percents.
Admin
Bryan,

The reason no such claim was made in the abstract is that... I don't know. I suspect it's a power issue, but one cannot say with certainty for now. The matter is covered in the Discussion section.

I read my introduction, but it seems fine to me. It mentions previous literature investigating the issue, sets the theoretical background. However, to make you happier, I have added a paragraph in the Discussion discussing the theoretical prediction in more detail and why it is not a strong prediction due to confounds.

As discussed in the paper, only relative differences between states in ancestry matter, not absolute levels. The effect of this kind of positive selection is to reduce the variation in ancestry, at least theoretically at the maximum selection level. However, depending on the ranking of the ancestries and the amount of selection, it might increase variance before decreasing it. It's not really possible to say which influence it has in the present study with certainty.

A sample size of 50 is small no matter how many people each data point is an average of. State-level confounding effects (such as location) do not care about how many persons went into the average used for that state. For models with many predictors, N=50 is very small.

Changed to "data are".

I have added a new subsection with the correlations between ancestries and outcomes as requested. Some odd results, presumably due to outliers. LASSO tends to get rid of outlier effects because it uses cross-validation (based on resampling) to estimate the shrinkage parameter.
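
For readers unfamiliar with how cross-validation picks the LASSO shrinkage parameter, here is a rough pure-Python sketch. It is not the code used for the paper (which is not shown in this thread); the function names, the candidate penalties, and the simulated data are all hypothetical, and it assumes centered predictors and outcome so that no intercept is needed. Whether this removes outlier-driven effects in real data is not something a toy example can show.

```python
import random

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.
    Assumes centered X and y, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # prediction from all predictors except j
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of beta on (X, y)."""
    return sum((yi - sum(x * b for x, b in zip(xi, beta))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def choose_lambda(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the lowest summed held-out error over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            err += mse([X[i] for i in fold], [y[i] for i in fold], beta)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Simulated data: 50 "states", 6 predictors, only the first one matters
rng = random.Random(1)
n, p = 50, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + rng.gauss(0, 1) for row in X]
lambdas = [0.01, 0.05, 0.1, 0.2, 0.5]
best = choose_lambda(X, y, lambdas)
print("chosen penalty:", best)
print("coefficients:", [round(b, 2) for b in lasso_fit(X, y, best)])
```

The penalty is chosen by out-of-sample error rather than in-sample fit, so coefficients that only help on the training folds tend to be shrunk toward zero.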

I have rewritten a few sentences here and there.

And added your name to the Acknowledgements section. :)

New version uploaded, version 6.

https://osf.io/3nghx/files/
Thanks Emil,

The paper is pretty straightforward, and I think you've addressed the few concerns I had. I'm ok approving.

Bryan
The results are negative, but negative results also need to be published to avoid publication bias and needless duplication of effort. This is probably all that could be done with the available data. Throwing additional predictors into the already overfitted regression models (such as measures of climate, natural resources, or the presence of non-European populations) is not an option, since it would capitalize on chance. I think the paper should be published, because it shows that more fine-grained measures, for example from counties, or temporal trend measures, are required.
Admin
This paper now has 3 approvals as required for publication (Noah Carl, Bryan Pesta, and Meisenberg).

I will work out a final version and Julius will make it pretty.