Although the submission editor has not asked me to be one of the peer reviewers, one of the authors has asked if I could provide additional comments for the paper. As such my suggestions can be taken at the authors’ discretion.
The authors suppose that if structural racism is causing SES differences between races then socially-defined race should predict lower SES outcomes for groups discriminated against, independent of the effect of genetic ancestry. The authors test this prediction by regressing two SES variables (household income and educational attainment) on measures of genetic ancestry and socially identified race. After finding no significant effect size of socially identified race, or effects in the opposite direction from what theories of structural racism predict, the authors conclude that their result is inconsistent with the theory of structural racism. The use of ancestry regression to tease apart the effects of discrimination and genetics on SES is novel and commendable.
I would like to have seen more discussion of potential problems with the methodology, their effects and whether or not they are important. Genetic ancestry is not randomly assigned, and socially identified race may correlate with the SES outcomes for reasons other than racial discrimination (SES causing racial identification, collider bias, etc.). In particular, the problem of assortative mating, i.e. who selects into mixed-race relationships, is not discussed. It is probably worth being explicit that these variables are not exogenous and explaining how potential confounds and colliders may or may not complicate the test of structural racism.
An unusual aspect of the approach is that offspring genetic ancestry is used as a proxy for the parents’ genetic ancestry. The authors highlight this as a “major limitation”, but don’t discuss what effect this approach might have. Yes, the authors use the household’s SES instead of the respondent parent’s, ensuring that self-identified race does not independently tell us anything much about the genetic ancestry of the parents, but there are still problems with the approach.
After controlling for the child’s genetic ancestry, the racial identification of the parent tells us whether or not they have had a child with a partner of a different race. I think this will induce a bias on the estimated effect of socially identified race, especially since the respondent parent is (I suspect) most likely the mother. Thus a self-reported Black mother, given mixed-race offspring, may imply a higher household SES than if the mother reported being White. This might also be why the authors find that Black identification predicts higher SES conditional on the genetic ancestry of the offspring.
Regardless of what causes Black identification to predict higher SES, the result is perhaps surprising and should be discussed. For example, do the authors think it shows structural racism in favour of Black people, or is it explained by some sort of bias, such as selection into mixed-race relationships? An effect size of 0.24 does seem quite substantial and worth considering.
The authors do not seem to report the sex ratio of respondent parents. I suspect they will find it is mostly mothers who respond to scientific surveys. This is important information, since we know the correlates of mixed-race relationships depend upon which partner is White and which is Black. An interesting test would be to interact sex with identified race. This might help rule out the unusual biases created by using offspring genetic data when there is also selection into mixed-race relationships.
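The suggested interaction test could be set up roughly as follows. This is only a sketch on simulated data: the variable names (`is_mother`, `black_id`, `child_ancestry`) and all coefficient values are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8000

# Hypothetical respondent-level variables (simulated, not the authors' data).
is_mother = rng.integers(0, 2, size=n)   # 1 if the respondent parent is the mother
black_id = rng.integers(0, 2, size=n)    # 1 if the respondent identifies as Black
# Child's African ancestry proportion, correlated with parental identification.
child_ancestry = np.clip(0.5 * black_id + rng.normal(0.25, 0.15, size=n), 0, 1)

# Design matrix: intercept, ancestry, identified race, sex, and race x sex.
X = np.column_stack([
    np.ones(n),
    child_ancestry,
    black_id,
    is_mother,
    black_id * is_mother,
])

# Assumed "true" effects, chosen only so the sketch has something to recover.
true_coef = np.array([0.0, -0.5, 0.2, 0.1, 0.3])
ses = X @ true_coef + rng.normal(0, 1, size=n)

# Ordinary least squares; the last coefficient is the interaction of interest.
coef, *_ = np.linalg.lstsq(X, ses, rcond=None)
interaction_estimate = coef[4]
print(f"race x sex interaction: {interaction_estimate:.3f}")
```

A non-zero interaction would indicate that the association between identified race and household SES differs by the sex of the respondent parent, which is what selection into mixed-race relationships would lead us to expect.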
Do the authors have the racial identification of both partners? I think not, but if so this information should be incorporated into the design. I think this might help remove some biases from sorting into mixed race relationships.
To conclude, making potential biases clearer to the reader could be helpful for interpreting the strength and importance of the study. Moreover, testing an interaction between sex and identified race might help exclude biases from selection into mixed-race relationships. Nevertheless, the paper is an important methodological step forwards despite the limitations of the design.
Below I list specific suggestions I had whilst going through the paper.
Other Suggestions:
Analyses generally show that the association between socially-identified race/ethnicity and outcomes is mediated by genetic ancestry and that non-White race/ethnicity is unrelated to worse outcomes when controlling for genetic ancestry
I think the term confounded should be used here instead of mediated. Socially-identified race is a potential mediator of the effect of genetic ancestry rather than the other way around.
In families with multiple children, we used the genetic ancestry estimates of the first biological child
Only using data from one child per family is wasteful. If we are interested in the effect of the child’s ancestry, it would be best to include each child and to cluster standard errors by family. If we are interested in using offspring ancestry as an indicator for the parents, I would suggest averaging genetic ancestry across the children in each family.
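Both options can be sketched briefly. The example below uses simulated data with hypothetical names (`family_id`, `child_anc`, `ses`) and a hand-written cluster-robust sandwich estimator. The small-sample correction is one common convention and is an assumption here, not something the paper specifies.

```python
import numpy as np

def ols_cluster_se(X, y, groups):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((k, k))
    labels = np.unique(groups)
    for g in labels:
        idx = groups == g
        score = X[idx].T @ resid[idx]   # summed score within one family
        meat += np.outer(score, score)
    n_clusters = len(labels)
    # Assumed finite-sample correction: G/(G-1) * (n-1)/(n-k).
    correction = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    cov = correction * bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated families with 1 to 3 children each (hypothetical data).
rng = np.random.default_rng(0)
n_families = 500
sizes = rng.integers(1, 4, size=n_families)
family_id = np.repeat(np.arange(n_families), sizes)
parent_anc = rng.uniform(0, 1, size=n_families)
child_anc = np.clip(
    parent_anc[family_id] + rng.normal(0, 0.05, size=family_id.size), 0, 1
)
# Household SES is shared by all children in a family.
ses = (0.5 * parent_anc + rng.normal(0, 1, size=n_families))[family_id]

# Option 1: include every child and cluster standard errors by family.
X = np.column_stack([np.ones(family_id.size), child_anc])
beta, se = ols_cluster_se(X, ses, family_id)

# Option 2: average offspring ancestry within each family.
family_mean_anc = (
    np.bincount(family_id, weights=child_anc) / np.bincount(family_id)
)
```

Because household SES is identical for siblings, the residuals are strongly correlated within families, which is why unclustered standard errors would be too small if every child were included.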
2.2.2 Educational attainment and income
In this section the authors describe their coding of education and income. If the coding scheme for education has been used before elsewhere, it may be useful to provide a citation to prove the coding is non-arbitrary. For example, stating that you have copied the coding scheme in the EA GWAS would provide more credibility.
Do the authors winsorize both high and low values? I can see why we might expect low values to be erroneous (i.e. no school or no income), but high values (+$200,000 or PhD education) are probably credible. If high values are winsorized, I would consider changing that. It might also be helpful to know what levels of income and education correspond to + or - 3 SD.
In general it’s a good idea to take the logarithm of income. Income is typically log-normally distributed and causes of income tend to be linearly related to log income. It is also nice to have normally distributed error terms if possible. But in practice this transformation probably will not lead to substantially different results.
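As a small illustration of the point about log-normality, the sketch below draws simulated incomes from a log-normal distribution (the parameters are arbitrary assumptions, not estimates from the paper) and compares skewness before and after taking logs. It assumes income is recorded as a strictly positive amount, since zero incomes cannot be logged directly.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

random.seed(0)
# Simulated household incomes; mu and sigma are illustrative only.
incomes = [random.lognormvariate(11.0, 0.8) for _ in range(5000)]
log_incomes = [math.log(v) for v in incomes]

raw_skew = skewness(incomes)      # strongly right-skewed
log_skew = skewness(log_incomes)  # close to symmetric

print(f"skewness of income:     {raw_skew:.2f}")
print(f"skewness of log income: {log_skew:.2f}")
```

If income in the data is recorded in bands rather than dollar amounts, the same transformation could be applied to band midpoints, though that is a further assumption.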
Firstly, as a robustness check, we reran the analyses excluding all cases with values of education and income 3 standard deviations (SDs) or more below the mean. This was done in order to ensure that our results were not primarily driven by the extremely low values of education and income
In the results it is suggested that winsorizing is only used as a robustness check. However, the methods suggest it is the standard approach. This contradiction should be resolved. I like the idea of using winsorizing only as a robustness test, since there is little reason to suppose that observations 3 SDs above the mean are anomalous.
A report of summary statistics or a histogram might help to justify whatever approach you take with unusual values.
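The kind of report I have in mind could be produced along these lines. The years-of-education values below are invented for illustration. Note that excluding cases is strictly trimming, whereas winsorizing clamps extreme values to the cutoff, so the sketch shows both.

```python
import statistics

def sd_cutoffs(values, n_sd=3.0):
    """Return (low, high) cutoffs at the mean -/+ n_sd sample SDs."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return m - n_sd * s, m + n_sd * s

def drop_low_outliers(values, n_sd=3.0):
    """Exclude cases n_sd SDs or more below the mean (lower tail only)."""
    low, _ = sd_cutoffs(values, n_sd)
    return [v for v in values if v > low]

def winsorize(values, n_sd=3.0):
    """Clamp values to the cutoffs instead of dropping them."""
    low, high = sd_cutoffs(values, n_sd)
    return [min(max(v, low), high) for v in values]

# Hypothetical years of education, with one implausibly low value.
years = [12] * 20 + [14] * 10 + [16] * 15 + [18] * 4 + [0]

low, high = sd_cutoffs(years)
print(f"mean = {statistics.fmean(years):.2f}, SD = {statistics.stdev(years):.2f}")
print(f"-3 SD = {low:.2f} years, +3 SD = {high:.2f} years")

trimmed = drop_low_outliers(years)   # the single zero is excluded
clamped = winsorize(years)           # the zero is raised to the lower cutoff
```

Reporting the cutoffs in the original units makes it easy for the reader to judge whether values beyond them are plausibly erroneous.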
This finding indicates that ancestry strongly mediates the effect of socially-defined race/ethnicity
Genetic ancestry is not caused by self-identification, thus genetic ancestry cannot be a mediator of the effect of racial identification. Rather, racial identification is a potential mediator of the effect of genetic ancestry.
Table 2. Admixture Regression Results for Educational Attainment.
I would note that this is for parental educational attainment, as in Table 3, where the dependent variable is written as “parental income”.