Genetic ancestry and social race are nearly interchangeable
It has been claimed that social race and genetic ancestry are at best weakly related. Here we test this claim by applying predictive modeling in both directions, i.e., predicting genetic ancestry from social race(s) and predicting social race(s) from genetic ancestry. We use the public Pediatric Imaging, Neurocognition, and Genetics (PING) dataset (n = 1,391) so that others may examine the data as well. In the simple scenario restricted to individuals who self-identify as white, black, or mixed (black-white) race (571 whites, 140 blacks, 25 mixed), model accuracy was very high. Predicting social race from genetic ancestry resulted in an area under the curve (AUC) of .994, an overall accuracy (concordance) of 98.0%, and a pseudo-R2 of .951. Conversely, predicting genetic ancestry from social race yielded an adjusted R2 of .992. The full dataset contains 8 census-type categories of social race. Using cross-validated multinomial regression to predict social race from 6 genetic ancestry variables, we find an AUC of .89. Using Dirichlet regression to predict ancestries from social race, we find an overall correlation of .94 (R2 = 88.4%). Further analyses using more sophisticated methods (random forest, support vector machine) found similar results. In conclusion, social race and genetic ancestry are nearly interchangeable.
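For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how AUC and overall accuracy (concordance) can be computed from a classifier's predicted probabilities, and of how a correlation of .94 corresponds to an R2 of about 88.4%. The scores and labels are made-up toy values, not PING data, and the function names `auc` and `accuracy` are hypothetical helpers introduced only for illustration; this is not the authors' analysis code.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (assumption, for illustration only): predicted probability of
# class 1 for eight hypothetical individuals, with their true labels.
scores = [0.95, 0.90, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))       # 0.9375 (15 of 16 pairs ranked correctly)
print(accuracy(scores, labels))  # 0.75 (6 of 8 cases classified correctly)

# A correlation of .94 implies R2 = .94 squared, i.e. about 88.4%.
print(round(0.94 ** 2, 3))       # 0.884
```

The example also shows why the two metrics can diverge: AUC depends only on how cases are ranked, while accuracy depends on the chosen threshold.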
Reviewer 2: Accept
Reviewer 4: Accept