Genetic ancestry and social race are nearly interchangeable

Submission status
Published

Submission Editor
Noah Carl

Author
Emil O. W. Kirkegaard

Title
Genetic ancestry and social race are nearly interchangeable

Abstract

It has been claimed that social race and genetic ancestry are at best weakly related. Here we test this claim by applying predictive modeling in both directions, i.e., predicting genetic ancestry from social race(s), and predicting social race(s) from genetic ancestry. We utilize the public Pediatric Imaging, Neurocognition, and Genetics (PING) dataset (n = 1,391), so that others may examine the data as well. In the simple scenario where we are only concerned with self-identified white, black, and mixed (black-white) race individuals (571 whites, 140 blacks, 25 mixed), model accuracy was very high. Predicting social race from genetic ancestry resulted in an area under the curve (AUC) of .994, an overall accuracy (concordance) of 98.0%, and a pseudo-R2 of .951. Conversely, predicting genetic ancestry from social race gave an adjusted model R2 of .992. Using the full dataset, there are 8 census-type categories of social race. Using cross-validated multinomial regression to predict social race from 6 genetic ancestry variables, we find that the AUC is .89. Using Dirichlet regression to predict ancestries from social race, we find an overall correlation of .94 (R2 = 88.4%). Further analyses using more sophisticated methods (random forest, support vector machine) found similar results. In conclusion, social race and genetic ancestry are nearly interchangeable.

Keywords
race, ethnicity, genetics, proxy

Supplemental materials link
https://osf.io/qxvg8/

Reviewers ( 0 / 0 / 2 )
Anon Anonsen: Accept
Gerhard Meisenberg: Accept

Mon 15 Nov 2021 15:31

Reviewer

Overall, a good and important paper.

I have some minor comments. 

Most importantly I dislike the language in first line: "It has been claimed that social race and genetic ancestry are not closely related, or even unrelated."

It sounds unscientific. Also it is potentially confusing. (Are they even related or not even related?)

A suggestion is to change it to something like "It is often claimed that social race and genetic ancestry are only weakly related". The title could also be "Social race and genetic ancestry are very highly related." or "Social race and genetic ancestry are close to identical."

"6 genetic ancestry variables" could be "genetic ancestry variables".

Perhaps it is possible to hammer home the point more clearly, using the simplest possible data. E.g., for people who identify as black, how often would the algorithm recognize this? Same question for whites. (E.g., ignore white + black.) Get the highest possible accuracy value.

 

Author | Admin
Replying to Reviewer 2

Thanks for the review. Changes:

  1. Changed sentence to "It has been claimed that social race and genetic ancestry are at most weakly related." Now it is no longer ambiguous.
  2. "6 genetic ancestry variables" could be "genetic ancestry variables". Changed this.
  3. Added some more references, in particular this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6516754/ It appears that many genetics professionals support using social race, whether self-reported or assessed by a doctor, as a proxy for genetic ancestry. I don't find this very surprising given my own work in this sector. However, the anonymous survey allows these opinions to come out easily. It's rare to see papers like those by Risch any longer. https://www.sciencedirect.com/science/article/pii/S0002929707625786 https://pubmed.ncbi.nlm.nih.gov/12646676/

Will wait for R3 to reply before uploading the new PDF.

Reviewer

A welcome paper, with the rationale well formulated in the introduction. There really is a need for quantitative data here, and I think the paper gives clear answers. Only one minor suggestion:

Paragraph after Fig. 1:  “…the remnants of the one-drop rule, wherein any amount of African ancestry would classify a person as African by some US state laws.”

An even simpler explanation is that most of these biracials have one white parent. If the other parent has the typical 80% African ancestry, you get the 40% that you observe. In Fig. 2, those in the smaller green bulge above the big one are likely those with one white grandparent. Those in the bulge below the big one are likely those with one black and three white grandparents. It helps to make the connection between the stats and the real world in which the research participants and everyone else live.
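
The arithmetic behind this explanation can be checked with a short sketch. It assumes, as the comment does, that a typical black ancestor carries about 80% African ancestry, and it additionally assumes that a white ancestor carries 0%. The function name and the grandparent-counting scheme are illustrative and are not taken from the paper's analysis.

```python
from fractions import Fraction

# Assumption taken from the reviewer's comment: a typical black ancestor
# has about 80% African ancestry.
BLACK_AFRICAN = Fraction(4, 5)
# Simplifying assumption for this sketch: a white ancestor has 0%.
WHITE_AFRICAN = Fraction(0)


def expected_african_ancestry(n_black_grandparents: int) -> Fraction:
    """Expected African ancestry given the number of black grandparents (0-4).

    Each grandparent contributes one quarter of the grandchild's ancestry
    in expectation.
    """
    if not 0 <= n_black_grandparents <= 4:
        raise ValueError("n_black_grandparents must be between 0 and 4")
    n_white = 4 - n_black_grandparents
    total = n_black_grandparents * BLACK_AFRICAN + n_white * WHITE_AFRICAN
    return total / 4


if __name__ == "__main__":
    # 3 black grandparents = one white grandparent; 2 = one white parent;
    # 1 = one black and three white grandparents.
    for n in (3, 2, 1):
        share = expected_african_ancestry(n)
        print(f"{n} black grandparent(s): {float(share):.0%} expected African ancestry")
```

Under these assumptions, one white parent gives the 40% mentioned above, while one white grandparent and one black grandparent imply roughly 60% and 20% respectively. The last two figures follow from the sketch's assumptions rather than from values reported in the paper.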

Otherwise, this paper is pretty much ready to be accepted.

Author | Admin
Replying to Reviewer 4

Thanks for the review. I made some more changes:

  1. I agree, your explanation is simpler than the one-drop rule remnant. I added it, giving you credit.
  2. I added some more statistics for the simple scenario, including a confusion matrix. This should help interpretation of the results.

 

Bot

Author has updated the submission to version #2

Bot

The submission was accepted for publication.

Bot

Author has updated the submission to version #5

I looked at the data (https://osf.io/qxvg8/) but I didn't see the self-identified race data. Where is it?