Journal:
ODP
Authors:
Emil O. W. Kirkegaard
Title:
The Black White IQ gap in the total sample of the Vietnam Experience Study
Abstract:
IQ differences between Blacks and Whites were examined in a large Vietnam-era sample (Vietnam Experience Study; total n ≈ 10,800). Relative to a White mean IQ of 100 and SD of 15, Blacks obtained a mean IQ of 85 with an SD of 12.
An attempt to examine measurement invariance using multi-group confirmatory factor analysis failed, yielding impossible estimates and very poor fit. A simpler approach of examining the correlation matrices by group showed that the White correlations were stronger, sometimes substantially so (a sketch of this comparison follows the submission details). Possible reasons for this were discussed.
Key words:
Black, White, Vietnam Experience Study, race difference, IQ, intelligence, cognitive ability, measurement invariance, multi-group confirmatory factor analysis, MGCFA
Length:
~1500 words, 6 pages.
Files:
https://osf.io/mnfex/
External reviewers:
None suggested.
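As a reading aid for the abstract's "simpler approach" of comparing correlation matrices by group, here is a minimal sketch; the file name, column names, and SIRE coding are placeholders, not the study's actual variables:

```python
import pandas as pd

# Hypothetical layout: one row per subject, a SIRE column, and one column
# per cognitive test (all names below are placeholders, not the VES variables).
df = pd.read_csv("ves_scores.csv")
tests = ["test_1", "test_2", "test_3", "test_4", "test_5", "test_6"]

cor_white = df.loc[df["sire"] == "White", tests].corr()
cor_black = df.loc[df["sire"] == "Black", tests].corr()

# Positive entries mean the correlation between that pair of tests is
# stronger in the White subsample than in the Black subsample.
print(cor_white - cor_black)
```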
The application of MGCFA to summary scale scores is methodologically problematic. Also, the summary of results from the unpublished HV meta-analysis is problematic, as that analysis itself needs to be refined (for example, it combined scholastic achievement composite scores with IQ scores). For these two reasons I cannot approve this paper.
I suggest instead writing a 1-2 page short communication summarizing the B/W mean difference results for each test - providing code and variables, but without conducting MCV or MGCFA on these.
Alternatively, or additionally, conduct a more informative analysis. For example, use the twin data -- if available -- to compute ACE estimates by ethnic group, or attempt to replicate Hartmann et al.'s (2006) MCV analysis using MGCFA.
A couple of comments on the 21 December manuscript
Introduction:
What is d? It should be explained in more detail.
2.1:
What is SIRE?
2.2:
There are 6 variables in the dataset, but 3 out of the 5 variables are composites? Surely both of these statements cannot be correct.
What is the amount of total variance explained by the factor used?
The standardization was done with the white subsample. Does anything change if standardization is done on the full sample, or blacks+whites?
3.1:
What is SIRE?
3.2:
It is rather unclear what was done and what were the results.
Thanks for looking over the paper.
John,
The application of MGCFA to summary scale scores is methodologically problematic. Also, the summary of results from the unpublished HV meta-analysis is problematic, as that analysis itself needs to be refined (for example, it combined scholastic achievement composite scores with IQ scores). For these two reasons I cannot approve this paper.
I agree that MGCFA applied to summary scales is problematic. However, it is not clear how problematic, at least to me. In the spirit of the p-hacking/non-replication crisis/selective publishing problem, I thought it better to report the failed analysis rather than leave it out. This way readers will know the method was attempted but failed.
I agree that the Fuerst 2013 dataset is problematic. However, it is the largest open dataset of this kind. The dataset is good enough for a preliminary analysis such as the one presented here (no formal meta-analysis with meta-regression etc.). I just emailed Roth to see if he will share his dataset with us and hence the public.
I suggest instead writing a 1-2 page short communication summarizing the B/W mean difference results for each test - providing code and variables, but without conducting MCV or MGCFA on these.
As a compromise, I have computed the differences by test and included them. These were all close to 1 d (White norms).
Alternatively, or additionally, conduct a more informative analysis. For example, use the twin data -- if available -- to compute ACE estimates by ethnic group, or attempt to replicate Hartmann et al.'s (2006) MCV analysis using MGCFA.
There are no twins in this dataset or any other known family relationships, so behavioral genetic modeling cannot be done.
It is not fair to criticize a study for not doing something with another dataset. The point of this study is to add a large-n datapoint to a future comprehensive meta-analysis of White-Black differences on cognitive tests. In this case it is a little complicated because the sample overlaps partially with other previously published studies, so one would have to choose which estimates to include in the meta-analysis (or perhaps average them).
[hr]
hvc,
Introduction:
What is d? It should be explained in more detail.
d is a standard term and hardly needs an explanation. https://en.wikipedia.org/wiki/Effect_size#Cohen.27s_d
I have added a quick explanation in the introduction just in case.
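For concreteness, a minimal worked example using the figures from the abstract (White mean 100, SD 15; Black mean 85):

```python
# Cohen's d is a mean difference expressed in standard-deviation units.
# Here the reference SD is the White SD (the paper's convention).
mean_white, mean_black, sd_white = 100, 85, 15
d = (mean_white - mean_black) / sd_white
print(d)  # 1.0, i.e. a gap of one White standard deviation
```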
2.1:
What is SIRE?
The term is explained in the introduction.
2.2:
There are 6 variables in the dataset, but 3 out of the 5 variables are composites? Surely both of these statements cannot be correct.
What is the amount of total variance explained by the factor used?
The standardization was done with the white subsample. Does anything change if standardization is done on the full sample, or blacks+whites?
You are right. I meant 6.
68%, but it is hardly worth reporting for this analysis.
Differences become smaller if one uses a larger SD, and the SD is larger if one uses a mixed sample. It is customary to use the White SD/norms when calculating group differences.
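A small numeric illustration of that point, using the abstract's means and SDs; the 85/15 mixing proportion is made up for illustration and is not the actual VES composition:

```python
mu_w, sd_w = 100.0, 15.0   # White mean and SD (from the abstract)
mu_b, sd_b = 85.0, 12.0    # Black mean and SD (from the abstract)
p_w, p_b = 0.85, 0.15      # hypothetical group proportions in the mixed sample

# Variance of a two-group mixture: weighted within-group variances plus a
# between-group term that grows with the mean difference.
mix_var = p_w * sd_w**2 + p_b * sd_b**2 + p_w * p_b * (mu_w - mu_b)**2
mix_sd = mix_var ** 0.5

print((mu_w - mu_b) / sd_w)    # 1.00 with White norms
print((mu_w - mu_b) / mix_sd)  # ~0.97 with the mixed-sample SD (smaller)
```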
3.2:
It is rather unclear what was done and what were the results.
What is unclear to you? I changed the wording slightly.
Paper updated.
I managed to convince Helmuth to give me all the Vietnam data! We will be releasing it as a dataset paper.
Looking at the full data is more interesting and allows for novel analyses. Believe it or not, the complete dataset actually has item-level data for many of the cognitive tests. This allows for direct IRT-based analysis of test functioning across groups - a much stronger test than Jensen's method applied at the test level.
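To illustrate the kind of item-level check this enables (not necessarily the analysis the revised paper will use), here is a sketch of a logistic-regression DIF screen, a standard alternative to full IRT modeling; the file and column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per subject, 0/1 columns for each item of a
# test, plus a group indicator (all names are placeholders).
df = pd.read_csv("ves_items.csv")
items = [c for c in df.columns if c.startswith("item_")]
df["total"] = df[items].sum(axis=1)                 # matching variable
df["black"] = (df["race"] == "Black").astype(int)   # 1 = Black, 0 = White

for item in items:
    # Uniform DIF shows up as a group main effect conditional on the total
    # score; non-uniform DIF as a group-by-total interaction.
    fit = smf.logit(f"{item} ~ total + black + black:total", data=df).fit(disp=0)
    print(item, round(fit.params["black"], 3), round(fit.pvalues["black"], 4))
```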
It also means that this paper is being heavily revised, so there is no point in commenting further on it. Withdrawn for now.