
Does Mother’s Involvement Matter for The Cognitive Development of Interracial Children? Testing the Race of the Mother Hypothesis.

Submission status

Submission Editor
Emil O. W. Kirkegaard

Meng Hu



Extensive research has been conducted on the effect of mothers’ socialization on their children’s cognitive test scores, but less is known about the relation between mothers’ race/ethnicity and the performance of children from interracial families. Willerman et al. (1974) proposed that the cognitive scores of interracial children will be more similar to those of the mother’s race/ethnic group, because the mother is the main agent of socialization in youth and adolescence and, as such, provides most of the environmental stimulation. Using data from the Collaborative Perinatal Project (CPP) and the High School Longitudinal Study of 2009 (HSLS:2009), the current study re-analyzes Willerman et al.’s (1974) observation that the mother’s race is a strong determinant of the child’s cognitive ability. In both datasets, we did not find consistent support for the mother’s involvement hypothesis. Furthermore, in the CPP, which was previously analyzed by Willerman et al. (1974), the superior IQ scores of interracial children of White mothers at age 4 had faded out by age 7. Alternative theories are considered.

cognitive ability, race, maternal involvement


Reviewers ( 0 / 0 / 2 )
George Francis: Accept
Emil O. W. Kirkegaard: Accept

Wed 27 Apr 2022 20:51

Reviewer | Admin | Editor
I also read your paper. My comments and suggestions:
1. There are some instances of "p=0.000"; this is impossible. I think you mean "p<.0001".
2. I don't understand why you didn't rely on regression models. What about fitting this model: g ~ mother_race + father_race + interracial_couple. You can add mother_race * interracial_couple to specifically address the effect of the mother in interracial couples, and the same for the father.
3. Why control for SES or parental education? In the genetic model, controlling for these would be a sociologist's fallacy. Note that sometimes you call it "SEI", sometimes "SES".
4. There is a strong expectation of equal importance of parents from a genetic perspective. See
5. Upload the output from SPSS directly. I think it generates some results files that include both syntax/code and results. Upload these so people can easily verify/examine without having to install SPSS.
6. Check out our new preprint, relevant to your discussion section about education and intelligence.
7. What about adding the NLSY children dataset? Does it report race of both parents? What about adding Add Health, which you cite a prior study of?
8. "Math" should not in general be capitalized.
9. Compare Table 4 to Table 5. They note the racial mixing in different ways. I suggest picking one of these approaches. I prefer the one in Table 5.
10. I think you should standardize the data to the White subset, so that their score is 0 (100) with SD 1 (15). This way the various groupings are easily comparable. Right now, you have used the total sample, so 1 SD does not mean 15 IQ, but rather about 17 IQ. E.g., Table 5 shows the within-race SDs, all of which are below 1.
11. Have you considered merging the datasets to boost precision? I think this is quite helpful, especially in combination with the use of a regression model approach. An alternative would be to meta-analyze the findings across the studies, but meta-analysis based on summary statistics is weaker than merging datasets.
12. Upload full copies of both datasets used to OSF.
13. You write "None of these correlations reach significance due to the small sample sizes." but you seem to report standardized betas, not correlations. I cannot tell exactly what model you fit that these are from.
14. "Generally, these results indicate that the father’s race seems to determine respondents’ Math scores more than the mother’s race does." Why is that? I think they should be about even. If you fit the regression model proposed in (2), you could compute the variable importance metrics and see which variable is more important. E.g., you could convert the model into ANOVA to compute the omega² values.
15. Have you checked the citations of Willerman et al. (1974) to see if there's something relevant?
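
The suggestion above to standardize scores to the White subset can be sketched as follows. This is a minimal Python illustration, not the author's SPSS syntax: the function name `standardize_to_reference`, the group labels, and the raw scores are all hypothetical and invented for the example.

```python
from statistics import mean, stdev

def standardize_to_reference(scores, groups, reference="White"):
    """Rescale all scores so the reference group has mean 100 and SD 15."""
    ref = [s for s, g in zip(scores, groups) if g == reference]
    ref_mean, ref_sd = mean(ref), stdev(ref)
    return [100 + 15 * (s - ref_mean) / ref_sd for s in scores]

# Hypothetical raw test scores (illustration only, not data from the paper).
scores = [52.0, 48.0, 55.0, 45.0, 50.0, 41.0, 47.0, 44.0]
groups = ["White"] * 5 + ["Mixed"] * 3

iq = standardize_to_reference(scores, groups)
white_iq = [v for v, g in zip(iq, groups) if g == "White"]
mixed_iq = [v for v, g in zip(iq, groups) if g == "Mixed"]

# The reference group is now on the IQ metric; other groups are expressed
# relative to it, so a 15-point gap corresponds to one White SD.
print(mean(white_iq), stdev(white_iq), mean(mixed_iq))
```

With this scaling, group differences read directly in White-SD units, which is what makes the groupings comparable across tables.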

First, thank you for your prompt review.

1), 8), 12), 13) Some oversights indeed. This will be fixed.

2) I believe the comparison of means provides enough information. I did, of course, run regression models as a complementary analysis. I included only one parent variable (mother or father, coded 0 for the majority group and 1 for the minority group). It is redundant to use a second parent variable since I have restricted the sample to intermarried couples (not doing this may eventually lead to the same problem as in Arcidiacono's analysis). The mother variable in this case is the mirror of the father variable. They both show the same regression beta values but with opposite signs.
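
The mirror relation described in this reply can be checked with a small sketch. The scores and the 0/1 coding below are hypothetical, invented only for illustration, and `ols_slope` is a helper defined here rather than anything from the author's SPSS syntax.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical children of intermarried couples (illustration only).
# In such a sample exactly one parent is from the minority group,
# so the father indicator is 1 minus the mother indicator.
minority_mother = [1, 1, 1, 0, 0, 0, 0]
minority_father = [1 - m for m in minority_mother]
score = [98.0, 103.0, 95.0, 101.0, 99.0, 104.0, 100.0]

b_mother = ols_slope(minority_mother, score)
b_father = ols_slope(minority_father, score)
print(b_mother, b_father)  # equal magnitude, opposite sign
```

Because the two indicators are perfectly collinear in a sample restricted to intermarried couples, a model with an intercept cannot estimate both at once, which is why only one of them is entered.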

3) SEI is a composite variable in the CPP, including income, parental education, etc. For more clarity, I can replace both the SEI and SES terms with "socio economic" to avoid confusion. I also believe there are merits in controlling for SES. For example, in an analysis of Black and White infants, adding controls may eventually show more clearly the superiority of Black scores as reflecting their precocious motor skills. More generally, Willerman assumes that the effect of the mother's race will remain significant after controlling for SES or, put otherwise, that the mother's race has an effect beyond SES-related factors. I mentioned it briefly in the paper. Should I explain it more thoroughly? Finally, because Arcidiacono and Willerman always consider SES factors, I believe it's only fair to also include results with controls for SES variables if I want to replicate earlier findings. Obviously, controlling for environment also confounds genetic effects, but this is why I display uncontrolled results.

4) I have read this article now, thanks. The cultural hypothesis revolves around the mother's socialization and that is why they expect a positive effect, and it seems most research found confirmation for such effect, although almost none had studied the long-term effect of these cognitive/achievement gains.

5) Do you suggest I upload the output to OSF?

6) This was an interesting finding. I did mention the possibility that if the mother's effect is positive at an early age, it may eventually fade out, as multiple studies have suggested, especially Beaver et al. (2014).

7) Unfortunately, there are no such parental race variables in the NLSY. But if that is required, I can analyze the Add Health, although I cannot access the full dataset.

9) Fine. I deleted the 2nd column of Tables 1 and 4 and modified the top cells from "Father's education" to "Husband's education". 

10) Good point, I will change these numbers accordingly in the next version.

11) Only the CPP data required merging in order to test at both ages 4 and 7.

14) Many studies actually found a positive effect of the mother's variable on the children's cognitive or achievement test scores. This study is meant to address the question with respect to interracial children. Perhaps the sentence must be rewritten. My point was merely that, instead of the children's scores more closely resembling the mother's, in the HSLS their scores seem to more closely resemble the father's.

15) I have read many of those studies citing Willerman (1974). At least among those I can access, only one study replicated Willerman, and it was Arcidiacono (2015). Others weren't insightful.


Thank you for your patience.

I have updated the paper, and added the analysis on the Add Health data.

The updated PDF and related materials can be found here:


Author has updated the submission to version #2

Reviewer | Admin | Editor

My comments on version 2.


Among the White-Black families, the Black mother variable shows a negative value at age 4
(β=-0.216, p=0.028) but a negative value that is not statistically significant anymore at age 7
(β=-0.154, p=0.122). Among the White-Hispanic families, the Hispanic mother variable
shows a modest positive value at age 4 (β=0.223, p=0.060) and at age 7 (β=0.175,
p=0.313). Among the White-Asian families, the Asian mother variable shows a sizable
positive value at age 4 (β=0.268, p=0.277) but a small positive value at age 7 (β=0.106,

So it appears the initial finding was just a p = .03, and it doesn't even replicate at follow-up. The other group comparisons are all non-sig, so shrug.

Among the White-Black families, the Black mother variable shows a modest negative value
at Wave I (β=-0.192, p<0.001) which increased very little at Wave III (β=-0.212, p<0.001).
Among the White-Hispanic families, the Hispanic mother variable shows a small negative
value at Wave I (β=-0.168, p<0.001) which did not change at Wave III (β=-0.160, p<0.001).
Among the White-Asian families, the Asian mother variable shows a stronger negative value
at Wave I (β=-0.286, p<0.001) which decreased at Wave III (β=-0.220, p<0.001).

The p values appear to be incorrect. The sample sizes listed are quite tiny, so p < .001 seems impossible 6 times in a row. Is this because you are using sampling weights, which in some contexts mess up the p values?

Among the White-Black families, the Black mother variable shows a strong positive value
(β=0.309, p<0.001). Among the White-Hispanic families, the Hispanic mother variable shows
a very small positive value (β=0.042, p<0.001). Among the White-Asian families, the Asian
mother variable shows a very small negative value (β=-0.056, p<0.001).

Same comment as above.

The result of the present study generally failed to replicate the findings of Willerman et al.
(1974) and Arcidiacono et al. (2015). The latter study found that having a Black (or a
Hispanic) mother is associated with lower verbal IQ in the Add Health. However, upon closer
inspection, their regressions analyses evaluated the mother's race effect in the combined
sample of the majority and minority groups. In other words, they didn’t restrict the sample to
interracial families in the same way as was done in the present study. This may have caused
biased estimates of the mother’s race effect.

This seems an important observation. You can test this by repeating the regressions using the full sample as they did.

While the positive effect of mother’s involvement is a well documented finding, Beaver et al.
(2014) noted that often these studies fail to account for genetic confounding. Indeed, not
only it is known that family and home environments are substantially heritable (Kendler &
Baker, 2007) but GCTA studies showed there is a strong evidence that genes which account
for variances in intelligence and achievement are the same genes which account for
variances in family SES (Trzaskowski et al., 2014; Marioni et al., 2014; Krapohl & Plomin,
2016). Using adoption-based design to isolate any possible genetic overlap between family
variables and intelligence scores, Beaver et al. (2014) reported in the Add Health data that
while both the father and mother’s involvement positively affected children’s verbal IQ at
early age, such a positive effect disappeared when these children were examined seven
years later.

There must be newer studies than these, since 2014-16.


The samples are quite small. However, it is possible to integrate the findings with meta-analysis to obtain more precise results. For each beta estimate, get the standard error. If you are using weights and the SEs are wrong, you can use bootstrapping to obtain correct SEs without having to make assumptions for an analytic approach. With the estimates and their SEs, you can use standard inverse-variance meta-analysis with the metafor package. As you have separate regressions by race, you would then end up with 3 final meta-analyses: Black-White, Hispanic-White, and Asian-White mixes, respectively. These would then be the best overall estimates given your samples. With regard to the repeated measurements in CPP, I suggest averaging values across years. Alternatively, you could try something else, e.g. 1) include repeated measurements but weight them by 1/n, where n is the number of repeated measures, or 2) a multi-level meta-analysis. I suggest the simpler approach because I doubt the complex methods will change results much.
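
As a rough illustration of the inverse-variance pooling suggested above, here is a minimal Python sketch of a fixed-effect model. It is not the metafor implementation; the function name `inverse_variance_pool` and the betas and standard errors are hypothetical placeholders, not results from the paper.

```python
import math

def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate, its SE, and an approximate 95% CI."""
    weights = [1.0 / se ** 2 for se in std_errors]  # precision weights
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Hypothetical betas for one pairing (e.g. one per dataset) with
# hypothetical bootstrap standard errors. Illustration only.
betas = [-0.20, 0.10, 0.05]
ses = [0.10, 0.08, 0.12]

pooled, pooled_se, ci = inverse_variance_pool(betas, ses)
print(pooled, pooled_se, ci)
```

One such pooled estimate would be computed per pairing (Black-White, Hispanic-White, Asian-White). A random-effects model would additionally estimate a between-study variance term, which this sketch leaves out.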



 I think the topic of choice is interesting and I’m glad to see it being investigated. Here are my comments on your paper. Feel free to take them or leave them.



The tables are quite hard to follow because they look rather similar to how summary statistics might be presented. It might be easier if you present your results in regression tables, with asterisks for p values and standard errors underneath coefficients. Then, without using standardised betas, the coefficients will show the estimated IQ difference between children with and without white mothers.


But I’m not too fussed about presentation. 



I agree with reviewer 2 that the p values look too low and meta-analysis would be useful given some of the sample sizes are very low. Alternatively, it’s worth considering whether you can combine the samples to gain more power.




“A large body of evidence shows that educational induced gains often do not have a lasting effect on intelligence test scores (Brody, 1992, pp. 174-185; Besharov et al., 2011; te Nijenhuis et al., 2015; Protzko, 2015; Ritchie et al., 2015).”


Ritchie and Tucker-Drob (2018) find in their meta-analysis that IQ gains do not completely fade out.


It might be worth mentioning that these improvements don’t appear to be on general intelligence.



“Out of the 41,911 children who were followed and underwent neurological examination at age 7, those who had no or inadequate intelligence test results were excluded as well as children whose mothers did not report socioeconomic data.”


Do we know any details on which children had to be omitted from the sample? I’d guess selection bias would slightly reduce any true effect of the mother’s race. Might be worth commenting on this issue.



It might be worth considering prenatal hypotheses for why the mother’s race could matter for the intelligence of mixed-race children.


The race of the mother has some association with gestation time, even in mixed-race children.


And a longer gestation period is associated with higher intelligence in the child.


If these associations are causal, i.e. mother’s race -> gestational time -> IQ, then we should expect a small effect of the mother’s race on intelligence.


You do state that an association between mother’s race and IQ could indicate differences in “postnatal environments”, but it could also indicate differences in prenatal environments. I don’t know how good the literature is, but there’ll be some effect of the pregnant mother’s behaviour on the child’s IQ. There might be some data on this already, but I imagine black pregnant women are more likely to make unhealthy decisions than white pregnant women, e.g. drug use.


Maybe you’d like to comment on prenatal environmental differences in your paper?



It is also worth considering whether parental race associations with IQ could be caused by assortative mating? My hunch is that black women who are smarter are more likely to marry white men, whilst IQ probably doesn’t have so much of an effect on the likelihood to miscegenate in black men. If so, this genetic confound could hide any detrimental environmental effect of having a black mother. I imagine data on this issue has probably been touched on in some of the Admixture papers or maybe one of Dr Dutton’s papers.


Regardless, it is probably worth noting this problem with the methodology and asking whether or not it is a serious issue.



Thank you both for your time to review the paper.

Reviewer #1

1. Right now I don't know how to improve the tables. I mainly followed Willerman's presentation of the tables, except I added extra columns (since I looked at age 7 as well). I think they are easy to read but of course I'm open to suggestions. Remember these are all summary statistics (table means) and not regression tables. I have means, N, SD as my columns. 

2. See my reply to Reviewer #2.

3. Thanks, I will add it and adjust accordingly the write-up.

4. I just reread Niswander & Gordon (1972) and I believe the CPP was well conducted. They mentioned, for example, that women who dropped out of the study before completion of their pregnancy did not bias estimates of mortality rates etc., but of course it's difficult to know whether these cases were missing at random without either IQ or education variables. As I use listwise deletion for all of my analyses, the problem pertains to all datasets. I'll add a note about that.

5. This review study is interesting. I'll add it in the update. The children were tested at a young age (1, 4 and 6-yrs old) so we don't really know how much of the gains would sustain, but it's worth mentioning due to its biological correlates.

6. Yes, I did mention it briefly in the paper. See section 3.2. The Add Health and HSLS black mothers who intermarried had higher education levels (and therefore likely higher IQ) than white mothers who intermarried; the only exception was the CPP. I reported results controlling for SES, so some of the IQ effect has been partialled out. But I will add a note about it too.

Reviewer #2

"The p values appear to be incorrect. The sample sizes listed are quite tiny, so p < .001 seems impossible 6 times in a row. Is this because you are using sampling weights, which in some contexts messes up the p values?"

It is true that sampling weights artificially reduce p-values, but reporting p-values on the unweighted result isn't wise either, especially on the Add Health for which, as explained in a footnote, the result is sensitive to the use/non-use of sampling weights. Generally, since p-values are a function of sample size and effect size (and probably data spread as well), I usually ignore p-values for inflated samples. I will adjust the write-up.
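
One way the inflation can arise, assuming the software treats the sampling weights as frequency counts so that the apparent N becomes the sum of the weights, is illustrated by the sketch below. The weights are hypothetical, and both helper functions are defined here for illustration. Rescaling the weights so they sum to the number of observed cases removes the inflated N, and Kish's approximation gives a rough effective sample size under unequal weighting.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to the number of observed cases."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical population-scaled sampling weights for five respondents.
weights = [1520.5, 310.2, 2875.0, 640.8, 980.1]

norm = normalize_weights(weights)
print(sum(weights))               # apparent N if weights are read as frequencies
print(sum(norm))                  # equals the actual number of cases
print(kish_effective_n(weights))  # never larger than the actual number of cases
```

Normalizing changes only the implied sample size, not the weighted point estimates, so the betas stay the same while the standard errors are no longer based on an inflated N.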

"You can test this by repeating the regressions using the full sample as they did."

I cannot access the full Add Health data, but on the public one at least I failed to replicate their findings. 

"There must be newer studies than these, since 2014-16."

I'll look for them.

"The samples are quite small. However, it is possible to integrate the findings with meta-analysis to obtain more precise results."

This is indeed a great idea. I think, however, that the inverse-variance method isn't best used on this data. It requires standard errors, and those are affected by sample size. The Add Health and HSLS had inflated N compared to the CPP, and it wouldn't be wise to use the SEs from the unweighted result. While the bootstrap fixes the issue of biased SEs owing to distribution, it doesn't seem to handle biased estimates owing to inflated Ns. Even if it does, the sampling weights in both the Add Health and HSLS have non-integer values, which means I would probably have to round them if the bootstrap is to be used, but some research reported this method being flawed.

Andreis, F., & Mecatti, F. (2015). Rounding Non-integer Weights in Bootstrapping Non-iid Samples: actual problem or harmless practice?. In Advances in complex data modeling and computational methods in statistics (pp. 17-35). Springer, Cham.

Andreis, F., Conti, P. L., & Mecatti, F. (2019). On the role of weights rounding in applications of resampling based on pseudopopulations. Statistica Neerlandica, 73(2), 160-175.

Is there another method? I just looked at alternatives and I believe it's best to weight by the sample size instead of the inverse of the squared standard errors. I might end up doing the analysis, but remember that the datasets use cognitive tests which aren't comparable. In the CPP, we have the Wechsler; in the Add Health, a vocabulary test; in the HSLS, an achievement test (math) rather than a cognitive test.

Finally, with respect to averaging waves, I believe it's more accurate not to aggregate, especially in the CPP. Willerman's main point was that having a black mother is associated with a very large cognitive deficit for the children at age 4, but I showed this wasn't the case at age 7. And the hereditarian argument is that the mother's involvement effect decreases over time.

On the other hand, the Add Health respondents' ages at Wave I and Wave III range from 12-19 and 18-26, respectively, so there might be some rationale for aggregating if one suspects sampling errors. Some of these respondents are quite young, but this likely won't affect the result too much. I can aggregate these results for the meta-analysis, however.



Author has updated the submission to version #3


I uploaded the new version of the article. In blue are the changes to this version. I added footnotes on p-value issues, intermarried black mothers being of higher education and its potential issue, the meta-analysis (section 3.4) and added finally the studies suggested. I also updated the supplemental material at OSF (specifically, the spreadsheet with meta-analysis full displays, and syntax for the combined Wave I-III PVT of the Add Health). Let me know if you have any suggestion or any disagreement.


Author has updated the submission to version #4


Author has updated the submission to version #5

I fixed an issue with the formula and the reference. It's also much clearer now.
Reviewer | Admin | Editor

The meta-analysis is a good addition. The trouble with the findings is that there is no information about the confidence in these values. Given that there were 3 different meta-analyses, and all of them obtained values near 0, we might guess that the overall effect is near 0, but we don't know. The use of sample size weights is wise when standard errors cannot be obtained. I believe, however, that they can. I propose this method:

1. Bootstrap the data e.g. 10000 times. Do bootstrapping without regard to the sampling weights.
2. Inside each resample, fit the regression models using the sample weights. Save the coefficients.
3. Compute the bootstrap-based estimate of the standard error.
4. Fit a standard inverse-variance meta-analysis using the bootstrap-based SE.
5. This should now produce approximately correct results including confidence intervals.

The only doubtful step is disregarding sampling weights when resampling the data.
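
Steps 1 to 3 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's bootstrap syntax: the data are simulated, the model is a weighted regression on a single 0/1 predictor, and the names `weighted_slope` and `bootstrap_se` are hypothetical helpers defined here. A smaller number of resamples than the suggested 10000 is used to keep the example fast.

```python
import random
from statistics import stdev

def weighted_slope(x, y, w):
    """Slope from a weighted least-squares regression of y on x (with intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def bootstrap_se(x, y, w, n_boot=2000, seed=1):
    """Bootstrap SE of the weighted slope; cases are resampled ignoring weights."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: unweighted resample
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # skip degenerate resamples with no variation in x
        ys = [y[i] for i in idx]
        ws = [w[i] for i in idx]  # each case keeps its own sampling weight
        slopes.append(weighted_slope(xs, ys, ws))   # step 2: weighted fit
    return stdev(slopes)                            # step 3: bootstrap SE

# Simulated data (illustration only): 0/1 mother's-race indicator,
# a test score, and non-integer sampling weights.
rng = random.Random(0)
x = [i % 2 for i in range(60)]
y = [100 + 3 * xi + rng.gauss(0, 15) for xi in x]
w = [rng.uniform(0.5, 3.0) for _ in x]

beta = weighted_slope(x, y, w)
se = bootstrap_se(x, y, w)
print(beta, se)
```

Because whole cases are resampled and the weights only enter the fit inside each resample, non-integer weights need no rounding. The resulting `se` is what would feed the inverse-variance meta-analysis in step 4.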

The same solution was suggested here. Another is to use some complex survey software to do the weighted resampling. Honestly, I don't think you should bother with this. These null results and the confidence intervals are not likely to change much based on more complex resampling methods.

For the meta-analysis, how about including forest plots? Readers might wonder how the different datasets compare in the betas, and whether their results have overlapping confidence intervals, or whether there is strong heterogeneity in results.

ETA: I also would like to see the clearly erroneous p values replaced with the proper ones from the bootstrap method. It is confusing to the reader to be reading that you found no effects, when you then also report 6 p values with < .001.
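
Alongside a forest plot, heterogeneity can be summarized numerically. The sketch below computes Cochran's Q and the I² statistic for a set of estimates; the function name `cochran_q` and the input values are hypothetical and chosen only to illustrate the arithmetic.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q, its degrees of freedom, and I^2 (in percent)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 is the share of total variation attributed to heterogeneity,
    # truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical betas and standard errors from two datasets (illustration only).
q, df, i2 = cochran_q([-0.2, 0.3], [0.1, 0.1])
print(q, df, i2)
```

With only two or three estimates per pairing, Q has very little power, so a low I² here would be weak evidence of homogeneity.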


I agree with Reviewer 2's comments and think his approach works perfectly well. But I'd like to suggest a simple alternative - don't use the sampling weights at all. Sampling weights are useful for dealing with endogeneity created by sampling bias. But the sampling bias of your data, as I understand it, is that it just oversamples ethnic minorities. It is not clear to me how having more ethnic minorities in the sample impacts your test of the effect of parental race on intelligence. As such, I think it is probably appropriate and simpler to not use the weights at all. 

That's my two cents.


Author has updated the submission to version #6

I updated the paper with the inverse-variance analysis (I changed section 3.4 entirely; there are no changes in other sections). I added Table 10, displaying mean weighted betas, CIs, and p-values, and Figure 1, displaying a forest plot. The supplementary file is also updated (added syntax for the bootstrap regressions, and R syntax for the inverse-variance result and its forest plot).
Let me know if there is anything that requires modification.

The submission was accepted for publication.


Author has updated the submission to version #8


Author has updated the submission to version #9