
[ODP] The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ
Title: The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ

Author: Dalliard

Abstract: Jonathan Michael Kaplan recently published a challenge to the hereditarian account of the IQ gap between whites and blacks in the United States (Kaplan, 2014). He argues that racism and “racialized environments” constitute race-specific “X-factors” that could plausibly cause the gap, using simulations to support this contention. I show that Kaplan’s model suffers from vagueness and implausibilities that render it an unpromising approach to explaining the gap, while his simulations are misspecified and provide no support for his model. I describe the proper methodology for testing for X-factors, and conclude that Kaplan’s X-factors would almost certainly already have been discovered if they did in fact exist. I also argue that the hereditarian position is well-supported, and, importantly, is amenable to a definitive empirical test.

Keywords: IQ, g factor, race, racism, measurement invariance

Kaplan's article can be downloaded here: http://ge.tt/api/1/files/3T4YYvK1/0/blob?download
Admin
Great paper. I have some comments on some parts, mostly unimportant stuff. This submission should quickly get thru review.
Kaplan wrote "But positing, without any evidence, systematic differences in hundreds or thousands of genes of small effects is surely no more plausible than positing multiple environmental differences with small effects! as noted above, we do know of some environmental differences between the populations that are verifiably associated with differences in performance on IQ tests and related measures; we know of no genes that are so-associated."
Obviously he's not read Ward et al.'s replication of Rietveld's 3 alleles, nor my work on polygenic selection (https://drive.google.com/file/d/0B7hcznd4DKKQWWNKU3pPbW1lS2M/edit?usp=sharing)

I've also a new paper coming which gives a final blow to Kaplan's silliness

Ward, M.E., McMahon, G., St Pourcain, B., Evans, D.M., Rietveld, C.A., et al. (2014) Genetic Variation Associated with Differential Educational Attainment in Adults Has Anticipated Associations with School Performance in Children. PLoS ONE 9(7): e100248. doi:10.1371/journal.pone.0100248
I also have a few rather minor objections: 1) the dig at philosophers near the end is accurate but unnecessary and should be removed; 2) "HM" is close to "Herrnstein & Murray" and could lead to some confusion; 3) the review of what phenomena are Jensen effects could be more thorough. This is a very good paper, however.


I was typing a long reply to the article, but for the moment I wanted to react to this. Personally, I would not say HM but rather HH, for hereditarian hypothesis. Alternatively, if Dalliard still sticks with HM, perhaps use "H Model" instead. In my opinion, though, it's not necessarily annoying. Of course, Herrnstein & Murray was the first thing that came to my mind as well. That said, the intro and abstract say clearly what HM is.

--

By the way, Dalliard, I have taken the liberty of contacting Kaplan himself. I think that when we target someone in our articles, we should contact that person about the work in question. That's the least we should do. I hope he will come here.

I personally have a lot of articles in preparation (all semi-finished, which is why you haven't seen me published here yet), and I plan to contact everyone I cite in my articles, that is, the authors whom/whose works are closely related to the subject of the articles. That's how I encourage everyone to work.

(P.S. I put both the words "whom" and "whose" because I don't know the difference between the two)
Admin
"whom" is the indirect person object, i.e. the person something was given to or some such. "Who gave the present to whom?".
"whose" is a word used to make questions referring to ownership or possession of something. "Whose car is it?".
Related "who's" is a contraction for "who is". "Who's coming over later?".

Or for those who care about grammar:

Case: word
Nominative: who
Accusative: who
Genitive: whose
Dative: whom

(Unrelated contraction: who's)

Note that "whom" is rarely used and is passing out of the language and it is often considered somewhat pompous to use.

[friendly neighborhood linguist]
Admin
Some comments on the first draft.

I too favor HH (hereditarian hypothesis, which is also the phrase used by the target article) over HM. But author decides.

“Kaplan claims that the leading behavioral geneticist Robert Plomin has argued in his recent publications, such as Trzaskowski et al. (2013a), that the heritability of IQ is only in the range of 40–60 percent. In fact, Plomin has written that heritability in adults may be 80 percent, but that it is lower in children and adolescents (Plomin & Spinath, 2004). Trzaskowski et al. (2013a) analyzed a sample of 12-year-olds.”

A more recent study is http://www.nature.com/mp/journal/v15/n11/abs/mp200955a.html which found an H² of 66% at age 17. They cite more numbers in the 80% area in the introduction. Furthermore, as they note, E (unique environment) includes measurement error, which is usually not corrected for. Correcting for it increases the H² estimate a bit further (they cite a g reliability of .9). When discussing theoretical issues, it is important to correct for known errors in estimates, cf. https://www.goodreads.com/book/show/895784.Methods_of_Meta_Analysis?ac=1
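
As a rough illustration of the measurement-error point, the sketch below applies the standard correction for unreliability to a heritability estimate. It assumes that all error variance is counted in E, so the corrected value is just the observed estimate divided by the reliability. The function name is hypothetical, and the inputs (0.66 and 0.9) are the figures mentioned above, not a reanalysis of the study.

```python
def correct_for_unreliability(h2_observed: float, reliability: float) -> float:
    """Rescale an observed heritability to the reliable (true-score) variance.

    Assumption: measurement error is uncorrelated with everything else and
    is therefore counted as part of E in the uncorrected estimate.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return h2_observed / reliability


# Figures quoted in the post: observed H² of .66, reliability of .9.
corrected = correct_for_unreliability(0.66, 0.9)
print(round(corrected, 3))  # 0.733
```

With a perfectly reliable measure (reliability of 1.0) the estimate is left unchanged, which is a quick sanity check on the formula.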

“It is important to understand that it follows from Turkheimer’s laws that proposed environmental effects on IQ are also expected to be confounded by genetic influences. Accordingly, behavioral genetic research indicates that “environmental” factors, such as measures of family environment, child rearing style, and peer relations, are under substantial genetic control (Plomin et al., 1994; Kendler & Baker, 2007; Vinkhuyzen et al., 2010).”

I would also cite this paper. I was under the impression that this was a major paper in this area because Gottfredson always refers to it as important.

Rowe, D. C., Vesterdal, W. J., & Rodgers, J. L. (1998). Herrnstein's syllogism: Genetic and shared environmental influences on IQ, education, and income. Intelligence, 26(4), 405-423.

“From these examples it is clear that racial disparities in encounters with the police do not constitute prima facie evidence of racial bias. Even if the police never relied on the (generally reasonably accurate) racial stereotypes about criminal offending, racial disparities in police scrutiny would arise because blacks are more likely than whites to engage in suspicious and illegal activities. The same inevitably applies to private security guards singling out seemingly disproportionate numbers of blacks for scrutiny. More generally, the observed black-white differences in crime rates are predictable from black-white differences in IQ and aggressiveness (Beaver et al., 2013), and victim surveys indicate that the high arrest and conviction rates of blacks reflect their genuinely high rates of offending (New Century Foundation, 2005). The common belief that a racially biased criminal justice system underlies the high black crime rate is difficult to reconcile with these findings.”

Beaver et al. is a weak study to cite for this. Basically, they controlled for IQ and violent history, and the race variable got a p value higher than .05. There is a big possibility of a false negative here. But sure, failing to control for g is silly, and the uncontrolled result is not evidence of any bias.

A good source for the accuracy of stereotypes is this chapter, which I highly recommend.

Jussim, L., Cain, T. R., Crawford, J. T., Harber, K., & Cohen, F. (2009). The unbearable accuracy of stereotypes. Handbook of prejudice, stereotyping, and discrimination, 199-227. http://emilkirkegaard.dk/en/wp-content/uploads/Todd_D._Nelson_Handbook_of_Prejudice_StereotypiBookos.org_.pdf

On the topic of pro-black discrimination: Jensen and others who reviewed the bias literature (in the 1980s) noted that tests are generally not found to be biased against blacks, and are sometimes found to be biased in favor of them (i.e. they don't perform as well as predicted). I don't know if this is still the case, but it is another small bias in their favor if it applies to e.g. the SAT.

“American history offers an abundance of examples of how the prosperity of white-majority neighborhoods and even entire cities was destroyed as a consequence of large numbers of blacks moving in, indicating that whites’ (and other non-blacks’) concerns about the character of their black neighbors are far from irrational.”

This is too inflammatory and doesn't even cite anything. I guess the most obvious example of this is Detroit, but I'm not aware of any actual study of it. The census data would enable one to calculate the average IQ and SD in IQ over time. So one just needs some historical socioeconomic data (crime, income, job, education) for the same area to compare with.

“Kaplan brings up stereotype threat (Steele & Aronson, 1995) as an example of a subtle environmental influence that can have a large effect on IQ scores, arguing that his X-factors could be similar in nature. However, stereotype threat is based on a causal theory of how anxiety about confirming a stereotype about intelligence hampers performance on intelligence tests. There is thus a direct and immediate link, supported by experimental evidence, between poor test performance and the proposed causal factor — something that certainly cannot be said of any of Kaplan’s far-fetched propositions. Furthermore, if Kaplan’s X-factors were real, their influence on IQ would have to be, for reasons discussed later in this article, far more subtle than that of stereotype threat.”

I would be more critical about accepting stereotype threat. Wikipedia lists quite a number of relevant papers (https://en.wikipedia.org/wiki/Stereotype_threat#Criticism). Stereotype threat is basically a priming effect in social psychology, a field which has a replication crisis; see e.g. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0056515

“James Flynn has criticized researchers for assuming that racism is a magical ambient force rather than one whose possible effects are manifested through such ordinary mechanisms as poverty and poor self-esteem. Kaplan rejects this argument. Indeed, Kaplan’s racial X-factors resemble nothing so much as magic. He presents no evidence for the hypothesis that “racialized environments” have an effect on IQ, and his evidence for the very existence of these environments is very weak. Nevertheless, his model presupposes that such environments, no matter how heterogeneous, act like magic bullets, causing large, g-linked cognitive deficits in blacks from all backgrounds while miraculously bypassing all the brain systems that mediate emotional and motivational processes. Furthermore, the racial X-factors do all this in such a subtle way that no statistical signals of their presence can ever be observed, making racism a causal force completely unlike all known environmental influences on IQ scores. The essentially occult powers that Kaplan attributes to white racism take his arguments beyond the bounds of science.”

But then again, the paper was published in a philosophy of biology journal, so in a sense it already is beyond.

“The fact that Kaplan’s proposed X-factors turn out to be very elusive upon closer inspection attests to the wisdom of Jensen’s argument about the non-existence of X-factors in general. Considering that Kaplan is an associate professor of philosophy, another lesson that might be drawn from his very confidently presented yet completely unsuccessful challenge to HM is that a philosophical education alone is without value in a scientific dispute. A good command of the theories, methods, and evidence pertinent to the particular area of research is necessary for making productive scientific contributions.”

I completely agree with this, having spent 2 years studying philosophy academically. Sesardic also discusses this in the introduction of his book. It is another example of the disruptive influence of ill-informed philosophers on science.

“Kaplan is also oblivious to the fact that, perhaps uniquely among all the long-running disputes in social science, a definitive empirical resolution to the black-white IQ controversy is within the reach of contemporary science. The strong causal implications of DNA-based admixture studies have been frequently discussed in the literature, and the ability gap between blacks and whites is widely recognized as one of the most significant social problems in America (Jencks & Phillips, 1998; Paige & Witty, 2010; Giles, 2011). That there nevertheless has been no rush to use genomic methods to clarify the etiology of the gap testifies to the taboo nature of the hereditarian model.”

Probably we already have the answer, some of us that is. The people who ran the big GWAS studies presumably used multi-race US samples with SNP data. Clearly, they already have the data needed to decide the issue. There are two possible outcomes:
1) HH fails the test (no correlation between admixture and IQ), in which case I shall be very surprised. Anyone sitting on this kind of evidence would publish it immediately to become a scientific hero who finally disproved the nasty racists. This has not happened.
2) HH is confirmed in the test, making it very hard to come up with an even remotely plausible alternative hypothesis. It would also end the claim that there is no direct genetic evidence for HH (actually Piffer already ended this one). The author will certainly be vilified in the media and probably face academic sanctions and lose access to his data and so on. Any scientist sitting on data like this who isn't willing to face that reaction will do nothing, which is what we have seen.

I take the lack of publications of type (1) as strong evidence for HH.
I will give you the maximum, 5 stars, for the presentation of your article. It's so nicely presented... that's great. It's what I call perfection. Yes, literally.

Regarding the text in general, it's a tough one, but despite its quality I have some comments and minor disagreements.

Indeed, the black-white income gap is often reversed after such equating (Johnson & Neal, 1998; Nyborg & Jensen, 2001; Heckman et al., 2006), which may signal the presence of discrimination in favor of blacks.


You can also cite "Intelligence, Genes, and Success: Scientists Respond to The Bell Curve" by Devlin, Fienberg, Resnick, and Roeder (1997). In chapter 9, "The Hidden Gender Restriction: The Need for Proper Controls When Testing for Racial Discrimination" (p. 200), you'll see that when controlling for age, IQ, and parental SES, the BW wage difference vanishes, but when they look at the relationship within each gender group, black women are expected to earn higher wages than white women, while black men earn less than white men.

Phillips et al. (1998) found that even after controlling for more than 30 variables related to the economic, educational, cognitive, emotional, and health characteristics of parents and grandparents, about a third of the IQ gap in children remained unexplained.


This is chapter 4 of The Black-White Test Score Gap, and the IQ test studied was the PPVT-R, a vocabulary test. Perhaps specify that.

However, there is little evidence of their being overrepresented in relation to the black share of the perpetrators of crime


Shouldn't it be "them" instead of "their" ?

Specifically, the g loadings and heritability coefficients of tests have been found to be intercorrelated very highly, at close to 1.00 (Rushton & Jensen, 2010).


The R&J (2010) paper you cite merely cited an unpublished study by Jongeneel-Grimen & te Nijenhuis. It is not published yet. te Nijenhuis gave me the draft (but in exchange I promised not to share it, so you have to wait for the final version). It is a good article indeed, but it needs more samples (Panizzon's sample can be added). In any case, I prefer to cite published studies rather than unpublished ones, if possible.

The principally genetic nature of g has also been confirmed in multivariate behavior genetic analyses where genetic influences on different cognitive abilities have been found to be largely common rather than ability-specific (Plomin & Spinath, 2004; Trzaskowski et al., 2013b).


The two cited studies talk about genetic correlation, if my memory is correct. Of course, genetic correlation always suggests the presence of a higher-order genetic g, but it needs to be definitively confirmed through CFA modeling. As I say, the studies you cite merely "suggest" the presence of g. If g is not modeled explicitly, it can't be confirmed.

Kaplan uses Levene’s test for the equality of variances to investigate whether his simulated X-factors inflate IQ variances to a statistically significant extent.


To be sure, Levene's test is flawed. See here.

Modern Robust Statistical Methods: An Easy Way to Maximize the Accuracy and Power of Your Research (Erceg-Hurn & Mirosevich 2008)
http://www.unt.edu/rss/class/mike/5700/articles/robustAmerPsyc.pdf
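
For readers unfamiliar with the test, here is a minimal sketch of the Levene-type W statistic, computed both with group means (the original Levene form) and with group medians (the Brown-Forsythe variant, generally regarded as more robust to non-normal data). The function name and the toy data are assumptions for illustration only. The p-value is not computed, since that requires the F distribution with (k - 1, N - k) degrees of freedom, which the Python standard library does not provide.

```python
from statistics import mean, median


def levene_w(groups, center=mean):
    """Levene-type W statistic for k groups of observations.

    center=mean gives the original Levene statistic; center=median gives
    the Brown-Forsythe variant. Raises ZeroDivisionError if the absolute
    deviations have no within-group spread at all.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's centre.
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(y - c) for y in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy data (hypothetical): same spread with shifted mean, then a wider spread.
narrow = [1, 2, 3, 4, 5]
shifted = [11, 12, 13, 14, 15]
wide = [-10, -5, 0, 5, 10]

print(levene_w([narrow, shifted]))            # equal spreads: W is 0
print(levene_w([narrow, wide]))               # unequal spreads: W is positive
print(levene_w([narrow, wide], center=median))
```

A shift in the mean alone leaves W at zero, which is the property that makes the statistic a test of spread and not of location.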

The predominant view among psychometricians and the one that is adopted in this article is that individual differences in intelligence can be conceptualized in terms of a factor hierarchy with a third-level general factor (g), second-level broad ability factors, and first-level test-specific variation (Deary, 2012). Higher-level sources of variation exert a causal influence on the lower levels of the hierarchy.


As for causal influence, it's only an assumption of the model, not something that is actually tested. As I usually say, causality must involve a time trend, such as with longitudinal data. For example, Panizzon (2014) confirmed the existence of genetic g at the second-order level, but no causality can be inferred. Panizzon's study can't be used against van der Maas's (2006) model, for example, although I used to (wrongly) believe that it could, because others made silly comments claiming to have found a causal g or something like that. Just because they say it does not mean it's true.

As it happens, the variances of IQ scores in blacks are typically smaller than those of whites. Jensen (1998, p. 353) found that black standard deviations are usually in the range of 11–14 IQ points, with a mean of 12, compared to the white standard deviation of 15 points.


In Murray's (2007, see table 2) study of the W-J cognitive battery, you see that the SD of blacks is greater than that of whites. I mention this because Murray considers the sample he studied to be the most representative of the US. Apparently, most (if not all) samples neglect the lowest-IQ portion of blacks, reducing their SD, but not his samples. Of course, the samples amount to "only" 1669 blacks.

See:
The magnitude and components of change in the black–white IQ difference from 1920 to 1991: A birth cohort analysis of the Woodcock–Johnson standardizations

Accordingly, when Rowe & Cleveland (1996) fitted a biometric structural equation model to math and verbal test data from a genetically informative sample of black and white children, between 36 and 74 percent of the racial test score gaps was accounted for by genetic differences in the best-fitting model.


No problem with R&C (1996). It's just that you should have added Brody's (2003) criticism to that study. See here. It's on pages 407-408.

it has consistently been found that environmental influences on test performance are negatively or not at all associated with g loadings, whereas genetic influences are associated strongly and positively with g loadings.


I would not be so sure. As you already know (see here), the evidence from CFA modeling is not compelling. It's only when you look at MCV and factor analysis or principal components that you see g is more heritable. But CFA is confirmatory in nature, and so differs from the other, exploratory methods. Furthermore, unlike CFA, MCV does not use a latent variable approach. That's really unfortunate, and it is the second biggest flaw in MCV. The first and biggest one is that you can't test the g hypothesis against non-g hypotheses. There is indeed no way to assess model fit and see if the inclusion of latent g improves your outcome.

I say that about the g-h2 correlation. However, the argument also applies to g and the black-white difference, or to any correlation between g and other factors. As far as I know, the evidence in favor of Spearman's hypothesis is positive but still pretty weak, because it relies mainly on MCV. And yet MGCFA seems unable to confirm the g hypothesis as the best model, showing only that the g model fits the data no better (but no worse either) than non-g models (see e.g. Dolan 2000, Dolan & Hamaker 2001). Of course, when we look at the g/non-g factor decomposition in g models, it appears that the weak version of Spearman's hypothesis is true, thus confirming MCV. But this is still something different from model choice, which is based on model fit indices.

For the g-h2 correlation, however, we get the opposite. The genetic g model seems to have the best fit, but I don't see any proof from Panizzon that g is more heritable than the latent non-g factors. So, where's the Spearman effect? Of course, there is still the problem of heritability dependent on and independent of g, as you probably remember from our little conversation with Kees Jan Kan, but if we examine merely the h2, there is no evidence for Spearman's hypothesis here.

It has been repeatedly shown that black-white differences on IQ test batteries generally satisfy the requirements of measurement invariance (Keith et al., 1999; Dolan, 2000; Dolan & Hamaker, 2001; Lubke et al., 2003; Edwards & Oakland, 2006; Trundt, 2013).


You should remove the Keith and Edwards/Oakland studies. They do not satisfy the basic requirements of MI, which are invariance of (1) pattern loadings, (2) factor loadings, and (3) intercepts. They did not test (3).

He could, of course, aver that his X-factors are so subtle that they would not violate


What is "aver" ?

As Wicherts et al. (2004) point out, the fact that black-white IQ differences are associated with measurement invariance while the Flynn effect is not indicates that the two phenomena are separate, and that one of them does not tell us anything about the other.


Disagree. MI, as I have always said, is almost always misunderstood, probably because Wicherts and others do not describe MI very accurately. In fact, you can have a violation of MI for FE gains at the subtest level and yet, when you sum all of the subtests, see no test bias, because the subtest biases entirely cancel out (some biases against group 1, others against group 2). In the work of Wicherts (2004), the subtest biases seem not to be cumulative but to cancel out instead (I reached this conclusion after a close reading, because the point is not made explicit by Wicherts). That unfortunately leads to several misleading comments on MI concerning the "role" of bias in accounting for the FE gains. If bias did account for them, the FE gains should disappear when "controlling" for bias. If they do not, all theories based on that assumption, such as Brand's hypothesis or the Armstrong/Woodley rule-dependence model, should be discarded.

The best illustration concerning the Flynn gain is from Must et al. (2009), "Comparability of IQ scores over time" (and Shiu 2013, who followed up the Must (2009) study). MI was violated, but the biases were not cumulative. This illustrates the total irrelevance of bias in accounting for the Flynn effect.

If you want proof of what you say above, you would do better to cite Ang et al. (2010, see figure 3), but certainly not Wicherts.
http://www.iapsych.com/iqmr/fe/LinkedDocuments/ang2010.pdf
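
The cancelling-out of subtest-level biases described above can be shown with a few lines of arithmetic. The sketch below uses entirely hypothetical intercept biases for four subtests; none of the numbers come from Wicherts, Must, or any other study. Every subtest violates invariance, yet the biases sum to zero, so the gap in the sum score equals the assumed true gap.

```python
# Hypothetical intercept biases in score points for four subtests.
# Positive values favour group 2, negative values favour group 1.
subtest_bias = {
    "subtest_1": 1.5,
    "subtest_2": -0.5,
    "subtest_3": -1.5,
    "subtest_4": 0.5,
}


def biased_subtests(bias, tolerance=1e-9):
    """Names of subtests whose intercept bias is non-zero."""
    return [name for name, b in bias.items() if abs(b) > tolerance]


def net_bias(bias):
    """Bias carried into the sum score: the plain sum of subtest biases."""
    return sum(bias.values())


true_gap = 6.0  # assumed true difference on the sum score (hypothetical)
observed_gap = true_gap + net_bias(subtest_bias)

print(biased_subtests(subtest_bias))  # all four subtests are biased
print(net_bias(subtest_bias))         # 0.0, the biases cancel
print(observed_gap)                   # 6.0, same as the true gap
```

If the biases all pointed the same way they would accumulate instead, and the observed gap would differ from the true one by their sum.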

Besides, if factor loading invariance does not hold, you can conclude that the differences are not directly comparable. For example, suppose that in group 1, tests 1-4 load together and form a verbal factor, and tests 5-9 load together to form a fluid reasoning factor, but in group 2, test 4 loads only on the reasoning factor and tests 8-9 load on the reasoning factor as well as the verbal factor. That is terrible, because it means that tests 8-9 elicit verbal and fluid reasoning ability in group 2 but only fluid reasoning ability in group 1. As for test 4, it means that group 1 uses verbal ability to solve it while group 2 uses reasoning ability. In a case like that, the situation is just awful, and there is virtually nothing you can do about it. It's better to have intercept bias than factor loading or pattern loading bias.

One of the principal targets of Kaplan’s article are two studies by David Rowe and colleagues (1994, 1995).


One problem with Rowe's analysis is not, as Kaplan says, about statistical significance but about measurement error. A lot of the reported correlations are close to zero because they don't use latent variables. Is it possible that random errors also hide the race interaction? Well, the problem is that I cannot reject that possibility. I think Rowe's analysis needs to be replicated using a latent variable approach. I don't remember whether they corrected for measurement error or not.
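
To make the measurement-error concern concrete, the sketch below uses the classical attenuation formula, under which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The function names and all numbers are hypothetical, chosen only to show the size of the effect; they are not taken from Rowe's studies.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Correlation expected between two error-laden measures."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r, rel_x, rel_y):
    """Spearman's correction: estimate of the true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical case: a true correlation of .30 between two measures
# that each have a reliability of .60.
observed = attenuated_r(0.30, 0.60, 0.60)
print(round(observed, 2))                               # 0.18
print(round(disattenuated_r(observed, 0.60, 0.60), 2))  # 0.3
```

A latent variable model does this kind of correction implicitly, which is why correlations between observed scores can look much weaker than the relations between the underlying constructs.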

I do not understand the " in your table 1.

Finally, I would like to say that the discussion of measurement invariance is good. A lot of people who talk about it don't usually go into that much depth. Or they use very technical words and formulas that take time to understand, and many people will not bother trying to understand those formulas, even though they are not necessarily complicated (depending on the case).

---------------------

Emil, thanks for the clarification. I won't die ignorant.
Kaplan wrote "But positing, without any evidence, systematic differences in hundreds or thousands of genes of small effects is surely no more plausible than positing multiple environmental differences with small effects! as noted above, we do know of some environmental differences between the populations that are verifiably associated with differences in performance on IQ tests and related measures; we know of no genes that are so-associated."
Obviously he's not read Ward et al.'s replication of Rietveld's 3 alleles, nor my work on polygenic selection (https://drive.google.com/file/d/0B7hcznd4DKKQWWNKU3pPbW1lS2M/edit?usp=sharing)


Kaplan's paper was submitted in July 2013 and accepted in January 2014, so he could not have taken the latest research into account. Furthermore, the variance in IQ attributable to specific alleles remains tiny.

I also have a few rather minor objections: 1) the dig at philosophers near the end is accurate but unnecessary and should be removed; 2) "HM" is close to "Herrnstein & Murray" and could lead to some confusion; 3) the review of what phenomena are Jensen effects could be more thorough.


1) I think it needs to be said. It's not a nasty thing to say, just a fact. Kaplan's not the only offender by any means.

2) HM is defined at the start of the article, so I don't see how it could be confused with anything else.

3) What in particular do you think is missing from my discussion?

I was typing a long reply to the article, but for the moment I wanted to react to this. Personally, I would not say HM but rather HH, for hereditarian hypothesis. Alternatively, if Dalliard still sticks with HM, perhaps use "H Model" instead. In my opinion, though, it's not necessarily annoying. Of course, Herrnstein & Murray was the first thing that came to my mind as well. That said, the intro and abstract say clearly what HM is.


I think 'hypothesis' refers to clearly circumscribed scientific propositions, while theories and models are larger, comprising many hypotheses. What I call a model might perhaps also be called a theory, but I think that's a bit too grand in this context. I contrast 'HM' with 'Kaplan's model', and I think both should be referred to with the same term, and 'model' is the best one.

By the way, Dalliard, I have taken the liberty of contacting Kaplan himself. I think that when we target someone in our articles, we should contact that person about the work in question. That's the least we should do. I hope he will come here.


I got Kaplan's reply, and it was exactly what I expected it to be. I don't think a productive back-and-forth exchange is possible with someone like him.
---------------------
“Kaplan claims that the leading behavioral geneticist Robert Plomin has argued in his recent publications, such as Trzaskowski et al. (2013a), that the heritability of IQ is only in the range of 40–60 percent. In fact, Plomin has written that heritability in adults may be 80 percent, but that it is lower in children and adolescents (Plomin & Spinath, 2004). Trzaskowski et al. (2013a) analyzed a sample of 12-year-olds.”

A more recent study is http://www.nature.com/mp/journal/v15/n11/abs/mp200955a.html which found an H² of 66% at age 17. They cite more numbers in the 80% area in the introduction. Furthermore, as they note, E (unique environment) includes measurement error, which is usually not corrected for. Correcting for it increases the H² estimate a bit further (they cite a g reliability of .9). When discussing theoretical issues, it is important to correct for known errors in estimates, cf. https://www.goodreads.com/book/show/895784.Methods_of_Meta_Analysis?ac=1


I think Plomin and Spinath's 2004 review that I cite is sufficient, because nothing that has been published since has challenged it. The measurement error issue is obvious, and I don't think there's a need to discuss it here.

“It is important to understand that it follows from Turkheimer’s laws that proposed environmental effects on IQ are also expected to be confounded by genetic influences. Accordingly, behavioral genetic research indicates that “environmental” factors, such as measures of family environment, child rearing style, and peer relations, are under substantial genetic control (Plomin et al., 1994; Kendler & Baker, 2007; Vinkhuyzen et al., 2010).”

I would also cite this paper. I was under the impression that this was a major paper in this area because Gottfredson always refers to it as important.

Rowe, D. C., Vesterdal, W. J., & Rodgers, J. L. (1998). Herrnstein's syllogism: Genetic and shared environmental influences on IQ, education, and income. Intelligence, 26(4), 405-423.


OK.

“From these examples it is clear that racial disparities in encounters with the police do not constitute prima facie evidence of racial bias. Even if the police never relied on the (generally reasonably accurate) racial stereotypes about criminal offending, racial disparities in police scrutiny would arise because blacks are more likely than whites to engage in suspicious and illegal activities. The same inevitably applies to private security guards singling out seemingly disproportionate numbers of blacks for scrutiny. More generally, the observed black-white differences in crime rates are predictable from black-white differences in IQ and aggressiveness (Beaver et al., 2013), and victim surveys indicate that the high arrest and conviction rates of blacks reflect their genuinely high rates of offending (New Century Foundation, 2005). The common belief that a racially biased criminal justice system underlies the high black crime rate is difficult to reconcile with these findings.”

Beaver et al. is a weak study to cite for this. Basically, they controlled for IQ and violent history, and the race variable got a p value higher than .05. There is a big possibility of a false negative here. But sure, failing to control for g is silly, and the uncontrolled result is not evidence of any bias.


I agree that Beaver et al. is not a strong study but it does support my point that racial differences in negative social outcomes can plausibly be explained by individual differences without recourse to race-specific processes. Beaver's argument is at least as well supported as Kaplan's.

A good source for the accuracy of stereotypes is this chapter, which I highly recommend.

Jussim, L., Cain, T. R., Crawford, J. T., Harber, K., & Cohen, F. (2009). The unbearable accuracy of stereotypes. Handbook of prejudice, stereotyping, and discrimination, 199-227. http://emilkirkegaard.dk/en/wp-content/uploads/Todd_D._Nelson_Handbook_of_Prejudice_StereotypiBookos.org_.pdf


I'll take a look at it.

On the topic of pro-black discrimination: Jensen and others who reviewed the bias literature (in the 1980s) noted that tests are generally not found to be biased against blacks, and are sometimes found to be biased in favor of them (i.e. they don't perform as well as predicted). I don't know if this is still the case, but it is another small bias in their favor if it works on e.g. the SAT.


The SAT and various grad school tests continue to be biased in favor of blacks when used as predictors of performance. The best explanation for this is that it paradoxically results from the fact that the tests are internally unbiased: http://wicherts.socsci.uva.nl/borsboom2008.pdf I could mention this in the article, but I think it's a marginal issue.

“American history offers an abundance of examples of how the prosperity of white-majority neighborhoods and even entire cities was destroyed as a consequence of large numbers of blacks moving in, indicating that whites’ (and other non-blacks’) concerns about the character of their black neighbors are far from irrational.”

This is too inflammatory and doesn't even cite anything. I guess the most obvious example of this is Detroit, but I'm not aware of any actual study of it. The census data would enable one to calculate the average IQ and SD in IQ over time. So one just needs some historical socioeconomic data (crime, income, job, education) for the same area to compare with.


White flight and the resulting urban decay are such well known phenomena in 20th century US history that I regard what I wrote as common knowledge not requiring a source citation.

“Kaplan brings up stereotype threat (Steele & Aronson, 1995) as an example of a subtle environmental influence that can have a large effect on IQ scores, arguing that his X-factors could be similar in nature. However, stereotype threat is based on a causal theory of how anxiety about confirming a stereotype about intelligence hampers performance on intelligence tests. There is thus a direct and immediate link, supported by experimental evidence, between poor test performance and the proposed causal factor — something that certainly cannot be said of any of Kaplan’s far-fetched propositions. Furthermore, if Kaplan’s X-factors were real, their influence on IQ would have to be, for reasons discussed later in this article, far more subtle than that of stereotype threat.”

I would be more critical of accepting stereotype threat. Wikipedia lists quite a number of relevant papers here. https://en.wikipedia.org/wiki/Stereotype_threat#Criticism Stereotype threat is basically a priming effect in social psychology which has a replication crisis. e.g. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0056515


I note later in the article that ST appears to be just a laboratory curiosity without broader importance. I think that's damning enough.

“James Flynn has criticized researchers for assuming that racism is a magical ambient force rather than one whose possible effects are manifested through such ordinary mechanisms as poverty and poor self-esteem. Kaplan rejects this argument. Indeed, Kaplan’s racial X-factors resemble nothing so much as magic. He presents no evidence for the hypothesis that “racialized environments” have an effect on IQ, and his evidence for the very existence of these environments is very weak. Nevertheless, his model presupposes that such environments, no matter how heterogeneous, act like magic bullets, causing large, g-linked cognitive deficits in blacks from all backgrounds while miraculously bypassing all the brain systems that mediate emotional and motivational processes. Furthermore, the racial X-factors do all this in such a subtle way that no statistical signals of their presence can ever be observed, making racism a causal force completely unlike all known environmental influences on IQ scores. The essentially occult powers that Kaplan attributes to white racism take his arguments beyond the bounds of science.”

But then again, the paper was published in a philosophy of biology journal, so in a sense it already is beyond.


He's attacking a scientific model using seemingly scientific arguments. I don't think he consciously attributes magical properties to his X-factors, it's just that when you unpack his assumptions, the model looks like magic. This is true of many claims in which group differences are attributed to racism.
I will give you the maximum, 5 stars, for the presentation of your article. It's so nicely presented... that's great. It's what I call perfection. Yes, literally.


Thanks. I did spend a lot of time on it, probably too much...

Indeed, the black-white income gap is often reversed after such equating (Johnson & Neal, 1998; Nyborg & Jensen, 2001; Heckman et al., 2006), which may signal the presence of discrimination in favor of blacks.


You can also cite "Intelligence, Genes, and Success: Scientists Respond to The Bell Curve" by Devlin, Fienberg, Resnick, Roeder (1997). In chapter 9, "The Hidden Gender Restriction: The Need for Proper Controls When Testing for Racial Discrimination" (pg 200) you'll see that when controlling for age, IQ and parental SES, the BW wage difference vanishes, but that when they look at the relationship within each gender group, they see that black women are expected to have more wages than white females, while black men have less than white men.


Yes, the Neal & Johnson study I cite found the same thing about sex differences. In general, depending on what kinds of individuals you are looking at (e.g., women or men, highly educated or poorly educated, low IQ or high IQ), the black-white earnings gap is either largely, fully, or more than fully explained by a few covariates that are plausibly causally prior to job performance (IQ, education, etc.). I don't think that controlling for two or three variables fully removes differences between blacks and whites, so the studies I cite are probably still comparing blacks to whites who are superior in some relevant respects. For example, I suspect that controlling for criminal background would explain much of the residual gap between black and white men, but I don't know of any studies that have done that.

As labor market differences are not the focus of my article, I don't think it makes sense for me to discuss this issue in more detail. I am not trying to explain all the black-white labor market differences. I have a more limited goal, which is just to show that the evidence Kaplan presents in favor of labor market discrimination is weak and subject to alternative explanations, and that you could easily make the case that whites rather than blacks are being discriminated against.

Phillips et al. (1998) found that even after controlling for more than 30 variables related to the economic, educational, cognitive, emotional, and health characteristics of parents and grandparents, about a third of the IQ gap in children remained unexplained.


This is chapter 4 of The Black-White Test Score Gap, and the IQ test studied was the PPVT-R, a vocabulary test. Perhaps specify that.


OK.

However, there is little evidence of their being overrepresented in relation to the black share of the perpetrators of crime


Shouldn't it be "them" instead of "their" ?


Both are correct.

Specifically, the g loadings and heritability coefficients of tests have been found to be intercorrelated very highly, at close to 1.00 (Rushton & Jensen, 2010).


The R&J 2010 paper you cite merely cites an unpublished study by Jongeneel-Grimen & te Nijenhuis. It is not published yet. te Nijenhuis gave me the draft (but in exchange I promised not to share it, so you have to wait for the definitive version). It is a good article indeed, but it needs more samples (Panizzon's sample can be added). In any case, I prefer to cite published studies rather than unpublished ones, if possible.


What published study would you suggest? R&J 2010 is a sort of review article, so I think it's appropriate.

The principally genetic nature of g has also been confirmed in multivariate behavior genetic analyses where genetic influences on different cognitive abilities have been found to be largely common rather than ability-specific (Plomin & Spinath, 2004; Trzaskowski et al., 2013b).


The two cited studies talk about genetic correlations, if my memory is correct. Of course, a genetic correlation always suggests the presence of a higher-order genetic g, but this needs to be confirmed through CFA modeling. As I said, the studies you cite merely "suggest" the presence of g. If g is not modeled explicitly, it can't be confirmed.


I disagree. A genetic correlation indicates that two phenotypes share genetic influences. There is no need for a higher-order factor model to confirm this. Of course, genetic correlations can be spurious (e.g., they may not reflect shared causal influences but common ancestry) but as they have been confirmed for g in multivariate GCTA with unrelated individuals, too, that's extremely unlikely.

Kaplan uses Levene’s test for the equality of variances to investigate whether his simulated X-factors inflate IQ variances to a statistically significant extent.


To be sure, Levene's test is flawed. See here.


I could add a note about that, although it makes no difference to my argument.
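
For readers unfamiliar with the statistic being discussed, here is a minimal sketch of the mean-centered Levene W statistic in plain Python. The function name `levene_w` and the toy scores are illustrative assumptions; this is not Kaplan's simulation code or data.

```python
def levene_w(groups):
    """Mean-centered Levene W statistic for a list of samples (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(y - m) for y in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Toy data (made up): same spread with shifted mean, then a wider spread.
baseline = [1, 2, 3, 4, 5]
shifted = [11, 12, 13, 14, 15]
wider = [1, 3, 5, 7, 9]
w_same = levene_w([baseline, shifted])
w_wider = levene_w([baseline, wider])
print(w_same, w_wider)
```

W is referred to an F distribution with (k - 1, N - k) degrees of freedom. A pure shift in means leaves W at zero, so the test only reacts to differences in spread.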

The predominant view among psychometricians and the one that is adopted in this article is that individual differences in intelligence can be conceptualized in terms of a factor hierarchy with a third-level general factor (g), second-level broad ability factors, and first-level test-specific variation (Deary, 2012). Higher-level sources of variation exert a causal influence on the lower levels of the hierarchy.


As for causal influence, it's only an assumption of the model, not exactly what is tested. As I usually say, establishing causality requires a time dimension, such as longitudinal data. For example, Panizzon (2014) confirmed the existence of genetic g at the second-order level, but no causality can be inferred. Panizzon's study can't be used against the van der Maas (2006) model, for example, although I (wrongly) used to believe that it could, because others made silly comments claiming that they had found causal g or something like that. Just because they say it does not mean it's true.


The causal g model is a theoretical position that undergirds much of my argument. A higher-order factor model makes no sense if you don't think g is causal, for example. This is my theoretical stance, and that of many others, and I don't see why I should spend time justifying it, or write in a theoretically non-committal language. I acknowledge that other models are possible (see footnote 13), but I favor the g model.

As it happens, the variances of IQ scores in blacks are typically smaller than those of whites. Jensen (1998, p. 353) found that black standard deviations are usually in the range of 11–14 IQ points, with a mean of 12, compared to the white standard deviation of 15 points.


In Murray's (2007, see table 2) study of the W-J cognitive battery, you see that the SD of blacks is greater than that of whites. I am saying this because Murray considers the sample he studied to be the most representative of the US. Apparently, most (if not all) samples neglect the lowest-IQ portion of blacks, reducing their SD, but not his samples. Of course, the samples amount to "only" 1669 blacks.


Jensen's claim is based on a wider range of tests, so it's probably more credible, although I agree that it's possible that many studies, especially older ones, either do not sample the low end of blacks, or have floor effects. I could add a footnote about this.

Accordingly, when Rowe & Cleveland (1996) fitted a biometric structural equation model to math and verbal test data from a genetically informative sample of black and white children, between 36 and 74 percent of the racial test score gaps was accounted for by genetic differences in the best-fitting model.21


No problem with R&C (1996). It's just that you should have added Brody's (2003) criticism to that study. See here. It's on pages 407-408.


Yeah, I think that study is more interesting for its methods than its results because of sampling issues. It should be replicated now that better data are available. I will remove the reference because it's really a rather equivocal study.

it has consistently been found that environmental influences on test performance are negatively or not at all associated with g loadings, whereas genetic influences are associated strongly and positively with g loadings.


I would not be so sure. As you know already, see here, the evidence from CFA modeling is not compelling. It's only when you look at MCV and factor analysis or principal components that you see g is more heritable. But CFA is confirmatory in nature, so its results can differ from those of exploratory methods. Furthermore, unlike CFA, MCV does not use a latent variable approach. That's really unfortunate. That's the 2nd biggest flaw in MCV. The first, biggest one, is that you can't test the g hypothesis against non-g hypotheses. There is indeed no way to assess model fit and see if the inclusion of latent g improves your outcome.

I say that for the g-h2 correlation. However, that argument also applies to g and the black-white difference, or to any correlation between g and still other factors. As far as I know, the evidence in favor of Spearman's hypothesis is positive but still pretty weak, because it relies mainly on MCV. And yet, MGCFA seems unable to confirm the g hypothesis as the best model, showing only that the g model fits the data no better (but no worse either) than non-g models (see e.g., Dolan 2000, Dolan & Hamaker 2001). Of course, when we look at the g/non-g factor decomposition in g models, it appears that the weak version of Spearman's hypothesis is true, thus confirming MCV. But this is still something different from model choice, which is based on model fit indices.

For g-h2 correlation however, we get the opposite. The genetic g model seems to have the best fit, but I don't see any proof from Panizzon that g is more heritable than the latent non-g factors. So, where's the Spearman effect ? Of course, there is still the problem of heritability dependent and independent of g, as you probably remember with our little conversation with Kees Jan Kan, but if we examine merely the h2, there is no evidence for Spearman hypothesis here.


As I said above, the article is based on the theoretical conviction that the g model is correct. The fact that g and non-g models may fit data equally well in CFA is immaterial, because the evidence for the superiority of the g model comes not only from CFA but from studies that are independent of CFA. For example, the various Jensen and anti-Jensen effects can be readily explained by the g model, but not by non-g models.

As I explained in that exchange with Kan, the h2xg correlation is due to the fact that g explains by far the largest portion of genetic variance in any test battery. Even if some non-g factor had 100% heritability, this would not invalidate the fact that g is by far the largest source of genetic variance. For example, in Figure 2a in my article, specific genetic effects on Ability 1, if there are any, explain variance in only four tests (1-4). In contrast, genetic effects on g explain variance in all 16 tests.
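
The arithmetic behind this point can be sketched in a few lines of Python. Only the counts (16 tests, 4 of them loading on Ability 1) come from the Figure 2a description above. The loadings and heritabilities are hypothetical values chosen for illustration, and the decomposition is a simplified one in which the g part and the Ability 1 residual are treated as independent.

```python
# All loadings and heritabilities below are hypothetical illustration values.
N_TESTS = 16            # tests in the battery, as in Figure 2a
N_ABILITY1_TESTS = 4    # tests 1-4 load on Ability 1
G_LOADING = 0.6         # assumed standardized loading of every test on g
SPECIFIC_LOADING = 0.4  # assumed loading of tests 1-4 on the Ability 1 residual
H2_G = 0.8              # assumed heritability of g
H2_SPECIFIC = 1.0       # Ability 1 residual assumed fully heritable

def genetic_variance(n_tests, loading, h2):
    """Genetic variance a factor contributes, summed over the tests it loads on."""
    return n_tests * loading ** 2 * h2

via_g = genetic_variance(N_TESTS, G_LOADING, H2_G)
via_ability1 = genetic_variance(N_ABILITY1_TESTS, SPECIFIC_LOADING, H2_SPECIFIC)
print(via_g, via_ability1)
```

Under these assumed values, g contributes several times more genetic variance to the battery than the specific factor, even though the specific factor is set to 100% heritability. The gap comes from g reaching all 16 tests while the specific factor reaches only four.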

It has been repeatedly shown that black-white differences on IQ test batteries generally satisfy the requirements of measurement invariance (Keith et al., 1999; Dolan, 2000; Dolan & Hamaker, 2001; Lubke et al., 2003; Edwards & Oakland, 2006; Trundt, 2013).


You should remove the Keith and Edwards/Oakland studies. They do not satisfy the basic requirements of MI, which are equality of (1) pattern loadings, (2) factor loadings, and (3) intercepts. They did not test (3).


MI comprises many stages, and it's still an analysis of MI even if not all stages are tested. But I can remove those two if you insist on it.

He could, of course, aver that his X-factors are so subtle that they would not violate


What is "aver" ?


aver verb \ə-ˈvər\
: to say (something) in a very strong and definite way

As Wicherts et al. (2004) point out, the fact that black-white IQ differences are associated with measurement invariance while the Flynn effect is not indicates that the two phenomena are separate, and that one of them does not tell us anything about the other.


Disagree. MI, as I have always said, is almost always misunderstood, probably because Wicherts and others do not describe MI very accurately. In fact, you can have a violation of MI concerning FE gains at the subtest level and yet, when you sum all of the subtests, see no test bias, because the subtest biases entirely cancel out (some biases against group1, others against group2). In the work of Wicherts (2004) the subtest biases seem not to be cumulative but to cancel out instead (I reached this conclusion after a close reading, because this point is not even made explicit by Wicherts). That unfortunately leads to several misleading comments on MI concerning the "role" of bias in accounting for the FE gains. If bias really accounted for the gains, the FE gains should disappear when "controlling" for bias. If they do not, all theories based on that assumption, such as Brand's hypothesis or the Armstrong/Woodley rule-dependence model, should be discarded.

The best illustration you have, concerning Flynn gain, is from Must et al. (2009) "Comparability of IQ scores over time" (and Shiu 2013, who followed up the Must (2009) study). MI was violated, but the biases were not cumulative. Thus, illustrating the total irrelevance of bias to account for the Flynn effect.

If you want a proof of what you say above, you should better cite Ang et al. (2010, see figure 3), but certainly not Wicherts.
http://www.iapsych.com/iqmr/fe/LinkedDocuments/ang2010.pdf

Besides, if factor loading invariance is violated, you can conclude that the differences are not directly comparable. For example, suppose that in one group (1) tests 1-4 load together and form a verbal factor, and tests 5-9 load together to form a fluid reasoning factor, but in the other group (2) test 4 loads only on the reasoning factor and tests 8-9 load on the reasoning as well as the verbal factor. That is terrible, because it means that tests 8-9 elicit verbal and fluid reasoning ability in group2 but only fluid reasoning ability in group1. As for test 4, it means that group1 uses verbal ability to solve test 4 but group2 uses reasoning ability to solve it. In a case like that, the situation is just awful, and there is virtually nothing you can do about it. It's better to have intercept bias than factor loading or pattern loading bias.


You will have to explain this in more detail. When Wicherts et al. say that the b-w gap and the Flynn effect have nothing to do with each other, I take them to mean that because of measurement non-invariance cohort differences cannot be attributed to latent factors, whereas the b-w gap can be attributed to latent factors, given that MI is not violated. Do you disagree with this?

One of the principal targets of Kaplan’s article are two studies by David Rowe and colleagues (1994, 1995).


One problem with Rowe's analysis is not, as Kaplan says, about statistical significance, but about measurement error. A lot of the reported correlations are close to zero, because they don't use latent variables. Is it possible that random errors also hide the race interaction? Well, the problem is that I cannot reject that possibility. I think Rowe's analysis needs to be replicated using a latent variable approach. I don't remember whether they corrected for measurement error or not.


I agree that Rowe's method is less sensitive than the latent factor approach, but I don't think measurement errors have much to do with it.

I do not understand the " in your table 1.


It's a ditto mark: http://en.wikipedia.org/wiki/Ditto_mark
Admin
White flight and the resulting urban decay are such well known phenomena in 20th century US history that I regard what I wrote as common knowledge not requiring a source citation.


Recall that not all readers are from the US. I am a case in point (Danish). Recall also that for most of recent history, north European countries have been rather racially homogeneous countries, so concepts of white flight and deterioration of cities due to immigration of lower g peoples are entirely new here. To be sure, it is common now also in Denmark.

Rangvid, B. S. (2009). "School Choice, Universal Vouchers and Native Flight from Local Schools". European Sociological Review 26 (3): 319–335. doi:10.1093/esr/jcp024

Denmark has traditionally lacked the kind of ethnically self-aware groups that would produce histories of this kind of thing. They (white nationalist groups) are emerging now as a reaction to the massive immigration.

The comment will need a cite for a general history of this phenomenon, a review or at least a study of it. The inflammatory remarks must be rewritten in neutral language. This is the only part of the 40-ish page article that I think must be changed before I can approve it.
What published study would you suggest? R&J 2010 is a sort of review article, so I think it's appropriate.


Unfortunately, there is still nothing, unless te Nijenhuis decides to publish the entire thing. He told me before (one year ago) that he still needs to collect more samples. I don't know where this is going. You can cite R&J 2010, but specify that it's about an unpublished study. And I'll be ok.

Of course, genetic correlations can be spurious (e.g., they may not reflect shared causal influences but common ancestry) but as they have been confirmed for g in multivariate GCTA with unrelated individuals, too, that's extremely unlikely.


I'm not talking about confounding, just that g must be modeled explicitly. What the studies fit here are just models with 2, 3 or 4 correlated factors, not g models.

Yeah, I think that study is more interesting for its methods than its results because of sampling issues. It should be replicated now that better data are available. I will remove the reference because it's really a rather equivocal study.


Sampling bias can be an explanation, but I remember seeing something like that several times. See Jensen's Educability & Group Differences, page 182, where he reports some weird numbers on black twin heritabilities (ranging from 0.02 to 1.76). He said it's possibly due to low sample size. But too many times I have read Rushton, Jensen, Lynn, and others arguing about sampling issues in ethnic samples when white-group data differ from minority-group data. Of course, those samples are indeed smaller. But another possibility could be that the pattern of relationships found among whites is not necessarily generalizable. This latter possibility seems to be somewhat obscured, more often than it deserves.

For example, the various Jensen and anti-Jensen effects can be readily explained by the g model, but not by non-g models.


Ok with that. But usually, model fit has a purpose: it helps us to select the best model. So unless they are wrong in doing it like this, I'm not convinced that a g model should be selected. But if the g model cannot be selected as the best model, how can we make claims such as "MCV shows that g-loadings correlate positively with group differences and non-g loadings correlate negatively with group differences, and thus it proves that a g model fits better"? Of course, if anything, this finding is more supportive of Jensen's view than otherwise. As I said, I agree with that. But the selection of the best model is not a question for which I have found an answer yet. If I should trust what they (e.g., Dolan, Wicherts and others) said, then we still don't have proof that the g model is superior to the non-g model, even though the parameter values of the g model confirm a Spearman-Jensen effect.

That said...
Model fit is sometimes a difficult concept to grasp. Many times I have seen people use "model" for a path analysis (or SEM) comparing a full model with a reduced model. In the full model, all structural path coefficients are freely estimated. In the reduced model, they constrain one or two path coefficients to be equal to zero because model fit indicates no decrement. And they call it "hypothesis testing". I'm not sure about that. I think it's more appropriate to reserve "competing models" for comparisons such as a model with 4 correlated factors vs 4 correlated factors + second-order g, or a bifactor g model, etc. These are what I can truly call models: it is how you build your latent variables (how many indicators per latent variable, etc.) and whether or not you include higher-order factors, etc. Of course, we can say the only difference between a correlated 4-factor model and a non-correlated 4-factor model is that the (structural) correlations are removed, and yet I can consider them as models because they are theoretically based (Jensen talked a lot about them). But I doubt many practitioners see it like this. Each time they remove a path, they immediately call it an "alternative model". And yet, Dolan did something like that too. In a 2nd-order g model, he specified different freely estimated parameters, so that for the g model, you can have 5 or 6 submodels, and for non-g models, you can also have 5 or 6 submodels.

MI comprises many stages, and it's still an analysis of MI even if not all stages are tested. But I can remove those two if you insist on it.


It really depends on what you're referring to. Intercept invariance means that 2 groups equalized on total score should have equal probability of answering correctly. If not, that means they differ in the knowledge required by the test(s). On the other hand, violation of loading invariance means the 2 groups use different abilities to solve the same subtests/items. Which one is crucial for the discrimination/racism hypothesis, in your opinion? I think loading invariance is more relevant here, because it has less (perhaps nothing at all) to do with knowledge. So, depending on the context (i.e., what you are attempting to show by citing them), you may cite them.

I take them to mean that because of measurement non-invariance cohort differences cannot be attributed to latent factors, whereas the b-w gap can be attributed to latent factors, given that MI is not violated. Do you disagree with this?


Wicherts makes no distinction (at least, not clearly) between subtest bias and test bias. The first does not imply the second if there is DIF or DBF cancellation (DBF stands for differential bundle functioning; a bundle is sometimes what others call a subtest; yes, I know, it's truly annoying that people use different words and names to talk about the same thing). So, reading Wicherts, it says something like subtests 1-4 are biased against old cohorts, but 5-8 are biased against recent cohorts. Thus the subtest biases cancel out at the test level.
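
The cancellation idea can be sketched with a toy calculation. The bias values below are hypothetical (they are not taken from Wicherts et al.), and the sign convention is an assumption: positive means the subtest favors the recent cohort.

```python
# Hypothetical intercept biases per subtest, in standardized units.
# Positive values favor the recent cohort, negative values favor the old cohort.
cancelling = [0.25, 0.25, 0.25, 0.25, -0.25, -0.25, -0.25, -0.25]
amplifying = [0.25] * 8

def net_test_bias(biases):
    """Bias in the summed test score: the subtest biases simply add up."""
    return sum(biases)

def biased_subtests(biases):
    """Number of subtests that show any bias at all."""
    return sum(1 for b in biases if b != 0)

print(biased_subtests(cancelling), net_test_bias(cancelling))
print(biased_subtests(amplifying), net_test_bias(amplifying))
```

In both toy batteries every subtest is biased, so invariance fails at the subtest level either way. Only in the second case does the bias accumulate in the total score.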

It seems to me that a lot of people using MGCFA are not aware of DBF cancellation/amplification. See here.

Roznowski, M., & Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement?. Educational and psychological measurement, 59(2), 248-269.

They are not the first to make this argument. That said, it's only when I learned and read about DIF studies that I came to understand how poorly users of MGCFA understand the concept of bias. These are different things:

1. loading non-invariance
2. intercept non-invariance (difference statistically significant, through model fit indices)
3. intercept non-invariance (magnitude of the difference, needs to be calculated given the observed and expected mean difference)
4. intercept non-invariance (direction or "sign" of the bias)

Most of the time, when reading MGCFA studies, people just stop at 1. Sometimes, they consider 2. But almost never 3 and 4. They don't care or just don't know that "significant" difference is not equivalent to "meaningful" difference. Let alone the direction of the bias and the concept of amplification and cancellation DIF/DBF that is almost never discussed in their papers. But this concept is usually discussed in DIF studies.
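
Points 3 and 4 can be made concrete with a small sketch. It assumes the usual linear factor model, in which the expected subtest mean difference equals the loading times the latent mean difference, and it assumes differences are coded as group 2 minus group 1. The function names and all numbers are hypothetical, and the significance tests in points 1 and 2 are not covered.

```python
def intercept_bias(observed_diff, loading, latent_diff):
    """Observed minus model-expected subtest mean difference (point 3: magnitude)."""
    expected_diff = loading * latent_diff
    return observed_diff - expected_diff

def bias_direction(bias, tol=1e-9):
    """Sign of the bias (point 4), under the group 2 minus group 1 coding."""
    if abs(bias) <= tol:
        return "none"
    return "favors group 2" if bias > 0 else "favors group 1"

# Hypothetical subtest: loading 0.5, latent mean difference 1.0 SD,
# observed mean difference 0.75 SD.
bias = intercept_bias(observed_diff=0.75, loading=0.5, latent_diff=1.0)
print(bias, bias_direction(bias))
```

A statistically significant intercept difference can still be tiny on this scale, which is why magnitude and direction need to be reported separately from the fit-based test.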

That said, things would be different if Wicherts (2004) had found that loading invariance is violated in most studies. But except for one study (the Estonian data) it's usually intercept invariance that is violated (sometimes along with residual invariance). See the models labeled 2, with the Greek letter lambda "Λ" (factor loadings). It's clear that "Λ" is not the source of the problem.
http://wicherts.socsci.uva.nl/wicherts2004.pdf

I would say that it is only if most studies show violation of MI at factor loading level, that this claim can be more or less justified. Because it means that BW differences are comparable (groups use same ability to resolve the same test), but cohort differences are not (groups don't rely on the same ability), and thus any attempt to link BW gap and FE gains sits on a shaky ground.

Indeed, when they say stuff like "cohort differences cannot be attributed to latent factors" it's extremely confusing. They probably understand it as Lubke (2003) explained, i.e., it's about differences of the factors at work, as you also say in your paper. But the sentence seems to be more inclusive than just this idea. Still, I can agree if it concerns loading invariance. But if it's about intercept invariance, I cannot make sense of their sentence.

To illustrate, just look at Beaujean/Osterlind (2008). When "adjusting" for item bias, PPVT-R gains vanish but PIAT-math gain has been only cut in half. In the first case, you would say bias is an important factor, and in the latter case, you can say that bias can account for 50% of the FE gains. In the series of analyses performed by Wicherts, it's more like bias accounts for 0% of the FE gains, because the biases cancel out. And yet in both cases, because of intercept bias, the claim that "cohort differences cannot be attributed to latent factors" is entirely correct. For practical purposes however, it's completely irrelevant. It falsely gives the impression that bias fully or meaningfully accounts for Flynn gains.
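
The three cases can be put on a common scale as the share of the raw gain that disappears after adjusting for bias. The gain sizes below are hypothetical placeholders; only the proportions (gain vanishes, gain halved, gain unchanged) follow the description above.

```python
def share_explained_by_bias(raw_gain, adjusted_gain):
    """Fraction of the raw cohort gain removed by adjusting for bias."""
    return (raw_gain - adjusted_gain) / raw_gain

RAW_GAIN = 4.0  # hypothetical raw gain, arbitrary units

gain_vanishes = share_explained_by_bias(RAW_GAIN, 0.0)   # PPVT-R-like case
gain_halved = share_explained_by_bias(RAW_GAIN, 2.0)     # PIAT-math-like case
gain_unchanged = share_explained_by_bias(RAW_GAIN, 4.0)  # biases cancel out
print(gain_vanishes, gain_halved, gain_unchanged)
```

Intercept invariance fails in all three scenarios, yet the share of the gain attributable to bias ranges from all of it to none of it.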

Nonetheless, you can still ignore this comment here. It's not sufficient for disapproving your paper. I just wanted to make clear my views on this problem here. Perhaps it's me who's wrong. If you agree and add a note, that's excellent. Otherwise, no problem.
Admin
I will ask te Nijenhuis about the situation with that study. He presented the findings at the London conference that I attended.
A new version of the article is attached. I made the following changes:

- Added reference to Rowe et al. (1998) (Herrnstein's syllogism)
- Specified that Phillips et al. (1998) used a verbal IQ test.
- Rewrote the part of the housing discrimination section that Emil objected to. I don't think it needs a source though.
- Removed reference to Rowe & Cleveland's (1996) biometric SEM study.
- Removed references to the two MI studies where equal intercepts were not tested.
- Plus some minor rewordings.

Of course, genetic correlations can be spurious (e.g., they may not reflect shared causal influences but common ancestry) but as they have been confirmed for g in multivariate GCTA with unrelated individuals, too, that's extremely unlikely.


I'm not talking about confounding. Just that g must be modeled explicitly. What the studies are doing here is just fitting models with 2, 3 or 4 correlated factors, not g models.


The genetic correlations indicate that the same genes explain most of the heritability of seemingly unrelated different abilities, e.g. verbal and perceptual ability. This is consistent with the g model. It does not prove the validity of the model, just increases its plausibility. Modelling g explicitly would not prove the g model, either, just potentially increase its plausibility even more.

Yeah, I think that study is more interesting for its methods than its results because of sampling issues. It should be replicated now that better data are available. I will remove the reference because it's really a rather equivocal study.


Sampling bias can be an explanation, but I remember seeing something like that several times. See Jensen's Educability & Group Differences, page 182, where he reports some weird numbers on black twin heritabilities (ranging from 0.02 to 1.76). He said it's possibly due to low sample size. But too many times I have read Rushton, Jensen, Lynn, and others arguing about sampling issues in ethnic samples when white-group data differ from minority-group data. Of course, the minority samples are indeed smaller. But another possibility could be that the pattern of relationships found among whites is not necessarily generalizable. This latter possibility seems to be somewhat obscured, more often than it deserves.


There's certainly a need for more data, but what we do have is consistent with there not being race interactions, see the meta-analysis by John and yours truly. Also, if the genetic and environmental determinants were very different across races, I don't think the factor structures could be so similar.

For example, the various Jensen and anti-Jensen effects can be readily explained by the g model, but not by non-g models.


Ok with that. But model fit usually has a purpose: it helps us select the best model. So unless they are wrong in doing it like this, I'm not convinced that a g model should be selected. But if the g model cannot be selected as the best model, how can we make claims such as "MCV shows that g1-loadings correlate positively with group differences and non-g loadings correlate negatively with group differences, and thus it proves that a g model fits better"? Of course, if anything, this finding is more supportive of Jensen's view than otherwise. Like I said, I agree with that. But the selection of the best model is a question for which I have not found an answer yet. And if I should trust what they (e.g., Dolan, Wicherts and others) said, then we still don't have proof that the g model is superior to the non-g model, even though the parameter values of the g model confirm a Spearman-Jensen effect.


You will never have a single test that will determine what the correct model is. There are always alternative models in CFA that fit equally well. You will have to look at the big picture, all the evidence.

I'll reply to the rest of your comments later.
Admin
What about Piffer's method for checking population differences in g? Surely, the current known SNPs for IQ fit rather well with the world phenotypic IQ scores. The good thing about this method is that it does not actually require that someone conducts an admixture study which is expensive and extremely politically incorrect. One merely has to use the SNPs found in GWAS studies which seem more politically palatable.

Surely, if we found that after we have 100 SNPs for g, use the Piffer method or some descendant, and find that there is no such correlation across-races with frequencies, this would be a severe blow to a global hereditarianism i.e. the genetic model about group differences at the world level.
Admin
Comments about the second revision.

The Flynn 1980 is missing from the reference list.

The PDF file is strange in that copying text from it is somewhat buggy. What is the PDF creation procedure?

“Kaplan’s epistemological double standard cannot be explained away by the fact that he only briefly and cursorily reviews research on racism in America.”

This reminds me of this Gottfredson paper. Gottfredson, L. S. (2007). Applying double standards to "divisive" ideas. Perspectives on Psychological Science, 2(2), 216-220. http://www.udel.edu/educ/gottfredson/reprints/2007doublestandards.pdf

“One of the most important discoveries made by Arthur Jensen in his research on the black-white IQ gap was the finding that its magnitude is not invariant across different tests but tracks their g loadings, or correlations with the latent general factor of intelligence. In a meta-analysis of 149 tests, he found an average correlation of 0.63 between the magnitudes of black-white gaps and g loadings (Jensen, 1998, pp. 377–378). 9 What this means is that the better a measure of the g factor a given cognitive test is, the greater the black-white gap on it usually is.”

This is somewhat unclear. There were 149 subtests in total, from fifteen studies combined into one study.

“However, this has not been universally found to be the case (e.g., Murray, 2007). It is conceivable that insufficient sampling of black individuals from the full range of ability or floor effects in tests have artificially lowered the variance of black IQ in many studies.”

Someone should do a meta-analysis. Chuck?

Aside from the missing reference, I approve of publication.
Perhaps I am egotistic but I cannot approve of a paper on race differences in intelligence which doesn't even cite Piffer's work. This paper is not in touch with the most recent developments in this field such as Piffer, 2013 (https://drive.google.com/file/d/0B7hcznd4DKKQUTgyQXhwRmZTLVU/edit?usp=sharing)
or Piffer, 2014 (http://biorxiv.org/content/early/2014/08/14/008011).
I agree (and I wish I had remembered that myself). This paper ought to cite Piffer's work. Also, though I forgot to mention this, Jensen's estimate for r (B-W difference x g loading) = 0.63 is much too low. Applying meta-analytical corrections raises the figure substantially.
Admin
Jensen partialled out reliabilities though.

And yes, when it comes to the testability of HH, the obvious options are:
- Admixture studies; perhaps the strongest possible type is the within-sibling version, because it controls for shared environment.
- SNP counting à la Piffer's method

All the other methods I can think of are more indirect and have been used, e.g. cross-racial adoptions.
The Flynn 1980 is missing from the reference list.


No, it isn't:

Flynn, J. R. (1980). Race, IQ and Jensen. London: Routledge.

The PDF file is strange in that copying text from it is somewhat buggy. What is the PDF creation procedure?


The PDF was made with Adobe PDF Maker from a Word document, so I don't think that's the problem. The culprit is probably the font I use, Cambria, which I rather like but which does not always come out properly when converting from Word to PDF. However, I don't notice any problems when copy-pasting from the file.

“One of the most important discoveries made by Arthur Jensen in his research on the black-white IQ gap was the finding that its magnitude is not invariant across different tests but tracks their g loadings, or correlations with the latent general factor of intelligence. In a meta-analysis of 149 tests, he found an average correlation of 0.63 between the magnitudes of black-white gaps and g loadings (Jensen, 1998, pp. 377–378). 9 What this means is that the better a measure of the g factor a given cognitive test is, the greater the black-white gap on it usually is.”

This is somewhat unclear. There were 149 subtests in total, from fifteen studies combined into one study.


I'll rewrite it.

Perhaps I am egotistic but I cannot approve of a paper on race differences in intelligence which doesn't even cite Piffer's work. This paper is not in touch with the most recent developments in this field such as Piffer, 2013 (https://drive.google.com/file/d/0B7hcznd4DKKQUTgyQXhwRmZTLVU/edit?usp=sharing)
or Piffer, 2014 (http://biorxiv.org/content/early/2014/08/14/008011).


I don't refer to your work because my article is not an exhaustive review of arguments in favor of hereditarianism, and I don't think your arguments are among the most important and convincing of the various lines of evidence that exist, certainly not in the US context.

Also, though I forgot to mention this, Jensen's estimate for r (B-W difference x g loading) = 0.63 is much too low. Applying meta-analytical corrections raises the figure substantially.


I pointed this out in footnote 9.

Wicherts makes no distinction (at least, not clearly) between subtest bias and test bias. The first does not imply the second if there is DIF or DBF cancellation (DBF stands for differential bundle functioning; a bundle is sometimes referred to as a subtest; yes, I know, it's truly annoying that people use different words and names to talk about the same thing). So, reading Wicherts, it says something like subtests 1-4 are biased against old cohorts, but 5-8 are biased against recent cohorts. Thus the subtest biases cancel out at the test level.

It seems to me that a lot of people using MGCFA are not aware of DBF cancellation/amplification. See here.

Roznowski, M., & Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: Do biased items result in poor measurement?. Educational and psychological measurement, 59(2), 248-269.

They are not the first to make this argument. That said, it was only when I learned and read about DIF studies that I came to understand how poorly users of MGCFA understand the concept of bias. These are different things:

1. loading non-invariance
2. intercept non-invariance (difference statistically significant, through model fit indices)
3. intercept non-invariance (magnitude of the difference, needs to be calculated given the observed and expected mean difference)
4. intercept non-invariance (direction or "sign" of the bias)

Most of the time, when reading MGCFA studies, people just stop at 1. Sometimes, they consider 2. But almost never 3 and 4. They don't care or just don't know that "significant" difference is not equivalent to "meaningful" difference. Let alone the direction of the bias and the concept of amplification and cancellation DIF/DBF that is almost never discussed in their papers. But this concept is usually discussed in DIF studies.
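
The difference between points 2 to 4 can be sketched with a toy calculation. This is only an illustration under stated assumptions: the per-subtest intercept biases are made-up numbers, the composite is assumed to be a simple unit-weighted sum of subtests, and the function names `net_bias` and `gross_bias` are invented here.

```python
# Hypothetical intercept biases per subtest, in SD units (assumed values).
# Positive = biased against the older cohort, negative = biased against the recent cohort.
subtest_bias = [0.20, 0.15, 0.10, 0.05, -0.05, -0.10, -0.15, -0.20]

def net_bias(biases):
    """Signed sum: the bias that survives at the test (composite) level."""
    return sum(biases)

def gross_bias(biases):
    """Sum of absolute values: how much bias exists at the subtest level."""
    return sum(abs(b) for b in biases)

net = net_bias(subtest_bias)
gross = gross_bias(subtest_bias)

print(f"gross subtest-level bias: {gross:.2f}")
print(f"net test-level bias:      {net:.2f}")
```

Every subtest here would fail intercept invariance on its own, but because the signs differ, the biases cancel in the composite. Looking only at whether invariance is rejected (point 2) would miss this; magnitude (point 3) and direction (point 4) are what reveal it.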

This being said, things could still be different if Wicherts (2004) had found in most studies that loading invariance is violated. But except for one study (the Estonian data), it's usually intercept invariance that is violated (sometimes with residual invariance as well). See the models 2 with the Greek letter lambda "Λ" (factor loadings). It's clear that "Λ" is not the source of the problem.
http://wicherts.socsci.uva.nl/wicherts2004.pdf

I would say that it is only if most studies show violation of MI at factor loading level, that this claim can be more or less justified. Because it means that BW differences are comparable (groups use same ability to resolve the same test), but cohort differences are not (groups don't rely on the same ability), and thus any attempt to link BW gap and FE gains sits on a shaky ground.

Indeed, when they say stuff like "cohort differences cannot be attributed to latent factors" it's extremely confusing. They probably understand it as Lubke (2003) explained, i.e., it's about differences of the factors at work, as you also say in your paper. But the sentence seems to be more inclusive than just this idea. Still, I can agree if it concerns loading invariance. But if it's about intercept invariance, I cannot make sense of their sentence.

To illustrate, just look at Beaujean/Osterlind (2008). When "adjusting" for item bias, PPVT-R gains vanish but PIAT-math gain has been only cut in half. In the first case, you would say bias is an important factor, and in the latter case, you can say that bias can account for 50% of the FE gains. In the series of analyses performed by Wicherts, it's more like bias accounts for 0% of the FE gains, because the biases cancel out. And yet in both cases, because of intercept bias, the claim that "cohort differences cannot be attributed to latent factors" is entirely correct. For practical purposes however, it's completely irrelevant. It falsely gives the impression that bias fully or meaningfully accounts for Flynn gains.

Nonetheless, you can still ignore this comment here. It's not sufficient for disapproving your paper. I just wanted to make clear my views on this problem here. Perhaps it's me who's wrong. If you agree and add a note, that's excellent. Otherwise, no problem.


I'm still not sure I understand your point. My point of citing Wicherts on the Flynn vs. b-w gap is that the causal processes behind these gaps are different, as indicated by MI analyses. Whether the Flynn effect can be explained by somehow adjusting for the bias is beside the point for my concerns.

BTW, te Nijenhuis et al. have published a new meta-analysis of g x h2 in Japan (http://www.sciencedirect.com/science/article/pii/S0160289614001007). The correlation is on the low side (0.38), perhaps reflecting the smallness of the test batteries analyzed, but the results don't contradict those from Western studies. They also review Western studies, so I'll cite this one for the link between g loadings and h2.