Back to [Archive] Post-review discussions

[ODP] The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ
Here's the latest version. I fixed the problems with alphabetical order and how Jensen's g x race meta-analysis was described and added a reference to te Nijenhuis's new study about g x h2 correlations. I also cite Rietvald's and Ward's studies at the very end of the paper, although I still think mentioning them is unnecessary.
I accept your reason for not including my study, but only on the grounds that you're not familiar with population genetics. Otherwise, I think it's very relevant to your paper and for refuting Kaplan's argument.

Apart from this, I approve publication.
Admin
This ups the approval count to 3. Perhaps Meng Hu will agree to publication too?


I wanted to look at your study first because it came before Dalliard's, but what you're trying to do is becoming more and more complicated. Perhaps I should deal with Dalliard's paper first. Do the simplest thing first. (Sorry about that; you'll have to wait a little bit for your paper...)

1. Environmental effects such as schooling tend to be most pronounced on the least g-loaded sub-tests.
2. The B/W gap shows the reverse pattern.
Ergo: The B/W gap is not due to these types of effects.


Flynn dealt with that argument already. He computes the correlations with MCV, then he calculates the g-score by summing the subtests weighted by their g-loadings. He compares the g-score with the IQ score (the mere sum of the subtests). He concludes that the difference between g-score and IQ score is minor with regard to FE gains and the narrowing of the B-W IQ gap. You see this in Dickens & Flynn's (2006) study of the B-W IQ gap over time. That means the "g argument" is flawed. And it is true: MCV correlations imply that when the g-loading of the test increases, the B-W gap is larger, the educational gain is lower, the Flynn gain is lower, and so on. But we know already that IQ tests are very highly g-loaded. That means you can't increase g-loadings by much anymore.

In Jensen's (1998) The g Factor, there is a (bivariate) regression analysis that almost no one has ever cited. It's on page 377. The dependent variable is the B-W gap, the independent variable is the g-loadings. The intercept was -0.163. Remember, the intercept is the value of the dependent variable when all independent variables are zero. In other words, the B-W gap is negative, i.e., in favor of blacks, when the g-loading is zero (assuming the linearity assumption holds, that is, there are no floor or ceiling effects). The regression slope seems to be 1.47, so 1.47 - 0.16 = 1.31. From this Jensen concludes that when the g-loading of an IQ test is at its maximum (100%), the expected B-W gap should be 1.31 SD, compared to what we see today, mostly around 1.0 or 1.1 SD. What does that tell us? That 1.1 SD is less real than 1.3 SD? Of course not. Or that increasing the g-loading makes a lot of difference? Not even that.

And that's what Flynn attempted to show in his book Where Have All the Liberals Gone? Race, Class, and Ideals in America, page 87, box 14. There is an apparent negative correlation between the g-loadings and the IQ gap of blacks in 2002 versus whites in 1947-48. The IQ of blacks was 104.42 and their GQ was 103.53, which is lower, thus confirming MCV but at the same time killing the "g argument" you both make. This can be seen in the trivial (1 point) difference between IQ and GQ.

The confusion here, the idea that g gains and Flynn gains have different properties just because they load on different components in a PCA, is similar to what I have pointed out elsewhere about the distinction we should make between correlations and means. If PCA "groups" the variables and shows you a pattern in which education/FE gains do not load on the component with the g-loadings, while heritability and the B-W gap do load on that component, it cannot prove that educational/FE gains are unreal gains. Coming back to Jensen's (1998) regression analysis: if the best we can do is widen the gap by a mere 0.2 SD, this is a pretty weak argument.
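To make the arithmetic concrete, here is a minimal sketch (in Python; the intercept and slope are the values quoted above from Jensen 1998, p. 377):

# Minimal sketch of the arithmetic in Jensen's (1998, p. 377) bivariate
# regression, which predicts the B-W gap (in SD units) from a subtest's
# g-loading. Intercept and slope are the values quoted above.

INTERCEPT = -0.163  # expected gap when the g-loading is zero
SLOPE = 1.47        # change in the gap per unit increase in g-loading

def expected_gap(g_loading):
    """Predicted B-W gap (SD) for a subtest with the given g-loading."""
    return INTERCEPT + SLOPE * g_loading

# IQ batteries are already very highly g-loaded, so pushing the loading
# to its ceiling adds only a modest amount to the expected gap:
for g in (0.8, 0.9, 1.0):
    print(f"g-loading {g:.1f} -> expected gap {expected_gap(g):.2f} SD")
# ~1.01, ~1.16, ~1.31 SD: the move to the maximum is roughly +0.2-0.3 SD.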

If you want to show Flynn gain or educational gain to be devoid of g, there is one and only one way to do that: by way of transfer effects. See Nettelbeck & Wilson (2004), or Ritchie et al. (2012) on the non-transferability of educational gains to reaction times. Every other method is flawed for the purpose of showing whether a score change is on g or not. Even the MGCFA decomposition of g/non-g gains is irrelevant here.

As for the B-W gap, there is nothing I can say. If you're not looking for any pattern of score changes, it's clear that transfer effect studies can't be of help. At least, you can rely on MGCFA g/non-g decomposition along with its model fit.

Also, I strongly disagree with his characterization of the evidential status of SH. I consider SH to be well supported.


Then, I suppose you disagree with the claim that a g model ought to be preferred over non-g model(s) only on the basis of better model fit. In the social sciences it is a well-known fact that a model is to be preferred when and only when it fits better than the others. If you're not familiar with that, I can tell you a lot of scientists are not aware of it either. Or maybe they are, but they don't show it in their work. In economics in particular, I have read econometricians complaining that a lot of studies attempt to test a particular model but don't pit it against the alternatives. They say that if other models can predict the data equally well, it's not obvious which one is the loser and which the winner.

Based on that, I stand by my argument. There is no clear winner and no loser among g and non-g models. You only see g as the winner because you put more weight on the worst methodologies (MCV and PCA) and not on the best and recommended method (CFA modeling). That's why I said to Dalliard earlier that the evidence for g is positive but only weak.

Also, I explained that different models in fact can be tested with MCV here.


Model testing should involve "model fit indices", but that's not what you did.

-----

Concerning Piffer's method, I don't understand why some of you here reject it without giving any argumentation whatsoever. Just because it's not "vetted" does not mean the method is flawed. To prove it flawed, you should explain what's wrong with it. I have always disliked arguments from authority, and you know that.

By the way, you keep writing "Rietvald" but it's "Rietveld"! The name is not misspelled in the reference list of the latest version of the paper (perhaps specify at the top of the first page that it's a DRAFT), but there is a misspelling on page 35.

Also, when someone makes changes, especially if the article is lengthy, try to make explicit which pages have been modified, or mark the changes in color, or whatever. I don't want to re-read the entire article again.

And I'm not even sure what has been changed here. From what Dalliard says, he has added several studies (even though I don't see Ang et al. 2010). I found these passages already (CTRL+SHIFT+F is helpful sometimes), but what about my comments on measurement invariance and g models? I want to be sure about what the author thinks of this issue, and whether he is planning to rewrite the relevant passages according to my comments, before I give my final opinion.
Flynn dealt with that argument already. He computes the correlations with MCV, then he calculates the g-score by summing the subtests weighted by their g-loadings. He compares the g-score with the IQ score (the mere sum of the subtests)....If you want to show Flynn gain or educational gain to be devoid of g, there is one and only one way to do that: by way of transfer effects.


The argument doesn't deal with the magnitude of true-score differences. It simply concerns itself with the pattern of effects on subtests. Thus Flynn's argument is irrelevant. Which premise, specifically, do you disagree with:

1. Environmental effects such as schooling tend to be most pronounced on the least g-loaded sub-tests.
2. The B/W gap shows the reverse pattern.
Ergo: The B/W gap is not due to these types of effects.

Then, I suppose you disagree with the claim that a g model ought to be preferred over non-g model(s) only on the basis of better model fit. In the social sciences it is a well-known fact that a model is to be preferred when and only when it fits better than the others.


This sounds equivocal. Based on the totality of the data, Spearman's hypothesis should be preferred to the alternative(s). The MGCFA results are inconclusive; they don't speak for or against SH. Thus, one looks at other lines of evidence, which do speak for SH. So, yes, you would be agnostic about SH based on MGCFA alone, but you would prefer it based on the totality of the evidence.

Based on that, I stand by my argument. There is no clear winner and no loser among g and non-g models. You only see g as the winner because you put more weight on the worst methodologies (MCV and PCA) and not on the best and recommended method (CFA modeling).


Ok, let's weight the lines of evidence:

(-1 = against, 0 = neutral, 1 = for)

MGCFA: 0 × weight 50 = 0
Everything else: 1 × weight 50 = 50
......
total evidential weight = 50 > 0

When I give MGCFA equal weight, SH is favored. If I double the weight, SH is still favored. If I triple the weight....
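A minimal sketch of this tally (the weights are illustrative assumptions, not measured quantities):

# Toy version of the tally above: MGCFA is scored neutral (0), the other
# lines of evidence are scored as favoring SH (+1). The weights are
# illustrative assumptions; the point is that a neutral line contributes
# nothing, so no weight on MGCFA can flip the sign of the total.

def evidential_weight(mgcfa_weight, other_weight=50):
    mgcfa_score, other_score = 0, 1  # -1 = against, 0 = neutral, 1 = for
    return mgcfa_weight * mgcfa_score + other_weight * other_score

for w in (50, 100, 150):  # equal, double, triple weight on MGCFA
    print(f"MGCFA weight {w}: total evidential weight = {evidential_weight(w)}")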

Model testing should involve "model fit indices", but that's not what you did.


Get out of here.

Concerning Piffer's method, I don't understand why some of you here reject it without giving any argumentation whatsoever. Just because it's not "vetted" does not mean the method is flawed. To prove it flawed, you should explain what's wrong with it. I have always disliked arguments from authority, and you know that.


A method not being well tested is a "flaw" when it comes to quality of evidence. The results are more uncertain because the method is.
Concerning Piffer's method, I don't understand why some of you here reject it without giving any argumentation whatsoever. Just because it's not "vetted" does not mean the method is flawed. To prove it flawed, you should explain what's wrong with it. I have always disliked arguments from authority, and you know that.


My papers have been published in peer-reviewed journals (Mankind Quarterly, OBG, Interdisciplinary Bio Central), so they have been "vetted" by experts. All Chuck is doing is bringing discredit on the reviewers of OpenPsych in general (and, by reflection, on himself), because my paper (Opposite selection pressures...) has been approved for publication on the OpenPsych forum. This has made me doubt that he's qualified to be a reviewer for OBG, since he obviously doesn't consider himself expert enough in genetics to evaluate my work on his own (he has to rely on the opinion of a team of people, the "signatories", whom he consulted via email).
Admin
Flynn dealt with that argument already. He computes the correlations with MCV, then he calculates the g-score by summing the subtests weighted by their g-loadings. He compares the g-score with the IQ score (the mere sum of the subtests). He concludes that the difference between g-score and IQ score is minor with regard to FE gains and the narrowing of the B-W IQ gap. .... If you want to show Flynn gain or educational gain to be devoid of g, there is one and only one way to do that: by way of transfer effects. .... As for the B-W gap, there is nothing I can say. If you're not looking for any pattern of score changes, it's clear that transfer effect studies can't be of help. At least, you can rely on MGCFA g/non-g decomposition along with its model fit.


I have been thinking about this. I think Flynn is wrong here, not the MCV folks. Here's why. Look at the te Nijenhuis test-retest training paper. MCV gives a correlation close to -1, indicating no gain in g. I think we can also safely postulate, theoretically, that there is no gain in general intelligence from taking the same test twice (and we have tested this with transfer tests too). However, using Flynn's method would show gains. That is an absurd conclusion, so his method must be flawed.

Transfer testing is the best test of whether a real change in GCA has occurred. MCV is a useful test because it does not require a new study, but it can be wrong. I was going to write up a small communication paper about this.
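To make the disagreement concrete, here is a minimal sketch with hypothetical subtest data (not from any of the cited studies): when retest gains are concentrated on the least g-loaded subtests, MCV returns a correlation near -1, yet Flynn's weighted-composite method still shows a positive "GQ" gain, because the composite rises whenever any subtest rises.

# Toy illustration (hypothetical numbers, not from any cited study):
# retest gains concentrated on the least g-loaded subtests.
import numpy as np

g_loadings = np.array([0.5, 0.6, 0.7, 0.8, 0.9])   # five subtests
gains      = np.array([0.8, 0.6, 0.4, 0.2, 0.1])   # retest gains (SD units)

# MCV: correlate g-loadings with gains -> strongly negative (about -0.99).
mcv_r = np.corrcoef(g_loadings, gains)[0, 1]

# Flynn's method: g-weighted composite gain vs. unweighted composite gain.
gq_gain = np.sum(g_loadings * gains) / np.sum(g_loadings)  # "GQ" gain
iq_gain = gains.mean()                                     # "IQ" gain

print(f"MCV correlation: {mcv_r:+.2f}")                   # close to -1
print(f"IQ gain: {iq_gain:.2f}, GQ gain: {gq_gain:.2f}")  # both positive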
Admin
Concerning Piffer's method, I don't understand why some of you here reject it without giving any argumentation whatsoever. Just because it's not "vetted" does not mean the method is flawed. To prove it flawed, you should explain what's wrong with it. I have always disliked arguments from authority, and you know that.


My papers have been published in peer-reviewed journals (Mankind Quarterly, OBG, Interdisciplinary Bio Central), so they have been "vetted" by experts. All Chuck is doing is bringing discredit on the reviewers of OpenPsych in general (and, by reflection, on himself), because my paper (Opposite selection pressures...) has been approved for publication on the OpenPsych forum. This has made me doubt that he's qualified to be a reviewer for OBG, since he obviously doesn't consider himself expert enough in genetics to evaluate my work on his own (he has to rely on the opinion of a team of people, the "signatories", whom he consulted via email).


You know quite well that these journals are not top journals. Vetting would require examination by top experts, none of whom are found at any of those three journals. If you want it vetted (and this is a good idea), then I think you should perhaps contact some authors yourself.

For practical reasons, it is perhaps best to ask them only about the method as applied to the height data, since there are fewer feelings involved in that. If they concur that it works well for height, we can be pretty sure it works well for any kind of highly polygenic trait.

PS. You really ought to stop the personal stuff in the forums. Stuff like that belongs in private messages or in the comment section of the DailyMail.
Another version of the article is attached. It's the same as the previous one except that I added another variable (Positive affect) to Table 1, based on my analysis of Add Health data. This change has no substantial effect on any of my conclusions. I also fixed some language issues (e.g., Rietveld's name).

Then, I suppose you disagree with the claim that a g model ought to be preferred over non-g model(s) only on the basis of better model fit. In the social sciences it is a well-known fact that a model is to be preferred when and only when it fits better than the others.


No such thing is "well known" in social science. Model fit is only one piece of evidence, to be judged against the backdrop of other evidence. If the fit of two models is similar, then you use other evidence to choose between them.

As they say, nothing is as practical as a good theory. That's the point of basing my arguments on the g model: it is the only theory that makes sense of the mess of evidence in this area. What you seem to be advocating is blind, theory-free data analysis where you fetishize fit indices and ignore everything else. You won't get anywhere that way.

but what about my comments on measurement invariance and g models? I want to be sure about what the author thinks of this issue, and whether he is planning to rewrite the relevant passages according to my comments, before I give my final opinion.


I'm not going to address them as I don't think they challenge my arguments in any way. Perhaps instead of these walls of text you could formulate your criticisms as syllogisms as Chuck does above, so I could discern what you are actually protesting against.
The argument doesn't deal with the magnitude of true-score differences. It simply concerns itself with the pattern of effects on subtests. Thus Flynn's argument is irrelevant.


That's why I had the impression you misunderstand the implication of a correlation between g-loadings and score differences. A positive r does not make the score difference real, and a negative r does not make it less real or unreal. You shouldn't stop at the correlation; you must go further. What you must calculate is the impact of these correlational patterns on the score difference. A correlation is really abstract unless you can derive an effect size from it. As I said, Jensen's bivariate regression tells you why and where you're wrong. The B-W gap actually hovers around 1 SD, and the expected B-W gap is 1.3 SD when the g-loading is at its maximum. Thus, increasing the g-loading simply adds a modest score difference.

If you want to prove that the Flynn effect involves zero gain in g, you had better use (bivariate) regression and look at the magnitude of the cohort difference when the g-loading peaks at 100%.

This sounds equivocal. Based on the totality of the data, Spearman's hypothesis should be preferred to the alternative(s).


No. You have to distinguish between good and bad methods, between exploratory and confirmatory studies. I prefer the latter, but most of the studies you cite are exploratory. Look at Dolan (2000) and you'll see there is no strong evidence for SH, just a meager one. I do not say MCV is bad. When you have many data points and combine them with meta-analysis and corrections, it's acceptable, but not on a par with MGCFA. What is bad about MCV is that it does not test the g model against a first-order factor model. This is unsatisfactory because we know that below the level of the second-order g, you have first-order factors. And yet, I don't see how you can test for them with MCV. It seems to me that the "models" within MCV are extremely ill-specified: the g model is not made explicit, and the first-order factors (below g) are totally absent. Given the absence of the latter, how is it possible to test g versus non-g models with MCV? That is why I remain convinced you have not provided a good defense of MCV. And no one here has. I honestly think it is not possible to defend MCV anymore. I once believed it was a good method, but the more I read Dolan's studies, the more that belief has been shaken. MCV is a method you should use only when you have nothing else.

I have been thinking about this. I think Flynn is wrong here, not the MCV folks. Here's why. Look at the te Nijenhuis test-retest training paper. MCV gives a correlation close to -1, indicating no gain in g. I think we can also safely postulate, theoretically, that there is no gain in general intelligence from taking the same test twice (and we have tested this with transfer tests too). However, using Flynn's method would show gains. That is an absurd conclusion, so his method must be flawed.

Transfer testing is the best test of whether a real change in GCA has occurred. MCV is a useful test because it does not require a new study, but it can be wrong. I was going to write up a small communication paper about this.


That correlation of -1 was from a white-white comparison, if I'm not mistaken, whereas Flynn (2008) compared blacks and whites. Also, given his page 87, I calculated the correlation between g-loadings and the black(2002)-white(1947) gap and it was -0.537. Yet the IQ-GQ difference was 1 point. Honestly, I don't think a correlation of 100% would make it very different. But I admit it's not clear to me what Flynn wanted to show. He did (with Dickens) a better job in his (2006) meta-analysis. See the link below, table 2.
http://www.brookings.edu/views/papers/dickens/20060619_iq.pdf

You have 3 columns of g gains:

1.17, 2.57, 4.67

You have 3 columns of IQ gains:

1.20, 2.82, 4.93

And the corresponding r(g-loadings * gains):

-0.28, -0.73, -0.38

Look closely at that big value of -0.73. And yet the difference between 2.57 (GQ) and 2.82 (IQ) is meaningless. What would the g gain be if the correlation hit 100%? Would it become a zero g gain? I don't think so. Then what about the te Nijenhuis (2007) correlation of -1.00? I can't make sense of it. I don't think we can prove there is zero gain with that analysis. The Nettelbeck & Wilson (2004) study, by way of comparison, suffers no such doubt, except for its small sample size.

Also, te Nijenhuis has conducted other analyses subsequent to the (2007) meta-analysis, right? I remember the 2013 study "Is the Flynn effect on g?: A meta-analysis". A fairly large sample, but a negative correlation of -0.38.

Furthermore, Flynn's method is similar in its logic to Jensen's bivariate regression. And I accept both because, as I said to John, you should not stop at the correlation; you must translate it into a score difference, a d gap, etc. That's what Jensen and Flynn did, using different methods. I think they can be complementary.

No such thing is "well known" in social science. Model fit is only one piece of evidence, to be judged against the backdrop of other evidence. If the fit of two models is similar, then you use other evidence to choose between them.


When you have difficulties choosing between models, you have to look elsewhere; I agree with this. But if the other evidence comes from bad methods, you should understand that the evidence in favor of SH is far from definitive, only "encouraging" and "suggestive" of SH. I can change my mind if you or John or Emil or someone else here can answer my objection, which I quote here:

What is bad about MCV is that it does not test the g model against a first-order factor model. This is unsatisfactory because we know that below the level of the second-order g, you have first-order factors. And yet, I don't see how you can test for them with MCV. It seems to me that the "models" within MCV are extremely ill-specified: the g model is not made explicit, and the first-order factors (below g) are totally absent. Given the absence of the latter, how is it possible to test g versus non-g models with MCV?


Trust me. If you beat me on this, I will accept the argument that when MGCFA fails badly we should look at alternative methods such as MCV. And I will remove what I said about MCV. For real.
What is bad about MCV is that it does not test the g model against a first-order factor model. This is unsatisfactory because we know that below the level of the second-order g, you have first-order factors. And yet, I don't see how you can test for them with MCV. It seems to me that the "models" within MCV are extremely ill-specified: the g model is not made explicit, and the first-order factors (below g) are totally absent. Given the absence of the latter, how is it possible to test g versus non-g models with MCV?

Trust me. If you beat me on this, I will accept the argument that when MGCFA fails badly we should look at alternative methods such as MCV. And I will remove what I said about MCV. For real.


Thank you for concisely explicating your position.

Ordinary MCV tests a g-model, understood statistically, versus a non-g-model in which group differences don't, for some other reason, happen to correlate with subtest g-loadings. It can do this because a Jensen Effect is an explanandum that, when consistently found, can only be explained by between-group differences in g or by between-group differences in non-g factors that happen, for some other reason, to be larger on more g-loaded subtests. Now, I agree that ordinary MCV can't test such non-g models. I have argued, though, that multivariate MCV can test some of them. You can simply check whether the Jensen Effect is driven by, e.g., cultural load, or by verbal factor differences, as in the case of the deaf-normal difference. It can't test all of them, but it's not true that it can't test any.

I will stop there. If you agree with the above I will continue to make my argument. If not, I will have to clarify some more.
In MCV, when you correlate non-g residuals with another variable, these residuals are from individual subtests. The first-order factor level is not specified, whereas in CFA you have the residuals of both the subtests and the first-order factors. Furthermore, multivariate MCV (let's call it MMCV) has two problems. You do a multiple regression with 2 independent variables but only 10-11 subtests. I remember that sometimes, when I entered a second independent variable, SPSS removed it and put it in the "excluded variables" box. I still don't know why, but I suspect it's because there are too few variables. You can say it's possible to apply it to, say, 15 or more variables. Maybe, but then we come to the second problem: MMCV merely controls for whatever you can measure with MCV. Once again, this means that the first-order factor level is not incorporated. Lastly, you should be careful about the interpretation of a multiple regression. I remember you showed that cultural loading does not predict the B-W difference once both cultural loading and g-loading are entered into the regression model. The coefficient is much stronger for g-loading, of course, but that is because culture is held constant: above culture there is a g effect, but above g there is no culture effect. These regression coefficients, however, cannot tell you which one is the better "predictor". The g-loadings and cultural loadings are so highly correlated (0.85-0.90) that the interpretation of these multiple regression coefficients becomes even more ambiguous. Remember what Jensen (Bias in Mental Testing) said about partial correlation analyses:

But the interpretation of such partial correlations is very tricky. They are easily misleading. The high partial correlation of education and occupation, for example, would seem to imply that almost anyone, given the necessary amount of education, could attain the corresponding occupational status more or less regardless of his or her IQ, as the partial correlation of IQ and occupation is quite low. But this would be a false inference, because not everyone can attain the educational thresholds required by the higher occupations. Holding IQ constant statistically, as a partial correlation, only means that, among those whose IQs are above the threshold required for any given occupation, educational attainment then becomes the chief determinant of occupational level. The low partial correlation between IQ and occupation does not contradict the importance of the threshold property of IQ in relation to occupational status. If the true relationship between IQ and occupation were as low as the partial correlations would seem to suggest, we should find every level of IQ in every type of occupation. But of course this is far from true, even in occupations to which entry involves little or no formal education. Moreover, not all high-IQ persons choose to enter the professions or other high-status occupations, but those who do so work to attain the required educational levels; and hence educational level is more highly correlated with occupational level than is IQ per se.


If the independent variables can truly be viewed as "competitors", I would agree with you that it is possible to test these "models", but that is not exactly what MR is doing here.

By the way, a classical MR is a bad thing to do when you want to test g versus non-g models. See here for why. What you should apply instead is a dominance (or relative weights) analysis.
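For illustration, a minimal sketch of the collinearity problem just described (hypothetical loadings, not the actual WISC vectors): with two predictors correlated at ~0.9, the regression coefficients trade off erratically across small samples even though each predictor's simple correlation with the outcome barely changes.

# Toy demonstration (hypothetical data) of why highly collinear
# predictors -- like g-loadings and cultural loadings at r ~ 0.9 --
# make multiple regression coefficients hard to interpret.
import numpy as np

rng = np.random.default_rng(0)
n = 12  # roughly the number of subtests in one battery

for trial in range(3):
    g_load = rng.uniform(0.4, 0.9, n)
    cult_load = g_load + rng.normal(0, 0.05, n)   # correlated ~0.9 with g_load
    gap = 1.4 * g_load + rng.normal(0, 0.10, n)   # outcome driven by g_load

    X = np.column_stack([np.ones(n), g_load, cult_load])
    beta, *_ = np.linalg.lstsq(X, gap, rcond=None)
    r = np.corrcoef(g_load, cult_load)[0, 1]
    print(f"trial {trial}: r(g, cult) = {r:.2f}, "
          f"b_g = {beta[1]:+.2f}, b_cult = {beta[2]:+.2f}")
# The two coefficients swing against each other from sample to sample;
# a dominance / relative-weights analysis is one way around this.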
In MCV, when you correlate non-g residuals with another variable, these residuals are from individual subtests. The first-order factor level is not specified, whereas in CFA you have the residuals of both the subtests and the first-order factors.


Hmmm... this doesn't seem to be going anywhere. Let's start again. What's the point of MCV? The point is to determine whether g differences are driving group differences. Obviously, if group differences are being driven by g differences, then g differences must be involved, and heavily so. What can't MCV do? MCV can say little about the involvement of g in group differences when the correlations are modest positive to modest negative. Contra Jensen, an r = 0 doesn't imply that group differences are g-"hollow". Nor does an r = -0.50. It just implies that the differences are not being driven by g. I have no doubt, for example, that the Flynn Effect involves g differences; it's just that the g differences are being dragged along behind the broad and narrow factor differences, kicking and screaming. I trust that you agree.

So what's the limitation of MCV given its purpose? As Gordon pointed out 30 years ago, a strong, positive correlation -- the supposed evidence of g-driven differences -- (a) could be completely spurious. As Wicherts pointed out, such correlations (b) could be the result of a depressive effect on a non-g factor which happens to have a higher g-loading. What's the remedy? For (a), meta-analysis. For (b), some form of multivariate MCV, where one checks whether a proposed non-g factor A is instead driving the Jensen Effect. I trust that you agree.

So where do we disagree?

Now, you don't explain why MMCV can't test (b). You say:

"Furthermore, the multivariate MCV (let's call it MMCV) has two problems. You do a multiple regression with 2 indep. variables, but only 10-11 subtests."

My solution was aggregation. I explained the justification for this numerous times.

"Yet, you can say it's possible to apply it for, say, 15 or more variables. Maybe, but now, we look at the second problem."

You can just test one model at a time. For example, recall the deaf-normal comparison. A (correct) non-g account of the Jensen effect would be that the differences were loaded on a verbal factor which happened to have a higher g-loading than the others. You test this by coding verbal = 1, non-verbal = 0. Alternatively, you can conduct MCV exclusively on the verbal or the non-verbal subtests and avoid using MR. This tests this specific, theoretically plausible non-g model.

"If the independent variables can be truly viewed as "competitors" I should agree with you that it is possible to test these "models" but that is not exactly what MR is doing here."

I don't understand the problem, honestly. A Jensen effect is something that needs to be explained. g differences is one specific explanation. This is specific in that you are specifying the factorial cause of the Jensen effect. Saying that a Jensen Effect could be caused by the depression of a g-loaded non-g factor is not a specific explanation. Now, if you specify a non-g explanation of a Jensen Effect, i.e., a depressive effect on verbal comprehension, you can test it with MCV; if you don't like regression, just exclude the subtests that load on this factor.
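For concreteness, here is a minimal sketch of the dummy-coding test described above, with hypothetical subtest data (the verbal = 1 / non-verbal = 0 coding is the one suggested; the within-subset MCV is the regression-free alternative):

# Toy MMCV sketch (hypothetical data): does a verbal-factor account
# explain the Jensen effect, or do g-loadings predict group gaps
# over and above the verbal/non-verbal split?
import numpy as np

g_load = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80,
                   0.50, 0.58, 0.66, 0.72, 0.78])
verbal = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # verbal = 1, non-verbal = 0
gap    = np.array([0.9, 1.0, 1.0, 1.1, 1.2, 1.2,
                   0.4, 0.5, 0.5, 0.6, 0.7])  # hypothetical group gaps (SD)

X = np.column_stack([np.ones_like(g_load), g_load, verbal])
b, *_ = np.linalg.lstsq(X, gap, rcond=None)
print(f"b_g = {b[1]:+.2f}, b_verbal = {b[2]:+.2f}")

# Regression-free alternative: run MCV within each subset of subtests.
for name, mask in [("verbal", verbal == 1), ("non-verbal", verbal == 0)]:
    r = np.corrcoef(g_load[mask], gap[mask])[0, 1]
    print(f"MCV within {name} subtests: r = {r:+.2f}")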

...
You know quite well that these journals are not top journals. Vetting would require examination by top experts, none of whom are found at any of those three journals. If you want it vetted (and this is a good idea), then I think you should perhaps contact some authors yourself.

For practical reasons, it is perhaps best to ask them only about the method as applied to the height data, since there are fewer feelings involved in that. If they concur that it works well for height, we can be pretty sure it works well for any kind of highly polygenic trait.

PS. You really ought to stop the personal stuff in the forums. Stuff like that belongs in private messages or in the comment section of the DailyMail.


I regard my new free GWAS project as a better test of vetting than the opinion of top experts. They are not going to spend their time carefully examining every minor implication of my work unless it's in their interest, so I do not value their opinion very much; it would be only an opinion and not God's final word. My free GWAS project, if successful, will confirm my method; if not, it will weaken it. But as you rightly said, this is not the place to talk about my method, so anyone is welcome to continue this discussion somewhere other than here (e.g., by email). In any case, I'll keep you guys updated on the outcome of my new project (http://openpsych.net/forum/showthread.php?tid=131&pid=1291#pid1291)
Perhaps you should explain your concepts better. I spent nearly 30 minutes (probably more, in fact) trying to understand just 2 or 3 of your sentences. Without success.

What's the point of MCV? The point is to determine whether g differences are driving group differences. Obviously, if group differences are being driven by g differences, then g differences must be involved, and heavily so. What can't MCV do? MCV can say little about the involvement of g in group differences when the correlations are modest positive to modest negative. Contra Jensen, an r = 0 doesn't imply that group differences are g-"hollow". Nor does an r = -0.50. It just implies that the differences are not being driven by g.


You don't understand that MCV has no "model". You can keep talking about g and non-g models, but you have not shown they are clearly specified. The models in MCV are vague. There is not even a first-order factor level. Given a very large battery of tests (such as in the MISTRA) you can even have a third-order g factor, and here again I don't know how you would specify the lower-order first and second factor levels. My point is that you can't test g versus non-g models if, to begin with, you're incapable of defining the first-order factor level within MCV and incorporating it into MCV. In MGCFA, studies generally find that the male-female difference is mainly due to some first-order factors. How can you test that within MCV? You simply can't, because you only look at the level of individual subtests, completely ignoring the first-order cognitive dimensions. In fact, a correlation of g with something else (heritability, group differences, ...) within MCV looks like a model of first-order g, which we all know does not fit the data well compared to the correlated group factors and second-order g factor models. I still haven't had an answer to my objection.

So what's the limitation of MCV given its purpose? As Gordon pointed out 30 years ago, a strong, positive correlation -- the supposed evidence of g-driven differences -- (a) could be completely spurious. As Wicherts pointed out, such correlations (b) could be the result of a depressive effect on a non-g factor which happens to have a higher g-loading. What's the remedy? For (a), meta-analysis. For (b), some form of multivariate MCV, where one checks whether a proposed non-g factor A is instead driving the Jensen Effect. I trust that you agree.


I don't understand (a) and (b). I would like you to explain these points. Spurious because of what? And (b) is even worse (a non-g factor having a higher g-loading?). I am struggling, but I don't see what he is trying to say.

My solution was aggregation. I explained the justification for this numerous times.


And I explained several times that it's a method that should never be employed. When the tests are too heterogeneous, as was the case for our meta-analysis of g*h2, the g-loadings are not comparable. Here's an illustration of the problem. Remember what I said about Jensen's bivariate regression. Imagine we did this kind of analysis on the WISC, which everyone knows is biased toward crystallized abilities. Imagine that the actual B-W gap is 1 SD, and that at a g-loading of 100% the expected B-W gap, given the intercept and the slope, becomes 1.3 SD. You can't conclude that the IQ gap is 1.3 SD when the g-loading is 100%, because in your regression the most g-loaded subtests are the crystallized subtests. In other words, it's not that an IQ gap of 1 SD becomes an IQ gap of 1.3 SD; the correct formulation is "IQ (unweighted) 1 SD becomes IQ (crystallized-weighted) 1.3 SD". In our meta-analysis the subtests were completely different, and so were the sample sizes. Regardless of PCA or PAF, a battery with 2 verbal + 8 fluid tests will necessarily have the fluid subtests with the higher g-loadings and the verbal ones with low loadings. But you know from our results that the verbal tests generally had large loadings. That's because the batteries contained mainly verbal tests. Worse, in our batteries a large portion of the tests were designed for people with some sort of language problem or even a hearing problem (in the most extreme case). I don't even think that controlling (through dummy-variable regressions) for subtest "quality" will remove that problem. The verbal subtests from different batteries can still have different correlations with the non-verbal subtests of still other batteries. Without knowing this, there is no way to be sure about the strength of these loadings in a truly representative battery of tests.

You can just test one model at a time. For example, recall the deaf-normal comparison. A (correct) non-g account of the Jensen effect would be that differences were loaded on a verbal factor which happened to have a higher g-loading than others. You test this by coded verbal =1, non verbal = 0. Alternatively, you can just conduct MCV exclusively on the verbal or non verbal subtests and avoid using MR. This tests this specific theoretically plausible non-g model.


What is "non-g account of the Jensen effect" ? A jensen effect is by definition a g-effect. I don't even know what kind of test you have in mind. And what if you conduct MCV separately on verbal/non-verbal tests ? The number of subtests will be cut in half, but you already know that.

This is specific in that you are specifying the factorial cause of the Jensen effect


I never specified anything about causes. MCV and MR don't answer the question of causes, although MR is at best "suggestive" of them.

Saying that a Jensen Effect could be caused by the depression of a g-loaded non-g factor is not a specific explanation.


I don't understand the sentence.
You don't understand that MCV has no "model". You can keep talking about g and non-g models, but you have not shown they are clearly specified. The models in MCV are vague.


Let's take this step by step.

First, to clarify: by model I mean a conceptual model. MCV obviously doesn't model different scenarios; rather, it tests some different models.

a. In a g-difference model, the difference would largely be in g, the general factor; this would induce a Jensen Effect.

There are two types of non-g difference models.

b. In one, the difference is not largely in g and is not, for some reason unrelated to g, larger on more g-loaded tests. This would not produce a Jensen Effect.

c. In another, the difference is not largely in g but is, for some reason unrelated to g, larger on more g-loaded tests. This would produce a Jensen Effect.

Normal MCV pits (a) against (b).

Do we agree so far? (Yes or no)

To be clear I am saying something very trivial and commonsensical here.
I have little to add to what Chuck has said. The importance of Spearman's hypothesis is that it represents the only credible way to account for the pattern of black-white gaps that is observed on different tests. You may try to explain the pattern by reference to some non-g factor that is highly collinear with g, but this doesn't work, for two reasons. Firstly, evidence for the existence of any such factor is scarce (the cultural load theory is basically the only alternative, but it's very underdeveloped and understudied). Secondly, Spearman's hypothesis means that the black-white gap can be understood in terms of the g model, which is the consensus theory in psychometrics. The explanatory power of g theory is much greater than that of any alternative, and if you reject it, much of the data in psychometrics and beyond cannot be explained.

In MGCFA, studies generally find that the male-female difference is mainly due to some first-order factors. How can you test that within MCV? You simply can't.


Of course you can. Jensen found in an analysis of five batteries that the correlation between g-loadings and sex gaps was ~0. This indicates that sex differences cannot (in the main) be due to g. MGCFA studies have corroborated this finding. Additionally, it's possible to use MCV with non-g factor loadings to find the first-order factor(s) that are most strongly associated with sex differences.

And I explained several times that it's a method that should never be employed. When the tests are too heterogeneous, as was the case for our meta-analysis of g*h2, the g-loadings are not comparable.


That seems like a misguided criticism of meta-analyses in general, regardless of the data analyzed. The point of meta-analysis is to reduce sampling variance, and in the case of MCV it reduces psychometric sampling variance, minimizing the problem you describe.
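As a bare-bones illustration (hypothetical correlations and sample sizes; a real psychometric meta-analysis would also correct for reliability, range restriction, and so on):

# Bare-bones sketch (hypothetical numbers): an N-weighted mean of MCV
# correlations across batteries. Each battery's r is noisy because of
# sampling error and its particular mix of subtests; pooling across
# batteries averages that psychometric sampling error out.
observed_r = [0.45, 0.70, 0.55, 0.30, 0.62]   # MCV correlations per battery
sample_n   = [120, 300, 80, 150, 250]          # subjects per study

weighted_mean_r = (
    sum(r * n for r, n in zip(observed_r, sample_n)) / sum(sample_n)
)
print(f"N-weighted mean r = {weighted_mean_r:.2f}")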
MCV obviously doesn't model different scenarios; rather, it tests some different models.

MCV can't model different scenarios because it can't evaluate the probability of model A versus model B, which MGCFA can do. MCV is thus only a partial test of SH. But let's put this aside.

b. In one, the difference is not largely in g and is not, for some reason unrelated to g, larger on more g-loaded tests. This would not produce a Jensen Effect.

c. In another, the difference is not largely in g but is, for some reason unrelated to g, larger on more g-loaded tests. This would produce a Jensen Effect.


I still do not understand your sentences.

Firstly, evidence for the existence of any such factor is scarce (the cultural load theory is basically the only alternative, but it's very underdeveloped and understudied).


Ok, then. How can you get this vector of non-g first-order factors, such as verbal, or fluid, or some other first-order factors? I recommend you read Dolan (2000) again: in table 8 you have what I meant by a g/non-g decomposition. The best g model was B4 (g + performance), because the 2 other group factors (verbal, memory) don't explain the B-W gap meaningfully above g. How can you test this within MCV? Show me you can do this, and I'll reconsider my opinion of MCV.

Of course you can. Jensen found in an analysis of five batteries that the correlation between g-loadings and sex gaps was ~0. This indicates that sex differences cannot (in the main) be due to g. MGCFA studies have corroborated this finding. Additionally, it's possible to use MCV with non-g factor loadings to find the first-order factor(s) that are most strongly associated with sex differences.


The passage from Jensen's book (1998) was on page 539. He merely correlates the g-loadings with the sex differences on the subtests. He did that for 5 different IQ batteries. That does not answer the question: where do you see Jensen testing the magnitude of the sex differences on the group factors, or their correlation with the group factors? I don't see it.

That seems like a misguided criticism of meta-analyses in general, regardless of the data analyzed. The point of meta-analysis is to reduce sampling variance, and in the case of MCV it reduces psychometric sampling variance, minimizing the problem you describe.


We were not talking about meta-analysis, but about the aggregation method. I said meta-analysis because we added another g*h2 test to our big meta-analysis.

Aggregation works like this:

0.5 0.3
0.5 0.4
0.6 0.7
0.4 0.5
0.8 0.4

0.6 0.5
0.6 0.6
0.8 0.5
0.7 0.4
0.8 0.4
0.8 0.3

The first series is battery A (5 tests), the second series battery B (6 tests). The first column is the vector of g-loadings, the second column the vector of h2. The aggregated correlation is obtained by correlating the two columns across the full set of 11 subtests (the 2 batteries combined). It's what Jensen did in The g Factor (p. 378).
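A minimal sketch of that computation, using the numbers above:

# Aggregated MCV correlation over the two batteries listed above:
# stack all 11 subtests and correlate the pooled columns.
import numpy as np

# battery A (5 tests) followed by battery B (6 tests)
g_loadings = [0.5, 0.5, 0.6, 0.4, 0.8,
              0.6, 0.6, 0.8, 0.7, 0.8, 0.8]
h2         = [0.3, 0.4, 0.7, 0.5, 0.4,
              0.5, 0.6, 0.5, 0.4, 0.4, 0.3]

r = np.corrcoef(g_loadings, h2)[0, 1]
print(f"aggregated r(g-loading, h2) = {r:+.2f}")
# Note: this pooled r mixes between-battery differences in g-loadings
# with within-battery covariation, which is the objection raised above.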
I still do not understand your sentences.


I can't explain this in a more basic way.

Ok, then. How can you get this vector of non-g first-order factors, such as verbal, or fluid, or some other first-order factors? I recommend you read Dolan (2000) again: in table 8 you have what I meant by a g/non-g decomposition. The best g model was B4 (g + performance), because the 2 other group factors (verbal, memory) don't explain the B-W gap meaningfully above g. How can you test this within MCV? Show me you can do this, and I'll reconsider my opinion of MCV.


Offer a plausible non-g explanation of the B/W SH pattern and I will see if I can test it using MCV. If not, I'll grant your point.
Admin
Is this discussion really relevant to the review of Dalliard's paper? Can we stick to the topic?