
[ODP] The international general socioeconomic factor: Factor analyzing international rankings
Admin
This version has fixed intercorrelation results. They are hardly any different; e.g., the mean intercorrelation with factor scores in SPI was given as 0.9923575 before and is now corrected to 0.990829. The same goes for the mean loading intercorrelation and the mean factor congruence coefficient.
Can you add a table to the paper with the factor scores for the countries?
Admin
Yes. The question is rather which of the factors specifically you want. They correlate very highly for sure, but do you want PCA, ML, minres, GLS, WLS, PA, or the oblimin third-order factor, and from which dataset?
Both datasets. You choose the scores obtained with the factor analytic approach you think produced the better results.
Concerning rotation in PC/FA, I have always considered promax to be the best (see here), although oblimin should produce similar results.

Also, if you want a test (requested by Dalliard) for deciding how many factors to retain in EFA, you had better avoid the eigenvalue > 1 rule and the scree plot. You should use parallel analysis instead. It is a Monte Carlo simulation technique that evaluates the minimum eigenvalues needed to reject the null hypothesis by subjecting data sets of random numbers, adjusted for sample size and number of variables, to the same analysis, repeated many times, e.g., 1000 replications.

See below for why you should prefer this method:

http://pareonline.net/pdf/v12n2.pdf
Determining the Number of Factors to Retain in EFA: an easy-to-use computer program for carrying out Parallel Analysis

In R, it's the nFactors package you need.
http://cran.r-project.org/web/packages/nFactors/nFactors.pdf
http://www.statmethods.net/advstats/factor.html

It should also give you a plot that helps you determine the number of factors to retain; see the sketch below.
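For reference, here is a minimal sketch of parallel analysis with nFactors, following the statmethods.net example above (mydata is a placeholder for your own data frame):

library(nFactors)
ev <- eigen(cor(mydata))                 # eigenvalues of the observed correlation matrix
ap <- parallel(subject = nrow(mydata), var = ncol(mydata),
               rep = 1000, cent = 0.05)  # eigenvalue quantiles from 1000 random data sets
nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
plotnScree(nS)                           # scree plot with the parallel analysis criterion overlaid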
Admin
MH,

I used four different methods to determine the number of factors to retain. They were always all in agreement, except for the author's custom method, which always gave the answer 1. Read more here: http://www.er.uqam.ca/nobel/r17165/RECHERCHE/COMMUNICATIONS/2006/IMPS/nFactors/chm/nScree.html

This is the function from the nFactors package. :)

I will try with promax too.
So, I will wait for your final version before approving. I don't have a lot of things to say; too bad. My comments were only about methods and stats, not about the subject of the article. But I don't disagree with you here. In cases like that, I generally remain silent.

I almost forgot: if you add new analyses in R, I recommend adding the new lines of code to your appendix .doc files.
Admin
New lines of code? But I have >500 lines of R code for this project, plus two Python scripts (roughly 200 lines).

For the final version, I will of course post a final version of all the supplementary material including any new code.
Admin
Oddly, promax does not produce similar results.

r = 0.63 for SPI and 0.92 for DR, using ML as the extraction method. Very strange.

And there does not seem to be much agreement regarding which method to use.

http://pareonline.net/pdf/v10n7.pdf

There is no widely preferred method of oblique rotation; all tend to produce similar results (Fabrigar et al., 1999), and it is fine to use the default delta (0) or kappa (4) values in the software packages. Manipulating delta or kappa changes the amount the rotation procedure “allows” the factors to correlate, and this appears to introduce unnecessary complexity for interpretation of results. In fact, in our research we could not even find any explanation of when, why, or to what one should change the kappa or delta settings.


The cited paper is: http://search.proquest.com.ez.statsbiblioteket.dk:2048/docview/222858621?pq-origsite=summon

Does anyone have ideas about how to interpret this? Clearly promax is at odds with all the other results. I will ask A. Beaujean. I have also asked on /r/statistics: http://www.reddit.com/r/statistics/comments/2db0hf/exploratory_factor_analysis_promax_vs_oblimin/
Admin
So, it seems that promax was mostly used because it gives results similar to oblimin but requires less computing. That was relevant back when computers were weak or rotation was done by hand, but it is not very relevant any more.

To further explore the idea of oblique rotations at the higher levels, I did the following:

Used schmid() with every combination of factor method and rotation. Correlated the loadings (factor congruence didn't work here) with the loadings of the first unrotated factor from the same extraction method. Did this for both datasets. Scores were not available, so it wasn't possible to correlate them.
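One cell of this grid would look roughly like the following (a sketch only, assuming the psych package; the dataset name DR and nfactors = 3 are illustrative, not the paper's actual code):

library(psych)
sl <- schmid(cor(DR), nfactors = 3, fm = "minres", rotate = "oblimin")
g.sl <- sl$sl[, "g"]                           # general factor loadings after Schmid-Leiman
f1 <- fa(cor(DR), nfactors = 1, fm = "minres") # first unrotated factor, same extraction
cor(g.sl, f1$loadings[, 1])                    # agreement between the two sets of loadings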

Results:
> y.sl.df
       oblimin simplimax promax
minres    1.00      1.00   1.00
pa        0.99      0.90   0.98
ml        0.98      0.85   0.97
> z.sl.df
       oblimin simplimax promax
minres    0.97      0.97   0.99
pa        0.97      1.00   0.99
ml        0.98      0.99   0.99


Results were remarkably similar.

Given uncertainty about precisely how schmid() works, I wanted to compare it with a manual extraction of the 3rd-order general factor. Concretely, I extracted the first unrotated factor. Then I extracted the first 8/9 factors. Then I used those to extract 3 factors, and used those in turn to extract 1 general factor. Loadings were not available because the last factor is extracted from the 2nd-order factors, not from the manifest variables. Factor scores were available, so I used them.
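The manual procedure was roughly of this form (a sketch only; the psych package, the dataset name DR, the factor counts, and regression scoring are illustrative assumptions):

library(psych)
f1 <- fa(DR, nfactors = 8, fm = "ml", rotate = "oblimin")        # 1st-order factors
f2 <- fa(f1$scores, nfactors = 3, fm = "ml", rotate = "oblimin") # 2nd-order factors
f3 <- fa(f2$scores, nfactors = 1, fm = "ml")                     # 3rd-order general factor
g1 <- fa(DR, nfactors = 1, fm = "ml")                            # first unrotated factor
cor(f3$scores, g1$scores)               # correlate the factor scores, as in the tables below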

Results:
> y.oblique.df
       promax oblimin simplimax bentlerQ geominQ biquartimin
minres   0.80    0.87      0.04     0.86    0.63        0.05
wls      0.55    0.87     -0.01    -0.86    0.80        0.10
gls      0.61    0.85     -0.01    -0.86    0.78        0.14
pa       0.64    0.92      0.01    -0.85    0.66       -0.04
ml       0.63    0.97      0.11     0.99    0.83        0.34
minchi   0.83    0.97     -0.12     0.84    0.99        0.09
> z.oblique.df
       promax oblimin simplimax bentlerQ geominQ biquartimin
minres   0.55    0.97     -0.05     0.91    0.34        0.00
wls      0.90    0.86      0.00     0.92    0.16       -0.04
gls      0.90    0.88      0.00     0.93    0.03       -0.01
pa       0.87    0.68     -0.07     0.50    0.86        0.03
ml       0.59    0.98      0.12     0.95   -0.27        0.00
minchi   0.94    0.97      0.01     0.92    0.76       -0.11


What to make of this? Average by rotation method:

> round(apply(y.oblique.df,2,mean),2)
     promax     oblimin   simplimax    bentlerQ     geominQ biquartimin
       0.68        0.91        0.00        0.02        0.78        0.11

> round(apply(z.oblique.df,2,mean),2)
     promax     oblimin   simplimax    bentlerQ     geominQ biquartimin
       0.79        0.89        0.00        0.86        0.31       -0.02


Average by factor method:
> round(apply(y.oblique.df,1,mean),2)
minres    wls    gls     pa     ml minchi
  0.54   0.24   0.25   0.22   0.64   0.60
> round(apply(z.oblique.df,1,mean),2)
minres    wls    gls     pa     ml minchi
  0.45   0.47   0.46   0.48   0.39   0.58


We see that oblimin consistently gives the highest correlations, with promax somewhat behind. The strongly divergent results for bentlerQ are due to the factor being reversed half of the time in SPI. As for the factor methods, we see varied results, perhaps chance flukes.
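Since factor solutions are only identified up to sign, one simple guard (a hypothetical sketch, not the code used here) is to flip any loading or score vector whose mean is negative before correlating:

align_sign <- function(x) if (mean(x, na.rm = TRUE) < 0) -x else x
g1 <- c(0.8, 0.7, 0.9, 0.6)             # illustrative general factor loadings
g2 <- -g1                               # the same factor, reversed by the rotation
cor(align_sign(g1), align_sign(g2))     # returns 1 instead of -1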

Thoughts about what to make of this?

The minchi method is apparently another extraction method that I either missed the first time I looked or that was added recently. I have rerun the earlier analyses with it, and the results are more of the same.
In the newest revision, the discussion on the number of factors and variable reverse coding is quite satisfactory, although I would reorganize it so that the number of factors is discussed in section 3, but that's up to the author.

However, in section 12, it is still not made clear that the Johnson studies used CFA, which will inherently lead to higher correlations than factor score comparisons, regardless of the number of variables. When this is corrected, I will approve publication.
Admin
Here's a new draft.

There are lots of changes in this one, including:
  • Many language changes for better cross-referencing and coherence.
  • A new subsection discussing Schmid-Leiman results.
  • A new section discussing other methods for measuring the strength of the general factor.
  • Rewritten Discussion section in light of comments.
  • New tables that show data that was previously mentioned only in the text.
  • The inclusion of a new factor method in analyses that use all the available methods.
  • A new table in the Appendix with a list of S factor scores for all available countries, N=142. These are from unrotated PCA. When both sources have a value, they are averaged.
  • It is clearly indicated that it is a draft with a big fat grey DRAFT over the text.
  • Paper now runs for 21 PDF pages, with 45 references, 13 tables and 6 figures.
Perhaps your analysis would benefit from eliminating continental origin as a confound. I did a similar thing in my paper (http://dx.doi.org/10.1101/008011). This is to see whether the correlation also persists within continents or is just mediated by them. It looks like East Asian countries have large positive residuals, because none of them is in the top 10 of the S rankings. This deserves further investigation, and I think a continent-level analysis would provide it.
From the book "Intelligence, Genes, and Success: Scientists Respond to The Bell Curve", there is the following paragraph from Carroll's chapter (p. 143):

It is necessary to reject two proposals that have sometimes been made to support the notion of a general intelligence factor: (1) The finding of uniformly positive intercorrelations of cognitive variables as an indicator of the presence of a general factor. The mere fact that cognitive variables are positively correlated does not adequately validate the presence of a single general factor. It might indicate the presence of multiple general factors. (2) General factors identified as first principal components or principal factors in the factor analysis of a dataset. The first principal component, or eigenvector, is that vector that produces (under appropriate constraints) the maximal variance obtainable from a linear combination of the variables. The major problem with this proposal is that even if no general factor underlies the variables, the size of the first eigenroot is necessarily still relatively large, as compared with other eigenroots. The first principal component derived from a matrix of randomly generated correlations is necessarily larger than the remaining components. The first principal component is therefore not a valid indicator of the presence of a general factor. The same applies to the first principal factor (computed in such a way as to estimate the communalities of the variables).

In view of these considerations, only the complete factor analysis (either exploratory or confirmatory, or both) of a set of variables should be used to judge the presence of a general factor. We now give tentative conceptual and operational definitions of a general factor.


I thought he could be right: EFA as well as CFA should be used, as they are complementary. But you note that Revelle seems to disagree (this practice was apparently used to find a general factor of personality by Rushton & Irwing, among others) and prefers the hierarchical omega, which is the method you use in your section 12 to evaluate the strength of g. Honestly, the paper you linked, "The general factor of personality: A general critique" (Revelle & Wilt 2013), was not easy to read. The authors do not clearly spell out the advantage of omega, unless it's just me.

I would say a more meaningful summary of the advantage of omega is this paragraph (from the psych package PDF):

The omega function uses exploratory factor analysis to estimate the omega_h coefficient. It is important to remember that “A recommendation that should be heeded, regardless of the method chosen to estimate ω_h, is to always examine the pattern of the estimated general factor loadings prior to estimating ω_h. Such an examination constitutes an informal test of the assumption that there is a latent variable common to all of the scale's indicators that can be conducted even in the context of EFA. If the loadings were salient for only a relatively small subset of the indicators, this would suggest that there is no true general factor underlying the covariance matrix. Just such an informal assumption test would have afforded a great deal of protection against the possibility of misinterpreting the misleading ω_h estimates occasionally produced in the simulations reported here." (Zinbarg et al., 2006, p 137).
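In psych, that informal check is easy to run (a sketch; DR is an illustrative dataset name and nfactors = 3 an arbitrary choice):

library(psych)
om <- omega(DR, nfactors = 3, fm = "minres")
om$omega_h            # omega hierarchical: share of variance due to the general factor
om$schmid$sl[, "g"]   # examine the general factor loadings, as Zinbarg et al. advise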


From Revelle & Wilt (2013), table 1, it is said that the sets S1 and S5 have a pattern consistent with the existence of g, but not the sets S4 and S8. For S4 and S8, this conclusion is made explicit by the fact that a portion of the correlations are just zero while some others are strong. Do you agree with this interpretation of g/no g?

Furthermore, I would like to know whether you have found the cause of the problem with the promax rotation and its low loadings.

Also, you can, if you want, say a little about the consequences of finding such a general socioeconomic factor. The email you sent me was fine, for example.

And finally:

in the Schmid-Leiman transformation and divide it by the number of variables (Λg(Λ/N)). This is the amount of variance accounted


Shouldn't it be Λg/N instead? (See Revelle & Wilt 2013, p. 496.)

The strength of the S factor in the international data is quite similar to, perhaps a little stronger than, the g factor in the classic datasets, while the general factor of personality is clearly weaker.


You don't need to put a comma after "stronger than".

P.S. There is a problem with reference 23: the link runs past the page margin. I think you already know this, but is there no way to correct it?
Admin
Hi Meng Hu.

Thank you for reviewing my paper.

I thought he could be right: EFA as well as CFA should be used, as they are complementary. But you note that Revelle seems to disagree (this practice was apparently used to find a general factor of personality by Rushton & Irwing, among others) and prefers the hierarchical omega, which is the method you use in your section 12 to evaluate the strength of g. Honestly, the paper you linked, "The general factor of personality: A general critique" (Revelle & Wilt 2013), was not easy to read. The authors do not clearly spell out the advantage of omega, unless it's just me.


As I wrote before, it is an exploratory study. I didn't have a ready-made model, or an alternative model, to run CFA on to begin with.

Well, as you can see in the RW paper, they use many different methods. I used all the same methods except the cluster one, which I don't know how they used.

It is a hard paper, yes; their methods aren't quite clear, which is why I wasn't clearer either. I am simply following their lead and showing that their methods, too, confirm the strength of the general factor.

From Revelle & Wilt (2013), table 1, it is said that the sets S1 and S5 have a pattern consistent with the existence of g, but not the sets S4 and S8. For S4 and S8, this conclusion is made explicit by the fact that a portion of the correlations are just zero while some others are strong. Do you agree with this interpretation of g/no g?


Yes.

Shouldn't it be Λg/N instead? (See Revelle & Wilt 2013, p. 496.)


The parentheses show the eigenvalue and the number of variables from which the value was calculated. It is somewhat unclear, but how else to write it? It is common to put e.g. SDs in parentheses too.
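To illustrate with made-up numbers: if the Schmid-Leiman general factor had eigenvalue Λg = 4.9 across N = 9 variables, the reported figure would be 4.9/9 ≈ 0.54, i.e. the general factor would account for about 54% of the total variance.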

You don't need to put a comma after "stronger than".


It seems right to me. http://en.wikipedia.org/wiki/Subordinate_clause#English_punctuation

P.S. There is a problem with reference 23: the link runs past the page margin. I think you already know this, but is there no way to correct it?


I see. I don't think I can fix it; it is apparently a bug in BibTeX. However, if you click the link, it works fine. It is only visually broken and thus only a minor problem.

--

I will add a new version shortly with some more stuff in the discussion.
I think I'm fine with the last version; it's just that I don't know why you get such a low correlation for promax when you compute the Schmid-Leiman transformation by hand. Have you contacted others to see what they think of it?
Admin
Revelle told me I was incompetent.

However, I noted that the schmid() function only uses the 1st-order factors to get the general factor. When I did it manually, I extracted 2/3 2nd-order factors and then extracted a general factor from those. Perhaps this extra level causes some factor instability.

I searched for reasons to prefer promax over oblimin or vice versa, but didn't find much. Most sources seemed to say that promax was developed as a more computationally efficient method that gave results similar to oblimin. This efficiency criterion has no relevance for analyses like these on today's computers.

A friend is currently proofreading my next draft. I will attach it once it is done. The major change besides language fixes is that I added two more paragraphs to the conclusion: one about the nascent field of psychoinformatics, and one about whether we should care about the existence of an S factor.
Whatever the case, I will obviously prefer the schmid() function built into R; that's common sense. I just wanted to know why it behaves like this when you do it manually. Concerning promax/oblimin, my point was that some people say they slightly prefer promax, and I see promax in use more often than oblimin, although the latter is also very widely used. As you say, there is no logical reason to dismiss one or the other.

I will wait for your final version.
Admin
This version has:
- Small language fixes
- Two new sections in the Discussion
So, the only modification is to the Discussion section. I don't have anything to add or to complain about.

I, of course, will give my acceptance for its publication.

By the way, it's funny: when I open the last version, I am taken to page 2, not page 1 as is usually the case. It's rare, but this sometimes happens when I open PDF articles. I never understood why.

One last remark (or request): don't forget to publish the updated R syntax. I am interested in everything related to R. I'm trying to move from SPSS to R, but that's not easy. Recoding variables in R has been impossible for me, even after spending hours on the web looking at examples; they are all inapplicable to my General Social Survey data. So I have to do the data preparation in SPSS and the analysis in R. Ridiculous, indeed, but I don't have a better option yet.
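For what it's worth, a minimal sketch of recoding in base R (the data frame gss and its variables are made-up placeholders, not the actual GSS data):

gss <- data.frame(age = c(23, 41, 67, NA, 35))
# Bin a numeric variable into labelled categories, like RECODE in SPSS.
gss$agecat <- cut(gss$age, breaks = c(0, 29, 59, Inf),
                  labels = c("young", "middle", "old"))
# Conditional recoding with ifelse(); NA values stay NA.
gss$senior <- ifelse(gss$age >= 60, 1, 0)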