[OQSPS] Some new methods for exploratory factor analysis of socioeconomic d
Admin
Moved this submission to the submission forum, now that we have a journal and a review team. -Emil


Since we don't have a sociology journal, I cannot really submit this paper.

However, I'd like to get some comments on my paper before publication.

----

Title:
Some new methods for exploratory factor analysis of socioeconomic data

Abstract:
Some new methods for factor analyzing socioeconomic data are presented, discussed and illustrated with analyses of new and old datasets.
A general socioeconomic factor (S) was found in a dataset of 47 French-speaking Swiss provinces from 1888. It was strongly related (r .64 to .70) to cognitive ability as measured by an army examination. Fertility had a strong negative loading (r -.44 to -.67). Results were robust when using rank-ordered data.
The S factor of international rankings data was found to have a split-half factor reliability of .93, that of the general factor of personality extracted from 25 OCEAN items .55, and that of the general cognitive ability factor .68 based on 16 items from the International Cognitive Ability Resource.

Key words:
general socioeconomic factor, S factor, exploratory factor analysis, research methods, Switzerland, reliability, intelligence, cognitive ability, IQ

Files:
https://osf.io/3npj8/files/
Some potential improvements:

1. The introduction could be improved by better signaling the structure of the paper. The paper's title indicates that the paper will deal with new methods for exploratory factor analysis, but the reader is not clued in on whether the section on Jensen's method is the first such method or whether all of these new methods for exploratory factor analysis will be based on Jensen's method. After reading the paper, the structure is clearer, but it would help a reader to be more aware of the structure up front.

2. The paper ends abruptly after discussing one of the methods. The lack of a conclusion suggests that the presented methods share nothing more than the quality of being methods for exploratory factor analysis, but it might be worth considering whether there is a stronger theme that can tie the methods together.

3. On at least two occasions, the discussion assumes knowledge that a reader cannot be assumed to know at that point in the paper. One example is the "all 54/42 indicators" on page 13. For another example, Table 2 has an "S_noGe" variable, but there is no indication at that point in the paper what S_noGe represents; the table note indicates that the notation will be explained below, but, as far as I can tell, the reader must surmise on his or her own that S_noGe represents the S factor purged of the Geneva case. There does not appear to be any reason why the method that introduces the S_noGe concept cannot be discussed before the Jensen method; if so, that might improve the paper by starting with a simpler method and introducing the S_noGe notation before its appearance in the correlation table.

4. It might be worth discussing findings in more detail, to make the paper more accessible to readers unfamiliar with certain concepts. For example, comparison of Figures 4 and 5 indicates that the slopes of the two lines are similar, but the y-axes are substantially different: does this difference in y-axes indicate anything and, if so, is that difference important?

5. It might be worth discussing the different slope of lines in Figures 2 and 3 in terms of whether the negative loadings make sense theoretically. The problem for the Jensen method caused by researcher flexibility in coding appears to be a problem only in cases in which there is no clear theoretical direction for the coding, but the paper has no discussion of whether this is the case with the S x Catholic% analysis.

6. The robust factor analysis in section 4 indicates that the rank correlation method can be used to include outliers in an analysis without much affecting overall patterns, but there is no discussion about when -- if ever -- the benefits of including an outlier do not outweigh the cost of the loss of information from reducing data to ranks. For example, it is not clear why it is a good idea to code the large S difference between the top ranked and second ranked region the same as the much smaller S difference between the third and fourth ranked regions.

7. The paper could use a check for grammar. Instances include: "This is because that when", "we multiple factor loadings and scores by -1", "in line with results many other", "negative loadings of of", "negatively loadings", "data does" (if data are presumed plural), "if one choose", and the sentence about Figure 19.

8. The upside-down U-shape of the S and S-rank vs. Catholic% is quite interesting and hopefully receives more attention and explication in the future.
Admin
Good comments. I will work on a revision.
Admin
I have uploaded a new draft based on Zigerell's comments above.

Changes:
- Added more text to the introduction concerning the structure of the paper.
- Added discussion section.
- Added link to code and data repository.
- Added explanation of S_noGe.
- Added reference to the explanation of the "all 54/42 indicators" phrase.

4. It might be worth discussing findings in more detail, to make the paper more accessible to readers unfamiliar with certain concepts. For example, comparison of Figures 4 and 5 indicates that the slopes of the two lines are similar, but the y-axes are substantially different: does this difference in y-axes indicate anything and, if so, is that difference important?


The reason the y-axis is different is that when the indicators with negative loadings are reversed (multiplied by -1), their correlations with the criterion variable are reversed as well.
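
To spell out the step, here is a small worked identity. The symbols are introduced only for this illustration: \lambda_x is the loading of indicator x, and r_{x,y} is its correlation with the criterion variable y.

```latex
\lambda_{-x} = -\lambda_{x},
\qquad
r_{-x,\,y} = \mathrm{cor}(-x,\, y) = -\,\mathrm{cor}(x,\, y) = -\,r_{x,y}
```

Reversing an indicator therefore moves its point in the scatterplot from (\lambda_x, r_{x,y}) to (-\lambda_x, -r_{x,y}). With made-up numbers, a point at (-0.6, -0.4) ends up at (0.6, 0.4). The points that were below zero on the y-axis are reflected above it, so the range of the y-axis changes even when the overall trend looks similar.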

My general problem is that I don't have enough time or patience to write papers that explain everything from the bottom up every time. I have tried to keep it understandable to people who have read at least one of the recent S factor papers and who are familiar with Jensen's work. After all, it is his idea, sort of: "the g nexus" (Jensen 1998).

5. It might be worth discussing the different slope of lines in Figures 2 and 3 in terms of whether the negative loadings make sense theoretically. The problem for the Jensen method caused by researcher flexibility in coding appears to be a problem only in cases in which there is no clear theoretical direction for the coding, but the paper has no discussion of whether this is the case with the S x Catholic% analysis.


Right, the problem for Jensen's method is only when there is no agreement on which way to code variables. One cannot use desirability because researchers will disagree (e.g. economic inequality, population density) and one can always try to argue desirability for a given variable that inflates the results.

For the Swiss dataset, there will be disagreement about whether to reverse fertility. Lower fertility could be seen as better (demographic transition; female liberty) or worse (dysgenics; replacement migration).


6. The robust factor analysis in section 4 indicates that the rank correlation method can be used to include outliers in an analysis without much affecting overall patterns, but there is no discussion about when -- if ever -- the benefits of including an outlier do not outweigh the cost of the loss of information from reducing data to ranks. For example, it is not clear why it is a good idea to code the large S difference between the top ranked and second ranked region the same as the much smaller S difference between the third and fourth ranked regions.


I can't offer a good answer to this because it depends on scientific intuition and unresolved methodological questions. As you say (indirectly), converting from interval to rank-order data loses information about the distance between datapoints. Perhaps a middle way would be to calculate the correlation matrix using ranks, but use the interval data for scoring the cases. I have not tried this yet.

I added this question to the discussion section.
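
For readers who want to experiment with the middle-way idea (rank-based correlation matrix, interval-based scoring), here is a minimal sketch. It is not code from the paper: the function names are hypothetical, the data are simulated, and the first principal component is used as a simple stand-in for a one-factor solution.

```python
import numpy as np

def rank_columns(x):
    """Rank each column (1 = smallest). Assumes no ties, as with continuous data."""
    return np.argsort(np.argsort(x, axis=0), axis=0) + 1.0

def first_component_loadings(corr):
    """Loadings on the first principal component of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    return loadings if loadings.sum() >= 0 else -loadings  # resolve arbitrary sign

def hybrid_scores(x):
    """Loadings from rank correlations; scores from the standardized interval data."""
    rank_corr = np.corrcoef(rank_columns(x), rowvar=False)  # Spearman, given no ties
    loadings = first_component_loadings(rank_corr)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    scores = z @ loadings / np.abs(loadings).sum()  # loading-weighted composite
    return loadings, scores

# Simulated data (not from the paper): 47 cases, 5 indicators, one extreme case.
rng = np.random.default_rng(0)
s_true = rng.normal(size=47)
x = 0.8 * s_true[:, None] + 0.6 * rng.normal(size=(47, 5))
x[0] += 8.0  # case 0 is an outlier on every indicator

loadings, scores = hybrid_scores(x)
print(np.round(loadings, 2))
print(int(scores.argmax()))  # index of the highest-scoring case
```

In this setup the outlier has limited influence on the loadings, because they come from ranks, while its distance from the other cases is still visible in the scores. Whether that trade-off is desirable is exactly the open question raised above.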

7. The paper could use a check for grammar. Instances include: "This is because that when", "we multiple factor loadings and scores by -1", "in line with results many other", "negative loadings of of", "negatively loadings", "data does" (if data are presumed plural), "if one choose", and the sentence about Figure 19.


I normally check grammar as the last step before publication, not during the revise-reevaluate process.

I have fixed the ones you mentioned.

8. The upside-down U-shape of the S and S-rank vs. Catholic% is quite interesting and hopefully receives more attention and explication in the future.


If we can find some Swiss natives then we could do a new S study of Switzerland at the same level to see if the pattern is still there. It is odd, so it warrants further study. It could tell us something about religious tolerance in cities or something.
Thanks, Emil. These revisions address my comments. I think the current draft is satisfactory.
Admin
I found an error, namely that there was a claim that ranked factor analysis was used in a previous paper, but it wasn't actually used in that paper.
Admin
I have updated this paper to be more readable, making a number of smaller changes and fixing language errors.

I have asked Davide Piffer to review it because he is familiar with the factor analytic methodology discussed in the paper.
I attach the PDF with the comments. I also report the comments here but refer to the PDF file for the specific position of the comments. I also made some minor corrections (see PDF).
Introduction: "What is the general goal of this paper? You should state precisely the general goal in the introduction. Explain why you chose these particular themes (and not others) out of all possible factor analytic methods and add-ons. Explain which particular problems they tackle and the general problem too (unreliability of factor analysis?). You also have to explain if these methods apply generally to factor analysis or are restricted to factor analysis of socioeconomic or aggregate datasets because it's not clear from the paper."
Section 4: "Which scatterplot are you referring to? Also, add references to back this negative claim."
Section 5: "You should also note that using correlated indicators inflates the factor loadings"
Section 5: "I am skeptical about this method, it seems post-hoc. Two indicators could be intercorrelated for a variety of reasons, not necessarily because they tap the same underlying construct. The selection of indicators should be conducted a priori, based on theoretical expectations."
Section 5: "What threshold do you propose?"
Discussion and conclusion: "Discussion should discuss the themes covered in the paper, not just those not presently dealt with."
Admin
Davide,

Thanks for reviewing. I will work on a revision.
Admin
Davide,

I have a new revision ready. Below is the list of changes made.

Regarding the introduction, I have added this paragraph:

The purpose of this paper is to review and introduce a number of new methods that were invented for the factor analysis of socioeconomic data. Some of these were presented in various earlier papers and some are new. The reason to have a single review paper of the methods is that otherwise the reader would be forced to read a number of other papers to learn about the respective methods. While the methods were invented for the purpose of studying socioeconomic data, they may be useful for other types of data as well.

Table 1 now has table borders.

For Section 2, added some citations about how other people have used Jensen's method:

Likewise, if a criterion variable is related related to the g factor, but to other parts of the variance, then the correlation should be negative. A large and growing number of studies have used this method to examine the relationship between the g factor and variables such as brain size (Rushton & Ankney, 2009), group differences (Frisby & Beaujean, 2015; Jensen, 1985; McDaniel & Kepes, 2014; te Nijenhuis, van den Hoek, & Armstrong, 2015), the Flynn effect (te Nijenhuis & van der Flier, 2013), test training/re-taking gains (te Nijenhuis, van Vianen, & van der Flier, 2007), and education related gains (te Nijenhuis, Jongeneel-Grimen, & Kirkegaard, 2014). For criticism of the method, see e.g. Ashton and Lee (2005).

Added cross-references to the Figures. I don't know how you want me to cite not having seen this pattern before. Do you want me to cite every paper in which I did not see such a pattern?

I have fixed the images which were on top of each other. The present PDF is a temporary PDF because the final version will be typeset in LaTeX.

Rewrote the broken sentence about group factors in Section 5.

I am skeptical about this method, it seems post-hoc. Two indicators could be intercorrelated for a variety of reasons, not necessarily because they tap the same underlying construct. The selection of indicators should be conducted a priori, based on theoretical expectations.


It's another theory-based vs. data-based decision making issue. In general, I prefer data-based decisions, whereas others prefer theory-based. In my use of the algorithm, the decisions have almost always been sensible:

http://openpsych.net/OQSPS/2016/05/inequality-across-us-counties-an-s-factor-analysis/
31 variables input, output:
At.Least.High.School.Diploma (reverse coded version of Less.Than.High.School, r=-.993)
Child.Poverty.living.in.families.below.the.poverty.line (redundant with Poverty.Rate.below.federal.poverty.threshold, r=.924)
Graduate.Degree (redundant with At.Least.Bachelor.s.Degree, r=.922).

https://www.researchgate.net/publication/290430496_IQ_and_Socioeconomic_Variables_in_French_Departements_Reanalysis_and_New_Data
45 variables input, output:
Disabled.adults.allowance and Benefits.use.AAH r = .972
Upper.profession.pct and Percent.higher.degree r = .968
Income.women and Income.men r = .946
Circulatory.death and Cancer.death r = .927
Employment.rate.25.54 and Children.unemployed.parents r = -.920

A similar method is used here for genomic prediction: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006288 They used a .95 cutoff.

What threshold do you propose?


Added:
Previous studies have used a threshold of .90 which seems to work well.
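
To illustrate the kind of data-based exclusion being discussed, here is a minimal sketch of a redundancy filter with a .90 threshold. It is an assumption about how such an algorithm could work (greedily dropping one variable from the most strongly correlated pair), not the author's actual implementation. The function name, variable names and data are hypothetical.

```python
import numpy as np

def drop_redundant(x, names, threshold=0.90):
    """Greedily drop one variable from the most correlated pair until
    no absolute correlation exceeds the threshold."""
    keep = list(range(len(names)))
    dropped = []  # (dropped variable, variable it was redundant with)
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(x[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        i, j = np.unravel_index(corr.argmax(), corr.shape)  # first maximum, so i < j
        if corr[i, j] <= threshold:
            break
        dropped.append((names[keep[j]], names[keep[i]]))
        del keep[j]  # drop the later-listed variable of the pair
    return [names[k] for k in keep], dropped

# Simulated data: the second variable is nearly a reverse-coded copy of the first.
rng = np.random.default_rng(1)
less_hs = rng.normal(size=100)
at_least_hs = -less_hs + 0.05 * rng.normal(size=100)
income = rng.normal(size=100)
x = np.column_stack([less_hs, at_least_hs, income])
names = ["less_than_high_school", "at_least_high_school", "median_income"]

kept, dropped = drop_redundant(x, names)
print(kept)
print(dropped)
```

Which member of a redundant pair to drop is a design choice. This sketch simply drops the one listed later, and a theory-based rule could be substituted at that step.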

Discussion should discuss the themes covered in the paper, not just those not presently dealt with.


Added:
A number of the methods reviewed in this paper are still in the experimental stage and lack large simulation studies that back up their effectiveness. Still, the methods have been used in a number of analyses seemingly without major issues, which does give some confidence in them.

Rewrote the discussion slightly.

Edited out some of the rather excessive amount of citations regarding the GFP.

Added author affiliation + email.

---

Updated files on OSF.
Revision is OK but there are some standing issues.
I have doubts regarding this sentence: "This means that when a single factor is extracted, all the factor loadings are positive".
Is that so? If so, please provide references. Not necessarily all cognitive tests load positively. For example, reaction time can load negatively when used in a battery since longer RT is associated with lower g. Perhaps you're not referring to ALL cognitive tasks but only IQ tests.

"The purpose of this paper is to review and introduce a number of new methods that were invented for the factor analysis of socioeconomic data.Some of these were presented in various earlier papers and some are new."
This sentence is a bit sloppy. I'd write: "The purpose of this paper is to review methods presented in earlier papers and to introduce new ones that were developed for the factor analysis of socioeconomic data".

"new methods that were invented for the factor analysis of socioeconomic data". I would use "developed" instead of invented.

"Some of these were presented in various earlier papers and some are new". Accordingly, delete this and specify which methods were presented in the earlier papers and the new ones.
Please provide references for each method.
Admin
Forgot to say. I read this and am working on a revision.
Admin
Revision is OK but there are some standing issues.
I have doubts regarding this sentence: "This means that when a single factor is extracted, all the factor loadings are positive".
Is that so? If so, please provide references. Not necessarily all cognitive tests load positively. For example, reaction time can load negatively when used in a battery since longer RT is associated with lower g. Perhaps you're not referring to ALL cognitive tasks but only IQ tests.


Often they reverse the reaction time data so as to preserve the positive manifold. This is easy to do because cognitive test items by definition (Jensen's) must have an agreed-upon better performance direction. For reaction time tests, shorter = better, so they are sometimes reversed. I checked a few papers on Google and most of them did not reverse the tests. I have changed the wording a bit and added a footnote explaining this exception.

"The purpose of this paper is to review and introduce a number of new methods that were invented for the factor analysis of socioeconomic data.Some of these were presented in various earlier papers and some are new."
This sentence is a bit sloppy. I'd write: "The purpose of this paper is to review methods presented in earlier papers and to introduce new ones that were developed for the factor analysis of socioeconomic data".


Fair enough.

"new methods that were invented for the factor analysis of socioeconomic data". I would use "developed" instead of invented.


Yes, that's better.

"Some of these were presented in various earlier papers and some are new". Accordingly, delete this and specify which methods were presented in the earlier papers and the new ones.
Please provide references for each method.


For each section, I have inserted a sentence in parentheses at the beginning if the method has been used in a previous paper.

--

Updated files on OSF.
https://osf.io/3npj8/files/
The paper has been considerably improved. I approve publication.
Admin
Good paper, overall. Some comments/suggestions:

1. Full stop needed on p. 2: "the factor analysis of socioeconomic data Some of these were presented..."

2. Word repeated on p. 3: "related related to the g factor, but to other parts of the variance..."

3. "One problem with this is that the Jensen coefficient (the resulting correlation from applying the method) is sensitive to whether there are variables with negative loadings or not [...] A simple solution is to recode the variables such that higher values correspond to desirable outcomes. This works well in many cases, but not all." (p. 3).

Couldn't one, as a matter of methodology, just take the absolute value (modulus) of the loading, rather than bothering to recode variables?

4. "Such patterns are often seen for cases that consist mostly of one large city (Kirkegaard, 2015e)" (p. 7).

Perhaps cite Carl (2015) here as well, given that it was noted in his original analysis of UK regions that London qualified as such an outlier:

"The correlations were estimated with and without London because in a number of cases, particularly the associations with log weekly earnings and log gross value added (GVA) per capita, London was a clear outlier. This should not be surprising given that London is a large capital city, whereas all the other regions encompass both urban and rural areas."

5. "In the meanwhile, we might try". (p. 13).

In English, one says either "In the meantime..." or "Meanwhile...", not "In the meanwhile".

6. Unnecessary "from" on p. 17: "it does not matter whether scores are derived from using unit weights".

7. Please justify the text.

8. In Section 3, please include a bit more discussion (e.g., a couple of sentences) of which of the proposed methods is to be preferred, particularly in relation to the analysis you perform. I have to say, the second metric––"change in factor size"––seems to me to be the most intuitive. 

9. Also in Section 3, might one consider some kind of cluster analysis (e.g., based on Euclidean distance) to identify outlying cases?
Admin
Thanks for the review. I will work on a revision.
Admin
Noah,


1. Full stop needed on p. 2: "the factor analysis of socioeconomic data Some of these were presented..."



Fixed.

2. Word repeated on p. 3: "related related to the g factor, but to other parts of the variance..."


Fixed.

3. "One problem with this is that the Jensen coefficient (the resulting correlation from applying the method) is sensitive to whether there are variables with negative loadings or not [...] A simple solution is to recode the variables such that higher values correspond to desirable outcomes. This works well in many cases, but not all." (p. 3).

Couldn't one, as a matter of methodology, just take the absolute value (modulus) of the loading, rather than bothering to recode variables?


This is what is done. But recall that one also has to reverse the indicators if one reverses the loading. Otherwise, the direction of association changes.
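
A small numerical sketch of this point, using made-up loadings and indicator x criterion correlations (none of these numbers come from the paper). The Jensen coefficient is taken here as the Pearson correlation between the two vectors.

```python
import numpy as np

def jensen_coefficient(loadings, criterion_cors):
    """Correlation between the loading vector and the vector of
    indicator x criterion correlations."""
    return float(np.corrcoef(loadings, criterion_cors)[0, 1])

# Hypothetical values for five indicators, two of which load negatively.
loadings = np.array([0.8, 0.7, 0.6, -0.5, -0.6])
criterion_cors = np.array([0.50, 0.45, 0.35, -0.30, -0.40])

negative = loadings < 0

# Consistent reversal: flip the loading AND the criterion correlation,
# which is what reversing the indicator itself would do.
rev_loadings = np.where(negative, -loadings, loadings)
rev_cors = np.where(negative, -criterion_cors, criterion_cors)

as_coded = jensen_coefficient(loadings, criterion_cors)
abs_only = jensen_coefficient(np.abs(loadings), criterion_cors)  # loadings flipped only
reversed_both = jensen_coefficient(rev_loadings, rev_cors)

print(round(as_coded, 3), round(abs_only, 3), round(reversed_both, 3))
```

With these numbers, taking the absolute value of the loadings alone gives a noticeably different coefficient than reversing both vectors, because the flipped loadings are then paired with criterion correlations that still carry the old sign.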

4. "Such patterns are often seen for cases that consist mostly of one large city (Kirkegaard, 2015e)" (p. 7).

Perhaps cite Carl (2015) here as well, given that it was noted in his original analysis of UK regions that London qualified as such an outlier:

"The correlations were estimated with and without London because in a number of cases, particularly the associations with log weekly earnings and log gross value added (GVA) per capita, London was a clear outlier. This should not be surprising given that London is a large capital city, whereas all the other regions encompass both urban and rural areas."


Yes, appropriate to cite the primary study too. Done.

5. "In the meanwhile, we might try". (p. 13).

In English, one says either "In the meantime..." or "Meanwhile...", not "In the meanwhile".


Fixed.

6. Unnecessary "from" on p. 17: "it does not matter whether scores are derived from using unit weights".


Fixed.

7. Please justify the text.


I hate that. No thanks. But Julius will typeset/prettify the paper.

8. In Section 3, please include a bit more discussion (e.g., a couple of sentences) of which of the proposed methods is to be preferred, particularly in relation to the analysis you perform. I have to say, the second metric––"change in factor size"––seems to me to be the most intuitive.


I have added a new subsection, 3.3:

Conceptually, the metrics are somewhat distinct, so depending on the goal of the analysis, a particular metric may be best suited for the task. But if the goal is to identify structural outliers in general, it's not clear which one is to be preferred. My current practice is to use all of them and then compare their results. Sometimes, one indicator may give divergent results and so one has to pay extra attention to see if one can figure out why. A general approach is to factor analyze the indicators to get a single structural outlierness score for each case. When doing this, it's important to choose only one of the variants of a given metric. For example, do not use both the mean and the maximum absolute loading change; pick one of them.

Lastly, a nice feature of the mean absolute residuals method is that it allows one to see which indicators cause a given case to be a structural outlier. The other methods do not allow for this possibility.
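
A rough sketch of how a mean absolute residuals (MAR) metric could be computed is given below. The exact definition used in the paper is not reproduced in this thread, so the details are assumptions: indicators are standardized, the summary score is the first principal component, and each indicator is regressed on that score. The function name and the data are hypothetical.

```python
import numpy as np

def mean_abs_residuals(x):
    """Return per-case, per-indicator absolute residuals and their per-case mean."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Summary score: first principal component of the standardized indicators.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    s = z @ vt[0]
    s = (s - s.mean()) / s.std(ddof=1)
    # OLS slope of each indicator on s (no intercept needed: everything is centered).
    slopes = z.T @ s / (s @ s)
    abs_res = np.abs(z - np.outer(s, slopes))
    return abs_res, abs_res.mean(axis=1)

# Simulated data: 47 cases, 6 indicators sharing one factor.
rng = np.random.default_rng(2)
s_true = rng.normal(size=47)
x = 0.8 * s_true[:, None] + 0.6 * rng.normal(size=(47, 6))
# Case 0 gets a profile the general factor cannot reproduce:
# very high on the first three indicators, very low on the last three.
x[0, :3] += 4.0
x[0, 3:] -= 4.0

abs_res, mar = mean_abs_residuals(x)
print(int(mar.argmax()))        # case with the largest mean absolute residual
print(np.round(abs_res[0], 2))  # which indicators drive case 0's outlierness
```

The per-indicator row for a flagged case is what makes it possible to see which indicators cause a case to be a structural outlier.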


9. Also in Section 3, might one consider some kind of cluster analysis (e.g., based on Euclidean distance) to identify outlying cases?


Cluster analysis and other classification methods try to classify cases into groups. I don't see how one could use this for measuring structural outlierness as a continuous variable.

However, it did give me an idea. It's possible to use cluster analysis on the residuals for each case (those computed as part of the MAR method). This would allow one to see if there are clusters in the way cases fail to be predicted by the summary score(s). This could be interesting, but it's not something I've tried yet.

--

Files updated.
Admin
I approve the paper for publication.