[OQSPS] Some new methods for exploratory factor analysis of socioeconomic d
2015-Sep-07, 10:01:30, (This post was last modified: 2015-Nov-02, 16:55:59 by Emil.)
#1
Moved this submission to the submission forum, now that we have a journal and a review team. -Emil


Since we don't have a sociology journal, I cannot really submit this paper.

However, I'd like to get some comments on my paper before publication.

----

Title:
Some new methods for exploratory factor analysis of socioeconomic data

Abstract:
Some new methods for factor analyzing socioeconomic data are presented, discussed and illustrated with analyses of new and old datasets.
A general socioeconomic factor (S) was found in a dataset of 47 French-speaking Swiss provinces from 1888. It was strongly related (r .64 to .70) to cognitive ability as measured by an army examination. Fertility had a strong negative loading (r -.44 to -.67). Results were robust when using rank-ordered data.
The S factor of international rankings data had a split-half factor reliability of .93. The corresponding value was .55 for the general factor of personality extracted from 25 OCEAN items, and .68 for the general cognitive ability factor based on 16 items from the International Cognitive Ability Resource.

Key words:
general socioeconomic factor, S factor, exploratory factor analysis, research methods, Switzerland, reliability, intelligence, cognitive ability, IQ

Files:
https://osf.io/3npj8/files/
2015-Nov-04, 13:13:20,
#2
Some potential improvements:

1. The introduction could be improved by better signaling the structure of the paper. The paper's title indicates that the paper will deal with new methods for exploratory factor analysis, but the reader is not clued in on whether the section on Jensen's method is the first such method or whether all of these new methods for exploratory factor analysis will be based on Jensen's method. After reading the paper, the structure is clearer, but it would help a reader to be more aware of the structure up front.

2. The paper ends abruptly after discussing one of the methods. The lack of a conclusion suggests that the presented methods share nothing more than the quality of being methods for exploratory factor analysis, but it might be worth considering whether there is a stronger theme that can tie the methods together.

3. On at least two occasions, the discussion assumes knowledge that a reader cannot be assumed to know at that point in the paper. One example is the "all 54/42 indicators" on page 13. For another example, Table 2 has an "S_noGe" variable, but there is no indication at that point in the paper what S_noGe represents; the table note indicates that the notation will be explained below, but, as far as I can tell, the reader must surmise on his or her own that S_noGe represents the S factor purged of the Geneva case. There does not appear to be any reason why the method that introduces the S_noGe concept cannot be discussed before the Jensen method; if so, that might improve the paper by starting with a simpler method and introducing the S_noGe notation before its appearance in the correlation table.

4. It might be worth discussing findings in more detail, to make the paper more accessible to readers unfamiliar with certain concepts. For example, comparison of Figures 4 and 5 indicates that the slopes of the two lines are similar, but the y-axes are substantially different: does this difference in y-axes indicate anything and, if so, is that difference important?

5. It might be worth discussing the different slope of lines in Figures 2 and 3 in terms of whether the negative loadings make sense theoretically. The problem for the Jensen method caused by researcher flexibility in coding appears to be a problem only in cases in which there is no clear theoretical direction for the coding, but the paper has no discussion of whether this is the case with the S x Catholic% analysis.

6. The robust factor analysis in section 4 indicates that the rank correlation method can be used to include outliers in an analysis without much affecting overall patterns, but there is no discussion about when -- if ever -- the benefits of including an outlier do not outweigh the cost of the loss of information from reducing data to ranks. For example, it is not clear why it is a good idea to code the large S difference between the top ranked and second ranked region the same as the much smaller S difference between the third and fourth ranked regions.

7. The paper could use a check for grammar. Instances include: "This is because that when", "we multiple factor loadings and scores by -1", "in line with results many other", "negative loadings of of", "negatively loadings", "data does" (if data are presumed plural), "if one choose", and the sentence about Figure 19.

8. The upside-down U-shape of the S and S-rank vs. Catholic% is quite interesting and hopefully receives more attention and explication in the future.
2015-Nov-04, 21:41:06,
#3
Good comments. I will work on a revision.
2015-Nov-12, 00:22:06,
#4
I have uploaded a new draft based on Zigerell's comments above.

Changes:
- Added more text to the introduction concerning the structure of the paper.
- Added discussion section.
- Added link to code and data repository.
- Added explanation of S_noGe.
- Added reference to the explanation of the "all 54/42 indicators" phrase.

Zigerell Wrote:4. It might be worth discussing findings in more detail, to make the paper more accessible to readers unfamiliar with certain concepts. For example, comparison of Figures 4 and 5 indicates that the slopes of the two lines are similar, but the y-axes are substantially different: does this difference in y-axes indicate anything and, if so, is that difference important?

The reason the y-axis is different is that when the indicators with negative loadings are reversed (multiplied by -1), their correlations with the criterion variable are reversed as well.
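To make the point above concrete, here is a minimal sketch in Python on simulated data. It is not the code used for the paper. The first principal component of the correlation matrix stands in for a proper factor extraction, and all function and variable names are made up for the illustration. Reversing an indicator flips the sign of both its loading and its correlation with the criterion variable, so the points move to a different part of the y-axis.

```python
import numpy as np

def first_factor_loadings(X):
    """Loadings on the first principal component of the correlation matrix.

    Assumption: used here as a simple stand-in for a factor analysis.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # The sign of an eigenvector is arbitrary; orient it so the loadings sum is positive.
    return loadings if loadings.sum() >= 0 else -loadings

def indicator_criterion_r(X, criterion):
    """Correlation of each indicator (column of X) with the criterion variable."""
    return np.array([np.corrcoef(X[:, j], criterion)[0, 1] for j in range(X.shape[1])])

# Simulated data: six indicators of one latent variable, two of them negatively loaded.
rng = np.random.default_rng(1)
n = 200
s = rng.normal(size=n)
true_loadings = np.array([0.8, 0.7, 0.6, -0.7, -0.5, 0.4])
X = s[:, None] * true_loadings + rng.normal(size=(n, 6)) * np.sqrt(1 - true_loadings**2)
criterion = 0.6 * s + 0.8 * rng.normal(size=n)

# Jensen's method without reversing: the two vectors that get correlated.
loadings = first_factor_loadings(X)
crit_r = indicator_criterion_r(X, criterion)

# Reverse the indicators with negative loadings (multiply by -1) and redo both vectors.
sign = np.where(loadings < 0, -1.0, 1.0)
rev_loadings = first_factor_loadings(X * sign)
rev_crit_r = indicator_criterion_r(X * sign, criterion)

print("y-range before reversing:", round(float(crit_r.min()), 2), round(float(crit_r.max()), 2))
print("y-range after reversing: ", round(float(rev_crit_r.min()), 2), round(float(rev_crit_r.max()), 2))
print("vector correlation before:", round(float(np.corrcoef(loadings, crit_r)[0, 1]), 2))
print("vector correlation after: ", round(float(np.corrcoef(rev_loadings, rev_crit_r)[0, 1]), 2))
```

The reversed vectors are just the original ones multiplied element-wise by the sign vector, which is why the slope can look similar while the y-axis range changes.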

My general problem is that I don't have enough time or patience to explain everything from the bottom up in every paper. I have tried to keep it understandable to people who have read at least one of the recent S factor papers and who are familiar with Jensen's work. After all, it is his idea, sort of: "the g nexus" (Jensen 1998).

Zigerell Wrote:5. It might be worth discussing the different slope of lines in Figures 2 and 3 in terms of whether the negative loadings make sense theoretically. The problem for the Jensen method caused by researcher flexibility in coding appears to be a problem only in cases in which there is no clear theoretical direction for the coding, but the paper has no discussion of whether this is the case with the S x Catholic% analysis.

Right, the problem for Jensen's method arises only when there is no agreement on which way to code variables. One cannot use desirability because researchers will disagree (e.g. economic inequality, population density), and one can always try to argue for whichever direction of desirability inflates the results for a given variable.

For the Swiss dataset, there will be disagreement about whether to reverse fertility. Lower fertility could be seen as better (demographic transition; female liberty) or worse (dysgenics; replacement migration).


Zigerell Wrote:6. The robust factor analysis in section 4 indicates that the rank correlation method can be used to include outliers in an analysis without much affecting overall patterns, but there is no discussion about when -- if ever -- the benefits of including an outlier do not outweigh the cost of the loss of information from reducing data to ranks. For example, it is not clear why it is a good idea to code the large S difference between the top ranked and second ranked region the same as the much smaller S difference between the third and fourth ranked regions.

I can't offer a good answer to this because it depends on scientific intuition and unresolved methodological questions. As you say (indirectly), converting from interval to rank-order data loses information about the distance between datapoints. Perhaps a middle way would be to calculate the correlation matrix using ranks, but use the interval data for scoring the cases. I have not tried this yet.

I added this question to the discussion section.
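For illustration only, the untested middle way mentioned above could be sketched like this in Python. This is a hypothetical sketch on simulated data, not something from the paper: the first principal component again stands in for a factor extraction, the ranking helper assumes no tied values, and all names are invented for the example.

```python
import numpy as np

def column_ranks(X):
    """Ordinal ranks (0 = smallest) within each column. Assumes no tied values."""
    return X.argsort(axis=0).argsort(axis=0).astype(float)

def rank_weights_interval_scores(X):
    """Weights from the rank correlation matrix, case scores from the interval data."""
    # The Pearson correlation of ranks is the Spearman correlation (when there are no ties).
    R_rank = np.corrcoef(column_ranks(X), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R_rank)     # eigenvalues in ascending order
    weights = eigvecs[:, -1]
    if weights.sum() < 0:                         # eigenvector sign is arbitrary
        weights = -weights
    # Score the cases on standardized interval data, so distances between cases are kept.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return R_rank, weights, Z @ weights

# Simulated data: 47 cases, four indicators, case 0 is an extreme outlier.
rng = np.random.default_rng(2)
n = 47
s = rng.normal(size=n)
s[0] = 6.0
pattern = np.array([0.8, 0.7, 0.7, 0.6])
X = s[:, None] * pattern + rng.normal(size=(n, 4)) * np.sqrt(1 - pattern**2)

R_rank, weights, scores = rank_weights_interval_scores(X)
top = np.sort(scores)[::-1][:4]
print("four highest scores:", np.round(top, 2))
```

In this setup the outlier cannot dominate the correlation matrix because only its rank enters there, but its distance from the other cases is still visible in the scores. Whether that trade-off is sensible is exactly the open question.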

Zigerell Wrote:7. The paper could use a check for grammar. Instances include: "This is because that when", "we multiple factor loadings and scores by -1", "in line with results many other", "negative loadings of of", "negatively loadings", "data does" (if data are presumed plural), "if one choose", and the sentence about Figure 19.

I normally check grammar as the last step before publication, not during the revise-reevaluate process.

I have fixed the ones you mentioned.

Zigerell Wrote:8. The upside-down U-shape of the S and S-rank vs. Catholic% is quite interesting and hopefully receives more attention and explication in the future.

If we can find some Swiss natives then we could do a new S study of Switzerland at the same level to see if the pattern is still there. It is odd, so it warrants further study. It could tell us something about religious tolerance in cities or something.
2015-Nov-15, 03:36:44,
#5
Thanks, Emil. These revisions address my comments. I think the current draft is satisfactory.
2015-Nov-27, 21:13:12,
#6
I found an error: the paper claimed that ranked factor analysis was used in a previous paper, but it was not actually used in that paper.
2016-Sep-04, 06:08:38,
#7
I have updated this paper to be more readable, making a number of smaller changes and fixing language errors.

I have asked Davide Piffer to review it because he is familiar with the factor analytic methodology discussed in the paper.
2016-Sep-14, 11:43:06,
#8
I attach the PDF with the comments. I also reproduce the comments here, but see the PDF file for the exact position of each comment. I also made some minor corrections (see PDF).
Introduction: "What is the general goal of this paper? You should state precisely the general goal in the introduction. Explain why you chose these particular themes (and not others) out of all possible factor analytic methods and add-ons. Explain which particular problems they tackle and the general problem too (unreliability of factor analysis?). You also have to explain if these methods apply generally to factor analysis or are restricted to factor analysis of socioeconomic or aggregate datasets because it's not clear from the paper."
Section 4: "Which scatterplot are you referring to? Also, add references to back this negative claim."
Section 5: "You should also note that using correlated indicators inflates the factor loadings"
Section 5: "I am skeptical about this method, it seems post-hoc. Two indicators could be intercorrelated for a variety of reasons, not necessarily because they tap the same underlying construct. The selection of indicators should be conducted a priori, based on theoretical expectations."
Section 5: "What threshold do you propose?"
Discussion and conclusion: "Discussion should discuss the themes covered in the paper, not just those not presently dealt with."


Attached Files
.pdf   Kirkpaper_review1.pdf (Size: 2.63 MB / Downloads: 68)
2016-Sep-14, 12:45:47,
#9
Davide,

Thanks for reviewing. I will work on a revision.
2016-Sep-15, 21:10:28, (This post was last modified: 2016-Sep-15, 21:53:18 by Emil.)
#10
Davide,

I have a new revision ready. Below is the list of changes made.

Regarding the introduction, I have added this paragraph:

The purpose of this paper is to review and introduce a number of new methods that were invented for the factor analysis of socioeconomic data. Some of these were presented in various earlier papers and some are new. The reason to have a single review paper of the methods is that otherwise the reader would be forced to read a number of other papers to learn about the respective methods. While the methods were invented for the purpose of studying socioeconomic data, they may be useful for other types of data as well.

Table 1 now has table borders.

For Section 2, added some citations about how other people have used Jensen's method:

Likewise, if a criterion variable is not related to the g factor, but to other parts of the variance, then the correlation should be negative. A large and growing number of studies have used this method to examine the relationship between the g factor and variables such as brain size (Rushton & Ankney, 2009), group differences (Frisby & Beaujean, 2015; Jensen, 1985; McDaniel & Kepes, 2014; te Nijenhuis, van den Hoek, & Armstrong, 2015), the Flynn effect (te Nijenhuis & van der Flier, 2013), test training/re-taking gains (te Nijenhuis, van Vianen, & van der Flier, 2007), and education-related gains (te Nijenhuis, Jongeneel-Grimen, & Kirkegaard, 2014). For criticism of the method, see e.g. Ashton and Lee (2005).

Added cross-references to the Figures. I don't know how you want me to cite not having seen this pattern before. Do you want me to cite every paper in which I did not see such a pattern?

I have fixed the images which were on top of each other. The present PDF is a temporary PDF because the final version will be typeset in LaTeX.

Rewrote the broken sentence about group factors in Section 5.

Davide Wrote:I am skeptical about this method, it seems post-hoc. Two indicators could be intercorrelated for a variety of reasons, not necessarily because they tap the same underlying construct. The selection of indicators should be conducted a priori, based on theoretical expectations.

It's another theory-based vs. data-based decision making issue. In general, I prefer data-based decisions, whereas others prefer theory-based. In my use of the algorithm, the decisions have almost always been sensible:

http://openpsych.net/OQSPS/2016/05/inequ...-analysis/
31 variables input, output:
At.Least.High.School.Diploma (reverse coded version of Less.Than.High.School, r=-.993)
Child.Poverty.living.in.families.below.the.poverty.line (redundant with Poverty.Rate.below.federal.poverty.threshold, r=.924)
Graduate.Degree (redundant with At.Least.Bachelor.s.Degree, r=.922).

https://www.researchgate.net/publication...d_New_Data
45 variables input, output:
Disabled.adults.allowance and Benefits.use.AAH r = .972
Upper.profession.pct and Percent.higher.degree r = .968
Income.women and Income.men r = .946
Circulatory.death and Cancer.death r = .927
Employment.rate.25.54 and Children.unemployed.parents r = -.920

A similar method is used here for genomic prediction: http://journals.plos.org/plosgenetics/ar...en.1006288 They used a .95 cutoff.

Quote:What threshold do you propose?

Added:
Previous studies have used a threshold of .90 which seems to work well.
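As a rough illustration of the kind of redundancy check discussed above, here is a minimal Python sketch with the .90 threshold. It is an assumption-laden sketch, not the algorithm's actual implementation: the rule for which variable of a pair to drop (the later column) and all names are made up for the example, and the data are simulated.

```python
import numpy as np

def remove_redundant(X, names, threshold=0.90):
    """Repeatedly drop one variable from the most strongly correlated pair.

    Stops when no remaining pair has |r| >= threshold.
    Assumption: the later column of the pair is the one dropped.
    """
    keep = list(range(X.shape[1]))
    dropped = []
    while len(keep) > 1:
        R = np.corrcoef(X[:, keep], rowvar=False)
        A = np.abs(np.triu(R, k=1))               # upper triangle only, diagonal excluded
        i, j = np.unravel_index(A.argmax(), A.shape)
        if A[i, j] < threshold:
            break
        # Record (dropped variable, variable it is redundant with, r).
        dropped.append((names[keep[j]], names[keep[i]], round(float(R[i, j]), 3)))
        del keep[j]
    return [names[k] for k in keep], dropped

# Simulated data: three distinct variables, one reverse-coded copy, one near copy.
rng = np.random.default_rng(3)
n = 100
a, b, c = rng.normal(size=(3, n))
a_reversed = -a + 0.1 * rng.normal(size=n)        # reverse-coded version of a
b_near_copy = b + 0.3 * rng.normal(size=n)        # nearly the same as b
X = np.column_stack([a, b, c, a_reversed, b_near_copy])
names = ["a", "b", "c", "a_reversed", "b_near_copy"]

kept, dropped = remove_redundant(X, names)
print("kept:", kept)
for var, partner, r in dropped:
    print(f"dropped {var} (redundant with {partner}, r={r})")
```

Using the absolute correlation means reverse-coded duplicates are caught as well, as in the At.Least.High.School.Diploma case listed above.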

Quote:Discussion should discuss the themes covered in the paper, not just those not presently dealt with.

Added:
A number of the methods reviewed in this paper are still in the experimental stage and lack large simulation studies that back up their effectiveness. Still, the methods have been used in a number of analyses seemingly without major issues, which does give some confidence in them.

Rewrote the discussion slightly.

Edited out some of the rather excessive amount of citations regarding the GFP.

Added author affiliation + email.

---

Updated files on OSF.