[OQSPS] Inequality across US counties: an S factor analysis
2016-Apr-04, 07:01:31, (This post was last modified: 2016-Apr-04, 07:03:46 by Emil.)
#1
[OQSPS] Inequality across US counties: an S factor analysis
Journal:
Open Quantitative Sociology and Political Science.

Authors:
Emil O. W. Kirkegaard

Title:
Inequality across US counties: an S factor analysis

Abstract:
A dataset of socioeconomic, demographic and geographic data for US counties (N≈3,100) was created by merging data from several sources. A suitable subset of 31 socioeconomic indicators was chosen for analysis, of which 3 were excluded for being redundant with other variables. Factor analysis revealed a clear general socioeconomic factor (S factor) which was stable across extraction methods and different samples of indicators (absolute split-half sampling reliability = .85).

Self-identified race/ethnicity (SIRE) population percentages were strongly but non-linearly related to S. In general, the effects of White% and Asian% were positive, while those of Black%, Hispanic% and Amerindian% were negative; the effect of Other/mixed% was unclear. The best model consisted of White%, Black%, Asian% and Amerindian% and explained about 50% of the variance in S among counties.

SIRE homogeneity had a non-linear relationship to S both with and without taking into account the effects of SIRE variables. Overall, the effect was slightly negative due to low S, high White% areas.

An analysis of the SIRE composition of the top 100 counties showed that Whites and Asians were overrepresented (73.3% and 8.8% in top 100 vs. 64.8% and 4.5% in the total population for Whites and Asians, respectively). Then, a prediction about the expected proportions based on a cognitive meritocratic model was made and compared to the real numbers. The results showed that Blacks and Hispanics were overrepresented by large amounts (53% and 36%, respectively) while Whites and Amerindians were underrepresented (-11% and -16%, respectively). Some possible explanations of this pattern were offered.

Geospatial (latitude and longitude, elevation) and climatological (temperature, precipitation) predictors were tested in models. In linear regression, they added little to the variance explained (delta adjusted R2 = .05). However, there was evidence of non-linear relationships. When a model was fitted that allowed for non-linear effects of the environmental predictors, they were able to add a moderate amount of validity (delta adjusted R2 = .1). LASSO regression, however, suggested that much of this predictive validity was due to overfitting.

Spatial patterns in the data were examined using multiple methods, all of which indicated strong spatial autocorrelation for S and SIRE (k nearest spatial neighbor regression correlations [KNSNR] of .69 to .88). Model residuals were also spatially autocorrelated and for this reason the model was re-fit controlling for spatial autocorrelation using KNSNR-based S residuals and spatial local regression. The results indicated that the effects of SIREs were not due to spatially autocorrelated confounds except possibly for Black% which was about 40% weaker after the controls.

Pseudo-multilevel analyses of both the factor structure of S and the SIRE predictive model showed results consistent with the main analyses. Specifically, the factor structure was similar across levels of analysis (states and counties) and within states.

Key words:
general socioeconomic factor, S factor, United States, counties, inequality, intelligence, IQ, cognitive ability, cognitive sociology

Length:
26 pages, 8288 words, 51000 characters (excluding references).

Files:
https://osf.io/cknjr/

External reviewers:
I have no particular recommendations for external reviewers. I am open to suggestions.

Known issues:
- The tables lack borders and are not centered. I have not fixed this yet because I want to go over the numbers one more time. This means possibly repasting all the tables, wasting my time if I finalize their form now. They should be readable enough for peer review as they are.
2016-Apr-06, 23:27:31,
#2
RE: [OQSPS] Inequality across US counties: an S factor analysis
The paper "Inequality across US counties: an S factor analysis" is competently done, but I have a few notes and comments, in the attached file and below, that might help improve the paper.

1. The version of the paper that I reviewed did not have an introduction, so the introduction for the final paper will hopefully indicate what past research in this area has found and discuss the importance of and/or the need for the present research.

2. The Discussion and Conclusion might have more value if that section highlighted what was learned from the new analyses and/or what previous findings were corroborated. The section mentions results from the paper, but a reader unfamiliar with the literature might not appreciate what about the paper was new.

3. The paper is dense in terms of data presentation. It might be worth considering focusing the main paper on fewer figures and/or tables, and placing some figures and/or tables in an appendix, if these figures and tables are not necessary to understand the main points of the paper (e.g., Table 11 perhaps).

4. The Discussion and Conclusion notes that high-white-percentage counties tend to have lower S scores but that this effect is probably not causal and reflects only the history of these areas; this history explanation is used to justify model specification. But if this "only history" explanation can be said about other correlations in the study, then the paper might benefit from a more thorough discussion of inference issues.


Attached Files
.pdf   Inequality across US counties.pdf (Size: 992.19 KB / Downloads: 65)
2016-Apr-07, 00:20:25,
#3
RE: [OQSPS] Inequality across US counties: an S factor analysis
LJ,

Thank you for your review. The version submitted here and on OSF does have an introduction and has several improvements over the earlier version you read. However, most were stylistic.

However, it turns out that RCA found a way to estimate cognitive ability for the counties using school district data. This data was not available to me but should be included. This requires that the entire paper be redone with the cognitive ability variable as well. See details on Twitter:

The data:
https://twitter.com/RCAFDM/status/717736004317286401

Cognitive ability x S:
https://twitter.com/RCAFDM/status/717797530554204160

Note that before he posted it, I predicted an r between .60 and .70. He found .69 with weights. :)

This information is pertinent to the current paper. The cognitive sociology model holds that SIRE x S is mediated almost entirely by cognitive ability. This can now be tested.
2016-Apr-07, 21:59:34,
#4
RE: [OQSPS] Inequality across US counties: an S factor analysis
Apparently, I had forgotten to upload the latest version (with the introduction etc.). Sorry about that. It is uploaded now.
2016-Apr-09, 21:30:41, (This post was last modified: 2016-May-04, 03:11:42 by Peter Frost.)
#5
RE: [OQSPS] Inequality across US counties: an S factor analysis
Emil,

This is a tricky analysis to do because you're trying to film a moving object. If a U.S. county does well socially and economically, it will experience more job growth and become, in general, a more attractive place to live. As a result, there will be an influx of people who have trouble finding work elsewhere, typically in services and construction. So the current demographic makeup of a county reflects its history of job growth, which ultimately reflects its past demographic makeup. These two variables -- current demographic makeup and past demographic makeup -- can be very different.
2016-May-02, 16:47:38, (This post was last modified: 2016-May-02, 16:48:11 by Emil.)
#6
Revision of 2nd May
LJ,

I have finished the new revision. This adds a bunch of new analyses using the cognitive ability data, as well as new sections dealing with mediation and Jensen's method. This also increased the length to 50 pages, including references.

I have incorporated the grammatical fixes you suggested.

Quote:2. The Discussion and Conclusion might have more value if that section highlighted what was learned from the new analyses and/or what previous findings were corroborated. The section mentions results from the paper, but a reader unfamiliar with the literature might not appreciate what about the paper was new.

3. The paper is dense in terms of data presentation. It might be worth considering focusing the main paper on fewer figures and/or tables, and placing some figures and/or tables in an appendix, if these figures and tables are not necessary to understand the main points of the paper (e.g., Table 11 perhaps).

I dislike it when papers try to do this. In my opinion, they usually do it too soon, e.g. after the first small study. In this case, we are working with a very large dataset but the data are all correlational and measured at approximately the same time. This necessarily makes it somewhat difficult to draw causal conclusions.

I prefer the presentation style where the evidence is generally just presented and the reader can then make up his own mind with regards to how significant the findings are. Show, not tell.

Quote:4. The Discussion and Conclusion notes that high-white-percentage counties tend to have lower S scores but that this effect is probably not causal and reflects only the history of these areas; this history explanation is used to justify model specification. But if this "only history" explanation can be said about other correlations in the study, then the paper might benefit from a more thorough discussion of inference issues.

It was not of particular interest to my investigations. However, I had in mind the models presented in Albion's Seed. See e.g. http://slatestarcodex.com/2016/04/27/boo...ions-seed/

--

Peter,

Quote:This is a tricky analysis to do because you're trying to film a moving object. If a U.S. county does well socially and economically, it will experience more job growth and become, in general, a more attractive place to live. As a result, there will be an influx of people who have trouble finding work elsewhere, typically in services and construction. So the current demographic makeup of a county reflects its history of job growth, which ultimately reflects its past demographic makeup. These two variables -- current demographic makeup and past demographic makeup -- can be very different.

You are right: the demographics of counties are constantly changing in response to the social conditions in the same counties. Research-wise, this is good because it makes it possible to do cross-lagged longitudinal studies. Suppose we propose that Asian Americans have a positive influence on S; then we can check whether counties that increased their share of Asian Americans over a timespan (e.g. 2000 to 2010) also increased their S score.

It would be possible to take into account the past demographics of a county to see whether that predicts the future S of a county regardless of the future demographics, a kind of enduring demographic effect.

These types of analyses are, however, not possible with the current dataset because it lacks longitudinal S data. Census data has SIRE data for 2000 and 2010, so one could use these years. It may be possible to find cognitive ability data as well. The school-district dataset spans the years 2004 to 2009, which may be a sufficiently long window to do this kind of research, but unfortunately, it does not match up with the census years. The S data is only available for 2009 and 2010, which gives only a 1-year window. My guess is that even if one could obtain SIRE data for 2009 and 2010, a window of only 1 year would probably mean that there is too much noise to detect real effects.
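
To make the two proposed designs concrete, here is a minimal sketch in R using simulated data. Everything in it is an assumption for illustration: the two-wave panel, the variable names (asian_2000, asian_2010, S_2000, S_2010) and the effect sizes are invented, and the models are plain change-score and lagged regressions rather than a full cross-lagged panel model.

```r
set.seed(2)
n <- 1000

# Simulated two-wave panel; hypothetical names and values, not census data
asian_2000 <- runif(n, 0, 10)
asian_2010 <- asian_2000 + rnorm(n, mean = 1, sd = 1)
S_2000 <- 0.1 * asian_2000 + rnorm(n)
S_2010 <- S_2000 + 0.2 * (asian_2010 - asian_2000) + rnorm(n, sd = 0.5)
panel <- data.frame(asian_2000, asian_2010, S_2000, S_2010)

# (1) Change-score model: did counties whose Asian% rose also rise in S?
fit_change <- lm(I(S_2010 - S_2000) ~ I(asian_2010 - asian_2000), data = panel)

# (2) Enduring-effect model: does past Asian% predict later S
#     once later Asian% is held constant?
fit_lagged <- lm(S_2010 ~ asian_2000 + asian_2010, data = panel)

coef(fit_change)
coef(fit_lagged)
```

With real data, the noise concern raised above would show up as wide confidence intervals on the change-score slope when the window between waves is short.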

--

Revision #5
https://osf.io/btnx5/
2016-May-03, 22:53:55, (This post was last modified: 2016-May-04, 16:05:16 by NoahCarl.)
#7
RE: [OQSPS] Inequality across US counties: an S factor analysis
Interesting and very thorough paper. I have some minor points and suggestions:

1. I would recommend slightly more aesthetically-minded formatting (e.g., justifying text, centering table columns etc.), but I leave this at Emil's discretion.

2. The Abstract is perhaps slightly too long. It reads more like an abridged introductory section. But again, I leave this at Emil's discretion.

3. The y-axis on Figure 4 says "S", but it should say "CA". In fact, Figure 4 appears to be identical to Figure 5.

4. Why not include a scatterplot showing the relationship between cognitive ability and the S-factor across counties?

5. At the beginning of Section 5:

Emil Wrote:Since the top 100 counties have a mean S of 1.43 and S has a correlation to CA of perhaps .60 at the individual level (Strenze, 2007), the top 100 counties group should have a mean CA score of 1.43 * .6 = .86 Z, or about 113 IQ.


Unless I am mistaken, this assumes that there are no spillover effects of cognitive ability on the S-factor. Jones (2015; 'Hive Mind') argues that there are such spillover effects (e.g., high IQ people are more co-operative, and so are more willing to fund public goods; high IQ people vote for more market-oriented policies, which leads to higher incomes). The assumption of no spillover effects should be noted in the text.

6. In Section 8, it might be interesting to estimate state fixed-effects models, i.e., multiple OLS models of the form:

county_s-factor_score = county_cognitive_ability + county_race_variables + state_dummies
2016-May-04, 18:34:16, (This post was last modified: 2016-May-04, 18:44:28 by Emil.)
#8
Reply to Noah #7
Noah,

Noah Carl Wrote:1. I would recommend slightly more aesthetically-minded formatting (e.g., justifying text, centering table columns etc.), but I leave this at Emil's discretion.

When you say "centering table columns", do you mean like this?

[Image: two example tables]

The second table has centered text; the first uses whatever the default setting was.

Noah Carl Wrote:2. The Abstract is perhaps slightly too long. It reads more like an abridged introductory section. But again, I leave this at Emil's discretion.

It is my understanding that the point of an abstract is to summarize the main findings. A paper that contains many analyses thus necessitates a longer abstract.

Personally, I often re-read abstracts of papers because I forgot what the main findings of the paper were. When papers do not present these in the abstract, I have to skim the actual paper. Sometimes, I just need a single number. I am trying to avoid giving others this problem by actually presenting the main results in the abstract.

Noah Carl Wrote:3. The y-axis on Figure 4 says "S", but it should say "CA". In fact, Figure 4 appears to be identical to Figure 5.

You are right. It was the wrong figure. I have put the right one there now.

Noah Carl Wrote:Unless I am mistaken, this assumes that there are no spillover effects of cognitive ability on the S-factor. Jones (2015; 'Hive Mind') argues that there are such spillover effects (e.g., high IQ people are more co-operative, and so are more willing to fund public goods; high IQ people vote for more market-oriented policies, which leads to higher incomes). The assumption of no spillover effects should be noted in the text.

It assumes a lot of things, both parameter values and causal relationships. In particular, it assumes that positive-feedback aggregation effects are not present (as in Hive Mind, but I haven't read it). I think this is what you call spillover effects. In other words, it assumes that aggregate-level traits are a simple composition of the individual-level effects or traits.

To note, this section was written before I had cognitive ability data. Now that I have this, I checked whether this assumption roughly holds. In other words, what is the mean cognitive ability among the top 100 S counties? It turns out, it is 1.36! Only slightly lower than the S which is 1.43. So this would imply a correlation of almost .95 at the individual level, which is clearly untenable.
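
The arithmetic behind this check can be written out in a few lines of R. This is only a sketch of the calculation, using the three numbers quoted in the thread; the variable names are made up for illustration, and the IQ conversion assumes the usual metric of mean 100 and SD 15.

```r
# Values quoted in the thread
mean_S_top100  <- 1.43  # mean S of the top 100 counties (Z units)
r_assumed      <- 0.60  # assumed individual-level S x CA correlation
mean_CA_top100 <- 1.36  # observed mean CA of the same counties (Z units)

# Prediction under simple composition (no aggregation effects)
predicted_CA_z  <- mean_S_top100 * r_assumed
predicted_CA_iq <- 100 + 15 * predicted_CA_z  # IQ metric: mean 100, SD 15

# Correlation that would be needed to reproduce the observed mean CA
implied_r <- mean_CA_top100 / mean_S_top100

round(c(predicted_CA_z  = predicted_CA_z,
        predicted_CA_iq = predicted_CA_iq,
        implied_r       = implied_r), 2)
```

The predicted mean (about .86 Z, or 113 IQ) falls well short of the observed 1.36 Z, and the ratio 1.36 / 1.43 is where the figure of almost .95 comes from.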

I have removed this section from the paper (and the abstract).

Noah Carl Wrote:6. In Section 8, it might be interesting to estimate state fixed-effects models, i.e., multiple OLS models of the form:

county_s-factor_score = county_cognitive_ability + county_race_variables + state_dummies

I ran the model. Output:

Code:
> lm("S ~ CA + White + Black + Asian + Amerindian + State", data = d_main, weight = Total.Population) %>%
+ MOD_summary(kfold = F)
$coefs
                       Beta   SE CI.lower CI.upper
CA                     0.67 0.02     0.64     0.70
White                  0.06 0.02     0.03     0.09
Black                 -0.13 0.02    -0.17    -0.10
Asian                  0.11 0.01     0.10     0.12
Amerindian            -0.12 0.02    -0.16    -0.08
State: Alaska          0.67 0.18     0.32     1.02
State: Arizona         0.33 0.11     0.10     0.55
State: Arkansas       -0.41 0.10    -0.61    -0.22
State: California      0.47 0.08     0.32     0.62
State: Colorado        0.24 0.09     0.07     0.42
State: Connecticut     0.14 0.10    -0.05     0.33
State: Delaware        0.15 0.15    -0.16     0.45
State: Florida        -0.22 0.07    -0.36    -0.08
State: Georgia         0.05 0.08    -0.10     0.20
State: Idaho           0.09 0.13    -0.15     0.34
State: Illinois        0.09 0.07    -0.05     0.24
State: Indiana        -0.35 0.08    -0.52    -0.19
State: Iowa            0.34 0.10     0.15     0.54
State: Kansas         -0.06 0.10    -0.26     0.14
State: Kentucky       -0.75 0.09    -0.92    -0.57
State: Louisiana       0.00 0.09    -0.17     0.18
State: Maine           0.07 0.13    -0.19     0.33
State: Maryland        0.17 0.09     0.01     0.34
State: Massachusetts  -0.15 0.09    -0.32     0.01
State: Michigan       -0.04 0.07    -0.19     0.11
State: Minnesota       0.27 0.09     0.10     0.44
State: Mississippi    -0.25 0.10    -0.44    -0.05
State: Missouri       -0.27 0.08    -0.43    -0.11
State: Montana        -0.21 0.15    -0.51     0.08
State: Nebraska        0.17 0.12    -0.06     0.40
State: Nevada          0.26 0.11     0.05     0.47
State: New Hampshire   0.26 0.13     0.00     0.52
State: New Jersey     -0.21 0.08    -0.38    -0.05
State: New Mexico      0.54 0.12     0.30     0.78
State: New York       -0.17 0.07    -0.31    -0.03
State: North Carolina -0.27 0.08    -0.42    -0.12
State: North Dakota    0.06 0.18    -0.29     0.40
State: Ohio           -0.52 0.08    -0.66    -0.37
State: Oklahoma       -0.17 0.10    -0.35     0.02
State: Oregon          0.20 0.09     0.02     0.39
State: Pennsylvania   -0.26 0.07    -0.41    -0.12
State: Rhode Island    0.35 0.14     0.07     0.64
State: South Carolina -0.10 0.09    -0.27     0.08
State: South Dakota   -0.01 0.16    -0.33     0.31
State: Tennessee      -0.10 0.08    -0.26     0.06
State: Texas          -0.44 0.07    -0.58    -0.29
State: Utah            0.79 0.10     0.58     0.99
State: Vermont         0.14 0.18    -0.21     0.49
State: Virginia        0.15 0.08     0.00     0.31
State: Washington      0.04 0.08    -0.13     0.20
State: West Virginia  -0.15 0.12    -0.38     0.08
State: Wisconsin       0.18 0.08     0.02     0.35
State: Wyoming         0.10 0.19    -0.28     0.47

$meta
      N      R2 R2 adj.
3086.00    0.78    0.78

One could also add the environmental variables. Output:

Code:
$coefs
                       Beta   SE CI.lower CI.upper
CA                     0.68 0.02     0.65     0.72
White                  0.06 0.02     0.02     0.09
Black                 -0.14 0.02    -0.18    -0.10
Asian                  0.11 0.01     0.10     0.12
Amerindian            -0.12 0.02    -0.17    -0.08
State: Alaska          0.70 0.34     0.04     1.36
State: Arizona         0.75 0.16     0.44     1.07
State: Arkansas       -0.39 0.11    -0.61    -0.17
State: California      0.95 0.16     0.64     1.25
State: Colorado        0.43 0.13     0.17     0.69
State: Connecticut    -0.29 0.15    -0.57     0.00
State: Delaware       -0.17 0.18    -0.52     0.18
State: Florida        -0.18 0.09    -0.34    -0.01
State: Georgia         0.12 0.09    -0.05     0.29
State: Idaho           0.26 0.18    -0.10     0.62
State: Illinois       -0.07 0.11    -0.28     0.14
State: Indiana        -0.55 0.11    -0.76    -0.34
State: Iowa            0.24 0.13    -0.01     0.49
State: Kansas         -0.04 0.12    -0.28     0.19
State: Kentucky       -0.83 0.11    -1.06    -0.61
State: Louisiana       0.13 0.10    -0.07     0.32
State: Maine          -0.46 0.19    -0.83    -0.09
State: Maryland       -0.13 0.12    -0.37     0.11
State: Massachusetts  -0.61 0.14    -0.89    -0.33
State: Michigan       -0.29 0.12    -0.52    -0.06
State: Minnesota       0.09 0.13    -0.17     0.35
State: Mississippi    -0.21 0.11    -0.43     0.00
State: Missouri       -0.33 0.10    -0.52    -0.13
State: Montana        -0.18 0.20    -0.57     0.22
State: Nebraska        0.14 0.14    -0.14     0.43
State: Nevada          0.64 0.17     0.30     0.97
State: New Hampshire  -0.20 0.18    -0.56     0.15
State: New Jersey     -0.57 0.13    -0.82    -0.31
State: New Mexico      0.88 0.15     0.58     1.18
State: New York       -0.57 0.13    -0.82    -0.33
State: North Carolina -0.43 0.09    -0.61    -0.25
State: North Dakota   -0.08 0.22    -0.51     0.34
State: Ohio           -0.75 0.11    -0.96    -0.54
State: Oklahoma       -0.10 0.11    -0.32     0.12
State: Oregon          0.37 0.18     0.02     0.72
State: Pennsylvania   -0.58 0.11    -0.81    -0.36
State: Rhode Island   -0.07 0.19    -0.45     0.31
State: South Carolina -0.21 0.10    -0.40    -0.01
State: South Dakota   -0.05 0.19    -0.43     0.33
State: Tennessee      -0.15 0.09    -0.33     0.02
State: Texas          -0.23 0.09    -0.41    -0.04
State: Utah            1.04 0.15     0.74     1.33
State: Vermont        -0.32 0.22    -0.75     0.12
State: Virginia       -0.20 0.11    -0.42     0.03
State: Washington      0.10 0.18    -0.25     0.45
State: West Virginia  -0.33 0.14    -0.60    -0.06
State: Wisconsin      -0.03 0.12    -0.27     0.21
State: Wyoming         0.23 0.22    -0.21     0.67
lat                    0.18 0.05     0.07     0.28
lon                    0.19 0.06     0.08     0.30
precip                 0.03 0.02     0.00     0.07
temp                   0.04 0.04    -0.04     0.12

$meta
      N      R2 R2 adj.
2682.00    0.79    0.78

The betas change a little, but the R2 is about the same. CA is still the driving force. E.g. if one calculates eta squared, then CA has 47%, Black 10%, Asian 14%, state 8% and everything else rounds to <1%.
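
For readers unfamiliar with eta squared, here is a minimal sketch of one common way to compute it in R: each term's sum of squares as a share of the total. The data are simulated stand-ins, not the county dataset, and the sketch assumes sequential (Type I) sums of squares from anova() on an unweighted model; the thread does not say which type of sums of squares or which function was used for the numbers above.

```r
set.seed(1)
n <- 500

# Simulated stand-in data; not the county dataset from the paper
d <- data.frame(
  CA    = rnorm(n),
  Black = rnorm(n),
  State = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
d$S <- 0.7 * d$CA - 0.2 * d$Black + 0.1 * as.integer(d$State) + rnorm(n, sd = 0.5)

fit <- lm(S ~ CA + Black + State, data = d)

# Sequential (Type I) sums of squares, one row per term plus residuals
tab <- anova(fit)

# Eta squared: each term's sum of squares as a share of the total
eta_sq <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(eta_sq) <- rownames(tab)
round(eta_sq, 2)
```

Because sequential sums of squares depend on the order of the terms when predictors are correlated, the shares for real data can shift if the formula is reordered.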

I have added a new section with these results.

--

Revision 6 uploaded. https://osf.io/btnx5/

--

Revisions 7-8 uploaded. They have only a few minor cosmetic changes.
2016-May-04, 18:58:44,
#9
RE: [OQSPS] Inequality across US counties: an S factor analysis
Emil Wrote:When you say "centering table columns", do you mean like this?

Yes, I would recommend formatting the tables like that.

One more minor point:

In the latest version of the paper, Figure 10 is overlapping with footnote 8, which looks rather unsightly.

Once these two minor points (and a few typos/spelling errors) have been dealt with, I will approve the paper for publication.
2016-May-04, 20:41:40,
#10
RE: [OQSPS] Inequality across US counties: an S factor analysis
(2016-May-04, 18:58:44)NoahCarl Wrote:
Emil Wrote:When you say "centering table columns", do you mean like this?

Yes, I would recommend formatting the tables like that.

One more minor point:

In the latest version of the paper, Figure 10 is overlapping with footnote 8, which looks rather unsightly.

Once these two minor points (and a few typos/spelling errors) have been dealt with, I will approve the paper for publication.

I have changed the table formatting. This is manual labor as there does not appear to be a way to do this automatically in LibreOffice.

I have fixed the figure overlapping.

I have fixed a number of minor cosmetic errors. Furthermore, I have updated the abstract to include the cognitive ability results.

New revision uploaded.