Inequality across US counties: an S factor analysis

Open Quantitative Sociology & Political Science, 2016


A dataset of socioeconomic, demographic and geographic data for US counties (N≈3,100) was created by merging data from several sources. A suitable subset of 28 socioeconomic indicators was chosen for analysis. Factor analysis revealed a clear general socioeconomic factor (S factor) which was stable across extraction methods and different samples of indicators (absolute split-half sampling reliability = .85).
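
As a rough illustration of the split-half sampling idea, the sketch below repeatedly divides a set of indicators at random into two halves, extracts a general factor from each half, and takes the absolute correlation between the two sets of scores. It is not the paper's code. The first principal component stands in for the factor extraction method, the data are simulated, and the function names, number of splits and seeds are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(col):
    """Convert a column to z-scores (sample standard deviation)."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / sd for v in col]

def first_component_scores(cols, iters=200):
    """Scores on the first principal component (a stand-in for a general factor)."""
    z = [standardize(c) for c in cols]
    p = len(z)
    r = [[pearson(z[i], z[j]) for j in range(p)] for i in range(p)]
    w = list(r[0])  # starting vector for power iteration
    for _ in range(iters):
        w = [sum(r[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    n_cases = len(z[0])
    return [sum(w[j] * z[j][i] for j in range(p)) for i in range(n_cases)]

def split_half_reliability(cols, n_splits=50, seed=1):
    """Mean absolute correlation between scores from random indicator halves."""
    rng = random.Random(seed)
    idx = list(range(len(cols)))
    half = len(idx) // 2
    rs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        a = first_component_scores([cols[j] for j in idx[:half]])
        b = first_component_scores([cols[j] for j in idx[half:]])
        rs.append(abs(pearson(a, b)))  # absolute value: factor sign is arbitrary
    return sum(rs) / len(rs)

# Simulated data (hypothetical): 300 cases, 8 indicators driven by one factor,
# three of which load negatively.
data_rng = random.Random(0)
g = [data_rng.gauss(0, 1) for _ in range(300)]
signs = [1, 1, 1, 1, 1, -1, -1, -1]
cols = [[s * 0.8 * gi + 0.6 * data_rng.gauss(0, 1) for gi in g] for s in signs]

reliability = split_half_reliability(cols)
print(round(reliability, 2))
```

The absolute value is taken because the direction of an extracted factor is arbitrary, so two halves can yield scores that are strongly but negatively correlated.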

Self-identified race/ethnicity (SIRE) population percentages were strongly, but non-linearly, related to cognitive ability and S. In general, the effects of White% and Asian% were positive, while those of Black%, Hispanic% and Amerindian% were negative. The effect was unclear for Other/mixed%. The best model consisted of White%, Black%, Asian% and Amerindian% and explained 41% and 43% of the variance among counties in cognitive ability and S, respectively.

SIRE homogeneity had a non-linear relationship to S, both with and without taking into account the effects of SIRE variables. Overall, the effect was slightly negative due to areas with low S and high White%.

Geospatial (latitude, longitude, and elevation) and climatological (temperature, precipitation) predictors were tested in models. In linear regression, they had little incremental validity. However, there was evidence of non-linear relationships. When models were fitted that allowed for non-linear effects of the environmental predictors, they added a moderate amount of incremental validity. LASSO regression, however, suggested that much of this predictive validity was due to overfitting. Furthermore, it was difficult to make causal sense of the results.

Spatial patterns in the data were examined using multiple methods, all of which indicated strong spatial autocorrelation for cognitive ability, S and SIRE (k nearest spatial neighbor regression [KNSNR] correlations of .62 to .89). Model residuals were also spatially autocorrelated, and for this reason the models were re-fit controlling for spatial autocorrelation using KNSNR-based residuals and spatial local regression. The results indicated that the effects of SIREs were not due to spatially autocorrelated confounds, except possibly for Black%, whose effect was about 50% weaker in the controlled analyses.
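
The KNSNR idea can be sketched as follows, assuming that each unit's value is predicted as the mean of its k nearest spatial neighbors, that the KNSNR correlation is the correlation between observed and predicted values, and that the KNSNR-based residuals are observed minus predicted values. The coordinates, values, choice of k and function names are all hypothetical and chosen only to make the example small; this is not the paper's implementation.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def knsnr_predictions(coords, values, k):
    """Predict each unit's value as the mean of its k nearest neighbors."""
    preds = []
    for i, p in enumerate(coords):
        others = sorted((haversine_km(p, q), j)
                        for j, q in enumerate(coords) if j != i)
        nearest = [j for _, j in others[:k]]
        preds.append(sum(values[j] for j in nearest) / k)
    return preds

# Toy data (hypothetical): two clusters of units along one meridian.
coords = [(30, -90), (31, -90), (32, -90), (40, -90), (41, -90), (42, -90)]
values = [1, 2, 3, 10, 11, 12]

predictions = knsnr_predictions(coords, values, k=2)
knsnr_r = pearson(values, predictions)                   # spatial autocorrelation
residuals = [v - p for v, p in zip(values, predictions)]  # spatially "controlled" values
print(round(knsnr_r, 2))
```

A high correlation between observed and neighbor-predicted values indicates that nearby units resemble each other. The residuals are what remains after removing the part of a variable that its neighbors predict.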

Pseudo-multilevel analyses of both the factor structure of S and the SIRE predictive model showed results consistent with the main analyses. Specifically, the factor structure was similar across levels of analysis (states and counties) and within states. Furthermore, the SIRE predictors had similar betas when examined within each state compared to when analyzed across all states.

It was tested whether the relationship between SIREs and S was mediated by cognitive ability. Several methods were used to examine this question and the results were mixed, but generally in line with a partial mediation model.

Jensen's method (method of correlated vectors) was used to examine whether the observed relationship between cognitive ability and S scores was plausibly due to the latent S factor. This was strongly supported (r = .91, N = 28 indicators). Similarly, it was examined whether the relationship between SIREs and S scores was plausibly due to the latent S factor. This did not appear to be the case.
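
In outline, the method of correlated vectors correlates two vectors defined over the indicators: each indicator's loading on the factor, and each indicator's correlation with the criterion variable. The sketch below illustrates this with made-up numbers for four indicators. The values and function names are hypothetical, and the optional reversal of negatively loading indicators is an assumption of this example, included so that sign differences alone cannot inflate the result.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def reverse_negative(loadings, criterion_cors):
    """Flip indicators with negative loadings so that all loadings are positive."""
    pairs = [(-l, -c) if l < 0 else (l, c)
             for l, c in zip(loadings, criterion_cors)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def correlated_vectors(loadings, criterion_cors, reverse=True):
    """Correlate factor loadings with indicator-criterion correlations."""
    if reverse:
        loadings, criterion_cors = reverse_negative(loadings, criterion_cors)
    return pearson(loadings, criterion_cors)

# Hypothetical values for four indicators.
loadings = [0.9, 0.7, 0.5, -0.6]        # loading of each indicator on the factor
criterion_cors = [0.6, 0.5, 0.3, -0.4]  # correlation of each indicator with the criterion

mcv_r = correlated_vectors(loadings, criterion_cors)
print(round(mcv_r, 2))
```

A strongly positive result means that the indicators that load most heavily on the factor are also the ones most strongly related to the criterion, which is the pattern expected if the relationship runs through the latent factor.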

Keywords: USA, United States, inequality, cognitive ability, intelligence, IQ, NAEP, S factor, general socioeconomic factor, race, SIRE, spatial autocorrelation, spatial statistics, multilevel, temperature, latitude, longitude, precipitation, elevation, LASSO regression, Jensen's method, method of correlated vectors

Reviewed by
Bryan J. Pesta, Noah Carl, L.J. Zigerell

Review time 49 days.