Thank you for the review.
I understand that the analysis is complex. MGCFA can be much easier if the data is ideal (clean factor structure, near-equivalent group samples, no Heywood cases, no pro-bifactor bias, and no cross loadings, as assumed when computing effect sizes of bias). Unfortunately, the data usually does not fulfill most of these ideal conditions. In the case of Project Talent, the large number of subtests, subgroups and models complicates the situation even more. I wish I could simplify as much as possible, but it is also necessary to explain and address the problems that are often ignored in MGCFA studies.
I modified my article according to your suggestions, clarifying and fixing wherever necessary. I also updated my supplementary file.
The weak form of Spearman’s Hypothesis, which states that the racial group differences are primarily due to differences in the general factor (g), was tested and confirmed in this analysis of the Project Talent data, based on 34 aptitude tests among 9th-12th grade students.
...
One comes from Scheiber (2016b) who found strong measurement bias in the analysis of the WISC-V between 777 White males and 830 White females, 188 Black males and 221 Black females, and 308 Hispanic males and 313 Hispanic females.
...
When within-factor correlated residuals are misspecified, all fit indices correctly favor the correlated factors model regardless of conditions, except for SRMR, which incorrectly favors the bifactor model (Greene et al., 2019, Table 4).
I have now provided a new Figure 1, along with the following text:
Figure 1 displays hypothetical competing CFA models that are investigated in the present analysis: 1) the correlated factors model, which specifies that the first-order specific factors are correlated without the existence of a general factor, 2) the higher order factor model, which specifies that the second-order general factor operates through the first-order specific factors and thus only indirectly influences the subtests, 3) the bifactor model, which, unlike the higher order factor model, specifies that both the general and specific factors have direct influences on the subtests.
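For illustration only, the difference between the three models can be written out as lists of direct paths. This is a minimal Python sketch, not the code used in the analysis; the factor names ("verbal", "math") and subtest names ("v1", "m1", ...) are hypothetical placeholders, and factor covariances are not represented.

```python
# Toy structure: factor and subtest names are hypothetical placeholders.
SPECIFIC = {"verbal": ["v1", "v2"], "math": ["m1", "m2"]}


def direct_paths(model):
    """Return the (source, target) direct paths implied by one competing model."""
    specific_paths = [(f, s) for f, subtests in SPECIFIC.items() for s in subtests]
    if model == "correlated_factors":
        # No general factor: only the specific factors load on the subtests.
        return specific_paths
    if model == "higher_order":
        # g points at the first-order factors, so it reaches subtests only indirectly.
        return [("g", f) for f in SPECIFIC] + specific_paths
    if model == "bifactor":
        # g and the specific factors both load directly on the subtests.
        g_paths = [("g", s) for subtests in SPECIFIC.values() for s in subtests]
        return g_paths + specific_paths
    raise ValueError(f"unknown model: {model}")


for name in ("correlated_factors", "higher_order", "bifactor"):
    print(name, direct_paths(name))
```

In the higher order model g never points at a subtest, whereas in the bifactor model it points at every subtest.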
I also added a note under the model fit tables:
Note: higher values of CFI and Mc indicate better fit, while lower values of χ2, RMSEA, RMSEAD, SRMR indicate better fit.
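The direction of each index in this note can be expressed as a small lookup. The following Python sketch is only an illustration of how to read the tables; the function name `preferred_model` and the numeric values are hypothetical, and the sketch compares raw values without applying any cutoff.

```python
# Direction of each fit index, as stated in the table note.
HIGHER_IS_BETTER = {"CFI", "Mc"}
LOWER_IS_BETTER = {"chi2", "RMSEA", "RMSEAD", "SRMR"}


def preferred_model(index, value_a, value_b):
    """Return 'A', 'B' or 'tie' depending on which value indicates better fit."""
    if index in HIGHER_IS_BETTER:
        diff = value_a - value_b
    elif index in LOWER_IS_BETTER:
        diff = value_b - value_a
    else:
        raise ValueError(f"unknown fit index: {index}")
    if diff > 0:
        return "A"
    if diff < 0:
        return "B"
    return "tie"


# Hypothetical values, not taken from the article's tables.
print(preferred_model("CFI", 0.95, 0.93))
print(preferred_model("RMSEA", 0.05, 0.04))
```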
I, however, found one of your requests difficult to fulfill. Specifically, this one:
Table 1: There could be another column which states, in plain english, what each of these models is used to test for.
This is because summarizing the purpose of each model in just one or two words is extremely difficult. The model specification column is already loaded with information, and I believe that adding another column with even more information would make the table more tedious to read.
The models in Table 1 were already briefly summarized earlier in the text. I have now expanded this summary and added a reference to Table 1.
MGCFA starts by adding additional constraints to the initial configural model, with the following incremental steps: metric, scalar, strict. A rejection of configural invariance implies that the groups use different latent abilities to solve the same set of item variables. A rejection of metric (loading) invariance implies that the indicators of a latent factor are unequally weighted across groups. A rejection of scalar (intercept) invariance implies that the subtest scores differ across groups when their latent factor means are equalized. A rejection of strict (residual) invariance implies that there is a group difference in specific variance and/or measurement error. When invariance is rejected, a partial invariance model must free parameters until acceptable fit is achieved, and these freed parameters must be carried over to the next levels of MGCFA models. The variances of the latent factors are then constrained to be equal across groups to examine whether the groups use the same range of abilities to answer the subtests. The final step is to determine which latent factors can have their mean differences constrained to zero without deteriorating the model fit: a worsening of the model fit indicates that the factor is needed to account for the group differences. These model specifications will be presented in Table 1 further below.
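The cumulative nature of these steps, including how freed parameters are carried forward, can be sketched as follows. This is a simplified Python illustration rather than the code used in the analysis: the step labels, the function `build_sequence` and the parameter name `intercept:subtest_A` are hypothetical, and the last step is shown as a single block even though factor means are examined one factor at a time.

```python
# Order in which equality constraints are added, following the paragraph above.
STEPS = [
    ("configural", None),
    ("metric", "loadings"),
    ("scalar", "intercepts"),
    ("strict", "residuals"),
    ("factor variances", "factor_variances"),
    ("factor means", "factor_means"),
]


def build_sequence(freed_at_step):
    """Return, for each step, the constrained parameter types and the
    parameters freed so far (freed parameters stay free in later steps)."""
    constrained, freed, sequence = [], [], []
    for name, param_type in STEPS:
        if param_type is not None:
            constrained.append(param_type)
        freed.extend(freed_at_step.get(name, []))
        sequence.append(
            {"step": name, "equal_across_groups": list(constrained), "freed": list(freed)}
        )
    return sequence


# Hypothetical case: one intercept has to be freed at the scalar step.
for row in build_sequence({"scalar": ["intercept:subtest_A"]}):
    print(row)
```

In this hypothetical case the freed intercept appears from the scalar step onward, which is what carrying partial invariance into the later models means.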
I provided the R output displaying all parameter values for the best model in Tables 2 through 13 in the supplementary file. The output is so large that, even if I displayed only the group factor loadings, the number of pages would increase drastically, and the article is already very long. Notice that I did not originally display the general factor loadings anywhere in the paper. This is because, again, there were too many models and subgroups: the loadings on both g and the specific factors would need to be displayed for each subgroup (black men, white men, white women, black women) and for each g model.
The X axis title in Figures 2-5 has been modified as per your suggestion.
One final remark. I do not understand this sentence.
As far as I can tell the model specification is the same wherever it is stated. In which case it should only be stated once, with a phrase along the lines of “for all of our models the model specification is:”
Are you referring to the competing models (CF, HOF, BF) or rather to the model constraints (M1-M6)? I suspect the latter, though I'm not sure. If this is the case, each subgroup shows a different pattern of non-invariance, so I have to discuss them separately rather than making general and rather imprecise statements about the results. I understand it may be tedious for the readers, but I believe it is necessary.