I requested a second reviewer for this paper, and the person in question asked to remain anonymous. I am unable to attach the RTF file of the review, so I have instead copied and pasted the reviewer's comments in full. Given that it was extremely difficult to find volunteers for this paper, consider this the last round of reviews. Do not expect a further response from the second reviewer, as it is unlikely that I will contact this reviewer again about this paper, so try to answer as thoroughly as possible. I will read your updated draft and tell you whether any important points remain unanswered. The most serious issues are raised in the comments on the methodology below. I highly recommend responding to all points made.
______________________________
Review
OpenPsych (ODP-25-)
National IQs and Socioeconomic Development
This is a long manuscript (62 pages) on a) estimating national cognitive ability levels (national IQ), b) estimating the socioeconomic development levels of countries, and c) the relationship between these indices of national cognitive ability and socioeconomic development. The manuscript shows an in-depth engagement with the topic and uses sophisticated methods. However, there are also a number of points where I believe there is still room for improvement.
Suggestions in general
There should always be a date on page 1 of a manuscript.
Always add page numbers, also in the answer to the reviewers.
Statistical symbols (e.g., r, p, d) should always be set in italics.
Report correlations without a leading zero: “Do not use a zero before a decimal when the statistic cannot be greater than 1 (proportion, correlation, level of statistical significance).” So no leading zero for correlations, p-values, or standardized betas. See: https://apastyle.apa.org/instructional-aids/numbers-statistics-guide.pdf
Introduction
Page 2: If you introduce the topic of intelligence with "human capital" and economics, then you should also mention educational measures at the beginning, as these have long dominated economic research.
Page 3ff.: Since you will always talk about intelligence or cognitive abilities later, when you use the term "human capital" later, you should always speak of "cognitive human capital".
Data
Table 1 gives a very good overview. I have not checked the completeness of the reference list at the end; please check it yourself.
Regarding Patrinos & Angrist (2018): There are newer data sets:
Altinok, N., & Diebolt, C. (2024). Cliometrics of learning-adjusted years of schooling: Evidence from a new dataset. Cliometrica, 18(3), 691–764. https://doi.org/10.1007/s11698-023-00276-x
Angrist, N., Djankov, S., Goldberg, P. K., & Patrinos, H. A. (2021). Measuring human capital using global learning data. Nature, 592, 403–408.
There is a newer blog post by Cremieux Recueil. Please check whether it is relevant.
Cremieux. (2025, January 16). National IQs are valid. National IQ estimates are robust, reliable, and realistic. Cremieux Recueil website: www.cremieux.xyz/p/national-iqs-are-valid
I do not see what purpose a Twitter source serves here (Recueil, C. (2023). With the latest PISA results …). Legitimate sources for PISA data are OECD reports.
Methodology
“Methodology” is usually “Method”.
”Missing values from the socioeconomic development indicators were imputed with multiple imputation by chained equations (m = 100), with a prediction threshold of r = 0.4, as many indicators are highly correlated with each other. This was reduced to 0.3 in the untransformed data, as the untransformed data was less intercorrelated than the transformed data.”
This may be a technically correct description (I'm not an expert here), but what does it mean? Please first explain it in a sentence or two that a general reader can understand, and then provide the technical and statistical details mentioned.
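To illustrate the kind of plain explanation I have in mind: as I read it, the threshold means that one indicator is used to predict another indicator's missing values only if the two correlate at |r| ≥ .4. The following is a minimal Python sketch of that selection step only, not of the full chained-equations procedure and not the authors' code. The function names, the pairwise-complete correlation, and the toy indicators are my own assumptions.

```python
import math

def pearson(x, y):
    """Pearson r over pairs where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # too few complete pairs to estimate a correlation
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no predictive information
    return sxy / math.sqrt(sxx * syy)

def select_predictors(data, target, threshold=0.4):
    """Names of variables whose |r| with `target` meets the threshold."""
    return [name for name in data
            if name != target
            and abs(pearson(data[name], data[target])) >= threshold]

# Hypothetical toy data: one value per country, None marks a missing value.
indicators = {
    "life_expectancy": [55, 62, 70, 78, 82, None],
    "log_gdp":         [6.5, 7.4, 8.6, 9.9, 10.8, 10.1],
    "noise":           [3, 1, 3, 1, 2, 2],
}
chosen = select_predictors(indicators, "life_expectancy")
relaxed = select_predictors(indicators, "life_expectancy", threshold=0.3)
```

In this toy example the weakly related variable is excluded at a threshold of .4 but admitted at .3, which is presumably why the threshold was lowered for the less intercorrelated untransformed data.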
”Countries that had more than 45% of their data missing in socioeconomic indicators ... factor scores based on the variables that were not missing were calculated and then their rank relative to the sample was calculated. That rank was then regressed to the mean depending on the omega reliability of the estimate.”
Why was this alternative method chosen and what advantages does it have for countries with more than 45% missing data?
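If "regressed to the mean depending on the omega reliability" means the usual reliability-based shrinkage (Kelley's formula), it would help to say so explicitly. A minimal Python sketch of that shrinkage follows, assuming the rank is expressed as a percentile between 0 and 1 and the sample mean percentile is .50. The function name and the example values are hypothetical, and the authors' actual procedure may differ.

```python
def shrink_toward_mean(value, sample_mean, reliability):
    """Kelley-style shrinkage: the less reliable the estimate,
    the closer it is pulled toward the sample mean."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sample_mean + reliability * (value - sample_mean)

# Hypothetical country at the 90th percentile, estimated with omega = .75.
shrunk = shrink_toward_mean(value=0.90, sample_mean=0.50, reliability=0.75)
```

With a reliability of 1 the rank is left unchanged, and with a reliability of 0 it collapses to the sample mean.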
Table 2 gives a good overview.
What are the advantages of the ”spline iteration” method?
What are the advantages of the ”SVM iteration” method?
”On average, scores from these 16 methods correlated at .99, with intercorrelations ranging from .94 to .9999.” – Does that mean that all these methodological peculiarities lead to more or less the same result? If so, please highlight that.
“If a variable exhibited a strongly nonlinear relationship with HDI, where variance at extremes no longer predicted HDI”
What does that mean? Do you mean: “If a variable has a strongly nonlinear relationship with the HDI, where values at the extremes no longer predict the HDI”?
Figure 1: Please show the linear and the nonlinear correlation.
“Sear (2022) also questions whether the figures that are estimated for the African countries are believable, as many of them fall in the 65 to 75 range …”
Also see:
Rindermann, H. (2013). African cognitive ability: Research, results, divergences and recommendations. Personality and Individual Differences, 55, 229–233.
Rindermann, H. (2024). Surprisingly low results from studies on cognitive ability in developing countries: Are the results credible? Discover Education, 3(1), 55. https://doi.org/10.1007/s44217-024-00135-5; www.researchgate.net/publication/380754429_Surprisingly_low_results_from_studies_on_cognitive_ability_in_developing_countries_are_the_results_credible
I recommend placing the section “3.2 Criticism of national IQs” in a suitable place, e.g. at the beginning or as a separate chapter, but not in 'Methodology'.
“Theoretically, some biases will deflate the African IQ relative to what would be expected from their true average levels of intelligence (low effort test takers, Flynn Effect related measurement variance, illiterates), and others will inflate it (use of primary/secondary school students which are less nationally representative in more uneducated countries, use of the standard deviation between groups instead of within groups, use of subtest differences instead of full scale differences).”
Not everything here is convincing. Underscored, also strange wording.
Measurement invariance: As far as I know, this was checked in student achievement tests (TIMSS, PISA etc.) but not in intelligence tests. Correct? E.g., Raven in Europe and in Africa?
Figures 14.1 to 14.4 retain their original numbers; these should be changed.
“In practice, the differences between countries on PISA scores are extremely highly correlated and of roughly equal magnitude, as shown in Table 3. Therefore, it must be concluded that minor violations of measurement invariance on the PISA exams, and likely all scholastic tests, do not have a practically significant impact.” – Can this be transferred to the other student achievement tests and intelligence tests (often purely figural)?
Table 3, question: I have always wondered how it is that Cambodia should have such a low cognitive level, next to Vietnam, which has a good one. Is there something wrong? Are the peoples and cultures so different?
“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Please outline the type of prediction again in brackets, predictor variables? See also Rindermann (2024), where a prediction was also used. Are there differences?
“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Unfortunately, this is not an informative statement. Is the most probable value IQ 80.5 (barely legible in Figure 5; the lettering is much too small)? Isn’t that too high?
Table 4: It would be good to have a final value, preferably the mean of all estimates or the mean of SCH and PSY.
Figure 6: All letters are too small, not legible. The correlation should be repeated in the notes to the figure. Explain the gray area as well.
Table 5: Is the sample size the number of countries?
Figure 7: Can the variable names be made more understandable?
Calculation of the standard error from the correlation (reliability) and standard deviation: Please provide the general formula and the source.
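The formula I would expect here is the classical standard error of measurement, SEM = SD × sqrt(1 − reliability), which is found in standard psychometrics textbooks. I am assuming this is what was used; if not, please state the actual formula. A minimal Python sketch with illustrative numbers (SD = 15, reliability = .91) that are not taken from the manuscript:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)

# Illustrative values only: an IQ-type scale with SD 15 and reliability .91.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
```

With these values the standard error is about 4.5 IQ points.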
“Restricting to the earlier set of datasets that had no overlapping data (recent TIMSS/PIRLS/PISA results, Rindermann’s SAS estimates, and Becker’s quality weighted psychometric estimates)” – I do not understand, no overlapping data? And then a correlation?
“This is incorrect, as high levels of sample quality in certain regions may be indicative of fraud.” – I don't find that plausible.
“The hypothesis that lower IQ nations have more imprecisely estimated means by collecting estimates of national intelligence that were based on different data (recent TIMSS/PIRLS/PISA assessments, Becker’s psychometric estimates weighted by quality, Rindermann’s estimates of scholastic ability) and estimating the means and the standard errors, where the standard deviation of the sample averages divided by the square root of the number of samples.” – A sentence that is too long and incomprehensible.
“On average, a country’s estimated IQ has a standard error of 2.33, though this figure varies substantially by country: from 0.41 in Denmark to 12 in Cambodia.” – Please describe in an understandable way how standard errors were calculated for individual countries.
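My reading of the procedure, which the authors should confirm or correct, is that for each country the estimates from the independent sources are averaged, and the standard error is the standard deviation of those estimates divided by the square root of their number. A minimal Python sketch with made-up values for a single hypothetical country:

```python
import math
import statistics

def mean_and_standard_error(estimates):
    """Mean of a country's independent estimates and the standard error
    of that mean (sample SD divided by the square root of n)."""
    n = len(estimates)
    if n < 2:
        raise ValueError("at least two estimates are needed for a standard error")
    mean = statistics.fmean(estimates)
    se = statistics.stdev(estimates) / math.sqrt(n)
    return mean, se

# Made-up estimates from three non-overlapping sources.
mean, se = mean_and_standard_error([84.0, 88.0, 92.0])
```

Note that with only two or three sources per country, a standard error computed this way is itself very imprecise, which should be mentioned when values such as 0.41 and 12 are compared.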
Table 6: Are standardized betas shown in Table 6? If so, add them. Tables need notes with explanations.
“To compute the intelligence of nations, measured IQ and achievement test results are used.” – Repeat the sources again.
“Rindermann included estimates that were based on performance in the mathematics olympiad for North Korea, Belarus, Brunei, Cambodia, Mauritania, Tajikistan, and Turkmenistan” – This should only be done if no better data (such as from PISA or IQ tests) are available.
“Samples were normed in a fashion that placed the UK at a mean of 99.26, which is roughly what the UK’s average psychometric IQ is compared to British Whites.” – Add that the IQ of British Whites is set to 100.
I do not understand this procedure:
“An overall average was computed using nested means:
- Nest 1: Lynn’s estimates, Becker’s composite estimates, Becker’s scholastic estimates, and recent TIMSS math results.
- Nest 2: average of nest 1, recent TIMSS science results, average of Becker’s psychometric estimates, recent PIRLS results, World Bank test scores
- Nest 3: average of nest 2, recent PISA results, and Rindermann’s scholastic estimates
- Nest 4: average of nest 3, basic skills dataset, Rindermann’s IQ estimates”
Nest?
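To show why this needs explanation, here is how I understand the procedure, as a minimal Python sketch. It assumes plain unweighted means at every step and that sources missing for a country are simply skipped. Both assumptions are mine, as are the source labels, which merely abbreviate the quoted list.

```python
from fractions import Fraction

# Nest definitions as quoted; "PREV" stands for the mean of the previous nest.
NESTS = [
    ["Lynn", "Becker composite", "Becker scholastic", "TIMSS math"],
    ["PREV", "TIMSS science", "Becker psychometric", "PIRLS", "World Bank"],
    ["PREV", "PISA", "Rindermann scholastic"],
    ["PREV", "Basic skills", "Rindermann IQ"],
]

def nested_mean(scores):
    """Average each nest in turn, skipping sources the country lacks."""
    prev = None
    for nest in NESTS:
        values = [prev if s == "PREV" else scores.get(s) for s in nest]
        values = [v for v in values if v is not None]
        if values:
            prev = sum(values) / len(values)
    return prev

def implied_weights():
    """Weight of each source in the final mean when all sources are present."""
    weights = {}
    carry = Fraction(1)  # share of the final mean held by the current nest
    for nest in reversed(NESTS):
        share = carry / len(nest)
        for source in nest:
            if source != "PREV":
                weights[source] = share
        carry = share  # the share passed down to the previous nest
    return weights

weights = implied_weights()
example = nested_mean({"Lynn": 100, "Rindermann IQ": 80})
```

Under these assumptions, for a country with complete data, each source in nest 4 receives a weight of 1/3, each in nest 3 a weight of 1/9, each in nest 2 a weight of 1/45, and each in nest 1 a weight of 1/180. If that is intended, the rationale for weighting the sources so unequally should be stated.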
“subjective best estimate was given” – Based on what? What is the rationale? This is neither good wording nor a good procedure.
Table 7: What is North Korea's IQ based on?