National IQs and Socioeconomic Development

Submission status
Reviewing

Submission Editor
Submission editor not assigned yet.

Authors
Leonardo Parra
Emil O. W. Kirkegaard

Title
National IQs and socioeconomic development

Abstract

Using 47 indicators of socioeconomic development and various sources of performance on cognitive tests, we constructed the SDI (socioeconomic development index) and a set of national IQs for 197 nations, the latter using no geographic imputations. Combining the various datasets reduced the estimated standard error of national IQs from 5.41 to 2.58, and a strong correlation between socioeconomic development and national IQs was observed (r = .88). 

Based on the prior that Flynn Effect gains do not pass measurement invariance, IQ scores should exhibit some non-negligible bias between countries. Empirical assessments of measurement invariance across nations find that violations are uncommon and are more prevalent in verbal than in nonverbal tests. In most countries, national IQs show high levels of reliability and validity, and we encourage their use in the literature.

 

Keywords
intelligence, IQ, economic development, economics


Reviewers ( 0 / 1 / 1 )
Reviewer 1: Accept
Reviewer 2: Considering / Revise

Fri 10 Jan 2025 19:19

Reviewer | Admin

This paper examines the relationship between national IQs and an index of socioeconomic development based on 47 indicators, and finds that it is strong. It also addresses various criticisms of national IQs. I would ask the authors to address the following minor points before I can recommend the paper for publication:

1. The authors write:

"Due to its implausibility, the estimate for North Korea (SDI = .98, which would make it the 47th most developed country in the world), was removed from the dataset, as it’s inconsistent with its very low GDP per capita."

This decision seems poorly justified. If there is reason to believe that North Korea's higher-than-expected performance on the index is due to fraudulent data then that would justify removing it, but a mere discrepancy with GDP per capita doesn't seem sufficient. The authors should provide more detail here.

2. The authors write:

"If the expected African IQ differs greatly from the observed one, then this difference is likely to be due to test bias or incorrect assumptions ... Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5."

This seems like a fairly pointless exercise, since we already know that the average IQ in Sub-Saharan Africa lies between 55 and 100. Elsewhere the authors suggest that an estimate of 70 seems plausible. The authors should consider removing this paragraph and Figure 5.

3. The authors write:

"In some cases, unweighted means are more accurate than sample size weighted means when the sample sizes of the studies are large, when the sample sizes are small ..."

The second comma here should be a full stop. There are several sentences like this. The paper needs an English check.

4. Lyman Stone has recently criticised national IQs in an article titled 'Fertility Really Isn't Dysgenic'. The authors should consider addressing his criticisms.

Reviewer | Admin

I requested a second reviewer for this paper, and the person in question asked to remain anonymous. I am unable to attach the RTF file of the review, so I have pasted the reviewer's comments in full instead. Given that it was extremely difficult to find volunteers for this paper, consider this the last round of reviews. Do not expect a further response from the second reviewer, as it is unlikely that I will contact this reviewer again for this paper, so try to answer as fully as possible. I'll wait and read your updated draft, and tell you whether any important points remain unanswered. The most serious issues are in the comments below on the methodology. I highly recommend responding to all points made.

______________________________

 

 

Review

OpenPsych (ODP-25-)

National IQs and Socioeconomic Development

This is a long manuscript (62 pages) on a) estimating national cognitive ability levels (national IQ), b) estimating the socioeconomic development levels of countries, and c) the relationship between these indices of national cognitive ability and socioeconomic development. The manuscript shows an in-depth engagement with the topic and uses sophisticated methods. However, there are also a number of points where I believe there is still room for improvement.

Suggestions in general

There should always be a date on page 1 of a manuscript.

Always add page numbers, also in the answer to the reviewers.

Abbreviations for numeric characters (indices), e.g. r, p, d, always as italics.

Correlations without a leading zero: ”Do not use a zero before a decimal when the statistic cannot be greater than 1 (proportion, correlation, level of statistical significance).” So no leading zero for correlations and p-values and standardized betas. See: https://apastyle.apa.org/instructional-aids/numbers-statistics-guide.pdf
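The leading-zero rule above can be applied mechanically when statistics are generated from code. The following is a minimal sketch, not an official APA tool; the function name `apa_stat` and its `decimals` parameter are hypothetical names chosen for illustration.

```python
def apa_stat(value: float, decimals: int = 2) -> str:
    """Format a statistic bounded by 1 (r, p, standardized beta) without a leading zero."""
    text = f"{value:.{decimals}f}"
    # Drop the zero before the decimal point: "0.88" -> ".88"
    if text.startswith("0."):
        return text[1:]
    # Same for negative values: "-0.63" -> "-.63"
    if text.startswith("-0."):
        return "-" + text[2:]
    # Values that round to 1.00 or -1.00 are left unchanged
    return text


print(apa_stat(0.88))   # .88
print(apa_stat(-0.63))  # -.63
```

Statistics that can exceed 1, such as standard errors or IQ means, keep their leading zero and should not be passed through this helper.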

Introduction

Page 2: If you introduce the topic of intelligence with "human capital" and economics, then you should also mention educational measures at the beginning, as these have long dominated economic research.

Page 3ff.: Since you will always talk about intelligence or cognitive abilities later, when you use the term "human capital" later, you should always speak of "cognitive human capital".

Data

Table 1 gives a very good overview. I have not checked the completeness of the reference list at the end; please check it yourself!

Regarding Patrinos & Angrist (2018): There are newer data sets:

Altinok, N., & Diebolt, C. (2024). Cliometrics of learning-adjusted years of schooling: Evidence from a new dataset. Cliometrica, 18(3), 691–764. https://doi.org/10.1007/s11698-023-00276-x

Angrist, N., Djankov, S., Goldberg, P. K., & Patrinos, H. A. (2021). Measuring human capital using global learning data. Nature, 592, 403–408.

There is a newer blog post by Cremieux Recueil. Please check if relevant.

Cremieux. (2025, January 16). National IQs are valid. National IQ estimates are robust, reliable, and realistic. Cremieux Recueil website: www.cremieux.xyz/p/national-iqs-are-valid

I don't quite understand what a Twitter source is supposed to do here (Recueil, C. (2023). With the latest PISA results …). Legitimate sources for PISA data are OECD reports.

Methodology

“Methodology” is usually “Method”.

”Missing values from the socioeconomic development indicators were imputed with multiple imputation by chained equations (m = 100), with a prediction threshold of r = 0.4, as many indicators are highly correlated with each other. This was reduced to 0.3 in the untransformed data, as the untransformed data was less intercorrelated than the transformed data.”

This may be a technically correct description (I'm not an expert here), but what does it mean? Please formulate it in a sentence or two that is generally understandable and then provide the technical and statistical details mentioned.

”Countries that had more than 45% of their data missing in socioeconomic indicators ... factor scores based on the variables that were not missing were calculated and then their rank relative to the sample was calculated. That rank was then regressed to the mean depending on the omega reliability of the estimate.”

Why was this alternative method chosen and what advantages does it have for countries with more than 45% missing data?

Table 2 gives a good overview.

What are the advantages of the ”spline iteration” method?

What are the advantages of the ”SVM iteration” method?

”On average, scores from these 16 methods correlated at .99, with intercorrelations ranging from .94 to .9999.” – Does that mean that all these methodological peculiarities lead to more or less the same result? If so, please highlight that.

”If a variable exhibited a strongly nonlinear relationship with HDI, where variance at extremes no longer predicted HDI”

What does that mean? This: ”If a variable has a strongly nonlinear relationship with the HDI, where values at the extremes no longer predict the HDI”?

Figure 1: Please show the linear and the nonlinear correlation.

“Sear (2022) also questions whether the figures that are estimated for the African countries are believable, as many of them fall in the 65 to 75 range …”

Also see:

Rindermann, H. (2013). African cognitive ability: Research, results, divergences and recommendations. Personality and Individual Differences, 55, 229–233.

Rindermann, H. (2024). Surprisingly low results from studies on cognitive ability in developing countries: Are the results credible? Discover Education, 3(1), 55. https://doi.org/10.1007/s44217-024-00135-5; www.researchgate.net/publication/380754429_Surprisingly_low_results_from_studies_on_cognitive_ability_in_developing_countries_are_the_results_credible

I recommend placing the section “3.2 Criticism of national IQs” in a suitable place, e.g. at the beginning or as a separate chapter, but not in 'Methodology'.

“Theoretically, some biases will deflate the African IQ relative to what would be expected from their true average levels of intelligence (low effort test takers, Flynn Effect related measurement variance, illiterates), and others will inflate it (use of primary/secondary school students which are less nationally representative in more uneducated countries, use of the standard deviation between groups instead of within groups, use of subtest differences instead of full scale differences).”

Not everything here is convincing. Underscored, also strange wording.

Measurement invariance: As far as I know, this was checked in student achievement tests (TIMSS, PISA etc.) but not in intelligence tests. Correct? E.g., Raven in Europe and in Africa?

Figures 14.1 to 14.4 have their original numbers. Should be changed.

“In practice, the differences between countries on PISA scores are extremely highly correlated and of roughly equal magnitude, as shown in Table 3. Therefore, it must be concluded that minor violations of measurement invariance on the PISA exams, and likely all scholastic tests, do not have a practically significant impact.” – Can this be transferred to the other student achievement tests and intelligence tests (often purely figural)?

Table 3, question: I have always wondered how it is that Cambodia should have such a low cognitive level, next to Vietnam, which has a good one. Is there something wrong? Are the peoples and cultures so different?

“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Please outline the type of prediction again in brackets, predictor variables? See also Rindermann (2024), where a prediction was also used. Are there differences?

“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Unfortunately, this is not an informative statement. Is the most probable value IQ 80.5 (as hardly readable in Figure 5, letters much too small)? Isn’t that too high?

Table 4: It would be good to have a final value, preferably by means of all or means of SCH and PSY.

Figure 6: All letters are too small, not legible. The correlation should be repeated in the notes to the figure. Explain the gray area as well.

Table 5: Sample size is number of countries?

Figure 7: Can the variable names be made more understandable?

Calculation of the standard error from the correlation (reliability) and standard deviation: Please provide the general formula and the source.

“Restricting to the earlier set of datasets that had no overlapping data (recent TIMSS/PIRLS/PISA results, Rindermann’s SAS estimates, and Becker’s quality weighted psychometric estimates)” – I do not understand, no overlapping data? And then a correlation?

“This is incorrect, as high levels of sample quality in certain regions may be indicative of fraud.” – I don't find that plausible.

“The hypothesis that lower IQ nations have more imprecisely estimated means by collecting estimates of national intelligence that were based on different data (recent TIMSS/PIRLS/PISA assessments, Becker’s psychometric estimates weighted by quality, Rindermann’s estimates of scholastic ability) and estimating the means and the standard errors, where the standard deviation of the sample averages divided by the square root of the number of samples.” – A sentence that is too long and incomprehensible.

“On average, a country’s estimated IQ has a standard error of 2.33, though this figure varies substantially by country: from 0.41 in Denmark to 12 in Cambodia.” – Please describe in an understandable way how standard errors were calculated for individual countries.

Table 6: Are standardized betas shown in Table 6? If so, add them. Tables need notes with explanations.

“To compute the intelligence of nations, measured IQ and achievement test results are used.” – Repeat the sources again.

“Rindermann included estimates that were based on performance in the mathematics olympiad for North Korea, Belarus, Brunei, Cambodia, Mauritania, Tajikistan, and Turkmenistan” – This should only be done if no better data (such as from PISA or IQ tests) are available.

“Samples were normed in a fashion that placed the UK at a mean of 99.26, which is roughly what the UK’s average psychometric IQ is compared to British Whites.” – Add for British Whites IQ 100.

I do not understand this procedure:

“An overall average was computed using nested means:

- Nest 1: Lynn’s estimates, Becker’s composite estimates, Becker’s scholastic estimates, and recent TIMSS math results.

- Nest 2: average of nest 1, recent TIMSS science results, average of Becker’s psychometric estimates, recent PIRLS results, World Bank test scores

- Nest 3: average of nest 2, recent PISA results, and Rindermann’s scholastic estimates

- Nest 4: average of nest 3, basic skills dataset, Rindermann’s IQ estimates”

Nest?

“subjective best estimate was given” – based on what? Rationale? There is no good wording and no good procedure.

Table 7: What is North Korea's IQ based on?

 

Reviewer | Admin

Here's the last part of the review (not displayed above due to limitation):

 

____________

 

 

Results

I had the impression that there were results before.

Figure 11: If linear and non-linear curves are shown, please always show the correlations for both.

Figure 11: If I understand the flattening of the curves at the lower and upper ends correctly, then either the IQs are underestimated or overestimated, or the abilities are so low or so high that they are no longer adequately reflected in the Socioeconomic Development Index (no longer an effect).

Figures 13 and 14: As is so often the case, letters are far too small, barely legible or illegible. Figures need notes with explanations (SDI).

Figure 15: I am surprised that there is such a high correlation here (r=−.63). As is so often the case, letters are far too small, barely legible or illegible. Add correlation for linear relationship (Pearson).

Discussion

(e.g. Africa) -> (e.g. in Africa)

“Our measurement of socioeconomic development, the SDI, correlates highly with the HDI and the SPI (r = .98 and .97, respectively), indicating that it has high levels of external validity.” – If the correlation is so high, what is the benefit of the new variables?

“We have estimated that the composite measurement (SE of 2.6) has 50% less error than the average dataset that measures proxies for national intelligence.” – Important point, should be verbalized even more, i.e. more measurement accuracy by adding data sets, etc. Also in the abstract.

“with nonverbal tests (e.g. mathematics) showing more invariance than verbal (e.g. reading) ones.” -> “with nonverbal tests (e.g. mathematics) showing more invariance (i.e. being better comparable) than verbal (e.g. reading) ones.”

“Some groups that are genetically highly similar still differ greatly in IQ:” – how do you know? Better: “Some groups with highly similar ancestry still differ greatly in IQ:”

Acknowledgement

What is “handling the DOI data”?

Appendix

Table A1: In the notes, add information about the sources used and the method used to determine the mean value.

“Figure A2. Relationship between national IQ (estimated in 2002 by Lynn) and national IQ (estimated in 2024).” – Add source.

Figures A4, A5: The correlation stands for the linear or nonlinear relationship? Always show both.

Figure A6: Letters are far too small, barely legible or illegible. Name the upper branches (2 to 10). That alone would be worth publishing elsewhere!!!

Table A5: The use of the three-digit country code is very good and should be standard! Unfortunately, this is missing in the other tables, e.g. in Table A1.

 

 

Author
Replying to Reviewer 1

This paper examines the relationship between national IQs and an index of socioeconomic development based on 47 indicators, and finds that it is strong. It also addresses various criticisms of national IQs. I would ask the authors to address the following minor points before I can recommend the paper for publication:

1. The authors write:

"Due to its implausibility, the estimate for North Korea (SDI = .98, which would make it the 47th most developed country in the world), was removed from the dataset, as it’s inconsistent with its very low GDP per capita."

This decision seems poorly justified. If there is reason to believe that North Korea's higher-than-expected performance on the index is due to fraudulent data then that would justify removing it, but a mere discrepancy with GDP per capita doesn't seem sufficient. The authors should provide more detail here.

The paper has since been reworked to be only about national IQs, as the part about socioeconomic development was secondary and detracted from its original purpose.

2. The authors write:

"If the expected African IQ differs greatly from the observed one, then this difference is likely to be due to test bias or incorrect assumptions ... Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5."

This seems like a fairly pointless exercise, since we already know that the average IQ in Sub-Saharan Africa lies between 55 and 100. Elsewhere the authors suggest that an estimate of 70 seems plausible. The authors should consider removing this paragraph and Figure 5.

I disagree. Suppose there were a large-scale project to measure the IQ of Sub-Saharan Africa using matrix tests that rivalled PISA/PIRLS in quality, so that the results could not be blamed on selection bias on the part of the researchers/aggregators or on nonrepresentative sampling, and the average found was 72. One could still consider that an anomalously low result and blame it on psychometric bias that does not reflect latent intelligence. The purpose of the exercise was to note that, based on what we know about genes and environments, any value between 55 and 100 is technically possible, though realistic estimates for the relevant parameters would yield a plausible range of somewhere between 70 and 90. The section was poorly phrased, however, and I rewrote it to make clear that its function was to show that an observed IQ of 70 could plausibly reflect latent intelligence that is that low as well.

3. The authors write:

"In some cases, unweighted means are more accurate than sample size weighted means when the sample sizes of the studies are large, when the sample sizes are small ..."

The second comma here should be a full stop. There are several sentences like this. The paper needs an English check.

That is correct. The next version of the paper will be subject to much more extensive grammar/spelling checks. 

4. Lyman Stone has recently criticised national IQs in an article titled 'Fertility Really Isn't Dysgenic'. The authors should consider addressing his criticisms.

I responded to these criticisms on my personal blog -- I'll clarify that poor quality of individual samples does not necessarily reflect that the dataset is of poor quality because standard errors are a function of both the standard deviation and the sample size.

Author

Note: any suggestion related to the socioeconomic portion of the article has been removed. 

Replying to Mon 03 Mar 2025 18:33

There should always be a date on page 1 of a manuscript.

On OP there are submission dates. I believe those suffice.

Always add page numbers, also in the answer to the reviewers.

Added.

Page 3ff.: Since you will always talk about intelligence or cognitive abilities later, when you use the term "human capital" later, you should always speak of "cognitive human capital".

Other organizations (in this case, the World Bank and the authors of the basic skills dataset) are not explicit about whether human capital is cognitive or physiological, so the regular term will still be used.

Data

Table 1 gives a very good overview. I have not checked the completeness of the reference list at the end; please check it yourself!

Regarding Patrinos & Angrist (2018): There are newer data sets:

Altinok, N., & Diebolt, C. (2024). Cliometrics of learning-adjusted years of schooling: Evidence from a new dataset. Cliometrica, 18(3), 691–764. https://doi.org/10.1007/s11698-023-00276-x

Angrist, N., Djankov, S., Goldberg, P. K., & Patrinos, H. A. (2021). Measuring human capital using global learning data. Nature, 592, 403–408.

There is a newer blog post by Cremieux Recueil. Please check if relevant.

Cremieux. (2025, January 16). National IQs are valid. National IQ estimates are robust, reliable, and realistic. Cremieux Recueil website: www.cremieux.xyz/p/national-iqs-are-valid

I don't quite understand what a Twitter source is supposed to do here (Recueil, C. (2023). With the latest PISA results …). Legitimate sources for PISA data are OECD reports.

I'll change this in the next version.

“Sear (2022) also questions whether the figures that are estimated for the African countries are believable, as many of them fall in the 65 to 75 range …”

Also see:

Rindermann, H. (2013). African cognitive ability: Research, results, divergences and recommendations. Personality and Individual Differences, 55, 229–233.

Rindermann, H. (2024). Surprisingly low results from studies on cognitive ability in developing countries: Are the results credible? Discover Education, 3(1), 55. https://doi.org/10.1007/s44217-024-00135-5; www.researchgate.net/publication/380754429_Surprisingly_low_results_from_studies_on_cognitive_ability_in_developing_countries_are_the_results_credible

I recommend placing the section “3.2 Criticism of national IQs” in a suitable place, e.g. at the beginning or as a separate chapter, but not in 'Methodology'.

I'm considering putting my response to the criticisms in the introduction.

“Theoretically, some biases will deflate the African IQ relative to what would be expected from their true average levels of intelligence (low effort test takers, Flynn Effect related measurement variance, illiterates), and others will inflate it (use of primary/secondary school students which are less nationally representative in more uneducated countries, use of the standard deviation between groups instead of within groups, use of subtest differences instead of full scale differences).”

Not everything here is convincing. Underscored, also strange wording.

None of these are certain factors, just things that could possibly interfere with accurate estimations of g between different nations.

Measurement invariance: As far as I know, this was checked in student achievement tests (TIMSS, PISA etc.) but not in intelligence tests. Correct? E.g., Raven in Europe and in Africa?

Figures 14.1 to 14.4 have their original numbers. Should be changed.

The images will be edited.

“In practice, the differences between countries on PISA scores are extremely highly correlated and of roughly equal magnitude, as shown in Table 3. Therefore, it must be concluded that minor violations of measurement invariance on the PISA exams, and likely all scholastic tests, do not have a practically significant impact.” – Can this be transferred to the other student achievement tests and intelligence tests (often purely figural)?

Visually, it seems to hold for other scholastic tests. I doubt this would be the case for particular tests of intelligence (e.g. reaction time, reading, forward digit span) due to differences in g-loadings. I'll mention that discrepancies between tests in the size of national differences are not necessarily indicative of bias, as they may instead be caused by differences in g-loadings.

Table 3, question: I have always wondered how it is that Cambodia should have such a low cognitive level, next to Vietnam, which has a good one. Is there something wrong? Are the peoples and cultures so different?

Not clear; the cause could be environmental in origin, or the country's intelligence may be misestimated.

“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Please outline the type of prediction again in brackets, predictor variables? See also Rindermann (2024), where a prediction was also used. Are there differences?

The predictor variables are the proportion of the difference in intelligence between American Whites and Blacks that is due to additive genes, the extent to which the environment of Sub-Saharan Africa depresses IQ relative to the American one, and the magnitude of the difference in intelligence between American Blacks and Whites. The expected Sub-Saharan African IQ can be calculated using arithmetic.

“Using these parameter ranges, the expected IQ of Sub-Saharan Africa could be anywhere from 55 to 100, as shown in Figure 5.” – Unfortunately, this is not an informative statement. Is the most probable value IQ 80.5 (as hardly readable in Figure 5, letters much too small)? Isn’t that too high?

A bit, though the purpose of the exercise was to note that an estimated IQ of 70 is low, but not impossible.

Table 4: It would be good to have a final value, preferably by means of all or means of SCH and PSY.

The final value is in the results section as well as the appendix.

Figure 6: All letters are too small, not legible. The correlation should be repeated in the notes to the figure. Explain the gray area as well.

The correlation is in the text. The grey area will be explained in the second version.

Table 5: Sample size is number of countries?

In the third column.

Figure 7: Can the variable names be made more understandable?

Given that the focus is being shifted away from socioeconomic development, this figure has been removed from the 2nd version.

Calculation of the standard error from the correlation (reliability) and standard deviation: Please provide the general formula and the source.

The formula is in the text. Source: Leong, F., & Huang, J. (2010). Standard error of measurement. In Encyclopedia of Research Design. SAGE Publications, Inc. https://doi.org/10.4135/9781412961288.n436
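For readers who want the general formula spelled out, the standard error of measurement is conventionally SD × sqrt(1 − reliability). The sketch below illustrates only that textbook formula; the function name and the example inputs (an SD of 15 and a reliability of .97) are assumptions chosen for illustration and are not values taken from the paper.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Textbook SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs, for illustration only
sem = standard_error_of_measurement(sd=15.0, reliability=0.97)
print(round(sem, 2))  # 2.6
```

A perfectly reliable measure (reliability of 1) gives an SEM of 0, and the SEM grows toward the full SD as reliability falls toward 0.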

“Restricting to the earlier set of datasets that had no overlapping data (recent TIMSS/PIRLS/PISA results, Rindermann’s SAS estimates, and Becker’s quality weighted psychometric estimates)” – I do not understand, no overlapping data? And then a correlation?

Some of the datasets overlap in terms of sources, e.g. the World Bank Test Scores and Basic Skills Dataset both use PISA scores. The phrase was removed to avoid confusion.

“This is incorrect, as high levels of sample quality in certain regions may be indicative of fraud.” – I don't find that plausible.

Maybe it's not true. Who knows.

“The hypothesis that lower IQ nations have more imprecisely estimated means by collecting estimates of national intelligence that were based on different data (recent TIMSS/PIRLS/PISA assessments, Becker’s psychometric estimates weighted by quality, Rindermann’s estimates of scholastic ability) and estimating the means and the standard errors, where the standard deviation of the sample averages divided by the square root of the number of samples.” – A sentence that is too long and incomprehensible.

Fixed.

“On average, a country’s estimated IQ has a standard error of 2.33, though this figure varies substantially by country: from 0.41 in Denmark to 12 in Cambodia.” – Please describe in an understandable way how standard errors were calculated for individual countries.

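For each country, the standard deviation of its estimates across datasets divided by the square root of the number of estimates: SD/sqrt(n).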
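As a minimal sketch of SD/sqrt(n) for a single country: each input value stands for one dataset's estimate of that country's IQ. The use of the sample (n − 1) standard deviation is an assumption, since the reply does not say which variant was used, and the function name and numbers are hypothetical.

```python
import math
from statistics import stdev


def standard_error_of_mean(estimates: list[float]) -> float:
    """SD of the estimates divided by sqrt(n), using the sample (n - 1) SD."""
    if len(estimates) < 2:
        raise ValueError("at least two estimates are needed to compute an SD")
    return stdev(estimates) / math.sqrt(len(estimates))


# Hypothetical estimates for one country from three independent datasets
se = standard_error_of_mean([98.0, 100.0, 102.0])
print(round(se, 4))  # 1.1547
```

Under this formula a country with few estimates, or with estimates that disagree strongly, ends up with a large standard error.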

Table 6: Are standardized betas shown in Table 6? If so, add them. Tables need notes with explanations.

I don't think the standardized betas are relevant here. The purpose of the table is to show that the higher standard errors in lower IQ countries are not due to there being less data available for those countries.

“To compute the intelligence of nations, measured IQ and achievement test results are used.” – Repeat the sources again.

I'll state that the sources are in Table 1.

“Rindermann included estimates that were based on performance in the mathematics olympiad for North Korea, Belarus, Brunei, Cambodia, Mauritania, Tajikistan, and Turkmenistan” – This should only be done if no better data (such as from PISA or IQ tests) are available.

“Samples were normed in a fashion that placed the UK at a mean of 99.26, which is roughly what the UK’s average psychometric IQ is compared to British Whites.” – Add for British Whites IQ 100.

I do not understand this procedure:

“An overall average was computed using nested means:

- Nest 1: Lynn’s estimates, Becker’s composite estimates, Becker’s scholastic estimates, and recent TIMSS math results.

- Nest 2: average of nest 1, recent TIMSS science results, average of Becker’s psychometric estimates, recent PIRLS results, World Bank test scores

- Nest 3: average of nest 2, recent PISA results, and Rindermann’s scholastic estimates

- Nest 4: average of nest 3, basic skills dataset, Rindermann’s IQ estimates”

Nest?

I'm not sure what else to call them. Basically the idea is:

Nest 1 = mean(a, b, c, d)

Nest 2 = mean(nest 1, e, f)

Nest 3 = mean(nest 2, g, ...)
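The nested-mean idea can be sketched in a few lines: each nest is averaged together with the result of the previous nest. This is an illustration of the procedure as described in this thread, not the authors' code; the function name, the handling of missing sources (entries given as `None` are skipped), and the example values are assumptions.

```python
from statistics import mean
from typing import Optional, Sequence


def nested_mean(nests: Sequence[Sequence[Optional[float]]]) -> Optional[float]:
    """Average each nest together with the running result of the previous nests."""
    running: Optional[float] = None
    for nest in nests:
        # Assumption: sources with no estimate for this country are skipped
        values = [v for v in nest if v is not None]
        if running is not None:
            values = [running] + values
        if values:
            running = mean(values)
    return running


# Hypothetical estimates for one country, grouped into three nests
nests = [
    [100.0, 96.0, 98.0, 102.0],  # nest 1: a, b, c, d -> 99.0
    [95.0, 97.0],                # nest 2: mean(99.0, 95.0, 97.0) -> 97.0
    [99.0],                      # nest 3: mean(97.0, 99.0) -> 98.0
]
print(nested_mean(nests))  # 98.0
```

One consequence of nesting is that sources entering at later nests receive more weight in the final value than sources entering at nest 1.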

 

“subjective best estimate was given” – based on what? Rationale? There is no good wording and no good procedure.

I posted a supplement where I go through my reasoning for each estimation. In the article itself, I note that this procedure was largely unnecessary as the subjective estimates correlated highly with the mathematical estimates that would have been given. 

Table 7: What is North Korea's IQ based on?

Two samples of ~30 people, if I recall correctly. 

 

Author
Replying to Mon 03 Mar 2025 18:34

Here's the last part of the review (not displayed above due to limitation):

 

____________

 

 

Results

I had the impression that there were results before.

Figure 11: If linear and non-linear curves are shown, please always show the correlations for both.

Figure 11: If I understand the flattening of the curves at the lower and upper ends correctly, then either the IQs are underestimated or overestimated, or the abilities are so low or so high that they are no longer adequately reflected in the Socioeconomic Development Index (no longer an effect).

Figures 13 and 14: As is so often the case, letters are far too small, barely legible or illegible. Figures need notes with explanations (SDI).

Figure 15: I am surprised that there is such a high correlation here (r=−.63). As is so often the case, letters are far too small, barely legible or illegible. Add correlation for linear relationship (Pearson).

The figure is in high resolution and can be zoomed into.

Discussion

(e.g. Africa) -> (e.g. in Africa)

“Our measurement of socioeconomic development, the SDI, correlates highly with the HDI and the SPI (r = .98 and .97, respectively), indicating that it has high levels of external validity.” – If the correlation is so high, what is the benefit of the new variables?

“We have estimated that the composite measurement (SE of 2.6) has 50% less error than the average dataset that measures proxies for national intelligence.” – Important point, should be verbalized even more, i.e. more measurement accuracy by adding data sets, etc. Also in the abstract.

It's in the abstract.
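For reference, the size of the reduction can be checked from the two standard errors given in the abstract (5.41 and 2.58). The sketch below also shows an idealized benchmark: the SE of an unweighted mean of k independent estimates with equal SE. Independence and equal SEs are assumptions made here for illustration, not the paper's procedure.

```python
import math

se_single = 5.41     # SE before combining datasets (from the abstract)
se_composite = 2.58  # SE of the composite (from the abstract)

# Proportional reduction in standard error
reduction = 1 - se_composite / se_single

def se_of_mean(se, k):
    """SE of an unweighted mean of k independent estimates with equal SE."""
    return se / math.sqrt(k)

# Number of idealized independent datasets implied by the observed reduction
k_equivalent = (se_single / se_composite) ** 2

print(f"reduction: {reduction:.1%}")
print(f"equivalent independent datasets: {k_equivalent:.1f}")
```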

“with nonverbal tests (e.g. mathematics) showing more invariance than verbal (e.g. reading) ones.” -> “with nonverbal tests (e.g. mathematics) showing more invariance (i.e. being better comparable) than verbal (e.g. reading) ones.”

Fixed.

“Some groups that are genetically highly similar still differ greatly in IQ:” – how do you know? Better: “Some groups with highly similar ancestry still differ greatly in IQ:”

Fixed.

Acknowledgement

What is “handling the DOI data”?

Appendix

Table A1: In the notes, add information about the sources used and the method used to determine the mean value.

The entire methodology outlines this; why repeat it?

“Figure A2. Relationship between national IQ (estimated in 2002 by Lynn) and national IQ (estimated in 2024).” – Add source.

Fixed.

Figures A4, A5: The correlation stands for the linear or nonlinear relationship? Always show both.

Figure A6: Letters are far too small, barely legible or illegible. Name the upper branches (2 to 10). That alone would be worth publishing elsewhere!!!

Table A5: The use of the three-digit country code is very good and should be standard! Unfortunately, this is missing in the other tables, e.g. in Table A1.

Bot

Authors have updated the submission to version #4

Reviewer | Admin

I approve the paper for publication.

Reviewer | Admin

Your response to the reviewers is mostly fine, but I need to remind you that you need to upload all your files (especially the data used in the analysis) at OSF and display the link in your submission. If possible, display this OSF link at the end of the paper, right before the appendix or reference section.

Now, since I have read the draft carefully, it would be a waste not to give you my input, although you are free to ignore it, since I'm not a reviewer here.

Below is my comment regarding your response.

In practice, the differences between countries on PISA scores are extremely highly correlated and of roughly equal magnitude, as shown in Table 3. Therefore, it must be concluded that minor violations of measurement invariance on the PISA exams

A high correlation neither proves nor makes it more likely that the tests are unbiased. There could be uniform bias or an external validity issue (PISA may not measure real-world outcomes equally well across all countries). Remember also that the correlation structure says nothing about the mean structure. Even if there is no predictive bias, some absolute scores could still be distorted.

The same argument applies to this passage later in the paper: "Therefore, it must be concluded that minor violations of measurement invariance on the PISA exams, and likely all scholastic tests, do not have a practically significant impact."
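To illustrate why a high correlation between tests cannot rule out bias in the means, here is a minimal sketch with made-up numbers. The 40 hypothetical countries, the error patterns, and the 30-point bias are all assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical country means on the true ability scale
true_ability = [400 + 5 * i for i in range(40)]

# Two tests with small, different measurement errors
test_a = [t + (5 if i % 2 else -5) for i, t in enumerate(true_ability)]
test_b = [t + 5 * ((i % 3) - 1) for i, t in enumerate(true_ability)]

# Assumed uniform bias: both tests understate the first 10 countries by 30 points
biased_a = [s - 30 if i < 10 else s for i, s in enumerate(test_a)]
biased_b = [s - 30 if i < 10 else s for i, s in enumerate(test_b)]

r_biased = pearson(biased_a, biased_b)
distortion = [b - s for b, s in zip(biased_a, test_a)][:10]

print(f"r between the two biased tests: {r_biased:.3f}")
print(f"mean shift in the biased countries: {sum(distortion) / 10:.1f}")
```

The two biased tests still correlate very highly with each other, yet the means of the affected countries are off by the full 30 points on both.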

Table 5: Sample size is number of countries?

In the third column. 

It seems to me the reviewer asked, not where the sample size is, but whether the sample size column refers to the data points for each country.

I don't think the standardized betas are relevant here. The purpose of the table is to show that the higher standard errors in lower IQ countries are not due to there being less data available for those countries. 

If you don't intend to include standardized betas (in Table 4), at the very least specify in the table that the estimates are unstandardized, to avoid confusion.

Figure A2. Relationship between national IQ (Lynn, 2002) and national IQ (estimated in 2024).

I still don't see Lynn 2002 in the reference list at the end.

Figures A4, A5: The correlation stands for the linear or nonlinear relationship? Always show both.

You did not address this request of reviewer 2. If you can display the correlation for the non-linear relationship, do so (even if it has to be displayed somewhere else, e.g., in the title or description of the figure).
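One possible way to report both is to give Pearson's r for the linear relationship alongside a rank-based coefficient such as Spearman's rho for the monotonic, possibly nonlinear one. This is only a sketch with toy data. The choice of Spearman's rho is an assumption here, and the R from the fitted nonlinear curve would be another reasonable option.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; this simple version assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data: monotonic but clearly nonlinear
x = list(range(1, 11))
y = [v ** 3 for v in x]

r_linear = pearson(x, y)
rho_monotonic = spearman(x, y)
print(f"Pearson r = {r_linear:.3f}, Spearman rho = {rho_monotonic:.3f}")
```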

There is a Table 5 in the main text and another Table 5 in the appendix. The appendix Table 5 should probably be Table A2 (in the table label as well as in the appendix text).

 

Below is my comment on your newest draft. Here too I'll let you decide what to make of it. But make sure to fix citations etc.

There is, however, no Table 1 at all. The Wu et al. 2007 Figure 1 should be Table 1 (also refer to it as Table 1 in the main text).

most of the researchers who are critical of Lynn’s national IQ dataset do not support the theory of genetic differences in intelligence between races

Some people use strawman arguments by implying that hereditarians such as Lynn, Jensen, and Rushton advanced the idea that 100% of the difference is genetic. It would therefore be better if your sentence conveyed more clearly that you are referring to group differences being partially genetic. Otherwise, some would take advantage by misconstruing "the theory of genetic differences" as referring to a 100% genetically based difference.

wrote a defense of the use of national IQs, mostly in the methods section.

Since you accepted reviewer 2's request to move this section to the introduction, this portion should be rewritten or removed, especially since in the new draft the response to those criticisms starts in the paragraphs right below that sentence.

One observation about referencing in the main text: remember to cite properly next time. Academics typically cite authors in alphabetical order (this is highly recommended), but in your case the order is all over the place.

For comparing national means, scalar invariance is the most important test of measurement invariance that needs to be satisfied.

To clarify, all of the first three steps are equally important. If configural or metric invariance (or both) is violated, then you are comparing apples and oranges, which is no less bad than having intercept bias. Here, your sentence seems to suggest (at least to me) that the only important invariance model is the intercept (scalar) one.

Strict measurement invariance was held within Anglo and East Asian cultural groups on the 1999 TIMSS tests, though only weak (metric, but not scalar) measurement invariance was held between the cultural groups (Wu et al., 2007), as shown in Figure 1

Older papers (and even some recent ones) use outdated cutoffs. Wu et al. use a CFI cutoff of .01, but more recent work shows that even a CFI difference above .003 can lead to substantial scalar bias. Given that most comparisons in their Table 3 display a rounded value of .01, I can tell that full scalar invariance is violated in nearly all group comparisons.

Their methodology is limited by the fact that measurement invariance was assessed at the factor level, as groups are likely to differ in general and specific ability

They assess math subtests. You would be correct only if the subtests measured an ability other than math (i.e., if the unidimensionality assumption were violated). Also, item-level analysis does not preclude further testing at the subtest level, as these two analyses do not provide the same information about test bias.

Another study of international test bias of the PISA item data on the reading subtest found that scalar invariance was violated in most nations, with the magnitude of invariance ranging from 0.041 in Canada to 0.93 in Kyrgyzstan (Asil & Brown, 2015).

This is reported quite badly. You should report the findings more accurately: they use dMACS as an effect size measure of bias. You should also say that it is interpreted like Cohen's d, with thresholds for small, moderate, and large effects at 0.2, 0.5, and 0.8, respectively. Again, this paper uses an outdated cutoff (CFI > .01), so their conclusion about scalar invariance is very misleading.
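As a small sketch of how the two quoted dMACS values would read under those thresholds: the function name and the "negligible" label for values below 0.2 are my own assumptions, not terms taken from Asil & Brown.

```python
def classify_dmacs(d):
    """Label a dMACS value using the Cohen's d style thresholds 0.2, 0.5, 0.8."""
    d = abs(d)
    if d < 0.2:
        return "negligible"  # assumed label for values below the "small" threshold
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "moderate"
    return "large"

# The two values quoted from the paper under review
reported = {"Canada": 0.041, "Kyrgyzstan": 0.93}
labels = {country: classify_dmacs(d) for country, d in reported.items()}
print(labels)
```

Reporting the range this way makes clear that it runs from essentially no bias to a large one.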

On average, a country’s estimated IQ has a standard error of 2.33, though this figure varies substantially by country: from 0.41 in Denmark to 12 in Cambodia

If you can add a footnote or a short sentence to tentatively explain why the SE for Cambodia is so high (few data points or extreme variability?), this would be very useful.

Samples that displayed unusual heterogeneity or extreme means in either direction were manually reviewed

It is typically recommended to state explicitly which criteria were used to flag data points as "odd" values.

Regarding the appendix, it would have been much more helpful if there were a description somewhere (in the main text or in the appendix) explaining the purpose of these tables and figures. Without such a description, no one really knows what your main focus is with these displays. A description also allows the reader to understand technical terms such as Gower distance.

Take this for instance: Figure A8. IQ by country (alternative version).

Relative to what? Figure 4? Better be clear and professional in general.

Or figure A10, maybe you could explain to the non-initiated reader what this means -> log(GNI) = 0.0876*NIQ + 2.09.
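For instance, a short worked reading of the equation could look like the sketch below. It assumes that log() is the natural logarithm, which is not stated here. If it were base 10, the per-point factor would be 10**0.0876 instead.

```python
import math

SLOPE, INTERCEPT = 0.0876, 2.09  # coefficients from the Figure A10 equation

def predicted_gni(niq):
    """Predicted GNI for a given national IQ, assuming a natural-log scale."""
    return math.exp(SLOPE * niq + INTERCEPT)

# Multiplicative change in predicted GNI per IQ point and per 10 IQ points
ratio_per_point = math.exp(SLOPE)
ratio_per_10_points = math.exp(10 * SLOPE)

print(f"per IQ point: x{ratio_per_point:.3f}")
print(f"per 10 IQ points: x{ratio_per_10_points:.2f}")
```

Under that assumption, each additional IQ point corresponds to roughly 9% higher predicted GNI, which is the kind of plain-language reading a note under the figure could give.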

CIA 2023; DOI foundation 2024; IMF 2024; Legatum prosperity index; Qi et al 2013; United Nations 2011; Wikipedia 2024a; World Bank Open Data 2023a, 2023b, 2023c; and World Population Review 2024a, 2024b, 2024c, 2024d are mentioned in the reference list but not in the text.

Leong & Huang, 2010 is mentioned in the text but not in the reference list.

You have OECD 2022 in the reference list, but in Table 2 it's OECD 2023.