Psychometric Analysis of the Multifactor General Knowledge Test



Sebastian Jensen

Previous Versions

Version #9 - Accepted - Sep 25th
Version #8 - Sep 5th
Version #7 - Jul 6th
Version #6 - Jul 6th
Version #5 - Jan 10th
Version #4 - Jan 2nd
Version #3 - Nov 29th
Version #2 - Sep 26th
Version #1 - Jun 20th

Submission status
Reviewing

Submission Editor
Noah Carl

Authors
Emil O. W. Kirkegaard
Sebastian Jensen

Title
Psychometric Analysis of the Multifactor General Knowledge Test

Abstract

The Multifactor General Knowledge Test on the openpsychometrics website was evaluated on multiple dimensions: its reliability, its ability to reproduce group differences that are known to exist, how it should be scored, whether older individuals score higher, and its dimensionality. The best scoring method was to treat every checkbox as an item and add up the correct and incorrect scores. This yielded a highly reliable test (ω = 0.93) with a low median completion time (577 seconds) and a high ceiling (IQ = 149). One set of items (internet abbreviations) was found to have very low g-loadings, so we recommend removing them. The test also showed age, national, and gender differences that replicate previous literature.
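A minimal sketch of the two computations the paragraph above relies on, under stated assumptions. We assume that "treat every checkbox as an item" means each checkbox is scored 1 when its state (checked or unchecked) matches the answer key and 0 otherwise, with the total being the sum of these item scores; the answer key and the function names `score_checkboxes` and `omega_total` are hypothetical. The reliability function computes McDonald's omega for a single-factor model with standardized loadings, which is a simplification: the reported ω = 0.93 may have been estimated from a different (e.g., hierarchical) factor model.

```python
from typing import Sequence


def score_checkboxes(responses: Sequence[bool], key: Sequence[bool]) -> list[int]:
    """Score each checkbox as its own item (hypothetical scoring rule).

    An item scores 1 if the checkbox state matches the key
    (checked when it should be, or left unchecked when it should be), else 0.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [int(r == k) for r, k in zip(responses, key)]


def omega_total(loadings: Sequence[float]) -> float:
    """McDonald's omega for a one-factor model with standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses),
    where each uniqueness is 1 - loading^2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)
```

For example, `sum(score_checkboxes([True, False, True, False], [True, True, False, False]))` gives a total of 2, and four items each loading 0.5 give ω = 4/7 ≈ 0.57; longer scales with higher loadings push ω toward the range reported above.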

The test was clearly biased against non-Anglos, especially in the aesthetic, cultural, literary, and technical knowledge sections. Nevertheless, DIF testing suggested that the test was not biased in favor of Anglo countries, calling into question the usefulness of DIF testing for identifying highly biased tests. Between the sexes, DIF testing found that many items were biased against each sex, but the magnitude of the bias was similar for both sexes. We strongly recommend using this test to examine the general knowledge of native English speakers, and using a cultural and linguistic translation for non-English speakers.
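For readers unfamiliar with DIF testing, the sketch below shows one classical DIF statistic, the Mantel-Haenszel common odds ratio. The source does not state which DIF method the paper used, so this is a swapped-in illustration, not the authors' procedure. Respondents are stratified by total score, and within each stratum the odds of answering an item correctly are compared between a reference group (coded 0, e.g. Anglos) and a focal group (coded 1). A ratio near 1 indicates no DIF; values above 1 indicate the item favors the reference group. The function and argument names are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def mantel_haenszel_odds_ratio(
    item: Sequence[int], group: Sequence[int], total: Sequence[int]
) -> float:
    """Mantel-Haenszel common odds ratio for one item.

    item:  1 = correct, 0 = incorrect
    group: 0 = reference group, 1 = focal group
    total: matching variable (e.g. total test score) used to form strata
    """
    # Per stratum: [A, B, C, D] =
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, t in zip(item, group, total):
        cell = (0 if x else 1) + (2 if g else 0)
        strata[t][cell] += 1

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d  # never 0: each stratum holds at least one person
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ZeroDivisionError("odds ratio undefined: B*C is 0 in every stratum")
    # > 1: item favours the reference group; < 1: favours the focal group
    return numerator / denominator
```

Real analyses also decide whether the studied item is included in the matching score and pool sparse strata; those choices are omitted here for brevity.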

Keywords
intelligence, sex differences, statistics, knowledge, methods

Supplemental materials link
https://osf.io/erx6q/


Reviewers ( 0 / 2 / 2 )
Reviewer 1: Considering / Revise
Reviewer 2: Accept
Reviewer 3: Considering / Revise
Reviewer 4: Accept

Tue 20 Jun 2023 02:43

Reviewer 2

Fri 14 Jun 2024 22:18

Reviewer | Admin

Speaking of the review status: although I have already accepted your paper, I would like to provide some brief commentary on the final version once it is updated, especially in light of the comments added just above me by Reviewer 4.

Sebastian Jensen

Sat 06 Jul 2024 13:18

Author

Replying to Reviewer 4

Replying to Sebastian Jensen

The introduction lacks an explanation of what the goal of the paper is, or which hypotheses your study intends to test. Please add this, as it is a fundamental part of any scientific paper.

"Given that the sex difference in brain size is about d = 0.84 (Nyborg, 2005), the predicted male-female standardized difference in intelligence is 0.24." Does this account for body size differences?

An explanation of the goal has been added to both the abstract and the introduction to make it unambiguous. Ancillary analyses have been moved to the supplement.

The napkin calculation does not account for body size differences. I find the implied objection (if one is intended) unconvincing, as there is, to my knowledge, no evidence to date that the ratio of brain size to body size is itself causally related to intelligence within individuals, independent of the effects of body size and brain size on intelligence. The sex gap in brain size also persists after controlling for height, so even if the ratio theory were true, it would be irrelevant to this case in particular.
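The napkin calculation can be written out as follows. The source does not state which brain size-intelligence correlation was used; the value below is only the one implied by the two reported numbers, assuming the predicted gap is the brain-size gap scaled by that correlation (a simple linear model that ignores any other causes of sex differences in intelligence).

```latex
\hat{d}_{\text{IQ}} \approx r_{\text{brain,IQ}} \, d_{\text{brain}}
\quad\Rightarrow\quad
r_{\text{brain,IQ}} \approx \frac{\hat{d}_{\text{IQ}}}{d_{\text{brain}}} = \frac{0.24}{0.84} \approx 0.29
```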

Forum Bot

Sat 06 Jul 2024 13:19

Bot

Authors have updated the submission to version #6

Sebastian Jensen

Sat 06 Jul 2024 13:42

Author

A few comments on the new version:

1. I looked over the code. Predictably, it was poorly written, since I wrote it a year ago, but in practice this did not seem to affect the results.

2. A bibliography was added, as I am approaching the final version of the paper.

3. A few minor things were changed; notably, I removed the optimal method (because it was actually suboptimal) and added a spline model instead. I also added that the sex difference in brain size persists after controlling for body size.

Forum Bot

Sat 06 Jul 2024 13:42

Bot

Authors have updated the submission to version #7

Reviewer 2

Tue 16 Jul 2024 20:07

Reviewer | Admin

After reading this version, I have nothing to add beyond what I previously pointed out. Figure 3 is interesting, as it suggests that poorly motivated participants are likely to rush through the test to finish it as quickly as possible. This is a common problem with low-stakes tests, and there are several ways of dealing with it. Since you have noticed this issue, I wish you had run robustness tests on participants who meet certain criteria (for instance, using different completion-time cutoffs). Given the pattern in Figure 3, I don't think this would change the results much, but please keep this point in mind for your later research, because in recent years I have read multiple papers pointing out this issue. The ideal situation is having response-time data for each item, since one could then model rapid-guessing behaviour (a proxy for item-level motivation).
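The robustness check suggested above could look like the following sketch: re-estimate a group difference (here a standardized mean difference, Cohen's d with a pooled SD) after excluding participants who finished faster than each of several completion-time cutoffs. The cutoffs, the 0/1 group coding, and the function names are hypothetical, and the paper's actual analyses may differ. This uses only total completion time; per-item response times, as noted above, would allow modelling rapid guessing directly.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def d_by_time_cutoff(
    scores: Sequence[float],
    groups: Sequence[int],
    seconds: Sequence[float],
    cutoffs: Sequence[float],
) -> dict[float, tuple[int, float]]:
    """For each cutoff, drop participants faster than it and recompute d.

    Returns {cutoff: (number of participants kept, d for group 1 minus group 0)}.
    """
    results = {}
    for cut in cutoffs:
        kept = [(s, g) for s, g, t in zip(scores, groups, seconds) if t >= cut]
        g1 = [s for s, g in kept if g == 1]
        g0 = [s for s, g in kept if g == 0]
        if len(g1) < 2 or len(g0) < 2:
            continue  # too few participants left to estimate both variances
        results[cut] = (len(kept), cohens_d(g1, g0))
    return results
```

If the estimates are stable across cutoffs (e.g. `cutoffs=[0, 120, 300]`), rushed responding is unlikely to be driving the result.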

A few minor observations:

the results of an oblimin rotated factor analysis with 7 factors is available in Table 1.

It should be "are available"

In light of the fact that adjusting for DIF bias between Anglos and Germans increases the difference slightly, the value of this result is quest

The sentence is unfinished.

Reviewer 4

Tue 16 Jul 2024 20:13

Reviewer

Replying to Forum Bot

Authors have updated the submission to version #7

The authors have satisfactorily addressed my concerns, so I have updated my decision to accept.

Sebastian Jensen

Thu 05 Sep 2024 02:24

Author

Replying to Reviewer 2


The errors have been fixed in the new version.

Forum Bot

Thu 05 Sep 2024 02:25

Bot

Authors have updated the submission to version #8

Forum Bot

Wed 25 Sep 2024 16:01

Bot

The submission was accepted for publication.
