
# [ODP] Examining the ICAR and CRT tests in a Danish student sample

Dalliard,

Thanks for the comments. Quotations below are from you unless otherwise stated.

Quote:What do those numbers (1, 4, 1) stand for, percentages?

They were raw numbers. I have changed them to percentages and checked that they sum to 100%.

Quote:According to the explanation here, IRT factor analysis means that first a tetrachoric correlation matrix is calculated from item data and then the correlation matrix is factor-analyzed in the usual way. In other words, it is assumed that normally distributed continuous latent variables underlie dichotomous item responses. Add this explanation to the paper because it's not obvious otherwise.

Yes, it uses tetrachoric correlations, which estimate what the Pearson correlations would be if the variables weren't dichotomous. I have added:

Quote:To examine the internal structure, we used both traditional factor analysis (FA) and item response theory factor analysis (IRT FA) on all the cognitive items to extract 1 factor. Although popular, principal components analysis has been shown to give misleading results in some cases.[6] For this reason, we used another extraction method which by default is MinRes, but it does not appear to make a large difference which method is used.[6] The functions fa() and irt.fa() from the psych package were used for extraction.[7] The difference between FA and IRT FA is that latter is done on the correlation matrix calculated using tetrachronic correlations. A tetrachronic correlation is designed to estimate the regular Pearson correlation when used on dichotomous variables such as correct/incorrect items

Quote:On what basis are those values high? Unless you have some good standard to compare them against, I would say that the values suggest reasonable or good congruence between languages, especially as your sample is small.

In my study of item-level data from SPM studies, I found a mean item pass rate correlation of .88 across 66 comparisons. These were very diverse samples (Roma, White, Colored, Black, Indian, North African). The SPM has 60 items, making it much easier to attain high correlations than with only 16 items. Thus, the pass rate correlation of .85 obtained in this study seems remarkably good given that this is a partly verbal translated test.

http://emilkirkegaard.dk/en/?p=4971

I'm not aware of any standard of comparison for standard deviations, but I know that they fluctuate more than the item pass rates. A value of .63 thus seems good to me.
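To make the pass rate comparison concrete, here is a minimal Python sketch of how an item pass rate correlation between two samples can be computed. The response matrices are hypothetical toy data (4 test takers, 4 items each), not the study's data, and the helper names `pass_rates` and `pearson` are made up for illustration.

```python
from math import sqrt


def pass_rates(responses):
    """Proportion of correct (1) answers per item.

    Rows are test takers, columns are items.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(n_items)]


def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: the same 4 items answered by two different samples.
sample_a = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
sample_b = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

rates_a = pass_rates(sample_a)  # [1.0, 0.75, 0.5, 0.0]
rates_b = pass_rates(sample_b)  # [1.0, 0.75, 0.25, 0.25]
r = pearson(rates_a, rates_b)
print(round(r, 3))
```

With only a handful of items, a single item that behaves differently in one sample moves the correlation a lot, which is one reason a value obtained from 16 items is harder to compare with one obtained from 60.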

Quote:The CIs are the same for both tests, so the amount of uncertainty is the same. I would just say that the 95% CI for CRT includes zero.

I have changed the text to:

Quote:Both predictors had positive betas; however, the beta for CRT was only .11 and the confidence interval included 0. Due to the small sample, this may either be because it has no incremental validity or because power was too low to detect it.

---

I have updated the OSF with the new files. PDF draft #7

https://osf.io/2ipgb/
1) "A large fraction of cognitive and personality tests are privately owned and are usually very expensive to obtain legally (e.g. the Wechschler test is owned by Pearson)."

It's spelled Wechsler. Given that individually administered tests like Wechsler's comprise booklets and physical stimulus materials (http://mla-s2-p.mlstatic.com/wisc-iiiequipo-completo-en-bolso-y-con-cajitas-de-acrilico-4061-MLA120359773_1740-F.jpg), the suggestion that you could obtain them illegally is unusual (i.e., you'd have to physically steal them).

2) "The difference between FA and IRT FA is that latter is done on the correlation matrix calculated using tetrachronic correlations. A tetrachronic correlation is designed to estimate the regular Pearson correlation when used on dichotomous variables such as correct/incorrect items."

This is not quite correct. Pearson correlations can be calculated for dichotomous data -- that's what the "traditional factor analysis" in your paper is about. IRT FA is based on the assumption that underlying the observed dichotomous data are normally distributed continuous latent variables. Tetrachoric (not tetrachronic!) correlations are used to estimate correlations between these latent variables.

I would put it like this:

"The difference between FA and IRT FA is that the latter is done on a correlation matrix calculated using tetrachoric correlations. A tetrachoric correlation estimates the Pearson correlation between two normally distributed continuous latent variables that are assumed to underlie dichotomous variables such as correct/incorrect items."
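The distinction can be illustrated with a small Python sketch that computes both coefficients from the same 2x2 table of two dichotomous items. The counts are hypothetical. The tetrachoric value uses the classical cos-pi approximation, which is an assumption made here for brevity; it is exact only when both items are split at the median, and a dedicated routine such as the one behind irt.fa() would be expected to use a more exact estimator.

```python
from math import cos, pi, sqrt


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two 0/1 variables, from 2x2 table counts.

    a = both correct, b = only item 1 correct,
    c = only item 2 correct, d = both incorrect.
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation to the tetrachoric correlation.

    Estimates the correlation between the two normally distributed latent
    variables assumed to underlie the items. Requires b > 0 and c > 0.
    """
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Hypothetical counts for 100 test takers, both items with a 50% pass rate.
a, b, c, d = 35, 15, 15, 35

phi = phi_coefficient(a, b, c, d)     # 0.4
tet = tetrachoric_cos_pi(a, b, c, d)  # about 0.588
print(round(phi, 3), round(tet, 3))
```

The Pearson (phi) value is lower because dichotomizing a continuous variable discards information; the tetrachoric value is an estimate of the correlation before that loss, under the normality assumption.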

3) The caption of Table 6 should read "... Parameter ESTIMATES from OLS regression."

4) You apparently have the subjects' ages. How do they influence test scores? It would make sense to adjust the scores for age.

5) There are, in principle, ethical issues when you recruit human subjects. How did you present your project to the participants?
Dalliard,

Thanks for commenting again.

Quote:It's spelled Wechsler. Given that individually administered tests like Wechsler's comprise booklets and physical stimulus materials, the suggestion that you could obtain them illegally is unusual (i.e., you'd have to physically steal them).

You seem to neglect the possibility that one could simply copy them, which is what I was alluding to. This wouldn't be stealing, but it would involve breaching intellectual monopoly laws (copyright in this case).

Quote:This is not quite correct. Pearson correlations can be calculated for dichotomous data -- that's what the "traditional factor analysis" in your paper is about. IRT FA is based on the assumption that underlying the observed dichotomous data are normally distributed continuous latent variables. Tetrachoric (not tetrachronic!) correlations are used to estimate correlations between these latent variables.

We don't seem to be in disagreement, aside from the spelling error. I will swap to your preferred phrasing as it makes no difference to me.

Quote:3) The caption of Table 6 should read "... Parameter ESTIMATES from OLS regression."

Ok.

Quote:4) You apparently have the subjects' ages. How do they influence test scores? It would make sense to adjust the scores for age.

We have the variables listed in section 2. You could download the data yourself if you want to examine age. It was not a planned research question for us and besides the variation in age is fairly small (sd = 1.25), so the expected effect size of age is not very large and sample size is way too small to detect it reliably.

I tried now that you asked. The correlation with factor scores is .11. Running a partial correlation with factor scores and GPA controlling for age resulted in r = .404, or pretty much the same as without controlling for age.
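For readers who want to see why a control variable with such a small correlation barely changes the result, here is a minimal Python sketch of the first-order partial correlation formula. Only the .11 correlation between factor scores and age comes from this thread; the other two input correlations are hypothetical placeholders, so the output is not the study's result.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_factor_age = 0.11  # reported above
r_factor_gpa = 0.40  # hypothetical zero-order correlation
r_gpa_age = 0.05     # hypothetical

partial = partial_correlation(r_factor_gpa, r_factor_age, r_gpa_age)
print(round(partial, 3))
```

When both correlations with the control variable are close to zero, the numerator and denominator are each nearly unchanged, so the partial correlation stays close to the zero-order one.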

Quote:5) There are, in principle, ethical issues when you recruit human subjects. How did you present your project to the participants?

The questionnaire can be found here: https://docs.google.com/forms/d/1QYXN9OL...4cQog/edit#

The description:

Quote:Vi ønsker at studere gymnasieelevers karakterer og deres sammenhæng med andre faktorer. Derfor har vi sammensat dette spørgeskema som vi håber du vil være med til at svare på.

Alle besvarelser er anonyme. Spørgeskemaet er struktureret som følger.

Først spørger vi efter grundliggende demografisk information. Derefter spørger vi om karakterer.

Derefter kommer en kognitive test i 5 dele. Del 1 måler din evne til at reflektere, del 2 måler din evne til sproglig ræsonnering, del 3 måler din evne til finde alfanumeriske mønstrer, del 4 måler din evne til se mønstrer i figurer, og del 5 måler din evne til at rotere figurer i 3D.

Der er INGEN TIDSBEGRÆNSNING på opgaverne, så giv dig god tid uden at blive forstyrret. Det tager ca. 10-15 minutter at besvare.

In English (just a quick translation):

Quote:We are interested in studying the grades of gymnasium students and their relationship to other factors. For this reason we have put together this questionnaire, which we hope you will help answer.

All responses are anonymous. The questionnaire is structured as follows.

First we ask for basic demographic information. Then we ask about grades.

Then comes a cognitive test in 5 parts. Part 1 measures your ability to reflect, part 2 measures your ability to reason verbally, part 3 measures your ability to find alphanumeric patterns, part 4 measures your ability to see patterns in figures, and part 5 measures your ability to rotate figures in 3D.

There is NO TIME LIMIT on the tasks, so take your time and avoid being disturbed. It takes about 10-15 minutes to answer.

We did not seek any kind of pre-approval for this, either from the schools or my university as that seemed to be a waste of time.

---

Changes:
* Fixed spelling of Wechsler.
* Swapped to Dalliard's preferred phrasing re. FA and IRT FA.
* Changed Table 6 caption.
* Fixed "international reliability measures" in Table 4 caption.

I have updated the file on OSF: https://osf.io/2ipgb/
(2015-Jun-20, 22:46:28)Emil Wrote: You seem to neglect the possibility that one could simply copy them, which is what I was alluding to. This wouldn't be stealing, but it would involve breaching intellectual monopoly laws (copyright in this case).

It takes a dedicated pirate to copy, for example, the plastic blocks (http://log24.com/log/images/020831-wechsler.jpg) used in Wechsler's block design test, but okay.

It's not true that there's "no good reason" for psychometric tests not being free. The obvious advantage of a commercial IQ test is the accompanying normative data based on a representative national sample. It's not cheap for psychologists to administer tests one-on-one to thousands of people. The ICAR doesn't have this kind of normative data. Group differences cannot be properly studied without random samples, for one thing.

Quote:We have the variables listed in section 2. You could download the data yourself if you want to examine age. It was not a planned research question for us and besides the variation in age is fairly small (sd = 1.25), so the expected effect size of age is not very large and sample size is way too small to detect it reliably.

I tried now that you asked. The correlation with factor scores is .11. Running a partial correlation with factor scores and GPA controlling for age resulted in r = .404, or pretty much the same as without controlling for age.

It's standard practice to residualize IQ scores for age before factor analysis, and this should have been done in your paper, too. However, given the small age range in your data, age effects cannot be large, so I don't think it's a big problem. You should add a mention of at least the GPA-IQ correlation adjusting for age.

I approve publication once you add some discussion of the effect of the age variable in the paper.
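As a rough illustration of what residualizing scores for age means, the following Python sketch regresses scores on age with simple OLS and keeps the residuals as age-adjusted scores. The ages and scores are hypothetical, and `residualize` is a made-up helper name, not a function from any package used in the paper.

```python
def residualize(y, x):
    """Residuals of y after a simple OLS regression on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical data for six students.
ages = [16, 17, 17, 18, 18, 19]
scores = [9, 10, 12, 11, 13, 14]

adjusted = residualize(scores, ages)

# By construction the adjusted scores are uncorrelated with age.
mean_age = sum(ages) / len(ages)
covariance_with_age = sum(
    (a - mean_age) * e for a, e in zip(ages, adjusted)
) / len(ages)
print([round(e, 2) for e in adjusted], round(covariance_with_age, 10))
```

The adjusted scores would then be used in place of the raw scores in the factor analysis. When the age range is narrow, the fitted slope explains little variance and the adjusted scores differ only slightly from the mean-centered raw scores.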
Dalliard,

I have added the following to the discussion section at the end:

Quote:One reviewer criticized the study for not regressing out the effect of age. One could do this, but given the small variation (SD = 1.25) of age in the sample, the expected effect size of age was minute. However, we did calculate the correlation of the IRT FA scores with age, which was .11. The partial correlation of the IRT FA scores with GPA controlling for age was virtually identical at .404.

I have updated the OSF with the new files.

---

D Wrote:It takes a dedicated pirate to copy, for example, the plastic blocks used in Wechsler's block design test, but okay.

It's not true that there's "no good reason" for psychometric tests not being free. The obvious advantage of a commercial IQ test is the accompanying normative data based on a representative national sample. It's not cheap for psychologists to administer tests one-on-one to thousands of people. The ICAR doesn't have this kind of normative data. Group differences cannot be properly studied without random samples, for one thing.

Oh, pirates are very dedicated. There's an entire Asian industry for fake smart phones for instance.

The things you mention could easily be done with less money than is currently spent on buying tests from the testing companies. The government can easily obtain random samples. Cognitive tests are not exactly difficult to make; pretty much no matter what you do, you end up with a g test, if I may cite Dalliard 2013. ;)

Besides, test companies are pretty reluctant to share their data, especially with regard to group differences. I recall that at least one testing company has a moratorium on sharing data for studying racial differences. If the government sponsored the data, it would be publicly available for any purpose.
I approve publication.

While it would make sense to have freely available, publicly funded IQ tests with good normative data, I like the fact that there are competing commercial operators making and selling tests. If the construction and standardization of tests were a government monopoly, the whole process could become politicized.
This looks okay now. Ready for publishing.
Last paragraph of introduction is sloppy. What you are doing is a psychometric validation of a Danish translation of psychometric tests originally made for English speakers. So instead of saying “we decided”, I’d say “the aim of this study is…” and instead of “to translate the above two tests into Danish and administer them to a student sample to verify that the tests function as expected”, say “to psychometrically validate the Danish translation of two tests using a student sample”. For example, see http://www.ncbi.nlm.nih.gov/pubmed/17516705 or http://www.pec-journal.com/article/S0738...9/abstract
The title should be edited accordingly. Instead of “examine”, say “Validating the Danish translation of the ICAR and CRT tests in a student sample”.
“we used another extraction method which by default is MinRes”: non R users do not know what minres stands for (unweighted least squares). This should be specified.
Piffer,

Thanks for taking the time to review this paper.

(2015-Jul-23, 16:02:51)Duxide Wrote: Last paragraph of introduction is sloppy. What you are doing is a psychometric validation of a Danish translation of psychometric tests originally made for English speakers. So instead of saying “we decided”, I’d say “the aim of this study is…” and instead of “to translate the above two tests into Danish and administer them to a student sample to verify that the tests function as expected”, say “to psychometrically validate the Danish translation of two tests using a student sample”. For example, see http://www.ncbi.nlm.nih.gov/pubmed/17516705 or http://www.pec-journal.com/article/S0738...9/abstract
The title should be edited accordingly. Instead of “examine”, say “Validating the Danish translation of the ICAR and CRT tests in a student sample”.
“we used another extraction method which by default is MinRes”: non R users do not know what minres stands for (unweighted least squares). This should be specified.

Title changed to:

Quote:Validating a Danish translation of the International Cognitive Ability Resource sample test and Cognitive Reflection Test in a student sample

Changed paragraph to:

Quote:Since we want to contribute to the on-going development of free psychology tools and have a Danish language test to use for future projects, the aim of this study was to psychometrically validate the Danish translation of two tests using a student sample.

Changed sentence to:

Quote:For this reason, we used another extraction method which by default is MinRes (minimum residual), but it does not appear to make a large difference which method is used.\cite{Kirkegaard2014The}

Let me know whether these edits are okay, then I will upload the new version.
(2015-Jul-27, 20:25:19)Emil Wrote: Let me know whether these edits are okay, then I will upload the new version.

Yes these are ok. I look forward to reading the new version so that I can give my approval.
