
[ODP] Crime, income and employment among immigrant groups in Norway and Finland
Admin
Author
Emil OW Kirkegaard

Abstract
I present new prediction analyses for immigrant crime, income, educational achievement and employment in Norway and Finland. Results are in line with previous studies. Typical correlation sizes are around .4-.6 indicating that immigrant performance at the group level is substantially predictable by their countries of origin.

Key words
National IQs, group differences, country of origin, Norway, Finland, immigration, crime, spatial transferability hypothesis, income, employment, educational achievement

---

This is the first draft. I wanted to keep it short. It is a replication article with some new datasets. Nothing much of interest aside from the fact that the spatial transferability hypothesis successfully replicated in new high-quality datasets.

---

I accidentally forgot to save my R file when moving stuff. I have recreated it from the history.
Several little things, before I give you my approval. Table 1 is troublesome. The numbers are just too small compared to the size of the text. Especially the rows "n".

I took a closer look at Statistisk sentralbyrå's (SSB) website for data that was useful for testing the spatial transferability hypothesis.


No one can understand what is ST hypothesis if you don't explain it (with few words) or at least refer to a paper/article, or something.

I have the feeling this sentence has a word or two that has gone missing somewhere :

The big sample only educational attainment variable had higher correlations.
Admin
Hi MH,

Thank you for reviewing my paper.

Several little things, before I give you my approval. Table 1 is troublesome. The numbers are just too small compared to the size of the text. Especially the rows "n".


It is hard to fit the table on the page. The width is already at maximum. I made the sample sizes (n's) smaller so as to reduce the vertical size of the table. The n's are currently in size 7. Which size would you like me to change them to?

The table is here: https://docs.google.com/spreadsheets/d/1Y89ZSuTgaJwwLjWsfIi75E-x9myxVtSg74cRIIctPok/edit#gid=1029777982

No one can understand what is ST hypothesis if you don't explain it (with few words) or at least refer to a paper/article, or something.


Would you be satisfied with the following two paragraphs added to the introduction?

The theoretic background for testing country-level predictors is the spatial transferability hypothesis.\cite{Fuerst2014Do} Briefly, the idea is that 1) countries' performance on a variety of variables is to some degree caused by the psychological attributes of the people living in the countries. 2) When people move to another country they retain their psychological attributes to some degree which will be reflected on psychological tests. 3) Moreover, in the receiving countries the psychological attributes of the people cause to some degree their relative performance on a variety of socioeconomic variables such as crime, educational attainment, income, and employment rate.

For instance, when people from a e.g. poor country move to another country, they will tend to be relatively poor in that country as well. This is because part of the reason the country is poor is that the people living there are low in general intelligence (and possibly other psychological attributes too). When they move to a new country, they will generally still be low in general intelligence, and this will cause them to be relatively poor in that country as well. This is of course still allowing for other causes (e.g. culture that people tend to bring with them) as well as improvements on an absolute scale. Somalians living in Denmark are far richer than those who stayed in Somalia, but Somalians in Denmark are relatively poor compared to Danes who live in Denmark just as Somalia is relatively poor compared to Denmark.


I have the feeling this sentence has a word or two that has gone missing somewhere :


The sentence is correct. Think of it this way:

(The big sample only educational attainment variable) had higher correlations.

However, I rewrote it to:

Findings of note include: Violent crime was easier to predict than property crime, just as in the Danish dataset.\cite{kirkegaard2014DK} The bad performance of Altinok in some cases seems to be due to sampling error. The educational attainment variable which includes only large samples ("Tert. ed. NO big") had higher correlations than the one that also includes smaller samples. This is probably because the smaller ones introduced sampling error. Islam was a better predictor of female unemployment than of male, which perhaps has something to do with the role of women in Islam.


Let me know what you think.
I agree with all of your changes. For ST hypothesis, I didn't mean you should write that much, just only one sentence or two. For the table, I suggest you make the size of the "n" rows as large as the size of the rows "IQ" "Altinok" "Islam" "GDP".

These are only minor changes, and I don't disagree with your analyses or your interpretation. I give you my approval.
Admin
This one has:
- A new correlation matrix with larger text for n's. Everything is now size 9.
- Two more paragraphs in the introduction.
1) The 'draft' watermark is REALLY irritating when trying to copy text. Don't use it.

2) Figures 1 and 2 are under copyright and probably cannot be used under fair use.

3) "These findings indicate that the same areas of origin tend to be criminal in both Norway and Finland."

Areas are not criminal, people from those areas are.

4) "population sizes were not sorted by age, so I had to use the entire age group. This introduces error if the age population structures are"

Poorly worded. Also, aside from age, there are also differences in sex ratios.

5) "Pew Research's Islam rate."

"Islam rate" is meaningless. Clarify.

6) "the sample of countries was not so large as to introduce significant sampling error in estimates."

Clarify why increasing N increases sampling error.

7) I assume you used crime rates from Table 2 in Skardhamar et al. These rates are unadjusted for sex and age, unlike their Figure 1. Because sex and age are important causes of crime, not adjusting for group differences in them leads to biased estimates. You can get adjusted values from their Table 4 by converting the M2 odds ratios to d values (or, more roughly, by using sex and age as reported in Table 1 as covariates). The unemployment, education, and income results should also be adjusted for age and/or sex but I would guess it's not easy to do.

8) When you have four predictor variables, each of which predicts about equally well, the study seems incomplete without multiple regression analyses where the effect of each predictor is examined when holding the others constant. The unemployment, education, and income data are confounded by age and sex differences between groups, so even the zero-order results regarding them are dubious, but MR could be used on the sex- and age-adjusted crime rates. (Then again, the crime data have small Ns. Perhaps just run an analysis where one variable is held constant, e.g., what's the correlation with IQ when the effect of Islam is removed, and vice versa.)

9) In the references list, capitalize country names and nationalities. Your Norwegian immigrant paper is marked as "submitted", but it was already published.
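
As a rough illustration of the conversion suggested in point 7, the sketch below turns odds ratios into d values with the common logistic approximation d = ln(OR) × √3 / π. The function name and the odds ratios are illustrative assumptions, not values taken from Skardhamar et al.

```python
import math

def odds_ratio_to_d(odds_ratio: float) -> float:
    """Convert an odds ratio to a standardized mean difference (d).

    Uses the logistic approximation d = ln(OR) * sqrt(3) / pi.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Illustrative odds ratios only (hypothetical groups, not data from the paper).
for label, oratio in [("group A", 0.5), ("reference", 1.0), ("group B", 2.0)]:
    print(f"{label}: OR={oratio:.2f} -> d={odds_ratio_to_d(oratio):+.3f}")
```

An odds ratio of 1 maps to d = 0, and reciprocal odds ratios map to d values of equal size and opposite sign.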
Admin
Hi Dalliard,

Thank you for reviewing my paper.

1) The 'draft' watermark is REALLY irritating when trying to copy text. Don't use it.


You can just copy the text from the source file (article.tex). The reason to use the draft watermark was discussed here: http://www.openpsych.net/forum/showthread.php?tid=119

2) Figures 1 and 2 are under copyright and probably cannot be used under fair use.


The journal is hosted in Denmark and from my reading of the Danish copyright law, it is clearly a legal case of quotation.

3) "These findings indicate that the same areas of origin tend to be criminal in both Norway and Finland."

Areas are not criminal, people from those areas are.


Rewrote to: "These findings indicate that people from the same areas of origin tend to be criminal in both Norway and Finland."

4) "population sizes were not sorted by age, so I had to use the entire age group. This introduces error if the age population structures are"

Poorly worded. Also, aside from age, there are also differences in sex ratios.


I have rewritten it to:

"The reason it is a pseudo per capita is that population sizes were not available by age groups, so I had to use the entire age group even though the educational attainment data concerned only people aged 16 and above. This introduces error if the population age structures are different. The data are also not broken down by gender so there is possibly gender ratio bias as well."

Does this work for you?

5) "Pew Research's Islam rate."

"Islam rate" is meaningless. Clarify.


Doesn't seem meaningless to me. It is the per capita rate, i.e. percent of population who believes in Islam. https://en.wikipedia.org/wiki/Islam_by_country#Table

I changed the wording to "Pew Research's Islam prevalence in percent.".

Does this work for you?

6) "the sample of countries was not so large as to introduce significant sampling error in estimates."

Clarify why increasing N increases sampling error.


It is indirectly mentioned earlier: "The first variable includes all groups with a population $>=200$. The second only includes groups with $>=1000$ such as to reduce sampling error."

Including many countries means the samples must be smaller. This introduces sampling error.
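
To make the point concrete, here is a minimal sketch of how the standard error of an observed rate depends on group size, using the two population cutoffs mentioned above (200 and 1000). The 5% rate and the function name are assumptions chosen purely for illustration.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of an observed proportion p in a group of size n."""
    return math.sqrt(p * (1.0 - p) / n)

rate = 0.05  # hypothetical true rate, not taken from any dataset
for n in (200, 1000):
    print(f"n={n}: SE={proportion_se(rate, n):.4f}")
```

Whatever the rate, the standard error at n = 200 is √5 (about 2.2) times the one at n = 1000, which is why a lower population cutoff adds more countries at the price of noisier group estimates.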

7) I assume you used crime rates from Table 2 in Skardhamar et al. These rates are unadjusted for sex and age, unlike their Figure 1. Because sex and age are important causes of crime, not adjusting for group differences in them leads to biased estimates. You can get adjusted values from their Table 4 by converting the M2 odds ratios to d values (or, more roughly, by using sex and age as reported in Table 1 as covariates). The unemployment, education, and income results should also be adjusted for age and/or sex but I would guess it's not easy to do.


I used Table 2 data, yes.

Adjusting for sex and age statistically overdoes the adjustment. If you look at their Table 4, M2 (age+sex adjust in logistic regression) has a rate of .7 for Afghans in Norway, lower than natives (1.0). That's not right. Afghan men aged 20-29 do not have lower crime rates than Norwegian men aged 20-29. (Looking at women is uninteresting since they commit only 10% of crimes or something).

The Danish dataset allows one to examine the effects of statistically controlling for age and sex while also doing it manually (by limiting the sample to men in some age group).

What one wants to do is compare similar age groups as I did in the Danish study. I have asked SSB (Statistics Norway) for these data: men, age 15-19, 20-29, per capita crime rates by country of origin. It may take some time for them to give it to me. It may cost a lot of money. Basically, SSB is given a government monopoly on access to the data, so they can set the prices any way they want.

Perhaps they will give me the data soon, then we can compare the statistically adjusted values with the real ones.

8) When you have four predictor variables, each of which predicts about equally well, the study seems incomplete without multiple regression analyses where the effect of each predictor is examined when holding the others constant. The unemployment, education, and income data are confounded by age and sex differences between groups, so even the zero-order results regarding them are dubious, but MR could be used on the sex- and age-adjusted crime rates. (Then again, the crime data have small Ns. Perhaps just run an analysis where one variable is held constant, e.g., what's the correlation with IQ when the effect of Islam is removed, and vice versa.)


I did some MR analyses, but it didn't seem very interesting, so I left it out.

I used automatic modelling in R, using AIC and BIC. AIC results always included Islam and at least one of the others. BIC always resulted in the model of Islam+GDP. This was also the result in my previous Danish study when I compared adjusted R^2 values.
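
The analysis itself was run in R and its code is not shown in this thread. As a hedged sketch of what an all-subsets comparison by AIC and BIC looks like, the Python example below fits every non-empty subset of four predictors by ordinary least squares. The data are synthetic, the predictor names are placeholders, and the helper functions are hypothetical; none of this reproduces the results reported above.

```python
import itertools
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def aic_bic(X, y):
    """Gaussian AIC and BIC (up to a shared constant) for an OLS model."""
    n = len(y)
    k = X.shape[1] + 2  # slopes + intercept + error variance
    base = n * np.log(ols_rss(X, y) / n)
    return base + 2 * k, base + k * np.log(n)

# Synthetic data only: 25 hypothetical groups, 4 placeholder predictors.
rng = np.random.default_rng(0)
n_groups = 25
names = ["pred_a", "pred_b", "pred_c", "pred_d"]
data = rng.normal(size=(n_groups, len(names)))
outcome = 0.6 * data[:, 2] - 0.4 * data[:, 3] + rng.normal(scale=0.8, size=n_groups)

# Score every non-empty subset of predictors.
scores = {}
for size in range(1, len(names) + 1):
    for subset in itertools.combinations(range(len(names)), size):
        scores[subset] = aic_bic(data[:, list(subset)], outcome)

best_aic = min(scores, key=lambda s: scores[s][0])
best_bic = min(scores, key=lambda s: scores[s][1])
print("AIC picks:", [names[i] for i in best_aic])
print("BIC picks:", [names[i] for i in best_bic])
```

Because BIC charges ln(n) per parameter instead of 2, it never selects a larger model than AIC in this kind of search once n is 8 or more, which fits the pattern of BIC settling on a two-predictor model.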

9) In the references list, capitalize country names and nationalities. Your Norwegian immigrant paper is marked as "submitted", but it was already published.


Fixed.
1. "Islam was a better predictor of female unemployment than of male, which perhaps has something to do with the role of women in Islam."

That sounds a bit coy. Why not write: "which may be related to the role of women in Islam"

2. "I changed the wording to "Pew Research's Islam prevalence in percent."

It would be better to write "prevalence of Muslim adherents (as estimated by the Pew Research Center)". The Pew Research Center classifies people as Muslim even if they are only nominal Muslims and do not follow the precepts of Muslim law.

3. Does "area of origin" mean last country of residence, country of birth, or country of ancestral origin? I've been told that many "British" immigrants to Norway are actually of Pakistani origin. How would they be classified in your analysis?
Admin
It is probably last country of residence. I don't know exactly. A Pakistani who went to the UK and then to Norway would probably be classified as UK.
Admin
This version has the fixes that Dalliard and Peter Frost suggested.

The lack of capitalization in the references list is a LaTeX feature (BibTeX strips the capitalization by design). However, we found a way to get around it. http://tex.stackexchange.com/questions/10772/bibtex-loses-capitals-when-creating-bbl-file
I cannot find anything wrong with this paper. I approve publication. My only quibble is that it should be a brief communication, as it doesn't add a new perspective compared to the papers the author has previously published, but it simply extends them.
Admin
Brief communication is fine with me. I wanted to keep it short and to the point, given that it's just new data, not new theory.
I approve. The following is a list of suggested corrections:

Page 1, line 1 - replace "prediction analyses" with "predictive analyses"
Page 1, line 8 - replace "socioeconomic variables" with "socioeconomic traits" or "socioeconomic attributes"
Page 1, line 11 - replace "theoretic" with "theoretical"
Page 1, lines 12-13 - replace with "a country's performance on a variety of variables is to some degree due to the average psychological makeup of its inhabitants"
Page 1, line 14 - replace "degree which will be reflected" with "degree, as reflected"
Page 1, line 16 - replace "cause" with "determine"
Page 1, line 18 - delete "e.g." (this is redundant because you already wrote "For instance")
Page 1, lines 20-21 - you might want to mention some of these other attributes, e.g., time preference, impulse control, anger threshold, monotony avoidance, affective empathy, etc.
Page 1, lines 24 and 25 - replace "Somalians" with "Somalis". The entire sentence could be rewritten as:
"Somalis living in Denmark are far richer than those who have stayed behind in Somalia, but they are nonetheless poorer than ethnic Danes, just as Somalia is poorer as a whole than Denmark."

Page 2, figures 1 and 2 - delete "groups"
Page 2, lines 2 and 3 - replace "tend to be criminal" with "are similarly predisposed to criminal behavior."
Page 2, line 5 - replace "was useful" with "could be useful"
Page 2, line 6 - replace "must concern" with "must inform us about"
Page 2, line 7 - replace "for" with "on"
Page 2, lines 7-8 - replace with "with a large enough sample of countries, i.e., more than 10"
Page 2, line 11 - correct the spelling, i.e. "datasets"

Page 3, line 4 - replace "in percent" with "as a percent"

Page 5, line 2 - replace "is available" with "are available" and add an 's' to "source code"

References - go through the references and capitalize proper nouns, e.g. "Denmark", "Danish", "Norwegian", etc.
This version has the fixes that Dalliard and Peter Frost suggested.

The lack of capitalization in the references list is a LaTeX feature (BibTeX strips the capitalization by design). However, we found a way to get around it. http://tex.stackexchange.com/questions/10772/bibtex-loses-capitals-when-creating-bbl-file


You will have to post a version without a water mark if you want a more thorough commentary.

1. Rewrite the Abstract e.g., "[The] results", "their nation of origin [characteristics]", "I present new [finding] regarding [rates] of crime, income ... for different immigrant [groups decomposed by nation of origin]."
2. In P1, "This study is a step in that direction"; Also please cite the Danish happiness paper e.g., "see also [4]."
3. In P2, "The psychological attributes of the individuals living" ("People" can refer to the collective attributes)
4. In p2 "Which will be reflected on outcome measures such as psychological tests or SES indexes" (it's not just tests).
5. In P4. "are low in the relevant behavior traits" (Don't pigeon hole the theory.)
6. In p4. How about: "The STH doesn't specify a cause for the national group differences; these could be due to cultural or other factors. This hypothesis also allows migrant absolute performance to increase; it merely predicts that migrants will carry with them certain national differences because these differences represent or result from temporally stable behavioral ones at the individual level. To be clear, this need not be the case. Measured national differences could be due to psychometric bias or they could be group level phenomena. For example, national differences in crime rate or in happiness might be due to an interactive factor in the nation of origin that can't be carried along by migrants by virtue of being on the national level."

7. In P7, "To be useful, the data must [include].

"Tertiary educational attainment per capita for persons aged 16 and above in 2013. This table was in absolute numbers so I supplemented it with [the] population size by country of origin to calculate a pseudo per capita value."

"This introduces error if [the national population age and migrant group age structures are different.]"

rewrite "the bad performance of Altinok" e.g., "The poor predictive power of Altinok et al.'s national achievement scores seems to have been largely due to sampling error."

rewrite: "Figures show three plots". e.g., "The figures below show plots for ... "

...

No conclusion? Give at least a couple of sentences.

"This study has replicated and extended results of previous studies. More support for the STH with regards to X,Y, and Z has been found. On the international level X,Y, and Z have been found to be .... "
No conclusion? Give at least a couple of sentences.


Why need for conclusion ? It's just a replication, as explicited in the intro. He gave the references for previous studies, so people can look at them and read the discussion section from these papers, no ? I don't think it needs conclusion if the purpose is to rewrite what has been written in previous articles.
No conclusion? Give at least a couple of sentences.


Why need for conclusion ? It's just a replication, as explicited in the intro. He gave the references for previous studies, so people can look at them and read the discussion section from these papers, no ? I don't think it needs conclusion if the purpose is to rewrite what has been written in previous articles.


If that's kosher here, so it is. It just strikes me as being odd. Can you show me a replication study published elsewhere that doesn't have a conclusion/discussion? That's what I mean.

...

Emil, can you make the corrections mentioned? Let's get this process moving along.
Admin
Yes, I will make the corrections. I will also add some notes about limitations. Most of the datasets are not age and/or sex corrected, which introduces error.
1) The 'draft' watermark is REALLY irritating when trying to copy text. Don't use it.


You can just copy the text from the source file (article.tex). The reason to use the draft watermark was discussed here: http://www.openpsych.net/forum/showthread.php?tid=119


Have you tried copying text from the watermarked pages? Very annoying. How about you just write DRAFT at the top of the first page?

2) Figures 1 and 2 are under copyright and probably cannot be used under fair use.


The journal is hosted in Denmark and from my reading of the Danish copyright law, it is clearly a legal case of quotation.


I wouldn't be sure of that, but it's your call. I don't think the figures are necessary.

6) "the sample of countries was not so large as to introduce significant sampling error in estimates."

Clarify why increasing N increases sampling error.


It is indirectly mentioned earlier: "The first variable includes all groups with a population $>=200$. The second only includes groups with $>=1000$ such as to reduce sampling error."

Including many countries means the samples must be smaller. This introduces sampling error.


Yes, I understand that but it's poorly explained in the paper. You should mention the sample size problem again when talking about the second "condition" for adequate predictions.

I used Table 2 data, yes.

Adjusting for sex and age statistically overdoes the adjustment. If you look at their Table 4, M2 (age+sex adjust in logistic regression) has a rate of .7 for Afghans in Norway, lower than natives (1.0). That's not right. Afghan men aged 20-29 do not have lower crime rates than Norwegian men aged 20-29. (Looking at women is uninteresting since they commit only 10% of crimes or something).

The Danish dataset allows one to examine the effects of statistically controlling for age and sex while also doing it manually (by limiting the sample to men in some age group).

What one wants to do is compare similar age groups as I did in the Danish study. I have asked SSB (Statistics Norway) for these data: men, age 15-19, 20-29, per capita crime rates by country of origin. It may take some time for them to give it to me. It may cost a lot of money. Basically, SSB is given a government monopoly on access to the data, so they can set the prices any way they want.

Perhaps they will give me the data soon, then we can compare the statistically adjusted values with the real ones.


Without adjustment for sex + age your estimates are biased in favor of the spatial transferability hypothesis. So if you cannot adjust your estimates for this, you should at least emphasize in the paper that the results are questionable because of this.

I did some MR analyses, but it didn't seem very interesting, so I left it out.

I used automatic modelling in R, using AIC and BIC. AIC results always included Islam and at least one of the others. BIC always resulted in the model of Islam+GDP. This was also the result in my previous Danish study when I compared adjusted R^2 values.


Automatic modeling seems iffy especially with such limited data. I think you should report at least predictor intercorrelations and selected, theoretically sensible MR results.
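
One theoretically simple option of the kind requested here is a first-order partial correlation, i.e. the correlation between a predictor and an outcome with a second predictor held constant. The sketch below is a minimal example under stated assumptions: the function name is hypothetical and the data are synthetic, generated so that a shared third variable drives both of the others.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Synthetic example: z drives both x and y, which are otherwise unrelated.
rng = np.random.default_rng(1)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

print("zero-order r(x, y):", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("partial r(x, y | z):", round(float(partial_corr(x, y, z)), 3))
```

The same value is obtained by regressing x and y on z separately and correlating the residuals, which is the form that extends to several covariates at once.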
Admin
Dear reviewers,

I have made significant findings using the datasets and it will necessitate heavily expanding the paper. Briefly, I used factor analysis to find an S factor among the groups (yes it is there), quantified the strength of it using multiple measures (very strong), calculated the cors with predictors and the S factor (pretty high, IQ -0.69, Islam 0.81, GDP -0.45, International S -0.68), and correlated the vectors of cors of each predictor with each other to see if the same variables were highly/not very highly predictable using different predictors (they were, cors around .9).
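
The last step above, correlating the vectors of correlations, can be sketched as follows. The numbers are hypothetical placeholders rather than values from the datasets, and how the signs of oppositely keyed predictors were handled is not stated in the thread, so this example simply reports the raw Pearson correlation.

```python
import numpy as np

def vector_similarity(a, b):
    """Pearson correlation between two vectors of correlations."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical correlations of two predictors with the same five outcomes.
r_pred_1 = np.array([0.55, 0.40, -0.30, 0.62, 0.48])
r_pred_2 = np.array([-0.50, -0.35, 0.33, -0.60, -0.41])

print("similarity of prediction profiles:",
      round(vector_similarity(r_pred_1, r_pred_2), 3))
```

A value near +1 or -1 means the two predictors rank the outcome variables in nearly the same order of predictability; the sign only reflects whether the predictors are keyed in the same direction.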

I ran FA using complete cases (N=15), which is known to be suboptimal. I will look into better ways to impute missing data.

And before that I discovered that I had made huge errors in inputting the data from Skardhamar. I have now fixed this, altho it did not change results that much. I used the sex and age adjusted data.

Various small changes to language (e.g. added sample sizes to the introduction of the datasets).

A new expanded draft will follow shortly, 1-3 days.