
[ODP] Criminality among Norwegian immigrant populations
Admin
The weighted results from dataset1 (the damaged one).


I'll ask a statistician friend about the weighting issue. In the meantime, I noticed that your crime variables were highly positively skewed. (Sorry!) Given this, you have to either:

(a) Transform your data, e.g., log transform as shown in the example
(b) Use nonparametric statistics
(c) Use robust statistics

Try (a). See example.
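
To illustrate how options (a) and (b) behave on positively skewed data, here is a minimal sketch. The IQ and crime-rate numbers are made up for illustration and are not taken from the dataset; the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical, positively skewed crime rates (NOT the real data)
national_iq = [70, 75, 80, 85, 90, 95, 100, 105]
crime_rate = [0.79, 0.30, 0.15, 0.08, 0.05, 0.03, 0.02, 0.01]

r_raw = pearson(national_iq, crime_rate)
r_log = pearson(national_iq, [math.log10(c) for c in crime_rate])  # option (a)
rho = spearman(national_iq, crime_rate)                            # option (b)

print(f"Pearson r (raw):   {r_raw:.2f}")
print(f"Pearson r (log10): {r_log:.2f}")
print(f"Spearman rho:      {rho:.2f}")
```

With skewed values like these, the raw Pearson r understates a relationship that is monotonic but not linear, while the log-transformed r and rho both pick it up.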


For option b, results are attached. I used Spearman's rho. This made the violent crime r's very high and also increased the r's for all crime.
The report can be found here: https://www.ssb.no/sosiale-forhold-og-kriminalitet/artikler-og-publikasjoner/kriminalitet-og-straff-blant-innvandrere-og-ovrig-befolkning

Direct download: https://www.ssb.no/emner/03/05/rapp_201121/rapp_201121.pdf

It is in Norwegian which I can read since I'm Danish.


Here are some national crime statistics. Would it be possible to add homicide rates (as this seems to be the least unreliably measured statistic)? If not, ok. Having looked at the international crime data, I am convinced that it is not very reliable -- that is, using national IQ is probably the least bad method.

https://www.unodc.org/unodc/en/data-and-analysis/statistics/data.html
[hr]
For option b, results are attached. I used Spearman's rho. This made the violent crime r's very high and also increased the r's for all crime.


Ok. I'm fine with you reporting, for this sample, r + rho or, alternatively, noting the discord between r and rho and that this is due to the non-normal distribution. I would note that others have used log transformations in similar situations. For example:

Perhaps surprisingly, this “unmeasured worker skill” estimate varies widely for immigrants from different countries. The standard deviation of log unadjusted wages is 0.29 across Hendricks’s sample of 76 countries, while the standard deviation of log unmeasured worker skill across these countries is still a sizable 0.19. Henceforth we refer to uwsi, the log of “unmeasured worker skill” in country i. (Jones and Schneider, 2010)

And I don't think that it would be much more trouble to add log variables in addition to untransformed ones to the correlation table.

I'm also now fine with you not including national crime statistics, etc. because these seem to be rather unreliable -- though you might check with the homicide rates I linked to.

This leaves the weighting issue. I'm waiting to hear back on this. This is the more important issue since it informs our general methodology. We should try to create a relatively consistent method. And I've been reporting weighted results. Part of the issue is that my samples are very different from yours. All of your samples have relatively high ns. For mine, I often have n=10, n=50, n=500, etc. That is, I have more of a sampling error problem. I'm not sure that you don't, though.
[hr]
[quote='Chuck' pid='180' dateline='1396458350']
The report can be found here: https://www.ssb.no/sosiale-forhold-og-kriminalitet/artikler-og-publikasjoner/kriminalitet-og-straff-blant-innvandrere-og-ovrig-befolkning

Direct download: https://www.ssb.no/emner/03/05/rapp_201121/rapp_201121.pdf


Ok. I apologize. I didn't carefully read your paper. I just looked at the tables. I assumed that your murder rate variable was a migrant not national one. This is fine -- the best approach. Apologies. I will be sure to more carefully read the papers next time.
Admin

No apology needed.

There are murder rates for migrant groups in Norway at the statistics agency (Denmark too). But given the rarity of murders, this will have a huge sampling error when the groups number fewer than many thousands. For this reason I have not previously used such a measure.

Updated. This was wrong. It is not possible to get the migrant murder rate in Norway. One can only fetch data for various categories (all in Norwegian).

I checked the numbers for murder in Denmark (intentional murder, "first degree"). Even using data for all years 2000-2012 would give useless, unreliable data. Almost all the years are 0 murders for almost all groups. Correlating this would not be worthwhile.
Data error. If you look in the Excel data file, you can see that there is crime data for Mongolia only for the years 2002 and 2003: 156 and 109 charges. The population of Mongolians in these two years was 10 and 15, giving charges per capita of 15.6 and 7.3! (cells M54 and N54 before removal). The second-highest country had a charge per capita of .79 (Georgia), which, as noted, is already about 71 times that of the lowest country with .01 (Thailand)....
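
The arithmetic behind this check can be sketched as follows. The counts are the ones quoted in the post; the cutoff used to flag a value is an assumption for illustration, not a rule from the paper.

```python
# Charges and population for Mongolia as quoted in the post.
records = [
    ("Mongolia", 2002, 156, 10),
    ("Mongolia", 2003, 109, 15),
]

# Assumed screening cutoff: more than one charge per resident per year
# is treated as suspicious and worth a manual look.
MAX_PLAUSIBLE_RATE = 1.0

rates = []
for country, year, charges, population in records:
    rate = charges / population  # charges per capita
    rates.append(rate)
    status = "SUSPICIOUS" if rate > MAX_PLAUSIBLE_RATE else "ok"
    print(f"{country} {year}: {charges}/{population} = {rate:.1f} per capita [{status}]")
```

A screen like this only points at rows to inspect by hand; whether to drop a country remains a judgment call.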


Let me reply to a few points.

You said: "The stats agency only reports crime rates for the countries with larger populations, so sampling error should not be so large."


Ok. There are two sources of sampling error in this type of study. The first is the migrant sample rate (i.e., the number of criminals caught) as an index of the true population rate for country X. If you have 500 Syrians, you are less likely to get an accurate estimate of, e.g., the true migrant population murder or crime rate than if you have 10,000 Syrians, because one uncaught Syrian murderer or criminal has a much greater impact when the sample is small. Dealing with low-probability outcomes like crime and murder greatly worsens this problem. The second is the migrant population in country X as an index of how migrants would perform were they unselected. Large migrant samples will not eliminate this second type of sampling error; they will just reduce the chance of it, since the larger the migrant population, the more likely it is that the migrants are unselected (taking into account national population size).

Because of these two types of sampling error, I suggested weighting.
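
As a rough illustration of the first source of error only, here is a sketch of the binomial standard error of an observed rate for the two group sizes mentioned above. The 1% underlying rate is a hypothetical value chosen for the example.

```python
import math

def rate_standard_error(p, n):
    """Binomial standard error of an observed rate with true rate p and group size n."""
    return math.sqrt(p * (1 - p) / n)

true_rate = 0.01  # hypothetical true rate of a rare outcome

for group_size in (500, 10_000):
    se = rate_standard_error(true_rate, group_size)
    print(f"n={group_size:>6}: SE={se:.5f}, relative SE={se / true_rate:.0%}")
```

Under these assumptions the standard error for the group of 500 is sqrt(20), roughly 4.5 times, that of the group of 10,000. The second source of error (selection of migrants) is not captured by this calculation.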

You said: I'm not familiar with any cross-country comparable crime statistic.

What I mean is that you should briefly mention the correlation between homicide rates and IQ on the national level to give some perspective. See LV 2012 page 271. You should do this since your NIQ-MigrantHomicide correlation is limited by the NIQ-NHomicide correlation, which is between 0.2 and 0.8, depending on methodology. Doing this also gives some background.

You said: Can you explain? in response to my statement "You could just compute the cross year joint probability."

You were saying that it is improbable that the rates would be consistently negative across years if there was no true association. I replied that you could compute the joint probability to test this point. You can use Fisher's method:
https://en.wikipedia.org/wiki/Fisher%27s_method
It's not a big deal -- don't bother.
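
For reference, Fisher's method can be sketched in a few lines. The per-year p-values below are hypothetical placeholders, not results from the paper. Note also that the method assumes independent tests, which yearly correlations over the same countries are not, so this is only an illustration of the calculation.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k df under H0.

    Returns (statistic, combined p-value). For even degrees of freedom the
    chi-square survival function has a closed form, so no external library is needed.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # (half ** i) / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Hypothetical one-sided p-values, one per year
yearly_p = [0.20, 0.15, 0.30, 0.10]
stat, combined = fisher_combined_p(yearly_p)
print(f"chi-square = {stat:.2f} on {2 * len(yearly_p)} df, combined p = {combined:.3f}")
```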

You said: What is wrong with the current formulation [next up]?

This was too informal.
Admin
"How about I add a paragraph noting that the author draws no particular causal conclusion from the paper and that terms are meant as descriptors only."

It would be sufficient to replace the terms "Islam", "prevalence of Islam" and "Islam belief" with the term "Muslim origin". It is possible to talk about non-practising Muslims. It is impossible to talk about non-practising Islam.


I think that it is better to use "Islam" as it is shorter and really not that confusing. It's mostly just semantics.

But to accommodate the reviewer I have changed the wording to "prevalence of Muslims".

Updated draft attached. It also includes updated tables with better variable descriptions and a correlation table for Spearman's rho.
Admin
You said: I'm not familiar with any cross-country comparable crime statistic.

What I mean is that you should briefly mention the correlation between homicide rates and IQ on the national level to give some perspective. See LV 2012 page 271. You should do this since your NIQ-MigrantHomicide correlation is limited by the NIQ-NHomicide correlation, which is between 0.2 and 0.8, depending on methodology. Doing this also gives some background.


I can do this, but I don't think it's necessary. In general, I like to keep my writing straight to the point for most papers and leave the perspectiving to other papers. Research papers are often unnecessarily long due to the authors repeating the perspectives all the time. I don't want to contribute to that trend.

You said: Can you explain? in response to my statement "You could just compute the cross year joint probability."

You were saying that it is improbable that the rates would be consistently negative across years if there was no true association. I replied that you could compute the joint probability to test this point. You can use Fisher's method:
https://en.wikipedia.org/wiki/Fisher%27s_method
It's not a big deal.


Seems like a waste of space to include such a calculation.

You said: What is wrong with the current formulation [next up]?

This was too informal.


I think it is fine as it is.

---

See also the new draft above, which includes changes that you requested.


I saw it -- it is fine as it is. I just wanted to explain what I was thinking.
Don't bother with the joint correlation stuff.
As for mentioning the NIQ-Ncrime correlation, I see what you're saying. And I agree about getting to the point. If you are going to discuss this in a follow up paper, I'm fine with your discussion.

This leaves weighting. How about:

"Puzzled by the results, I checked the distribution of the crime data, which was strongly skewed to the right (positively skewed). I normalized the data by taking the base-10 logarithm and reran the correlations (results in 1). Correlations were somewhat higher with this, but still not at the level of the results from Denmark. A reviewer suggested that the low population sizes for some migrant groups could result in sampling error, especially given that low-probability outcomes are being studied. In short, for small migrant groups, recorded delinquency rates might not reliably index true migrant group delinquency rates; moreover, small migrant groups are more often less representative of their nation-of-origin populations than large migrant groups are. The reviewer suggested that, to reduce sampling error, the results could be weighted by the square root of the migrant population size. The weighted results are presented below the diagonal."

Also, I wouldn't present crime data for each year. Just all crime and violent crime rates. See attached example. For weighted results you would have to manually calculate p-values using N=55.
http://www.danielsoper.com/statcalc3/calc.aspx?id=44
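
A minimal sketch of what this proposal amounts to: a Pearson correlation with case weights equal to the square root of the migrant population, followed by a p-value computed by hand from r and N. All data values are hypothetical. The p-value uses the Fisher z normal approximation, which is an assumption made here to avoid external libraries and will differ slightly from an exact t-based calculation. This is not claimed to reproduce what SPSS or any other package does.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation with non-negative case weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(var_x * var_y)

def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for r with n cases (Fisher z, normal approximation).

    Requires |r| < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical countries of origin (NOT the real data)
national_iq = [70, 80, 85, 90, 95, 100]
crime_rate = [0.40, 0.10, 0.20, 0.05, 0.06, 0.02]
migrant_population = [400, 2500, 900, 10000, 4900, 14400]

weights = [math.sqrt(p) for p in migrant_population]
r_weighted = weighted_pearson(national_iq, crime_rate, weights)
p_value = approx_two_sided_p(r_weighted, n=len(national_iq))

print(f"weighted r = {r_weighted:.2f}, approximate p = {p_value:.3f}")
```

For the paper's tables the same p-value function would be called with n=55, the number of cases mentioned above, rather than the sum of the weights.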
Admin
I'm still not convinced that weighting is legit the way SPSS does it. I noticed that it also changes all the other correlations for some reason, by increasing all of them basically. Obviously, if it was done correctly, then weighting the r (crime rate x national IQ) would have no effect on e.g. national IQ x national GDP.

I think it may be wise to switch to another program to perform this, or plainly switch to using a programming language.

I can't require you to use weights, since I can find no published justification for my proposed method. So, I am ok with your analyses as they are. I think you should clean up some of your tables, though. Beyond that, I feel that this paper is publishable.
Admin

I looked into using a programming language for doing the calculations. R is such a language, and it includes a premade command for calculating weighted correlations. The next step is to compute weighted correlations using SPSS and R and see if the results match up. If they don't, something is amiss.

http://cran.r-project.org/bin/windows/base/

Which tables do you have in mind?


Every one with redundant information. Copy the SPSS R-matrices to Excel, delete the correlations under the diagonal, then take a screenshot of that. Also I would NOT show the correlations for the specific years. I asked Meng Hu to read your paper over and he also felt the tables were overfilled.
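
If the tables end up being produced in code rather than in Excel, the same clean-up can be sketched like this. The variable labels and correlation values are placeholders, not results from the paper.

```python
def upper_triangle_rows(labels, matrix):
    """Return table rows with every cell below the diagonal blanked out."""
    rows = []
    for i, label in enumerate(labels):
        cells = [
            "" if j < i else f"{matrix[i][j]:.2f}"
            for j in range(len(labels))
        ]
        rows.append([label] + cells)
    return rows

# Placeholder correlation matrix (symmetric, hypothetical values)
labels = ["IQ", "All crime", "Violent crime"]
matrix = [
    [1.00, -0.50, -0.60],
    [-0.50, 1.00, 0.90],
    [-0.60, 0.90, 1.00],
]

rows = upper_triangle_rows(labels, matrix)
print("\t".join([""] + labels))
for row in rows:
    print("\t".join(row))
```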

I thought of a new method for migrant analyses. Unfortunately, I've been under the weather, so I haven't had a chance to try it, let alone rework the NLSF paper. If you have some time on your hands, let me know and I will explain. If it works, it should allow one to go through a large number of PISA-like surveys quickly and do much more comprehensive analyses.
Admin

I have remade most of the tables. Correlation tables now feature both Pearson and Spearman correlations. Regression tables summarize all results more efficiently.

I have removed the correlation table for interyear correlations (one might also put it in the appendix, but since we have open data policy, this doesn't seem necessary).
[quote='Emil' pid='204' dateline='1396595892']
Which tables do you have in mind?
I have remade most of the tables. Correlation tables now feature both Pearson and Spearman correlations. Regression tables summarize all results more efficiently. I have removed the correlation table for interyear correlations (one might also put it in the appendix, but since we have open data policy, this doesn't seem necessary).


Ok. Looks good. This should be published.
Admin
Three reviewers (Chuck, Philbrick, Duxide) have expressed consent to publish.
Admin
The author's final edit.
Admin
http://blogs.discovermagazine.com/neuroskeptic/2014/03/26/ugly-ducklings-science/

This is the kind of stuff I had in mind when I was talking about endlessly reanalyzing data to get something out of it.