
[ODP] Crime, income and employment among immigrant groups in Norway and Finland

One can use deterministic imputation with multiple regression. For a given variable, one uses every other variable in the dataset to predict its values, perhaps including interactions. Then, in the cases where a value is missing, one imputes the predicted value.
I remember the first time I used imputation; it was in AMOS. I requested several imputed data sets, one by one, and when working with each of them I got identical results. I discovered afterwards that I had been using the option "regression imputation", while the recommended option would have been "stochastic regression imputation" (even though AMOS is a bad tool for making imputed data sets). With the latter option, you cannot get identical data sets. And all data analysts will tell you not to work with imputation that is not "stochastic". In the first case, there is no random component (i.e., no error term), and you will underestimate standard errors. When I said it's not possible to get identical data sets, I was referring to imputation with a random component, as is usually recommended.
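To illustrate the difference, here is a minimal sketch in R on simulated data (it is not AMOS output, and every object name in it is made up for the example). It fills in missing y values once with the bare regression prediction and once with the prediction plus a random residual draw. Note that full multiple imputation usually also accounts for uncertainty in the regression coefficients, which this sketch leaves out.

```r
# Sketch: deterministic vs. stochastic regression imputation on simulated data.
# All names (d, miss, impute_det, impute_sto) are illustrative assumptions.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 2 * d$x + rnorm(n)
miss <- sample(n, 50)            # cases whose y will be set to missing
d$y[miss] <- NA

fit <- lm(y ~ x, data = d)       # lm() drops the incomplete cases by default
pred <- predict(fit, newdata = d[miss, , drop = FALSE])
sigma_hat <- summary(fit)$sigma  # residual standard deviation

# Deterministic: fill in the predicted value, no error term.
impute_det <- function() {
  out <- d
  out$y[miss] <- pred
  out
}

# Stochastic: predicted value plus a random draw from the residual distribution.
impute_sto <- function() {
  out <- d
  out$y[miss] <- pred + rnorm(length(miss), mean = 0, sd = sigma_hat)
  out
}

det1 <- impute_det(); det2 <- impute_det()
sto1 <- impute_sto(); sto2 <- impute_sto()

identical(det1, det2)   # deterministic data sets are the same every time
identical(sto1, sto2)   # stochastic data sets differ between draws

# Spread of the imputed values around the regression line:
# none for the deterministic fill-in, roughly sigma_hat^2 for the stochastic one.
var_det <- var(det1$y[miss] - pred)
var_sto <- var(sto1$y[miss] - pred)
```

Because the deterministic values sit exactly on the regression line, the filled-in data look less noisy than real data would, which is where the too-small standard errors come from.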
1) "Recent studies show that criminality and other useful socioeconomic traits"

I don't think criminality is a "useful socioeconomic trait." Perhaps "important social and economic characteristics"?

2) Use the equal or greater than sign (≥) rather than >=.

3) "Are some predictors just generally better at predicting than others, or is there an interaction effect between predictor and variables?"

Not sure what you mean by interaction here. The question is whether any of the predictors have unique predictive power.

4) "using multiple imputation8 to impute data to cases with 1 or fewer missing values"

How is it possible to have fewer than 1 missing values?

5) "Table 4 shows description statistics"

Descriptive statistics.

6) "the squared multiple correlation of regression the first factor on the original variables"

Word missing or something.
Thank you for reviewing again, Dalliard.

Quote: 1) "Recent studies show that criminality and other useful socioeconomic traits"

Removed "useful". The editor must have added it.

Quote: 2) Use the equal or greater than sign (≥) rather than >=.

Done.

Quote: 3) "Are some predictors just generally better at predicting than others, or is there an interaction effect between predictor and variables?"

Not sure what you mean by interaction here. The question is whether any of the predictors have unique predictive power.

An interaction would be that a given predictor P1 is better at predicting variable V1 than P2, but that P2 is better at predicting V2 than P1. A predictor x outcome variable interaction. This shows up as lower than |1| correlations between the prediction correlation vectors. However, surprisingly, they were all close or closish to 1.

For instance, one might have posited that Islam prevalence should be good at predicting crime due to incompatible religion/culture, but that it has little influence on male unemployment. This wasn't found. The IQ x Islam vectors correlate |.99| (shown in Table 2). Surprising to me. I remember when I first got my hands on the Danish data, I checked for this. Obviously, having read a lot about intelligence and education, I would expect IQ to be a better predictor than Islam for education, but perhaps the other way around for crime. However, it was pretty linear there too. Actually, I didn't think of this way of testing it before now. Perhaps I should add this to the reanalysis of the Danish data. The Danish data is much better suited for testing it since the number of outcome variables is much larger (25 vs. 9).
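The check described here can be sketched roughly as follows. This is not the actual analysis code: the data are simulated, and the names P1, P2, V1 to V6, predict.cor and vector.cor are hypothetical. The idea is to build one vector per predictor holding its correlation with every outcome variable, then correlate those vectors with each other.

```r
# Sketch with simulated data; P1, P2 and V1..V6 are hypothetical names.
set.seed(2)
n <- 200
g <- rnorm(n)                          # a shared general factor
predictors <- data.frame(
  P1 = g + rnorm(n, sd = 0.5),
  P2 = -g + rnorm(n, sd = 0.5)         # a negatively keyed predictor
)
loadings <- c(0.9, 0.7, 0.5, 0.3, -0.4, -0.8)
outcomes <- as.data.frame(
  sapply(loadings, function(l) l * g + rnorm(n, sd = 0.5))
)
names(outcomes) <- paste0("V", seq_along(loadings))

# Rows = outcome variables, columns = predictors.
# Each column is one predictor's vector of correlations with every outcome.
predict.cor <- cor(outcomes, predictors)

# Correlate the vectors with each other. Values near |1| mean the predictors
# order the outcomes the same way, i.e. no predictor x outcome interaction.
vector.cor <- cor(predict.cor)
```

If P1 were clearly better for some outcomes and P2 clearly better for others, the off-diagonal entry of vector.cor would fall well short of |1|.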

How would you like me to change this?

Islam does not correlate highly with the others (data from the DF.cor object in R):

Islam x IQ = -.27
x Altinok = -.43
x GDP = -.14
x International S = -.33

so it can be combined fruitfully in multiple regression. For instance, I tried with IQ+Islam to predict S scores in Norway (imp. 3). R2 adjusted is .59, i.e. r=0.77 which is higher than any of the predictors alone (if one doesn't consider S scores from Denmark a predictor, it has .78).
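For readers who want to see the arithmetic, here is a small sketch of the same kind of model on simulated data. The column names mirror the ones above, but the values are made up and DF.sim is a hypothetical object, not the Norwegian data. It fits the two-predictor model, takes the adjusted R2 and converts it to a multiple r by taking the square root.

```r
# Sketch on simulated data; the numbers are made up, only the steps matter.
set.seed(3)
n <- 60
IQ <- rnorm(n, mean = 90, sd = 8)
Islam <- runif(n, min = 0, max = 1)      # hypothetical prevalence as a proportion
S <- 0.06 * IQ - 1.0 * Islam + rnorm(n, sd = 0.5)
DF.sim <- data.frame(IQ, Islam, S)

fit.both <- lm(S ~ IQ + Islam, data = DF.sim)
r2.adj <- summary(fit.both)$adj.r.squared
r.multiple <- sqrt(r2.adj)               # same step as sqrt(.59), about .77

# Absolute zero-order correlation of each predictor with S, for comparison.
r.single <- abs(cor(DF.sim[, c("IQ", "Islam")], DF.sim$S))

# The unadjusted multiple R can never be lower than the best single predictor;
# the adjusted version can be, because it penalizes the extra predictor.
r.unadjusted <- sqrt(summary(fit.both)$r.squared)
```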

Quote: 4) "using multiple imputation8 to impute data to cases with 1 or fewer missing values"

How is it possible to have fewer than 1 missing values?

If there are no missing values for a case. Check the relevant code:

Code:
```
#count NA's
DF.norway.missing = apply(DF.norway, 1, is.na) #produces a col with T/F for each case
DF.norway.missing = apply(DF.norway.missing, 2, sum) #sums the number of missing per col
DF.norway.missing.table = table(DF.norway.missing) #tabulates them

#subsets
DF.norway.complete = DF.norway[DF.norway.missing <= 0,] #complete cases only, reduces N to 15
DF.norway.miss.1 = DF.norway[DF.norway.missing <= 1,] #keep data with 1 or less missing values, N=18
DF.norway.miss.2 = DF.norway[DF.norway.missing <= 2,] #keep data with 2 or less missing values, N=26
DF.norway.miss.3 = DF.norway[DF.norway.missing <= 3,] #keep data with 3 or less missing values, N=67
```
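The same counting logic on a toy data frame, in case it helps to see what "1 or fewer" selects. DF.toy and its values are made up for illustration. rowSums(is.na(...)) is used as a shorter equivalent of the two apply() calls.

```r
# Toy data frame standing in for DF.norway (values are made up).
DF.toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 6, 8),
  c = c(9, 10, 11, NA)
)

# Number of missing values per case (row).
n.missing <- rowSums(is.na(DF.toy))

# Same counts via the two-step apply() approach used above.
n.missing.apply <- apply(apply(DF.toy, 1, is.na), 2, sum)

n.missing.table <- table(n.missing)          # cases with 0, 1, 2, ... NAs

DF.toy.complete <- DF.toy[n.missing <= 0, ]  # "0 or fewer": complete cases only
DF.toy.miss.1   <- DF.toy[n.missing <= 1, ]  # complete cases plus cases with exactly 1 NA
```

So "1 or fewer missing values" covers the cases with no missing values as well as those with exactly one.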

Quote: 5) "Table 4 shows description statistics"

Fixed.

Quote: 6) "the squared multiple correlation of regression the first factor on the original variables"

Word missing or something.

Fixed to: "the squared multiple correlation of regressing the first factor on the original variables".

"Factor analytic methods require that there are no missing values. The easiest and most common way to deal with this is to limit the data to the subset with complete cases. This however produces biased results if the data are not missing completely at random ...For the above reasons, I used three methods for handling missing cases"

One of the MI assumptions is that data is MAR. Why would the possibility of MNAR then be reason to use it (as opposed to deletion), as your wording suggests?
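To make the MCAR/MAR distinction concrete, here is a small simulation sketch. It uses entirely made-up data and hypothetical object names, and has nothing to do with the Norwegian dataset. Under MCAR the complete-case mean stays close to the truth. Under MAR, where missingness depends on an observed variable, the complete-case mean is biased, but using that observed variable to fill in values recovers it. Nothing here addresses MNAR, which neither approach handles.

```r
# Sketch: complete-case analysis under MCAR vs. MAR on simulated data.
set.seed(4)
n <- 5000
x <- rnorm(n)
y <- x + rnorm(n)            # true mean of y is 0

# MCAR: every y has the same 30% chance of being missing.
y.mcar <- ifelse(runif(n) < 0.3, NA, y)

# MAR: missingness depends only on the observed x (higher x, more missing).
y.mar <- ifelse(runif(n) < plogis(1.5 * x - 1), NA, y)

mean.mcar <- mean(y.mcar, na.rm = TRUE)   # complete cases under MCAR
mean.mar  <- mean(y.mar,  na.rm = TRUE)   # complete cases under MAR

# Fill in missing y from x (a deterministic fill-in, used only for the mean).
fit <- lm(y.mar ~ x)
y.filled <- ifelse(is.na(y.mar),
                   predict(fit, newdata = data.frame(x = x)),
                   y.mar)
mean.filled <- mean(y.filled)
```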
(2014-Sep-17, 01:51:01)Emil Wrote: An interaction would be that a given predictor P1 is better at predicting variable V1 than P2, but that P2 is better at predicting V2 than P1. A predictor x outcome variable interaction. This shows up as lower than |1| correlations between the prediction correlation vectors. However, surprisingly, they were all close or closish to 1.

Perhaps someone here can translate for me? I don't understand the entire sentence.
P1 = predictor 1, P2 = predictor 2, etc.
V1 = outcome var 1, V2 = outcome var 2, etc.
I performed the analogous analysis on the Danish data as mentioned above.

Code:
```
> DF.Denmark.predict.cor
           IQ Altinok Islam logGDP S.score
IQ       1.00    0.99 -0.96   0.98    0.99
Altinok  0.99    1.00 -0.94   0.98    0.98
Islam   -0.96   -0.94  1.00  -0.94   -0.96
logGDP   0.98    0.98 -0.94   1.00    0.99
S.score  0.99    0.98 -0.96   0.99    1.00
```

The results are even better than in the Norwegian data. The Danish data is larger (25 variables) and higher quality (age and sex controlled). Apparently predictors are substantially general, not very specific. Theories of their explanatory power should be general, not specific.
(2014-Sep-18, 17:10:48)Emil Wrote: I performed the analogous analysis on the Danish data as mentioned above.

Code:
```
> DF.Denmark.predict.cor
           IQ Altinok Islam logGDP S.score
IQ       1.00    0.99 -0.96   0.98    0.99
Altinok  0.99    1.00 -0.94   0.98    0.98
Islam   -0.96   -0.94  1.00  -0.94   -0.96
logGDP   0.98    0.98 -0.94   1.00    0.99
S.score  0.99    0.98 -0.96   0.99    1.00
```

The results are even better than in the Norwegian data. The Danish data is larger (25 variables) and higher quality (age and sex controlled). Apparently predictors are substantially general, not very specific. Theories of their explanatory power should be general, not specific.

Anyways, as I'm fine with the paper, I approve publication.
Here's a new revision.

It has a lot of smaller language changes, and the changes mentioned above. I have also added a short explanation of the predictor x outcome variable interaction idea that Dalliard noted wasn't well-enough explained. The appendix now contains a list of the variables in the Danish dataset too so that readers don't have to find the prior Danish study to know what I analyzed there.

Reviewers who ok'd this paper before the addition of the Danish re-analyses should re-read the submission and see if they still agree with publication.
