
[ODP] Immigrant GPA in Danish primary school is predictable from country-level variab
Admin
I wrote to the DST to ask if they had more data about GPAs. They have replied to my email.

Kære Emil,
I vores offentliggørelse har vi fokuseret på de ikke-vestlige efterkommere. Det er ikke fordi de klarer sig dårligt, men først og fremmest fordi de antalsmæssigt er den klart største gruppe. 1. januar 2014 var der i alt 4.753 ikke-vestlige efterkommere på 16 år i den danske befolkning. Det tilsvarende tal for vestlige efterkommere var kun 371, mens der var 787 vestlige indvandrere. De viste lande er de oprindelseslande, som i perioden 2009-2013 har haft de største efterkommerbefolkninger, og vi har kun medtaget de lande, hvor vi vurderede, at befolkningen var tilstrækkelig stor til at beregningerne gav mening.
Tallene findes kun i publikationen Indvandrere i Danmark 2014 og er ikke tilgængelig i statistikbanken. Jeg har vedlagt tallene bag figurerne.
I publikationen findes de samlede tal for vestlige indvandrere og efterkommere.
I modsætning til efterkommere, der alle er født i Danmark, er indvandrere meget forskelligt sammensat mht. opholdstid. Nogle har været her fra de var små børn, mens andre først er kommet efter de var fyldt 10 år. Opholdstiden er et forhold, der også spiller en rolle for karakterer. Ud over et datamæssigt tyndt materiale er indvandrergruppernes forskelle mht. opholdstid en væsentlig grund til, at vi ikke har medtaget karakterer fordelt på oprindelseslande for indvandrere.

Med venlig hilsen
Thomas Klintefelt
Chefkonsulent
Befolkning
Tlf. 39 17 33 14
tmn@dst.dk


A quick translation:
Dear Emil,
In our publication we have focused on the non-western second generation. It is not because they perform badly, but first and foremost because they are numerically by far the largest group. As of 1 January 2014 there were in total 4,753 non-western second-generation immigrants aged 16 in the Danish population. The corresponding number for the western second generation was only 371, while there were 787 western first-generation immigrants. The countries shown are the countries of origin which in the period 2009-2013 had the largest second-generation populations, and we have only included the countries where we judged the population to be sufficiently large for the calculations to make sense.

The numbers are only found in the publication Indvandrere i Danmark 2014 and are not available in the statistical database (statistikbanken). I have attached the numbers behind the figures. The total numbers for western first and second generation are found in the publication.
In contrast to the second generation, all of whom are born in Denmark, the first generation is very heterogeneous regarding how long they have stayed in Denmark. Some have been here since they were small children, while others only came after they had turned 10. The duration of stay is a factor that also plays a role for grades. Besides the thin data material, the differences between first-generation groups in duration of stay are a significant reason why we have not included grades by country of origin for the first generation.

Kind regards,
Thomas Klintefelt
Chief consultant
Population
Tel. 39 17 33 14
tmn@dst.dk


In other words, there are no more data than the two datasets I have now. I will update the paper with the new data too.
Admin
I have updated the paper to draft #4. https://osf.io/p9d5z/

Changes: Reworked the table. Added analysis of second GPA dataset. Added restriction of variance analysis. Added discussion section. Some more references.

This is a big change. Paper should be re-approved.
You note that Pearson correlation is robust to non-normality. But I'm sure about what I say. Andy Field (2009) writes in "Discovering Statistics Using SPSS", for example:

Pearson’s correlation coefficient was described in full at the beginning of this chapter. Pearson’s correlation requires only that data are interval (see section 1.5.1.2) for it to be an accurate measure of the linear relationship between two variables. However, if you want to establish whether the correlation coefficient is significant, then more assumptions are required: for the test statistic to be valid the sampling distribution has to be normally distributed and as we saw in Chapter 5 we assume that it is if our sample data are normally distributed (or if we have a large sample). Although typically, to assume that the sampling distribution is normal, we would want both variables to be normally distributed, there is one exception to this rule: one of the variables can be a categorical variable provided there are only two categories (in fact, if you look at section 6.5.5 you’ll see that this is the same as doing a t-test, but I’m jumping the gun a bit). In any case, if your data are non-normal (see Chapter 5) or are not measured at the interval level then you should deselect the Pearson tick-box.
Admin
Nothing in the quoted passage is inconsistent with what I say. A given metric may rely on certain assumptions, e.g. normality. Metrics can be more or less robust to these assumptions not being met. In the case of Pearson's vs. Spearman's correlations, Pearson's are quite robust to lack of normality, and the two give very similar results. I have not seen a case during my research where they did not.
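
For readers who want to check this on their own data, here is a minimal sketch that computes both coefficients with the standard library only. The data are made up for illustration (a skewed, monotone relationship) and are not from the paper; the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (an assumption, not the paper's data): y is a skewed function of x.
x = list(range(1, 11))
y = [v ** 2 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

A single toy example cannot settle the general question; the point is only that both numbers are cheap to compute and report side by side.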
Admin
MH, any further objections to the revised version? You previously approved a prior version.

Can other reviewers comment on this submission?
The new immigration report has come out. Every year the Danish Statistics Agency releases a report on immigration in Denmark. This year they have included quite a lot more useful data. New this year, it includes GPAs for some countries of origin, but only for a small number of them and only for the second generation. I will update the paper with these new data and analyses of them.

http://www.dst.dk/da/Statistik/Publikationer/VisPub.aspx?cid=19004


To give some perspective, could you provide the standardized differences with respect to the indigenous Danish norm? This would better allow me to judge the generational effect. Is there no way to estimate how generationally mixed the first sample was? For example, by comparing the sample sizes of the first and second samples.

One problem with interclass correlations is that they do not take the absolute differences into account. Thus, you might have a correlation of 1 between national IQ and second-generation GPA, with trivial actual differences. Thus, providing standardized differences is helpful.
Emil,

I notice that girls do better than boys in the second dataset. Do we see the same sex difference in the first dataset or does the sex difference increase as one goes from the foreign-born to the Danish-born?
MH, any further objections to the revised version? You previously approved a prior version.

Can other reviewers comment on this submission?


I need to know which pages and/or sections have been modified.
One problem with interclass correlations is that they do not take the absolute differences into account. Thus, you might have a correlation of 1 between national IQ and second-generation GPA, with trivial actual differences. Thus, providing standardized differences is helpful.


Emil, could you address this concern and the ones mentioned by PF and MH? Thanks.
Admin
I only today finished my exam, so I will reply to my reviewers tonight.
Admin
John,

The new immigration report has come out. Every year the Danish Statistics Agency releases a report on immigration in Denmark. This year they have included quite a lot more useful data. New this year, it includes GPAs for some countries of origin, but only for a small number of them and only for the second generation. I will update the paper with these new data and analyses of them.

http://www.dst.dk/da/Statistik/Publikationer/VisPub.aspx?cid=19004


To give some perspective, could you provide the standardized differences with respect to the indigenous Danish norm? This would better allow me to judge the generational effect. Is there no way to estimate how generationally mixed the first sample was? For example, by comparing the sample sizes of the first and second samples.

One problem with interclass correlations is that they do not take the absolute differences into account. Thus, you might have a correlation of 1 between national IQ and second-generation GPA, with trivial actual differences. Thus, providing standardized differences is helpful.


They don't report standard deviations, so it isn't possible to calculate Cohen's d.

This report concerning schools in the Copenhagen area reports the SDs as 1.01, 1.09 and 1.11 for the years 2012, 2013 and 2014. But this is a between-school number, not a between-student number.

I had a look at UNI-C data explorer. One can fetch the raw numbers for how many got each grade here.

Count     Grade   (x-mean)^2
405       -3      95.05
16428      0      45.56
56625      2      22.56
115647     4       7.56
143423     7       0.06
101669    10      10.57
63873     12      27.57

Weighted mean: 6.75
Weighted var:  11.61
Weighted SD:   3.41
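
The calculation can be sketched as follows. This is a minimal reconstruction assuming the population (divide-by-N) variance, which is what matches the reported figures; the function and variable names are my own, and the counts are the grade counts listed above.

```python
from math import sqrt

# Grade -> number of pupils who received it (counts as listed above).
grade_counts = {
    -3: 405,
    0: 16428,
    2: 56625,
    4: 115647,
    7: 143423,
    10: 101669,
    12: 63873,
}

def weighted_stats(counts):
    """Return (mean, variance, sd) of a grade distribution given as {grade: count}."""
    n = sum(counts.values())
    mean = sum(grade * count for grade, count in counts.items()) / n
    # Population variance: count-weighted average squared deviation from the mean.
    var = sum(count * (grade - mean) ** 2 for grade, count in counts.items()) / n
    return mean, var, sqrt(var)

mean, var, sd = weighted_stats(grade_counts)
print(f"mean={mean:.2f} var={var:.2f} sd={sd:.2f}")
```

Dividing a raw grade-point gap between two groups by this between-student SD gives the standardized difference (d).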


So, if I did this right, the gaps between groups are the ones marked in orange in the spreadsheet I linked Peter to. The group difference d's are .50, .38, and .12.

I can calculate the predicted IQ gap by generation and age as well, but have not done so. This one should be compared with GPA to see how the estimated general intelligence difference compares with the grade difference.
Admin
Peter,

Sorry for the late reply. I was busy with an exam.

Emil,

I notice that girls do better than boys in the second dataset. Do we see the same sex difference in the first dataset or does the sex difference increase as one goes from the foreign-born to the Danish-born?


The first dataset does not have data broken down by gender. All the data there is, is in the paper.

UNI-C does not have data by specific countries of origin, only Danish/1. gen./2. gen. and sex. I have translated the data and put them here for you.

The overall gender difference is somewhat stronger among natives: .8 vs. .6 and .6 for 1st and 2nd gen. This is somewhat surprising given that female immigrants do much better in Denmark. Perhaps there is still some suppression which is responsible for the .2 difference to Danish natives. Or maybe the size of the gender difference is simply different among immigrants.
John,

I can calculate the predicted IQ gap by generation and age as well, but have not done so. This one should be compared with GPA to see how the estimated general intelligence difference compares with the grade difference.


Can you include a brief comment on this?
Admin
Yes, give me a few days. One can fetch population data by the age group of 9th graders (about 15 years old) and by generation (first and second). These can be fed to the model discussed in Bo's and my immigration inequality paper to get an estimate of the IQ of this group. Then one can calculate the d value for IQ and compare it with the d values for GPA above.

GPA has to do with other things than general intelligence (e.g. academic interest and conscientiousness), which the groups probably do not differ so much in. So one needs the g-GPA correlation among 9th graders, and then one can calculate the expected GPA gaps given only the modeled IQ difference.
Admin
The first dataset concerns 9th grade pupils of both generations in the years 2007-2009.
The second concerns 9th grade second generation pupils in the years 2009-2013.

I fetched population data for both of these groups from the DST.

I had to estimate some missing countries, e.g. Czechoslovakia from the Czech Rep. and Slovak Rep.

Output is:
09-13 group, 2nd gen:
Mean IQ: 87.2
IQ SD: 16.3

07-09 group, mixed gen:
Mean IQ: 87.3
IQ SD: 16.7

So, the IQ d gap is a bit less than 1 according to the modeling, depending on which SD one uses to calculate d.

What is the correlation of IQ x GPA in primary school? Perhaps .5? If so, these results are in the right ballpark.
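
The comparison described here can be sketched as below. Assumptions that are not in the thread: a native reference mean of 100 with an SD of 15, and the simplification that the expected GPA gap equals the IQ gap times the IQ-GPA correlation (i.e. the GPA difference runs only through IQ). The .5 correlation is the guess from the post, not an estimate. Function names are hypothetical.

```python
def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference using the supplied SD."""
    return (mean_a - mean_b) / sd

def expected_gpa_gap(iq_d, r_iq_gpa):
    """Expected standardized GPA gap if the gap is due to IQ alone."""
    return iq_d * r_iq_gpa

NATIVE_MEAN_IQ = 100.0  # assumption, not stated in the thread
NATIVE_SD_IQ = 15.0     # assumption, not stated in the thread
MODELED_MEAN_IQ = 87.2  # 09-13 group, 2nd gen (from the output above)
MODELED_SD_IQ = 16.3    # 09-13 group, 2nd gen (from the output above)
R_IQ_GPA = 0.5          # the guessed correlation from the post

# The IQ gap depends on which SD is used as the denominator.
d_native_sd = cohens_d(NATIVE_MEAN_IQ, MODELED_MEAN_IQ, NATIVE_SD_IQ)
d_group_sd = cohens_d(NATIVE_MEAN_IQ, MODELED_MEAN_IQ, MODELED_SD_IQ)

print(f"IQ d (native SD): {d_native_sd:.2f}")
print(f"IQ d (group SD):  {d_group_sd:.2f}")
print(f"Expected GPA d:   {expected_gpa_gap(d_group_sd, R_IQ_GPA):.2f}"
      f" to {expected_gpa_gap(d_native_sd, R_IQ_GPA):.2f}")
```

Under these assumptions the expected GPA gap comes out at roughly .4, which is the sense in which the observed GPA d values are "in the right ballpark".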
The first dataset concerns 9th grade pupils of both generations in the years 2007-2009.
The second concerns 9th grade second generation pupils in the years 2009-2013.

I fetched population data for both of these groups from the DST.

I had to estimate some missing countries, e.g. Czechoslovakia from the Czech Rep. and Slovak Rep.

Output is:
09-13 group, 2nd gen:
Mean IQ: 87.2
IQ SD: 16.3

07-09 group, mixed gen:
Mean IQ: 87.3
IQ SD: 16.7

So, the IQ d gap is a bit less than 1 according to the modeling, depending on which SD one uses to calculate d.

What is the correlation of IQ x GPA in primary school? Perhaps .5? If so, these results are in the right ballpark.


OK, add a comment about this. There is nothing more that I would like you to add. After you make the appropriate additions, I approve publication. (Generally, I feel that all of the ST hypothesis papers should discuss generational effects.)
Admin
Often the data does not have generational information, so that makes it difficult to say much about the matter.
Often the data does not have generational information, so that makes it difficult to say much about the matter.


Edited.

Perhaps, yet it also makes it simple to handle the issue briefly. In such cases, you can just note that generational effects are an important issue, that they could not be explored with the present data set, and that exploration of these is an important area for future research.
Admin
Actually, the groups I calculated for above were the wrong ones if one wants to compare with the UNI-C data. The right groups are: first generation, 15 year olds, 2014 data; second generation, 15 year olds, 2014 data.

Results are:
First gen.
Mean IQ: 87.6484
IQ SD: 17.73221

Second gen.
Mean IQ: 86.75102
IQ SD: 16.46624

So there is little change in the means, but a larger change in the SDs. There are fewer Western immigrants in the second generation, so this reduces the mean and the SD somewhat.
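
To illustrate why a smaller share of Western immigrants lowers both the mean and the SD, here is a rough sketch. It is not the model from the immigration inequality paper; it assumes each origin group is distributed with its own mean and a common within-group SD of 15, and all group sizes and means are hypothetical.

```python
from math import sqrt

def mixture_mean_sd(groups, within_sd=15.0):
    """Mean and SD of a population made up of several origin groups.

    groups: list of (population_count, group_mean_iq) pairs.
    within_sd: assumed SD inside every group (an assumption of this sketch).
    """
    n = sum(count for count, _ in groups)
    mean = sum(count * group_mean for count, group_mean in groups) / n
    # Total variance = within-group variance + variance of the group means.
    between_var = sum(count * (group_mean - mean) ** 2
                      for count, group_mean in groups) / n
    return mean, sqrt(within_sd ** 2 + between_var)

# Hypothetical compositions: (count, mean IQ) for non-Western and Western origins.
more_western = [(800, 85.0), (200, 99.0)]
fewer_western = [(950, 85.0), (50, 99.0)]

mean_more, sd_more = mixture_mean_sd(more_western)
mean_fewer, sd_fewer = mixture_mean_sd(fewer_western)
print(f"more Western:  mean={mean_more:.1f} sd={sd_more:.2f}")
print(f"fewer Western: mean={mean_fewer:.1f} sd={sd_fewer:.2f}")
```

With fewer members in the higher-scoring group, the overall mean drops and the group means are less spread out, so the total SD moves closer to the within-group SD.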

I have written a new version with all this stuff. I will have someone proofread it before posting it here.
Admin
Laird has been so kind as to proofread it. The latest draft (#5) is on OSF now along with the new source code etc.

https://osf.io/p9d5z/

The new draft has a new section re. the size of the GPA gaps, which includes both calculations on a country-level and at the total immigrant population level by generation.