[ODP] Immigrant GPA in Danish primary school is predictable from country-level variab
Some errors below:

Correlation analyses shows that it is highly predictable from country-level variables:

In general, area is a vague term to me. Country would have sufficed.

which on page 13 lists the grade point average (GPA) of children by their mother's birth country/area.

Finally, when you mention Malloy, you should probably cite a reference. Fuerst used L&V with Malloy's correction to L&V in one of his first papers for ODP. You can cite it.

When you run correlations, I suggest you say more about the normality of your data. You rarely do that. You can easily make a histogram, stem-and-leaf plot, P-P plot and Q-Q plot in R. And remember that outliers can be very influential when the sample size is small.
You know that I have always hated significance tests. S-W is useless because when N is large, the p-value will almost always be lower than 0.05, since even trivial departures from normality become significant, let alone the problem of arbitrary cut-off values for significance. I don't understand why you keep using it. I will never approve a paper that uses S-W to examine normality instead of a histogram, P-P plot and Q-Q plot. As I said, look at the histogram, P-P plot and Q-Q plot. That's all you need. If you don't want to display all the graphs but only one, I think you should select the histogram.
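To make the Q-Q suggestion concrete, here is a minimal sketch in Python rather than R, using only the standard library. The GPA values are hypothetical (they are not the paper's data), and the plotting position (i - 0.5)/n is one common convention among several. It prints the coordinates a normal Q-Q plot would draw, so a single outlier in a small sample shows up as one point far off the diagonal.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()
    pairs = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                    # plotting position (one common choice)
        theoretical = std_normal.inv_cdf(p)  # quantile expected under normality
        observed = (x - m) / s               # standardized sample value
        pairs.append((theoretical, observed))
    return pairs

# Hypothetical group GPAs; the last value is a deliberate outlier.
gpa = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 6.6, 9.4]
points = qq_points(gpa)

print("theoretical  observed  gap")
for t, o in points:
    print(f"{t:11.2f} {o:9.2f} {o - t:+5.2f}")
```

Points that follow the diagonal have a gap near zero; here the largest value sits well above its expected quantile, which is the kind of influential case worth checking before trusting a correlation on ten or twenty groups.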

By the way, if my memory is correct, spatial transferability (ST) theory states that immigrant IQ will stay the same and that the country of origin will predict test performance. Your paper says the result is consistent with ST theory, but it is only a correlational analysis, so it bears only on the second prediction of the theory.
It will only be so if the data is non-normal.

Of course not. Many times I have seen my variables look normally distributed in the histogram and P-P plot, and yet S-W says otherwise. Like I said: it's the p-value. Your argument is similar to saying that if p is significant, the effect size must be large. Yet that is not true, and a lot of examples show that the effect size can be close to zero while p is significant. One example is Pekkala Kerr et al. (2013), "School tracking and development of cognitive skills". They say that the schooling reform shows no transfer effect, just because some tests are significantly improved and others are not. But when you calculate the effect sizes, which they don't, the d gaps are between 0.00 and 0.03 (or 0.04). In my opinion, that is no different from saying the effect size is zero for each test.
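The point about tiny effects reaching significance can be illustrated with a small sketch. It assumes two equal-sized groups and a normal (z) approximation, and the sample sizes are hypothetical, not those of Pekkala Kerr et al.; only the d of 0.03 is taken from the discussion above.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between
    two groups of equal size, using the z approximation."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.03  # effect size held fixed; only the sample size changes
for n in (100, 10_000, 100_000):  # hypothetical group sizes
    print(f"n per group = {n:>7}: d = {d}, p = {two_sided_p(d, n):.4g}")
```

The effect size is identical in all three rows, yet p moves from clearly non-significant to far below 0.05 purely because N grows, which is why p alone says nothing about how large a gap is.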

It's dangerous to use significance tests. I have always said it, and I will keep repeating it.

And even if you trust the W value, I don't trust cut-off values. What does the .99 really mean, given the operation used to get the W value, which is given here?
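For reference, the usual textbook definition of the Shapiro-Wilk statistic is reproduced below. This is the standard form and not necessarily the exact presentation in the page referred to above.

```latex
W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^{2}}
         {\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^{2}},
\qquad
(a_1, \dots, a_n) = \frac{m^{\top} V^{-1}}{\left(m^{\top} V^{-1} V^{-1} m\right)^{1/2}}
```

Here x_(i) is the i-th smallest observation, m is the vector of expected values of the order statistics of a standard normal sample of size n, and V is the covariance matrix of those order statistics. W is at most 1, and values near 1 mean the ordered data line up with what normality predicts, but the formula itself gives no natural threshold that separates "normal enough" from "not normal".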

So, what does it mean when you have 0.95 or 0.96 instead of 0.99? How can you judge that? If you think eyeballing is not good enough, I will say cut-off values are no better.

If the IQs changed but stayed in the same relative order, then correlation analysis will not detect it, that's right. The ST hypothesis says they will generally stay the same, which also implies the order will generally stay the same, and for this reason the usual correlates of IQ will be found.

IQ change and rank order are not the same thing, so if you see that the rank order remains the same, you cannot conclude that IQ has not changed. This should be made clear.
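A small sketch of why a correlation cannot detect this kind of change. All numbers are hypothetical and chosen only for illustration: one scenario adds 8 points to every group, the other halves every gap around 85. Both change the IQs, but neither changes the correlation with the outcome.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))

origin_iq = [70, 78, 85, 90, 98, 105]            # hypothetical origin-country IQs
immigrant_gpa = [4.9, 5.6, 5.3, 6.0, 6.4, 6.9]   # hypothetical group GPAs

shifted_iq = [iq + 8 for iq in origin_iq]                    # every group gains 8 points
compressed_iq = [85 + 0.5 * (iq - 85) for iq in origin_iq]   # every gap is halved

for label, iq in [("original", origin_iq), ("shifted", shifted_iq), ("compressed", compressed_iq)]:
    print(f"{label:>10}: r = {pearson(iq, immigrant_gpa):.4f}, rho = {spearman(iq, immigrant_gpa):.4f}")
```

All three rows print the same r and rho, so a strong correlation is compatible with unchanged IQs, with a uniform gain, and with shrinking gaps alike.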

---

I don't mind if you use both S-W and histograms/plots. But make sure you don't rely on p values.
The p value will only go down if the data are non-normal.

First, you say that p goes down only if the variable is not normally distributed. Then, you say it can go down if N is large (and you show it in your blog post). So, it's not "only if" the variable is not normal. That's the problem I have always pointed out.

In the case of GPA here, however, it is not very normal. I tried log-transforming it, but it is slightly less normal after that (Ws .9251 before, .9124 after). What do you want me to do with it?

Sometimes, transformation does not work because the data follow a specific distribution (e.g., Poisson). In such a case, use Poisson regression. Sometimes, the data are truncated or censored. Then use truncated or tobit regression. You have used Spearman's rho, and it is a valid test here, given your histogram. But sometimes, even Spearman's test is not appropriate, as I said, when you have a censored, truncated or Poisson distribution. Just because the data are not normal does not mean Spearman's test is the most appropriate solution. But of course, you can't see that with your S-W test. That's why I prefer the histogram. The S-W test cannot tell you anything about skewness and kurtosis, but a histogram can. When you look only at the S-W test, you will come to the conclusion that the data are not normal and that Pearson's test is not appropriate. Correct. But when it comes to choosing the best method, S-W fails badly. It cannot show you which method to apply.

For your article anyway, I will only ask that you note that the data are not normally distributed, as indicated by the histogram (and the S-W test if you really want it), and then show that Spearman's correlation produces the same result. And I will approve. I don't think the analysis needs to be carried any further, and I don't have other ideas anyway. Pearson/Spearman is sufficient.
I don't understand your comment. I said that a histogram displays much more information than S-W, and that if you ask me which method I prefer, it's clearly the histogram. By far. For example, only a histogram can tell you whether techniques like Poisson regression should be used. And that is a common pattern in crime/delinquency data, where you usually have things like: 90% with zero occurrences, 5% with 1 occurrence, 3% with 2 occurrences, 2% with 3+ occurrences.
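As a rough illustration of the crime/delinquency pattern just described, the sketch below builds a hypothetical sample of 100 cases with those percentages (coding "3+" as exactly 3 for simplicity, which is an assumption) and prints a text histogram together with the skewness. The shape of the distribution, which a single W or p value cannot convey, is visible at a glance.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical sample: 90% zeros, 5% ones, 3% twos, 2% coded as three.
counts = [0] * 90 + [1] * 5 + [2] * 3 + [3] * 2

freq = Counter(counts)
for value in sorted(freq):
    print(f"{value} | {'#' * freq[value]} ({freq[value]})")

m = mean(counts)
v = pvariance(counts)
# Population skewness: third central moment divided by variance^(3/2).
skew = sum((c - m) ** 3 for c in counts) / len(counts) / v ** 1.5
zero_share = freq[0] / len(counts)

print(f"mean = {m:.2f}, variance = {v:.4f}, skewness = {skew:.2f}, zeros = {zero_share:.0%}")
```

A pile of zeros with a long right tail like this points toward a count model rather than a rank correlation, which is the kind of decision the histogram supports and a normality test alone does not.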
So you have made the changes concerning the citation of Malloy's data, added the histogram and Spearman's correlation. I have nothing more to add.

I approve.

---