It will only be so if the data is non-normal.

Of course not. Many times I have seen variables that look normally distributed on a histogram and P-P plot, and yet S-W says otherwise. Like I said: the problem is the p-value. Your argument is like saying that if p is significant, the effect size must be large. That is not true, and plenty of examples show that the effect size can be close to zero while p is significant. One example is Pekkala Kerr et al. (2013), "School tracking and development of cognitive skills". They say that the schooling reform shows no transfer effect, just because some tests improved significantly and others did not. But when you calculate the effect sizes, which they don't, the d gaps are between 0.00 and 0.03 (or 0.04). In my opinion, that is no different from saying the effect size is zero for every test.
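To make the point concrete, here is a minimal sketch of how the same tiny standardized difference (d = 0.03) goes from non-significant to highly significant just by increasing the sample size. The function name and the sample sizes are made up for illustration and are not taken from Pekkala Kerr et al.; the calculation assumes a simple two-group z-test with equal group sizes and unit variance.

```python
import math

def two_sided_p(d, n_per_group):
    """Two-sided p-value of a z-test for a standardized mean difference d
    between two equal-sized groups, assuming unit variance in each group."""
    # Standard error of the difference in means is sqrt(2 / n), so z = d * sqrt(n / 2).
    z = d * math.sqrt(n_per_group / 2)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample sizes, chosen only to show the trend.
for n in (100, 10_000, 50_000):
    print(n, two_sided_p(0.03, n))
```

The effect size never moves, only n does, so the p-value here says more about the sample size than about the size of the gap.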

It's dangerous to use significance tests. I have always said it, and I will keep repeating it.

And even if you trust the W value, I don't trust cut-off values. What does .99 really mean, given the operation used to get the W value, which is given here?

So what does it mean when you have 0.95 or 0.96 instead of 0.99? How can you judge that? If you think eyeballing is not good enough, I will say cut-off values are no better.
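For reference, the usual textbook form of the Shapiro-Wilk statistic (not necessarily written the same way as in the source referred to above) is:

$$
W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

Here $x_{(i)}$ is the $i$-th smallest observation, $\bar{x}$ is the sample mean, and the $a_i$ are constants derived from the expected values and covariances of the order statistics of a standard normal sample. W is at most 1 and gets closer to 1 as the ordered data line up with what a normal sample would give. Nothing in this expression says where "normal enough" begins, which is why I don't see what separates 0.96 from 0.99.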

If the IQs changed, but stayed in the same relative order, then correlation analysis will not detect it, that's right. The ST hypothesis says they will generally stay the same, which also implies the order will stay generally the same, and for this reason the usual correlates of IQ will be found.

If IQ changes and rank order are not the same thing, and you see that rank order remains the same, you cannot conclude that IQ has not changed. This should be made clear.
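A small sketch of why the two must be kept apart: below, every score rises by 10 points, yet the correlation between the two time points is still essentially 1. The scores are invented for illustration, and `pearson_r` is just a hand-written helper, not a function from any library.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up IQ scores at time 1; at time 2 everyone gains exactly 10 points.
iq_t1 = [85, 92, 100, 104, 110, 121]
iq_t2 = [score + 10 for score in iq_t1]

r = pearson_r(iq_t1, iq_t2)
mean_change = sum(iq_t2) / len(iq_t2) - sum(iq_t1) / len(iq_t1)

print("correlation:", round(r, 6))
print("mean change:", mean_change)
```

The correlation only sees the ordering and the spacing of the scores, so a uniform shift is invisible to it. You have to compare the levels directly to see the change.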

---

I don't mind if you use both S-W and histograms/plots. But make sure you don't rely on p values.