Noah,

Thanks for the review.

1)

R does not like it when variable names have spaces (even in data.frames), so I have used underscores. One could also use CamelCase instead, but I think underscores are more readable. I would rather keep the code consistent with the paper (for those who wish to analyze it more closely) than have a slightly more polished presentation.

2)

I have added borders to the table to make it more readable.

3)

The achievement measure is clearly not optimal, but one has to make do with what there is. Perhaps one can find a better measure. You seem familiar with the data; could you perhaps tell me whether one of these measures is better?

http://data.london.gov.uk/dataset/gcse-results-location-pupil-residence-borough

It looks like there are several measures one could perhaps factor analyze or otherwise combine to get a composite measure:

a) All Pupils at the End of KS4 Achieving 5+ A* - C

b) All Pupils at the End of KS4 Achieving 5+ A* - G

c) All Pupils at the End of KS4 Achieving 5+ A* - C Including English and Mathematics

d) All Pupils at the End of KS4 Achieving 5+ A* - G Including English and Mathematics

e) All Pupils at the End of KS4 Achieving the Basics

f) All Pupils at the End of KS4 Entering the English Baccalaureate

g) All pupils at the End of KS4 Achieving the English Baccalaureate

h) Average GCSE and Equivalent Point Score Per Pupil at the End of KS4

i) Average Capped GCSE and Equivalent Point Score Per Pupil at the End of KS4

As far as I can tell, many of these are threshold versions of more continuous variables (cf. http://www.lagriffedulion.f2s.com/adverse.htm). Such variables have somewhat non-linear relationships. Perhaps (h) is the best variable to use? It looks like a mean score type variable, meaning that no threshold transformation has been applied to it.
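To illustrate the threshold point, here is a minimal Python sketch. It assumes the underlying scores are normally distributed with equal spread in every group, which is an assumption for illustration and not something checked against the GCSE data. The function name and the example means are hypothetical.

```python
from statistics import NormalDist

def share_above_threshold(mean, threshold=0.0, sd=1.0):
    """Share of a normal(mean, sd) population scoring above `threshold`."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)

# Hypothetical group means, spaced in equal steps of 1 SD.
means = [-2.0, -1.0, 0.0, 1.0, 2.0]
shares = [share_above_threshold(m) for m in means]

# Change in the pass share for each equal step in the mean.
steps = [b - a for a, b in zip(shares, shares[1:])]

for m, s in zip(means, shares):
    print(f"mean {m:+.1f} SD -> share above threshold {s:.3f}")
print("step sizes:", [round(d, 3) for d in steps])
```

Equal steps in the mean give unequal steps in the share above the threshold, so a "percent achieving X" variable is a non-linear transformation of the mean score.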

I did analyze them. The currently used variable (c) has a factor loading of .96, but the loadings of all the variables are in the .69-.98 range, so the choice would probably not matter much. The highest loading is that of (i).
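For readers who want to see what such loadings look like in practice, here is a rough Python sketch on synthetic data. It is not the analysis reported above: the data are simulated, and the first principal component of the correlation matrix is used as a simple stand-in for whichever factor extraction method one prefers. All names and noise levels are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_component_loadings(columns, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    # Rayleigh quotient gives the leading eigenvalue for the unit vector.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return [v * math.sqrt(eigenvalue) for v in vec]

# Synthetic indicators: one latent ability plus increasing amounts of noise.
rng = random.Random(1)
latent = [rng.gauss(0, 1) for _ in range(1000)]
noise_sds = [0.2, 0.7, 1.5]  # hypothetical noise levels
columns = [[g + rng.gauss(0, sd) for g in latent] for sd in noise_sds]

loadings = first_component_loadings(columns)
print([round(l, 2) for l in loadings])
```

The noisier an indicator is, the lower its loading, which is the sense in which a loading indexes how well a variable measures the common factor.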

In fact, because we have 9 variables all measuring scholastic ability, one can use Jensen's method. The prediction is that the variables that better measure scholastic ability (i.e., those with higher loadings) should show higher correlations with the criterion variable (S). This was in fact found: r's of .94-.95 (depending on which S score vector was used).
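The logic of the method can be sketched in a few lines of Python: one vector holds each indicator's loading, the other holds each indicator's correlation with S, and the two vectors are correlated. The numbers below are made-up placeholders for four hypothetical indicators, not the values from the analysis above.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values for four hypothetical indicators (NOT the real results).
loadings = [0.70, 0.80, 0.90, 0.95]   # loading of each indicator on the factor
r_with_s = [0.55, 0.65, 0.80, 0.85]   # correlation of each indicator with S

# Correlation between the two vectors, computed across indicators.
mcv_r = pearson(loadings, r_with_s)
print(f"vector correlation: {mcv_r:.2f}")
```

With only 9 indicators the vector correlation rests on very few data points, so it is best read as a descriptive check rather than a precise estimate.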

All correlations between the GCSE variables and S were substantial (r's .582 to .886). The strongest correlation is with (h), as one could expect, because it is the underlying continuous variable. (i) seems to be some capped (?) version of this, which introduces a ceiling effect.

So it seems to me that one should use (h).

It seems somewhat unnecessary to include all this in the main part of the paper. Perhaps add an appendix discussing the GCSE variables and the above analysis?

Let me know what you would prefer.

4)

I agree. I changed it to:

In line with much other research (15,16), one would expect higher cognitive ability to lead to higher S. The GCSE grades are not exactly an IQ test (17,18), **but it has been found that at the national level, scholastic ability and cognitive ability as measured by traditional IQ tests are nearly perfectly correlated (19,20). This suggests that it may also be a useful proxy at the borough level, but this may not be the case.** Prior research using similar data has found strong relationships between scholastic/cognitive ability and S, so a correlation in the vicinity of .40 to .90 would be expected here.

--

Files updated.