Thoughts on "How geography influences complex cognitive ability"
I came across the following paper: "How geography influences complex cognitive ability." I would like comments specifically on the use of Fst to index evolutionary influence and on the choice of mediation model. The authors note:

In the tests of these hypotheses evolutionary genetics was addressed by FST distance, an indicator of the time since two populations had a last common ancestor, that is, the time since they were in fact the same population... Socioeconomic development [i.e., HDI = education, GDP, life expectancy] was handled as another mediating variable.

The cognitive influence of genetic distance from South Africa, an evolutionary variable, was weaker than the independent cognitive influence of HDI, a contemporary variable, whatever the country sample (Table 1).


I would have tended to treat CA (cognitive ability) as a mediator of the Fst-HDI relationship, since we are ultimately interested in CA because it explains HDI. Doing so gives the results attached below (FSTCOGHDI). More importantly, using average genetic distance as an indicator of trait-specific genetic differences, in a trait that has clearly been under selective pressure, seems dubious at best. Rindermann and Becker (2014) apparently tried something similar, though they used pairwise differences.

Comparisons of Africans and Australoids come to mind: two genetically very distant populations that are arguably phenotypically similar in CA. I wonder, though, whether there is a sensible way to use such data.

Generally, I think Davide's approach, where genetic distance/proximity is treated as a confound rather than an index, intuitively makes more sense:

There was a positive correlation between the genome-wide (Gedmatch) distances and the 4 SNPs g factor: r= 0.67 (N=10, p= 0.032).

The Gedmatch genome-wide distance was not significantly correlated to IQ differences: r= 0.27 (N=10, p=0.46). However, the 4 SNPs g factor was significantly correlated to IQ differences: r= 0.845 (N=10, p= 0.002).

To assess the relationship of the 4 SNPs g factor net of genome-wide distances, a regression was run with IQ difference as dependent and 4 SNPs g factor, Gedmatch distances as independent variables (table 6). A significant model emerged (F2,9= 26.58, p= 0.01).


Maybe I'm missing something, though.


In two subsequent papers I used "real" Fst values rather than Gedmatch proxies. These showed a stronger correlation with IQ. For example, inspection of the correlation matrix ( https://docs.google.com/spreadsheets/d/1EanD-Jj15a3OBJpu5Tr5e86iLTLZPpa9vUEPTwaVsHU/edit?usp=sharing ) reveals that IQ distances correlate more strongly with the GWAS hits distances (r x gdist = 0.78; gdist_9 = 0.65; Polygenic = 0.76) than with the distances representing population structure (r x Fst Chr.1 = 0.58; Fst Chr.2 = 0.58; randomnine1 = 0.52; randomnine2 = 0.64; randomnine3 = 0.57; randomnine4 = 0.52). See also the regression results in the attached file.

Piffer, Davide (2015): A review of intelligence GWAS hits: their relationship to country IQ and the issue of spatial autocorrelation. figshare.
http://dx.doi.org/10.6084/m9.figshare.1393160
Retrieved 07:49, May 20, 2015 (GMT)

I think that neutral genetic distances should be treated as a confound. See for example my 2013 paper, where I showed that, despite being genetically very different, Africans and Papuans have similarly low frequencies of high-IQ alleles. And Native Americans have lower frequencies of IQ-increasing alleles than Europeans, despite being genetically closer to East Asians.


Yet another genetic-distance paper:

Kodila-Tedika, O., & Asongu, S. (2015). Genetic Distance and Cognitive Human Capital: A Cross-National Investigation (No. 15/012).
Admin
The differences between those correlations may be a coincidence, though. The sample size isn't that large.


Of course, everything in statistics can be a coincidence, or significance testing wouldn't exist. The regression results are fairly strong, though. Have you actually read the entire paper, or just a snippet here and there?
Admin
You misunderstood. You made a claim about one correlation being larger than the other, i.e. a claim about a difference between correlations. One needs quite large samples to detect those.

Try it in R to get a feel for it.

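library(psychometric)  # provides CIrdif()
# 95% CI for the difference between r = .5 and r = .3, with n = 50 in each sample
CIrdif(.5, .3, 50, 50)
#>   DifR       SED        LCL       UCL
#> 1  0.2 0.1822087 -0.1571224 0.5571224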
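
To get a feel for how slowly that interval narrows, here is a minimal base-R sketch that needs no package. It assumes two independent samples and the large-sample standard error sqrt((1 - r1^2)/n1 + (1 - r2^2)/n2), which gives the same SED as the CIrdif output above. The name `ci_rdif` is made up for this illustration and is not part of psychometric.

```r
# Hypothetical helper (not part of psychometric): normal-approximation CI
# for the difference between two correlations from independent samples.
ci_rdif <- function(r1, r2, n1, n2, level = 0.95) {
  sed <- sqrt((1 - r1^2) / n1 + (1 - r2^2) / n2)  # standard error of r1 - r2
  z <- qnorm(1 - (1 - level) / 2)                 # two-sided critical value
  dif <- r1 - r2
  data.frame(DifR = dif, SED = sed, LCL = dif - z * sed, UCL = dif + z * sed)
}

# Same pair of correlations (.5 vs .3) at increasing per-sample sizes.
sizes <- c(50, 100, 250, 500, 1000)
widths <- do.call(rbind, lapply(sizes, function(n) {
  cbind(n = n, ci_rdif(.5, .3, n, n))
}))
print(widths)
```

With these inputs the lower limit stays below zero at n = 100 per sample and only moves above zero by n = 250.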


There are 72 missing cases for IQ out of a total of 325 pairwise comparisons, so the sample size is N = 253.

IQ correlations: r x Fst Chr 1 vs r x 4 SNPs G factor

> CIrdif(0.776,0.581,253,253)
   DifR       SED        LCL       UCL
1 0.195 0.0647361 0.06811957 0.3218804

IQ correlations: r x Fst Chr 1 vs r x 9 SNPs G factor

> CIrdif(0.655,0.581,253,253)
   DifR       SED        LCL       UCL
1 0.074 0.0698223 -0.0628492 0.2108492

IQ correlations: r x RandomNine SNPs* vs Polygenic score (9 hits)

> CIrdif(0.7606,0.563,253,253)
    DifR        SED        LCL       UCL
1 0.1976 0.06607334 0.06809863 0.3271014

The confidence intervals show that two of the three comparisons are significant at the 0.05 level, and the other is borderline significant.
*Average of the 4 correlation coefficients for the 4 random polygenic scores, each comprising 9 SNPs picked at random.
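
As a rough cross-check, the same three comparisons can be run on the Fisher z scale in base R. This is only a sketch: `fisher_z_diff` is a hypothetical helper name, and it assumes the two correlations come from independent samples of 253, which pairwise distances built from the same set of populations do not strictly satisfy, so the output is likely optimistic.

```r
# Hypothetical helper: Fisher z test for the difference between two
# correlations, ASSUMING independent samples (an assumption of this sketch).
fisher_z_diff <- function(r1, r2, n1, n2) {
  dz <- atanh(r1) - atanh(r2)               # difference on the z scale
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of that difference
  z <- dz / se
  data.frame(r1 = r1, r2 = r2, z = z, p = 2 * pnorm(-abs(z)))
}

# The three pairs of correlations reported above, each with N = 253.
comparisons <- rbind(
  fisher_z_diff(0.776, 0.581, 253, 253),   # 4 SNPs factor vs Fst Chr 1
  fisher_z_diff(0.655, 0.581, 253, 253),   # 9 SNPs factor vs Fst Chr 1
  fisher_z_diff(0.7606, 0.563, 253, 253)   # polygenic score vs random SNPs
)
print(comparisons)
```

On these inputs the first and third differences stay clearly positive and the second does not, which matches the pattern in the intervals above.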
Admin
You really shouldn't use NHST language. "Borderline significance" is nonsense. https://mchankins.wordpress.com/2013/04/21/still-not-significant-2/

The results show that, according to frequentist statistics, the IQ SNPs were more strongly related to national IQs than population structure was, but the evidence is not that strong. The CIs are fairly wide. Especially doubtful is the fact that the results are weaker for the 9 SNP factor than for the 4 SNP factor. One would expect it to be the other way around.

These results would surely be publishable, but they will not convince skeptics.

--

Can you try running the automatic modeling (both versions) and see whether it consistently favors models without the population structure predictor? I.e., use national IQs as the dependent variable and Fst Chr 1/9, 4 SNP factor, 9 SNP factor, polygenic 4, and polygenic 9 as predictors.

I would probably also try it without the two SNPs that load in the wrong direction on the SNP factor. They are either false positives or recent mutations, or their effect is due to interaction effects (epistasis). I'll put my money on the first.


Which automatic modelling are you talking about?
Edit 1: I showed the robustness of the effects in a much more elegant way than automatic modeling. I ran regressions with random SNPs and Fst as independent variables and showed that the beta coefficients were indistinguishable. On the other hand, the Fst distances and the random factors lost all predictive power when entered in a regression together with the GWAS hits factor.
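
The pattern described above, where a structure proxy predicts on its own but not alongside the hits factor, can be sketched on simulated data. Everything here is an assumption chosen for illustration: the variable names, the sample size, and the effect sizes are made up, and none of the numbers come from the paper.

```r
# SIMULATED data for illustration only; no value here comes from the paper.
set.seed(1)
n <- 253
hits <- rnorm(n)                                 # stand-in for the GWAS hits factor
fst <- 0.7 * hits + sqrt(1 - 0.7^2) * rnorm(n)   # structure proxy correlated with hits
iq <- 0.8 * hits + 0.6 * rnorm(n)                # outcome driven by hits only

alone <- lm(iq ~ fst)          # structure proxy entered by itself
joint <- lm(iq ~ hits + fst)   # both predictors entered together

# Coefficient tables: estimate, standard error, t value, p value.
print(coef(summary(alone)))
print(coef(summary(joint)))
```

In a setup like this the coefficient on `fst` shrinks toward zero once `hits` is in the model, because `fst` only carried information through its correlation with `hits`. The real data need not behave this way.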


P.S.: I have come to the conclusion that NHST is fine. The only wrong part is treating an arbitrary threshold as the judge of whether a result is significant; otherwise there is nothing wrong with NHST. Actually, Bayesian statistics has more problems than NHST because it relies too much on subjective assumptions. Bayes' theorem can be properly applied in only a few situations; most of the ways it is used today are absolute crap, and NHST is to be preferred.
Admin
Frequentist modeling
http://emilkirkegaard.dk/en/?p=4881
Bayesian equivalent
https://thewinnower.com/papers/using-bayes-factors-to-get-the-most-out-of-linear-regression-a-practical-guide-using-r

Using an arbitrary threshold is a fundamental part of NHST. Talk of "significance" invites, and frequently results in, misunderstanding of the results. Never use "significant" to mean low p. Just say "low p", or spell it out: "the data are unlikely given a null model". That is what p means.

Then there is the very common inference that p > threshold means no effect. This conclusion is very often wrong, because studies are consistently underpowered and have been for decades.

Just use confidence intervals. Bayesian modeling has few advantages over just using CIs and it is much more complicated to use.
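
To put a number on "underpowered", here is a small base-R sketch of approximate power for a two-sided test of a zero correlation. It uses the Fisher z normal approximation, which is rough at very small n, and `power_r` is a hypothetical helper name rather than a function from any package.

```r
# Approximate power of a two-sided test of H0: rho = 0, using the Fisher z
# normal approximation. `power_r` is a hypothetical helper for this sketch.
power_r <- function(rho, n, alpha = 0.05) {
  crit <- qnorm(1 - alpha / 2)          # two-sided critical value
  shift <- atanh(rho) * sqrt(n - 3)     # expected z statistic if rho is true
  pnorm(shift - crit) + pnorm(-shift - crit)
}

# Power to detect a true correlation of 0.3 at several sample sizes.
n <- c(10, 30, 50, 100, 200)
print(data.frame(n = n, power = round(power_r(0.3, n), 2)))
```

Under this approximation a true correlation of 0.3 is detected well under half the time at n = 10, so a p above the threshold at that sample size says very little about whether an effect exists.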


I do not like the term "stat sig" either. I just had a brief relapse into common language, but hopefully we all know here that significance thresholds are arbitrary.
Admin
It is better not to invite the confusion. Surveys show that other scientists don't know what it means either. I cannot find the survey again, but if even researchers outside social science cannot understand the meaning, it is too confusing to use. I have stopped using it, and I won't be using NHST in my papers either.