Piffer's Freewas
Okay, then my short term todo-list is to

  • Find the ancestral allele frequency for 1kg.
  • Find the ancestral allele frequency for the three groups of SNPs


I should also include the number of SNPs within each group that have no ancestral info.


Yes that's right.
I attach the list of candidate SNPs from Rietveld et al, 2014. I attach only the first page because the second page has too high p values and it's probably just random noise.
Would you be able to get the frequencies for those alleles? Then compute their average frequency for the 26 populations. Of course the first 3 SNPs in the list must be excluded because they're the same top 3 SNPs we have used for the PC. I want to see if the average frequency for the other SNPs is correlated with the PC from the top 3 SNPs.

If the sign of the beta coeff. is positive, the increasing allele is the Coded allele. Otherwise it's the non-coded allele.
There are some SNPs with opposite effects in cognitive performance and years of education. Exclude these too: rs17176043;rs6732189; rs2966

I don't know what we'll get..probably just random noise. Still worth trying...
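
A minimal sketch of the computation requested above: pick the increasing allele from the sign of beta, drop the excluded SNPs, and average per population. The SNP names, betas and frequencies are made-up placeholders, and the sketch assumes biallelic SNPs, so that the non-coded allele frequency is 1 minus the coded allele frequency.

```python
from statistics import mean

# Hypothetical rows: (rsid, beta, coded-allele frequency per population).
snps = [
    ("rsA", 0.02, {"CEU": 0.60, "YRI": 0.30}),
    ("rsB", -0.01, {"CEU": 0.20, "YRI": 0.50}),
    ("rsC", 0.03, {"CEU": 0.40, "YRI": 0.10}),
]
# Placeholder for the SNPs to drop (the top 3 used for the PC and
# those with opposite effects on the two phenotypes).
excluded = {"rsC"}


def increasing_allele_freq(beta, coded_freq):
    """Positive beta: the coded allele is the increasing one.
    Otherwise use the non-coded allele (assumes a biallelic SNP)."""
    return coded_freq if beta > 0 else 1.0 - coded_freq


def mean_increasing_freq(snps, excluded):
    """Average increasing-allele frequency per population over kept SNPs."""
    kept = [(beta, freqs) for rsid, beta, freqs in snps if rsid not in excluded]
    pops = kept[0][1].keys()
    return {
        pop: mean(increasing_allele_freq(beta, freqs[pop]) for beta, freqs in kept)
        for pop in pops
    }


result = mean_increasing_freq(snps, excluded)
print(result)
```

The resulting per-population averages are what would then be correlated with the PC scores.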
Oki, but I'll do it after the two things I posted earlier today.

Do you have a txt or st of that file so I do not have to type all that by hand?


You can download the supplemental data from there:

http://ssgac.org/Data.php

They've got first and second stage meta-analysis .txt files but I do not know which SNPs they report.
Was able to finish the 1kg stuff today, but the computations are still running. Will post tomorrow.
I was rude enough to contact Harpending about our work.

He had many interesting things to say:

Hi Endre, thanks for the preprint. Your approach seems exactly right to me but I am no expert on statistics and data analysis. I do have some complaints that I will lay out below.

There is a large consortium that is pursuing GWAS for cognitive SNPs that you cite. I expect that you are their ongoing nightmare because you and Piffer are not afraid to finish their sentence, so to speak. You are wise to keep your head down but things may be getting better. I was at a meeting at the Chicago economics department a few months ago that included many of the GWAS bunch. It is probably worth a look at this link: there are videos of some of the talks. There was no self-righteous hostility, to my surprise, but everyone in the talks stepped gingerly, careful not to mention group differences. The discussions in the bar in the evening were much more open and explicit (and honest).

Here are some not-well-organized reactions to your draft:

I am likely being an old fogey, but "correlated to" just sounds wrong to me, but I do see that it is growing in popularity. I would say "correlated with".

You are not very clear about what you did. For example you performed PCAs but you did not lay out what data matrix you factored. I think of PCA starting with a matrix of rows that correspond to genes and columns that correspond to objects. What are your objects? Populations? What populations? I presume, early on, there are 3 rows corresponding to the GWAS hits?

What are the entries of the data matrix? We see two in the literature. If x_i is the frequency of the i'th allele in a population (or individual), are the entries of your data matrix x_i or x_i/(xbar_i(1-xbar_i))? There are arguments to be made for either one; the second is customary in some circles because the "speed" of frequency change under either drift or selection ought to be proportional to xbar_i(1-xbar_i). The Visscher lot uses the second normalization. I don't know whether or not this would make any difference at all with respect to the patterns that you find.

You should not report significance levels for cross-national findings since the data are autocorrelated. Are Denmark, the UK, Sweden, and Nigeria four nations or 2 nations? This error plagues cross-cultural research: anthropologists used to call it Galton's problem.

You speak of "selection" a lot but I can't see why. Of course the differences must reflect selection but the details are open to our imaginations. What selection do you see?

There are two, maybe 3, pockets of high IQ in the world. One is in NW Europe, one in NE Asia, and maybe one in south India. The European and Asian pockets seem to be different. It would be nice if you had, say, Steve Hsu's and James Lee's 23andMe scans to see if they have the same configuration as Venter and Watson. May be hard to get since Hsu and Lee both have ponies in this race.

My sense of what is going on in cognitive genetics and more generally is that the old old fight between the biometricians and the Mendelians is coming back full steam and that we will see a period of Galton coming back as he certainly ought to.

Best, Henry
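
The two data-matrix choices described in the email can be sketched as follows, assuming rows are SNPs and columns are populations, and taking the second normalization literally as written there (each frequency divided by mean*(1-mean) for that SNP). Some published normalizations divide by the square root of that quantity instead; this sketch follows the email. The frequencies are made up.

```python
def normalize_rows(freqs):
    """freqs: one row per SNP, one column per population.
    Returns each entry divided by xbar * (1 - xbar) for its row."""
    scaled = []
    for row in freqs:
        xbar = sum(row) / len(row)
        denom = xbar * (1.0 - xbar)
        if denom == 0.0:
            # A SNP fixed in every population cannot be scaled this way.
            raise ValueError("monomorphic SNP: mean frequency is 0 or 1")
        scaled.append([x / denom for x in row])
    return scaled


# Hypothetical raw matrix (the first option in the email): plain frequencies.
raw = [
    [0.2, 0.4, 0.6],
    [0.5, 0.5, 0.5],
]
scaled = normalize_rows(raw)
print(scaled)
```

Running the PCA on both `raw` and `scaled` would show directly whether the choice changes the pattern.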


From this it seems that partnering with Lee/Hsu might not be such a terrible idea, the question is whether they would like to.

Ps. I have systematically been calling this your idea, but our work. Does that sound right to you? It is not like the stuff I have to do is self explanatory (indeed, some of it isn't explained anywhere), so I am doing some creative work too even though I admit to not having had anything to do with developing the idea, nor the theory.

Psps. I contacted him some time ago, before you said you wanted to wait. He likes the idea and the work, so perhaps he could put in a good word for us with Greg? I'll post my reply to Harpending here, before sending it to him so that you can include stuff you want to ask about or similar.
Yes calling it my idea and our work is ok. I'd wait before contacting Greg, when we'll have the next paper.
Another genome from a Mensa Italy member (F). Educational level: BA (Law Degree).

https://drive.google.com/file/d/0B7hcznd4DKKQNC1KNUxTWjVtNUE/view?usp=sharing
We should ask for blurbs whenever we get someone knowledgeable/intelligent/famous to look at our stuff. I would like to have some stuff to quote whenever some overzealous nice person accuses our science of being bad, wanting the uni to fire me etc. You might not care what some morons think, but I have a daughter to feed ;)
Yes at a later stage, maybe our next paper.
I was able to finish early with the urgent work stuff (wanting to do this is a powerful motivating factor).

The results are very good, in that I do not think they will be offensive to many people.

Results:
Pop %ancestral alleles
ACB 0.035075304039
ASW 0.0350700542785
BEB 0.0351445670528
CDX 0.0351293488305
CEU 0.0351166746368
CHB 0.0351203880422
CHS 0.0351206208601
CLM 0.0351116272273
ESN 0.0350639706557
FIN 0.0351160873829
GBR 0.0351166086565
GIH 0.0351438593698
GWD 0.0350757945072
IBS 0.0351203693207
ITU 0.0351465482021
JPT 0.035117834625
KHV 0.0351286934721
LWK 0.0350615039994
MSL 0.0350758294636
MXL 0.0351133903568
PEL 0.0351216521707
PJL 0.035141654023
PUR 0.0351093784774
STU 0.0351471205601
TSI 0.0351182139037
YRI 0.0350575085643

number snps not matching 36178012
number snps used for calculations 840619546


It seems that four percent of the ancestral alleles were unlike any of those in modern humans. Is it possible that both human alleles have mutated? I guess so...

Please tell me if the results seem plausible.

Ps. the numbers above varied a little between chromosomes (the rank ordering too)- interesting stuff!
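
A minimal sketch of how the per-population ancestral-allele share and the "not matching" count above could be computed for one population, useful as a cross-check. It is not the actual 150-line script. It assumes biallelic sites with a known alternate-allele frequency, and it compares alleles case-insensitively (an assumption about how the ancestral annotation is written). The input tuples are made up.

```python
def ancestral_freq(ancestral, ref, alt, alt_freq):
    """Ancestral-allele frequency at one biallelic site.
    Returns None when the ancestral allele matches neither allele."""
    anc = ancestral.upper()
    if anc == alt.upper():
        return alt_freq
    if anc == ref.upper():
        return 1.0 - alt_freq
    return None


def summarize(sites):
    """Return (mean ancestral frequency, sites used, sites not matching)."""
    total, used, unmatched = 0.0, 0, 0
    for ancestral, ref, alt, alt_freq in sites:
        freq = ancestral_freq(ancestral, ref, alt, alt_freq)
        if freq is None:
            unmatched += 1
            continue
        total += freq
        used += 1
    mean_freq = total / used if used else float("nan")
    return mean_freq, used, unmatched


# Hypothetical sites: (ancestral, ref, alt, alt-allele frequency in one population).
sites = [
    ("C", "T", "C", 0.10),  # ancestral is the alt allele
    ("A", "A", "C", 0.25),  # ancestral is the ref allele, so use 1 - 0.25
    ("G", "A", "C", 0.50),  # ancestral matches neither allele
]
mean_freq, used, unmatched = summarize(sites)
print(mean_freq, used, unmatched)
```

A common way for this kind of calculation to go wrong is taking the frequency of the wrong allele in one of the two branches, so checking a handful of sites by hand against the branches is worthwhile.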
Ps. new version of 1kg out. Should redo all the analyses, since chr 12 was unusable with the previous version.
It seems that you will be able to compute frequencies now (I can still do it for you): http://browser.1000genomes.org/Homo_sapiens/UserData/Allele

Dunno if it is easy to use or not.
3.5% ancestral alleles? They are around 35% using polymorphisms other than SNPs...Unlikely to be so low.
Yeah, there might be something fishy there. Took 150 lines of code, will try to look over it tomorrow.
We could also do a meta-correlation (I prefer this name to the highbrow-sounding "method of correlated vectors") between PC loadings and odds ratios in W+V. We got results that point in that direction, with the high PC loading alleles (r>0.9) having higher odds ratios than the middling alleles (r between 0.005 and 0.9), and the uncorrelated alleles (r<0.005) having even lower odds ratios (chance level, or unity). A more fine-grained analysis would compute the correlation between the alleles' correlation with the PC and the odds ratios produced using FE by allele sets of different r magnitudes (e.g. 0-0.05; 0.05-0.1; 0.1-0.15, etc.).
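
A rough sketch of the fine-grained version: bin alleles by the magnitude of their correlation with the PC, get one odds ratio per bin, then correlate bin midpoints with those odds ratios. As a stand-in for the set-level odds ratio, the sketch simply averages hypothetical per-allele odds ratios within each bin, which is a simplification and not the FE step described above. All numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def binned_meta_correlation(alleles, width=0.05):
    """alleles: list of (r with the PC, odds ratio).
    Returns (correlation across bins, list of (bin midpoint, mean odds ratio))."""
    last_bin = round(1.0 / width) - 1
    bins = {}
    for r, odds in alleles:
        idx = min(int(abs(r) / width), last_bin)  # |r| == 1 goes in the top bin
        bins.setdefault(idx, []).append(odds)
    points = [
        ((idx + 0.5) * width, sum(vals) / len(vals))
        for idx, vals in sorted(bins.items())
    ]
    mids = [p[0] for p in points]
    means = [p[1] for p in points]
    return pearson(mids, means), points


# Hypothetical (r, odds ratio) pairs.
alleles = [(0.02, 1.0), (0.03, 1.1), (0.07, 1.2), (0.12, 1.3), (0.13, 1.5)]
r_meta, points = binned_meta_correlation(alleles)
print(r_meta, points)
```

With few bins the correlation is easily inflated, so the number of occupied bins should be reported alongside it.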
I got this email from Jeff Hsu. Can you guys help me reply to him?
Davide,
Great series of papers on exploring polygenic selection in human populations. I'm looking to understand and possibly extend your work somehow by doing some simulations on various models of population histories and differing underlying assumptions. In this regard, I have some questions regarding your papers:
Is there a downloadable file for the 1000G population IQs that you use? I understand you use the Lynn and Vanhanen 2012 national IQ data, but I'm not sure of the mapping to the 1000G subpopulations. I just grabbed the national IQs with some of your adjustments and used them on the 1000G populations, i.e. CDX would just use the Chinese national IQ.
You can view some of the analysis in an IPython notebook here:
http://nbviewer.ipython.org/github/jeffhsu3/Piffer_analysis/blob/master/notebooks/Pfiffer%20et.%20al%202014.ipynb

Any suggestions, or corrections of anything I am doing wrong, would be great!
The first thing I would do is thank him for double-checking your work. Perhaps even invite him to publish in openpsych.

It would be easier to help you reply if I knew your thoughts on this.

Perhaps you could explain that we are afraid of being leapfrogged so we are unable to give away too much info about the new analyses, but explain that we will make the whole pipeline available afterwards. Being leapfrogged would be bad for you since you need this study to get some money, while I will have wasted plenty of precious PhD-time with nothing to show for it. Make sure to give good reasons so he does not find you rude - that would just make him want to beat us.

He seems to be skilled so perhaps he is someone we want on the team instead of competing? I'm pretty sure this is him: https://www.linkedin.com/pub/jeffrey-hsu/7/25b/523

Ask him, just out of curiosity, if there is any relation to S. Hsu.

As for his question, if the national IQs aren't appropriate for the populations in 1kg, shouldn't this just reduce the correlations observed? It would be strange if all the mismatches between 1kg and Lynn went in the right direction so that the R2 is increased.
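
The expectation in the paragraph above (random mismatches should lower the correlation, not raise it) can be illustrated with a small simulation. Everything here is synthetic: the "genetic score", the "true" phenotype and the size of the mapping error are arbitrary assumptions, and the sample is made far larger than 26 populations so that the expected trend is visible in a single run.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)
n = 5000  # deliberately large; with 26 points a single draw is very noisy

genetic = [random.gauss(0, 1) for _ in range(n)]
# Phenotype for the correctly matched population: score plus noise.
true_iq = [g + random.gauss(0, 0.5) for g in genetic]
# Phenotype after an imperfect mapping: extra error unrelated to the score.
mapped_iq = [t + random.gauss(0, 1.0) for t in true_iq]

r_true = pearson(genetic, true_iq)
r_mapped = pearson(genetic, mapped_iq)
print(r_true, r_mapped)
```

This only covers mapping errors that are independent of the genetic score. With just 26 populations, a single set of mismatches could still move the observed correlation in either direction by chance.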

And tell him that he needs to understand snakemake: https://bitbucket.org/johanneskoester/snakemake/wiki/Home to understand our code.
Here are a few example SNPs I have computed ancestral alleles for:

rs115724926 ancestral allele: C a1, a2: T C
rs76558199 ancestral allele: C a1, a2: T C
rs115097218 ancestral allele: A a1, a2: C A
rs73912893 ancestral allele: T a1, a2: C T
rs148989228 ancestral allele: C a1, a2: T C
rs188353846 ancestral allele: C a1, a2: A C
rs183990933 ancestral allele: A a1, a2: G A
rs150703965 ancestral allele: C a1, a2: T C
rs12754304 ancestral allele: G a1, a2: A G


Could you check a few of them to see that my calculations are correct? I'm getting curious results like:

ACB 0.915314274671
ASW 0.91517916277
BEB 0.917049238231
CDX 0.916604144085
CEU 0.916312368517
CHB 0.916396070154
CHS 0.916384565614
CLM 0.916188882557
ESN 0.915038231054
FIN 0.916286699506
GBR 0.916317467703
GIH 0.917061446611
GWD 0.915371030784
IBS 0.916431791717
ITU 0.917123471431
JPT 0.916327029174
KHV 0.916603507552
LWK 0.914944244915
MSL 0.915354857881
MXL 0.916200977742
PEL 0.916411334999
PJL 0.91699048296
PUR 0.916128942459
STU 0.917130458765
TSI 0.91637858723
YRI 0.914866869338


I thought it was known that AFR were more ancestral?
I think it's the guy on linkedin you mentioned. I also wondered about a relation to S.Hsu, so I'll ask. He's not saying that the national IQs are inappropriate, he's just saying that he's not sure how to map the IQs to the 1KG populations.
I am gonna try and get him to work on another project I have (modelling natural selection after taking into account population history) which is what he seems to want to do. So hopefully he'll not want to work on the Freewas for a while...