OK, I carried out the PCA. One component was extracted, explaining 75.4% of the variance (the other components explained very little and had eigenvalues <0.6). The component loadings were very high and in the expected direction (0.804, 0.877, 0.938, 0.850). KMO was good (0.782). Overall the results are better than with the phase 1 data, as expected given the bigger N.
Here you can find the IQ PC that you'll have to use for the GWAS:
https://docs.google.com/spreadsheets/d/1wtBDdDTHmwCFTcI-zj00WSLq_we_dZMipTztt-CXZUk/edit?usp=sharing
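For anyone who wants to see where numbers like the eigenvalue, the loadings and the share of variance come from, here is a rough sketch. It is not the procedure used for the results above (the software is not stated in this thread). The four input columns are made-up toy values, all function and variable names are hypothetical, and power iteration only recovers the first component, which is enough when a single component dominates.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z.append([(x - mean) / sd for x in col])  # standardise each column
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

def first_component(corr, iterations=500):
    """Power iteration on the correlation matrix.

    Returns (eigenvalue, loadings, share_of_variance) for the first component.
    Assumes the start vector is not orthogonal to the leading eigenvector,
    which holds when all variables are positively correlated.
    """
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    # With standardised variables the total variance equals k.
    return eigenvalue, loadings, eigenvalue / k

# Toy values only (NOT real data): four variables measured in six populations.
toy_columns = [
    [0.10, 0.20, 0.35, 0.50, 0.62, 0.80],
    [0.15, 0.18, 0.40, 0.45, 0.70, 0.75],
    [0.05, 0.25, 0.30, 0.55, 0.60, 0.85],
    [0.20, 0.22, 0.33, 0.52, 0.58, 0.90],
]
eigenvalue, loadings, share = first_component(correlation_matrix(toy_columns))
print(f"eigenvalue={eigenvalue:.3f} share={share:.1%}")
print("loadings:", [round(l, 3) for l in loadings])
```

The squared loadings sum to the eigenvalue, which is why the share of variance is just the eigenvalue divided by the number of variables.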
We should probably write this up. :)
Yes, but I wanna carry out the same study I did for the phase 1 data. That is, I wanna control for migration effects by using random alleles, as I did here: http://dx.doi.org/10.1101/008011
I also want to run within-continent analyses, etc. But I think the GWAS is the priority now. People are only gonna be persuaded if the GWAS works!
Gilfoyle, if you have any questions on the next steps of the GWAS let me know
I'll probably get results for chromosome one next week.
I guess the threshold for inclusion should be a correlation of r>0.9 between the IQ PC and the SNP's allele frequency. Then we'll see how many SNPs with that correlation we get. If there are too many, we'll raise the threshold; if there are too few, we'll lower it.
Of course we will exclude those that are in proximity (linkage) of the 4 SNPs, but that can be done after.
By the way, obviously what matters is the absolute magnitude of the correlation. If for a given SNP allele A has a correlation of -0.95, then allele B will have the opposite sign but the same magnitude (+0.95). Since we need all the IQ-increasing alleles, the other allele will have to be picked when the correlation is negative.
I will keep all results, we can parse the list later.
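The inclusion rule discussed above (keep a SNP when |r| with the IQ PC exceeds the threshold, and take the other allele when r is negative) could be sketched like this. The data layout, the SNP IDs, the allele labels and all numbers are made-up assumptions for illustration, not the actual script or data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation across populations, so no usable correlation
    return sxy / math.sqrt(sxx * syy)

def select_candidates(iq_pc, snp_freqs, threshold=0.9):
    """Keep SNPs with |r| above the threshold.

    snp_freqs maps snp_id -> (allele_a, allele_b, per-population frequency of allele_a).
    Returns snp_id -> (allele correlated positively with the PC, |r|).
    """
    selected = {}
    for snp_id, (allele_a, allele_b, freqs_a) in snp_freqs.items():
        r = pearson_r(iq_pc, freqs_a)
        if abs(r) > threshold:
            # A negative r for allele A means allele B has the same
            # magnitude with a positive sign, so pick allele B.
            selected[snp_id] = (allele_a if r > 0 else allele_b, abs(r))
    return selected

# Toy values only (NOT real data): one PC score per population.
iq_pc = [-1.2, -0.5, 0.1, 0.6, 1.0]
snp_freqs = {
    "1_1000": ("A", "G", [0.10, 0.22, 0.35, 0.48, 0.55]),
    "1_2000": ("C", "T", [0.90, 0.78, 0.65, 0.52, 0.45]),
    "1_3000": ("G", "A", [0.30, 0.50, 0.20, 0.45, 0.35]),
}
selected = select_candidates(iq_pc, snp_freqs, threshold=0.9)
for snp_id, (allele, abs_r) in selected.items():
    print(snp_id, allele, round(abs_r, 3))
```

Keeping the threshold as a parameter makes it easy to rerun the selection with a higher or lower cutoff once the number of hits is known.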
I am wondering about the next part of this study. It seems that you would need a very large high-IQ control group. Current models suggest that the SNP difference between the smart and non-smart isn't that great so to find out whether a SNP is found more often in bright people would require lots of data.
About my progress:
I have a script that computes the R2 between the IQPC and allele frequencies. Problem is that most SNPs in 1kg do not have a name so I need to add some way of identifying them (I will just call them Chr_position).
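A minimal sketch of the naming fallback and the R2 calculation described above, not the actual script. It assumes each record carries a name, chromosome, position and a list of per-population allele frequencies, and that a missing name is given as "." (as in VCF files). The record layout, function names and all values are hypothetical.

```python
def make_snp_id(name, chrom, pos):
    """Use the given name if there is one, otherwise fall back to Chr_position."""
    if name and name != ".":
        return name
    return f"{chrom}_{pos}"

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # allele frequency (or PC) does not vary across populations
    return (sxy * sxy) / (sxx * syy)

# Toy values only (NOT real data): one PC score per population.
iq_pc = [-1.2, -0.5, 0.1, 0.6, 1.0]
records = [
    ("rs0000001", "1", 1000, [0.10, 0.22, 0.35, 0.48, 0.55]),  # made-up rs number
    (".", "1", 2000, [0.30, 0.50, 0.20, 0.45, 0.35]),          # unnamed SNP
]
results = {
    make_snp_id(name, chrom, pos): r_squared(iq_pc, freqs)
    for name, chrom, pos, freqs in records
}
for snp_id, r2 in results.items():
    print(snp_id, round(r2, 3))
```

Note that R2 drops the sign of the correlation, so the signed r would have to be kept as well if the direction of the allele matters later.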
The first goal is to validate my method, and for this we don't need to identify a single SNP. It's sufficient to show that a group of SNPs selected with my method has higher frequencies in the high IQ group. For this, even a few people are sufficient.
The next goal is to identify which individual SNPs in the group have an association with IQ, and this will require a bigger sample. But if we could achieve the first step of identifying a set of candidate SNPs, it would be a great achievement. With that in our hands, we could approach the big names or an existing data set with a sound research proposal (http://www.bristol.ac.uk/alspac/researchers/data-access/)
Now I have a script that computes the R2 between the IQPC and population allele frequencies. It is currently running.
I do not know how you want the results (currently not done)? Should I post the file here or what?
And is there a way to sanity check the results I get?
Ps. I had to participate in a conference call last week so I downloaded Skype. PM me if you want my nick.
We will compare control group frequencies vs high IQ group frequencies, with a 2x2 contingency table, using Fisher's exact test.
An outline here (numbers are not necessarily real): https://docs.google.com/spreadsheets/d/1frtYhaKqqD6jncNjdYXw28crkvKFmTHC9U8vTQQ3LkU/edit?usp=sharing
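To make the 2x2 comparison concrete, here is a small stdlib-only sketch of a two-sided Fisher's exact test. The table layout (rows = high IQ group vs control group, columns = count of the candidate allele vs the other allele) and the counts are assumptions for illustration, not real numbers, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total probability of all tables with the same
    margins that are as probable as, or less probable than, the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point noise.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Made-up counts (NOT real data):
#                 candidate allele   other allele
# high IQ group          30               10
# control group          45               35
p_value = fisher_exact_two_sided(30, 10, 45, 35)
print(f"p = {p_value:.4f}")
```

With a set of selected SNPs, one option would be to pool the allele counts over the whole set into a single table, though whether pooling is appropriate depends on how independent the SNPs are.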
Should we hide this and the other thread until after a potential publication or what?