
Piffer's Freewas
Changed back to 0.9.

Okidoki, fine. I do not see any blocks having an average over 0.8 or 0.9, but we will see.


We may have to considerably reduce the threshold then, for example to 0.5 or any other value that gives a big enough N.
4SNP analysis:

Venter middling

1621732 1557184
try({print(fem)})
[,1] [,2]
[1,] 1621732 1007968
[2,] 1557184 1076572

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.108443 1.116225
sample estimates:
odds ratio
1.112333


Watson middling

1618178 1557184
try({print(fem)})
[,1] [,2]
[1,] 1618178 1011382
[2,] 1557184 1076572

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.102295 1.110056
sample estimates:
odds ratio
1.106149


Watson low

81983 82330
try({print(fem)})
[,1] [,2]
[1,] 81983 64309
[2,] 82330 64199

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value = 0.4255
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.9796666 1.0087081
sample estimates:
odds ratio
0.9940815


Venter low

82150 82330
try({print(fem)})
[,1] [,2]
[1,] 82150 64150
[2,] 82330 64199

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value = 0.8494
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.984084 1.013277
sample estimates:
odds ratio
0.9985758


Venter high

11213 10397
try({print(fem)})
[,1] [,2]
[1,] 11213 4379
[2,] 10397 5203

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.220715 1.345173
sample estimates:
odds ratio
1.281403


Watson high

11193 10397
try({print(fem)})
[,1] [,2]
[1,] 11193 4401
[2,] 10397 5203

try({fisher.test(fem, alternative="two.sided")})

Fisher's Exact Test for Count Data

data: fem
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.212489 1.335961
sample estimates:
odds ratio
1.272724
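
For anyone who wants to sanity-check the tables above without R, here is a minimal Python sketch that computes the plain cross-product odds ratio and a Woolf (log-scale) 95% interval from each 2x2 table. Note that this is not the same estimator that R's `fisher.test` reports (R gives the conditional maximum likelihood estimate and an exact interval), so the numbers should agree only to roughly three decimals at these sample sizes. The function name and the `tables` dictionary are made up for illustration; the counts are copied from the output above.

```python
import math

def odds_ratio_with_ci(table, z=1.96):
    """Cross-product odds ratio and Woolf log-scale CI for a 2x2 table.

    table is ((a, b), (c, d)), laid out like the printed `fem` matrices.
    This is an approximation, not R's conditional MLE from fisher.test.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Counts copied from the six tables in this post.
tables = {
    "Venter middling": ((1621732, 1007968), (1557184, 1076572)),
    "Watson middling": ((1618178, 1011382), (1557184, 1076572)),
    "Watson low": ((81983, 64309), (82330, 64199)),
    "Venter low": ((82150, 64150), (82330, 64199)),
    "Venter high": ((11213, 4379), (10397, 5203)),
    "Watson high": ((11193, 4401), (10397, 5203)),
}

for name, table in tables.items():
    odds_ratio, lower, upper = odds_ratio_with_ci(table)
    print(f"{name}: OR={odds_ratio:.4f} CI=({lower:.4f}, {upper:.4f})")
```

The point is only to show where the odds ratio comes from; the significance levels still have the linkage problem discussed elsewhere in this thread.
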
Excellent. I'll add this to the paper. Do you think it'd be easy to calculate the average frequency of derived alleles for the 1KG populations? For some odd reason no one has done this before. I could find only published frequencies for derived Alu elements but not for SNPs.
It should be simple. I have computed the SNP frequencies in 1KG and have worked out which alleles are derived. Where on the priority list should I put it?

PS. I think we should only do the rs-numbered SNPs, since the other variants are often local to just one population.
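
A rough sketch of the averaging step, assuming biallelic SNPs and a per-variant record holding the reference allele, the alternate allele, the ancestral allele (or `None` if unknown) and the alternate-allele frequency per population. The record layout, the function names and the toy numbers are all made up for illustration and are not the actual 1KG file format.

```python
from statistics import mean

def derived_frequency(ref, alt, ancestral, alt_freq):
    """Derived-allele frequency for a biallelic SNP, or None if unusable."""
    if ancestral is None:
        return None
    anc = ancestral.upper()
    if anc == ref.upper():
        return alt_freq          # the alternate allele is the derived one
    if anc == alt.upper():
        return 1.0 - alt_freq    # the reference allele is the derived one
    return None                  # ancestral allele matches neither

def mean_derived_frequency(variants, population):
    """Average derived-allele frequency over rs-numbered SNPs only."""
    freqs = []
    for v in variants:
        if not v["id"].startswith("rs"):
            continue             # skip variants without an rs number
        f = derived_frequency(v["ref"], v["alt"], v["ancestral"],
                              v["alt_freq"][population])
        if f is not None:
            freqs.append(f)
    return mean(freqs) if freqs else None

# Toy records with made-up frequencies, not real data.
variants = [
    {"id": "rs1", "ref": "A", "alt": "G", "ancestral": "A",
     "alt_freq": {"POP1": 0.2, "POP2": 0.6}},
    {"id": "rs2", "ref": "C", "alt": "T", "ancestral": "T",
     "alt_freq": {"POP1": 0.9, "POP2": 0.5}},
    {"id": "chr1:12345", "ref": "G", "alt": "A", "ancestral": "G",
     "alt_freq": {"POP1": 1.0, "POP2": 0.0}},
    {"id": "rs3", "ref": "T", "alt": "C", "ancestral": None,
     "alt_freq": {"POP1": 0.3, "POP2": 0.3}},
]

for pop in ("POP1", "POP2"):
    print(pop, mean_derived_frequency(variants, pop))
```

SNPs with no usable ancestral state are simply left out of the mean here; whether that is the right choice is open.
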


I'd put it after the linkage analysis.
Computing the number of blocks for the reference populations will be time consuming to code and to run: I have to split the reference population into each person, compute the number of good/bad/medium blocks for each person in the reference population and then compare against the brainiacs.

When we just counted the number of alleles you could use an aggregate score which was much easier.


Ok so do you want to do this at a later stage? In the meantime you could calculate the average frequency of derived alleles for the 26 populations, since it's much easier?
Do you think it will be okay if I include the whole reference population in each block instead of creating one block for each person in the reference population?

This will mean that I create blocks with the mean value of the reference panel alleles in the block, so that if there are 220 haplotypes in the reference population, each beneficial/detrimental allele in the block has a frequency of (beneficial count)/220 or (detrimental count)/220.

The blocks won't be true haplotype blocks though, but is this okay? This takes care of LD and is much easier to program.

Hope my question is understandable.
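
To make the question concrete, here is a small Python sketch of the pooled approach as I read it: every reference haplotype contributes to the same block, and each SNP gets a beneficial and a detrimental frequency equal to its count divided by the number of haplotypes (220 in the example above). The 0/1 coding, the function name and the toy haplotypes are assumptions for illustration.

```python
def pooled_block_frequencies(haplotypes):
    """Per-SNP (beneficial, detrimental) frequencies pooled over a panel.

    haplotypes: list of equal-length lists, one per reference haplotype,
    where 1 marks the beneficial allele and 0 the detrimental allele.
    """
    n = len(haplotypes)
    n_snps = len(haplotypes[0])
    frequencies = []
    for j in range(n_snps):
        beneficial = sum(h[j] for h in haplotypes)
        frequencies.append((beneficial / n, (n - beneficial) / n))
    return frequencies

# Toy block: 4 haplotypes, 3 SNPs (made-up values).
toy_block = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]

for snp_index, (ben, det) in enumerate(pooled_block_frequencies(toy_block)):
    print(snp_index, ben, det)
```

With a real panel, `n` would be 220 and the block would hold whichever SNPs fall in that chunk of the genome.
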


Yes I think it's fine to use the mean value of the reference panel.
But I do not see how to do this.

If we have a block with these snps (8 people):

SNP name   Beneficial   Detrimental
Snp1       10           6
Snp2       8            8
Snp3       15           1
Snp4       7            9

What should I count this as? Two beneficial blocks? One of each? And how do I compute it?



10+8+15+7=40
6+8+1+9=24
(40/8):(24/8)= 5:3

Hence, 5 beneficial and 3 detrimental
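
The same arithmetic as a short Python sketch, in case it helps with the coding: sum the beneficial and detrimental counts over the SNPs in the block, then divide each sum by the number of people. The function name is made up; the counts are the ones from the example block (8 people, so 16 alleles per SNP).

```python
from fractions import Fraction

# Example block: SNP name -> (beneficial count, detrimental count).
block = {
    "Snp1": (10, 6),
    "Snp2": (8, 8),
    "Snp3": (15, 1),
    "Snp4": (7, 9),
}
n_people = 8

def block_score(block, n_people):
    """Per-person beneficial and detrimental totals for one block."""
    beneficial = sum(b for b, _ in block.values())
    detrimental = sum(d for _, d in block.values())
    # Fractions keep the ratio exact instead of rounding to floats.
    return Fraction(beneficial, n_people), Fraction(detrimental, n_people)

ben, det = block_score(block, n_people)
print(f"{ben}:{det}")
```

Using `Fraction` is just a convenience so that blocks with counts that do not divide evenly still give an exact ratio.
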

Chunking the genome. We've given priority to chunking the genome to control for linkage. I'd like to highlight that this will not alter the odds ratios (if the calculations are done properly). It'll simply give a better (although too conservative) estimate of the significance level. We've got the true odds ratios now, but with inflated significance. I guess it depends on how much you care about significance. I think it's more important to have estimated the odds ratio. I mean, our p value is extremely low (< 2.2e-16) and it's clear that the result cannot be due to chance. And there is the problem that if we do it wrong we're gonna screw up the results.
If you think this is too much work for now, perhaps do the ancestral:derived thing first?
I'm taking the rest of the weekend off, but I can skip the linkage part. It isn't that much work if I can do all ref people at once like you said. I imagine that if we do not, some reviewer is gonna quibble. And if we get the same odds ratio, that is a confidence-inducing result we should mention in the paper.

I can do the ancestral:derived first, no problem.


Ok. And please send me the average derived allele frequency for the 26 populations. I am really curious to see what we get. There was a paper showing more ancestral alleles among Africans (although they didn't use SNPs and they didn't use 1KG data). I guess it's not very PC to say that Africans are more "ancestral", so that's why it's so hard to come across such a study. Nonetheless, I'd like to see the partial correlation between the PC and IQ after controlling for average derived allele frequency.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274485/
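
For the partial correlation mentioned above, a minimal Python sketch using the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). Here x would be the population PC scores, y the population IQs and z the average derived allele frequency. The function names and the toy vectors are made up; no real population values are used.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy vectors (made up): x and y are related only through z.
z = [-1, -1, 1, 1]
x = [0, -2, 2, 0]
y = [0, -2, 0, 2]

print(pearson(x, y), partial_correlation(x, y, z))
```

In the toy data the raw correlation between x and y is entirely due to z, which is the pattern to look for if derived allele frequency were driving the PC-IQ association.
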
I also need the number of alleles that are derived and ancestral for three groups of alleles: one with alleles very strongly (r>0.9) correlated to the PC, one with alleles not correlated to it (r<0.005), and the middling 0.005-0.9 alleles.
I attach the paper with Meisenberg's comments.
His longest comment is both interesting and intelligent ("...Now let’s assume that there are not only “IQ genes”, but also genes that do not affect intelligence directly, but incline people to use their intelligence for real-world problem solving, as opposed to theoretical and philosophical pursuits..."), but I do not think we should spend too much time on such details, a small note should suffice. What are your thoughts?


I agree with you.
The schooling IQ gains are somewhat dubious. If you read the papers they come from, the data can also be interpreted with a developmental theory. Basically, there are some natural experiments where children were tested at e.g. age 13, then some of them stayed in school while others left, and they were tested again later at e.g. age 19. One can then control for the initial scores and look at the correlation between length of schooling and IQ. This is where the estimate comes from, and it assumes that all these gains in IQ points are due to real increases in g. However, it may simply be that the people who stayed in school, presumably due at least in part to their own choice, would have gained regardless of schooling because they were already on a developmental path towards that. A self-selection effect.

Alternatively, it may be that the IQ gains are not gains in g, but in other abilities. I have not seen an MCV study of schooling gains that examines this.
Ward et al. (2014) showed that the 3 SNPs are positively associated with performance in Math and Reading also among elementary school children, so this disproves the schooling IQ gains hypothesis.
Okay, then my short term todo-list is to

  • Find the ancestral allele frequency for 1kg.
  • Find the ancestral allele frequency for the three groups of SNPs


I should also include the number of SNPs within each group that have no ancestral info.
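
A sketch of how the tally for the three groups could look, including the count of SNPs with no ancestral info. It takes the 0.9 and 0.005 cutoffs mentioned earlier in the thread as thresholds on the absolute correlation with the PC, which is my assumption, and the record layout, group labels and toy records are made up for illustration.

```python
from collections import Counter

def correlation_group(r, high=0.9, low=0.005):
    """Assign an allele to a group by |r| with the PC (cutoffs assumed)."""
    strength = abs(r)
    if strength > high:
        return "high"
    if strength < low:
        return "uncorrelated"
    return "middling"

def tally_ancestral_status(snps):
    """Count ancestral / derived / unknown alleles within each group."""
    counts = {g: Counter() for g in ("high", "middling", "uncorrelated")}
    for snp in snps:
        group = correlation_group(snp["r"])
        ancestral = snp["ancestral"]
        if ancestral is None:
            status = "unknown"       # no ancestral info for this SNP
        elif snp["allele"] == ancestral:
            status = "ancestral"
        else:
            status = "derived"
        counts[group][status] += 1
    return counts

# Toy records (made up): tested allele, ancestral allele, correlation with PC.
snps = [
    {"allele": "A", "ancestral": "G", "r": 0.95},
    {"allele": "C", "ancestral": "C", "r": 0.93},
    {"allele": "T", "ancestral": None, "r": 0.40},
    {"allele": "G", "ancestral": "A", "r": 0.001},
]

for group, group_counts in tally_ancestral_status(snps).items():
    print(group, dict(group_counts))
```
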