My current todo list is:
- Clean up the code so that it is possible to add new analyses and steps.
- Implement a binning method that takes care of the LD problem.
- Add the ancestral allele analysis to the pipeline instead of running it as a separate analysis afterwards.
- Make sure the ancestral analysis is correct...
- Do the Watson/Venter analysis only on ancestral alleles.
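To make the binning item concrete, here is a minimal Python sketch, not the actual pipeline code. It assumes the bins are fixed-width 300 kb windows and that each SNP arrives as a (chromosome, position, id) tuple. The function name `bin_snps` and that input layout are made up for illustration. The thought is that SNPs falling in the same window can then be treated as one unit rather than as independent observations.

```python
from collections import defaultdict

BIN_SIZE = 300_000  # 300 kb window width; fixed-width windows are an assumption


def bin_snps(snps, bin_size=BIN_SIZE):
    """Group SNPs by (chromosome, bin index).

    snps: iterable of (chrom, pos, snp_id) tuples (hypothetical layout).
    Returns a dict mapping (chrom, bin_index) to a list of SNP ids.
    """
    bins = defaultdict(list)
    for chrom, pos, snp_id in snps:
        # Integer division assigns positions 0..299999 to bin 0,
        # 300000..599999 to bin 1, and so on.
        bins[(chrom, pos // bin_size)].append(snp_id)
    return dict(bins)


if __name__ == "__main__":
    example = [("1", 100, "rsA"), ("1", 299_999, "rsB"), ("1", 300_000, "rsC")]
    print(bin_snps(example))
```

Keying on both chromosome and bin index keeps windows from different chromosomes apart.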
Some things that occurred to me during my time off. These might be good ideas; it is up to you to decide:
- Now that we are switching to a 300 kb bin-based method, using the reference panels in ALFRED seems like it could work better; even if we get far fewer SNPs, I suspect there will still be many within each bin.
- What about using all SNPs with passable R2s, but weighting them somehow, so that a SNP with a correlation of 0.45 is used but not given as much weight as one with an R2 of 0.9? I imagine this might give a better signal; as we saw, the results were good for lower-R2 SNPs too.
- Related to the last point: perhaps we should just count all plus and minus alleles in each 300 kb bin instead of paying much attention to R2?
- How about crowdfunding a programmer (not me; I am lacking time and energy, not money)*? There are plenty of blogs that might link to such a charity.
*Your offer to pay was kind, but it would be a serious breach of ethics for me to run analyses on the school server and then receive payment for them.
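
The two R2 ideas above (weight every SNP by its R2, or simply count plus and minus alleles per bin) can be sketched in one small Python function. This is only an illustration under stated assumptions: the weight is taken to be R2 itself, which is just one possible choice, and the function name `bin_scores`, the (chrom, pos, sign, r2) input layout, and the `min_r2` cutoff are all hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 300_000  # 300 kb bins


def bin_scores(snps, weighted=True, min_r2=0.0, bin_size=BIN_SIZE):
    """Sum plus/minus alleles per bin, optionally weighted by R2.

    snps: iterable of (chrom, pos, sign, r2) tuples, where sign is +1 for a
          plus allele and -1 for a minus allele (hypothetical layout).
    weighted=True  uses R2 as the weight (assumed linear weighting).
    weighted=False counts every SNP equally, ignoring R2 apart from min_r2.
    Returns a dict mapping (chrom, bin_index) to the summed score.
    """
    scores = defaultdict(float)
    for chrom, pos, sign, r2 in snps:
        if r2 < min_r2:
            continue  # drop SNPs below the "passable" threshold
        weight = r2 if weighted else 1.0
        scores[(chrom, pos // bin_size)] += sign * weight
    return dict(scores)


if __name__ == "__main__":
    example = [
        ("1", 1_000, +1, 0.9),
        ("1", 2_000, -1, 0.45),
        ("1", 400_000, +1, 0.5),
    ]
    print(bin_scores(example, weighted=True))
    print(bin_scores(example, weighted=False))
```

Running both modes on the same input makes it easy to compare whether the weighting changes the sign of any bin relative to the plain count.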