Genetic association studies often suffer from positive and negative confounding caused by population stratification and admixture. Modern statistical epidemiology can alleviate this confounding but cannot eliminate it. As a result, many geneticists have suggested that genetic association has not lived up to its promise. Because the major goal of genetic association is to find ancestral haplotypes associated with genetic disease, haplotype phasing could significantly increase the power of genetic association studies. We propose a method for high-throughput haplotyping using single sperm, which has the potential to revolutionize genetic association studies. The method is derived from a family of technologies (GigaLink) for molecular co-localization of two or more nucleic acid targets in millions of single cells in parallel. Broadly, the technology isolates single cells into aqueous-in-oil picoliter reactors, fuses two or more single nucleotide polymorphisms (SNPs) by intermolecular hybridization, and then sequences linked loci in reversed emulsions by next-generation sequencing. This enables far more complex biological analysis than is possible when analyzing only a single locus in a single cell, or even a single locus across many single cells. In Phase I, our objective is to build a proof-of-concept method for single-sperm capture and intermolecular linkage between two nucleic acid targets. We propose to conduct a thorough optimization of overlap extension primers and then build a custom microfluidics device for sperm capture. We will test the accuracy of the method by comparing the phase revealed by single-sperm microarray data with the phase revealed by single-sperm GigaLink. In Phase II, we will extend the technology to highly multiplexed sperm haplotyping by affixing overlap extension primer sets to beads and then ejecting one bead per microdroplet.
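The phase comparison described above can be illustrated with a minimal sketch. It assumes each sequenced read from a fused amplicon is reduced to a pair of alleles, one for each of two heterozygous SNPs, and that phase is called by a simple majority rule. The function names (`call_phase`, `concordance`), the input layout, and the 0.9 threshold are hypothetical choices for illustration, not part of the proposed GigaLink analysis.

```python
from collections import Counter


def call_phase(linked_reads, alleles_a, alleles_b, min_fraction=0.9):
    """Infer the phase of two heterozygous SNPs from linked allele pairs.

    linked_reads: iterable of (allele_at_snp_a, allele_at_snp_b) tuples,
                  one per fused-amplicon read (hypothetical input format).
    alleles_a, alleles_b: the two alleles observed at each SNP.
    min_fraction: illustrative threshold for accepting a phase call.
    Returns a pair of haplotypes, or None if the evidence is ambiguous.
    """
    a1, a2 = alleles_a
    b1, b2 = alleles_b
    counts = Counter(linked_reads)
    # Reads supporting each of the two possible phase configurations.
    cis = counts[(a1, b1)] + counts[(a2, b2)]
    trans = counts[(a1, b2)] + counts[(a2, b1)]
    total = cis + trans
    if total == 0:
        return None
    if cis / total >= min_fraction:
        return ((a1, b1), (a2, b2))
    if trans / total >= min_fraction:
        return ((a1, b2), (a2, b1))
    return None


def concordance(calls_x, calls_y):
    """Fraction of SNP pairs phased identically by two methods.

    Only SNP pairs with a non-None call in both dictionaries are counted.
    Haplotype order within a call is ignored.
    """
    shared = [
        key for key in calls_x
        if calls_x[key] is not None and calls_y.get(key) is not None
    ]
    if not shared:
        return None
    agree = sum(
        frozenset(calls_x[key]) == frozenset(calls_y[key]) for key in shared
    )
    return agree / len(shared)
```

In this sketch, one dictionary of calls would come from the microarray data and the other from the linked-read data, both keyed by SNP pair. A real analysis would also need to account for recombination between distant loci and for amplification or sequencing errors, which the fixed threshold only crudely absorbs.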
The commercialized technology will revolutionize genome-wide association studies by finally providing accessible and accurate phased genetic data.
Geneticists study large populations of unrelated individuals to understand the genetic causes of common diseases. These studies rely on error-prone statistical approaches. We are developing a new molecular method that will improve these statistical approaches and therefore help geneticists understand the genetic causes of common diseases.