Genome-wide association studies (GWAS) mine vast amounts of genomic data to detect correlations between genetic markers and traits. Datasets gathered from different genotyping platforms invariably contain a significant fraction of missing genotypes, which genotype imputation fills in. Unfortunately, imputation is computationally slow and prone to Mendelian inconsistencies when applied to family data. Most imputation methods also require large haplotype reference panels and phased data. A related problem is that standard GWAS analysis methods ignore haplotype structure. Including haplotype information in the form of "haplosnps," short sequences of single nucleotide polymorphisms (SNPs) located on the same chromosome strand, makes it possible to detect additional associations related to long-range genomic interactions. I have developed a fast and accurate matrix completion program for genotype imputation in Julia that employs a Nesterov accelerated gradient method. The program also applies a post-processing projection that enforces Mendelian consistency, and it offers a fast reference-panel-based haplotyping option. I will add an option for haplotype estimation without a reference panel. Together, these components will provide the set of tools necessary for preparing raw sequence data for haplosnp GWAS analysis, which I will also develop in Julia.
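As a rough illustration of what a post-processing projection to Mendelian consistency could look like, the sketch below handles a single parent-offspring trio at one biallelic SNP. It is written in Python for illustration only and is not the Julia program described above. The function names, the 0/1/2 minor-allele-count coding, and the use of squared Euclidean distance are all assumptions; the actual projection may be defined differently.

```python
from itertools import product

# Assumed coding: a genotype is the minor-allele count 0, 1, or 2.
# Alleles (0 = major, 1 = minor) that a parent with each genotype can transmit.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a in TRANSMITTABLE[father]
               for b in TRANSMITTABLE[mother])


# Every Mendelian-consistent (father, mother, child) triple at a biallelic SNP.
CONSISTENT_TRIOS = [trio for trio in product(range(3), repeat=3)
                    if is_mendelian_consistent(*trio)]


def project_trio(dosages):
    """Return the consistent trio closest to the real-valued imputed dosages.

    Closeness is squared Euclidean distance (an assumption of this sketch).
    """
    return min(CONSISTENT_TRIOS,
               key=lambda trio: sum((g - d) ** 2 for g, d in zip(trio, dosages)))


# Hypothetical imputed dosages: rounding gives (0, 0, 1), which is inconsistent
# because two homozygous-major parents cannot have a heterozygous child.
print(project_trio((0.1, 0.4, 1.2)))  # (0, 1, 1)
```

Enumerating the consistent triples keeps the projection exact and easy to audit for a trio. Larger pedigrees would need a different strategy, since the number of joint genotype configurations grows quickly with family size.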
Genome-wide association study (GWAS) analyses provide lists of SNPs correlated with a disease or trait of interest and are an important step in identifying the underlying genetic causes of disease. Long-range genetic interactions on a single chromosome strand suggest that sequences of SNPs are more informative for detecting sequence-phenotype associations than SNPs analyzed independently. This project will develop the complete set of tools necessary to conduct GWAS analyses informed by haplotype structure, starting from raw sequence data with missing or uncertain entries.