Genome-wide association studies (GWAS) mine vast amounts of genomic data to detect correlations between markers and traits. Datasets gathered from different genotyping platforms invariably contain a significant fraction of missing genotypes, which genotype imputation fills in. Unfortunately, imputation is computationally slow and prone to Mendelian inconsistencies when applied to family data. Most imputation methods also require large haplotype reference panels and phased data. A related problem is that standard GWAS analysis methods ignore haplotype structure. Including haplotype information in the form of "haplosnps," short sequences of single nucleotide polymorphisms (SNPs) located on the same chromosome strand, makes it possible to detect additional associations related to long-range genomic interactions. I have developed a fast and accurate genotype imputation program in Julia that is based on matrix completion and employs an accelerated Nesterov gradient method. The program also applies a post-processing projection that enforces Mendelian consistency and offers a fast, reference-panel-based haplotyping option. I will add an option for haplotype estimation without a reference panel. Together, these tools will provide everything needed to prepare raw sequence data for haplosnp GWAS analysis, which I will also develop in Julia.
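
To make the notion of a Mendelian inconsistency concrete, the following is a minimal sketch of a consistency check for a father-mother-child trio. It is written in Python purely for illustration and is not the Julia program or the projection step described above. It assumes autosomal, biallelic SNPs with genotypes coded as minor-allele counts (0, 1, or 2) and no missing entries; the function names and the coding scheme are hypothetical choices made for this example.

```python
# Assumption: genotypes are minor-allele counts (0, 1, 2) at autosomal,
# biallelic SNPs. A parent with genotype g can transmit these allele counts:
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def consistent_child_genotypes(father, mother):
    """Return the set of child genotypes compatible with both parents."""
    return {a + b for a in TRANSMITTABLE[father] for b in TRANSMITTABLE[mother]}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele per parent."""
    return child in consistent_child_genotypes(father, mother)


def find_inconsistencies(trio_genotypes):
    """Given (father, mother, child) genotypes per SNP, return the indices
    of SNPs at which the trio violates Mendelian inheritance."""
    return [
        i
        for i, (f, m, c) in enumerate(trio_genotypes)
        if not is_mendelian_consistent(f, m, c)
    ]


if __name__ == "__main__":
    # Hypothetical imputed genotypes at four SNPs for one trio.
    trio = [(0, 0, 1), (1, 1, 2), (2, 2, 0), (0, 2, 1)]
    print(find_inconsistencies(trio))
```

In this toy input, the first and third SNPs are inconsistent: two homozygous-major parents cannot produce a heterozygous child, and two homozygous-minor parents cannot produce a homozygous-major child. An imputation method that treats family members independently can produce such entries, which is why a correction step is needed for family data.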

Public Health Relevance

Genome-wide association study (GWAS) analyses provide lists of SNPs correlated with a disease or trait of interest and are an important step in identifying the underlying genetic causes of disease. Long-range genetic interactions on a single chromosome strand suggest that sequences of SNPs are more informative for detecting sequence-phenotype associations than SNPs analyzed independently. This project will develop the complete set of tools necessary to conduct GWAS analyses informed by haplotype structure, starting from raw sequence data with missing or uncertain entries.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Predoctoral Individual National Research Service Award (F31)
Project #: 5F31HG009621-02
Application #: 9503629
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gatlin, Christine L
Project Start: 2017-07-01
Project End: 2019-06-30
Budget Start: 2018-07-01
Budget End: 2019-06-30
Support Year: 2
Fiscal Year: 2018
Total Cost:
Indirect Cost:

Name: University of California Los Angeles
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095