Statistical Methods for imputation and genome wide association studies

Wasiolek, Rory

Abstract

Genome-wide association studies (GWAS) mine vast amounts of genomic data to detect correlations between markers and traits. Datasets gathered from different genotyping platforms invariably contain a significant fraction of missing genotypes, and genotype imputation fills in those missing entries. Unfortunately, imputation is computationally slow and prone to Mendelian inconsistencies when applied to family data. Most imputation methods also require large haplotype reference panels and phased data. A related problem is that standard GWAS analysis methods ignore haplotype structure. By including haplotype information in the form of "haplosnps," short sequences of single nucleotide polymorphisms (SNPs) located on the same chromosome strand, additional associations related to long-range genomic interactions can be detected. I have developed a fast and accurate matrix-completion program for genotype imputation in Julia that employs an accelerated Nesterov gradient method. The program also applies a post-processing projection that enforces Mendelian consistency and offers a fast reference-panel-based haplotyping option. I will add an option for haplotype estimation without a reference panel. Together these will provide the set of tools necessary for preparing raw sequence data for haplosnp GWAS analysis, which I will also develop in Julia.
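
As a rough illustration of the idea of imputation by matrix completion with Nesterov acceleration, the sketch below minimizes a squared error over observed genotypes plus a nuclear-norm penalty, using an accelerated proximal gradient iteration. It is not the project's Julia program: the objective, the function names `svd_soft_threshold` and `impute_genotypes`, the penalty `tau`, and the rounding to 0/1/2 dosages are all assumptions made for this example, and it is written in Python rather than Julia. The Mendelian-consistency projection and haplotyping steps are not shown.

```python
import numpy as np

def svd_soft_threshold(Z, tau):
    """Proximal map of tau * nuclear norm: shrink the singular values of Z by tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def impute_genotypes(G, observed, tau=1.0, n_iter=200):
    """Fill missing genotype dosages (0/1/2) by low-rank matrix completion.

    G        : (samples x SNPs) array; values at unobserved entries are ignored
    observed : boolean mask, True where a genotype was actually typed
    tau      : nuclear-norm penalty (hypothetical default, needs tuning)
    """
    G0 = np.where(observed, G, 0.0).astype(float)
    X = G0.copy()   # current estimate
    Y = X.copy()    # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        # Gradient of 0.5 * ||P_obs(Y - G)||_F^2; its Lipschitz constant is 1,
        # so a unit step size is used.
        grad = np.where(observed, Y - G0, 0.0)
        X_new = svd_soft_threshold(Y - grad, tau)
        # Nesterov momentum update
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)
        X, t = X_new, t_new
    # Round estimates to valid dosages and keep the typed genotypes unchanged.
    rounded = np.clip(np.rint(X), 0, 2).astype(int)
    return np.where(observed, G, rounded)
```

A call such as `impute_genotypes(G, observed, tau=0.5)` returns an integer matrix of the same shape as `G`, with observed entries preserved and missing entries replaced by rounded low-rank estimates.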

Public Health Relevance

Genome-wide association study (GWAS) analyses provide lists of SNPs correlated with a disease or trait of interest and are an important step in identifying the underlying genetic causes of disease. Long-range genetic interactions on a single chromosome strand suggest that sequences of SNPs are more informative for detecting sequence-phenotype associations than SNPs analyzed independently. This project will develop the complete set of tools necessary to conduct GWAS analyses informed by haplotype structure, starting from raw sequence data with missing or uncertain entries.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Predoctoral Individual National Research Service Award (F31)
Project #: 5F31HG009621-02
Application #: 9503629
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gatlin, Christine L

Project Start: 2017-07-01
Project End: 2019-06-30
Budget Start: 2018-07-01
Budget End: 2019-06-30
Support Year: 2
Fiscal Year: 2018

Institution

Name: University of California Los Angeles
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 092530369

City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Related projects


NIH 2018 F31 HG	Statistical Methods for imputation and genome wide association studies Wasiolek, Rory Esteban / University of California Los Angeles
NIH 2017 F31 HG	Statistical Methods for imputation and genome wide association studies Wasiolek, Rory Esteban / University of California Los Angeles
