Massively parallel sequencing has transformed the field of genomic studies. These new technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They also hold the promise of helping to explain some of the missing heritability from genome-wide association studies of complex traits. However, the development of robust statistical and computational methods has fallen seriously behind the technological advances, particularly for the study of complex human traits. The methodological work lags in at least three major areas. First, there are few, if any, publications on the optimal design of sequencing-based studies for complex traits that account for the complex dynamics of sequencing cost and allow exploration of the full range of sample sizes and sequencing depths. Second, there are no published methods for the analysis of low-coverage (in the range of 2-4X) sequencing data. Low-coverage sequencing is being used to study complex diseases and traits because it can lead to substantial gains in power by increasing the effective sample size, which is critical for detecting the moderate genetic effects typical of complex human traits. Third, the field needs statistical methods that can efficiently analyze rare variants derived from various designs of sequencing-based studies. In this application, we will establish a comprehensive statistical framework for the design and analysis of sequencing-based studies for complex human traits. To do so, we propose the following four specific aims: 1) Develop a unified statistical framework for SNP calling, genotyping, and haplotyping from sequencing and genotyping data. 2) Provide alternative design options for sequencing-based genetic studies. 3) Develop statistical methods for the analysis of rare variants. 4) Develop, distribute, and support freely available software packages for the methods proposed in this application.
The proposed methods will be evaluated through analytical approaches, computer simulations and applications to multiple real datasets.

Public Health Relevance

Massively parallel sequencing has transformed the field of genomic studies. These new technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders and hold the promise of helping to explain some of the missing heritability from genome-wide association studies of complex traits. However, the development of robust statistical and computational methods has fallen seriously behind the technological advances, particularly for the study of complex human traits. In this application, we will establish a comprehensive statistical framework for the design and analysis of sequencing-based studies for complex human traits.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006292-03
Application #: 8471743
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
Project Start: 2011-08-23
Project End: 2016-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 3
Fiscal Year: 2013
Total Cost: $350,433
Indirect Cost: $111,683

Name: University of North Carolina Chapel Hill
Department: Genetics
Type: Schools of Medicine
DUNS #: 608195277
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599

Huang, Kuan-Chieh; Sun, Wei; Wu, Ying et al. (2014) Association studies with imputed variants using expectation-maximization likelihood-ratio tests. PLoS One 9:e110679
Yan, Song; Li, Yun (2014) BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing. Bioinformatics 30:480-7
Bizon, Chris; Spiegel, Michael; Chasse, Scott A et al. (2014) Variant calling in low-coverage whole genome sequencing of a Native American population sample. BMC Genomics 15:85
Kang, Jian; Huang, Kuan-Chieh; Xu, Zheng et al. (2013) AbCD: arbitrary coverage design for sequencing-based genetic studies. Bioinformatics 29:799-801
Byrnes, Andrea E; Wu, Michael C; Wright, Fred A et al. (2013) The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 37:666-74
Mao, Xianyun; Li, Yun; Liu, Yichuan et al. (2013) Testing genetic association with rare variants in admixed populations. Genet Epidemiol 37:38-47
Huang, Jie; Liu, Eric Y; Welch, Ryan et al. (2013) WikiGWA: an open platform for collecting and using genome-wide association results. Eur J Hum Genet 21:471-3
Duan, Qing; Liu, Eric Yi; Croteau-Chonka, Damien C et al. (2013) A comprehensive SNP and indel imputability database. Bioinformatics 29:528-31
Chen, Wei; Li, Bingshan; Zeng, Zhen et al. (2013) Genotype calling and haplotyping in parent-offspring trios. Genome Res 23:142-51
Liu, Eric Yi; Li, Mingyao; Wang, Wei et al. (2013) MaCH-admix: genotype imputation for admixed populations. Genet Epidemiol 37:25-37