Genome-wide association studies (GWAS) have become the primary approach for dissecting the genetic basis of complex diseases and are powerful for detecting common alleles that influence disease risk. To date, GWAS have identified hundreds of putative disease gene loci. Despite this progress, these newly discovered loci typically account for only a small fraction of disease heritability. This raises new questions about where and how to find the remaining genetic variation that contributes to susceptibility to complex and common diseases. Potential sources of the missing heritability are (1) rare variants, (2) gene-gene and gene-environment interactions, (3) combinations of multiple SNPs, each with a small genetic effect but collectively conferring a large risk, and (4) structural variation. Current statistical methods for genetic analysis are well suited to detecting common variants, but new models and methods of analysis are needed to reveal the sources of missing heritability. To this end, the goal of this proposal is to develop novel and powerful statistical methods for studying rare variants and gene-gene interactions in next-generation sequencing and GWAS data. Specifically, the methods we develop will provide a unified analytical framework for testing associations with both common and rare alleles, as well as their interactions with genetic and environmental factors. We will also develop graphical models and other statistical methods for co-association and interaction network analysis. The power of these methods will be rigorously evaluated by theoretical and simulation approaches, and the methods will be applied to existing GWAS data sets (psoriasis and rheumatoid arthritis) and to next-generation sequencing data of extreme cardiovascular phenotypes funded by NIH grant 1RC2 HL02419-01.
This project aims to develop novel and powerful statistical methods for genetic association and interaction analysis of next-generation sequencing data and for finding the missing heritability unexplained by current GWAS. Applying these methods to sequence data will help identify the entire spectrum of genetic variation that influences disease and provide potentially valuable tools for developing diagnostic and interventional strategies.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric et al. (2015) Pathway analysis with next-generation sequencing data. Eur J Hum Genet 23:507-15
Tang, Hongwei; Wei, Peng; Duell, Eric J et al. (2014) Genes-environment interactions in obesity- and diabetes-associated pancreatic cancer: a GWAS data analysis. Cancer Epidemiol Biomarkers Prev 23:98-106
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao (2014) Epistasis analysis for quantitative traits by functional regression model. Genome Res 24:989-98
Wei, Peng; Tang, Hongwei; Li, Donghui (2014) Functional logistic regression approach to detecting gene by longitudinal environmental exposure interaction in a case-control study. Genet Epidemiol 38:638-51
Hong, Shengjun; Chen, Xiangning; Jin, Li et al. (2013) Canonical correlation analysis for RNA-seq co-expression networks. Nucleic Acids Res 41:e95
Fu, Wenqing; Akey, Joshua M (2013) Selection and adaptation in the human genome. Annu Rev Genomics Hum Genet 14:467-89
Siu, Hoicheong; Jin, Li; Xiong, Momiao (2012) Manifold learning for human population structure studies. PLoS One 7:e29901
Luo, Li; Boerwinkle, Eric; Xiong, Momiao (2011) Association studies for next-generation sequencing. Genome Res 21:1099-108
Siu, Hoicheong; Zhu, Yun; Jin, Li et al. (2011) Implication of next-generation sequencing on association studies. BMC Genomics 12:322
Chen, Gary K; Chen, Gary; Wei, Peng et al. (2011) Incorporating biological information into association studies of sequencing data. Genet Epidemiol 35 Suppl 1:S29-34