Unified Statistical Methods for Sequence-based Association Studies
Fast and economical next-generation sequencing (NGS) technologies will generate unprecedentedly massive (thousands of individuals) and high-dimensional (tens of millions) genomic and epigenomic variation data. These data allow nearly complete evaluation of genomic and epigenomic variation, including common and rare variants, RNA-seq, mRNA-seq and methylation-seq data. As a consequence, genomic variation data are so densely distributed across the genome that genetic variants can be treated as observations varying over a continuum. The emergence of NGS technologies is not only changing our view of genomics from an independently segregating discrete model to hybrid (both discrete and continuous) models, but is also driving major changes in analytic methods for genomic and epigenomic analysis: from standard multivariate data analysis to functional data analysis, from independent sampling to dependent sampling, from low-dimensional to high-dimensional data analysis, and from single genomic or epigenomic variant analysis to integrated genomic and epigenomic analysis. To address the challenges of NGS data analysis, the goals of this proposal are to develop novel and powerful statistical methods for sequence-based association studies and QTL (eQTL) analysis. These methods will leverage high-dimensional data reduction, causal inference and functional data analysis techniques to identify both common and rare risk variants across the genome, investigate their function via intermediate phenotypes and expression, estimate the total effects (intervention effects) and direct effects of variants on phenotypes, and unify family-based and population-based designs. We will evaluate the performance of these methods using simulated and real datasets.
This application will employ high-dimensional data reduction and functional data analysis techniques to develop and test innovative genetic models, statistical methods and computational algorithms for sequence-based association studies and QTL analysis, and to unify family-based and population-based designs using various types of family and unrelated-individual data sampled from any population structure.
Zhao, Jinying; Zhu, Yun; Xiong, Momiao (2016) Genome-wide gene-gene interaction analysis for next-generation sequencing. Eur J Hum Genet 24:421-8
Jiang, Junhai; Lin, Nan; Guo, Shicheng et al. (2015) Multiple functional linear model for association analysis of RNA-seq with imaging. Quant Biol 3:90-102
Wang, Yifan; Liu, Aiyi; Mills, James L et al. (2015) Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 39:259-75
Huang, Jinyan; Chen, Jun; Esparza, Jorge et al. (2015) eQTL mapping identifies insertion- and deletion-specific eQTLs in multiple tissues. Nat Commun 6:6821
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric et al. (2015) Pathway analysis with next-generation sequencing data. Eur J Hum Genet 23:507-15
Dong, Chengliang; Wei, Peng; Jian, Xueqiu et al. (2015) Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet 24:2125-37
Guo, Shicheng; Yan, Fengyang; Xu, Jibin et al. (2015) Identification and validation of the methylation biomarkers of non-small cell lung cancer (NSCLC). Clin Epigenetics 7:3
Fan, Ruzong; Wang, Yifan; Boehnke, Michael et al. (2015) Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 200:1089-104
Tang, Hongwei; Wei, Peng; Duell, Eric J et al. (2014) Genes-environment interactions in obesity- and diabetes-associated pancreatic cancer: a GWAS data analysis. Cancer Epidemiol Biomarkers Prev 23:98-106
Guo, Shicheng; Wang, Yu-Long; Li, Yi et al. (2014) Significant SNPs have limited prediction ability for thyroid cancer. Cancer Med 3:731-5
Showing the most recent 10 out of 31 publications