Unified Statistical Methods for Sequence-based Association Studies. Fast and economical next-generation sequencing (NGS) technologies will generate unprecedentedly massive (thousands of individuals) and high-dimensional (tens of millions) genomic and epigenomic variation data that allow nearly complete evaluation of genomic and epigenomic variation, including common and rare variants, RNA-seq, mRNA-seq and methylation-seq data. As a consequence, these genomic variation data are so densely distributed across the genome that the genetic variants can be considered as observations varying over a continuum. The emergence of NGS technologies is changing not only our view of genomics, from an independently segregating discrete model to hybrid (both discrete and continuous) models, but also the analytic methods for genomic and epigenomic analysis: from standard multivariate data analysis to functional data analysis, from independent sampling to dependent sampling, from low-dimensional to high-dimensional data analysis, and from single genomic or epigenomic variant analysis to integrated genomic and epigenomic analysis. To address the great challenges of NGS data analysis, the goals of this proposal are to develop novel and powerful statistical methods for sequence-based association studies and QTL (eQTL) analysis that leverage high-dimensional data reduction, causal inference, and functional data analysis techniques to identify both common and rare risk variants across the genome, investigate their function via intermediate phenotypes and expression, estimate the total effects (intervention effects) and direct effects of variants on phenotypes, and unify family-based and population-based designs. We will evaluate the performance of these methods on simulated and real datasets.

Public Health Relevance

This application will employ high-dimensional data reduction and functional data analysis techniques to develop and test innovative genetic models, statistical methods, and computational algorithms for sequence-based association studies and QTL analysis, and to unify family-based and population-based designs using various types of family and unrelated-individual data sampled from any population structure.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM104411-01
Application #: 8430847
Study Section: Special Emphasis Panel (ZGM1-GDB-7 (CP))
Program Officer: Krasnewich, Donna M
Project Start: 2013-04-01
Project End: 2017-01-31
Budget Start: 2013-04-01
Budget End: 2014-01-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $360,450
Indirect Cost: $94,667

Name: University of Texas Health Science Center Houston
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 800771594
City: Houston
State: TX
Country: United States
Zip Code: 77225

Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric et al. (2015) Pathway analysis with next-generation sequencing data. Eur J Hum Genet 23:507-15
Ma, Baoshan; Wilker, Elissa H; Willis-Owen, Saffron A G et al. (2014) Predicting DNA methylation level across human tissues. Nucleic Acids Res 42:3515-28
Ma, Baoshan; Huang, Jinyan; Liang, Liming (2014) RTeQTL: Real-Time Online Engine for Expression Quantitative Trait Loci Analyses. Database (Oxford) 2014:
Fan, Ruzong; Wang, Yifan; Mills, James L et al. (2013) Functional linear models for association analysis of quantitative traits. Genet Epidemiol 37:726-42
Liang, Faming; Xiong, Momiao (2013) Bayesian detection of causal rare variants under posterior consistency. PLoS One 8:e69633