This proposal focuses on the development of hierarchical models and parallelized Bayesian inference for the analysis of RNA sequencing (RNAseq) data. Special emphasis is placed on gene expression profiling of parental inbred lines and their hybrid offspring for the discovery of key genes underlying heterosis, the genetic phenomenon otherwise known as hybrid vigor. The project will be led by a collaborative team of researchers with expertise in the analysis of high-dimensional gene expression data, Bayesian inference, bioinformatics, biology, computational methods, genetics, genomics, and statistics. The proposed research provides new tools for the analysis of the high-dimensional, low-sample-size count data generated by RNAseq technology. Hierarchical modeling allows information to be shared flexibly across genes, so that as much information as possible is extracted from the data. Parallel methods for Bayesian inference harness the power of modern computing to produce comprehensive results in a timely manner. Specific methods will be developed for (i) the identification of genes that exhibit expression heterosis, (ii) the detection of expressed and non-expressed genes, and (iii) the discovery of differential allele usage in hybrids. These methods will provide a deeper understanding of the molecular mechanisms of heterosis and lead to the discovery of key genes whose expression patterns give hybrids advantages over their parents. This information can be used to efficiently predict which of thousands of possible crosses will result in top-performing hybrids. In addition to the specific methods mentioned above, hierarchical generalized linear models for the simultaneous analysis of tens of thousands of response variables will be developed. This work will permit the analysis of RNAseq data from complex designs with multiple sources of variability and will greatly extend the range of applicability of the funded research to encompass a variety of challenges in high-dimensional data analysis.
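To make aim (i) concrete, the sketch below illustrates what "expression heterosis" means for a single gene: the hybrid's mean expression falls outside the range spanned by the two parental means. It is only an illustration under stated assumptions, not the hierarchical Bayesian method the proposal describes. The function name `classify_heterosis`, the pseudocount, the log scale, and the toy counts are all hypothetical, and the naive comparison ignores the sampling uncertainty that the proposed models are designed to handle.

```python
from math import log
from statistics import mean


def classify_heterosis(parent1, parent2, hybrid, pseudocount=1.0):
    """Naively classify one gene from replicate read counts.

    Compares mean log-counts only; no uncertainty is modeled
    (illustrative assumption, not the proposal's method).
    """
    def mean_log(counts):
        # A pseudocount avoids log(0) for genes with zero reads.
        return mean(log(c + pseudocount) for c in counts)

    p1, p2, h = mean_log(parent1), mean_log(parent2), mean_log(hybrid)
    if h > max(p1, p2):
        return "high-parent"   # hybrid above both parents
    if h < min(p1, p2):
        return "low-parent"    # hybrid below both parents
    return "none"              # hybrid within the parental range


# Hypothetical toy data: (parent 1, parent 2, hybrid) replicate counts.
counts = {
    "geneA": ([10, 12, 9], [20, 22, 19], [40, 38, 45]),
    "geneB": ([50, 55, 48], [30, 28, 33], [40, 41, 39]),
    "geneC": ([15, 14, 16], [25, 27, 24], [5, 6, 4]),
}
calls = {gene: classify_heterosis(*c) for gene, c in counts.items()}
print(calls)
```

With only a few replicates per genotype, a point-estimate rule like this one is unstable for any single gene; sharing information across tens of thousands of genes through a hierarchical model is the kind of remedy the abstract describes.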

Public Health Relevance

The proposed work will provide medical researchers with advanced tools for studying the functions of genes in complex biological systems. The enhanced understanding of gene functions obtained with the developed tools can deepen understanding of diseases and lead to new treatments for the improvement of public health.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM109458-01
Application #: 8639666
Study Section: Special Emphasis Panel (ZGM1-BBCB-5 (BM))
Program Officer: Brazhnik, Paul
Project Start: 2013-09-01
Project End: 2017-05-31
Budget Start: 2013-09-01
Budget End: 2014-05-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $272,039
Indirect Cost: $83,732

Name: Iowa State University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 005309844
City: Ames
State: IA
Country: United States
Zip Code: 50011

Liang, Kun; Du, Chuanlong; You, Hankun et al. (2018) A hidden Markov tree model for testing multiple hypotheses corresponding to Gene Ontology gene sets. BMC Bioinformatics 19:107
Kusmec, Aaron; Srinivasan, Srikant; Nettleton, Dan et al. (2017) Distinct genetic architectures for phenotype means and plasticities in Zea mays. Nat Plants 3:715-723
Lin, Hung-Ying; Liu, Qiang; Li, Xiao et al. (2017) Substantial contribution of genetic variation in the expression of transcription factors to phenotypic variation revealed by eRD-GWAS. Genome Biol 18:192
Guan, Xin; Okazaki, Yozo; Lithio, Andrew et al. (2017) Discovery and Characterization of the 3-Hydroxyacyl-ACP Dehydratase Component of the Plant Mitochondrial Fatty Acid Synthase System. Plant Physiol 173:2010-2028
Nguyen, Yet; Nettleton, Dan; Liu, Haibo et al. (2015) Detecting Differentially Expressed Genes with RNA-seq Data Using Backward Selection to Account for the Effects of Relevant Covariates. J Agric Biol Environ Stat 20:577-597
Benidt, Sam; Nettleton, Dan (2015) SimSeq: a nonparametric approach to simulation of RNA-sequence datasets. Bioinformatics 31:2131-40
Niemi, Jarad; Mittman, Eric; Landau, Will et al. (2015) Empirical Bayes analysis of RNA-seq data for detection of gene expression heterosis. J Agric Biol Environ Stat 20:614-628
Liu, Fangfang; Wang, Chong; Liu, Peng (2015) A Semi-parametric Bayesian Approach for Differential Expression Analysis of RNA-seq Data. J Agric Biol Environ Stat 20:555-576
Lithio, Andrew; Nettleton, Dan (2015) Hierarchical Modeling and Differential Expression Analysis for RNA-seq Experiments with Inbred and Hybrid Genotypes. J Agric Biol Environ Stat 20:598-613
Ji, Tieming; Liu, Peng; Nettleton, Dan (2014) Estimation and Testing of Gene Expression Heterosis. J Agric Biol Environ Stat 19:319-337