This proposal focuses on the development of hierarchical models and parallelized Bayesian inference methods for the analysis of RNA sequencing (RNAseq) data. Special emphasis is placed on gene expression profiling of parental inbred lines and their hybrid offspring to discover key genes underlying heterosis, the genetic phenomenon also known as hybrid vigor. The project will be led by a collaborative team of researchers with expertise in the analysis of high-dimensional gene expression data, Bayesian inference, bioinformatics, biology, computational methods, genetics, genomics, and statistics. The proposed research will provide new tools for the analysis of the high-dimensional, low-sample-size count data generated by RNAseq technology. Hierarchical modeling allows information to be shared flexibly across dimensions (for example, across genes), extracting as much information as possible from the data. Parallel methods for Bayesian inference harness the power of modern computing to produce comprehensive results in a timely manner. Specific methods will be developed for (i) identifying genes that exhibit expression heterosis, (ii) detecting expressed and non-expressed genes, and (iii) discovering differential allele usage in hybrids. These methods will provide a deeper understanding of the molecular mechanisms of heterosis and lead to the discovery of key genes whose expression patterns give hybrids advantages over their parents. This information can be used to efficiently predict which of thousands of possible crosses will produce top-performing hybrids. In addition to the specific methods above, hierarchical generalized linear models for the simultaneous analysis of tens of thousands of response variables will be developed.
This work will permit the analysis of RNAseq data from complex designs with multiple sources of variability and will extend the applicability of the funded research to a broad range of challenges in high-dimensional data analysis.
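Two ideas above, sharing information across genes and flagging expression heterosis, can be illustrated with a small numpy sketch. It is not the project's method: it assumes a simple normal-normal model for gene-level estimates on the log scale with known sampling variances (the proposed work targets count-based hierarchical models fitted by parallel Bayesian inference), and the function names `shrink_estimates` and `heterosis_probabilities` are hypothetical. Heterosis is taken here as the hybrid mean lying above both parental means (high-parent) or below both (low-parent).

```python
import numpy as np


def shrink_estimates(est, se2):
    """Empirical Bayes (normal-normal) shrinkage of per-gene estimates.

    Illustrative assumption: est[g] ~ N(theta[g], se2[g]) and
    theta[g] ~ N(m, tau2). m and tau2 are estimated by the method of
    moments from all genes, so each gene borrows strength from the rest.
    Assumes every se2[g] > 0.
    """
    est = np.asarray(est, dtype=float)
    se2 = np.asarray(se2, dtype=float)
    m = est.mean()                           # common mean across genes
    tau2 = max(0.0, est.var() - se2.mean())  # between-gene variance, truncated at 0
    w = tau2 / (tau2 + se2)                  # weight on each gene's own estimate
    return w * est + (1.0 - w) * m           # pull noisy estimates toward m


def heterosis_probabilities(mu_p1, mu_p2, mu_h):
    """Monte Carlo estimates of P(high-parent) and P(low-parent) heterosis.

    Each argument has shape (n_draws, n_genes) and holds posterior draws of
    mean log-expression for parent 1, parent 2, and the hybrid.
    """
    high = mu_h > np.maximum(mu_p1, mu_p2)  # hybrid above both parents
    low = mu_h < np.minimum(mu_p1, mu_p2)   # hybrid below both parents
    return high.mean(axis=0), low.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-gene estimates and sampling variances for 5 genes.
    print(shrink_estimates(rng.normal(0.0, 1.0, 5), np.full(5, 0.5)))
    # Hypothetical posterior draws (1000 draws x 3 genes).
    p1 = rng.normal(5.0, 0.2, (1000, 3))
    p2 = rng.normal(6.0, 0.2, (1000, 3))
    h = rng.normal([7.0, 5.5, 4.0], 0.2, (1000, 3))
    print(heterosis_probabilities(p1, p2, h))
```

Because genes are conditionally independent given the shared hyperparameters in models like this, per-gene computations such as `heterosis_probabilities` split naturally across processors, which is one reason parallel inference suits this setting.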
The proposed work will provide medical researchers with advanced tools for studying the functions of genes in complex biological systems. A better understanding of gene function obtained with these tools can shed light on disease mechanisms and lead to new treatments that improve public health.