Biologists are rapidly adopting RNA-Sequencing (RNA-Seq) to study transcriptomes for basic understanding of cellular functions and to address important needs in such areas as food production, food security, pharmacy, human health, disease treatment, and disease prevention. Statistical tools for complete and specific analysis of RNA-Seq data, however, have been slow to emerge, and the use of off-the-shelf tools developed for other applications has strong potential to produce misleading conclusions. Germane to the goals of this proposal, sophisticated methods for assessing differential gene expression from RNA-Seq, which are based on a negative binomial (NB) exact test for two-group comparisons, have not yet been extended to regression analysis. Such methods are required for assessing differential gene expression after accounting for covariates, for analyzing the dependence of expression on explanatory variables, and for studying interactive effects of multiple factors on expression. The objectives of this proposal are to address this need in the following ways: 1) develop, assess, and implement higher-order asymptotic (HOA) adjustments to likelihood ratio inference for NB regression analysis of RNA-Seq data, including the preparation of a publicly available R package for complete regression analysis of RNA-Seq data, and the inclusion of the inferential computations in an already publicly available, Perl-based computational pipeline for complete analysis of RNA-Seq data; 2) clarify the power of optimal inference for RNA-Seq studies and provide a computer program for assessing sample size needs; and 3) develop an interactive, dynamic visualization program for conveying RNA-Seq data, NB regression model results, and associated uncertainties. The methods used include the application of higher-order asymptotic theory, Monte Carlo simulation, the development of Level of Detail (LOD) "focus plus context" visualization methods, and serious attention to real RNA-Seq datasets.
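As a rough illustration of how Monte Carlo simulation can inform sample size needs for NB count data, the following minimal Python sketch estimates the power of a two-group comparison for a single gene. It is not the proposed HOA method or the project's software: the function names (`sample_nb`, `estimate_power`), the parameter values, the NB parameterization (variance = mean + dispersion × mean²), and the simple Wald test on the log ratio of group means with the dispersion treated as known are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def sample_nb(mean, dispersion, rng):
    """Draw one NB count as a gamma-Poisson mixture.

    Assumed parameterization: Var(Y) = mean + dispersion * mean**2.
    """
    shape = 1.0 / dispersion
    # Gamma-distributed Poisson rate with expectation `mean`.
    lam = rng.gammavariate(shape, mean * dispersion)
    # Knuth's multiplication method for Poisson draws (adequate for modest rates).
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def estimate_power(n_per_group, mean, fold_change, dispersion,
                   alpha=0.05, n_sim=2000, seed=1):
    """Estimate power of a two-sided Wald test on the log ratio of group means.

    Illustrative stand-in for a real test: the dispersion is treated as known,
    and the standard error comes from the delta method.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n = n_per_group
    rejections = 0
    for _ in range(n_sim):
        y0 = [sample_nb(mean, dispersion, rng) for _ in range(n)]
        y1 = [sample_nb(mean * fold_change, dispersion, rng) for _ in range(n)]
        m0, m1 = sum(y0) / n, sum(y1) / n
        if m0 == 0 or m1 == 0:
            continue  # log ratio undefined; counted as a non-rejection
        se = math.sqrt((1 / m0 + dispersion) / n + (1 / m1 + dispersion) / n)
        if abs(math.log(m1 / m0)) / se > z_crit:
            rejections += 1
    return rejections / n_sim
```

Calling, for example, `estimate_power(n_per_group=5, mean=20, fold_change=2.0, dispersion=0.1)` for a range of `n_per_group` values gives a crude power curve for one hypothetical gene; a realistic assessment would also account for dispersion estimation and for multiple testing across genes.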
The tools developed from the work proposed herein have direct relevance to human health because RNA-Seq-based transcriptome profiling has broad applications in nearly all areas of biological inquiry, including human health, disease treatment, and disease prevention.
Wong, Carmen P; Hsu, Anna; Buchanan, Alex et al. (2014) Effects of sulforaphane and 3,3'-diindolylmethane on genome-wide promoter methylation in normal prostate epithelial cells and prostate cancer cells. PLoS One 9:e86787
Beaver, Laura M; Buchanan, Alex; Sokolowski, Elizabeth I et al. (2014) Transcriptome analysis reveals a dynamic and differential transcriptional response to sulforaphane in normal and prostate cancer cells and suggests a role for Sp1 in chemoprevention. Mol Nutr Food Res 58:2001-13
Chang, Jeff H; Desveaux, Darrell; Creason, Allison L (2014) The ABCs and 123s of bacterial secretion systems in plant pathogenesis. Annu Rev Phytopathol 52:317-45
Di, Yanming; Emerson, Sarah C; Schafer, Daniel W et al. (2013) Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data. Stat Appl Genet Mol Biol 12:49-70