Biologists are rapidly adopting RNA-Sequencing (RNA-Seq) to study transcriptomes, both for basic understanding of cellular functions and to address important needs in areas such as food production, food security, pharmacy, human health, disease treatment, and disease prevention. Statistical tools for complete and specific analysis of RNA-Seq data, however, have been slow to emerge, and the use of off-the-shelf tools developed for other applications has strong potential to produce misleading conclusions. Germane to the goals of this proposal, sophisticated methods for assessing differential gene expression from RNA-Seq, based on a negative binomial (NB) exact test for two-group comparisons, have not yet been extended to regression analysis. Such methods are required for assessing differential gene expression after accounting for covariates, for analyzing the dependence of expression on explanatory variables, and for studying the interactive effects of multiple factors on expression. The objectives of this proposal are to address this need in the following ways: 1) develop, assess, and implement higher-order asymptotic (HOA) adjustments to likelihood ratio inference for NB regression analysis of RNA-Seq data, including the preparation of a publicly available R package for complete regression analysis of RNA-Seq data, and the inclusion of the inferential computations in an already publicly available, Perl-based computational pipeline for complete analysis of RNA-Seq data; 2) clarify the power of optimal inference for RNA-Seq studies and provide a computer program for assessing sample size needs; and 3) develop an interactive, dynamic visualization program for conveying RNA-Seq data, NB regression model results, and associated uncertainties.
The methods used include the application of higher-order asymptotic theory, Monte Carlo simulation, the development of Level of Detail (LOD) "focus plus context" visualization methods, and serious attention to real RNA-Seq datasets.
The tools developed from the work proposed herein have direct relevance to human health because RNA-Seq-based transcriptome profiling has broad applications in nearly all areas of biological inquiry, including human health, disease treatment, and disease prevention.