Studies carried out at the genome-wide level now play a central role in modern biology and medicine. There continues to be a substantial need for new statistical methods that can be applied in these studies, particularly as study designs become more ambitious, sample sizes increase, and new technologies emerge. The overall goal of the proposed research is to develop statistical methods and software for understanding high-throughput molecular profiling data, centered on characterizing genome-wide gene expression. We propose to develop statistical models, methods, and software that allow one to rigorously characterize variation of gene expression in terms of both study design and latent variables. Our proposed research is particularly focused on the most modern form of gene expression profiling, RNA-Seq, as well as the most ambitious and biologically fruitful problems currently being studied. We will develop rigorous, flexible, and robust models of variation in high-throughput data. The work encompasses: (i) latent sources of systematic variation and more general sources of dependence among features, (ii) a principled dissection of sources of variation in next-generation RNA-Seq data and new methods for emerging RNA-Seq data, (iii) rigorous multiple hypothesis testing of associations between genomic features and latent variables, (iv) simultaneous inference of complex, yet commonly sought-after, statistical hypotheses, and (v) dissemination to the greater research community through user-friendly and platform-independent software packages.
Measuring genome-wide gene expression variation has been a revolutionary tool in biomedicine over the past decade. This is a data-centric endeavor that requires new and sophisticated statistical methods in order to arrive at sound biological conclusions. The proposed work will make novel contributions to statistical methods and software that will be applied to genome-wide gene expression studies in humans and model organisms, for both microarray data and next-generation sequencing data.