The main objective of this project is to develop new methodology and theory for statistical challenges motivated by integrative genomics, a collection of quantitative approaches in genomics research centered on the joint analysis of multiple datasets. Integrative analysis has the potential to drive the next wave of scientific discoveries in genomics and is also a crucial component of emerging conceptions of data science in general. This project aims to develop new analysis procedures for (1) detecting whether two or more datasets share the same significant genomic features, (2) identifying these features, and (3) leveraging them to improve genomic prediction models.
This project opens important questions in integrative genomics to rigorous methodological and theoretical development by framing them in terms of active areas of statistical research, including signal detection, multiple testing, and high-dimensional classification. The proposed methodological research will develop new nonparametric tests and false discovery rate control procedures for detecting and identifying shared genomic features, as well as new nonparametric empirical Bayes approaches for high-dimensional integrative classification and regression. The proposed theoretical research will explore the fundamental limits of these problems. The results of this project will lead to more powerful and rigorous methods and theory for integrative analysis, both in genomics and in other fields.