The main objective of this project is to develop new methodology and theory for statistical challenges motivated by integrative genomics, a collection of quantitative approaches in genomics research that centers on the joint analysis of multiple datasets. Integrative analysis has tremendous potential to drive the next wave of scientific discoveries in genomics and is also a crucial component of emerging conceptions of data science in general. This project aims to develop new analysis procedures for 1) detecting whether two or more datasets share the same significant genomic features, 2) identifying these features, and 3) leveraging them to improve genomic prediction models.

This project opens important questions in integrative genomics to rigorous methodological and theoretical development by framing them in terms of active areas of statistical research, including signal detection, multiple testing, and high-dimensional classification. The proposed methodological research will develop new nonparametric tests and false discovery rate control procedures for detecting and identifying shared genomic features, as well as new nonparametric empirical Bayes approaches for high-dimensional integrative classification and regression. The proposed theoretical research will explore the fundamental limits of these problems. The results of this project will lead to more powerful and rigorous methods and theory for integrative analysis in genomics and elsewhere.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1613005
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2016-08-15
Budget End: 2019-07-31
Support Year:
Fiscal Year: 2016
Total Cost: $353,663
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820