New technologies in many scientific fields have produced data sets of increasing size and complexity, and the ability to measure and store these data has far outpaced the ability to analyze them and make reproducible scientific discoveries. Examples include genomics and proteomics, neuroimaging, and neural recordings data. Analyzing this big biomedical data is critical to discovering disease biomarkers, advancing personalized medicine, and understanding the basic workings of complex biological systems. In this work, we seek to develop and study novel statistical learning and multivariate analysis techniques that directly address unresolved problems critical for making discoveries from big scientific data. Additionally, we will use statistical learning techniques to improve introductory statistics education by developing an online personalized learning system for assignment and content delivery.

More specifically, this work will use algorithms for large-scale sparse optimization to inspire and develop a new framework for statistical learning with superior empirical and theoretical performance on high-dimensional and highly correlated data. Such data is common in genomics and neuroimaging; the new techniques will be used to identify potential genomic drug targets, to model genetic and brain networks, and to decode brain activity from neuroimaging and neural recordings data. We will also use Kronecker product covariances to develop new multivariate analysis models for coupled matrix and tensor data. These techniques will be used to find joint patterns in integrative genomics data and patterns of brain activity indicative of behavioral or clinical covariates. Overall, this work will develop several critically needed statistical techniques for understanding large and complex data, have direct impact in genomics and brain science, where the techniques will be applied in collaboration with scientists, and lead to improvements in introductory statistics education. This award is co-funded by the Directorate for Mathematical and Physical Sciences (MPS) Division of Mathematical Sciences (DMS) and the Directorate for Biological Sciences (BIO) Divisions of Integrative Organismal Systems (IOS) and Emerging Frontiers (EF).

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1554821
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2016-07-01
Budget End: 2021-06-30
Support Year:
Fiscal Year: 2015
Total Cost: $400,000
Indirect Cost:
Name: Rice University
Department:
Type:
DUNS #:
City: Houston
State: TX
Country: United States
Zip Code: 77005