In many application areas, high-throughput devices make it possible to gather thousands, and sometimes millions, of measurements in very short periods of time. This research project addresses statistical problems that arise in Big Data areas such as brain imaging, genomics and genetics, and social networks. The research has three components: (a) collecting and cleaning large-scale data sets for social networks; (b) developing new models, methods, and theory for analyzing large social networks; and (c) developing new methods and theory for analyzing genomic and genetic data and brain imaging data. The research is expected to have an impact on cancer research, neuroscience, and the social sciences.

The flood of high-throughput measurements is driving a new branch of statistical practice called Large-Scale Inference. This research project aims to exploit various types of sparsity (for example, signal sparsity, graphical sparsity, and sparsity in eigenvalues and leading eigenvectors) to address (a) network data collection and preprocessing, (b) statistical modeling for different types of networks, (c) computationally feasible approaches to precision matrix estimation, (d) cancer classification and clustering, and (e) sparse principal component analysis and post-selection random matrix theory. The research will lead to new data sets and to new methods and theory for analyzing data in application areas such as social networks, genomics and genetics, and brain imaging.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1513414
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2015
Total Cost: $149,998
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213