The amount and complexity of data generated to support modern scientific studies continue to grow rapidly. Large data sets, characterized by many variables or features, many samples, or both, are now commonly studied in fields ranging from finance and biomedical sciences to geoscience and engineering. Such large, complex data pose a number of statistical and computational challenges that do not arise with more traditional statistical tools, which require the sample size to be much larger than the number of features or variables. At the same time, they present unprecedented opportunities for the discipline of statistics. For these super-high-dimensional data, practical statistical methods with rigorously established properties, while still difficult to develop, are more important than ever for many frontier scientific studies such as climate modeling, portfolio allocation and risk management, quantum computation and quantum communication, gene expression studies, and image understanding. This project studies the estimation of (i) large covariance matrices; (ii) large volatility matrices in high-frequency finance; and (iii) large density matrices in quantum information science. The investigator intends to develop novel statistical methodologies and theory that exploit sparsity for these large-matrix inference problems based on complex, super-high-dimensional data. The research project has great potential to make a significant impact on the broad scientific community.

The digital revolution has had a profound impact on data collection in scientific research and knowledge discovery, and technological advances make it possible to collect data at relatively low cost. As a result, the amount and complexity of data generated to support modern scientific studies continue to grow rapidly. Large data sets are now commonly used in fields ranging from finance and biomedical sciences to geoscience and engineering. Such large-scale, complex data pose a number of statistical and computational challenges that do not arise with more traditional statistical tools. At the same time, they present unprecedented opportunities for statistics. For these data sets, valid statistical methods are more important than ever for many frontier scientific studies such as climate modeling, portfolio allocation and risk management, quantum computation and quantum communication, gene expression studies, and image understanding. The research project develops advanced, effective statistical tools for the analysis of such vast, complex data. The investigator integrates research with student training and addresses applications in the biomedical sciences, geoscience, finance, and quantum information science.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1005635
Program Officer: Gabor J. Szekely
Budget Start: 2010-07-01
Budget End: 2015-06-30
Fiscal Year: 2010
Total Cost: $529,978
Name: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715