The constantly increasing dimensionality and complexity of modern data have motivated many new data analysis tools in various fields and urgently call for rigorous theoretical investigation of issues such as robustness against different sources of model misspecification, uncertainty quantification in classification and prediction, and statistical performance guarantees for conventional methods under non-standard settings. Although most classical theory is not directly applicable to methods developed for complex data, partly because of highly specialized model assumptions and diverse algorithms, the profound statistical thinking embodied in these long-established results can still provide deep theoretical insights. When combined with cutting-edge results from modern contexts such as random matrix theory, matrix concentration, and convex geometry, this classical theory will lead to novel, principled methods for a general class of problems ranging from high-dimensional regression and classification to network data analysis and subspace learning. All methods developed in the proposed research will be implemented as standard R packages and made freely available. They will also have high pedagogical value and will be used to develop new courses. The proposed research has applications to astronomy and medical screening data, and it also provides new inference tools for applied areas in genetics, psychiatry, and brain sciences. Integrated educational activities include designing courses on new perspectives in nonparametric statistics and modern multivariate analysis.

The proposed work will further integrate classical nonparametric and multivariate analysis theory with modern elements in four major areas of statistical research:
(1) assumption-free prediction bands in high-dimensional regression;
(2) a generalized Neyman-Pearson framework for set-valued multi-class classification;
(3) statistical performance guarantees for certain greedy algorithms in network community detection, as well as goodness-of-fit tests for network model selection; and
(4) a unified singular value decomposition framework for structured subspace estimation, formulated as a convex optimization problem.

These research activities will lead to modernized nonparametric and multivariate analysis courses. The courses will feature new theoretical frameworks such as computationally constrained minimax analysis, additional topics such as functional data analysis, and cutting-edge examples from genetics, brain imaging, traffic, and astronomy.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1553884
Program Officer: Gabor Szekely
Budget Start: 2016-08-01
Budget End: 2021-07-31
Fiscal Year: 2015
Total Cost: $400,000
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213