This research project pursues a general understanding of high dimensional nonparametric eigenanalysis along three fundamental and complementary directions: (1) regularized estimation and variable selection, (2) statistical inference of eigenstructure, including hypothesis testing and confidence statements, and (3) minimax rates and complexity theoretic limits. The first direction will offer flexible regularization and variable selection for eigenstructure through regression based techniques, regularized power methods, and other efficient algorithms. The second direction will lead to valid nonparametric testing procedures and confidence sets for eigenstructure with intrinsic sparsity. The third direction will characterize the fundamental limits of inferential accuracy in high dimensional eigenanalysis and how those limits differ from what computationally efficient algorithms can achieve. The research will bring in new perspectives from other disciplines, such as convex geometry, information theory, and theoretical computer science, to break new ground in high dimensional statistics. Collaborations will be sought with scientists working on topics related to the methodological and theoretical work.

Advances in science and technology have generated datasets of increasingly high dimensionality in many fields, such as medical imaging, climate studies, and bioinformatics. When processing data arising in these applications, eigenanalysis of certain matrix objects, such as the covariance matrix, plays a central role in summarizing and unveiling the underlying structure of the data, which often has intrinsic sparsity. This research project spans multiple directions, each aiming to advance the methodological development and theoretical understanding of high dimensional eigenanalysis from a different fundamental aspect. Upon completion, the project will lead to a comprehensive understanding of high dimensional eigenanalysis that will significantly enhance the ability to apply it in a wide range of scientific settings. The methodological development will be disseminated to the broader scientific community through user-friendly software packages that will serve as the basis for potential applications. The research output will also be integrated with educational activities such as course development and mentoring at both undergraduate and graduate levels.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Program Officer: Gabor Szekely
University of Pennsylvania
United States