The main research objectives of this proposal lie in high-dimensional statistics, including sparse regression and low-rank matrix estimation, problems that have recently attracted considerable attention. The investigator intends to develop new methodologies and novel applications, and to extend the scope of penalized empirical risk minimization, empirical processes, and exponential weights estimators. In particular, the investigator studies the minimax rates in the noisy matrix completion problem and intends to adapt successful techniques from matrix completion to covariance estimation, with the goal of determining the minimax rate for that problem under a low-rank assumption.

The theoretical results developed in this research project are also expected to have broader applications. They can be applied in many fields, including econometrics, marketing, data mining, quantum physics, cosmology, genomics, tomography, climatology, and any other field that requires efficient tools for exploring high-dimensional data sets. A question of crucial interest in all these applications is how to identify the set of active variables among a huge set of potential candidates. In genomics, for example, microarray chips record the expression levels of thousands of genes, and the goal is to find the few genes responsible for the synthesis of a particular molecule among the entire pool of tested genes. This difficult problem can be tackled efficiently with the techniques studied in this research project.

Project Report

The intellectual merit of the research lies in the new methodologies and novel applications it developed in the field of high-dimensional statistics. The research made significant contributions to the statistics literature on sparse regression, low-rank matrix estimation, and covariance matrix estimation, and it extended the scope of penalized empirical risk minimization, empirical processes, and exponential weights estimators. New theoretical results were established for these procedures in sparse regression and low-rank matrix estimation problems, and their efficiency was also demonstrated in practice through simulations and applications.

Broader Impacts: The research results can be applied in many fields, including econometrics, marketing, data mining, quantum physics, cosmology, genomics, tomography, climatology, and any other field that requires efficient tools for exploring high-dimensional data sets. The main educational goals were (1) to train PhD students in statistics to become capable researchers in related topics and (2) to continue developing new courses on high-dimensional statistics and low-rank matrix estimation for students in statistics, quantitative finance, economics, and engineering. The research was disseminated through the organization of a workshop, conference presentations, and publications. The project also made it possible to attend a number of conferences and workshops, including the International Conference Asymptotic Geometric Analysis II in St. Petersburg, Russia, and the 43rd Saint-Flour Probability Summer School in France.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1106644
Program Officer: Gabor J. Szekely
Budget Start: 2011-08-15
Budget End: 2014-07-31
Fiscal Year: 2011
Total Cost: $100,001
Name: Georgia Tech Research Corporation
City: Atlanta
State: GA
Country: United States
Zip Code: 30332