The investigator studies statistical machine learning with sparse regularization in the setting of high-dimensional statistical estimation. A number of research directions will be explored, including improved performance bounds for sparse regularization, new sparse learning formulations, and statistical theory for several important computational algorithms.

In the information age, more and more data become available electronically, and these data need to be analyzed automatically by computers in order to filter out the most important information. Statistical machine learning is the main technical tool for analyzing electronic data. Many modern applications involve data of very high dimensionality that cannot be handled by traditional algorithms. Sparse regularization is an important new statistical machine learning technique that addresses this issue by identifying the most significant patterns in a vast amount of available information. This research develops new sparse regularization algorithms that will significantly enhance the ability of modern computer systems to find critical information in electronic data.

Project Report

Many real-world applications involve data with very high dimensionality. To extract value from these data, we have to employ machine learning algorithms that use complex statistical models with many parameters. Such problems are difficult because the number of parameters is much larger than the number of data points. A modern solution to this difficulty is sparse regularization, in which the number of nonzero parameters is kept much smaller than the data dimensionality. This project aimed to gain a better understanding of sparse regularization by studying sparse learning algorithms in the setting of high-dimensional statistical estimation. A number of research directions have been explored, including improved theoretical understanding of sparse regularization, new sparse learning formulations, and statistical theory for several important computational algorithms. We have published more than ten scientific papers and disseminated our results via academic conferences, web sites, courses, and lectures. Software based on this research has been distributed publicly and used by many people to analyze real-world data. Our methods lead to more accurate prediction models and shorter computation times in a wide range of data analysis applications. These techniques have been adopted in industry and have made a significant impact on modern society, where big data, and thus methods for analyzing these data, become more and more important.
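To illustrate the idea of sparse regularization in the regime where the number of parameters exceeds the number of data points, the following is a minimal sketch using the Lasso (L1-penalized least squares), a standard sparse estimation method. It is an illustrative assumption, not the specific formulations or algorithms developed in this project; the data, dimensions, and regularization strength are hypothetical.

```python
# Minimal sketch (illustrative only, not this project's algorithms):
# sparse regularization with the Lasso when the dimension p is much
# larger than the sample size n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 1000, 5            # n samples, p features (p >> n), k true nonzeros

# Ground-truth parameter vector with only k nonzero entries.
beta_true = np.zeros(p)
beta_true[:k] = rng.normal(loc=3.0, size=k)

# Simulated high-dimensional regression data.
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

# The L1 penalty drives most estimated coefficients exactly to zero,
# recovering a sparse model even though p is much larger than n.
model = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(model.coef_), "out of", p)
```

In this sketch the estimated model typically retains only a handful of nonzero coefficients, which is the sense in which sparse regularization identifies the most significant patterns from a vast number of candidate features.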

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007527
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2010-07-01
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $250,000
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854