This research project aims to reconstruct high-dimensional sparse signals from a small number of measurements, possibly corrupted by noise. More precisely, four of the main objectives of the program are to: 1) weaken the conditions and strengthen the results of current methods, pushing the boundary of the field by developing new theoretical tools to analyze the current algorithms; 2) analyze the connections among different methods to gain a deeper understanding of the nature of the sparse signal recovery problem; 3) extend the research to the multichannel setup, which involves the simultaneous recovery of a set of signals; and 4) develop courses for both graduate and undergraduate students, build research projects for graduate students, and make undergraduate students aware of the possible limitations of classical methods and of suitable alternatives.

Due to advances in science and technology, scientists and engineers are now able to collect and process enormously large data sets of all kinds. Such data sets pose many statistical challenges not encountered in smaller-scale studies. One of the key problems in this area is the reconstruction of high-dimensional sparse signals, which is a fundamental problem in signal processing. This and other related problems have attracted much interest in a number of fields, including applied mathematics, electrical engineering, statistics, finance, and bioinformatics. The proposed research will benefit applications in these scientific areas, for instance the compression of audio, image, and video signals and the analysis of microarray data. It is also of critical importance in linear regression, signal modeling, and machine learning.

Project Report

With the development of science and technology, we are entering the era of high-dimensional data. Scientists are now able to generate, collect, and store huge amounts of data, of any structure, at a much lower cost than before. These high-dimensional data structures bring new challenges to the field of statistics. The work under this research project generated new ideas and techniques for dealing with high-dimensional data and shed light on new directions of statistical thinking about large-scale data problems. We have worked on several different data structures, from the high-dimensional linear model and nonparametric regression to large matrices. For each problem, we have proposed new methodologies or improved existing ones. New theories were developed to demonstrate the optimality or near-optimality of the proposed methods. Algorithms and numerical experiments were carefully designed to show the practical usefulness of these methods. The results of this project have been applied in finance, neuroscience, and electronic engineering, and have had a significant impact on these fields.

In particular, we have focused on three types of ideas under this research project. The first is the greedy, or stepwise, algorithm. This method allows us to select significant features or important variables from a large collection of data one by one. It is intuitive and easy to implement, and the significance of the selected features is naturally presented to researchers and scientists. The second is penalized algorithms. We have developed several new penalization methods for high-dimensional regression problems. We demonstrated that these methods have substantial advantages over the existing ones: they are much more robust, much easier to calibrate, and have a much lower computational cost. The third is tuning-free methods for large matrices. We proposed new methods for large matrix estimation that are tuning-free and self-calibrating, two properties that are highly desirable for applied scientists.

The work under this research project connected researchers from different areas such as economics, financial engineering, biostatistics, and computer science. It also opens opportunities for future interdisciplinary collaboration. This project also generated many research opportunities for both graduate and undergraduate students. Our work has been integrated into educational activities through the development of new courses and the modification of existing ones.
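
As an illustration of the one-by-one feature selection described above, the following is a minimal sketch of matching pursuit, a standard greedy scheme. It is not taken from the project and is not necessarily the algorithm the investigators studied. The function name `matching_pursuit`, its parameters, and the toy data are hypothetical, and the design columns are assumed to have unit norm. The toy design is deliberately orthonormal so that the outcome can be checked by hand; real high-dimensional problems have many more columns than measurements.

```python
def matching_pursuit(columns, y, max_steps=10, tol=1e-10):
    """Greedy sparse recovery sketch (assumes unit-norm columns).

    columns: list of design columns, each a list of floats
    y: list of measurements
    Returns (coef, selected), where selected lists indices in pick order.
    """
    coef = [0.0] * len(columns)
    residual = list(y)
    selected = []
    for _ in range(max_steps):
        # Correlation of every column with the current residual.
        corr = [sum(c * r for c, r in zip(col, residual)) for col in columns]
        # Greedy step: pick the column that best explains the residual.
        j = max(range(len(columns)), key=lambda k: abs(corr[k]))
        if abs(corr[j]) <= tol:
            break  # nothing left to explain
        coef[j] += corr[j]
        residual = [r - corr[j] * c for r, c in zip(residual, columns[j])]
        if j not in selected:
            selected.append(j)
    return coef, selected


# Toy data (hypothetical): four orthonormal columns and a signal that
# uses only columns 1 and 3, with coefficients 3 and -2.
columns = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]
y = [3 * a - 2 * b for a, b in zip(columns[1], columns[3])]

coef, selected = matching_pursuit(columns, y)
print(selected)  # indices in the order they were picked
print(coef)
```

The order in which indices are picked gives a natural ranking of the selected features, which is the sense in which their significance is "naturally presented" by a greedy method.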

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1005539
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2010-07-01
Budget End: 2013-06-30
Support Year:
Fiscal Year: 2010
Total Cost: $150,000
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139