Historically, statistics has dealt with extracting as much information as possible from a small data set. However, much of modern statistical research focuses on data sets that have enormous numbers of predictors. This shift is a direct result of recent technological advances that have affected various fields of research, such as image processing, computational biology, and finance. This proposal addresses the important problem of fitting nonlinear regression models in high-dimensional settings, where the number of predictors may be much larger than the number of observations. Unlike its linear and generalized linear counterparts, high-dimensional nonlinear regression is a very young research area that requires systematic and extensive development. Due to the curse of dimensionality, most of the work in this area has been conducted under the assumption that the regression function has a simple additive structure. The investigator proposes novel methodology for fitting index-type regression models in high dimensions. The new methods cover models that are either complementary to or strictly more general than the additive models studied before. For each of the methods, the proposal presents a computationally efficient fitting algorithm and lays out a plan for establishing theoretical results.

The proposed research is expected to have a broad impact on the practice and education of statistics and related fields. Disciplines such as computational biology, finance, marketing, and machine learning have a strong interest in the type of methodology targeted in this proposal. The investigator plans to systematically implement the proposed methods in free software packages and make them readily available to researchers in the aforementioned fields. The proposed research will also contribute to the growth and development of the new USC Statistics Ph.D. program. Several students in this young program will be involved in methodology research, algorithm development, and theoretical investigation. They will gain hands-on experience and guidance in the important field of high-dimensional statistical inference.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1209057
Program Officer: Gabor J. Szekely
Budget Start: 2012-06-01
Budget End: 2015-05-31
Fiscal Year: 2012
Total Cost: $129,585

Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089