With the ever-growing amount of data in many application areas, effective methods for detecting factors that influence the value of a response variable are in high demand. It is increasingly important to develop methods for detecting variables that exert significant nonlinear effects on the response. Inspired by the sliced inverse regression method developed in the early 1990s, the PI proposes a general framework for developing effective variable selection strategies for high-dimensional nonlinear systems. The PI will further study the theoretical properties of these variable selection algorithms. The proposed investigation will provide a theoretical understanding of the limitations of existing dimension-reduction techniques when the dimensionality grows with the sample size.

With the ever-growing amount of data in many application areas, effective methods for detecting factors that may influence the value of a target quantity of interest (the response variable) are in high demand. This problem, termed "variable (or feature) selection" in regression modeling and statistical learning, is a long-standing problem in statistics and machine learning. The PI focuses here on detecting factors that may exert nonlinear and/or interactive effects on the response variable. Recent studies from the PI's group reveal that sliced inverse regression (SIR) and inverse modeling strategies provide a powerful framework for developing effective variable selection strategies for high-dimensional nonlinear systems. The PI aims to develop more robust and effective tools for detecting such complex relationships and to study the theoretical properties of SIR-based algorithms. The proposed methods will also be applicable to robust variable selection in classification problems. The proposed theoretical investigations will provide (a) a theoretical understanding of the limitations of existing dimension-reduction techniques when the dimensionality grows with the sample size; (b) guidance on constructing the sparsity conditions needed to guarantee consistent variable selection in ultra-high-dimensional nonlinear problems; (c) the optimal convergence rate that the best possible learning algorithm can achieve in such settings; and (d) theoretical justification of whether the proposed algorithms achieve, or come close to, this optimality.
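For readers unfamiliar with SIR, the following is a minimal sketch of the classical sliced inverse regression estimator from the early 1990s. It is not the PI's proposed variable selection procedure. It standardizes the predictors, slices the observations by sorted response values, and takes the leading eigenvectors of the weighted covariance of the slice means. The function name `sir_directions` and its parameters `n_slices` and `n_directions` are hypothetical. The sketch assumes the sample size is much larger than the number of predictors, so that the sample covariance is invertible. That assumption breaks down in exactly the growing-dimension regime mentioned in item (a).

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Hypothetical sketch of classical SIR; assumes n >> p so cov(X) is invertible."""
    n, p = X.shape
    # Center X and whiten it: Z = (X - mean) @ Sigma^{-1/2}
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Sort observations by y and split them into roughly equal-size slices
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the within-slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    B = inv_sqrt @ top
    # Normalize each estimated direction to unit length
    return B / np.linalg.norm(B, axis=0)
```

Usage: with `X` of shape `(n, p)` and `y` of shape `(n,)`, `sir_directions(X, y)` returns a `(p, 1)` array. Variables with large absolute loadings on the estimated direction are natural candidates for selection. That reading is only an illustrative heuristic, not a consistent selection rule in high dimensions.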

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1613035
Program Officer: Gabor Szekely
Budget Start: 2016-08-01
Budget End: 2020-07-31
Fiscal Year: 2016
Total Cost: $200,000
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138