The investigator studies the variable selection problem in nonparametric smoothing and regression models. In particular, a class of new regularization methods is developed for simultaneous variable selection and model fitting in smoothing spline ANOVA models. One such method is the "cosso", which applies a novel soft-thresholding-type operation to the functional components in a reproducing kernel Hilbert space. In Gaussian regression, the cosso selects the correct model structure with probability tending to one under certain mild conditions. To handle complex heterogeneous datasets with various types of responses, the investigator further extends the new methods to more complicated statistical models, such as generalized regression models, support vector machines, and proportional hazards regression models. Theoretical properties of the estimators, such as model consistency and the rate of convergence, are investigated. Efficient numerical algorithms and user-friendly software are developed for public use.
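To illustrate what a soft-thresholding-type operation does in general, the following minimal sketch applies the standard scalar soft-thresholding operator to a vector of hypothetical component norms. It is not the cosso algorithm itself: the function name `soft_threshold`, the example norms, and the threshold `lam` are all assumptions chosen for illustration. The point is only that components whose magnitude falls below the threshold are set exactly to zero (and so dropped from the model), while the rest are shrunk.

```python
import numpy as np

def soft_threshold(z, lam):
    """Generic soft-thresholding: shrink each entry toward zero by lam,
    and set entries with |z| <= lam exactly to zero."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical norms of four functional components (illustrative values only).
component_norms = np.array([2.0, 0.3, 1.1, 0.05])

# Apply the threshold; small components are zeroed, large ones are shrunk.
shrunk = soft_threshold(component_norms, lam=0.5)

# Indices of the components that survive, i.e. the "selected" variables.
selected = np.flatnonzero(shrunk > 0)
```

With these illustrative values, the first and third components survive and the other two are removed, which is the sense in which thresholding performs selection and shrinkage in a single step.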

Variable selection reduces the dimension of the model-building problem, improves model accuracy, and clarifies the underlying mechanism that generates the data. This research is motivated by the lack of theoretical work on nonparametric variable selection and by the limitations of existing approaches. The investigator establishes a unified framework for simultaneous variable selection and model estimation in smoothing spline ANOVA models, and contributes new theory to related variational methods. This work broadens the traditional understanding of nonparametric smoothing approaches, and will eventually help to generate new methods in statistical inference. In practice, the large, high-dimensional datasets produced in modern sciences such as medicine and biology, often with tens or hundreds of variables, demand more sophisticated tools for dimension reduction and model estimation. The methodology developed in this work has already been applied successfully to some real problems, and it has the potential to make a significant impact in various fields.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0405913
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2004-07-15
Budget End: 2008-06-30
Support Year:
Fiscal Year: 2004
Total Cost: $124,936
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695