The investigator studies the variable selection problem in nonparametric smoothing and regression models. In particular, a class of new regularization methods is developed for simultaneous variable selection and model fitting in smoothing spline ANOVA models. One such method is the "cosso", which applies a novel soft-thresholding-type operation to the functional components in a reproducing kernel Hilbert space. In Gaussian regression, the cosso selects the correct model structure with probability tending to one under certain mild conditions. To handle complex heterogeneous datasets with various types of responses, the investigator further extends the new methods to more complicated statistical models, such as generalized regression models, support vector machines, and proportional hazards regression models. Theoretical properties of the estimators, such as model consistency and the rate of convergence, are investigated. Efficient numerical algorithms and user-friendly software are developed for public use.
Variable selection helps to reduce the dimension of the model, to improve model accuracy, and to better understand the underlying mechanism that generates the data. This research is motivated by the lack of theoretical work in nonparametric variable selection and by the limits of existing approaches. The investigator establishes a unified framework for simultaneous variable selection and model estimation in smoothing spline ANOVA models, and contributes new theory to related variational methods. This work broadens the traditional understanding of nonparametric smoothing approaches and will eventually help to generate new methods in statistical inference. In practice, the large, high-dimensional datasets produced in modern sciences such as medicine and biology, often with tens or hundreds of variables, demand more sophisticated tools for dimension reduction and model estimation. The methodology developed in this work has already been applied successfully to some real problems, and it has the potential to make a significant impact in various fields.