The project aims to develop effective penalization methods for screening, dimension reduction, and variable selection in high-dimensional regression. The investigators focus mainly on multiple index models, because this type of model combines the strengths of linear and nonparametric regression while avoiding their drawbacks. A novel penalization approach is employed for model fitting, which regularizes both the parametric and nonparametric components of a multiple index model. A pilot study shows that this approach compares favorably with existing ones. When facing ultra-high dimensionality, the investigators use a forward variable screening procedure to reduce the dimension to a manageable size before applying the proposed penalization. The investigators plan to study the theoretical properties of this approach and to develop fast and efficient computing algorithms for its implementation. The proposed approach is further extended to applications involving categorical responses or random effects.
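
The screening step described above can be illustrated with a minimal sketch. It is not the investigators' procedure: it assumes an ordinary least-squares fit as a stand-in for the index model, and the function name `forward_screen`, the parameter `max_vars`, and the simulated data are all hypothetical. At each step it adds the predictor that most reduces the residual sum of squares, stopping once a manageable number of variables has been kept.

```python
import numpy as np


def forward_screen(X, y, max_vars):
    """Greedy forward screening (illustrative, linear-model stand-in).

    At each step, add the predictor whose inclusion gives the smallest
    residual sum of squares in a least-squares fit with an intercept.
    Returns the selected column indices in order of entry.
    """
    n, p = X.shape
    selected = []
    remaining = list(range(p))
    for _ in range(min(max_vars, p)):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


# Simulated example (assumed data): 500 predictors, only two are active.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 5] - 2.0 * X[:, 42] + 0.1 * rng.standard_normal(n)

kept = forward_screen(X, y, max_vars=10)
```

In this sketch, a penalized fit would then be run only on the columns in `kept`, so that the penalization step works with 10 candidate variables rather than 500.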

Advances in science and technology have led to an explosive growth of massive data across a variety of areas such as bioinformatics, climate research, and the internet. Traditional statistical methods for clustering, regression, and classification become ineffective when dealing with a large number of variables. Lately, a tremendous amount of research effort has been dedicated to the development of statistical methods, such as dimension reduction and variable selection, for analyzing this type of massive data. The investigators join the effort by proposing a novel penalization approach and developing efficient computing algorithms. The results from this project not only advance statistical research but also help other scientists and researchers better understand and analyze their massive data, and hence enhance their scientific discovery.

Project Report

This project focused on the development of novel statistical theory, methods, and algorithms for analyzing high-dimensional data, which are now common in scientific research, engineering, and business and government decision-making. Throughout this project, the PI has worked with his collaborators on penalization methods for screening, variable selection, and dimension reduction in different regression models. The PI has proposed efficient algorithms for variable selection in generalized index models, variable selection in regression models with linear constraints, variable selection in Cox regression for survival data, and dimension reduction for regression with tensor-valued predictors, among others. The theoretical properties of the estimates were also obtained. The results have been summarized in papers for publication and presented at various professional conferences and meetings. This theory and these algorithms contribute to the literature on variable selection and dimension reduction in regression. They provide insight into the modeling and application of semi-parametric regression models, and broaden the scope of the existing theory and algorithms. The proposed algorithms have been implemented as computer software, which is freely accessible on the PI’s website. They provide novel and efficient tools for scientists and investigators in other disciplines to analyze the data they have collected, which indirectly supports the advance of science and technology. Some results are also included as course materials for teaching, which help students learn how to apply these methods in high-dimensional data analysis.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1107029
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2011-06-15
Budget End: 2014-05-31
Support Year:
Fiscal Year: 2011
Total Cost: $100,000
Indirect Cost:
Name: Auburn University
Department:
Type:
DUNS #:
City: Auburn
State: AL
Country: United States
Zip Code: 36849