This proposal aims to develop new theory and methodology for sufficient dimension reduction. Through a series of well-defined research problems, the investigator develops dimension reduction methods for many important applications. The research proceeds in three main directions. First, the investigator proposes a general approach based on estimating equations to facilitate sufficient dimension reduction. Under this effective and flexible estimating-equations framework, the investigator relaxes a restrictive distributional assumption required by classical dimension reduction techniques and proposes a novel sufficient dimension reduction methodology that handles challenging problems such as missing observations and heteroscedastic modeling. Second, outliers are commonly observed in high-dimensional data, yet have only occasionally been studied in the sufficient dimension reduction context. The investigator develops new sufficient dimension reduction procedures that are robust to outliers in the observations. Third, most variable selection techniques in the current literature are model-based. The investigator proposes to extend sufficient dimension reduction methodology so that it can be applied to variable selection and to testing the significance of subsets of predictors in a model-free fashion. The success of this project will not only provide effective practical tools for high-dimensional data analysis but also represent an advance in the theory and methodology of semiparametric inference.
The scale and complexity of data sets have increased drastically with the development of modern technology. High-dimensional data involving a large number of variables are now routinely generated and collected in areas such as environmental studies, human health and medical research, homeland security, and government and business administration. High-dimensional data pose many challenges for statisticians and provide considerable momentum in the statistics community to develop new theories and methodologies. Sufficient dimension reduction methodology effectively transforms a high-dimensional problem into a low-dimensional one, and thus enables many existing statistical methods that would otherwise be hindered by the curse of dimensionality. The investigator proposes a new paradigm that synthesizes and broadens the theory and methodology of sufficient dimension reduction. The results from this project can be widely applied to regression and classification problems in areas that involve a large number of variables, such as econometrics, finance, and bioinformatics. The investigator also plans to provide free software packages to academic and industrial users of the proposed procedures. Support for the proposed research will help the investigator train students at Temple University, where few senior faculty members are available to supervise a large number of graduate students, including many minority and female students.
Sufficient dimension reduction is a supervised dimension reduction methodology in which one aims to find lower-dimensional projections of the multivariate predictors such that the regression or classification information between the univariate response and the predictors is fully retained by these projections. With the ubiquity of high-dimensional data collected across subject fields, the methodological development of sufficient dimension reduction and its applications have garnered interest in mathematical statistics, bioinformatics, marketing research, data mining, and machine learning. The main outcomes of the project are theoretical developments in sufficient dimension reduction in the following areas: 1. Connecting sufficient dimension reduction with estimating equations. This connection allows us to handle the challenging case where the predictor dimension p exceeds the sample size n, and helps relax several stringent assumptions in the existing sufficient dimension reduction literature. 2. Performing robust sufficient dimension reduction. Complex data structures such as missing values, contaminated data, and heteroscedasticity are common in high-dimensional data, and the new findings accommodate these challenging structures while performing robust dimension reduction. 3. Achieving model-free variable selection through sufficient dimension reduction. The new findings greatly extend the scope of the current variable selection and feature screening literature, where most methods are model-based and rely on specific parametric assumptions. Initial findings on interaction detection in the model-free setting are also promising. The findings from the project have been published in leading statistical journals, and the principal investigator has presented them at several international conferences as well as in department colloquia at research institutions both in the US and overseas.
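To make the projection-based definition above concrete, the following is a minimal sketch of sliced inverse regression (SIR), one of the classical sufficient dimension reduction techniques that the proposed work builds on. This is an illustration of the general idea only, not the investigator's estimating-equation methodology; the function name and slicing scheme are choices made for this sketch.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression: estimate directions b such that y depends
    on X mainly through the projection X @ b.  Illustrative sketch only."""
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the observations by the ordered response and average Z per slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original X scale
    _, vecs = np.linalg.eigh(M)
    dirs = inv_sqrt @ vecs[:, -n_dirs:][:, ::-1]
    return dirs / np.linalg.norm(dirs, axis=0)
```

For example, if y depends on a five-dimensional X only through a single linear combination, the leading estimated direction should align closely with the true one, even though no parametric model for y is ever specified; this model-free character is what the abstract refers to.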