The proposed research aims to develop a general formulation, and the related methods, for sufficient dimension reduction (SDR) when a specific functional (or parameter) of the conditional distribution is of interest. The past two decades have seen vigorous development of SDR methods and have accrued a striking record of their successful applications. However, to a large extent these methods treat the conditional distribution itself as the object of interest, without discriminating between parameters of interest and nuisance parameters. While there are methods that target statistical functionals, they are specific to the parameter under consideration and as such are difficult to apply to other parameters. The investigators propose a new paradigm for SDR that focuses on a functional of the conditional distribution, which can be any member of a very wide class that covers most applications. In addition, the investigators propose to develop a coherent collection of associated techniques for estimation, computation, and asymptotic inference.

High-throughput technologies that produce massive amounts of complex, high-dimensional data are increasingly prevalent in such diverse areas as business, government administration, environmental studies, machine learning, and bioinformatics. They provide considerable momentum in the statistics community to develop new theories and methodologies, and to reformulate existing ones, that are capable of discovering critical evidence in high-dimensional and massive data. SDR is a recent area of statistical research that arose amidst, and has been propelled by, these new demands. The investigators propose to reformulate the theories and methodologies of SDR so that they can be specifically tailored to the target to be estimated. This new paradigm not only synthesizes, broadens, and deepens the recent advances in SDR, but also brings the understanding of SDR on a par with classical statistical inference theory, by following the tradition of sufficiency, efficiency, information, parameters of interest, and nuisance parameters: the key ideas that have helped propel classical inference to its maturity.

Project Report

Recent developments in scientific research and computing technology, particularly those related to machine learning, bioinformatics, pattern recognition, and market analysis, often create large quantities of high-dimensional data. They raise new questions and provide fresh momentum for contemporary statistical research. One new feature of these data is that they are often collected without specific designs, or with designs not sufficiently rigorous to be accommodated by classical statistical theories and methodologies. In other words, with increased volume and dimension come increased redundancy and irrelevancy. As a result, how to deal with redundancy and irrelevancy in vast amounts of data by appropriately reducing the data, and thereby single out useful information and connections, has become one of the focal points of contemporary statistical research. Dimension reduction and sparse variable selection are but two fast-growing areas that reflect these new challenges. The main goals of the proposed research are to investigate and develop nonparametric methods of dimension reduction, and in particular a systematic method that can target specific aspects of the underlying distribution, such as means, medians, quantiles, and variances.

We have made the following developments towards, or related to, the proposed research:

1. We have investigated the local nature of dimension reduction as well as the rules it follows when we aggregate the local results (with Xiangrong Yin).
2. We have introduced a class of ensemble estimators that incorporate the dimension reduction estimators for conditional mean functions to retrieve information about the whole conditional distribution (with Xiangrong Yin).
3. We have introduced a groupwise dimension reduction technique to incorporate the domain information in the predictor (with Lexin Li and Lixing Zhu).
4. We have investigated and quantified the predictive potential of kernel principal components (with Andreas Artemiou).
5. We have introduced dimension reduction methods for situations where the predictors are not vectors but matrices or multi-dimensional arrays. Such predictors are increasingly common: for example, EEG data, an image, or a video clip are all of this type (with Minkyung Kim and Naomi Altman).
6. We have developed a dimension reduction method that does not make strong assumptions on the distribution of the predictors (with Yuexiao Dong).
7. We have modified the powerful method of support vector machines to perform sufficient dimension reduction in a regression setting (with Andreas Artemiou and Lexin Li).
8. We have introduced a reproducing kernel Hilbert space method for sparse estimation of conditional graphical models, and applied the method to gene network analysis (with Hyonho Chun and Hongyu Zhao).

Most of these works have appeared in, or are under review at, leading statistical journals. In particular, Project 1 is in its second revision for the Annals of Statistics; Projects 3 and 8 have appeared in, or have been accepted by, the Journal of the American Statistical Association; Projects 5 and 7 have appeared in, or have been accepted by, the Annals of Statistics; Project 6 has appeared in Biometrika. Project 1 is near completion.
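To give a concrete sense of what sufficient dimension reduction does, below is a minimal sketch of sliced inverse regression (Li, 1991), the canonical SDR estimator; it is an illustration of the general idea, not of the specific methods developed in this project. The function and variable names, the slicing scheme, and the toy model are all illustrative choices, assuming only standard numpy.

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Sliced inverse regression: estimate directions b so that y depends on
    X only through X @ b, via an eigendecomposition of the covariance of the
    slice-wise means of the standardized predictors."""
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Partition the data into slices along the order statistics of y
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of slice means: M = sum_h p_h m_h m_h^T
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)                 # eigenvalues in ascending order
    B = inv_sqrt @ v[:, ::-1][:, :n_dirs]    # take the top n_dirs directions
    return B / np.linalg.norm(B, axis=0)     # normalize each column

# Toy check: y depends on the 4-dimensional X only through x1 + x2,
# so the estimated direction should align with (1, 1, 0, 0) up to sign.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=2000)
b = sir_directions(X, y).ravel()
```

SIR targets the whole conditional distribution through the inverse mean E(X | y); estimators tailored to a specific functional, such as a conditional quantile or variance, are the subject of the proposed research.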

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0806058
Program Officer: Gabor J. Szekely
Budget Start: 2008-07-01
Budget End: 2011-06-30
Fiscal Year: 2008
Total Cost: $47,039
Name: Pennsylvania State University
City: University Park
State: PA
Country: United States
Zip Code: 16802