The proposed research aims to develop a general formulation and the related methods for sufficient dimension reduction (SDR) where a specific functional (or parameter) of the conditional distribution is of interest. The past two decades have seen vigorous development of SDR methods, which have accrued a striking record of successful applications. However, to a large extent these methods treat the conditional distribution as the object of interest, without discriminating between parameters of interest and nuisance parameters. While there are methods that target statistical functionals, they are specific to the parameter under consideration and as such are difficult to apply to other parameters. The investigators propose a new paradigm for SDR that focuses on a functional of the conditional distribution, which can be any member of a very wide class that covers most applications. In addition, the investigators propose to develop a coherent collection of associated techniques for estimation, computation, and asymptotic inference.

High-throughput technologies that produce massive amounts of complex and high-dimensional data are increasingly prevalent in such diverse areas as business, government administration, environmental studies, machine learning, and bioinformatics. These provide considerable momentum for the statistics community to develop new theories and methodologies, and to reformulate existing ones, that are capable of discovering critical evidence from high-dimensional and massive data. SDR is a recent area of statistical research that arose amidst, and has been propelled by, these new demands. The investigators propose to reformulate the theories and methodologies of SDR so that they can be specifically tailored to the target to be estimated. This new paradigm not only synthesizes, broadens, and deepens the recent advances in SDR, but also brings the understanding of SDR on a par with classical statistical inference theory, by following the tradition of sufficiency, efficiency, information, parameters of interest, and nuisance parameters, which are the key ideas that have helped to propel classical inference to its maturity.

Project Report

Modern technology in scientific research and computing, particularly in areas related to machine learning, bioinformatics, pattern recognition, and finance, often creates large quantities of high-dimensional data. These data raise new questions and provide fresh momentum for contemporary statistical research. That is, with increased volumes and dimensions, data sets often come with increased redundancy and irrelevancy. As a result, how to deal with redundancy and irrelevancy in vast amounts of data by appropriately reducing the data, and thereby to single out useful and informative variables, has become one of the focal points of contemporary statistical research. Dimension reduction and variable selection are the two fast-growing areas that reflect these new challenges. The main goals of the proposed research are to investigate and develop nonparametric methods of dimension reduction and variable selection, and in particular a systematic method that can target specific aspects of the underlying distribution, such as means, medians, and quantiles.

We have made the following developments towards, or related to, the proposed research:

1. We have investigated the local nature of dimension reduction as well as the rules it follows when we aggregate the local results (with Bing Li).
2. We have introduced a class of ensemble estimators that incorporate the dimension reduction estimators for conditional mean functions to retrieve the information about the whole conditional distribution (with Bing Li).
3. We have developed variable selection approaches based on dimension reduction, using the forward approach, including with two types of variables, and the inverse approach (with Qin Wang).
4. We have introduced variable selection with dimension reduction for correlated and large-p-small-n data, as well as for longitudinal data (with Lexin Li; Li-Ping Zhu and Lixing Zhu).
5. We have initiated research on dimension reduction in time series data and provided a foundation for such research (with TN Sriram and JH Park).
6. We have introduced more general notions of association and dimension reduction for multivariate response variables (with TN Sriram; TN Sriram and Ross Iaci; TN Sriram, Ross Iaci and Chris P. Klingenberg).
7. We have introduced information theory into dimension reduction (with Bing Li and Dennis Cook).
8. We have applied dimension reduction methods and local approaches to experimental design data and stock market data (with Lynne Seymour; Chongshan Zhang).

Most of these works have appeared in, or been accepted by, leading statistical journals. In particular, the work in item 1 is near completion; the work in item 2 has appeared in the Annals of Statistics; the works in item 3 have appeared in Computational Statistics and Data Analysis, and Statistics and Probability Letters; the works in item 4 have appeared in Biometrics, Computational Statistics and Data Analysis, and Journal of Computational and Graphical Statistics; the works in item 5 have appeared in Journal of Computational and Graphical Statistics, and Statistica Sinica; the works in item 6 have appeared in Statistica Sinica, Biometrics, and Journal of the American Statistical Association; the work in item 7 has appeared in the Journal of Multivariate Analysis; the works in item 8 have appeared in Journal of Quality Technology and Quantitative Management, and have been accepted by Quantitative Finance.

The outcomes of this project have a strong impact on the modeling of modern data and on advanced research in data analysis in general. The project also supports the education and training of new researchers: during the period of this grant, the investigator has advised two PhD graduates, who are now working for the federal government and a research university, and is currently advising five PhD candidates. The investigator has also developed a new graduate course on dimension reduction, which will be further adapted into materials for users who work with large data sets.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0806120
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2008-07-01
Budget End: 2012-06-30
Support Year:
Fiscal Year: 2008
Total Cost: $124,197
Indirect Cost:
Name: University of Georgia
Department:
Type:
DUNS #:
City: Athens
State: GA
Country: United States
Zip Code: 30602