The demands on statistical methodology have grown relentlessly as new technologies for data collection appear. Many of the resulting datasets are unusual by statistical standards: they are massive; they are highly nonlinear; they are contaminated; they contain data which are in fact functions; or the data come from a mechanism which is only partially known. The tasks of estimation, testing, functional testing, pattern discovery, feature extraction, visualization, and comparison require the statistician to look at each problem anew. Nonparametric methodology, which has been widely used in one and two dimensions, is also appropriate in higher dimensions. Particular emphasis will be given to multivariate regression and density estimation problems, and to closely related applications such as clustering, mixture estimation, pattern recognition, robust estimation, and dimension reduction. The statistician's view of the scientific method is a continuously improving process of model building, data collection, estimation, criticism, and refinement. However, many practicing statisticians are stymied by an inability to repair poorly fitting models. Of particular interest in this research are methods which provide critical diagnostic information as part of the model estimation task. A focus of this research is a relatively new minimum-distance, data-based parametric estimation algorithm, which has been investigated for its robustness properties. The algorithm can be applied to mixture models and spline fitting. An incomplete density model may be fitted, a highly unusual capability that will be explored fully in the context of regression, image processing, clustering, outlier detection, and density estimation. Other novel potential applications include adaptive wavelet thresholding, solution of mixture-of-regressions problems, and application to models which apply to only a subset of the data.
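
The abstract does not spell out the minimum-distance criterion, so the following is only a minimal sketch of one well-known estimator of this general type: the integrated squared error ("L2E") criterion, fitted here to a Gaussian model on contaminated data to illustrate the robustness property mentioned above. The function names, starting values, and simulated data are illustrative assumptions, not the proposal's implementation.

```python
# Minimal sketch (illustrative, not the proposal's algorithm): minimum-distance
# parametric estimation by integrated squared error (L2E) for a Gaussian model.
# Criterion: int f(x|theta)^2 dx - (2/n) * sum_i f(x_i|theta),
# where for N(mu, sigma^2) the first term equals 1 / (2 * sigma * sqrt(pi)).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def l2e_criterion(params, x):
    """L2E objective for a Gaussian model; sigma is parameterized on the log scale."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                      # keep sigma positive
    term1 = 1.0 / (2.0 * sigma * np.sqrt(np.pi))   # integral of the squared density
    term2 = -2.0 * norm.pdf(x, loc=mu, scale=sigma).mean()
    return term1 + term2

def l2e_normal_fit(x):
    """Minimize the L2E criterion, starting from the median and the MAD-based scale."""
    mad_scale = np.median(np.abs(x - np.median(x))) / 0.6745
    start = np.array([np.median(x), np.log(mad_scale)])
    res = minimize(l2e_criterion, start, args=(x,), method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 90% of the observations follow N(0, 1); 10% are gross outliers near 8.
    x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(8.0, 0.5, 100)])
    print("L2E fit (mu, sigma):", l2e_normal_fit(x))   # close to (0, 1)
    print("MLE fit (mu, sigma):", (x.mean(), x.std())) # pulled toward the outliers
```

In this sketch the L2E fit largely ignores the contaminating component, whereas the maximum likelihood fit is dragged toward it, which is the kind of robustness and diagnostic behavior the proposal seeks to exploit.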

Research in data analysis and statistical modeling provides intellectual challenges with deep applications in almost every field of the natural and social sciences and engineering. The field of nonparametric statistics has made a significant contribution to the success of science with algorithms that, though hidden, are critical even to the inner workings of cell phones. At a recent National Research Council workshop, numerous scientists identified critical statistical needs in their work with massive data sets: new dimension reduction algorithms, specialized visualization tools for exploring massive data, better clustering algorithms, and techniques for handling nonstationary data. Results from the proposed research directly address three of these four critical needs. This program represents a comprehensive and long-term attack on a host of important data analytic problems in multivariate estimation. Graduate training is a significant component of this project. The results will be of long-term theoretical interest and will provide short-term solutions to real-world problems.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0505584
Program Officer: Gabor J. Szekely
Budget Start: 2005-08-01
Budget End: 2009-07-31
Fiscal Year: 2005
Total Cost: $280,000
Name: Rice University
City: Houston
State: TX
Country: United States
Zip Code: 77005