High dimension and complexity are now standard in applications of statistical data analysis, and data are now typically multivariate. Advances in computational resources make it feasible to implement quite sophisticated methods, which supports the development of powerful approaches that systematically take into account the special geometric features intrinsic to multivariate data sets. The setting of nonparametric multivariate methods is especially important and, of course, presents conceptual challenges. In particular, multivariate depth and quantile functions provide a major approach that has become well established in recent years but is still under active development. In this project, the PI addresses significant open issues and directions in both the foundations and the applications of this approach. The applications inspire the foundations, and the foundations yield tools for the applications. This project advances core statistical science by developing useful extended foundations and underpinnings for multivariate depth and quantile functions. The results have even wider application and broadly enhance the role of statistical science in applications, permitting new kinds of problems to be treated more meaningfully and more powerfully. Central themes of the project are: I. Transformations to Produce Equivariance and Invariance of Statistical Procedures, II. Spatial Depth-Based Trimming to Produce Robustness without Undue Computational Burden, and III. Development and Exploitation of a New Synergy Between Depth Function Methods and Level Set Methods for Treating Contours. Topic I provides tools for modifying statistical procedures so that they acquire certain desired equivariance or invariance properties that may not otherwise hold.
Topic II investigates recent solutions to two related but different problems: (i) robustification of the spatial quantile and outlyingness functions, and (ii) construction of scatter estimators that are simultaneously computationally easy, robust, and affine equivariant. Topic III investigates a promising but hitherto unexplored synergy between depth function methods on the one hand and level set methods on the other. Besides these major thrusts, the project also addresses the formulation of multivariate L-statistics, systematic exploration of a depth-outlyingness-quantile-rank paradigm, studies on integrated data depth, and studies on depth methods in functional data analysis. As a whole, the project is intended to have transformative impacts on modern approaches to data handling through statistical science.
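To make the spatial outlyingness and depth functions named under Topic II concrete, here is a minimal Python sketch of their standard sample versions: the outlyingness of a point x is the norm of the average unit vector pointing from the data points to x, and the depth is one minus that value. The function names are illustrative choices, not taken from the project, and the sketch shows only the basic, non-robustified functions, not the trimmed versions the project investigates.

```python
import numpy as np

def spatial_outlyingness(x, data):
    """Sample spatial outlyingness of x: norm of the mean unit vector
    pointing from each data point to x (a value in [0, 1])."""
    diffs = np.asarray(x, dtype=float) - np.asarray(data, dtype=float)
    norms = np.linalg.norm(diffs, axis=1)
    units = np.zeros_like(diffs)
    nonzero = norms > 0                    # the sign of the zero vector is taken as 0
    units[nonzero] = diffs[nonzero] / norms[nonzero, None]
    return float(np.linalg.norm(units.mean(axis=0)))

def spatial_depth(x, data):
    """Sample spatial depth: one minus the spatial outlyingness."""
    return 1.0 - spatial_outlyingness(x, data)

# Hypothetical toy data: four points placed symmetrically about the origin.
data = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

center_depth = spatial_depth([0.0, 0.0], data)         # the unit vectors cancel
far_outlyingness = spatial_outlyingness([100.0, 0.0], data)
print(center_depth, far_outlyingness)
```

In this toy configuration the center attains the maximal depth because the unit vectors cancel, while a distant point has outlyingness close to 1 because the unit vectors nearly align.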

Statistical data analysis and modeling now accommodate pressing new arenas of application involving multivariate data, in which many variables are considered together. All areas of science, engineering, government, and industry now routinely involve multivariate data, typically complex in structure and high in number of variables. The three key technical thrusts of this project address important and timely concerns arising in dealing with multivariate data. For example, in dealing with outliers in multivariate data, the classification of which points are outliers should not change merely because of a simple change of coordinate system, such as a conversion from metric to British units. Also, for example, the contours that delineate the middle 50% or 75% or 90% of a data set should be determined efficiently and accurately without undue interference from extreme outlying data points not central to the data. Or, for example, when striking geometric features or patterns are discovered, as in data mining, it is necessary to determine whether such findings are genuine and inherently meaningful features or simply artifacts of the particular coordinate system that has been adopted, which should be ignored. Another key effort of this project is to develop a new framework that brings together two different but related methodologies in multivariate analysis that have recently been developed independently (level sets and depth functions) and enables them to be applied together in a coordinated manner. This strengthens the understanding and the roles of these methodologies in their particular domains of application. The project also contributes to education and development of human resources in statistical science by involving graduate students and undergraduate students. Cross-stimulation among project participants, which may include visiting researchers, is achieved through regular meetings and a team approach.
The participation of underrepresented groups and junior faculty and professionals is encouraged and fostered by the PI. The results and findings of the project are disseminated by the PI through high-profile conference presentations, journal publications, website postings, and introduction into the graduate curriculum.
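The concern about outlier classification changing with the coordinate system can be illustrated with a small Python sketch. It is not the project's method: as a simple stand-in it contrasts plain Euclidean distance from the mean, which depends on the units chosen, with Mahalanobis distance based on the sample mean and covariance, which is affine invariant. The data values and function names are hypothetical, and the sample covariance is itself not robust, which is part of what motivates robust, affine equivariant scatter estimators.

```python
import numpy as np

def euclidean_outlyingness(data):
    """Euclidean distance of each point from the coordinatewise mean.
    Depends on the measurement units of each coordinate."""
    return np.linalg.norm(data - data.mean(axis=0), axis=1)

def mahalanobis_outlyingness(data):
    """Mahalanobis distance of each point from the mean, using the sample
    covariance. Unchanged by any invertible affine change of coordinates."""
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    solved = np.linalg.solve(cov, centered.T).T   # rows are cov^{-1} (x_i - mean)
    return np.sqrt(np.einsum("ij,ij->i", centered, solved))

# Hypothetical data: column 0 is height in metres, column 1 is weight in kilograms.
metric = np.array([
    [1.60, 60.0], [1.70, 70.0], [1.80, 80.0], [1.65, 75.0],
    [1.75, 65.0], [2.30, 70.0], [1.70, 90.0],
])
# Same observations with heights converted from metres to inches.
converted = metric * np.array([39.3701, 1.0])

# Index of the point flagged as most outlying under each measure.
print(np.argmax(euclidean_outlyingness(metric)),
      np.argmax(euclidean_outlyingness(converted)))
print(np.argmax(mahalanobis_outlyingness(metric)),
      np.argmax(mahalanobis_outlyingness(converted)))
```

With these made-up values, the point flagged as most outlying by Euclidean distance switches from the heaviest observation to the tallest one after the unit conversion, whereas the Mahalanobis distances, and hence the flagged point, are the same in both coordinate systems.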

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1106691
Program Officer: Gabor J. Szekely
Budget Start: 2011-09-01
Budget End: 2014-08-31
Fiscal Year: 2011
Total Cost: $280,506
Name: University of Texas at Dallas
City: Richardson
State: TX
Country: United States
Zip Code: 75080