Today's real-life experiments generate massive multivariate datasets, and statistical analysis tools that capture their multivariate features efficiently and accurately are needed. Classical statistical analysis requires a preliminary assumption about the underlying probability distribution of the data, and this assumption affects the results of the analysis. Increasingly, statisticians advocate the geometric notion of data depth (DD) for multivariate data analysis, because it requires no prior assumption about the probability distribution of the data and is robust to outliers. Data-depth-based analysis methods exist, but most of them are not yet efficient enough to handle large datasets. Computational Geometry (CG) focuses on the complexity analysis of geometric problems and the design of efficient algorithmic solutions. This project applies CG techniques to develop more efficient tools for DD analysis. Potential applications include aviation safety analysis, bioinformatics, clinical data mining, and statistical process control.
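
As a concrete illustration of data depth, the half-space (Tukey) depth of a point q with respect to a planar sample is the smallest number of sample points in any closed half-plane that contains q. Points near the center of the data are deep, while remote points have depth close to zero no matter how far away they lie. The Python sketch below is a brute-force O(n^3) computation that assumes integer coordinates, so every sign test is exact; the function name `halfspace_depth_2d` is hypothetical, and this is not one of the project's algorithms.

```python
from typing import Sequence, Set, Tuple

Point = Tuple[int, int]


def halfspace_depth_2d(q: Point, points: Sequence[Point]) -> int:
    """Tukey (half-space) depth of q with respect to a planar sample.

    Brute-force O(n^3) sketch: the depth is the minimum number of sample
    points in a closed half-plane whose boundary line passes through q.
    Integer coordinates keep every sign test exact.
    """
    qx, qy = q
    vecs = [(x - qx, y - qy) for x, y in points]
    at_q = sum(1 for v in vecs if v == (0, 0))  # lie in every such half-plane
    vecs = [v for v in vecs if v != (0, 0)]
    if not vecs:
        return at_q

    # Inward normals u for which some sample point lies on the boundary
    # (u . v == 0); the count can only change when u crosses one of these.
    crit = [(-vy, vx) for vx, vy in vecs] + [(vy, -vx) for vx, vy in vecs]
    # Sums of two critical normals hit the interior of every open arc between
    # neighbouring critical normals; +/- each data vector covers the case in
    # which all sample points are collinear with q.
    cands: Set[Point] = {(a[0] + b[0], a[1] + b[1]) for a in crit for b in crit}
    cands |= set(vecs) | {(-vx, -vy) for vx, vy in vecs}

    best = len(vecs)
    for ux, uy in cands:
        dots = [ux * vx + uy * vy for vx, vy in vecs]
        if 0 in dots:  # skip normals with a point on the boundary (and u == 0)
            continue
        best = min(best, sum(1 for d in dots if d > 0))
    return at_q + best


square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(halfspace_depth_2d((1, 1), square))    # center of the square: 2
print(halfspace_depth_2d((10, 10), square))  # far outside the sample: 0
```

Restricting the search to half-planes whose boundary passes through q loses nothing: translating a half-plane that contains q until its boundary reaches q can only remove sample points from it.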

This research addresses the underlying CG issues in developing efficient, practical algorithms for DD and undertakes the following major tasks: resolve computational problems related to half-space-depth and simplicial-depth contours; evaluate the applicability of depth contours to quantifying and visualizing the multivariate features of datasets; extend two-dimensional algorithms to higher dimensions and assess approximation algorithms for high dimensions; and explore new approaches to outlier detection and assess their validity. The intellectual merit derives from the dependence of practical DD implementations on solutions to complex geometric questions, from applied scientists' demand for new methods of statistical analysis and visualization, and from the recorded success of comparable prior investigations. The broader impact comes from practical algorithms that let scientists apply DD to larger real-world datasets and extract new insights, new visualization tools for enhanced understanding, refinements of statisticians' DD formulations based on new computational insights, and the training of diverse pre-college, college, and graduate students.
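
Simplicial depth, the second notion named above, measures how central a point is by the fraction of triangles with vertices in the sample that contain it; depth contours are level sets of such a depth function. The Python sketch below is a brute-force O(n^3) illustration assuming integer coordinates. The helper names (`orient`, `in_closed_triangle`, `simplicial_depth_2d`, `deepest_point`) are hypothetical, taking the deepest sample point as a stand-in for a depth-based median is a simplification, and computing contours efficiently, which is what the project targets, is not attempted here.

```python
from itertools import combinations
from math import comb
from typing import Sequence, Tuple

Point = Tuple[int, int]


def orient(a: Point, b: Point, c: Point) -> int:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(q: Point, a: Point, b: Point, c: Point) -> bool:
    """True if q lies in the closed triangle abc (boundary included)."""
    if orient(a, b, c) != 0:
        d1, d2, d3 = orient(a, b, q), orient(b, c, q), orient(c, a, q)
        return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)
    # Degenerate triangle: its hull is a segment or a single point, so q must
    # lie in the bounding box and on the common line of the vertices.
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    if not (min(xs) <= q[0] <= max(xs) and min(ys) <= q[1] <= max(ys)):
        return False
    return all(orient(u, v, q) == 0 for u, v in ((a, b), (b, c), (c, a)))


def simplicial_depth_2d(q: Point, points: Sequence[Point]) -> float:
    """Fraction of the C(n, 3) closed sample triangles that contain q (O(n^3))."""
    total = comb(len(points), 3)
    if total == 0:
        return 0.0
    hits = sum(1 for a, b, c in combinations(points, 3) if in_closed_triangle(q, a, b, c))
    return hits / total


def deepest_point(points: Sequence[Point]) -> Point:
    """Sample point of maximal simplicial depth (a simple depth-based median)."""
    return max(points, key=lambda p: simplicial_depth_2d(p, points))


# A small square cloud with one far-away point.
sample = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2), (20, 20)]
print(deepest_point(sample))                  # (2, 2)
print(simplicial_depth_2d((20, 20), sample))  # 0.5
```

The far-away point does not pull the deepest point away from the center of the square. It does, however, tie at 0.5 with the corners (0, 0), (4, 0) and (0, 4): depth orders points from the center outward, and turning that ordering into an outlier rule is a separate design decision.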

Budget Start: 2004-08-15
Budget End: 2008-07-31
Fiscal Year: 2004
Total Cost: $224,723
Institution: Tufts University, Medford, MA 02155, United States