This project aims to develop a systematic data mining procedure for automatically exploring large non-standard data sets, with the purpose of discovering meaningful patterns and useful features. The procedure draws on four research areas: text analysis, risk analysis, data depth, and multivariate nonparametric analysis. The PI proposes to introduce and investigate several new data extraction and tracking methodologies. She plans to use two aviation safety report repositories ("Program Tracking Report Subsystem" from the FAA and "Aviation Accident Statistics" from the NTSB) to illustrate problem statements as well as applications of the proposed research to aviation risk management. The data mining procedures and the methods for constructing and tracking performance measures or risk indicators developed in this project can be a critical component of any effective decision-support system. The project also includes a research plan for establishing a general theory of multivariate spacings based on data depth, and new nonparametric statistical inference methods based on the concept of depth ranking.

Recent advances in computing and data acquisition technologies have made the collection of massive amounts of data a routine practice in many fields. Beyond their sheer volume, the data are often of less traditional types: textual, image, or unstructured high-dimensional data. Scientists increasingly face the task of analyzing such massive non-standard data sets. Moreover, because automated data collection systems are cheap to implement, many are designed to accumulate as much data as possible without a clearly defined mission. Consequently, the data analysis required of statisticians often includes the new challenge of mining a sea of unstructured data. The goal of this project is to develop a comprehensive statistical mining scheme with broad applicability to many fields. The investigator plans to use aviation safety report repositories from the NTSB (National Transportation Safety Board) to illustrate problem statements as well as applications of the proposed research to aviation risk management. The data mining procedures and the methods for constructing and tracking performance measures or risk indicators developed in this project can be a critical component of any effective decision-support system.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0306008
Program Officer: Grace Yang
Budget Start: 2003-07-01
Budget End: 2007-06-30
Fiscal Year: 2003
Total Cost: $220,000
Name: Rutgers University
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901