Classifying entities into one of several groups is a central goal of many scientific investigations and an important statistical problem with many applications in science and engineering. The problem is inherently multivariate, since measurements are made on several aspects (variables) of the entity in an attempt to best capture its place among others. Because it is generally not possible to measure all variables pertaining to an entity we wish to classify, or even to obtain perfect measurements of the variables that are measured, classifications are usually performed in the presence of uncertainty. It is important that classification be done efficiently, in a manner that minimizes the misclassification error rate and is robust to outlying cases. In this project the investigators develop a new classification technique based on statistical depth functions, an extension of robust rank-based nonparametric procedures. Issues of asymptotic optimality, robustness, computational algorithms, and finite-sample performance are addressed methodically, step by step, over the duration of the project. The project also includes implementing the newly developed classifier on publicly available genetic data sets and comparing its performance to that of existing classifiers.

Classifying entities into one of several groups is a central goal of many scientific investigations. For instance, identifying a tumor or a case of flu as one of many different possibilities is potentially life-saving and hence indispensable to physicians. The problem is inherently multivariate, since measurements are made on several aspects (variables) of the entity in an attempt to best capture its place among others. Because it is generally not possible to measure all variables pertaining to an entity we wish to classify, or even to obtain perfect measurements of the variables that are measured, classifications are usually performed in the presence of uncertainty. It is important that classification be done efficiently, in a manner that minimizes the misclassification error rate and is robust to outlying cases. In this research the investigators develop a new classification technique that is insensitive to erroneous or extraordinary data. Such methods are important to researchers in areas such as genetics and oncology, as well as in cancer classification using gene expression data, where precise and accurate classification of tumors is essential for successful treatment.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0604726
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2006-07-01
Budget End: 2009-06-30
Support Year:
Fiscal Year: 2006
Total Cost: $119,977
Indirect Cost:
Name: Auburn University
Department:
Type:
DUNS #:
City: Auburn
State: AL
Country: United States
Zip Code: 36849