Classification Based on Data Depth Ordering

Abebe, Asheber; Billor, Nedret

Abstract

The problem of classifying entities into one of several groups has been one of the main goals of many scientific investigations. It is a very important statistical problem with many applications in science and engineering. It is inherently multivariate, since measurements are made on several aspects (variables) of the entity in an attempt to best capture its place among others. Because it is generally not possible to measure all variables pertaining to an entity we wish to classify, or even to obtain perfect measurements of the variables that are measured, classifications are usually performed in the presence of uncertainty. It is important that classification be done efficiently, in a manner that minimizes the misclassification error rate, and in a way that is robust to outlying cases. In this project the investigators develop a new classification technique based on statistical depth functions, an extension of the robust rank-based nonparametric procedures. Issues of asymptotic optimality, robustness, computational algorithms, and finite-sample performance are addressed methodically, step by step, over the duration of the project. The project also includes applying the newly developed classifier to publicly available genetic data sets and comparing its performance with that of existing classifiers.
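To make the idea of depth-based classification concrete, the following is a minimal sketch of a generic maximum-depth rule: a new point is assigned to the group in which it lies deepest. It is not the investigators' classifier, which the abstract does not specify. The choice of spatial (L1) depth, the function names `spatial_depth` and `max_depth_classify`, and the toy data are all assumptions made for illustration.

```python
import math


def spatial_depth(x, sample):
    """Spatial (L1) depth of x with respect to a sample (illustrative choice of depth).

    Depth is 1 minus the length of the average unit vector pointing from the
    sample points to x. Central points have depth near 1, outlying points near 0.
    """
    dim = len(x)
    total = [0.0] * dim
    for p in sample:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # a sample point equal to x contributes a zero vector
            for j in range(dim):
                total[j] += diff[j] / norm
    mean_norm = math.sqrt(sum(t * t for t in total)) / len(sample)
    return 1.0 - mean_norm


def max_depth_classify(x, groups):
    """Assign x to the group label in which it has the largest depth."""
    return max(groups, key=lambda label: spatial_depth(x, groups[label]))


# Hypothetical training data: group "A" contains one gross outlier at (50, 50).
groups = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}

print(max_depth_classify((0.5, 0.5), groups))
print(max_depth_classify((5.5, 5.5), groups))
```

Because each sample point contributes only a unit direction, the outlier in group "A" has the same bounded influence on the depth as any other point. This is the kind of robustness to outlying cases that the abstract emphasizes.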

The problem of classifying entities into one of several groups has been one of the main goals of many scientific investigations. For instance, identifying a tumor or a case of flu as one of many different possibilities is potentially life-saving and hence indispensable to physicians. The problem is inherently multivariate, since measurements are made on several aspects (variables) of the entity in an attempt to best capture its place among others. Because it is generally not possible to measure all variables pertaining to an entity we wish to classify, or even to obtain perfect measurements of the variables that are measured, classifications are usually performed in the presence of uncertainty. It is important that classification be done efficiently, in a manner that minimizes the misclassification error rate, and in a way that is robust to outlying cases. In this research the investigators develop a new classification technique that is insensitive to erroneous or extraordinary data. Such methods are very important to researchers in areas such as genetics and oncology, where these qualities are of the utmost importance. One example is cancer classification using gene expression data, where precise and accurate classification of tumors is essential for successful treatment.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0604726
Program Officer: Gabor J. Szekely

Project Start
Project End
Budget Start: 2006-07-01
Budget End: 2009-06-30
Support Year
Fiscal Year: 2006
Total Cost: $119,977
Indirect Cost

Institution: Auburn University, Auburn, AL, United States
