This project focuses on two new universally applicable methods for clustering. The first, ROVAC (Response Oriented Variable Clustering), clusters explanatory variables in a response-oriented fashion. The method uses a response model to generate the variable clustering via a novel application of bagging. The clustering does not rely on a distributional assumption for the explanatory variables. The method is flexible: it allows for the use of different model selection criteria and generalizes to a wide variety of response models. The PI is also investigating extensions of a recently developed clustering visualization and validation tool, the Relative Data Depth (ReD). Building on the concept of depth relative to regression, the PI develops methods for selecting the number of clusters in a data set and for selecting the features that are related to a specific clustering.

This project is largely motivated by interdisciplinary research. The goal is to provide scientists in related fields with new and flexible clustering tools for analyzing high-dimensional data. Standard methods for clustering or grouping features require the definition of a measure of similarity, which is often a non-trivial and highly subjective task. In this project the PI focuses on the development of two clustering techniques based on intuitively simple concepts. The first method uses knowledge of another measured quantity, a response, and groups together features that are similarly related to that response. The second method uses a concept of depth, a measure of how representative a feature is of its group. Prototype algorithms are being applied to real data, including but not limited to gene expression data. Preliminary results are competitive with current leading methodologies.
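
The idea of grouping features by how they relate to a response can be illustrated with a small sketch. This is not the PI's ROVAC algorithm, whose details are not given in this abstract. It is a simplified stand-in built on stated assumptions: the "response model" is just the marginal Pearson correlation between each feature and the response, bagging is approximated by averaging that correlation over bootstrap resamples, and features are grouped greedily when their bagged scores lie within a tolerance. The names bagged_scores and group_by_response, the parameters n_boot, seed, and tol, and the toy data are all hypothetical.

```python
import math
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)


def bagged_scores(X, y, n_boot=200, seed=0):
    """Average each feature's correlation with y over bootstrap resamples.

    X is a list of feature columns, each the same length as y.
    Assumption: correlation stands in for a fitted response model.
    """
    rng = random.Random(seed)
    n = len(y)
    totals = [0.0] * len(X)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        yb = [y[i] for i in idx]
        for j, col in enumerate(X):
            totals[j] += pearson([col[i] for i in idx], yb)
    return [t / n_boot for t in totals]


def group_by_response(scores, tol=0.2):
    """Greedy grouping: a feature joins the first cluster that contains a
    member whose bagged score is within tol of its own; otherwise it starts
    a new cluster. Assumption: tol is a hypothetical tuning parameter."""
    clusters = []
    for j, s in enumerate(scores):
        for c in clusters:
            if any(abs(s - scores[k]) <= tol for k in c):
                c.append(j)
                break
        else:
            clusters.append([j])
    return clusters


# Toy data: features 0 and 1 track the response, feature 2 does not.
y = [float(i) for i in range(20)]
X = [
    [2.0 * v + 1.0 for v in y],                                  # linear in y
    [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(y)],     # y plus a small wiggle
    [1.0 if i % 2 else -1.0 for i in range(20)],                 # alternating, nearly unrelated
]
scores = bagged_scores(X, y)
print(group_by_response(scores))
```

Note that no similarity measure between features is specified directly in this sketch; features end up together only because they relate to the response in a similar way, which is the motivation stated above. A real response model with model selection would replace the correlation step.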

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0306360
Program Officer: Grace Yang
Project Start:
Project End:
Budget Start: 2003-06-01
Budget End: 2007-05-31
Support Year:
Fiscal Year: 2003
Total Cost: $73,499
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901