Matrix Algorithms for Data Clustering and Nonlinear Dimension Reduction

Zha, Hongyuan

Abstract

This research investigates statistical and computational methods for the simultaneous clustering of data matrices, and their extension to data hypercubes, using spectral techniques. It draws on cluster analysis, a method for extracting information from large data sets. Many existing methods cluster data objects by measuring similarity over all attributes. Often, however, the natural groupings of the data objects do not involve all attributes. For example, in gene expression analysis of data generated by DNA chips under various conditions (tissue types, individuals, external stimuli, etc.), it is unlikely that biologically related genes will behave similarly across all conditions, or that a related set of conditions will involve all the genes. The focus of this project is to develop clustering algorithms that can, for example, find subsets of genes that behave similarly across subsets of conditions. The techniques are based on spectral methods for simultaneous clustering of data matrices. The results, algorithms, and techniques developed will have applications in bioinformatics and text analysis.

The research focuses on identifying classes of cluster patterns in a data matrix that can be recovered from the spectral information of the data matrix. It explores various objective functions that characterize desirable grouping patterns and develops algorithms for assigning cluster membership from the spectral information. The methodology is being extended to multi-way data hypercubes and to exploring the connection between clustering and dimension reduction.
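
As a rough illustration of how cluster membership might be read off the spectral information of a data matrix, the sketch below applies a generic bipartite spectral partitioning step to a toy matrix. It is not necessarily the algorithm developed in this project. It assumes a nonnegative matrix whose rows and columns all have positive sums, and the function name `spectral_cocluster_2way` and the toy data are hypothetical.

```python
import numpy as np


def spectral_cocluster_2way(A):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Hypothetical helper for illustration: scale the matrix by its row and
    column sums, take the SVD, and split on the sign of the second pair of
    singular vectors.
    """
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1)
    col_sums = A.sum(axis=0)
    if np.any(row_sums <= 0) or np.any(col_sums <= 0):
        raise ValueError("every row and column must have a positive sum")

    # Scaled matrix D1^{-1/2} A D2^{-1/2}, with D1 and D2 the diagonal
    # matrices of row sums and column sums.
    d1 = 1.0 / np.sqrt(row_sums)
    d2 = 1.0 / np.sqrt(col_sums)
    A_scaled = d1[:, None] * A * d2[None, :]

    U, s, Vt = np.linalg.svd(A_scaled, full_matrices=False)

    # The leading singular pair only reflects the row and column sums;
    # the second pair carries the two-way partition.
    z_rows = d1 * U[:, 1]
    z_cols = d2 * Vt[1, :]

    row_labels = (z_rows > 0).astype(int)
    col_labels = (z_cols > 0).astype(int)
    return row_labels, col_labels


# Toy "genes x conditions" matrix: rows 0-2 are active under columns 0-1,
# rows 3-4 under columns 2-4. Small off-block values keep the two blocks
# weakly connected so the second singular pair is well defined.
A = np.array([
    [5.0, 4.0, 0.1, 0.1, 0.1],
    [4.0, 5.0, 0.1, 0.1, 0.1],
    [5.0, 5.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 3.0, 4.0, 5.0],
    [0.1, 0.1, 4.0, 3.0, 5.0],
])
row_labels, col_labels = spectral_cocluster_2way(A)
print("row labels:   ", row_labels)
print("column labels:", col_labels)
```

Rows and columns that receive the same label form one co-cluster. Which group is called 0 and which 1 depends on the sign convention of the SVD routine, so only the grouping itself is meaningful. Splitting into more than two co-clusters would need several singular vectors and a further clustering step, which this sketch omits.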

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Application #: 0305879
Program Officer: Robert B Grafton

Project Start
Project End
Budget Start: 2003-09-15
Budget End: 2006-12-31
Support Year
Fiscal Year: 2003
Total Cost: $215,845
Indirect Cost

Institution

Pennsylvania State University, University Park, PA, United States
