This research investigates statistical and computational methods for simultaneous clustering of data matrices, and their extension to data hypercubes, using spectral techniques. It draws on cluster analysis, which is a method of extracting information from large data sets. Many existing methods cluster the data objects using all the attributes for similarity measurement. However, the natural groupings of the data objects often do not involve all attributes. For example, in gene expression analysis, for data generated by DNA chips under various conditions (tissue types, individuals, external stimuli, etc.), it is unlikely that biologically related genes will behave similarly across all conditions, nor will a related set of conditions involve all the genes. The focus of this project is to develop clustering algorithms that can, for example, find subsets of genes that behave similarly across subsets of conditions. The techniques are based on spectral methods for simultaneous clustering of data matrices. The results, algorithms, and techniques developed will have applications in bioinformatics and text analysis.
The focus of the research is on identifying classes of cluster patterns in a data matrix that can be recovered from the spectral information of the data matrix. The research explores various objective functions that characterize desirable grouping patterns, and develops algorithms for cluster membership assignment from the spectral information. The methodology is being extended to the case of multi-way data hypercubes and to exploring the connection between clustering and dimension reduction.
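As a rough illustration of assigning cluster memberships from spectral information, the sketch below splits the rows and columns of a nonnegative data matrix into two groups at once, using the second singular vectors of the matrix after row and column scaling. This is one well-known spectral co-clustering scheme and is not necessarily the method developed in this project; the function name `spectral_bicluster_two_way` and the toy matrix are assumptions made for illustration only.

```python
import numpy as np


def spectral_bicluster_two_way(A):
    """Split rows and columns of a nonnegative matrix into two co-clusters.

    Illustrative sketch only: assumes every row sum and column sum of A
    is strictly positive.
    """
    A = np.asarray(A, dtype=float)
    r = A.sum(axis=1)  # row sums
    c = A.sum(axis=0)  # column sums

    # Scale entry (i, j) by 1 / sqrt(r_i * c_j).
    An = A / np.sqrt(np.outer(r, c))

    # The leading singular pair is trivial after this scaling; the second
    # pair carries the two-way partition information.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(r)
    z_cols = Vt[1, :] / np.sqrt(c)

    # Assign memberships by sign. The overall sign of a singular pair is
    # arbitrary, so only label equality is meaningful, not the value 0 or 1.
    row_labels = (z_rows > 0).astype(int)
    col_labels = (z_cols > 0).astype(int)
    return row_labels, col_labels


# Toy "genes x conditions" matrix (hypothetical data): genes 0-2 respond
# under conditions 0-1, genes 3-5 respond under conditions 2-4.
A = np.full((6, 5), 0.1)
A[:3, :2] += 5.0
A[3:, 2:] += 5.0

row_labels, col_labels = spectral_bicluster_two_way(A)
print("row labels:   ", row_labels)
print("column labels:", col_labels)
```

Rows and columns that receive the same label form one co-cluster, for example a subset of genes together with the subset of conditions under which they behave similarly. Recovering more than two groups would typically use several singular vectors followed by a clustering step such as k-means, which this sketch omits.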