Recent advances in high-throughput technologies have unleashed a torrent of high-dimensional data. Examples include gene expression pattern images, microarray gene expression data, protein/gene sequences, and neuroimages. Dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information, is crucial for the analysis of these data. The goal of this project is to develop efficient and effective dimensionality reduction algorithms for multi-label classification. Multi-label dimensionality reduction poses a number of exciting research questions that will be studied in this project: How can class label correlation be fully exploited for effective dimensionality reduction? How can dimensionality reduction algorithms be scaled to large multi-label problems? How can dimensionality reduction be combined effectively with classification? How can sparse dimensionality reduction algorithms be derived to enhance model interpretability? How can multi-label dimensionality reduction algorithms be derived for multiple data sources?

To address these questions, a hypergraph spectral learning formulation will be developed for multi-label dimensionality reduction, in which a hypergraph is used to capture the class label correlation. A joint learning formulation will be developed, in which dimensionality reduction and multi-label classification are performed simultaneously. In addition, a multi-source dimensionality reduction framework will be developed for learning from multiple heterogeneous data sources.
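
As a rough illustration of how a hypergraph can encode class label correlation, the sketch below treats each label as a hyperedge containing every instance tagged with that label, and builds a commonly used normalized hypergraph Laplacian from the label matrix. This is an assumption made for illustration, not necessarily the formulation the project adopts; the function name `hypergraph_laplacian`, the unit hyperedge weights, and the toy label matrix are all hypothetical.

```python
from math import sqrt


def hypergraph_laplacian(Y, weights=None):
    """Return L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} as a list of lists.

    Y is an n x k 0/1 label matrix; column j is used as the hyperedge that
    contains every instance carrying label j (so the incidence matrix H is Y).
    Unit hyperedge weights are an illustrative assumption.
    """
    n, k = len(Y), len(Y[0])
    w = weights if weights is not None else [1.0] * k
    # Hyperedge degree: number of instances carrying each label.
    edge_deg = [sum(Y[i][j] for i in range(n)) for j in range(k)]
    # Vertex degree: total weight of the labels attached to each instance.
    vert_deg = [sum(w[j] * Y[i][j] for j in range(k)) for i in range(n)]

    L = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            s = 0.0
            # Instances with no labels get zero similarity to everything.
            if vert_deg[a] > 0 and vert_deg[b] > 0:
                for j in range(k):
                    if Y[a][j] and Y[b][j]:
                        s += w[j] / edge_deg[j]
                s /= sqrt(vert_deg[a] * vert_deg[b])
            L[a][b] = (1.0 if a == b else 0.0) - s
    return L


# Toy example: three instances, two labels; the middle instance has both.
Y_demo = [[1, 0],
          [1, 1],
          [0, 1]]
L_demo = hypergraph_laplacian(Y_demo)
```

Instances that share labels receive negative off-diagonal entries, so a projection that keeps the associated quadratic form small tends to place instances with overlapping label sets close together. In a spectral formulation, a matrix of this kind would typically enter a generalized eigenvalue problem over the projected features; that step is omitted here.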

If successful, this project will substantially advance the state of the art in dimensionality reduction for multi-label classification, and broaden this research area by opening up and addressing many new research themes. The algorithms and tools developed in this project will directly impact biological research, as they will be used to annotate FlyExpress images; FlyExpress is the only digital library of standardized fruit fly embryonic expression patterns. The educational component of this project includes developing a new curriculum that incorporates research into the classroom and provides students from under-represented groups with opportunities to participate in research. Project results, including open source software and data sets, will be disseminated via the project Web site (www.public.asu.edu/~jye02/Project/CAREER).

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0953662
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2010-04-01
Budget End: 2015-03-31
Support Year:
Fiscal Year: 2009
Total Cost: $317,275
Indirect Cost:
Name: Arizona State University
Department:
Type:
DUNS #:
City: Tempe
State: AZ
Country: United States
Zip Code: 85281