MSPA-MCS: Sparse Multivariate Data Analysis

El Ghaoui, Laurent; Jordan, Michael

Abstract

This proposal develops and studies sparse variants of classic multivariate data analysis methods. It primarily focuses on sparse principal component analysis (PCA) and the related sparse canonical correlation analysis (CCA), but also intends to explore sparse variants of methods such as correspondence analysis and discriminant analysis. The motivation for developing sparse multivariate analysis algorithms is their potential for yielding statistical results that are more interpretable and more robust than classical analyses, while giving up as little as possible in the way of statistical efficiency and expressive power. The investigators have derived a convex relaxation for sparse PCA as a large-scale semidefinite program. The proposed research first studies the theoretical and practical performance of this relaxation as well as the computational complexity involved in solving large-scale instances of the corresponding semidefinite programs. In a next step, it focuses on extending these results to the other multivariate data analysis methods cited above.

Principal Component Analysis (or PCA) is a classic statistical tool used to study experimental data with a very large number of variables (meteorological records, gene expression coefficients, the interest rate curve, social networks, etc). It is primarily used as a dimensionality reduction tool: PCA produces a reduced set of synthetic variables that captures a maximum amount of information on the data. This makes it possible to represent data sets with thousands of variables on a three dimensional graph while still capturing most of the features of the original data, thus making visualization and interpretation easier. Unfortunately, the key shortcoming of PCA is that these new synthetic variables are a weighted sum of all the original variables making their physical interpretation difficult. The proposed research will study algorithms for computing sparse PCA, i.e., computing new synthetic variables that are the weighted sum of only a few problem variables while keeping most of the features of the original data set. Sparse PCA is a hard combinatorial problem but the investigators have produced a relaxation that can be solved efficiently using recent results in convex optimization. The investigators plan to study the theoretical and practical performance of this relaxation and extend these results to other statistical methods.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0625371
Program Officer: Gabor J. Szekely

Project Start
Project End
Budget Start: 2006-09-15
Budget End: 2009-08-31
Support Year
Fiscal Year: 2006
Total Cost: $230,000
Indirect Cost

MSPA-MCS: Sparse Multivariate Data Analysis
El Ghaoui, Laurent Jordan, Michael
University of California Berkeley, Berkeley, CA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments