Randomized methods have recently proven highly useful for efficiently analyzing big data sets, and this project develops mathematically rigorous techniques for designing algorithms that analyze and store such data efficiently. In particular, the project focuses on furthering applications of recent randomized methods for large-scale computational linear algebra. Application areas include randomized linear algebra, manifold learning, and model-based compressed sensing; the techniques developed for problems in these areas are unified by the common tool of randomized "oblivious subspace embeddings."

This research attacks the big data problem in randomized linear algebra, manifold learning, and model-based compressed sensing.

In randomized linear algebra one imagines that the input is an extremely large matrix A, and the goal is to process it efficiently, e.g., for regression, principal component analysis, (approximate) matrix multiplication, eigenvalue estimation, or k-means clustering. Sarlos first proposed using "oblivious subspace embeddings" to speed up such computations: one picks a random matrix S (from an appropriate distribution) such that solving the problem on SA instead of A still yields a near-optimal solution to the original problem, where S is chosen so that SA has many fewer rows than A, thus compressing the massive data. This project develops novel methods to obtain more efficient such S, as well as new applications to kernelized and regularized regression problems.

In manifold learning one imagines that the input data lies on a low-dimensional manifold in a high-dimensional space. For example, pixelated handwritten images can be viewed as high-dimensional vectors (indexed by pixels), yet empirically such images tend to lie near a much lower-dimensional manifold. By learning the manifold's parameters ("manifold learning"), one can train classifiers more efficiently and compress the data. This project explores more efficient randomized methods for manifold learning, e.g., via efficient subspace embeddings.

In model-based compressed sensing one wishes to acquire sparse signals with structured sparsity patterns using few linear measurements, for later (approximate) recovery. Organizing these measurements as the rows of a measurement matrix S, it is known that such S are closely connected to subspace embeddings.
This project aims to explore this connection to obtain more efficient model-based compressed sensing and recovery algorithms.
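The sketch-and-solve idea described above can be illustrated on least-squares regression. The sketch below is a minimal example, not the project's actual construction: it uses a plain Gaussian sketch in place of the more efficient oblivious subspace embeddings the project targets, and all matrix names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: A is tall and skinny (n >> d); m is the sketch size (m << n).
n, d, m = 10_000, 20, 200

A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Exact least-squares solution on the full n x d problem.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Oblivious sketch: S is drawn without looking at A (here, scaled Gaussian).
S = rng.standard_normal((m, n)) / np.sqrt(m)

# Solve the much smaller m x d problem on (SA, Sb) instead of (A, b).
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution is near-optimal for the original problem:
# its residual is only slightly larger than the optimal residual.
rel_err = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(rel_err)
```

The ratio printed at the end is at least 1 by optimality of the exact solution, and for a sketch size m moderately larger than d it stays close to 1, which is the sense in which solving on SA "still yields an almost optimal solution" while handling far fewer rows.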

Project Start:
Project End:
Budget Start: 2014-09-01
Budget End: 2019-08-31
Support Year:
Fiscal Year: 2014
Total Cost: $287,800
Indirect Cost:
Name: Harvard University
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02138