Massive data sets typically have structure that allows them to be compressed with little loss in information. Examples include image, video, and audio data, as reflected by the ubiquitous .jpg, .mpeg, and .mp3 file formats, respectively. A major goal of modern data science is to exploit this compressibility to enable data processing algorithms to run more efficiently, thereby reducing the amount of data that is discarded or left unanalyzed. In other words, data scientists aim to adapt their algorithms to operate directly on the compressed data without loss in performance. Projections are important computational tool for effecting compression, but thus far they have only been incorporated into a handful of data processing algorithms. This project aims to substantially increase the range of data processing algorithms that can benefit from projections, and is specifically motivated by applications to medical image processing, computational biology, and analysis of surveillance video. The project will result in algorithms that are broadly applicable across the whole of data science. Furthermore, this research will support the cross-disciplinary development of a diverse cohort of PhD and undergraduate students at the University of Michigan, and the development of a graduate-level course on large-scale optimization for machine learning at the University of Michigan.

The technical aims of the project are divided into two thrusts. The first thrust develops a general approach for incorporating random projections into iterative optimization solvers for broad classes of optimization problems arising in machine learning. Our work focuses on nonconvex problems such as matrix factorization, manifold optimization, and training neural networks, building on recent work by the project team and others on convex problems. In particular, we will extend the principle of iterative sketching, which has been developed for convex optimization, to nonconvex problems. This principle finds ways to insert sketches (random projections) into iterative algorithms so as to reduce memory or computational complexity, while leveraging structures in the data to avoid losses in performance. The second thrust develops new adaptive projections for improved statistical and computational performance in big data, leveraging the algorithms developed under the first thrust. In particular, this thrust will develop new methods for supervised and streaming principle components analysis, and for learning sketches for optimization problems that must be solved repeatedly, as in medical image formation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1838179
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2019-01-01
Budget End
2021-12-31
Support Year
Fiscal Year
2018
Total Cost
$1,032,000
Indirect Cost
Name
Regents of the University of Michigan - Ann Arbor
Department
Type
DUNS #
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109