Many areas of unsupervised learning (i.e, learning with data that has not been labeled by humans) currently rely on heuristic algorithms that lack provable guarantees on solution quality or running time. In fact the underlying problems - as currently formulated - are often known to be computationally intractable (NP-hard, to use a technical term). This proposal identifies a big set of these problems that can be seen as twists - involving constraints such as nonnegativity and sparsity - on classical linear algebra problems like solving linear systems, rank computation, and eigenvalues/eigenvectors. The PI has proposed calling this set of problems collectively as Linear Algebra++. The project will seek to develop algorithms with provable guarantees for these problems. The methodology will be to make suitable assumptions about the structure of realistic inputs. The algorithms will be applied to problems in unsupervised learning, in areas such as topic modeling, natural language processing, semantic embeddings, sparse principle components analysis (PCA), deep nets, etc.

The work will lead to new and more efficient algorithms for big data processing that will come with provable guarantees of quality. It will bring new rigorous approaches to machine learning. (Some recent work of the PI shows that the new rigorous approaches can be quite practical.) It will advance the state of art in theoretical computer science by expanding its range and its standard toolkit of algorithms. It will contribute fundamentally new primitives to classical linear algebra and applied mathematics.

The project will train a new breed of graduate students who will be fluent both in theoretical algorithms and machine learning. The PI has a track record in this kind of work and training during the past few years and will continue this including working with undergrads and Research Experiences for Undergraduates (REU) students. Any new algorithms discovered as part of this project will be released as open source code. The PI also plans a series of other outreach activities in the next few years including (a) A workshop. (b) A special semester or year at the Simons Institute in 2016-17 on provable bounds in machine learning which he will coorganize. (c) A new book on graduate algorithms based upon his new grad course, which tries to re-orient algorithms training for today's computer science problems. (d) A series of talks aimed at broad audiences, of which he gives several each year.

The techniques will build upon recent progress by the PI and others on problems such as nonnegative matrix factorization, sparse coding, alternating minimization etc. They involve average case analysis, classical linear algebra, convex optimization, numerical analysis, etc., as well involve completely new ideas. They could have a transformative effect on machine learning and algorithms.

Project Start
Project End
Budget Start
2015-06-15
Budget End
2019-05-31
Support Year
Fiscal Year
2015
Total Cost
$466,000
Indirect Cost
Name
Princeton University
Department
Type
DUNS #
City
Princeton
State
NJ
Country
United States
Zip Code
08544