Machine learning concerns the design and understanding of computer programs that learn from experience. Modern complex settings (for example, natural language) require flexible probability models that can entertain large numbers of possible hypotheses (semantics) underlying the observations (sentences). In such models, likely structures (parse trees) are guided by functions that assess the suitability of a structure by breaking it into smaller pieces. Richer models require larger subsets, making it challenging to explore large sets of possible hypotheses efficiently.

This project takes a fresh look at structured modeling by developing a new modeling paradigm that combines randomization of parameters with combinatorial optimization. The combination provides a mechanism for inducing complex distributions over structures while explicitly maintaining easy generation of likely structures. We pursue a comprehensive plan to understand, extend, and design these perturbation models, with the end goal of solving significant cross-cutting applied problems in natural language processing, such as parsing, and structured recommender tasks, such as paraphrasing. Beyond modeling, the proposed work has the potential to merge tools and techniques across areas, from theoretical computer science (stability, tractability) and combinatorial optimization (relaxations, certificates) to probability (sampling from convex bodies). The tools developed will be broadly useful across prominent areas, from computer vision and natural language processing to medical informatics and computational biology. By its very nature, the proposed work compels strong collaborative relationships across disciplinary boundaries, from theory to applications, and the PI will actively pursue these opportunities. All software produced in this project will be open-sourced and made available for download. The PI will also engage in outreach activities that enable high school students to participate.
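The idea of pairing random perturbations with optimization can be illustrated on a toy problem. The sketch below is not the project's method; it is a minimal illustration using the well-known Gumbel-max construction, in which adding independent Gumbel noise to each candidate's score and then taking the maximum yields a sample from the softmax (Gibbs) distribution over candidates. The candidate names (`parse_a`, `parse_b`, `parse_c`), their scores, and the function names are hypothetical. Realistic structured models have far too many candidates to enumerate, which is why perturbing a small set of parameters and calling a combinatorial optimizer is attractive; that setting is not shown here.

```python
import math
import random
from collections import Counter


def gumbel(rng):
    """Draw standard Gumbel noise via the inverse CDF: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturb_and_maximize(scores, rng):
    """Add independent Gumbel noise to every score and return the argmax."""
    perturbed = {s: v + gumbel(rng) for s, v in scores.items()}
    return max(perturbed, key=perturbed.get)


def softmax(scores):
    """Gibbs distribution over candidates, computed stably."""
    m = max(scores.values())
    z = sum(math.exp(v - m) for v in scores.values())
    return {s: math.exp(v - m) / z for s, v in scores.items()}


# Hypothetical candidate structures and scores (assumptions for illustration).
scores = {"parse_a": 2.0, "parse_b": 1.0, "parse_c": 0.0}

rng = random.Random(0)
n = 20000
counts = Counter(perturb_and_maximize(scores, rng) for _ in range(n))
empirical = {s: counts[s] / n for s in scores}
target = softmax(scores)

for s in scores:
    print(f"{s}: empirical={empirical[s]:.3f} softmax={target[s]:.3f}")
```

Each sample costs one maximization, so generating a likely structure stays as easy as solving the underlying optimization problem, while the noise determines the distribution that the samples follow.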

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1524427
Program Officer: Weng-keen Wong
Budget Start: 2015-09-01
Budget End: 2018-08-31
Fiscal Year: 2015
Total Cost: $407,091
Name: Massachusetts Institute of Technology
City: Cambridge
State: MA
Country: United States
Zip Code: 02139