Recent progress in bioinformatics, natural language understanding, computer vision, information retrieval and other areas has been enabled in large part by conditional random fields (CRFs), machine learning models of structured outputs such as sequences, trees and grids. However, many of the fundamental problems in these application areas involve not just fixed structures, but structures that must themselves be inferred. This structural ambiguity arises from interacting choices at different levels of representation (e.g. from character sequences to meaning, or from pixels to scene interpretation). The project will move CRFs beyond fixed graphical structures to structures that are constructed dynamically during inference. Such a capability will be key to building next-generation systems that solve not just an individual piece of a problem but complex multi-step problems, as found in natural language understanding and computer vision, in a unified way.

Project Report

The need to analyze and make effective use of vast amounts of data has made machine learning indispensable for building intelligent, adaptive systems. Major improvements in accuracy and robustness, in computer vision and computational linguistics in particular, have been propelled by advances in supervised machine learning, and research enabling intelligent systems through machine learning has proven to be of great scientific, technological, and economic importance. This grant addresses fundamental barriers to the creation of richer models able to handle dramatically more complex and difficult predictions. The computational cost of complex models is a major obstacle to creating more accurate machine learning systems in many fields; impact on computational linguistics and computer vision in particular is a key goal.

To address this obstacle, we have developed a novel framework for data-driven coarse-to-fine reasoning based on structured prediction cascades. A structured cascade tackles a complex structured prediction problem by learning a sequence of models of increasing complexity that progressively filter a given structured output space, minimizing overall error and computational effort at prediction time according to a desired trade-off. Each level of the cascade is a structured model, and the levels manipulate exponentially large sets of filtered outputs using max-marginals. We designed a novel convex loss function for learning cascaded models that balances filtering error against filtering efficiency. Initial analysis shows that the use of max-marginals allows the derivation of generalization bounds for both the accuracy and the efficiency of the cascade. Experimental results show that the proposed cascade formulation is very effective on several problems, including analysis of human activity in videos: it reduces the complexity of inference by up to five orders of magnitude and enables scaling up to much richer and more accurate models.
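The filtering step of a cascade level can be illustrated with a minimal sketch on a chain-structured model. The function names, the array layout, and the specific thresholding rule (a convex combination of the best score and the mean max-marginal, controlled by a hypothetical parameter `alpha`) are assumptions made for illustration; the report does not specify them, and this is not the project's released code.

```python
import numpy as np

def max_marginals(unary, pairwise):
    """Max-marginals of a chain model.

    unary:    (T, K) array of per-position state scores (assumed layout).
    pairwise: (K, K) array of transition scores, indexed [previous, next].
    Returns a (T, K) array m where m[t, k] is the best total score of any
    full sequence that is constrained to take state k at position t.
    """
    T, K = unary.shape
    fwd = np.zeros((T, K))  # best prefix score ending in each state
    bwd = np.zeros((T, K))  # best suffix score following each state
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] + np.max(fwd[t - 1][:, None] + pairwise, axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = np.max(pairwise + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + bwd

def filter_states(unary, pairwise, alpha=0.5):
    """Boolean (T, K) mask of states that survive filtering.

    Assumed rule: keep a state if its max-marginal reaches a threshold lying
    between the mean max-marginal (alpha = 0) and the best score (alpha = 1).
    """
    m = max_marginals(unary, pairwise)
    threshold = alpha * m.max() + (1.0 - alpha) * m.mean()
    return m >= threshold
```

Under this rule, every state on the highest-scoring sequence survives for any `alpha` between 0 and 1, because its max-marginal equals the best score. A later, more expensive model would then only need to consider the states left in the mask.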
Our parallel work on structured determinantal processes addresses the difficult problem of predicting multiple structured outputs that are diverse. We have introduced determinantal point processes (DPPs) to the machine learning field and have made several important representational, inference and learning contributions. Arising in quantum physics and random matrix theory, DPPs are elegant probabilistic models of global, negative correlations, and offer efficient algorithms for sampling, marginalization, conditioning, and other inference tasks. While they have been studied extensively by mathematicians, giving rise to a deep and beautiful theory, DPPs had not previously been used in statistical or machine learning applications. We have extended and developed the DPP framework to greatly enhance search and organization of documents, images, and video, and we have made major progress in applications to natural language processing (document summarization and visualization) and computer vision (articulated human pose estimation in videos, modeling diversity in image search).

Research supported by this grant has been widely published in top-tier journals and conferences in machine learning, computer vision and natural language processing. The work was presented in numerous invited talks by the principal investigator, and several workshops and tutorials were organized around these topics. In the course of the grant, multiple large-scale datasets were collected, curated and shared with the community. Code used in the project is publicly available as well. The grant supported several PhD students who have graduated and are now working in academia and industry.
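To make the negative correlations of a determinantal point process concrete, the following sketch computes subset probabilities under the standard L-ensemble form, where a subset Y has probability det(L_Y) / det(L + I). The feature vectors and the function name `dpp_probability` are hypothetical and chosen only for illustration; this is a brute-force sketch for a tiny ground set, not the project's code.

```python
import numpy as np

def dpp_probability(L, subset):
    """Probability that a DPP with L-ensemble kernel L selects exactly `subset`."""
    idx = list(subset)
    # The determinant of the empty submatrix is defined as 1.
    numerator = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    # det(L + I) equals the sum of det(L_Y) over all subsets Y,
    # so dividing by it normalizes the distribution.
    return numerator / np.linalg.det(L + np.eye(L.shape[0]))

# Hypothetical unit-length feature vectors: items 0 and 1 are near-duplicates,
# while item 2 is orthogonal to item 0.
features = np.array([
    [1.0, 0.0],
    [0.99, np.sqrt(1.0 - 0.99 ** 2)],
    [0.0, 1.0],
])
L = features @ features.T  # Gram matrix of pairwise similarities

p_similar = dpp_probability(L, [0, 1])  # two near-duplicate items
p_diverse = dpp_probability(L, [0, 2])  # two dissimilar items
```

With these vectors the pair of dissimilar items receives a higher probability than the pair of near-duplicates, which is the diversity-favoring behavior described in the report.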

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0803256
Program Officer: Todd Leen
Project Start:
Project End:
Budget Start: 2008-08-01
Budget End: 2012-07-31
Support Year:
Fiscal Year: 2008
Total Cost: $450,000
Indirect Cost:
Name: University of Pennsylvania
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104