Machine learning is transforming the way many fields make sense of data, from engineering and science to medicine and business. It has vastly improved speech recognition, machine translation, robotic navigation and many other prediction tasks. A crucial goal of machine learning is automating the intelligent processing of information: this project will focus on automatically describing videos by detecting objects, people, actions and the interactions between them, and on parsing documents by extracting entities, events and the relationships between them. These prediction tasks require more than true-false or multiple-choice answers: each has a number of possible answers that grows exponentially with the size of the input. Breaking these joint predictions up into independent decisions (for example, translating each word on its own, recognizing one phoneme at a time, or detecting each object separately) ignores critical correlations and leads to poor accuracy.
Structured models, such as grammars and graphical models, can capture strong dependencies, but at considerable computational cost. The barrier to improving accuracy in such structured prediction problems is the prohibitive cost of inference. As we consider models of increasing complexity, these problems present a fundamental trade-off between approximation error and the inference error caused by computational constraints. This trade-off is poorly understood, yet it is constantly encountered in machine learning applications.
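To make the contrast between independent and joint prediction concrete, the following minimal sketch labels a three-position sequence in two ways: by picking the best label at each position on its own, and by maximizing a joint score that also rewards agreement between neighbors. The scores in UNARY and PAIRWISE are invented toy values, and the function names are hypothetical. The joint decoder uses the standard Viterbi dynamic program for chain models, chosen here only for illustration; it is not the coarse-to-fine architecture this project proposes.

```python
from itertools import product

# Toy scores, invented for illustration only.
# UNARY[t][y]: how well label y fits position t when judged on its own.
UNARY = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
# PAIRWISE[a][b]: bonus for label a being followed by label b
# (here, neighboring positions prefer to agree).
PAIRWISE = [[1.0, 0.0], [0.0, 1.0]]


def sequence_score(labels, unary, pairwise):
    """Joint score of a full labeling: unary terms plus neighbor terms."""
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return score


def decode_independent(unary):
    """Pick the best label at each position, ignoring all correlations."""
    return [max(range(len(row)), key=row.__getitem__) for row in unary]


def decode_brute_force(unary, pairwise):
    """Exact joint decoding by enumerating all k**n labelings."""
    k = len(unary[0])
    candidates = product(range(k), repeat=len(unary))
    return list(max(candidates,
                    key=lambda ys: sequence_score(ys, unary, pairwise)))


def decode_viterbi(unary, pairwise):
    """Exact joint decoding for a chain in O(n * k**2) time."""
    k = len(unary[0])
    best = list(unary[0])  # best[y]: top score of any prefix ending in y
    back = []              # back-pointers, one list per later position
    for row in unary[1:]:
        pointers, new_best = [], []
        for b in range(k):
            a_star = max(range(k), key=lambda a: best[a] + pairwise[a][b])
            pointers.append(a_star)
            new_best.append(best[a_star] + pairwise[a_star][b] + row[b])
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final label.
    y = max(range(k), key=best.__getitem__)
    labels = [y]
    for pointers in reversed(back):
        y = pointers[y]
        labels.append(y)
    return labels[::-1]


print(decode_independent(UNARY))        # [0, 1, 0]
print(decode_viterbi(UNARY, PAIRWISE))  # [0, 0, 0]
```

On these toy scores the independent decoder flips the middle label because its weak local evidence slightly favors label 1, while the joint decoders keep the sequence consistent. Brute-force enumeration is exact but visits k**n labelings; the chain structure is what lets Viterbi avoid that cost, and richer dependency structures generally do not admit such an efficient exact procedure.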
The primary outcome of this project will be a framework for addressing very-large-scale structured prediction using a novel coarse-to-fine architecture. This architecture will enable explicit, data-driven control of the trade-off between approximation and computation. It promises to drastically advance state-of-the-art accuracy in computer vision and natural language applications and to greatly enhance search and organization of documents, images and video. The PI's plan includes an active role in the machine learning community: disseminating results through tutorials, code and data, and organizing workshops.