Intelligent systems, both artificial and biological, must find effective ways to organize a complex visual world. The cross-disciplinary field of scene understanding is in need of a comprehensive framework in which to integrate cognitive, computational and neural approaches to the organization of knowledge.

This research program aims to create a framework for organizing knowledge of the visual environments that human and artificial systems encounter when navigating the world or browsing visual databases. The aim is to determine which taxonomies are best suited to different visual tasks, and to use computer vision algorithms to organize visual environments as humans do. For example, semantic relationships between scenes are well captured by a hierarchical tree (e.g., a basilica is a type of church, which is a type of building), but functional similarities between different environments may be best represented as clusters (e.g., restaurants, kitchens and picnic areas clustered as places to eat; offices and internet cafés as places to work).
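The two representations contrasted above can be illustrated with a minimal sketch. It uses only the example scenes named in the text. The names IS_A, FUNCTIONS, ancestors and functional_clusters are hypothetical, chosen for illustration, and do not correspond to any tool released by the project.

```python
from collections import defaultdict

# Semantic hierarchy stored as child -> parent "is a type of" links.
# Entries are the examples given in the text; the dict name is illustrative.
IS_A = {
    "basilica": "church",
    "church": "building",
}

# Functional labels per scene, again using only the examples from the text.
FUNCTIONS = {
    "restaurant": {"eat"},
    "kitchen": {"eat"},
    "picnic area": {"eat"},
    "office": {"work"},
    "internet café": {"work"},
}


def ancestors(scene):
    """Follow is-a links upward and return the parent categories in order."""
    chain = []
    while scene in IS_A:
        scene = IS_A[scene]
        chain.append(scene)
    return chain


def functional_clusters(functions):
    """Group scenes by shared function (a flat clustering, not a tree)."""
    clusters = defaultdict(set)
    for scene, funcs in functions.items():
        for func in funcs:
            clusters[func].add(scene)
    return dict(clusters)
```

Calling ancestors("basilica") walks up the tree through church to building, while functional_clusters(FUNCTIONS) groups restaurants, kitchens and picnic areas under "eat" regardless of where each would sit in the semantic tree. A scene may carry several function labels, which a single tree cannot express directly.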

Because hierarchies and taxonomies provide a way of formalizing many types of contextual information (spatial, temporal, and semantic), they can be used to enhance the performance of computer vision systems at object and scene recognition, and aid in the development of smarter image search algorithms.

Besides serving as a unified benchmark for comparing different models and theories, this enterprise offers new teaching and applied tools for research and courses, which will be made available through websites and symposia.

Project Report

The research landscape in computer vision is poised for massive change in the next few years. With the success of new computational architectures for visual processing and access to unparalleled image databases, the state of the art in computer vision is advancing rapidly. Publicly available databases have become an important resource in many fields because they provide a standard for comparing different models and theories. Within the tenure of this award, we provided two benchmarks for the field of scene understanding (the SUN dataset and the Places dataset, with more than seven million labeled images), with additional attributes (categorical hierarchy, function, possible actions, types of objects in the scene, etc.) as well as computational models of automatic scene recognition. We also released a web version of an automatic scene recognition system. Recent advances in image capture and storage have enabled researchers, and the public in general, to acquire an unprecedented amount of high-quality images and video. Our datasets and models offer a meaningful platform for developing real-world systems of visual scene understanding. A smart system of visual understanding could have an unprecedented impact on internet-based and wireless technologies by enabling everyone to accomplish tasks that used to take days or minutes in a fraction of the time.

Project Start:
Project End:
Budget Start: 2010-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $449,184
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139