What does it mean to "understand" an image? One popular answer is simply naming the objects seen in the image. During the last decade most computer vision researchers have focused on this "object naming" problem. While there has been great progress in detecting things like "cars" and "people", such a level of understanding still cannot answer even basic questions about an image such as "What is the geometric structure of the scene?" "Where in the image can I walk"?

The goal of this project is to develop a geometric and functional representation of our visual world for scene understanding. This project aims to develop this functional representation by learning relationships between the physical/visual representation of the scene and the space of the interactions an agent can perform in that scene. The key advantage of functional scene representation is that it is subjective, explicitly task-based and takes into account the agent?s capabilities.

This project is anticipated to result in major advances within the image understanding community, bringing it closer to researchers in robotics. It is anticipated to result in improvements in: (a) 3D Scene Understanding; (b) Recognition; (c) Human Activity Understanding, and hence could be a critical enabling technology for applications such as autonomous systems, surveillance, and personal robotics. This project is also expected to contribute to education through course development, student projects, workshops, and tutorials involving a broader audience as well as using popular online media (e.g., YouTube) and interactive web demos to involve young children.

Project Start
Project End
Budget Start
2013-08-15
Budget End
2017-07-31
Support Year
Fiscal Year
2013
Total Cost
$450,000
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213