This project develops a novel framework for jointly understanding the 3D spatial and semantic structure of complex scenes from images. State-of-the-art computer vision methods treat these two tasks separately. Methods for object recognition typically describe the scene as a list of class labels but are unable to account for its 3D spatial structure. Methods for 3D scene modeling produce accurate metric reconstructions but are unable to infer the semantic content of the scene's components. This project seeks to fill this gap and to create the foundations of a new framework for coherently describing objects, object components, and their 3D spatial arrangement in the scene's physical space. The research makes two main contributions. First, it explores novel models for representing the intrinsic multi-view nature of object categories and for measuring critical geometric attributes of objects. Second, it investigates a new coherent probabilistic formulation capable of using these measurements to simultaneously estimate the most likely 3D configuration of scene elements and the critical semantic phenomena of the scene. This research has the potential to play a transformative role in strategic areas such as autonomous navigation, robotics, and automatic 3D modeling of urban environments. Moreover, it is crucial for designing technology that assists people with reduced functional capabilities. The project integrates research and education by involving undergraduate and high school students in projects whose primary application goal is to develop technology for people with disabilities.
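To make the joint-inference idea concrete, the toy Python sketch below shows one way such a probabilistic formulation could look: a brute-force MAP search over candidate object depths and semantic labels, scored by a product of geometric and semantic compatibility terms. The depths, labels, scores, and function names are all illustrative assumptions; the abstract does not specify the project's actual models or inference procedure.

import itertools

# Toy joint MAP estimation over scene geometry and semantics.
# All quantities here are hypothetical placeholders, not the
# project's actual formulation.

# Candidate depths (meters) for two detected objects, and candidate labels.
DEPTHS = [1.0, 2.0, 3.0]
LABELS = ["chair", "table"]

def geometry_score(d1, d2):
    """Assumed geometric prior: prefer object 2 sitting behind object 1."""
    return 1.0 if d2 > d1 else 0.2

def semantic_score(l1, l2, d1, d2):
    """Assumed semantic compatibility: a table tends to sit behind a chair."""
    if l1 == "chair" and l2 == "table" and d2 >= d1:
        return 1.0
    return 0.5

# Jointly search over (depth, depth, label, label) configurations and
# keep the one maximizing the combined score, i.e. the MAP estimate.
best = max(
    itertools.product(DEPTHS, DEPTHS, LABELS, LABELS),
    key=lambda c: geometry_score(c[0], c[1]) * semantic_score(c[2], c[3], c[0], c[1]),
)
print("MAP configuration (d1, d2, l1, l2):", best)

The point of the sketch is only the coupling: geometry and semantics are scored together, so the most likely 3D configuration and the most likely labels are chosen jointly rather than in two separate passes.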

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1419433
Program Officer: Jie Yang
Project Start:
Project End:
Budget Start: 2013-10-01
Budget End: 2015-12-31
Support Year:
Fiscal Year: 2014
Total Cost: $259,472
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305