This project develops improved computer vision methods for automatic recognition of arbitrary objects in images from realistic environments. Object recognition is typically performed by fitting a function that maps an image to likely object locations and labels. Such a function is fitted (trained) on a database of example images annotated with human-assigned object locations and labels. This research can lead to more accurate visual perception for socially relevant applications, such as robots that perform household tasks, assist the elderly, respond to disasters, and quickly learn new manufacturing and service skills. It can also provide a common codebase for the wider community, new dataset challenges for domain adaptation problems, dissemination of scientific and technical results and associated courseware, and targeted outreach to ensure broad participation of underrepresented groups.
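
The abstract's framing of recognition as supervised function fitting can be illustrated with a toy sketch. The code below is not the project's method; the name `SimpleDetector`, the nearest-neighbor window scan, and all sizes and labels are purely illustrative assumptions.

```python
# Toy sketch: object recognition as a fitted function f(image) -> (locations, labels),
# trained on example crops with human-assigned labels. Illustrative only.
import numpy as np

class SimpleDetector:
    """Nearest-neighbor classifier over fixed-size image windows (hypothetical)."""

    def fit(self, crops, labels):
        # Store flattened training crops and their human-assigned labels.
        self.X = np.stack([c.ravel() for c in crops]).astype(float)
        self.y = np.asarray(labels)
        return self

    def predict(self, image, window=16, stride=8):
        # Slide a window over the image; label each window by its nearest
        # training example, returning (x, y, label) detections.
        detections = []
        H, W = image.shape[:2]
        for y0 in range(0, H - window + 1, stride):
            for x0 in range(0, W - window + 1, stride):
                patch = image[y0:y0 + window, x0:x0 + window].ravel().astype(float)
                dists = np.linalg.norm(self.X - patch, axis=1)
                detections.append((x0, y0, self.y[np.argmin(dists)]))
        return detections

# Example: fit on a handful of labeled 16x16 crops, then scan a larger image.
rng = np.random.default_rng(0)
crops = [rng.random((16, 16)) for _ in range(10)]
labels = ["cup", "book"] * 5
detector = SimpleDetector().fit(crops, labels)
print(detector.predict(rng.random((64, 64)))[:3])
```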

The specific research agenda is structured around two aims. The first aim is to establish bounds on the coverage of latent physical factors that a dataset must provide for human-level performance on arbitrary domains. The study involves both existing datasets and new datasets generated with graphics rendering techniques at varying degrees of photorealism. The goal is to develop a theory of the physical complexity of a given dataset and of how that complexity affects generalization to real-world object recognition tasks, with respect to a given image representation and learning framework. Physical parameters include, but are not limited to, 3D shape, surface color, texture, background/scene, camera viewpoint, sensor noise, lighting, specularities, and cast shadows. The second aim is to learn image representations that are invariant to some of the physical causes of data bias. The goal is to develop model and representation learning methods that can learn from a combination of real and non-photorealistic synthetic data and are resistant to common sources of data bias. The representations include simple edge-based descriptors and, more generally, hierarchical representations built from layers of convolution and pooling operations.
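
To make the notion of a hierarchical representation built from convolution and pooling layers concrete, here is a minimal numpy sketch under illustrative assumptions: the edge and mixing kernels, the layer count, and the image size are placeholders, not the representations studied in the project.

```python
# Minimal sketch of a two-layer conv + pool feature hierarchy, with a simple
# edge filter as the first-layer descriptor. Illustrative assumptions only.
import numpy as np

def conv2d(image, kernel):
    """Valid 2D filtering (cross-correlation, as in CNN convention) of a single-channel image."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling, discarding any ragged border."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size
    blocks = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

# Layer 1: horizontal-edge descriptor (Sobel-like kernel), then pooling.
edge_kernel = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
# Layer 2: a second small kernel applied to the pooled edge map.
mix_kernel = np.ones((3, 3)) / 9.0

image = np.random.default_rng(0).random((32, 32))
layer1 = max_pool(conv2d(image, edge_kernel))   # edge features at coarser resolution
layer2 = max_pool(conv2d(layer1, mix_kernel))   # deeper, even coarser features
print(layer1.shape, layer2.shape)  # (15, 15) and (6, 6)
```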

Project Start:
Project End:
Budget Start: 2016-09-01
Budget End: 2018-05-31
Support Year:
Fiscal Year: 2017
Total Cost: $55,074
Indirect Cost:
Name: Boston University
Department:
Type:
DUNS #:
City: Boston
State: MA
Country: United States
Zip Code: 02215