This project develops new techniques for visually interpreting an image in a way that specifically leverages large image collections, now common on the web and elsewhere. The research team uses an approach whose performance scales directly with the size of the dataset, unlike many existing approaches to image understanding. The basic approach is to build a copy of a query image by assembling image pieces drawn from a large set of training images, in the manner of a jigsaw puzzle. Each region in the query is then classified by copying the labels of its matched regions. The larger the training set, the more jigsaw pieces there are to choose from, and thus the more accurate the match.
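The matching-and-label-copying idea above can be sketched in a few lines. This is a minimal illustration, not the project's actual method: the feature vectors, labels, and nearest-neighbor distance are toy assumptions standing in for real region descriptors and a large training collection.

```python
# Minimal sketch of nonparametric label transfer: each query region is
# matched to its nearest training region (here, by squared Euclidean
# distance between toy 2-D feature vectors), and the matched region's
# semantic label is copied to the query region.

def nearest_label(query_feat, training_set):
    """Return the label of the training region closest to query_feat."""
    best_label, best_dist = None, float("inf")
    for feat, label in training_set:
        d = sum((q - t) ** 2 for q, t in zip(query_feat, feat))
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Toy training regions: (feature vector, semantic label).
training = [
    ([0.9, 0.1], "sky"),
    ([0.2, 0.8], "grass"),
    ([0.5, 0.5], "building"),
]

# Label every region of a toy query image by copying matched labels.
query_regions = [[0.85, 0.15], [0.25, 0.75]]
labels = [nearest_label(q, training) for q in query_regions]
print(labels)  # ['sky', 'grass']
```

A larger training set simply adds more (feature, label) pairs to search over, which is why accuracy improves with dataset size.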

The initial work of the project focuses on developing efficient methods for performing the matching that allow the incorporation of various desirable constraints. The approach is then extended to handle training data with incomplete labels -- important since few datasets have labels for every region. The research plan also includes building better embeddings for the regions, which place semantically similar regions closer together than current representations do, and developing efficient binary matching schemes built on those embeddings.
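Binary matching schemes of the kind mentioned above typically reduce each region embedding to a short binary code so that comparisons become cheap Hamming-distance computations. The sketch below uses sign-of-projection hashing with fixed hyperplanes chosen purely for illustration; in practice the projections would be random or learned, and this is an assumption rather than the project's specific scheme.

```python
# Hedged sketch of binary matching: map a feature vector to a short
# binary code (one bit per hyperplane: 1 if the projection is positive),
# then compare codes by Hamming distance. Hamming distance on compact
# codes is far cheaper than Euclidean distance on full embeddings,
# which matters when searching very large image collections.

# Illustrative hyperplanes, fixed for determinism.
PLANES = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def binary_code(feat):
    """One code bit per hyperplane: sign of the projection."""
    return tuple(int(sum(p * f for p, f in zip(plane, feat)) > 0)
                 for plane in PLANES)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))

query = binary_code([1.0, 0.2, 0.1, 0.0])    # -> (1, 1, 1, 0)
near = binary_code([0.9, 0.25, 0.1, 0.05])   # -> (1, 1, 1, 1)
far = binary_code([-1.0, 0.8, -0.3, 0.9])    # -> (0, 1, 0, 1)
print(hamming(query, near), hamming(query, far))  # 1 3
```

Similar feature vectors tend to fall on the same side of most hyperplanes and so receive nearby codes, which is what makes Hamming distance a usable proxy for similarity in the embedding space.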

Robust techniques for visual recognition have widespread applicability in areas such as image search, robotics and surveillance. The project also involves extensive outreach activities, including high-school internships and the organization of a NY-area vision day for students and researchers.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1149633
Program Officer: Jie Yang
Budget Start: 2012-03-01
Budget End: 2019-02-28
Fiscal Year: 2011
Total Cost: $499,999
Name: New York University
City: New York
State: NY
Country: United States
Zip Code: 10012