The 40-year history of object recognition in computer vision has been dominated by bottom-up approaches, in which local features are first detected in an image and then matched to geometric models of objects. The proposed project investigates methods that formulate the object recognition problem as a single overall optimization rather than as successive stages of feature detection and matching. The approach combines bottom-up information about the appearance of local image patches with top-down information about geometric relations between those patches. The main focus is on recognizing generic classes of objects such as bicycles, people, motorbikes, or cars. Each object class is modeled as a collection of parts arranged in a deformable configuration, where certain pairs of parts are connected by springs. Recognition is formulated as energy minimization: there is a cost for placing each part at each possible location in the image, and a cost for placing pairs of parts in a manner that stretches the springs connecting them.
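
The energy described above can be written compactly. The following is a sketch in commonly used pictorial-structures notation; the symbols are assumptions introduced here and do not appear in the abstract. L = (l_1, ..., l_n) denotes the image locations of the n parts, m_i(l_i) the appearance cost of placing part i at location l_i, d_ij(l_i, l_j) the cost of stretching the spring between parts i and j, and E the set of connected pairs.

```latex
L^{*} = \arg\min_{L} \left( \sum_{i=1}^{n} m_i(l_i) \;+\; \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right)
```

The first sum carries the bottom-up appearance information and the second carries the top-down geometric information; minimizing both together is what distinguishes this formulation from detecting features first and matching them afterward.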

Such an energy minimization formulation was proposed in the 1970s under the name Pictorial Structures, but was abandoned due to its computational complexity. Recent algorithmic advances have made it possible to investigate this kind of approach further. Initial results on detecting and localizing objects have been promising, but they also demonstrate how much remains to be done before this approach can form a viable alternative to feature-based object recognition. The proposed project investigates some of the key initial questions in determining whether it can, including how to learn such models with minimal supervision and how to incorporate global geometric information, such as object scale and orientation, into the models.

The proposed approach computes cost maps that determine how well each part matches at each possible location in the image. These cost maps are then combined in the energy minimization process. In contrast, traditional feature detection approaches find a small number of locations where each feature or part might be present in the image. While the sparse nature of feature locations may seem to require less computation than working with entire cost maps, the need to handle spurious and missed feature detections in fact makes such feature-based methods quite computationally intensive.
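
How dense cost maps might be combined in a single minimization can be illustrated with a small sketch. It is not the project's implementation. It assumes a star-shaped model (one root part connected by a spring to each other part), quadratic spring costs, and a brute-force search over all location pairs. The function name `match_star_model` and the parameters `offsets` and `stiffness` are hypothetical, chosen only for this example.

```python
import numpy as np

def match_star_model(cost_maps, offsets, stiffness=1.0):
    """Minimize appearance + spring energy for a star-shaped part model.

    cost_maps[0] is the root part's cost map; cost_maps[k] (k >= 1) is a child's.
    offsets[k-1] is the ideal (row, col) displacement of child k from the root.
    Returns (best_energy, root_location, child_locations).
    """
    root_cost = np.asarray(cost_maps[0], dtype=float)
    h, w = root_cost.shape
    rows, cols = np.mgrid[0:h, 0:w]  # coordinates of every candidate location
    total = root_cost.copy()         # energy as a function of root location
    best_child = []                  # per child: best flat index for each root location

    for child_cost, (dr, dc) in zip(cost_maps[1:], offsets):
        child_cost = np.asarray(child_cost, dtype=float)
        # spring[r, c, r2, c2]: squared stretch with root at (r, c), child at (r2, c2)
        spring = stiffness * (
            (rows[None, None] - (rows[:, :, None, None] + dr)) ** 2
            + (cols[None, None] - (cols[:, :, None, None] + dc)) ** 2
        )
        # Add the child's dense cost map to the spring cost, then minimize
        # over every child location for each root location.
        combined = (child_cost[None, None] + spring).reshape(h, w, -1)
        total += combined.min(axis=2)
        best_child.append(combined.argmin(axis=2))

    root = np.unravel_index(total.argmin(), total.shape)
    children = [
        tuple(int(v) for v in np.unravel_index(b[root], (h, w)))
        for b in best_child
    ]
    return float(total[root]), tuple(int(v) for v in root), children


# Toy example: the root matches best at (1, 1), the child at (1, 3),
# and the spring prefers the child two columns to the right of the root.
root_map = np.ones((5, 5)); root_map[1, 1] = 0.0
child_map = np.ones((5, 5)); child_map[1, 3] = 0.0
energy, root_loc, child_locs = match_star_model([root_map, child_map], [(0, 2)])
```

Because every location of every part contributes a cost, no part is ever "missed" or "spuriously detected"; a poor match simply has a high cost. The brute-force search here grows quadratically with the number of image locations per spring and is suitable only for illustration.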

Project URL: www.cs.cornell.edu/~dph/simulrec/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0629447
Program Officer: Daniel F. DeMenthon
Project Start:
Project End:
Budget Start: 2006-05-01
Budget End: 2007-10-31
Support Year:
Fiscal Year: 2006
Total Cost: $100,000
Indirect Cost:
Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850