The primary goal of this project is to achieve a qualitative improvement in the robustness of object category recognition and localization by formulating recognition as a single overall estimation problem. In contrast, most current approaches rely on successive stages of processing, in which individual features are first detected and then combined in order to detect objects. A central focus of the project is not only to determine which objects are present in an image but also to localize those objects and their subparts. Objects are modeled as a collection of local patches arranged in a deformable configuration, where certain pairs of parts are connected by spring-like connections. These models provide a way of exploiting local contextual information, delaying decisions about the presence or absence of individual features until more is known about other features and the spatial relations between them. Such models can further be adapted to the larger problem of representing scene-level context, encoding both the context immediately around an object and longer-range relationships between objects in a scene. This project is investigating both the use of local context to improve detection of features and objects, and the use of scene context to improve the detection of objects and relations between objects, within a single overall optimization-based framework.
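The part-based model described above can be sketched as an energy minimization: each part pays a local match cost for where it is placed, plus a quadratic "spring" cost between connected parts, and the whole configuration is optimized jointly rather than part by part. The following minimal sketch (not the project's actual implementation; all function names, costs, and parameters are hypothetical) illustrates the idea on a chain of parts in one dimension, where dynamic programming finds the globally optimal placement.

```python
# Sketch of a pictorial-structures-style model: parts are placed at
# discrete locations; each placement pays a local match cost plus a
# quadratic "spring" cost between consecutive parts. Because the parts
# form a chain (a tree), dynamic programming finds the globally optimal
# configuration, so no early, irrevocable per-part decisions are made.
# All numbers and names are hypothetical, for illustration only.

def best_configuration(match_cost, spring_weight, rest_length, locations):
    """match_cost[p][l]: cost of placing part p at locations[l].
    The spring between consecutive parts prefers spacing == rest_length.
    Returns (placed locations, total energy of the optimum)."""
    n_parts = len(match_cost)
    n_locs = len(locations)
    # cost[l] = best total cost of parts 0..p, with part p at location l
    cost = list(match_cost[0])
    back = []  # backpointers for recovering the optimal placement
    for p in range(1, n_parts):
        ptr = [0] * n_locs
        new_cost = [0.0] * n_locs
        for l in range(n_locs):
            best, arg = float("inf"), 0
            for k in range(n_locs):
                stretch = locations[l] - locations[k] - rest_length
                c = cost[k] + spring_weight * stretch * stretch
                if c < best:
                    best, arg = c, k
            new_cost[l] = best + match_cost[p][l]
            ptr[l] = arg
        cost, back = new_cost, back + [ptr]
    # Backtrack from the best final location to recover all placements.
    l = min(range(n_locs), key=lambda i: cost[i])
    placement = [l]
    for ptr in reversed(back):
        l = ptr[l]
        placement.append(l)
    placement.reverse()
    return [locations[i] for i in placement], min(cost)
```

In a toy run with two parts on locations [0, 1, 2, 3], where part 0 locally prefers location 0 and part 1 locally prefers location 3, a stiff spring with rest length 1 pulls the joint optimum to the locations 0 and 1: the locally best choice for part 1 is overruled by the spatial relation, which is exactly the kind of delayed, context-aware decision the models above are meant to support.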

Accurate recognition and localization of objects is of central importance for applications and systems that use computer vision to interact with the world, such as mobile robots, autonomous vehicles, interactive games, animation and film-making, tele-operation for hazardous situations, and remote surgery. In such applications, a computer vision system must not only determine whether objects are present in a scene, but also identify where the objects are and what pose or configuration they are in. For instance, detecting a pedestrian for an automotive safety system should also inform the car and driver where the pedestrian is located. In applications such as tele-operation and interactive games, further detail about a person's pose and gestures is required to enable hands-free control of complex systems. This project seeks to advance the capability of such systems by taking an approach based on simultaneously combining multiple sources of information into a single overall decision, rather than making multiple smaller decisions that are each potentially error-prone.

Progress on this project will be regularly reported at http://www.cs.cornell.edu/~dph/context/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0713185
Program Officer: Jie Yang
Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $448,819
Indirect Cost:
Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850