This proposal addresses the problem of recognizing three-dimensional (3D) objects in photographs and image sequences. It revisits viewpoint invariants as a local representation of shape and appearance. The key insight is that, although smooth surfaces are almost never planar in the large, and thus do not (in general) admit global invariants, they are always planar in the small (that is, a sufficiently small surface patch can always be treated as a set of coplanar points) and can thus be represented locally by planar invariants. This is the basis for a new, unified approach to object recognition in which an object model consists of a collection of small (planar) patches, their invariants, and a description of their 3D spatial relationships. Specifically, the local invariants used in this proposal are the affine-invariant descriptions of the image brightness pattern in the neighborhood of interest points recently developed by Lindeberg and Garding and by Mikolajczyk and Schmid. These affine-invariant patches provide a normalized representation of the local object appearance, invariant under viewpoint and illumination changes, that can be used as a local measure of image, part, or object similarity. The spatial relationships between local invariants are used to represent the global object structure and drive the recognition process. The proposed approach is applied to four fundamental instances of the 3D object recognition problem: (1) modeling rigid 3D objects from a small set of unregistered pictures and recognizing them in cluttered photographs taken from unconstrained viewpoints; (2) representing and recognizing non-uniform texture patterns under non-rigid transformations; (3) modeling and recognizing articulated objects in image sequences, with applications to the identification of shots that depict the same scene (shot matching) in video clips; and (4) learning and recognizing part-based descriptions of object classes in photographs and video clips.
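As a toy illustration of why a locally planar patch admits an affine-invariant description, the sketch below normalizes a small 2D point set by its own second-moment (covariance) matrix. It is not the Lindeberg-Garding or Mikolajczyk-Schmid construction, which operates on image brightness patterns; it works on point coordinates only. The function names `affine_invariant_gram` and `apply_affine` are hypothetical and introduced here purely for illustration.

```python
def affine_invariant_gram(points):
    """Return G[i][j] = (p_i - m)^T C^{-1} (p_j - m) for a 2D point set.

    m is the centroid and C the covariance of the points. Under an
    invertible affine map p -> A p + t, C becomes A C A^T, so every
    entry of G is unchanged. (Illustrative assumption: a patch is
    summarized by point coordinates, not by brightness values.)
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 covariance matrix C.
    sxx = sum(u * u for u, _ in centered) / n
    sxy = sum(u * v for u, v in centered) / n
    syy = sum(v * v for _, v in centered) / n

    det = sxx * syy - sxy * sxy
    if det <= 1e-12:
        raise ValueError("points are (nearly) collinear; C is not invertible")

    # Closed-form inverse of the symmetric 2x2 matrix C.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det

    def form(p, q):
        # Bilinear form p^T C^{-1} q.
        return p[0] * (ixx * q[0] + ixy * q[1]) + p[1] * (ixy * q[0] + iyy * q[1])

    return [[form(p, q) for q in centered] for p in centered]


def apply_affine(points, a, t):
    """Apply p -> a p + t, with a a 2x2 nested list and t a 2-tuple."""
    return [
        (a[0][0] * x + a[0][1] * y + t[0], a[1][0] * x + a[1][1] * y + t[1])
        for x, y in points
    ]


if __name__ == "__main__":
    patch = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (4.0, 1.0)]
    warped = apply_affine(patch, [[2.0, 1.0], [0.5, 3.0]], (5.0, -7.0))
    g0 = affine_invariant_gram(patch)
    g1 = affine_invariant_gram(warped)
    worst = max(abs(a - b) for r0, r1 in zip(g0, g1) for a, b in zip(r0, r1))
    print("largest difference between the two signatures:", worst)
```

The two signatures agree up to floating-point error because the covariance of the warped points absorbs the affine map exactly. This holds only while the patch is small enough for the planar (affine) approximation to apply, which is the regime the proposal targets.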
Intellectual Merit: The main scientific contributions of the proposed project will be (a) a unified framework for 3D object recognition that combines the advantages of geometric and appearance-based approaches to recognition; (b) fundamental advances in object recognition technology in the four target domains, including the wide-open problem of category-level recognition; and (c) effective algorithms for a number of practical applications, including shot matching in video analysis. Large, representative datasets will be gathered for each of the problems addressed in the project. They will be used to systematically evaluate the algorithms developed in its course, and will be made available to the computer vision community at large on the World Wide Web.
Broader Impacts: With the ever-expanding array of imagery sources, some form of automatic object recognition technology must eventually become an integral part of every information system. However, today's recognition systems are still largely unable to handle the extraordinarily wide range of appearances assumed by common objects in typical images, and fundamental advances are needed before 3D object recognition fulfills its potential as a critical enabling technology in domains such as surveillance and security, image retrieval and data mining, and video analysis and annotation. The research conducted in this project will be a stepping stone in that direction.