With the ever-growing number of images and videos, the main bottleneck in extracting the information they contain is their analysis (indexing) and retrieval. Today's image and video search engines are based on textual descriptions, since visual cues are at too low a level to provide useful retrieval results for a large variety of images and videos. For example, when a human submits a query image with the request to find similar images, she focuses on a certain object or group of objects in the query image; the meaning of similarity is thus given by images that contain similar objects. Extraction of objects in images (and videos) is therefore a key factor for true progress in content-based image/video retrieval (CBIR). However, object extraction remains an unsolved problem in Computer Vision (CV). This fact has led to the development of a huge number of approaches that attempt CBIR without object extraction. Although such approaches may be successful in some restricted application domains, where low-level features can substitute for object extraction, they have not been successful in general-purpose CBIR. The PIs believe that solving the object extraction problem will lead to a breakthrough in CBIR, and therefore propose to work on object extraction in images.

There have been many attempts to solve the object extraction problem in CV, and none has provided a satisfactory solution. Why will the proposed approach do better? A new methodology and computational framework developed by the PIs provide solid evidence that a breakthrough in object extraction is possible. On the cognitive and geometric modeling side, the PIs propose to use higher-level knowledge of shape similarity and mid-level knowledge of local and global symmetry as cognitively motivated constraints for object extraction. Constraints are essential because object extraction is known to be an ill-posed inverse problem. The human visual system solves this problem very well, and we are getting close to a full understanding of how this is done.

On the computational side, the PIs propose a new framework for the simultaneous estimation of medial axes and contours. The approach is inspired by SLAM (Simultaneous Localization and Mapping) approaches in the field of robot mapping, where recent breakthrough solutions are based on SLAM computation with particle filters. SLAM computation iterates between localizing the robot in the existing partial map (trajectory estimation) and updating the map based on new observations and the estimated trajectory. The PIs treat the medial axis as the trajectory of a virtual robot and the partial boundary as the map, composed of edge segments associated with the medial axis. A first successful application of this framework is demonstrated in the PIs' preliminary results.
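To make the SLAM analogy concrete, the Python sketch below shows the generic particle-filter loop that such a framework would adapt: each particle carries a hypothesized medial-axis state, the motion model extends the axis (localization), and the observation model reweights hypotheses by nearby edge evidence (map update). The functions `extend_axis` and `edge_support` are hypothetical stand-ins, not the project's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def extend_axis(state, rng):
    """Hypothetical motion model: grow the medial axis by one unit step
    with a small random change of heading (the 'virtual robot' moves)."""
    x, y, theta = state
    theta = theta + rng.normal(0.0, 0.2)          # perturb heading
    return np.array([x + np.cos(theta), y + np.sin(theta), theta])

def edge_support(state, edges):
    """Hypothetical observation likelihood: how strongly nearby edge
    fragments support a boundary at this medial-axis point."""
    d = np.min(np.hypot(edges[:, 0] - state[0], edges[:, 1] - state[1]))
    return np.exp(-d ** 2 / 2.0)                  # closer edges -> higher weight

def particle_filter_step(particles, weights, edges, rng):
    """One SLAM-style iteration: (1) extend each hypothesized axis,
    (2) reweight by edge evidence, (3) resample toward good hypotheses."""
    particles = np.array([extend_axis(p, rng) for p in particles])
    weights = weights * np.array([edge_support(p, edges) for p in particles])
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy run: edge fragments scattered along the horizontal line y = 0.
edges = np.column_stack([np.linspace(0, 20, 100), rng.normal(0, 0.1, 100)])
particles = np.tile([0.0, 0.0, 0.0], (50, 1))     # all start at the origin
weights = np.full(50, 1.0 / 50)
for _ in range(15):
    particles, weights = particle_filter_step(particles, weights, edges, rng)
print("estimated axis endpoint:", particles.mean(axis=0)[:2])
```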

Project URL: http://knight.cis.temple.edu/~shape/

Project Report

Object detection and recognition is one of the main skills that will allow computers and robots to understand images and their surroundings the way humans do. While color and appearance are important features that characterize objects, shape appears to be the dominant feature for object recognition. The project contributed significantly to recent progress in shape-based object recognition. To detect object contours, we developed a novel statistical framework for simultaneous medial axis estimation and contour grouping based on particle filters, and we extended this inference framework to other tasks such as learning deformable object models.

We also proposed a novel framework for contour-based object detection. Compared to previous work, our contribution is threefold:

1) A novel shape matching scheme suitable for partial matching of edge fragments. The shape descriptor has the same geometric units as shape context, but our shape representation is not histogram based.

2) Grouping of partial matching hypotheses into object detection hypotheses, expressed as maximum clique inference on a weighted graph.

3) A novel local affine transformation that utilizes holistic shape information for scoring and ranking shape similarity hypotheses.

Consequently, each detection result not only identifies the location of the target object in the image but also provides a precise location of its contours, since we transform a complete model contour to the image. We further proposed a novel inference framework for finding maximum weight cliques in weighted graphs; beyond contour-based object detection, it has been applied to other problems in computer vision, including feature point matching and dense cluster detection (see the greedy sketch below). Finally, we considered shape-based similarity of objects in images, with a focus on context-sensitive shape similarity, and proposed a new approach for learning such similarity measures that utilizes diffusion on the tensor product graph of the shape graph with itself (also sketched below).
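As a rough illustration of the clique-based grouping idea, the following sketch greedily grows a heavy clique in a weighted graph whose vertices are matching hypotheses and whose edge weights encode pairwise compatibility. This is a simple greedy heuristic for exposition, not the project's actual maximum weight clique inference algorithm; the scoring rule and toy data are assumptions.

```python
import numpy as np

def greedy_max_weight_clique(W):
    """Greedy heuristic for a heavy clique in a graph with a symmetric,
    non-negative weight matrix W (W[i, j] = 0 means no edge).
    Returns a list of vertex indices forming a clique."""
    n = W.shape[0]
    # Seed with the two endpoints of the heaviest edge.
    i, j = np.unravel_index(np.argmax(W), W.shape)
    clique = [i, j]
    candidates = set(range(n)) - set(clique)
    while candidates:
        # Keep only vertices connected to every current clique member.
        candidates = {v for v in candidates
                      if all(W[v, c] > 0 for c in clique)}
        if not candidates:
            break
        # Add the candidate contributing the most total edge weight.
        best = max(candidates, key=lambda v: sum(W[v, c] for c in clique))
        clique.append(best)
        candidates.discard(best)
    return sorted(clique)

# Toy compatibility matrix for 5 hypotheses: vertices 0, 1, 2 are
# mutually compatible; 3 and 4 are only weakly connected to each other.
W = np.zeros((5, 5))
for a, b, w in [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (3, 4, 0.3)]:
    W[a, b] = W[b, a] = w
print(greedy_max_weight_clique(W))   # -> [0, 1, 2]
```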
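The tensor product graph (TPG) diffusion mentioned above can be computed without ever forming the quadratically larger product graph, since diffusion with adjacency P ⊗ P on vec(Q) is equivalent to the matrix iteration Q ← P Q Pᵀ + I. The minimal sketch below assumes a row-normalized affinity matrix shrunk by a factor alpha < 1 so the iteration is a contraction; this normalization choice is an assumption for illustration, not the project's exact recipe.

```python
import numpy as np

def tpg_diffusion(W, alpha=0.9, iters=100):
    """Context-sensitive affinity learning by diffusion on the tensor
    product graph of W with itself, computed implicitly via
    Q <- P Q P^T + I, where P = alpha * row_normalize(W). Converges
    because the spectral radius of P (hence of P kron P) is below 1."""
    P = alpha * (W / W.sum(axis=1, keepdims=True))
    Q = np.eye(len(W))
    for _ in range(iters):
        Q = P @ Q @ P.T + np.eye(len(W))
    return Q

# Toy affinity over 4 shapes: two tight pairs (0,1) and (2,3) joined by
# a weak bridge; diffusion propagates similarity through context paths.
W = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.1, 0.0],
              [0.1, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
Q = tpg_diffusion(W)
print(np.round(Q / Q.max(), 2))   # learned, context-sensitive similarities
```

Note how the learned matrix Q raises the similarity between shapes 0 and 3, which share no direct edge, because context flows through the bridge; this is the context-sensitivity the report describes.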

Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $210,000
Indirect Cost:
Name: Temple University
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19122