This project is about automated, visual object recognition. It is aimed at a computational approach that has two parts. First, the approach learns whether a given set of previously unseen images, say supplied by a user, contains any dominant themes, namely subimages that occur frequently and look similar. Such themes and the associated subimages are called categories and objects, respectively. Second, given a set of categories automatically inferred during the aforementioned training, and a new, test image, the approach recognizes all occurrences in the image of objects belonging to any of the learned categories. It delineates each such object in the image and labels it with its category name. Neither learning nor subsequent recognition requires human supervision.

The subimages defining a category can be small or large, simple or complex. It is reasonable to expect that low-complexity categories, e.g., those containing small, few, or simple subimages, are more common in real-world images. For example, the simple category of elongated shapes occurs as a part of legged animals, stools, and scissors. More complex categories consist of large, many, or complicated regions and are less common. Simple categories, e.g., the ``leg'', are thus shared by more complex ones, e.g., all legged animals, and, in turn, the ``leg'' is an articulated combination of the category of elongated shapes (limbs). Therefore, category representation can be made easier by expressing a category as a configuration of simpler categories, instead of subimages directly, thus yielding a hierarchical, subpart model (a toy sketch of such a representation follows this summary). Accordingly, the proposed approach learns and recognizes categories as image hierarchies.

The use of hierarchical embedding of regions as the defining image features gives the proposed approach several advantages over other existing methods, which mostly use local features: (1) The proposed approach requires no supervision, e.g., labeling or segmenting of training images, or other input parameters from the user. (2) It simultaneously provides category detection and high-accuracy segmentation. (3) Training is feasible with very few examples, and not all training images must contain objects from the categories. (4) The use of hierarchical models makes explicit the relationship of a specific category to other categories of similar, lower, and higher complexities; it also serves as a semantic explanation of why a category is detected when it is detected.

Expected major contributions of the work include computational formulations of: (1) accurate extraction of image regions; (2) image representation by a connected segmentation tree; (3) robust image matching amidst structural noise in images; (4) unsupervised extraction of hierarchical category models; (5) efficient recognition of a large number of categories; (6) unsupervised estimation of the relevance weights of subcategory detections to category recognition; and (7) generalization of the proposed approach to the extraction of texture elements, as an example of how the proposed work may impact other challenging vision problems involving hierarchy.
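To make the subpart idea concrete, the following is a minimal sketch of how a category might be represented as a configuration of simpler categories. The class name, fields, and the example hierarchy are illustrative assumptions, not the project's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Category:
    """A category is either a primitive region shape or a configuration of
    simpler subcategories, each with a rough relative placement (dx, dy)."""
    name: str
    parts: List[Tuple["Category", Tuple[float, float]]] = field(default_factory=list)

    def is_primitive(self) -> bool:
        return not self.parts

    def all_subcategories(self) -> List["Category"]:
        """Enumerate every simpler category embedded in this one."""
        found = []
        for sub, _offset in self.parts:
            found.append(sub)
            found.extend(sub.all_subcategories())
        return found


# Example: a ``leg'' as an articulated pair of elongated shapes, and a
# legged animal as a body that shares four ``leg'' parts.
elongated = Category("elongated shape")
leg = Category("leg", [(elongated, (0.0, 0.0)), (elongated, (0.0, 1.0))])
animal = Category("legged animal",
                  [(Category("body"), (0.0, 0.0))] +
                  [(leg, (dx, 1.0)) for dx in (-1.0, -0.5, 0.5, 1.0)])

print([c.name for c in animal.all_subcategories()])
```

Recognition in such a model amounts to matching this hierarchy against the region hierarchy of a test image, which is why the relationships between categories of different complexities become explicit.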
The progress made on this project can be seen at the website: http://vision.ai.uiuc.edu/ahuja.html
PROJECT OUTCOMES

1. Work Done

Major activities under this research are aimed at automatic discovery, learning, and recognition of objects. The work done falls into four major parts. The first three parts focus on unsupervised-learning-based solutions to three central problems, and the fourth part uses the quality of low-level image segmentation to develop a novel approach to compressive sampling of images.

Local Isotropy Based Clustering: An unsupervised, nonparametric data clustering algorithm has been developed for point set clustering in feature space (e.g., the space defined by segment properties extracted from image segmentation). A cluster is defined as a set of contiguous interior points surrounded by border points. These points are identified by testing the uniformity of their spherical neighborhoods in high dimensions. An approach to integrating border points and interior points into clusters has been proposed (a toy sketch of this interior/border labeling follows this section). The approach outperforms other popular methods on commonly used benchmark datasets.

Region Based Texture Analysis: Clustering algorithms are useful when the feature space contains clusters, and these clusters are appropriately related to the underlying meaningful structure of the data and, thereby, to the image classes of interest. One approach to learning a useful feature space is, for a given set of features, to find the best relative weights of the features so as to enhance the underlying structure. An approach to such weight learning has been developed and applied to the problem of recognizing dynamic textures, such as videos of smoke, rain, etc. (a toy weight-search sketch also follows this section). Given an arbitrary image, an approach has also been developed to identify the building blocks of a given image texture, or texels, and to segment a multi-texture image into subimages corresponding to the distinct textures contained within the image. A texture is represented by the spatial repetition of its elementary, defining patterns, called texels, and the textured parts are distinguished from other subimages containing highly random or relatively smooth variations of pixel values.

From Regions to Higher Level Contents: Most of the methods for visual learning developed by the PI's group use regions as image primitives to learn representations. They have therefore evaluated the semantic information content of multiscale, low-level image segmentation in past work. To do this, they use selected features of segmentation for semantic classification of real images. To estimate the relative information content of these features, they compare the classification results obtained using them with those obtained by others using the commonly used patch/grid based features. The algorithms they have developed outperform previously reported results on a publicly available scene classification dataset. These results suggest further experimentation in evaluating the promise of low-level segmentation for image classification. Their work also includes analysis of multiple image frames in a long video sequence or in only an image pair. For example, a video sequence is analyzed to identify periodic motions, estimate their parameters, and segment the moving objects. The multiscale low-level segmentation is also used to match regions across an image pair.
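The following is a minimal, illustrative sketch of the interior/border idea referred to above, assuming a simple isotropy test (the mean-offset criterion, the neighborhood size, the threshold, and the function name are assumptions for illustration, not the project's actual formulation):

```python
import numpy as np
from scipy.spatial import cKDTree


def isotropy_clusters(points, k=15, iso_thresh=0.3):
    """Toy interior/border labeling via local isotropy.

    A point is treated as 'interior' if the mean offset of its k nearest
    neighbors is small relative to the mean neighbor distance (i.e., the
    neighbors surround it roughly uniformly); otherwise it is a 'border'
    point. Interior points are linked into clusters through shared
    neighborhoods, and border points are attached to nearby clusters.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    tree = cKDTree(points)
    dists, nbrs = tree.query(points, k=k + 1)   # nearest "neighbor" is the point itself
    dists, nbrs = dists[:, 1:], nbrs[:, 1:]

    # Isotropy test: norm of the mean neighbor offset / mean neighbor distance.
    offsets = points[nbrs] - points[:, None, :]
    asymmetry = np.linalg.norm(offsets.mean(axis=1), axis=1) / (dists.mean(axis=1) + 1e-12)
    interior = asymmetry < iso_thresh

    # Union-find to merge interior points that appear in each other's neighborhoods.
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in np.where(interior)[0]:
        for j in nbrs[i]:
            if interior[j]:
                parent[find(i)] = find(j)

    labels = np.full(n, -1)
    roots = {}
    for i in np.where(interior)[0]:
        labels[i] = roots.setdefault(find(i), len(roots))

    # Attach each border point to the cluster of its nearest interior neighbor
    # (it remains unclustered if none of its k neighbors is interior).
    for i in np.where(~interior)[0]:
        lab = [labels[j] for j in nbrs[i] if interior[j]]
        labels[i] = lab[0] if lab else -1
    return labels
```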
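The next sketch illustrates the feature-weight idea in the texture work: searching for per-feature weights that sharpen the structure of the data in the weighted feature space. The use of the silhouette index as the structure score and of a random local search are stand-ins chosen for brevity, not the method actually developed.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def learn_feature_weights(X, labels, n_iter=200, step=0.1, seed=0):
    """Random local search for per-feature weights that sharpen cluster
    structure, scored here by the silhouette index (a stand-in criterion).
    X: (n_samples, n_features) feature matrix; labels: class or cluster ids.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    w = np.ones(d) / d
    best = silhouette_score(X * w, labels)
    for _ in range(n_iter):
        # Perturb the weights, keep them positive, and renormalize their scale.
        cand = np.clip(w + step * rng.standard_normal(d), 1e-3, None)
        cand /= cand.sum()
        score = silhouette_score(X * cand, labels)
        if score > best:
            w, best = cand, score
    return w, best
```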
Region Based Compressive Sampling: This work does not directly address the recognition problem but shows how the same segmentation structure used for recognition can also serve a useful purpose for concise image representation in other contexts, in this case compressive sampling. Compressive sampling is aimed at acquiring a signal or image from data that is deemed insufficient by the standard approach based on the Nyquist/Shannon sampling theorem. The main idea is to recover a signal from limited measurements by exploiting the prior knowledge that the signal is sparse or compressible in some domain; in our case the domain is expressed in terms of the segmentation (a toy sketch of such segmentation-domain recovery appears at the end of this report).

2. Intellectual Merit

Overall, this work has contributed to the field of Automated Object Recognition by using a novel representation of objects that explicitly captures the spatial and topological structure of object parts, and has extended the use of this representation to the field of Compressive Sampling. Clustering is enhanced significantly by representing the spatial structure defined by the point locations, instead of using just the usual interpoint distances. Experiments conducted on several real-world image sets demonstrate that accounting for the structural properties of regions is critical for processing both static and dynamic textures, relating multiple views of the same object across images, relating segmentation to scene classes, and compressive sampling, and that it leads to performance competitive with the state of the art on benchmark datasets. Three datasets have been created for the use of the research community, for testing: 1. object discovery and recognition algorithms; 2. low-level segmentation algorithms; 3. dynamic texture recognition.

3. Broader Impact

The work has contributed to Statistical and Structural Pattern Recognition, to related aspects of Machine Learning, and to Image Acquisition.

4. Human Resources

Five Ph.D. students have been partially supported by the grant, in different semesters.
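The toy sketch below illustrates the segmentation-domain sparsity idea referenced in the compressive sampling discussion above: if a signal is approximately constant within each segment, the unknowns reduce to one value per segment, so far fewer measurements than pixels suffice. The piecewise-constant assumption, the least-squares solver, and the synthetic data are illustrative simplifications, not the project's actual formulation.

```python
import numpy as np


def recover_from_segmentation(A, y, seg_labels):
    """Toy segmentation-domain recovery.

    A          : (m, n) measurement matrix (m << n random projections)
    y          : (m,)   measurements, y = A @ x
    seg_labels : (n,)   segment index of each pixel (from low-level segmentation)

    Assuming the signal is constant within each segment, only the per-segment
    values are unknown, and least squares on those few unknowns recovers x.
    """
    seg_labels = np.asarray(seg_labels)
    K = seg_labels.max() + 1
    # Indicator matrix S: pixel i belongs to segment seg_labels[i].
    S = np.zeros((A.shape[1], K))
    S[np.arange(A.shape[1]), seg_labels] = 1.0
    # Solve min_c || A S c - y ||^2, then lift back to pixel space.
    c, *_ = np.linalg.lstsq(A @ S, y, rcond=None)
    return S @ c


# Tiny demonstration with a synthetic piecewise-constant "image".
rng = np.random.default_rng(0)
n, m = 400, 60
seg = np.repeat(np.arange(8), n // 8)              # 8 segments of 50 pixels each
x_true = np.repeat(rng.uniform(0, 1, 8), n // 8)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = recover_from_segmentation(A, A @ x_true, seg)
print("max reconstruction error:", np.abs(x_hat - x_true).max())
```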