Object categorization and segmentation are arguably among the most important and challenging problems in computer vision. Although the two problems are clearly related, most of the existing literature treats them separately. This project bridges that gap by developing algorithms for joint categorization and segmentation of objects that simultaneously exploit category-level (top-down) and pixel-level (bottom-up) information. The project develops a novel graph-theoretic paradigm that combines principles from conditional random fields with sparse representation theory. In this framework, each semantic region is represented in terms of an over-complete dictionary of objects, object parts, subparts, and superpixels. To estimate the segmentation and a sparse representation for each region simultaneously, the research team defines an energy function for the random field that includes new higher-order potentials, obtained as the output of a classifier applied to the sparse representation of a segmented region. The team also explores methods based on structured-sparse dictionary learning and latent support vector machines for learning the dictionaries and the classifier parameters, and investigates efficient discrete optimization techniques for minimizing the new energies that result from combining structured-sparse models with different classifiers.
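One way to picture the kind of energy described above is the following sketch; the notation is illustrative and assumed here, not taken from the project itself (unary terms $\psi_i$, pairwise terms $\psi_{ij}$, candidate regions $\mathcal{S}$, region features $\mathbf{y}_s$, an over-complete dictionary $D$, and a classifier score $f$ that supplies the higher-order potential):

```latex
% Illustrative sketch (assumed notation): a CRF energy whose higher-order
% potentials come from a classifier applied to a region's sparse code.
\begin{align}
E(\mathbf{x}) &= \sum_{i} \psi_i(x_i)
  + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
  + \sum_{s \in \mathcal{S}} \psi_s(\mathbf{x}_s), \\
\psi_s(\mathbf{x}_s) &= f(\boldsymbol{\alpha}_s),
\qquad
\boldsymbol{\alpha}_s = \arg\min_{\boldsymbol{\alpha}}
  \tfrac{1}{2}\,\lVert \mathbf{y}_s - D\boldsymbol{\alpha} \rVert_2^2
  + \lambda \lVert \boldsymbol{\alpha} \rVert_1 .
\end{align}
```

Under this reading, minimizing $E$ couples the bottom-up labeling $\mathbf{x}$ (over pixels or superpixels, with graph edges $\mathcal{E}$) to the top-down sparse code $\boldsymbol{\alpha}_s$ of each segmented region, which is the joint estimation the abstract describes.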
Applications of this research include image search; autonomous navigation (localizing and identifying roads, street signs, pedestrians, and vehicles); medical diagnostic tools (detecting, localizing, and classifying lesions and tumors in medical images); surveillance (localizing suspicious people, weapons, and vehicles); and robotics (identifying the boundaries and extent of objects to interact with). The project involves students at different levels of study.