This project focuses on the task of providing a consistent, semantic interpretation of every component of an image of an outdoor scene. The image is automatically segmented into large regions, each a coherent scene component that is labeled with a rough geometric configuration (distance from the camera and surface normal) and with one of a set of semantic classes, including both background classes (such as water, grass, or road) and specific object classes (such as person, car, cow, or boat). The approach is based on a holistic probabilistic model (a Markov random field) whose parameters are learned automatically from data. The model exploits both scene features and contextual relationships between scene components (e.g., cows are typically found on grass and boats on water). It also uses object shape and appearance models to identify specific object instances in the image. To address the complexities of reasoning with these richly structured models, new probabilistic inference algorithms are developed.
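To make the role of context concrete, the following is a minimal sketch of how a pairwise Markov random field can combine per-region appearance scores with contextual compatibility between adjacent regions. The class names, feature values, weights, and brute-force labeling search are purely illustrative assumptions for this sketch, not the project's actual label set, learned parameters, or inference algorithms.

```python
import numpy as np
from itertools import product

# Illustrative semantic classes for scene regions (not the project's label set).
CLASSES = ["grass", "water", "road", "cow", "boat", "car"]

def unary_score(region_features, class_idx, W):
    """How well one region's appearance features fit a class.
    In a learned model, W would be estimated from labeled data."""
    return region_features @ W[class_idx]

def pairwise_score(label_a, label_b, C):
    """Contextual compatibility of neighboring labels, e.g. 'cow' next to
    'grass' should score higher than 'cow' next to 'water'."""
    return C[label_a, label_b]

def total_score(labels, features, W, C, edges):
    """Score of one joint labeling: sum of unary appearance terms
    plus pairwise context terms over adjacent regions."""
    score = sum(unary_score(features[i], labels[i], W) for i in range(len(labels)))
    score += sum(pairwise_score(labels[i], labels[j], C) for i, j in edges)
    return score

# Toy example: three regions, four appearance features each (random stand-ins).
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
W = rng.normal(size=(len(CLASSES), 4))    # unary weights (illustrative)
C = rng.normal(size=(len(CLASSES),) * 2)  # context weights (illustrative)
edges = [(0, 1), (1, 2)]                  # region adjacency

# Exhaustive enumeration of labelings stands in for real inference; it is
# feasible only for toy problems, which is why scalable probabilistic
# inference algorithms are needed for full images.
best = max(product(range(len(CLASSES)), repeat=3),
           key=lambda labels: total_score(labels, features, W, C, edges))
print([CLASSES[i] for i in best])
```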
This project helps train graduate and undergraduate students within the PI's group, as well as students in an annual project class in this area. The project also develops significant infrastructure, including an extensive data set of labeled images and efficient inference algorithms, which are freely distributed to the research community.
The ability to provide a coherent interpretation of a scene's composition is an important step toward automatic image annotation, with benefits both for image retrieval and for providing image summaries to visually impaired users.