It is widely acknowledged that recognizing objects in images and human activities in video, two basic problems in computer vision, can be significantly improved by accounting for object (activity) parts, context, and their spatiotemporal relationships, because these constraints help resolve ambiguous hypotheses in the face of uncertainty. Since parts and context can be efficiently modeled by graphical models (e.g., Conditional Random Fields), object and activity recognition are often formulated as probabilistic inference on graphical models. The project develops a new theoretical framework of graphical models that explicitly encodes high-order, spatiotemporal, hierarchical, and contextual interactions among objects (activities) as Quadratic Mutual-Exclusion Constraints (QMCs), for the purpose of object and activity recognition in images and video.
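For concreteness, a Conditional Random Field over hypothesis labels y given image or video evidence x typically takes the following form (the notation here is illustrative, not taken from the project):

\[
P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i \in V} \phi_i(y_i, x) + \sum_{(i,j) \in E} \psi_{ij}(y_i, y_j, x) \Big),
\]

where the unary potentials \(\phi_i\) score individual part or activity hypotheses, the pairwise potentials \(\psi_{ij}\) score their spatial and temporal relationships, and recognition amounts to inferring the most probable joint labeling.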
The key contributions of the project include: 1) Approaches to view-invariant object and activity recognition; 2) Formulations of learning and inference in graphical models representing objects and human activities as the problem of finding a maximum-weight subgraph (MWS) subject to QMCs, sketched below; 3) Polynomial-time algorithms for solving the MWS problem subject to QMCs; and 4) Explicit performance bounds and theoretical guarantees on the tightness and convergence of the proposed learning and inference algorithms.
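As an illustrative sketch of contribution 2), with notation that is ours rather than the project's, the MWS problem under QMCs can be posed as an integer quadratic program over binary indicators \(x_i \in \{0,1\}\) that select candidate hypotheses (nodes) in a weighted graph:

\[
\max_{x \in \{0,1\}^n} \; \sum_{i \in V} w_i x_i + \sum_{(i,j) \in E} w_{ij} x_i x_j
\qquad \text{subject to} \qquad x_i x_j = 0, \;\; \forall (i,j) \in \mathcal{C},
\]

where \(w_i\) and \(w_{ij}\) are node and edge weights learned from data, and \(\mathcal{C}\) is the set of mutually exclusive hypothesis pairs; each quadratic constraint \(x_i x_j = 0\) forbids selecting two conflicting hypotheses simultaneously.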
The project framework encodes hard constraints from the domain of interest that have not been exploited in prior work, and uses principled, polynomial-time algorithms for learning and inference. The research advances the state of the art in object and activity recognition, and enables new applications, including video surveillance, retrieval from large datasets, and perception for mobile robots.