This project develops core techniques for improving the performance of key tasks in computer vision, such as recognizing objects and understanding scenes and events. Improving these tasks can generate broader impacts in the following applications: (1) video surveillance for security and timely intelligence; (2) intelligent robots for rescue in disaster areas; and (3) aerial scene and activity understanding from videos taken by unmanned aerial vehicles. In these applications, a significant portion of the content in images, including i) entities such as objects, stuff like liquid, human actions, and scenes; and ii) relations such as the intents of humans, the causal effects of actions, and the physical fields and attractions in a scene, cannot be recognized by the geometry and appearance features commonly used in current computer vision research. These entities and relations are referred to as "dark matter" and "dark energy," by analogy to cosmological models in physics. The project plans to develop a unified representation that integrates the "visible" and the "dark" in a common model, where the visible can be used to infer the dark, and the dark in turn poses constraints on the inference of the visible. The research team is collaborating with an industrial partner for technology transfer.

More specifically, the project studies the following topics: i) Representing causal knowledge to go beyond associational knowledge in computer vision. Causal models are a large part of human knowledge and are crucial for answering deeper questions of why, why not, and what if (counterfactuals). This research is the first formal study of causality (learning, modeling, and reasoning) in the vision literature. ii) Reasoning about dark entities and relations to go beyond the current geometry- and appearance-based paradigm. Perceptual causality, human intents, and physics are generally applicable to all categories of objects, scenes, actions, and events, i.e., transportable across datasets. These entities and relations are deeper and more invariant than geometry and appearance, the dominant features used in visual recognition. iii) Developing joint representation and joint inference algorithms. The rich contextual and causal links in this joint representation are essential for building robust vision systems in which each visual entity can be inferred through multiple routes, but they are not systematically studied and integrated in the existing paradigm.
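To illustrate (with a generic textbook-style example, not the project's actual model) why causal knowledge goes beyond associational knowledge: in a structural causal model with a confounder, the observational conditional P(E=1 | A=1) differs from the interventional P(E=1 | do(A=1)), so purely associational statistics cannot answer "what if" questions. The binary model and its probabilities below are made up for illustration.

```python
# Illustrative binary structural causal model (hypothetical numbers):
#   U ~ Bernoulli(0.5)           confounder (e.g. scene context)
#   A := matches U with prob 0.9 observed action, influenced by context
#   E := effect, caused by both the action A and the context U
p_u = {0: 0.5, 1: 0.5}
p_a_given_u = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}  # key: (a, u)
p_e1_given_au = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.1}

# Observational query: P(E=1 | A=1) = sum_u P(E=1 | a=1, u) P(u | A=1).
# Conditioning on A=1 shifts the distribution over the confounder U.
joint_a1 = {u: p_a_given_u[(1, u)] * p_u[u] for u in (0, 1)}
z = sum(joint_a1.values())
obs = sum(p_e1_given_au[(1, u)] * joint_a1[u] / z for u in (0, 1))

# Interventional query: P(E=1 | do(A=1)) = sum_u P(E=1 | a=1, u) P(u).
# The intervention cuts the U -> A edge, so U keeps its prior.
do = sum(p_e1_given_au[(1, u)] * p_u[u] for u in (0, 1))

print(f"P(E=1 | A=1)     = {obs:.2f}")  # 0.86: inflated by confounding
print(f"P(E=1 | do(A=1)) = {do:.2f}")   # 0.70: the true causal effect
```

Here the association overstates the causal effect because the confounder U makes A=1 more likely exactly when E=1 is already likely; recovering the interventional quantity requires the causal structure, which is the kind of knowledge the project aims to represent.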

Project Start:
Project End:
Budget Start: 2014-07-15
Budget End: 2018-06-30
Support Year:
Fiscal Year: 2014
Total Cost: $454,400
Indirect Cost:
Name: University of California Los Angeles
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095