This project investigates a fundamental and critical, but largely unexplored issue: automatically identifying visual contexts and discovering visual patterns. Many contemporary approaches that attempt to divide and conquer the video scenes by analyzing the visual objects separately are largely confronted. Exploring visual context has shown its promise for video scene understanding. Discovering visual contexts is a challenging task, due to the content uncertainty in visual data, structure uncertainty in visual contexts, and semantic uncertainty in visual patterns. The goal of this project is to lay the foundation of contextual mining and learning for video scene understanding, by pursuing innovative approaches to discovering collocation visual patterns, to empowering contextual matching of visual patterns, and to facilitating contextual modeling for visual recognition. The research team develops a unified approach to mining visual collocation patterns and learning visual contexts, and to provide methods and tools that facilitate contextual matching and modeling.
This research significantly advances video scene modeling and understanding, and produces an important enabling technology for a wide range of applications including image/video management and search, intelligent surveillance and security, human-computer interaction, social networks, etc. This research program contributes to education through curriculum development, student involvements, and workshops and tutorials outside the vision community. This project also outreaches to K-12 education, and it provides datasets and software on its website to the community.