Video clips and their corresponding narrations together provide much richer information than either does in isolation, yet most current recognition systems process visual and textual information separately. The PIs focus on learning to recognize corresponding actions in video and in textual narrative accurately and robustly. In particular, they focus on semantic descriptions of human actions. This research will have broad impact on applications such as video retrieval in digital libraries, human behavior modeling, and video surveillance.

The PIs' research will tightly couple methods from computer vision, natural-language processing, and machine learning through robust, automatically learned correspondences. Given a collection of loosely aligned video-text pairs (such as movies or TV shows with their associated screenplays), the task is to learn to associate action descriptions in the text with the actions, objects, and actors in the video. This correspondence is essential for grounding the semantics of text in the visual appearance of actions. The fundamental challenge is bridging the semantic gap between images and text: images depict geometric relationships and properties of image regions, while natural language encodes abstract semantic relationships in grammatical structures. Bridging this semantic gap in the context of action understanding is the focus of our research effort.

The eventual goal is to recognize actions in videos and to generate text descriptions of those actions. While this goal challenges both computer vision and natural language processing, it also opens up an exciting and potentially fruitful new collaboration between the two research areas, in which recognition is achieved through simultaneous learning and inference in both domains.

Information on this project, including papers, results, databases, and open-source code, will be available at www.seas.upenn.edu/~jshi/#research

Budget Start: 2008-09-01
Budget End: 2011-08-31
Fiscal Year: 2008
Total Cost: $575,801
Institution: University of Pennsylvania
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104