Traditional video-analysis research has centered on detecting and recognizing objects and activities from known sources with a fairly narrow range of content. This effort extends the predictable dual-view hashing algorithm developed in the investigators' previous work from images to videos. Many videos can be naturally associated with text: annotations by the producer and consumer comments (tags), language derived from speech tracks using speech-to-text methods, and the semantic words produced by applying vision models such as human detectors and local activity detectors. The team will combine appearance-based methods for video classification with language models derived from these text sources so that videos can be retrieved through a natural-language-like interface. This involves investigating ways of fusing the different text sources into a single vector-space language model and then applying the dual-view hashing methods to a database of videos. Retrieval performance can then be evaluated using the text codes as a form of zero-shot category definition. The research is driven by intelligence analysts' need to express video queries more efficiently than traditional relevance feedback allows, and to pose more expressive queries that include nouns and verbs, as they would in human language. While still constrained, the approach goes a long way toward bridging the gap between traditional relevance feedback, which relies only on assumed relationships within the image, and full human-language queries.
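To make the retrieval pipeline concrete, the sketch below illustrates the general shape of cross-modal ("dual-view") hashing retrieval: video appearance features and text features are each projected into a shared binary code space, and a text query is matched against the video database by Hamming distance. This is a minimal illustration only; the feature dimensions, the random projection matrices, and the feature extractors are all assumptions standing in for the learned hash functions and fused vector-space language model described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

D_VIDEO, D_TEXT, N_BITS = 512, 300, 64  # assumed feature and code sizes

# Stand-ins for learned view-specific hash projections. In the actual
# approach these would be trained so that a video and its associated
# text map to similar binary codes; here they are random for illustration.
W_video = rng.standard_normal((D_VIDEO, N_BITS))
W_text = rng.standard_normal((D_TEXT, N_BITS))

def hash_codes(features, W):
    """Project features into the shared space and binarize by sign."""
    return (features @ W > 0).astype(np.uint8)

def hamming_distance(query_code, db_codes):
    """Count bitwise disagreements between a query code and each database code."""
    return np.count_nonzero(db_codes != query_code, axis=1)

# Toy database: appearance features for 1000 videos, hashed once offline.
video_feats = rng.standard_normal((1000, D_VIDEO))
db_codes = hash_codes(video_feats, W_video)

# A text query (e.g., a fused embedding of nouns and verbs) is hashed with
# the text-view projection and matched in Hamming space, enabling a form
# of zero-shot category definition: no example video is needed.
query_feat = rng.standard_normal((1, D_TEXT))
query_code = hash_codes(query_feat, W_text)

top_k = np.argsort(hamming_distance(query_code, db_codes))[:10]
print("top-10 video indices:", top_k)
```

Because matching happens over short binary codes rather than raw features, database scans reduce to cheap bitwise operations, which is what makes hashing attractive for large video collections.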

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1359900
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2013-10-01
Budget End: 2015-09-30
Support Year:
Fiscal Year: 2013
Total Cost: $234,225
Indirect Cost:
Name: University of Maryland College Park
Department:
Type:
DUNS #:
City: College Park
State: MD
Country: United States
Zip Code: 20742