Traditional video analysis research has centered on detection and recognition of objects and activities from known sources with a fairly narrow range of content. This effort would extend the predictable dual-view hashing algorithm developed in previous work from images to videos. Many videos can be naturally associated with text: annotations from the producer and consumer comments (tags), language derived from speech tracks using speech-to-text methods, and semantic words produced by applying vision models such as human detectors and local activity detectors. The team will combine appearance-based methods for video classification with language models derived from these text sources so that videos can be retrieved through a natural-language-like interface. This will involve investigating ways of fusing these different text sources into a single vector-space language model and then applying the dual-view hashing methods to a database of videos. The team can then investigate retrieval performance using the text codes as a form of zero-shot category definition. The research is driven by the need for intelligence analysts to express video queries more efficiently than traditional relevance feedback allows, and to pose more expressive queries that include nouns and verbs, as they would in human language. While still constrained, the approach goes a long way toward bridging the gap between traditional relevance feedback, which relies only on assumed relationships in the image, and full human-language queries.
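
The sketch below is only a minimal illustration of the retrieval pipeline described above, not the actual dual-view hashing method from the prior work. It assumes a small hand-picked vocabulary, random CNN-like visual features, and random projection matrices standing in for the learned per-view hash functions; all names (fuse_text_sources, hash_code, the toy video entries) are hypothetical. It shows how tags, speech-to-text output, and detector labels might be fused into one vector-space representation, hashed to binary codes, and queried by Hamming distance using a text-only, zero-shot category definition.

import numpy as np

def fuse_text_sources(tags, asr_transcript, detector_labels, vocabulary):
    """Fuse producer/consumer tags, speech-to-text output, and detector
    labels into one bag-of-words vector over a shared vocabulary
    (a stand-in for the fused vector-space language model)."""
    tokens = list(tags) + asr_transcript.lower().split() + list(detector_labels)
    index = {w: i for i, w in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary))
    for t in tokens:
        if t in index:
            vec[index[t]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def hash_code(features, projection):
    """Binarize a feature vector by the sign of its projections; a learned
    dual-view hashing method would supply `projection` so that paired
    visual and text codes agree."""
    return (features @ projection > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.sum(a != b))

# --- toy example -------------------------------------------------------
rng = np.random.default_rng(0)
vocab = ["person", "car", "running", "explosion", "meeting", "dog"]
code_bits = 16
visual_dim = 128

# Random matrices stand in for the projections a dual-view hashing method
# would learn; they are NOT trained here, so the two views are not aligned.
text_proj = rng.normal(size=(len(vocab), code_bits))
visual_proj = rng.normal(size=(visual_dim, code_bits))

# Index a small "database": each video gets a visual code and a text code.
database = []
for vid_id, (tags, asr, dets) in enumerate([
        (["car"], "a car is running down the street", ["car", "person"]),
        (["dog"], "a dog runs to its owner", ["dog"]),
        (["meeting"], "people talking in a meeting", ["person"]),
]):
    visual_feat = rng.normal(size=visual_dim)          # placeholder appearance features
    text_vec = fuse_text_sources(tags, asr, dets, vocab)
    database.append((vid_id,
                     hash_code(visual_feat, visual_proj),
                     hash_code(text_vec, text_proj)))

# Zero-shot style query: the category is defined only by words, never by
# an example video. Here we rank by the stored text codes; with learned
# dual-view projections the same query code could also be matched against
# the visual codes of videos that carry no text at all.
query_vec = fuse_text_sources(["person", "running"], "", [], vocab)
query_code = hash_code(query_vec, text_proj)
ranked = sorted(database, key=lambda rec: hamming(query_code, rec[2]))
print([vid for vid, _, _ in ranked])

In a full system the projections would be trained jointly on paired visual and text views so that a text query retrieves videos directly from their visual codes; the random projections above only demonstrate the mechanics of encoding, indexing, and Hamming-distance ranking.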