Video has long been difficult for machines to understand. While humans have a seemingly "direct" way of seeing something and understanding a scene in terms of background, foreground objects, and motions, video understanding remains one of the perplexing problems of automatic video and image analysis to date.
This project proposes a "divide and conquer" approach, in which several hundred general-purpose concepts (e.g., outdoors, animals) will be used to describe and annotate the very large universe of scenes commonly depicted in video. Analogous to the limited vocabulary of indexing terms one might find in a library card catalog, each video scene can be annotated through a combination of these concepts ("metadata"). To go beyond a mere listing of the objects, actions, and scenes visible in the video, carefully chosen concepts also allow the description of relationships between them ("ontology"), which enables much richer composite descriptions (a sketch of such annotations appears after the list below). The challenge will be to define these concepts so that they satisfy several criteria at once:

* The concepts must represent things frequently visible in video broadcasts.
* The concepts must be clearly identifiable, to give computer algorithms a chance to detect them automatically.
* The concepts must be linkable into an ontology that defines how the concepts are related.
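To illustrate how concept metadata and ontology relations might be combined, consider the following minimal sketch. The concept names, the "is_a" relation, and the data layout are hypothetical illustrations, not the project's actual vocabulary or format:

```python
# Minimal sketch of concept-based video annotation (hypothetical concept
# names and relation types; the project's actual vocabulary is to be defined).

# An ontology: relations that link concepts to one another.
# Here "is_a" gives a simple hierarchy; richer relations are possible.
ONTOLOGY = {
    ("dog", "is_a", "animal"),
    ("car", "is_a", "vehicle"),
}

# A scene is annotated by a combination of concepts (its metadata).
scene_metadata = {
    "shot_0042": {"outdoors", "dog", "running"},
}

def expand_with_ontology(concepts: set[str]) -> set[str]:
    """Add ancestor concepts implied by 'is_a' links, so a search for
    'animal' also matches scenes annotated only with 'dog'."""
    expanded = set(concepts)
    changed = True
    while changed:
        changed = False
        for child, rel, parent in ONTOLOGY:
            if rel == "is_a" and child in expanded and parent not in expanded:
                expanded.add(parent)
                changed = True
    return expanded

print(expand_with_ontology(scene_metadata["shot_0042"]))
# {'outdoors', 'dog', 'running', 'animal'}
```

The ontology is what turns a flat list of index terms into composite descriptions: an annotation of "dog" implicitly supports queries about "animal" without the annotator ever writing that term down.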
Since video annotations, whether produced by a computer or a human, will always contain errors, this work will incorporate probabilistic confidence metrics ("fuzzy metadata") into the annotation. No existing indexing and classification scheme has explicitly defined standards for measuring and reporting the errors and omissions of indexing annotations, because librarians and archivists have traditionally assumed that an index contains only complete, trusted, and verified metadata.
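One way such fuzzy metadata might be recorded is to pair each concept label with a detector confidence, as in this sketch (the shot identifiers, scores, and field layout are illustrative assumptions):

```python
# Hypothetical "fuzzy metadata": each concept annotation carries a
# probabilistic confidence in [0, 1] rather than being asserted as fact.
fuzzy_annotations = {
    "shot_0042": {"outdoors": 0.92, "dog": 0.71, "running": 0.48},
    "shot_0043": {"indoors": 0.88, "person": 0.95},
}

def concepts_above(shot: str, threshold: float) -> dict[str, float]:
    """Return only the annotations whose confidence clears the threshold,
    making the indexer's uncertainty explicit instead of hiding it."""
    return {c: p for c, p in fuzzy_annotations[shot].items() if p >= threshold}

print(concepts_above("shot_0042", 0.5))  # {'outdoors': 0.92, 'dog': 0.71}
```

Keeping the confidence values in the index, rather than thresholding them away at annotation time, lets each downstream application decide how much uncertainty it can tolerate.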
Furthermore, the project will assess to what extent these concepts can be extracted automatically with state-of-the-art video analysis techniques. Using footage from documentaries and television news, the project will perform video search and retrieval experiments to determine the usefulness of the ontology and the confidence of the annotations.
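A retrieval experiment of this kind could, for example, rank shots by detector confidence for a query concept and score the ranking against human-judged ground truth. The sketch below uses average precision, a standard retrieval measure; the shot identifiers, scores, and relevance judgments are made up for illustration:

```python
# Sketch of a concept-based retrieval experiment: rank shots by detector
# confidence for a query concept and score the ranking with average
# precision. Shot IDs, scores, and ground truth here are hypothetical.
detector_scores = {          # confidence that "animal" appears in each shot
    "shot_01": 0.91, "shot_02": 0.15, "shot_03": 0.78, "shot_04": 0.40,
}
relevant = {"shot_01", "shot_04"}   # human-judged ground truth

def average_precision(scores: dict[str, float], relevant: set[str]) -> float:
    """Average of the precision values at each rank where a relevant shot appears."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    hits, precisions = 0, []
    for rank, shot in enumerate(ranking, start=1):
        if shot in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

print(f"AP = {average_precision(detector_scores, relevant):.3f}")  # AP = 0.833
```

Averaged over many query concepts, such a measure would indicate both how detectable each concept is and how well the reported confidences predict actual relevance.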