This research creates and evaluates algorithms, representations, and display structures that analyze the scenic and thematic structure of extended (half-hour or longer) video sequences and that help a human browsing the video to understand and search that structure. These algorithms and representations segment dramas, comedies, newscasts, sporting events, and talk shows into three levels of structural significance: camera shots; scenes, or sets; and the thematic repetitions of scenes, or acts. This segmentation provides a means to recover something like the original "storyboard"-level outline of the video. The three levels of semantically coherent units are then displayed in a novel graphic format resembling a musical score, with user-adjustable precision and detail. The work explores four aspects of the analysis: the selection of appropriate low-level image features; the design of representations and relationships of structure at each of the three levels, based on tunable models of human memory; the exploration of display configurations and of video subsequence selection for effective browsing; and the validation of algorithmic optimizations and user feedback over a large corpus of videos. The work makes explicit some of the structural rules and heuristics that govern the creation of thematically coherent long video sequences. Educationally, these issues have already inspired a new upper-division course, "Visual Interfaces to Computers".
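The abstract does not state which image features or memory model the project uses. As a purely illustrative sketch, assuming each frame is summarized by a normalized color histogram (an assumption, not the project's stated feature), the first two levels of the hierarchy could be approximated as follows. A shot boundary is declared wherever consecutive frame histograms differ by more than a threshold, and a run of shots is kept in one scene whenever some shot resembles an earlier shot seen within the last `memory` shots, a crude stand-in for a tunable memory model. All names and parameters here (`detect_shots`, `mean_histogram`, `group_scenes`, `cut_threshold`, `match_threshold`, `memory`) are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]  # assumed per-frame feature: a normalized color histogram


def l1_distance(a: Histogram, b: Histogram) -> float:
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shots(frames: List[Histogram], cut_threshold: float) -> List[List[int]]:
    """Split frame indices into shots wherever consecutive histograms differ sharply."""
    if not frames:
        return []
    shots = [[0]]
    for i in range(1, len(frames)):
        if l1_distance(frames[i - 1], frames[i]) > cut_threshold:
            shots.append([i])  # hard cut: start a new shot
        else:
            shots[-1].append(i)
    return shots


def mean_histogram(frames: List[Histogram], indices: List[int]) -> List[float]:
    """Average histogram of the given frames, used as the shot's key feature."""
    n = len(indices)
    bins = len(frames[indices[0]])
    return [sum(frames[i][b] for i in indices) / n for b in range(bins)]


def group_scenes(keys: List[Histogram], match_threshold: float, memory: int) -> List[List[int]]:
    """Group shot indices into scenes.

    If shot s resembles an earlier shot r with s - r <= memory, every boundary
    between r and s is bridged, so alternations such as A B A B stay together.
    A boundary that no such link bridges becomes a scene break.
    """
    n = len(keys)
    if n == 0:
        return []
    bridged = [False] * (n - 1)  # bridged[k]: boundary between shots k and k+1
    for s in range(1, n):
        for r in range(max(0, s - memory), s):
            if l1_distance(keys[r], keys[s]) <= match_threshold:
                for k in range(r, s):
                    bridged[k] = True
    scenes = [[0]]
    for k in range(n - 1):
        if bridged[k]:
            scenes[-1].append(k + 1)
        else:
            scenes.append([k + 1])
    return scenes
```

For example, with two-bin histograms and frames arranged as A A B B A A C C, `detect_shots` yields four shots, and `group_scenes` with `memory=2` links the first three shots (A, B, A) into one scene and leaves the C shot on its own. With `memory=1` no link can bridge the A/B alternation, so every shot becomes its own scene. The act level (thematic repetitions of scenes) is not modeled in this sketch.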

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9812026
Program Officer: Vladimir J. Lumelsky
Budget Start: 1998-09-01
Budget End: 2001-08-31
Fiscal Year: 1998
Total Cost: $240,000
Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027