This research creates and evaluates algorithms, representations, and display structures that analyze the scenic and thematic structure of extended (half-hour or more) video sequences, and that help the human video browser understand and search that structure. These algorithms and representations segment dramas, comedies, newscasts, sporting events, and talk shows into three levels of structural significance: camera shots; scenes, or sets; and the thematic repetitions of scenes, or acts. Together these levels provide a means to recapture something like the originating ``storyboard'' level of the video's outline. The three levels of semantically coherent units are then displayed in a novel graphic format resembling a musical score, with user-variable precision and detail. This work explores four aspects of analysis: the selection of appropriate low-level image features; the design of representations and relationships of structure at each of the three levels, based on tunable human memory models; the exploration of display configurations and video subsequence selection for effective user browsing; and the validation of algorithmic optimizations and user feedback over a large corpus of videos. This work makes explicit some of the structural rules and heuristics that govern the creation of thematically coherent long video sequences. Educationally, these issues have already inspired a new upper-division course, ``Visual Interfaces to Computers''.