A video sequence is a rich multimodal information source. It carries speech, text, audio (the non-speech portion), the color patterns and shapes of imaged objects (reflected in individual frames), and the motion of these objects (revealed by changes between frames). Humans can quickly interpret the semantic content embedded in these different modalities, but computer understanding of video sequences is still at a primitive stage. The aim of this project is to develop new theory and techniques for scene segmentation and classification in video sequences, which are key to video understanding. Over the past several years, research in this area has focused on text, speech, and image information. The proposed research explores motion and audio characteristics, which provide important complementary information. New results are anticipated both in the general theory of feature analysis and classification and in practical techniques for video understanding and scene classification. These developments will have direct applications in information indexing and retrieval in multimedia databases, spotting and tracking of special events in surveillance video, video editing, and movie stratification, among other areas.