Technical Overview. This project addresses the design of features extracted from video streams to enable the detection and classification of visual events. The emphasis is on unsupervised learning of features and classifiers from training data and on the ability to deal efficiently with large volumes of data. Careful design of the features is crucial: they must both scale to large volumes of data and accurately capture the events relevant to the target applications. The project focuses on the development of spatio-temporal features that can be extracted from video streams efficiently and accurately. These features are combined with recent developments in architectures for distributed search based on active storage, which are specifically designed for processing very large databases of images and videos and are therefore well suited to making effective use of the video feature detectors. Key objectives include integrating the feature extraction approach with current active storage approaches and evaluating the resulting systems in applications such as video retrieval, surveillance, and forensic video reconstruction. The project addresses fundamental questions in the analysis of video streams: (1) What makes a good feature representation, and is there a single best choice of representation? (2) What spatio-temporal primitives exist in video? (3) How can spatio-temporal primitives be detected efficiently? These questions are addressed by (1) developing a novel approach for the automatic segmentation of spatio-temporal regions that are consistent in both appearance and motion, and (2) developing a novel approach for extracting spatio-temporal features based on new "volumetric" operators that act directly on the spatio-temporal cube, and using these features for event classification. The segmentation algorithm is an extension of classical mean shift, and the volumetric operators are spatio-temporal extensions of the box operators that have been successful in extracting and classifying features in 2D images. Together, these two developments provide fundamental building blocks for future video analysis systems, including efficient object recognition in video. In addition to the fundamental contributions in algorithms and feature design, the project validates the approaches by demonstrating their integration with emerging distributed search systems that employ active storage to enable efficient processing of video collections on the order of 100 terabytes.
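The abstract describes the segmentation algorithm only as an extension of classical mean shift to regions consistent in appearance and motion. For readers unfamiliar with mean shift, the following is a minimal, generic sketch of mode seeking over joint spatio-temporal feature vectors (position, time, appearance, motion); the function name, the isotropic Gaussian kernel, and the single bandwidth are illustrative assumptions, not the project's actual formulation.

```python
import numpy as np

def mean_shift_st(points, bandwidth, n_iters=30, tol=1e-4):
    """Generic mean-shift mode seeking over spatio-temporal feature
    vectors. Each row of `points` is a pixel's joint feature, e.g.
    (x, y, t, appearance..., motion...), pre-scaled so that a single
    isotropic bandwidth is meaningful. Illustrative sketch only."""
    modes = points.astype(np.float64).copy()
    for _ in range(n_iters):
        new_modes = np.empty_like(modes)
        for i, m in enumerate(modes):
            # Gaussian weights of all samples relative to the current mode
            w = np.exp(-0.5 * np.sum((points - m) ** 2, axis=1) / bandwidth**2)
            new_modes[i] = (w[:, None] * points).sum(axis=0) / w.sum()
        done = np.max(np.abs(new_modes - modes)) < tol
        modes = new_modes
        if done:
            break
    # Points whose modes coincide (up to the bandwidth) belong to the
    # same spatio-temporal region.
    return modes
```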
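The volumetric operators extend 2D box operators, whose key advantage is constant-time evaluation via the integral image, to the spatio-temporal cube. The sketch below shows one plausible realization using an "integral volume" and 8-corner inclusion-exclusion; the two-lobe temporal-difference feature and all names here are illustrative assumptions rather than the project's exact operator set.

```python
import numpy as np

def integral_volume(video):
    """3D analogue of the integral image: cumulative sums over the
    t, y, and x axes, padded with a zero border so box corners at
    index 0 need no special casing."""
    iv = video.astype(np.float64).cumsum(0).cumsum(1).cumsum(2)
    return np.pad(iv, ((1, 0), (1, 0), (1, 0)))

def box_sum(iv, t0, t1, y0, y1, x0, x1):
    """Sum of video[t0:t1, y0:y1, x0:x1] in O(1) via 8-corner
    inclusion-exclusion on the integral volume."""
    return (  iv[t1, y1, x1] - iv[t0, y1, x1] - iv[t1, y0, x1]
            - iv[t1, y1, x0] + iv[t0, y0, x1] + iv[t0, y1, x0]
            + iv[t1, y0, x0] - iv[t0, y0, x0])

def temporal_difference_feature(iv, t0, t1, y0, y1, x0, x1):
    """One illustrative two-lobe volumetric operator: the difference
    between the first and second temporal halves of a cuboid, which
    responds where appearance changes over time."""
    tm = (t0 + t1) // 2
    return (box_sum(iv, t0, tm, y0, y1, x0, x1)
            - box_sum(iv, tm, t1, y0, y1, x0, x1))
```

A quick self-check that the integral volume reproduces a direct box sum:

```python
rng = np.random.default_rng(0)
video = rng.random((16, 32, 32))                 # (t, y, x) grayscale clip
iv = integral_volume(video)
assert np.isclose(box_sum(iv, 2, 10, 4, 20, 5, 25),
                  video[2:10, 4:20, 5:25].sum())
```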

Broader Impact. The amount of digital video data has grown exponentially in recent years due to the increasing affordability of digital consumer video cameras, the large-scale deployment of video surveillance systems, the ease of digital content creation, and the availability of high-speed networks and high-capacity storage devices. Manual organization and annotation of this content is becoming infeasible, and the technology for searching, indexing, and retrieving video content has failed to keep pace. In particular, processing very large volumes of video data requires efficient ways of pre-processing the videos to extract features corresponding to interesting temporal events. The project addresses this need directly by providing new technology that will enable the development of large-scale video analysis tools. The products of the project have potential impact on virtually all applications of video analysis. The project focuses on a few broad classes of applications with substantial societal impact in the areas of improved access to information and security. In particular, the project contributes to video retrieval (e.g., for education), video surveillance (e.g., for homeland security), forensic video reconstruction (e.g., for law enforcement), and smart environments (e.g., for businesses and homes). To assist in the application of the technology to these areas, the project includes a plan for transfer through an external partner whose role is to provide data sets (e.g., the IRP speaker video dataset), scenarios, computing resources, software, and guidance for the evaluation of the algorithms, as well as access to state-of-the-art active storage technology. This collaboration will enable the integration of the video processing elements with new developments in active storage technology and will demonstrate the applicability of the approach to large-scale distributed video analysis systems. URL: www.cs.cmu.edu/~hebert/vol3d.html

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0534962
Program Officer: Kenneth C. Whang
Budget Start: 2005-11-01
Budget End: 2009-10-31
Fiscal Year: 2005
Total Cost: $305,996
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213