The video analysis community has long attempted to bridge the gap between low-level feature extraction and semantic understanding and retrieval. One important barrier impeding progress is the lack of infrastructure needed to construct useful semantic concept ontologies, build modules that extract features from video, interpret the semantics of what the video contains, and evaluate these tasks against benchmark ground-truth data. To solve this fundamental problem, this project will create a shared community resource comprising large video collections, extracted features, video segmentation tools, scalable semantic concept lexicons with annotations, ontologies relating the concepts to each other, annotation tools, learned models and complete software modules for automatically describing video through concepts, and finally a benchmark set of user queries for video retrieval evaluation.
The resource will allow researchers to build their own image/video classifiers, test new low-level features, expand the concept ontology, and explore higher-level search services without having to redevelop several person-years' worth of infrastructure.
Using this tool suite and reference implementation, researchers can quickly customize concept ontologies and classifiers for diverse subdomains.
The contribution of the proposed work lies in the development of a large set of critical research resources for digital video analysis and search. The modular architecture of the proposed resources provides great flexibility for adding new ontologies and testing new analytics components developed by other researchers in different domains. The use of large, diverse, standardized video datasets and well-defined benchmark procedures ensures a rigorous process for assessing scientific progress.
The results will facilitate rapid exploration of new ideas and solutions, contributing to advances of major societal interest, such as next-generation media search and security.
URL: www.informedia.cs.cmu.edu/analyticsLibrary
In this project, we have developed and disseminated large-scale video analytics tools and data sets for video indexing and retrieval research. Such CRI resources allow researchers to quickly gain access to substantial infrastructure (large video data, features, ontologies, and statistical models) when starting new research activities.

First, we developed and released a library of video concept detection models for detecting a few hundred frequent, observable concepts in broadcast news video. The model library has been used by many groups, and the paper describing the models has been cited more than 100 times.

One important issue in developing analytics tools is coping with the mismatch between the domains from which the video corpora are constructed and those on which the machine learning models are trained. Solving this problem is important for ensuring the validity and robustness of the detection models under different test conditions. To address this, we developed a domain adaptation solution, called Domain Adaptive Semantic Diffusion (DASD), with associated publications and a public release of software tools; a simplified sketch of the diffusion idea is given below.

To evaluate the performance of the video analytics tools, we participated in the NIST TRECVID video retrieval evaluation forum in 2010 and demonstrated the best performance in the Multimedia Event Detection (MED) task.

To cope with the fast-growing volumes of multimedia data, we developed a new hashing method that maps high-dimensional features to compact binary codes, which can be used to quickly find nearest-neighbor samples in a gigantic database (e.g., tens of millions of samples) in the high-dimensional space. Our method is novel in jointly optimizing search speed and accuracy. The results were published as an oral paper (acceptance rate 3.5%) at the prestigious IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2011; an illustrative sketch of the binary-hashing idea is given below.

Finally, motivated by the rapid emergence of mobile services, we further extended our efforts to develop new tools for multimedia content analysis on mobile platforms. We developed a new mobile query technique, called Active Query Sensing, which applies information-theoretic methods to predict the best viewing angle for sensing the environment and capturing mobile images as visual queries; a toy sketch of the view-selection idea is given below. We applied the visual feature extraction and matching tools developed in the CRI project to build a system for large-scale mobile visual matching. The corresponding paper received the Best Paper Award at the ACM Multimedia Conference 2011.
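To give a flavor of semantic diffusion, the following is a minimal, hypothetical Python sketch: per-shot detection scores are smoothed over a concept-affinity graph so that related concepts reinforce each other. It deliberately omits the domain-adaptive graph refinement that distinguishes DASD, and all concept names, weights, and parameters are illustrative assumptions rather than the released implementation.

```python
# Simplified sketch of semantic diffusion over a concept-affinity graph:
# detection scores of related concepts are smoothed toward each other.
# The actual DASD method also adapts the graph to the test domain,
# which this toy version omits. All names and values are hypothetical.
import numpy as np

def diffuse(scores, affinity, alpha=0.2, iters=10):
    """Iteratively blend each concept's score with its neighbors' scores.

    scores   : (n_concepts,) initial detector outputs for one shot
    affinity : (n_concepts, n_concepts) row-normalized concept similarity
    """
    s = scores.copy()
    for _ in range(iters):
        s = (1 - alpha) * scores + alpha * affinity @ s
    return s

concepts = ["car", "road", "indoor"]
W = np.array([[0.0, 1.0, 0.0],   # "car" is related to "road"
              [1.0, 0.0, 0.0],   # "road" is related to "car"
              [0.0, 0.0, 0.0]])  # "indoor" is unrelated here
row_sums = W.sum(axis=1, keepdims=True)
W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
raw = np.array([0.9, 0.3, 0.1])  # raw detector scores for one shot
print(dict(zip(concepts, np.round(diffuse(raw, W), 3))))
```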
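The next sketch illustrates the general binary-hashing idea, assuming simple random-projection (LSH-style) hash functions rather than the jointly optimized hash functions of our published method: high-dimensional features are mapped to short binary codes, and neighbors are found by Hamming distance. All function names and parameters are hypothetical.

```python
# Illustrative sketch of binary hashing for approximate nearest-neighbor
# search, using plain random-projection hashing. The project's method
# jointly optimizes search speed and accuracy, which is not reproduced
# here. Names and parameters are hypothetical.
import numpy as np

def train_hash(dim, n_bits, seed=0):
    """Sample random hyperplanes that define the hash functions."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def encode(X, planes):
    """Map real-valued features to binary codes via projection signs."""
    return (X @ planes.T > 0).astype(np.uint8)  # shape: (n, n_bits)

def hamming_search(query_code, db_codes, k=5):
    """Return indices of the k codes closest in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

if __name__ == "__main__":
    dim, n_bits = 128, 64
    planes = train_hash(dim, n_bits)
    db = np.random.randn(100_000, dim)            # stand-in for features
    db_codes = encode(db, planes)
    query = db[42] + 0.05 * np.random.randn(dim)  # noisy copy of item 42
    print(hamming_search(encode(query[None], planes)[0], db_codes))
```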
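Finally, the sketch below conveys the information-theoretic intuition behind Active Query Sensing in toy form: among candidate viewing angles, prefer the one whose predicted retrieval-outcome distribution is least uncertain (lowest entropy). This is not the published AQS algorithm; the distributions and names are hypothetical.

```python
# Toy sketch of entropy-based view selection in the spirit of Active
# Query Sensing: for each candidate viewing angle, assume a predicted
# distribution over retrieval outcomes, and pick the angle whose
# distribution is most peaked (lowest entropy). Data is hypothetical.
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_view(view_outcome_probs):
    """Return the index of the least uncertain candidate view."""
    entropies = [entropy(p) for p in view_outcome_probs]
    return int(np.argmin(entropies))

# Predicted match-outcome distributions for three candidate angles:
views = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative view
    [0.70, 0.20, 0.05, 0.05],  # fairly discriminative view
    [0.50, 0.30, 0.10, 0.10],
]
print("suggested view:", best_view(views))  # -> 1
```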