This project addresses one of the important components of an interactive, object-based, semantic multimedia information retrieval system: "mosaic" generation. A mosaic can be considered the static component (or background) of a scene that does not change over a sequence of frames. It is obtained by computing the global motion between frames, warping the frames according to the global motion, and then blending them. Mosaic generation plays an important role in many applications, including object-based coding (where objects in a scene are coded or compressed independently of regular rectangular frame coding), video compression, video indexing, object tracking, virtual environments, security surveillance, wide-area surveillance, panoramic video, traffic monitoring, object recognition, and human behavior analysis. These applications usually require subtracting the background (or the mosaic) from the actual scenes to determine the foreground objects. Traditional mosaic generation methods require object segmentation for videos containing moving objects. They are therefore not suitable for real-time mosaic generation, especially in video encoders that require sprite coding (coding based on layering objects on top of a mosaic or sprite). This project (i) develops mosaic generation solutions for larger domains of videos; (ii) generates mosaics for videos containing many shots by classifying video shots; and (iii) develops objective evaluation methods for mosaic generation by producing ground truths. The sprite fusion method blends assertive and conservative sprites, which are generated from aligned frame differences without object segmentation. This eliminates the object occlusion problem, especially for videos in which the camera tracks an object. Since the sprite fusion method computes the global motion only once per pair of frames and does not require object segmentation, it is well suited to real-time mosaic generation and to sprite coding in video encoders.
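The three steps named above (compute global motion, warp, blend) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation. Global motion is assumed to be a pure integer translation found by exhaustive search, so "warping" reduces to placing each frame at its offset on a shared canvas. Blending uses a per-pixel temporal median, a standard stand-in that is not the sprite fusion method. The function names `estimate_translation` and `build_mosaic` are hypothetical.

```python
from statistics import median

Image = list[list[int]]  # grayscale frame: rows of pixel intensities


def estimate_translation(prev: Image, curr: Image, max_shift: int = 3) -> tuple[int, int]:
    """Return the integer camera shift (dx, dy) that minimizes the mean absolute
    difference between curr[y][x] and prev[y + dy][x + dx] over their overlap."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = count = 0
            # Only visit pixels whose shifted partner lies inside prev.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(curr[y][x] - prev[y + dy][x + dx])
                    count += 1
            if count and total / count < best_cost:
                best, best_cost = (dx, dy), total / count
    return best


def build_mosaic(frames: list[Image], max_shift: int = 3) -> Image:
    """Estimate global motion between consecutive frames, warp (here: translate)
    every frame onto a common canvas, and blend with a per-pixel temporal median."""
    h, w = len(frames[0]), len(frames[0][0])
    # Chain pairwise shifts into absolute camera offsets relative to frame 0.
    offsets = [(0, 0)]
    for prev, curr in zip(frames, frames[1:]):
        dx, dy = estimate_translation(prev, curr, max_shift)
        offsets.append((offsets[-1][0] + dx, offsets[-1][1] + dy))
    min_x = min(ox for ox, _ in offsets)
    min_y = min(oy for _, oy in offsets)
    width = max(ox for ox, _ in offsets) - min_x + w
    height = max(oy for _, oy in offsets) - min_y + h
    # Collect every frame's observation of each canvas pixel.
    samples = [[[] for _ in range(width)] for _ in range(height)]
    for frame, (ox, oy) in zip(frames, offsets):
        for y in range(h):
            for x in range(w):
                samples[y + oy - min_y][x + ox - min_x].append(frame[y][x])
    # A pixel covered by a moving object in only a minority of frames
    # takes its background value under the median; uncovered pixels become 0.
    return [[int(median(s)) if s else 0 for s in row] for row in samples]
```

With a camera panning one pixel per frame and a transient foreground pixel in one frame, the median recovers the background because the object covers that scene pixel in only one of the frames that see it. A median can fail when an object occludes the same background pixels in most of the frames that observe them, which is the occlusion problem the abstract mentions.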
As the mosaic is constructed for a sequence, the global motion vectors are stored so that any frame can be generated (or warped) with respect to any other frame. This provides multiple degrees of freedom for spatial interaction, including up-down, left-right, forward-backward movement, and rotation. These degrees of freedom, combined with warping of the current (or active) scene, form the basis of interactive video reproduction: any frame is regenerated from the mosaic, and the active (current) scene is then overlaid on top of the reproduced frame.
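Once the global motion vectors are stored, regenerating a frame and overlaying the active scene is mostly bookkeeping. The sketch below assumes translation-only motion, which covers the up-down and left-right degrees of freedom; rotation and forward-backward movement would need a similarity transform or homography instead of an integer offset. It also assumes the mosaic canvas starts at the smallest camera offset and that a foreground mask for the active scene is given. All function names are hypothetical.

```python
Image = list[list[int]]
Vec = tuple[int, int]


def absolute_offsets(pairwise: list[Vec]) -> list[Vec]:
    """Chain stored frame-to-frame motion vectors into camera offsets relative to frame 0."""
    offsets = [(0, 0)]
    for dx, dy in pairwise:
        offsets.append((offsets[-1][0] + dx, offsets[-1][1] + dy))
    return offsets


def relative_motion(offsets: list[Vec], src: int, dst: int) -> Vec:
    """Translation that carries a pixel of frame `src` to its position in frame `dst`."""
    return (offsets[src][0] - offsets[dst][0], offsets[src][1] - offsets[dst][1])


def reproduce_frame(mosaic: Image, offsets: list[Vec], index: int,
                    width: int, height: int) -> Image:
    """Regenerate frame `index` by cropping its window out of the mosaic
    (assumes canvas column 0 / row 0 correspond to the smallest offsets)."""
    min_x = min(ox for ox, _ in offsets)
    min_y = min(oy for _, oy in offsets)
    x0, y0 = offsets[index][0] - min_x, offsets[index][1] - min_y
    return [row[x0:x0 + width] for row in mosaic[y0:y0 + height]]


def overlay(background: Image, active: Image, mask: list[list[bool]]) -> Image:
    """Paste the active (current) scene's foreground pixels over the reproduced background."""
    return [[a if m else b for b, a, m in zip(brow, arow, mrow)]
            for brow, arow, mrow in zip(background, active, mask)]
```

Because only the offsets are stored, a virtual camera can also be placed at an offset that no original frame used, as long as its window stays inside the mosaic.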

The educational component of this project includes the development of a new multimedia course designed to appeal to all freshman students in order to grow interest in computer science. The course covers fundamental concepts in computer science, including types of media, color models, storage devices, multimedia authoring, and the Internet. A multimedia workshop is planned for K-12 students. The results of this research project will be disseminated via the Internet (www.cs.uah.edu/~raygun/projects/mosaics.htm), including a video database system (video sets and any ground-truth annotations for them) to share test data with other researchers and to encourage open, metrics-based evaluations across research institutions. This project will foster the development of photo-realistic visualization systems that use interactive video reproduction, with a wide range of applications.

Project Report

Mosaic generation is the process of constructing the static component (or background) of a scene that does not change in a video. It computes the displacement (or, more generally, the camera motion) between the images of the video, aligns the images on top of each other according to the camera motion, and then blends them to obtain the big picture of the static scene or object. Mosaic generation plays an important role in many applications, including object-based compression (where objects in a scene are coded or compressed independently of regular rectangular image coding), video compression, video indexing, object tracking, virtual environments, security surveillance, wide-area surveillance, panoramic video, traffic monitoring, object recognition, and human behavior analysis. These applications usually require subtracting the background (or the mosaic) from the actual scenes to determine the foreground objects. In this project, we developed a video classification method that extracts features based on the presence of moving objects and on global motion (or, more generally, camera motion) in the video. This classification determines the suitability of a video for mosaic generation, so that real or commercial systems can decide whether a video is suitable for mosaic generation or not. Past research demonstrated mosaic generation algorithms on a small set of videos. However, results on a few videos do not show that the proposed algorithms work on other videos. In this research, videos are categorized into domains so that mosaic generation research targets video domains rather than a small set of videos. One major problem in mosaic generation is the presence of moving objects, which should be removed from the mosaic. We proposed the "sprite fusion" method, which merges two types of mosaics called "assertive" and "conservative".
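A minimal sketch of a feature-based classifier of this kind follows. It assumes the global motion between consecutive frames has already been estimated as integer translations (dx, dy) with the convention curr[y][x] ≈ prev[y + dy][x + dx]. It uses two illustrative features: the mean global-motion magnitude and the mean fraction of pixels that still differ after motion compensation (evidence of moving objects). The thresholds, labels, and function names are hypothetical; they are not the features or categories used in the project.

```python
Image = list[list[int]]


def residual_fraction(prev: Image, curr: Image, shift: tuple[int, int],
                      diff_threshold: int = 20) -> float:
    """Fraction of overlapping pixels that still differ after compensating the global shift (dx, dy)."""
    dx, dy = shift
    h, w = len(prev), len(prev[0])
    changed = total = 0
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += 1
            if abs(curr[y][x] - prev[y + dy][x + dx]) > diff_threshold:
                changed += 1
    return changed / total if total else 0.0


def classify_video(frames: list[Image], shifts: list[tuple[int, int]],
                   motion_threshold: float = 0.5, object_threshold: float = 0.01) -> str:
    """Label a clip from two features: mean global-motion magnitude and mean
    residual (moving-object) fraction. Thresholds and labels are illustrative."""
    n = len(shifts)
    mean_motion = sum((dx * dx + dy * dy) ** 0.5 for dx, dy in shifts) / n
    mean_residual = sum(residual_fraction(p, c, s)
                        for p, c, s in zip(frames, frames[1:], shifts)) / n
    camera = "moving camera" if mean_motion > motion_threshold else "static camera"
    objects = "moving objects" if mean_residual > object_threshold else "no moving objects"
    return f"{camera}, {objects}"
```

Separating the two features this way mirrors the two questions a system has to answer before building a mosaic: whether the camera moves enough to reveal a larger scene, and whether moving objects must be kept out of the result.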
Sprite fusion is a blending method for mosaic generation that targets tracking videos and does not require identification or segmentation of moving objects to generate the mosaic. Moving object segmentation is fairly complex and limits the deployment of mosaic generation in commercial systems. Sprite fusion is not affected by slow motion, visually static patterns, occlusion, or the size or number of moving objects. It handles tracking videos, in which the camera tries to keep the moving objects in the center of the frame. We provide a formal proof that sprite fusion provides good results for this domain of videos (not just for a few videos). It is hard to assess the quality and correctness of a mosaic. In the past, an expert checked the correctness of a mosaic subjectively, and an objective measure, Peak Signal-to-Noise Ratio (PSNR), was used to quantify its correctness. To overcome the limitations of this objective measure, we proposed "synthetic video generation" from a high-resolution image by applying camera motion patterns. Since the video is generated by predetermined camera motion patterns, the correct motion and the correct mosaic are stored and used to validate the results of a mosaic generation algorithm. Synthetic videos can also identify the parts of a video on which a mosaic generation algorithm fails, or its weaknesses with respect to particular camera motion patterns. We have developed a video database that is available to researchers in the mosaic generation area; it contains original videos and synthetic videos for researchers to test and compare their results. We have developed three types of applications: video reproduction, interactive retrieval, and virtual tour. Video reproduction provides video editing, such as object centralization and aspect ratio conversion (4:3 to 16:9), using mosaics. Interactive retrieval provides retrieval of the spatio-temporal content of a video using a gamepad.
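Synthetic video generation and ground-truth validation can be sketched as follows. The sketch assumes the camera motion pattern is a list of integer top-left window positions inside the high-resolution image (pure translation, no zoom or rotation). Because the path is known, the generator returns the ground-truth motion and a ground-truth mosaic (here simply the bounding box of all camera windows) alongside the frames. `synthesize_video`, `psnr`, and `motion_failures` are hypothetical names; the PSNR formula is the standard 10 * log10(MAX^2 / MSE).

```python
import math

Image = list[list[int]]


def synthesize_video(image: Image, path: list[tuple[int, int]], width: int, height: int):
    """Crop a width x height window at each top-left camera position (x, y) in `path`.
    Returns the frames, the ground-truth frame-to-frame motion, and the ground-truth mosaic."""
    for x, y in path:
        if x < 0 or y < 0 or x + width > len(image[0]) or y + height > len(image):
            raise ValueError(f"camera window at {(x, y)} leaves the image")
    frames = [[row[x:x + width] for row in image[y:y + height]] for x, y in path]
    # Known camera motion between consecutive frames.
    true_motion = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]
    # Ground-truth mosaic: the bounding box of every window the camera visited.
    min_x, max_x = min(x for x, _ in path), max(x for x, _ in path)
    min_y, max_y = min(y for _, y in path), max(y for _, y in path)
    mosaic = [row[min_x:max_x + width] for row in image[min_y:max_y + height]]
    return frames, true_motion, mosaic


def psnr(reference: Image, test: Image, max_value: int = 255) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2 for r_row, t_row in zip(reference, test) for a, b in zip(r_row, t_row)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


def motion_failures(estimated: list[tuple[int, int]], true_motion: list[tuple[int, int]]) -> list[int]:
    """Indices of frame pairs where an algorithm's motion estimate differs from the ground truth."""
    return [i for i, (e, t) in enumerate(zip(estimated, true_motion)) if e != t]
```

Comparing estimated motion against `true_motion` frame by frame is what lets a synthetic video point at the specific frames, and hence the camera motion patterns, on which an algorithm fails; a single PSNR value over the whole mosaic cannot localize the failure.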
The third application, virtual tour, enables a photo-realistic virtual tour of an environment with actions such as "open the door". We have organized two workshops: an academic workshop and a K-12 workshop. The academic workshop (IEEE International Workshop on Video Panorama) was organized as part of the IEEE International Symposium on Multimedia on December 5-7, 2011, in Dana Point, California. The K-12 workshop was held together with Butler High School in May 2012. Based on the surveys, 48 students in grades 9 through 12 attended the workshop, the majority of them from minority groups. We have developed a new course to be offered to freshman students. The PI has made two presentations on mosaic generation at Alabama A&M University (an HBCU). In Spring 2012, the PI also made a presentation on image representations at Alabama A&M University with one of his students. Two graduate students completed their master's degrees with theses on the video classification and interactive retrieval components of the project. One graduate student completed a PhD on sprite fusion and its applications. The PI has hired two REU (Research Experiences for Undergraduates) students to introduce them to the basics of mosaic generation research. The PI has developed a laboratory with a green screen, multimedia editing software, and computers to conduct research.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0812307
Program Officer: Maria Zemankova
Budget Start: 2008-10-01
Budget End: 2012-09-30
Fiscal Year: 2008
Total Cost: $205,441
Name: University of Alabama in Huntsville
City: Huntsville
State: AL
Country: United States
Zip Code: 35805