The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase I project is considerable because a variety of complementary new technologies are ushering in an era in which visual messages are becoming a first-class media type alongside text and speech. Today, both amateur and professional videographers still have to enter the virtual darkroom to sift through video, edit it, and produce engaging content. Video creation is waiting for its Polaroid moment, when a technological solution will sharply reduce the post-production time required to create engaging video. If successful, the technology developed in this project would greatly increase the utility of any video capture device and would have implications beyond Internet media, in areas such as life recording and knowledge transfer. The countless video clips of important or memorable events that today are commonly archived and forgotten could instead be automatically summarized and made available in a usable and engaging format.
This Small Business Innovation Research (SBIR) Phase I project aims to evaluate the technical viability of an automatic video summarization system based on neural networks and adapted to measurements of human psychology. As people collectively record more video than they can possibly consume (the video deluge problem), a technology that automatically turns raw video into relevant and engaging summaries becomes increasingly critical. The company's proposed platform would streamline video sharing, search, and viewing, all of which are staples of our online lives. Scientifically, this is a unique moment for artificial visual systems, with some systems rivaling human performance in limited domains. The field of visual psychology has also seen recent progress in relating visual semantic information to cognitive phenomena, such as the memorability of images. Taken together, these advances suggest that it may now be possible to automatically predict the cognitive relevance of visual information and produce effective video summarizations. This project combines deep neural networks for visual object recognition, recurrent networks for contextually embedded temporal information, and user measurements of interest, memorability, and uniqueness. The primary technical objective is to determine whether a system can automatically predict human-produced video summarizations.
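
As an illustration of what "predicting human-produced summarizations" could mean in practice, the sketch below scores one possible comparison: keep the highest-scoring frames as the automatic summary and measure its overlap with the frames a person selected. This is only a minimal sketch under stated assumptions. The abstract does not specify an evaluation protocol; the per-frame scores, the frame budget, the F1 overlap measure, and the function names `select_summary` and `f1_overlap` are all hypothetical choices made for illustration.

```python
def select_summary(scores, budget):
    """Return the indices of the `budget` highest-scoring frames.

    `scores` stands in for per-frame relevance predictions from a model;
    the values used below are made up for illustration.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:budget])


def f1_overlap(predicted, human):
    """F1 score between predicted and human-selected frame index sets."""
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: six frames with predicted relevance scores,
# and the three frames a human editor kept in their summary.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
human = {1, 2, 5}

predicted = select_summary(scores, budget=3)
f1 = f1_overlap(predicted, human)
print(sorted(predicted), round(f1, 3))
```

In this toy case the automatic summary shares two of its three frames with the human one. A real study would likely aggregate such scores over many videos and several human annotators, since individual summaries of the same video can differ.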