Digital information processing has become an essential part of modern life. It is nowadays often expressed in a form of multimedia, involving videos accompanied with images, captions, and audio. Given the explosive growth of such multimedia data, it is extremely critical that it is accurately summarized and organized for automatic processing in artificial intelligence. One important yet challenging problem is automatic interpretation and summarization of video content, having enormous applications in video advertisements, online video searching and browsing, movie recommendation based on personal preference, and essentially any electronic commerce platform. In this project, the research team plans to develop statistical tools to raise our capacity of processing digital information to respond to a rapid growth of video content in real-world applications. The primary objective is to create a learning system to decipher the meaning of visual expressions as perceived by the audience, with a focus on understanding semantic meaning conveyed by a video.

This project aims to develop methods of automatic video interpretation and description, which understands visual thoughts expressed by a video and generates semantic expressions of the content of a video. Particularly, it will utilize conditional dependence structures of entities as well as between entities and their pertinent actions, in a framework of multi-label and hierarchical classification. It will focus on three areas: 1) entity and action learning, 2) semantic learning for long videos and content-based segmentation, and 3) automatic video description generation, each of which develops techniques in novel ways. In each area, classification will be performed collaboratively based on pairwise conditional label dependencies and temporal dependencies of video frames, characterized by graphical and hidden Markov models. Special effort will be devoted to learning from multiple sources and extracting latent structures corresponding to scenes of a video. The PIs also plan to release the software developed as open source and build a user community around the language by ensuring that interested researchers are able to contribute to the codebase of the software developed. This will allow a wider growth of the project. This aspect is of special interest to the software cluster in the Office of Advanced Cyberinfrastructure, which has provided co-funding for this award.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
1721550
Program Officer
Christopher Stark
Project Start
Project End
Budget Start
2017-09-01
Budget End
2020-08-31
Support Year
Fiscal Year
2017
Total Cost
$160,000
Indirect Cost
Name
Stanford University
Department
Type
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94305