The project addresses the fundamental challenge of grounding high-level semantic concepts about events in low-level video data. The key innovations include: (1) representing events via probabilistic event logic (PEL), along with corresponding inference and learning algorithms; (2) segmenting video into a hierarchy of space-time tubes; and (3) robustly grounding PEL in space-time tubes via AND-OR grammars. Space-time tubes are extracted by tracking candidate object boundaries across frames, where both boundary detection and tracking are learned from training videos. PEL allows for arbitrary, probabilistic, spatiotemporal constraints among events, including the traditional compositional rules, Allen relations between time intervals, and correlations among different events. Unlike existing work, the logical nature of PEL allows humans, even non-experts, to easily inject their own knowledge into the system. PEL conducts joint, holistic inference to find the globally best parse over all events, which is grounded in an AND-OR grammar of primitive events. The AND-OR grammar uses robust graph matching of video tubes to handle uncertainty in low-level visual processing. For evaluation, two video datasets of American football and a building's atrium are compiled, with fully annotated event labels, object tracks, and spatiotemporal segmentations.
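To make the Allen temporal relations concrete, the following is a minimal sketch (not the project's code) of a few of Allen's thirteen interval relations, which PEL uses as temporal constraints between event intervals. The interval encoding and function names here are illustrative assumptions.

```python
# Minimal sketch of a few of Allen's interval relations (illustrative,
# not the project's implementation). Intervals are (start, end) tuples
# with start < end.

def before(a, b):
    # a ends strictly before b starts
    return a[1] < b[0]

def meets(a, b):
    # a ends exactly when b starts
    return a[1] == b[0]

def overlaps(a, b):
    # a starts first, b starts inside a, and a ends inside b
    return a[0] < b[0] < a[1] < b[1]

def during(a, b):
    # a lies strictly inside b
    return b[0] < a[0] and a[1] < b[1]

def equals(a, b):
    return a == b

# Example: a hypothetical "pass" event relative to a longer "play" event.
play = (0, 100)
pass_event = (10, 25)
print(during(pass_event, play))   # the pass occurs during the play
print(before((0, 5), (6, 9)))     # one interval strictly precedes another
```

In a PEL knowledge base, such relations would appear inside weighted formulas, e.g. a soft constraint that a "pass" event occurs during a "play" event.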

Training is provided for graduate and undergraduate students, including those from under-represented groups. The project is expected to: (a) advance the state of the art, which typically focuses only on video classification; (b) make the two datasets public; (c) generate workshops and tutorials on the related topics; and (d) produce publications in the highest-impact journals and conferences.

Project Report

The project addressed one of the basic problems in computer vision: detecting human actions and identifying their spatiotemporal extents in video. The key outcomes include: (1) a representation, (2) learning algorithms, and (3) inference algorithms for parsing human actions in video using probabilistic event logic (PEL) grounded in a hierarchy of video segments. PEL parses a video by identifying the segments occupied by events of interest, such that the resulting parse respects the compositional, force-dynamic, and Allen temporal relations encoded in the PEL knowledge base.

Our major outcomes include the following:
1) An efficient algorithm for compiling PEL to an equivalent Conjunctive Normal Form (CNF), and the corresponding inference algorithm for PEL-CNF.
2) An algorithm for learning both the formulas and the formula weights of the PEL knowledge base in an unsupervised manner, and an algorithm for learning the weights of user-specified formulas.
3) Two new video datasets of basketball and volleyball games, collected and manually annotated with event labels, object tracks, and spatiotemporal segmentations.

Extensive experimental evaluation demonstrated that PEL brings the computer's internal representation of high-level concepts closer to terms a human can easily interact with, enabling even non-experts to naturally inject their own knowledge into the system. Training was provided for a number of graduate and undergraduate students, including those from under-represented groups. The project provided research material for 4 M.S. theses, 2 Ph.D. dissertations, 2 Ph.D. preliminary examination presentations, and 2 Ph.D. qualifier examination presentations in the School of Electrical Engineering and Computer Science at Oregon State University.
The project results were widely disseminated in peer-reviewed publications and in invited talks by the principal investigators at top computer vision conferences. The project enabled the principal investigators to explore new research domains, specifically large-scale analysis of real-world videos of American football games. The project results, datasets, and a list of all publications are summarized at the following website: http://blogs.oregonstate.edu/osupel/

Project Start:
Project End:
Budget Start: 2010-09-15
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $465,984
Indirect Cost:
Name: Oregon State University
Department:
Type:
DUNS #:
City: Corvallis
State: OR
Country: United States
Zip Code: 97331