Audiovisual Scene Analysis

In many applications, such as Homeland Security, video surveillance is a crucial component. An important task is to extract moving objects (vehicles, people) from the video and to separate the individual sounds (car noise, speech) in the audio. These two tasks are usually performed separately: the former from the image sequences alone, and the latter from the audio mixture alone. However, the two are closely coupled. We aim to perform them jointly, and term the task Audiovisual Scene Analysis. The tool we shall use is probabilistic graphical models, especially generative models. A generative model includes hidden variables that affect the observed data (in this case, video with its associated audio). Using the Expectation-Maximization algorithm, the model's parameters can be estimated from the observed data. The challenge is to devise models for which parameter estimation leads to audiovisual object separation. Once the audiovisual objects in a video are separated, techniques such as face recognition and speech analysis can be used to detect objects and events of importance to specific applications.
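As a rough illustration of the estimation idea only, and not the project's audiovisual model, the sketch below runs Expectation-Maximization on a toy one-dimensional mixture of two Gaussians. The hidden variable is which source produced each sample. All names (`em_two_gaussians`, `make_toy_data`), the synthetic data, and the initialisation are assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM (toy example).

    The hidden variable is the component (source) behind each sample.
    """
    # Crude initialisation (an assumption): means at the data extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint) or 1e-300  # guard against underflow
            resp.append([j / total for j in joint])

        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor keeps a component from collapsing
            )
    return weights, means, variances


def make_toy_data(seed=0, n=300):
    """Synthetic samples from two well-separated sources (hypothetical data)."""
    rng = random.Random(seed)
    return ([rng.gauss(-2.0, 0.5) for _ in range(n)]
            + [rng.gauss(3.0, 0.8) for _ in range(n)])


weights, means, variances = em_two_gaussians(make_toy_data())
```

In this toy setting the soft assignments computed in the E-step play the role that object and source separation would play in an audiovisual model: each observation is attributed, probabilistically, to one of the underlying sources.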

Project Start:
Project End:
Budget Start: 2004-09-15
Budget End: 2008-08-31
Support Year:
Fiscal Year: 2004
Total Cost: $599,999
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820