The main goal of this project is to automate the construction and manipulation of very high-level, three-dimensional structural and appearance representations of humans from un-instrumented monocular video. This technology would enable a broad spectrum of applications, including video browsing and indexing (content-based access to digital libraries), entertainment, virtual reality, and human-computer interaction. Our methodology combines supervised and unsupervised statistical modeling and learning methods, non-linear optimization and sampling techniques, and computer vision. The goal of these procedures is to automate the model construction process. We aim for compact representations that have the optimal level of complexity in order to ensure stable and reliable perceptual inferences.

Broad project significance

The purpose of this research is to derive artificial systems that are able to recover accurate three-dimensional models of human structure and appearance from video sequences filmed with a single camera (this includes movies, sports or cultural events like ballet, and home-recorded videos). Humans are the prevailing subjects in existing video data, which typically records their motions, actions, and expressions, the fine or coarse details of their behavior, and the way they collaborate and communicate. Visualizing or analyzing scenes with complex life events based on reconstructed human models is an important problem for the advancement of a variety of technological fields, including digital libraries and archives, video coding, entertainment, animation and virtual reality, as well as intelligent human-computer interfaces, protection, and security.

Human analysis in video is an open research problem facing important scientific and computational challenges. The proportions of the human body vary across individuals due to gender, weight, age or race. Aside from this variability, any single human body has many degrees of freedom due to articulation, and the individual limbs are deformable due to muscle and clothing. Finally, many real-world scenes involve multiple interacting humans occluded by each other or by other objects. The scene conditions may also vary due to camera motion or lighting changes. These factors make accurate 3D human models difficult to build and difficult to reconstruct reliably from flat 2D images. In order to address these challenges, this research will involve synergies between optimization algorithms, computer vision, and image processing technologies. A key component of our approach is the use of large-scale statistical learning methods in order to automatically acquire compact models of humans directly from videos filmed in the real world. In this respect, this research can lead to fruitful connections with other computer science disciplines like computer graphics or computer animation. These areas are known for models of the physical world that are highly realistic but extraordinarily complex to construct, often requiring intensive laboratory design by skilled artists.

URL: http://ttic.uchicago.edu/~crismin/human_models_from_video.html

Project Start:
Project End:
Budget Start: 2006-03-01
Budget End: 2010-02-28
Support Year:
Fiscal Year: 2005
Total Cost: $336,929
Indirect Cost:
Name: Toyota Technological Institute at Chicago
Department:
Type:
DUNS #:
City: Chicago
State: IL
Country: United States
Zip Code: 60637