The visual input is a function of several conceptually orthogonal factors, each of which can typically be represented by an underlying nonlinear manifold. In general, therefore, each data point lies on a mixture of manifolds, and the data live in the product space of all these factors, which makes the problem very challenging. The problem becomes approachable, however, if we understand, at least to some extent, the topology, dimensionality, and properties of the individual manifold of each orthogonal factor that generated the data. The ultimate goal of this research is to establish general mathematical frameworks for separating multiple factors in the data. In particular, in the context of human motion, the objective is to establish a mathematical framework that decouples intrinsic body configuration from the other sources of variability that affect the visual input, and then to exploit such models to recover body configuration. To achieve this goal, four research directions will be investigated: 1) learning a unified, invariant content-manifold representation from various style variations on the same manifold; 2) learning factorized generative models of the data given a representation of one or more of the underlying manifolds; 3) using a given representation of the underlying manifold to select discriminative features in the visual input; and 4) applying these findings to the recovery of intrinsic body configuration.
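The idea of separating two factors can be illustrated in its simplest, linear form with the asymmetric bilinear model of Tenenbaum and Freeman. In that model an observation of content c rendered in style s is written as y_sc ≈ A_s b_c, where A_s is a style-specific linear map and b_c is a style-invariant content vector. The minimal sketch below fits this model to synthetic data with a truncated SVD, then recovers the same content vector from observations in two different styles. It assumes linear factors and a complete style-by-content training grid, while the research described here targets nonlinear manifolds. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim, rank):
    """Fit y_sc ~= A_s @ b_c by a rank-`rank` truncated SVD.

    Y stacks the styles vertically: rows s*dim:(s+1)*dim hold style s,
    and column c holds content c. Returns style maps A with shape
    (n_styles, dim, rank) and content vectors B with shape (rank, n_contents).
    """
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    # Leading singular directions: scaled left vectors give the stacked
    # style maps, right vectors give the shared content vectors.
    A = (U[:, :rank] * S[:rank]).reshape(n_styles, dim, rank)
    B = Vt[:rank, :]
    return A, B

def infer_content(A_s, y):
    """Least-squares content vector for observation y in a known style A_s."""
    return np.linalg.lstsq(A_s, y, rcond=None)[0]

# Hypothetical synthetic data: 3 styles x 8 contents, 20-D observations, rank 2.
rng = np.random.default_rng(0)
n_styles, n_contents, dim, rank = 3, 8, 20, 2
true_A = rng.normal(size=(n_styles, dim, rank))
true_B = rng.normal(size=(rank, n_contents))
Y = np.vstack([true_A[s] @ true_B for s in range(n_styles)])

A, B = fit_asymmetric_bilinear(Y, n_styles, dim, rank)
Y_hat = np.vstack([A[s] @ B for s in range(n_styles)])
print(np.allclose(Y, Y_hat))  # True: exactly rank-2 data is reproduced

# The same content (column 5) observed in style 0 and in style 2.
b_from_style0 = infer_content(A[0], Y[0:dim, 5])
b_from_style2 = infer_content(A[2], Y[2 * dim:3 * dim, 5])
print(np.allclose(b_from_style0, b_from_style2))  # True
```

Because A_s and b_c are identifiable only up to a shared invertible transform, the recovered content vectors match the fitted B rather than the generating true_B. What matters for separation is that the same content yields the same vector regardless of the style in which it was observed.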

Separating style from content is an essential task in visual perception, and how it is achieved remains a fundamental mystery of perception. For example, it is not clear how we perceive a common motion, such as walking, as the same motion despite all the sources of variation in its appearance. The fundamental research problems addressed in this research plan arise in many computer vision and machine learning applications. The findings are expected to advance the state of the art in computer vision and machine learning and to offer new computational models to researchers in cognitive science. Human motion analysis will be the main application domain for this research, and the proposed work in this area has important applications such as surveillance, security, and human-computer interaction. Human motion analysis will also serve as the integrating theme between the research and the educational activities, with the aim of motivating math and science education. The educational plan consists of several integrated activities targeting graduate students, undergraduate students, and high school educators and students. The goal is to develop educational tools that bring together the PI, high school educators, and undergraduate and high school students to collaboratively design, implement, and evaluate a computer vision virtual classroom.

URL: www.cs.rutgers.edu/~elgammal/Research/GStyleContent.htm

Project Report

In the last two decades, extensive research in the computer vision community has focused on the analysis and understanding of human motion in images and videos. This wide interest stems from various potential real-world applications such as visual surveillance, human-machine interfaces, video archival and retrieval, computer graphics animation, autonomous driving, and virtual reality. Humans are typically the most important subjects in the images and videos of these applications. Researchers have looked at a wide range of problems, including detecting humans and their motion, locating faces in images, tracking people and their limbs, recovering body posture, extracting various biometrics, and analysing facial expressions and hand gestures. As the human body moves through the three-dimensional (3D) world, its motion is constrained by body dynamics and is projected through camera lenses to form the visual input we capture. Therefore, the changes (deformations) in appearance (texture, contours, edges) in the visual input (images and videos) that correspond to performing a given action are well constrained by the 3D body structure and the dynamics of that action. Researchers have long tried to exploit such kinematic and dynamic constraints, explicitly or implicitly, in their models to recover the body configuration. Despite the high dimensionality of the configuration space, many human motions intrinsically lie on low-dimensional manifolds. This is true for the kinematics of the body as well as for the motion observed in image sequences. For example, the silhouette (occluding contour) of a walking person is a dynamic shape: it deforms over time according to the action being performed, and these deformations are restricted by the physical body and by the temporal constraints that the action imposes.
Given these spatial and temporal constraints, the silhouettes, viewed as points in a high-dimensional visual input space, are expected to lie on a low-dimensional manifold. Intuitively, the gait is a 1D manifold embedded in a high-dimensional visual space. The main contribution of this project is the development of a computational framework for learning models that explicitly factorise the intrinsic body configuration, as a function of time, from the various appearance factors. The learned models support tasks such as synthesis and body configuration recovery, as well as the recovery of other aspects such as viewpoint and person-specific style parameters. We have applied these models to various human motion analysis applications, including gait tracking, extracting gait biometrics, analysis and synthesis of facial expressions, and analysis of complex motion such as ballet. We are also investigating applications of these mathematical models to other problems, including object recognition and viewpoint estimation. Our most significant accomplishment is showing that an explicit low-dimensional representation of human motion can effectively help solve the challenging posture estimation problem. Several researchers have since followed our lead in investigating manifold-based representations for different human motion analysis problems.
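The statement that gait is intrinsically a 1D (closed) manifold can be illustrated with a minimal sketch. It assumes that the gait phase is represented by a point on the unit circle and that a smooth nonlinear mapping from the circle to the high-dimensional observation space can be learned by Gaussian radial basis function (RBF) regression. Once the mapping is learned, the phase of a new observation is recovered by searching the 1D manifold for the closest generated point. The synthetic "silhouette" vectors, the function names (circle_embed, fit_mapping, recover_phase, synth_silhouette) and the parameter values (12 centers, width 0.5, 720 grid points) are hypothetical illustration choices, not the project's published models.

```python
import numpy as np

def circle_embed(theta):
    """Embed gait phase theta (radians) as points on the unit circle, shape (n, 2)."""
    theta = np.atleast_1d(theta)
    return np.column_stack([np.cos(theta), np.sin(theta)])

def rbf_features(X, centers, width):
    """Gaussian RBF activations of 2D points X with respect to the given centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_mapping(theta, Y, n_centers=12, width=0.5, reg=1e-6):
    """Ridge least squares for coefficients C such that Y ~= rbf(circle(theta)) @ C."""
    centers = circle_embed(np.linspace(0.0, 2 * np.pi, n_centers, endpoint=False))
    Phi = rbf_features(circle_embed(theta), centers, width)
    C = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_centers), Phi.T @ Y)
    return centers, width, C

def recover_phase(y, model, n_grid=720):
    """Return the grid phase whose mapped point lies closest to observation y."""
    centers, width, C = model
    grid = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    Y_grid = rbf_features(circle_embed(grid), centers, width) @ C
    return grid[np.argmin(((Y_grid - y) ** 2).sum(axis=1))]

# Hypothetical stand-in for silhouettes: a smooth periodic curve in R^50.
rng = np.random.default_rng(1)
D = 50
H = rng.normal(size=(4, D))  # random directions for the first two harmonics

def synth_silhouette(theta):
    theta = np.atleast_1d(theta)
    basis = np.column_stack([np.cos(theta), np.sin(theta),
                             np.cos(2 * theta), np.sin(2 * theta)])
    return basis @ H

train_theta = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
model = fit_mapping(train_theta, synth_silhouette(train_theta))

true_phase = 1.3
y_obs = synth_silhouette(true_phase)[0] + 0.01 * rng.normal(size=D)
est = recover_phase(y_obs, model)
print(f"true {true_phase:.3f} rad, estimated {est:.3f} rad")
```

A grid search is used here for simplicity and robustness; with a smooth mapping, a local optimizer could refine the estimate. Factorizing other sources of variability, such as person style or viewpoint, would require the mapping coefficients C to depend on additional parameters, which this sketch does not attempt.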

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0546372
Program Officer: Jie Yang
Budget Start: 2006-01-01
Budget End: 2013-12-31
Fiscal Year: 2005
Total Cost: $500,237
Organization: Rutgers University
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901