This project will develop new approaches for recovering the three-dimensional (3D) shape and pose of the human body in images and video sequences. The methods will use a detailed 3D body model learned from laser range scans of over 2000 people. The approach will model the shape variation across people as well as the non-rigid shape variation due to changes in pose. The project will develop and test methods for robustly recovering the body shape in surveillance video sequences, in scenes with strong lighting, from collections of snapshots and in unconstrained television/film sequences. The recovered body model will be used to produce a variety of biometric measurements.
The majority of images and video sequences are of humans, and recognizing people and their actions is a core problem in computer vision. The problem is challenging, however, because the human body is a complex, non-rigid, articulated structure that can vary dramatically in pose, shape and appearance. Current methods focus on estimating human pose and typically ignore the problem of human shape estimation. This project will treat these problems together, resulting in more robust solutions that will have a wide-ranging impact across multiple disciplines. Human pose estimation is currently used in areas such as gait analysis, special effects, game development, human factors, and sports training, to name a few. Robust video-based systems like the one developed here will extend the range of applications to home entertainment, elder care, autonomous vehicles and animal movement analysis. By extending previous methods to also estimate the three-dimensional shape of the human body in images and video sequences, this project will enable additional applications in video forensics, surveillance, preventative medicine and special effects. More generally, methods like those developed here, which robustly recover the shape and pose of people in complex images and video streams, will be applicable to a wider range of problems in object recognition and tracking.
Project website: www.cs.brown.edu/~black/SCAPE.html
Human Body Shape and Pose from Images
The robust estimation of human pose and shape from images and video is important for applications in graphics, medicine, surveillance, and special effects. Previous work has focused primarily on estimating body pose and not shape. Three-dimensional body shape is challenging to estimate from sensor data such as images and video because 3D information is lost when a scene is projected onto an image. To enable robust shape estimation, we use a detailed, but low-dimensional, graphics model of the human body that is learned from a database of 3D laser range scans of over 2000 bodies (Figure 1). While much richer than previous human body models used in computer vision, the low-dimensional nature of the model makes it computationally practical. It also allows us to recover both body shape and pose simultaneously. To estimate body shape from a single image, we use a variety of cues such as image contours, edges and shading (Figure 2). Our approach relates body shape and illumination to how the body appears in images. The approach is even robust enough to estimate body shape from paintings (Figure 3).
The technology of body shape estimation promises to provide new tools to several disciplines beyond computer vision, including medicine, fitness and computer graphics. Additionally, there are several commercial applications of such technology in forensic video analysis and the garment industry. To be practical, however, one must be able to estimate body shape under clothing. To this end, we developed a model of how clothing shape deviates from the human body. With this model we are able to approximately estimate the shape of a person under their clothes (Figure 4). This will enable applications such as virtual clothing shopping and virtual try-on, in which users have their body "scanned" with a few images and then their 3D body model does the shopping for them.
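To make the idea of a low-dimensional body model learned from scans more concrete, the following is a minimal sketch, not the project's actual model. It assumes the scans have already been registered so that each one is a fixed-length vector of coordinates, and it assumes a linear (PCA) subspace, which the text above does not specify. The function names `fit_shape_space`, `encode` and `decode` are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_shape_space(scans, n_components):
    """Fit a linear (PCA) shape space to registered scans.

    scans: array of shape (n_scans, n_coords), one flattened mesh per row.
    Returns the mean shape and the top principal directions (as rows).
    """
    mean = scans.mean(axis=0)
    # SVD of the centered data gives the principal directions as rows of vt.
    _, _, vt = np.linalg.svd(scans - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(shape, mean, basis):
    """Project a full shape vector onto the low-dimensional coefficients."""
    return basis @ (shape - mean)

def decode(coeffs, mean, basis):
    """Reconstruct a full shape vector from its coefficients."""
    return mean + coeffs @ basis

# Synthetic stand-in for a scan database: 200 "bodies" with 30 coordinates
# each, generated from 3 hidden factors (an assumption for illustration).
rng = np.random.default_rng(0)
hidden_basis = rng.normal(size=(3, 30))
scans = 5.0 + rng.normal(size=(200, 3)) @ hidden_basis

mean, basis = fit_shape_space(scans, n_components=3)
coeffs = encode(scans[0], mean, basis)
reconstruction = decode(coeffs, mean, basis)
```

In this sketch each body is described by three numbers instead of thirty, which is the sense in which a low-dimensional model keeps estimation computationally practical.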
To create a more accurate body model, we developed a method that uses the Microsoft Kinect game sensor as an input device (Figure 5). The Kinect produces noisy depth information. By combining depth measurements of a person in multiple poses, we accurately estimate that person’s body shape. The Kinect costs less than $200, and the accuracy of the recovered body model rivals that of methods costing tens of thousands of dollars. The project has led to two patent filings focused on body shape estimation for the apparel industry. The techniques developed in this project promise a new way of estimating body shape and sizing people to clothing that fits. This could change the way people shop for clothing online.
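The benefit of combining noisy depth measurements across several captures can be illustrated with a small sketch. This is not the project's algorithm: it assumes a linear shape model (a mean plus a few basis directions) and assumes that every frame has already been brought into a common pose, which is a hypothetical preprocessing step. The function name `estimate_shape` and all data are made up for illustration.

```python
import numpy as np

def estimate_shape(observations, mean, basis):
    """Least-squares shape coefficients shared by all frames.

    observations: (n_frames, n_coords) noisy measurements of one person,
    assumed to be aligned to a common pose (hypothetical preprocessing).
    """
    # All frames share the same coefficients, so the joint least-squares
    # solution equals the fit to the frame-averaged residual.
    residual = observations.mean(axis=0) - mean
    coeffs, *_ = np.linalg.lstsq(basis.T, residual, rcond=None)
    return coeffs

# Synthetic linear shape model and one "true" body (illustrative values).
rng = np.random.default_rng(0)
n_coords, n_shape = 60, 3
mean = rng.normal(size=n_coords)
basis = rng.normal(size=(n_shape, n_coords))
true_coeffs = np.array([1.0, -0.5, 2.0])
clean = mean + true_coeffs @ basis

# Fifty noisy captures of the same body.
frames = clean + rng.normal(scale=0.5, size=(50, n_coords))

err_one = np.linalg.norm(estimate_shape(frames[:1], mean, basis) - true_coeffs)
err_all = np.linalg.norm(estimate_shape(frames, mean, basis) - true_coeffs)
print(f"error from 1 frame: {err_one:.4f}, from 50 frames: {err_all:.4f}")
```

Because the noise in each frame is independent in this toy setup, averaging more frames tends to shrink the error in the recovered coefficients, which is the intuition behind combining many inexpensive, noisy measurements.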