Digital 2D photos and videos are already ubiquitous in everyday life. Fueled by new products such as Google Earth and interactive 3D games, the public has a renewed interest in 3D content. Creating 3D content beyond simple fixed-viewpoint stereoscopic movies has so far remained a labor-intensive and expensive process. This research investigates new software algorithms that can significantly simplify this modeling process, in particular for dynamic models. Inexpensive depth sensors (such as a stereo camera) will be used to capture a dynamic scene over time from different locations, and a complete 4D model will be generated automatically. The recovered models can be used in many applications: in simulation, to create realistic virtual environments; in entertainment, to render special effects; and perhaps simply to allow everyone to enjoy their cherished moments, such as a baby's first step, in interactive 3D.
From a technical standpoint, the input sequence of color+depth maps provides only a partial sampling of the entire 4D (space+time) model, and the central research problem is how to fuse these partial samples into a complete model. Unlike previous hole-filling approaches, this research aims to deal with models that exhibit one or more of the following characteristics: large (e.g., several city blocks), dynamic, deforming, yet sparsely sampled (e.g., less than 50% of the model is observed), and possibly very noisy. Reconstructing a complete model under these conditions can be very challenging or sometimes ill-posed. However, scene structures are usually not stochastic; the same or a similar structural element may appear several times in the input set, possibly at different times and locations. Samples from different times or places can therefore be used to fill in the missing data, making model completion possible without resorting to any external sources.
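The idea of filling missing data from repeated structure within the input itself can be illustrated with a toy sketch. The code below is not the proposed algorithm; it is a minimal, assumed exemplar-based scheme on a single 2D depth map, where holes (NaN values) are filled by matching the known neighborhood of each hole pixel against fully valid patches found elsewhere in the same map. The function name `complete_depth` and the fixed 3x3 patch size are hypothetical choices for illustration only; the actual research concerns fusing far larger, dynamic 4D samples.

```python
import numpy as np

def complete_depth(depth, patch=3):
    """Toy exemplar-based completion: fill NaN holes in a depth map by
    copying the center of the best-matching fully valid patch taken
    from elsewhere in the same map (a hypothetical sketch, not the
    proposed 4D fusion method)."""
    h, w = depth.shape
    r = patch // 2
    out = depth.copy()

    # Collect every fully valid patch as a candidate exemplar.
    exemplars = []
    for i in range(r, h - r):
        for j in range(r, w - r):
            p = depth[i - r:i + r + 1, j - r:j + r + 1]
            if not np.isnan(p).any():
                exemplars.append(p)

    # For each hole pixel, match its known neighbors against exemplars
    # and copy the center value of the closest match.
    for i in range(r, h - r):
        for j in range(r, w - r):
            if not np.isnan(out[i, j]):
                continue
            q = depth[i - r:i + r + 1, j - r:j + r + 1]
            known = ~np.isnan(q)
            best, best_err = None, np.inf
            for p in exemplars:
                err = np.mean((p[known] - q[known]) ** 2) if known.any() else 0.0
                if err < best_err:
                    best, best_err = p, err
            if best is not None:
                out[i, j] = best[r, r]
    return out
```

On a depth map with a repeating pattern, a hole is filled exactly because an identical patch exists elsewhere; on real data the match is approximate, and the research must additionally handle deformation, noise, and samples spread across time.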