This SGER proposal sets forth a one-year research plan to explore two fundamentally new directions: (1) discovering significant scenes and objects in photo collections available on the Internet and (2) reconstructing dense geometry from these online photo collections.
The scale, diversity, and unorganized nature of photos posted on the Internet present major challenges to existing computer vision techniques. This SGER proposal describes key solutions to these challenges. The first goal is to devise techniques that automatically infer the content of image collections by identifying the significant scenes and objects they contain. For example, it may be possible to deduce all of the popular tourist sites, statues, paintings, and other artifacts of Rome from the nearly one million photos of Rome on Flickr. This requires computing canonical views of these scenes and objects that together cover their most interesting aspects. The second goal is to develop multi-view stereo (MVS) techniques that operate effectively on such uncontrolled and highly variable image sets. Because lighting, camera response, and foreground clutter can differ substantially from image to image, new stereo matching algorithms are required. The PIs will develop strategies for selecting which views to combine, explore matching metrics that factor out lighting and camera variations, and finally exploit lighting variations to combine photometric and geometric stereo.
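As one illustration of a matching metric that factors out some lighting and camera variation, the sketch below computes normalized cross-correlation (NCC) between two image patches. This is an assumed example of a standard metric, not the specific metric the PIs will develop. NCC is unchanged when one patch is rescaled by a positive gain and shifted by a bias, which models simple exposure and camera-response differences. The function name `ncc` and the flattened-patch input format are illustrative assumptions.

```python
import math
from typing import Sequence


def ncc(patch_a: Sequence[float], patch_b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equal-length, flattened patches.

    Subtracting each patch's mean removes a per-patch bias, and dividing by
    the product of norms removes a per-patch positive gain, so
    ncc(a, [g * x + c for x in a]) == 1.0 for any g > 0.
    """
    if not patch_a or len(patch_a) != len(patch_b):
        raise ValueError("patches must be non-empty and of equal length")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    # Zero-mean versions of both patches (bias removed).
    da = [x - mean_a for x in patch_a]
    db = [y - mean_b for y in patch_b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0.0:
        # A constant (textureless) patch has no defined correlation.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    a = [10.0, 20.0, 30.0, 25.0]
    brighter = [2.0 * x + 5.0 for x in a]  # same patch, different gain/bias
    print(ncc(a, brighter))
```

Here the "brighter" patch scores essentially 1.0 against the original despite different pixel values. NCC does not handle non-affine effects such as shadows, specularities, or occluding clutter, which is part of why the proposal calls for new matching metrics.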
Ultimately, this research can lead to the creation of a geometry crawler that scours the Internet for objects to reconstruct. Given sufficient computing time and resources, such an approach could automatically create 3D models of all of the world's well-photographed sites, cities, landscapes, and objects.
The outcome of this research will be tools that automatically discover and describe scenes and reconstruct geometric models from Internet photo collections. These tools will enable a host of important applications in 3D visualization, localization, communication, and recognition that go well beyond traditional computer vision problems and can have broad impact on the general public. In addition, the proposed work will create a large set of new resources for a range of audiences. Chief among these will be massive datasets of registered imagery for many world sites, along with dense 3D reconstructions of those same sites. This data will be distributed broadly to help advance research in the computer vision community. It will also be made available for many other purposes, such as image-based rendering research in computer graphics, cultural heritage, localization, and scientific visualization.