The goal of this project is a computer vision system that takes continuously streaming video from a moving camera and opportunistically builds a visual description of the surrounding world. The system starts from an agnostic state in which it reconstructs everything from scratch, and progresses toward a state in which it mostly recognizes new input as something it has seen before. The envisioned system has applications in a diverse set of areas, such as wearable computing as an assistant for sighted as well as visually impaired users, general human-computer interaction, monitoring of television streams, surveillance, robotics, navigation, smart cars, mapping, and 3D reconstruction.

The project will explore a symbiosis between geometry and recognition, aiming at entirely automatic visual 3D reconstruction of an environment. The challenge is to develop a theory for constructing a system that can improve its visual description over time in a trustworthy and robust manner, remaining highly flexible while exploiting the constraints and previously gathered knowledge needed to approach human performance.

To meet this challenge, an experimental system will be developed that combines affine-covariant image region extraction and description, context-aware queries, geometric validation, and topological and metric mapping. Affine-covariant region extraction and description provides features that are robust to occlusion and to changes in viewpoint or illumination. Context-aware queries provide the ability to quickly bring forth relevant material from the past. Geometric consistency of a configuration of local features then serves as validation of match correctness.

The validation allows continuous learning of which features and descriptors are stable, and which feature constellations work efficiently in pulling consistent matches out of the previously gathered database. A topological (or quasi-metric) map of scenes and objects is created based on the validated matches. The map consists of a number of world cliques that contain 3D positions and visual descriptor classes of feature points that give rise to stable affine-covariant regions. When sufficient information has been gathered, the topological map is fused into a global metric map using structure-from-motion techniques. The results will be evaluated through simulations and by using real video with ground-truth positioning data.
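One way to picture the topological map is as a graph whose nodes are world cliques and whose edges record validated matches between them. The sketch below is a toy data structure under assumed simplifications (descriptor classes as integers, known 3D points, a `TopologicalMap` class invented here); it is not the project's representation, but it shows how an inverted index over descriptor classes supports fast context-aware lookup.

```python
from collections import defaultdict

class TopologicalMap:
    """Toy topological map: nodes are 'world cliques' holding descriptor
    classes with 3D points; edges link cliques joined by validated matches."""

    def __init__(self):
        self.cliques = {}               # clique id -> {descriptor class: 3D point}
        self.edges = defaultdict(set)   # clique id -> neighbouring clique ids
        self.index = defaultdict(set)   # descriptor class -> clique ids

    def add_clique(self, cid, features):
        self.cliques[cid] = dict(features)
        for d in features:              # maintain the inverted index
            self.index[d].add(cid)

    def link(self, a, b):
        """Record that validated matches connect cliques a and b."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def query(self, descriptors, min_hits=2):
        """Context-aware lookup: cliques sharing at least `min_hits`
        descriptor classes with the query view."""
        hits = defaultdict(int)
        for d in descriptors:
            for cid in self.index[d]:
                hits[cid] += 1
        return {cid for cid, n in hits.items() if n >= min_hits}
```

A query with the descriptor classes extracted from the current frame then returns candidate cliques for geometric validation, and validated matches across cliques would supply the constraints for the later metric fusion step.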

The broader impacts of the project include applications to large-scale 3D modeling in commercial and military cartography, generation of virtual walkthroughs, and autonomous robotic navigation. In addition, the PI will develop a certificate program in computer vision and graphics, and will explore applications of the computer vision system to assisting visually impaired users.

URL: www.vis.uky.edu/~dnister/Projects/CAREER/career.html

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0545920
Program Officer: Jie Yang
Budget Start: 2006-01-01
Budget End: 2006-12-31
Fiscal Year: 2005
Total Cost: $104,005
Name: University of Kentucky
City: Lexington
State: KY
Country: United States
Zip Code: 40506