The project investigates computational methods for object detection, spatial scene construction, and natural language spatial description, derived from real-time visual images, of prototypical indoor spaces (e.g., rooms and offices). The primary application of this research is to provide blind or visually impaired users with spatial information about their surroundings that may otherwise be difficult to obtain through non-visual sensing. Such knowledge will assist in the development of accurate cognitive models of the environment and will support better-informed execution of spatial behaviors in everyday tasks.
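As an illustration only, the minimal sketch below shows the kind of pipeline this description implies: object detections placed in an egocentric scene model and rendered as a short verbal description. The detections here are hard-coded placeholders rather than outputs of a real detector, and all names (e.g., DetectedObject, describe_scene) and the distance/direction thresholds are hypothetical, not part of the project's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedObject:
    """A detected object in an egocentric (camera-centered) frame."""
    label: str   # e.g., "chair", "door"
    x: float     # lateral offset from the camera axis, meters (left < 0)
    z: float     # distance from the camera, meters


def spatial_phrase(obj: DetectedObject) -> str:
    """Map an object's position to a coarse egocentric direction phrase."""
    if abs(obj.x) < 0.5:
        side = "straight ahead"
    elif obj.x < 0:
        side = "to your left"
    else:
        side = "to your right"
    return f"{obj.label}, about {obj.z:.0f} meters {side}"


def describe_scene(objects: List[DetectedObject]) -> str:
    """Compose a natural-language description, nearest objects first."""
    if not objects:
        return "No objects were detected in the current view."
    parts = [spatial_phrase(o) for o in sorted(objects, key=lambda o: o.z)]
    return "Detected: " + "; ".join(parts) + "."


if __name__ == "__main__":
    # Placeholder detections; a real system would obtain these from a
    # camera-based object detector together with a depth estimate.
    scene = [
        DetectedObject("door", x=1.2, z=4.0),
        DetectedObject("empty chair", x=-0.8, z=2.0),
    ]
    print(describe_scene(scene))
```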
A second motivation for this work is to contribute to improving the spatial capabilities of computers and robots. Computers and robots are similarly "blind" to images unless they have been provided with some means to "see" and understand them. Currently, no robotic system can reliably perform high-level processing of spatial information from image sequences, e.g., finding an empty chair in a room, a task that requires not only detecting an empty chair in an image but also localizing that chair in the room and carrying out the action of reaching it. The guiding tenet of this research is that a better understanding of how humans acquire spatial knowledge from visual images and develop spatial awareness can also be applied to reducing the ambiguity and uncertainty of information processing in autonomous systems.
A central contribution of this work is to make the spatial information content of visual images available to visually impaired people, a rapidly growing demographic of our aging society. In an example scenario, a blind person and her guide dog are walking to her doctor's office, which she has not previously visited. At the office she needs information to perform essential tasks such as finding the check-in counter, available seating, or the bathroom. No existing accessible navigation system can describe the spatial parameters of an environment or help detect and localize objects within it. Our work will provide the underlying research and components needed to realize such a system.