The goal of this project is to develop a computer vision system that assists visually impaired people in navigating indoor environments. A novel feature of the vision system is its use of a motion-sensing input device to capture both an image and a depth map of the scene. By learning joint representations that combine depth and intensity information, the system can extract powerful features that give a dramatic improvement over existing scene understanding algorithms, which rely on intensity information alone. The availability of depth information also allows recovery of the room geometry and permits the construction of new types of 3D priors on the locations of objects, which is not possible with existing approaches. The output of the vision system will be communicated to the visually impaired person via a number of possible methods: (i) a tactile hand-grip on a cane; (ii) a wearable pad embedded with actuators; and (iii) the BrainPort sensor, which places an array of tiny actuators on the tongue.
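The depth-and-intensity fusion described above can be illustrated with a minimal sketch: each modality is normalized and stacked into a per-pixel joint feature map, the kind of input a learned model would consume. The function `fuse_rgbd` below is a hypothetical helper for illustration, not the project's actual pipeline.

```python
import numpy as np

def fuse_rgbd(intensity, depth):
    """Stack normalized intensity and depth into a joint per-pixel feature map.

    intensity: (H, W) grayscale image; depth: (H, W) depth map.
    Both arguments are illustrative stand-ins for the sensor's output.
    """
    # Normalize each modality to zero mean and unit variance so that
    # neither channel dominates the joint representation.
    i = (intensity - intensity.mean()) / (intensity.std() + 1e-8)
    d = (depth - depth.mean()) / (depth.std() + 1e-8)
    # Joint representation: a per-pixel feature vector [intensity, depth].
    return np.stack([i, d], axis=-1)  # shape (H, W, 2)

# Example: fuse a small synthetic scene.
rgbd = fuse_rgbd(np.random.rand(4, 4), np.random.rand(4, 4) * 3.0)
print(rgbd.shape)  # (4, 4, 2)
```

In practice the joint representation would be learned (e.g. by a neural network trained on paired image/depth data) rather than hand-built, but the input structure is the same.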
The expected results of the project are: (i) a large dataset of indoor scenes with depth maps and dense labels; (ii) new open-source algorithms for fusing depth and intensity information to aid scene understanding; and (iii) a prototype vision-based assistive device for visually impaired people. The project aims to assist the approximately 2.5 million people in the US who are blind or partially sighted.