The human vision system understands and interprets complex scenes for a wide range of visual tasks in real-time while consuming less than 20 Watts of power. This Expeditions-in-Computing project explores holistic design of machine vision systems that have the potential to approach and eventually exceed the capabilities of human vision systems. This will enable the next generation of machine vision systems to not only record images but also understand visual content. Such smart machine vision systems will have a multi-faceted impact on society, including visual aids for visually impaired persons, driver assistance for reducing automotive accidents, and augmented reality for enhanced shopping, travel, and safety. The transformative nature of the research will inspire and train a new generation of students in inter-disciplinary work that spans neuroscience, computing and engineering discipline.
While several machine vision systems today can each successfully perform one or a few human tasks ? such as detecting human faces in point-and-shoot cameras ? they are still limited in their ability to perform a wide range of visual tasks, to operate in complex, cluttered environments, and to provide reasoning for their decisions. In contrast, the mammalian visual cortex excels in a broad variety of goal-oriented cognitive tasks, and is at least three orders of magnitude more energy efficient than customized state-of-the-art machine vision systems. The proposed research envisions a holistic design of a machine vision system that will approach the cognitive abilities of the human cortex, by developing a comprehensive solution consisting of vision algorithms, hardware design, human-machine interfaces, and information storage. The project aims to understand the fundamental mechanisms used in the visual cortex to enable the design of new vision algorithms and hardware fabrics that can improve power, speed, flexibility, and recognition accuracies relative to existing machine vision systems. Towards this goal, the project proposes an ambitious inter-disciplinary research agenda that will (i) understand goal-directed visual attention mechanisms in the brain to design task-driven vision algorithms; (ii) develop vision theory and algorithms that scale in performance with increasing complexity of a scene; (iii) integrate complementary approaches in biological and machine vision techniques; (iv) develop a new-genre of computing architectures inspired by advances in both the understanding of the visual cortex and the emergence of electronic devices; and (v) design human-computer interfaces that will effectively assist end-users while preserving privacy and maximizing utility. These advances will allow us to replace current-day cameras with cognitive visual systems that more intelligently analyze and understand complex scenes, and dynamically interact with users.
Machine vision systems that understand and interact with their environment in ways similar to humans will enable new transformative applications. The project will develop experimental platforms to: (1) assist visually impaired people; (2) enhance driver attention; and (3) augment reality to provide enhanced experience for retail shopping or a vacation visit, and enhanced safety for critical public infrastructure. This project will result in education and research artifacts that will be disseminated widely through a web portal and via online lecture delivery. The resulting artifacts and prototypes will enhance successful ongoing outreach programs to under-represented minorities and the general public, such as museum exhibits, science fairs, and a summer camp aimed at K-12 students. It will also spur similar new outreach efforts at other partner locations. The project will help identify and develop course material and projects directed at instilling interest in computing fields for students in four-year colleges. Partnerships with two Hispanic serving institutes, industry, national labs and international projects are also planned.