The sense of vision gives people an instant picture of the world, enabling them to rapidly recognize objects and other people and to understand their relationships and the layout of the scene. Machine vision is essential to many applications of artificial intelligence, but cannot yet emulate the richness and robustness of human vision. The study of human vision and the development of machine vision have shaped each other in a virtuous cycle. The design of deep neural networks, the kind of artificial neural networks that now dominate machine vision, was inspired by neurobiological principles. In turn, neuroscientists have recently found that deep neural networks trained to recognize objects provide the best current model of human and primate vision. However, current artificial neural networks still fail to capture the visual recognition capabilities of the human brain. The goal of this project is to learn which computational mechanisms best explain human vision. To achieve this, the research will develop and apply a novel methodology: "controversial stimuli." Controversial stimuli are computer-generated images optimized to cause two neural network models to disagree about their content. Because the two models disagree, they cannot both match a human observer's judgment of such a stimulus, so presenting it to human observers will reveal which model better mimics human vision. These stimuli will be used to systematically compare artificial neural network models to human brains and to find ways to improve the models. This project will generate scientific insights into human vision and engineering insights for machine vision. The project will also include outreach activities to improve public understanding of the power and limitations of neural networks and their relationship to human intelligence.
The project will develop methods for the synthesis of controversial stimuli: images optimized so that brain and behavioral data can adjudicate among alternative deep neural network models of human vision. An initial behavioral experiment will challenge humans to classify and rate various controversial stimuli. Further experiments will design and employ synthetic stimuli for adjudicating between deep neural network models on the basis of brain activity measurements. Hemodynamic responses to the stimuli will be measured in the human ventral visual stream with functional magnetic resonance imaging (fMRI). Each of the fMRI experiments will test a different aspect of deep neural network modeling of visual neural responses, including the distinction between discriminative and generative image classifiers. The stimulus synthesis algorithms developed in this project, as well as the resulting stimuli and the corresponding fMRI and behavioral datasets, will be shared with the scientific community as an open resource.
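As a rough illustration of what synthesizing a controversial stimulus could involve, the sketch below optimizes a tiny four-pixel "image" so that two toy classifiers confidently assign it to different classes. Everything in it is an assumption made for illustration and not the project's actual algorithm: the linear-softmax models `MODEL_A` and `MODEL_B` stand in for deep neural networks, the controversy score is taken to be the product of the two target-class probabilities, and the gradient is estimated by finite differences where a real implementation would more likely use automatic differentiation.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, image):
    """Class probabilities of a toy linear-softmax model (hypothetical stand-in for a deep network)."""
    logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
    return softmax(logits)


def controversy(model_a, model_b, image, class_a, class_b):
    """Assumed score: large only when A is confident in class_a AND B is confident in class_b."""
    return predict(model_a, image)[class_a] * predict(model_b, image)[class_b]


def synthesize(model_a, model_b, class_a, class_b, start, steps=50, lr=0.5, eps=1e-4):
    """Gradient ascent on the controversy score, keeping pixel values in [0, 1]."""
    image = list(start)
    for _ in range(steps):
        base = controversy(model_a, model_b, image, class_a, class_b)
        grad = []
        for i in range(len(image)):
            # Forward finite difference: nudge one pixel and see how the score changes.
            bumped = list(image)
            bumped[i] += eps
            bumped_score = controversy(model_a, model_b, bumped, class_a, class_b)
            grad.append((bumped_score - base) / eps)
        image = [min(1.0, max(0.0, x + lr * g)) for x, g in zip(image, grad)]
    return image


# Hypothetical two-class models: A only looks at pixels 0-1, B only at pixels 2-3.
MODEL_A = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]]
MODEL_B = [[0.0, 0.0, 4.0, 0.0], [0.0, 0.0, 0.0, 4.0]]

start = [0.5, 0.5, 0.5, 0.5]  # uniform grey image on which both models are undecided
image = synthesize(MODEL_A, MODEL_B, class_a=0, class_b=1, start=start)

print("score before:", controversy(MODEL_A, MODEL_B, start, 0, 1))
print("score after: ", controversy(MODEL_A, MODEL_B, image, 0, 1))
print("model A:", predict(MODEL_A, image), "model B:", predict(MODEL_B, image))
```

On the optimized image, model A favors class 0 while model B favors class 1, so whichever label a human observer assigns agrees with at most one of the two models. The product score is only one possible choice; any score that is high only when both models are confident in different classes would serve the same purpose.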
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.