The human visual system can solve the computationally complex task of detecting objects in cluttered natural scenes within a fraction of a second. Human electroencephalography (EEG) studies and intracranial recordings show object category-selective signals within 120 ms of stimulus onset, and response-selective signals over frontal channels starting 150-180 ms after stimulus onset. Computational simulations have indicated that such "rapid" object recognition can be achieved by a single feedforward pass through the visual hierarchy, starting in primary visual cortex and progressing through inferotemporal cortex to task circuits in prefrontal cortex. Within this feedforward model, it is generally assumed that features progress from relatively simple ones, such as edges and corners at the first cortical stages, to combinations of these simple features at intermediate levels, to "objects" at the top of the system. According to this view, all objects are processed at essentially the same level of the visual hierarchy. However, this "Standard Model" was recently challenged by behavioral demonstrations that accurate saccades toward objects can be made well before 150-180 ms. In the first such study, the French PI and co-workers reported that when two natural scenes were simultaneously flashed to the left and right of fixation, reliable saccades to images containing animals were initiated as early as 120-130 ms after image onset. Given that saccade programming and execution presumably need at least 20 ms, the underlying visual processing must have been completed within 100 ms, considerably earlier than the 150 ms latency of the first differential activity. The visual system's ability to detect and respond to faces is even faster: Thorpe's lab has demonstrated saccades to faces embedded in natural scenes just 100 ms after stimulus onset.
Intellectual Merit. These ultra-rapid detection times pose major problems for the current "Standard Model" of visual processing. With support from the National Science Foundation and the French Agence Nationale de la Recherche (ANR), this project aims to test the hypothesis that the visual system can learn object representations early in the hierarchy. Specifically, the team's recent behavioral, EEG, and functional magnetic resonance imaging (fMRI) data suggest that the visual system can increase its processing speed on particular tasks by basing task-relevant decisions on signals that originate at intermediate processing levels, rather than requiring that stimuli be processed by the entire visual hierarchy. This hypothesis is supported by computational modeling results that establish a crucial role for intermediate feature selectivity in both object detection and the visual system's robustness to clutter. Moreover, the modeling results show that object detection can reach human-level performance based solely on intermediate features, without requiring processing by the full hierarchy. This project could rewrite the book on how the brain detects objects. Instead of the classic hierarchical model, in which objects can only be coded at the very top of the system, this project will show how "objects" can be detected by neurons in early areas of the visual system, especially when those objects are biologically important and their detection requires receptive fields with resolutions found only in lower visual areas. This hypothesis will be tested using a tightly integrated multidisciplinary approach that leverages the team's background in computer science, computational neuroscience, and human vision.
The project will employ: a) behavioral studies using eye tracking to determine the capabilities of human ultra-rapid object detection, b) simultaneous EEG and eye tracking studies to determine when object-selective responses occur, c) fMRI studies to show precisely where object-selective representations can be found in the brain, and d) computational modeling studies to determine whether such multilevel object mechanisms are plausible and can account for human performance levels. In addition, the team will use these techniques to test the hypothesis that training on object localization induces the learning of object-selective representations at lower levels of the visual system that permit rapid and accurate object localization.
Broader Impact. The proposed study will revolutionize our understanding of the mechanisms underlying the brain's ability to rapidly detect objects. The results will also be relevant to medical and technical challenges. Bioinspired vision systems could be important for devising aids for patients with visual deficits, and the project results will be leveraged to help build improved visual aids for the blind. Understanding how the brain detects objects is also of great interest for improving machine vision systems, whose performance still lags far behind that of their human counterparts in robustness to clutter and tolerance to transformations. The project builds on pilot data obtained by the US co-PI with the French PI and includes extensive plans for exchanges between the US and French teams. Research results from this project will be broadly disseminated through publications and conference presentations.

National Institutes of Health (NIH)
National Eye Institute (NEI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-IFCN-B (50))
Program Officer: Wiggs, Cheri
Georgetown University
Schools of Medicine
United States
Martin, Jacob G; Davis, Charles E; Riesenhuber, Maximilian et al. (2018) Zapping 500 faces in less than 100 seconds: Evidence for extremely fast and sustained continuous visual search. Sci Rep 8:12482
Cox, Patrick H; Riesenhuber, Maximilian (2015) There Is a "U" in Clutter: Evidence for Robust Sparse Codes Underlying Clutter Tolerance in Human Vision. J Neurosci 35:14148-59