Our goal is to understand how the brain accomplishes visual object recognition. Evidence obtained under this grant and from other labs suggests that each visual image is processed along the ventral visual cortical processing stream into a new pattern of neural activity at its top level, the inferior temporal cortex (IT), that conveys explicit information about object identity, even in the face of substantial view uncertainty ("invariance"). That IT population representation is thought to be causally responsible for object recognition. But precisely how does the IT population account for a seemingly infinite number of object discriminations? What are the behaviorally critical "features" conveyed by IT? How many? How can they be described? Here we aim to build and test image-to-IT-to-behavior models that are predictively accurate over the entire domain of core visual object recognition behavior. Substantial prior work argues that we should start by testing and developing the IT 100.1f model family: all models in that family state that IT conveys ~100 image-computable "features" in its activity sampled at ~1 mm scale. How can we test and develop such models? First, this model family predicts that we can build and provide a single, low-dimensional (<100) Euclidean embedding space to predict all basic- and subordinate-level object discrimination tasks (Aim 1). Second, the model family predicts that we can discover the particular aspects of IT activity (called IT "features") as those that, when weighted and summed, exactly predict behavioral object confusion of every image (Aim 2a). Third, the model family predicts that temporary suppression of individual, mm-scale portions of IT cortex will produce reliable, predictable patterns of behavioral disruption across all basic-level and subordinate-level object tasks (Aim 3). Fourth, the model family posits that differences in IT neural tuning functions at spatial scales less than ~1 mm are irrelevant for core object discrimination behavior, a prediction we will test with both recording (Aim 2a) and neural perturbation (Aim 3) experiments. Finally, the model family motivates our goal (Aim 2b) of characterizing the complete set of ~100 IT features with image-computable functions and with human shape adjectives. While substantial preliminary data support these predictions and goals, a complete model has not yet been built or tested. If these aims are accomplished, this work would transform our understanding by showing precisely how core object recognition is causally accounted for at the level of IT cortex, and by providing a model that would accurately predict how any image manipulation or direct IT neural intervention would alter any core object recognition behavior.
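The Aim 2a prediction, that weighted sums of IT features predict behavior, can be illustrated with a minimal sketch. The example below is not the model proposed here. It assumes synthetic Gaussian "features" in place of recorded IT activity, a ridge-regularized least-squares fit as one possible way to learn the weights, and hypothetical function names (`make_synthetic_features`, `fit_linear_readout`, `run_demo`). It shows only the form of the computation: a ~100-dimensional feature vector per image, one learned weight per feature, and the sign of the weighted sum as the predicted choice in a two-object discrimination.

```python
import numpy as np


def make_synthetic_features(n_images, n_features, rng, separation=1.5):
    """Synthetic stand-in for IT features (an assumption, not recorded data).

    Each image gets a label in {-1, +1}; its feature vector is unit-variance
    Gaussian noise plus a shift along one random direction set by the label.
    """
    labels = rng.choice([-1.0, 1.0], size=n_images)
    direction = rng.standard_normal(n_features)
    direction /= np.linalg.norm(direction)
    noise = rng.standard_normal((n_images, n_features))
    features = noise + separation * labels[:, None] * direction
    return features, labels


def fit_linear_readout(features, labels, ridge=1.0):
    """Learn weights w and bias b so that features @ w + b approximates labels."""
    n_images, n_features = features.shape
    design = np.hstack([features, np.ones((n_images, 1))])
    penalty = ridge * np.eye(n_features + 1)
    penalty[-1, -1] = 0.0  # leave the bias term unpenalized
    solution = np.linalg.solve(design.T @ design + penalty, design.T @ labels)
    return solution[:-1], solution[-1]


def predict_choice(features, weights, bias):
    """Predicted choice is the sign of the weighted sum of features."""
    return np.sign(features @ weights + bias)


def run_demo(seed=0, n_train=400, n_test=200, n_features=100):
    """Fit the readout on training images and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    features, labels = make_synthetic_features(n_train + n_test, n_features, rng)
    weights, bias = fit_linear_readout(features[:n_train], labels[:n_train])
    choices = predict_choice(features[n_train:], weights, bias)
    return float(np.mean(choices == labels[n_train:]))


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

With real data, the feature matrix would hold measured or model-derived IT features rather than Gaussian noise, and the comparison of interest would be between predicted and observed image-by-image confusion patterns rather than raw accuracy.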

Public Health Relevance

Visual object recognition is fundamental to our well-being, yet we have only a crude understanding of the brain regions that underlie this ability. The goal of these experiments is to build and test mechanistically motivated models of how neural processing and activity in the brain underlie our object recognition abilities. The results will support an understanding of how our brain sees and evaluates objects, and are likely to give new insight into how to augment or repair our object recognition abilities.

National Institutes of Health (NIH)
National Eye Institute (NEI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-IFCN-Q (02))
Program Officer: Flanders, Martha C
Massachusetts Institute of Technology
Organized Research Units
United States
Rajalingham, Rishi; Issa, Elias B; Bashivan, Pouya et al. (2018) Large-Scale, High-Resolution Comparison of the Core Visual Object Recognition Behavior of Humans, Monkeys, and State-of-the-Art Deep Artificial Neural Networks. J Neurosci 38:7255-7269
Hong, Ha; Yamins, Daniel L K; Majaj, Najib J et al. (2016) Explicit information for category-orthogonal object properties increases along the ventral stream. Nat Neurosci 19:613-22
Aparicio, Paul L; Issa, Elias B; DiCarlo, James J (2016) Neurophysiological Organization of the Middle Face Patch in Macaque Inferior Temporal Cortex. J Neurosci 36:12729-12745
Rajalingham, Rishi; Schmidt, Kailyn; DiCarlo, James J (2015) Comparison of Object Recognition Behavior in Human and Monkey. J Neurosci 35:12127-36
Afraz, Arash; Boyden, Edward S; DiCarlo, James J (2015) Optogenetic and pharmacological suppression of spatial clusters of face neurons reveal their causal role in face gender discrimination. Proc Natl Acad Sci U S A 112:6730-5
Majaj, Najib J; Hong, Ha; Solomon, Ethan A et al. (2015) Simple Learned Weighted Sums of Inferior Temporal Neuronal Firing Rates Accurately Predict Human Core Object Recognition Performance. J Neurosci 35:13402-18
Cadieu, Charles F; Hong, Ha; Yamins, Daniel L K et al. (2014) Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comput Biol 10:e1003963
Yamins, Daniel L K; Hong, Ha; Cadieu, Charles F et al. (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A 111:8619-24
Issa, Elias B; Papanastassiou, Alex M; DiCarlo, James J (2013) Large-scale, high-resolution neurophysiological maps underlying FMRI of macaque temporal lobe. J Neurosci 33:15207-19
Baldassi, Carlo; Alemi-Neissi, Alireza; Pagan, Marino et al. (2013) Shape similarity, better than semantic membership, accounts for the structure of visual object representations in a population of monkey inferotemporal neurons. PLoS Comput Biol 9:e1003167

Showing the most recent 10 out of 25 publications