Publication

Featured research published by James J. DiCarlo.


PLOS Computational Biology | 2008

Why is real-world visual object recognition hard?

Nicolas Pinto; David Cox; James J. DiCarlo

Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
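The paper's core argument is that recognition tests must span real-world variation. A minimal sketch of that idea, with entirely made-up stimuli and a deliberately trivial classifier (not the paper's actual test set or models): classify two random pixel templates under increasing position jitter, and watch a pixel-matching classifier collapse once variation appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "objects" are random pixel templates; a real test would use rendered
# objects varying in pose, position, and scale. All details are illustrative.
size = 16
templates = [rng.standard_normal((size, size)) for _ in range(2)]

def accuracy(max_shift, n_trials=200):
    correct = 0
    for _ in range(n_trials):
        obj = int(rng.integers(0, 2))
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        img = np.roll(templates[obj], (dy, dx), axis=(0, 1))
        # Nearest-template classifier: pick the template with the highest
        # pixelwise dot product against the test image.
        scores = [np.sum(img * t) for t in templates]
        correct += int(np.argmax(scores) == obj)
    return correct / n_trials

# With no position variation, the trivial classifier looks perfect;
# adding variation exposes it.
print("accuracy, no variation:", accuracy(0))
print("accuracy, +/-6 px jitter:", accuracy(6))
```

The same logic underlies the paper's point: a test set without controlled variation can make a weak model look strong.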


Proceedings of the National Academy of Sciences of the United States of America | 2014

Performance-optimized hierarchical models predict neural responses in higher visual cortex

Daniel Yamins; Ha Hong; Charles F. Cadieu; Ethan A. Solomon; Darren Seibert; James J. DiCarlo

Significance: Humans and monkeys easily recognize objects in scenes. This ability is known to be supported by a network of hierarchically interconnected brain areas. However, understanding neurons in higher levels of this hierarchy has long remained a major challenge in visual systems neuroscience. We use computational techniques to identify a neural network model that matches human performance on challenging object categorization tasks. Although not explicitly constrained to match neural data, this model turns out to be highly predictive of neural responses in both the V4 and inferior temporal cortex, the top two layers of the ventral visual hierarchy. In addition to yielding greatly improved models of visual cortex, these results suggest that a process of biological performance optimization directly shaped neural mechanisms.

The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model’s categorization performance and its ability to predict individual IT neural unit response data. To pursue this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single site and population levels. Moreover, the model’s intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization—applied in a biologically appropriate model class—can be used to build quantitative predictive models of neural processing.
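The neural-predictivity analysis this line of work relies on can be sketched as a regularized linear map from a model layer's features to per-site responses, scored by held-out correlation. Everything below is synthetic; a real analysis would use DNN features of the tested images and recorded IT/V4 responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: "model features" and "neural responses" for 200 images.
n_images, n_features, n_sites = 200, 50, 10
features = rng.standard_normal((n_images, n_features))
true_weights = rng.standard_normal((n_features, n_sites))
responses = features @ true_weights + 0.5 * rng.standard_normal((n_images, n_sites))

# Split images into train/test halves.
train, test = np.arange(100), np.arange(100, 200)

# Ridge regression, closed form: W = (X^T X + lam I)^-1 X^T Y
lam = 1.0
X, Y = features[train], responses[train]
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

# Predictivity: per-site Pearson correlation between predicted and actual
# responses on held-out images.
pred = features[test] @ W
r = np.array([np.corrcoef(pred[:, i], responses[test][:, i])[0, 1]
              for i in range(n_sites)])
print("median held-out correlation:", round(float(np.median(r)), 3))
```

The paper's key empirical point is that this predictivity rises with a model's categorization performance, even though the mapping is fit only after training.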


Neuron | 2006

Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex

Gabriel Kreiman; Chou P. Hung; Alexander Kraskov; Rodrigo Quian Quiroga; Tomaso Poggio; James J. DiCarlo

Local field potentials (LFPs) arise largely from dendritic activity over large brain regions and thus provide a measure of the input to and local processing within an area. We characterized LFPs and their relationship to spikes (multi- and single-unit) in monkey inferior temporal cortex (IT). LFP responses in IT to complex objects showed strong selectivity at 44% of the sites and tolerance to retinal position and size. The LFP preferences were poorly predicted by the spike preferences at the same site but were better explained by averaging spikes within approximately 3 mm. A comparison of separate sites suggests that selectivity is similar on a scale of approximately 800 μm for spikes and approximately 5 mm for LFPs. These observations imply that inputs to IT neurons convey selectivity for complex shapes and that such input may have an underlying organization spanning several millimeters.


The Journal of Neuroscience | 2006

Discrimination training alters object representations in human extrastriate cortex

Hans Op de Beeck; Chris I. Baker; James J. DiCarlo; Nancy Kanwisher

Visual object recognition relies critically on learning. However, little is known about the effect of object learning in human visual cortex, and in particular how the spatial distribution of training effects relates to the distribution of object and face selectivity across the cortex before training. We scanned human subjects with high-resolution functional magnetic resonance imaging (fMRI) while they viewed novel object classes, both before and after extensive training to discriminate between exemplars within one of these object classes. Training increased the strength of the response in visual cortex to trained objects compared with untrained objects. However, training did not simply induce a uniform increase in the response to trained objects: the magnitude of this training effect varied substantially across subregions of extrastriate cortex, with some showing a twofold increase in response to trained objects and others (including the right fusiform face area) showing no significant effect of training. Furthermore, the spatial distribution of training effects could not be predicted from the spatial distribution of either pretrained responses or face selectivity. Instead, training changed the spatial distribution of activity across the cortex. These findings support a dynamic view of the ventral visual pathway in which the cortical representation of an object category is continuously modulated by experience.


Science | 2008

Unsupervised Natural Experience Rapidly Alters Invariant Object Representation in Visual Cortex

Nuo Li; James J. DiCarlo

Object recognition is challenging because each object produces myriad retinal images. Responses of neurons from the inferior temporal cortex (IT) are selective to different objects, yet tolerant (“invariant”) to changes in object position, scale, and pose. How does the brain construct this neuronal tolerance? We report a form of neuronal learning that suggests the underlying solution. Targeted alteration of the natural temporal contiguity of visual experience caused specific changes in IT position tolerance. This unsupervised temporal slowness learning (UTL) was substantial, increased with experience, and was significant in single IT neurons after just 1 hour. Together with previous theoretical work and human object perception experiments, we speculate that UTL may reflect the mechanism by which the visual stream builds and maintains tolerant object representations.
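A generic slowness objective gives the flavor of the learning the paper probes: nudge a linear neuron so its responses to temporally adjacent views of the same object converge. This toy gradient rule is not the paper's proposed neural mechanism; in the paper's manipulation, temporal contiguity is deliberately altered to pair *different* objects, producing specific breaks in position tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two random vectors stand in for retinal images of one object at two
# positions, seen in temporal succession. Dimensions are illustrative.
dim = 20
view_a = rng.standard_normal(dim)   # object at position 1
view_b = rng.standard_normal(dim)   # same object at position 2
w = 0.1 * rng.standard_normal(dim)  # linear neuron's weights

lr = 0.01
for _ in range(500):
    gap = w @ view_a - w @ view_b
    # Step down the squared response gap (r_a - r_b)^2 with respect to w,
    # pulling the responses to the two views together.
    w -= lr * gap * (view_a - view_b)

print("response gap after learning:", abs(w @ view_a - w @ view_b))
```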


The Journal of Neuroscience | 2010

Selectivity and Tolerance (“Invariance”) Both Increase as Visual Information Propagates from Cortical Area V4 to IT

Nicole C. Rust; James J. DiCarlo

Our ability to recognize objects despite large changes in position, size, and context is achieved through computations that are thought to increase both the shape selectivity and the tolerance (“invariance”) of the visual representation at successive stages of the ventral pathway [visual cortical areas V1, V2, and V4 and inferior temporal cortex (IT)]. However, these ideas have proven difficult to test. Here, we consider how well population activity patterns at two stages of the ventral stream (V4 and IT) discriminate between, and generalize across, different images. We found that both V4 and IT encode natural images with similar fidelity, whereas the IT population is much more sensitive to controlled, statistical scrambling of those images. Scrambling sensitivity was proportional to receptive field (RF) size in both V4 and IT, suggesting that, on average, the number of visual feature conjunctions implemented by a V4 or IT neuron is directly related to its RF size. We also found that the IT population could better discriminate between objects across changes in position, scale, and context, thus directly demonstrating a V4-to-IT gain in tolerance. This tolerance gain could be accounted for by both a decrease in single-unit sensitivity to identity-preserving transformations (e.g., an increase in RF size) and an increase in the maintenance of rank-order object selectivity within the RF. These results demonstrate that, as visual information travels from V4 to IT, the population representation is reformatted to become more selective for feature conjunctions and more tolerant to identity preserving transformations, and they reveal the single-unit response properties that underlie that reformatting.
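The generalization test at the heart of this comparison can be sketched with simulated populations: train a decoder on responses at one position and test it at another. The nearest-mean decoder, population sizes, and noise model below are illustrative stand-ins, not the paper's actual classifiers or data.

```python
import numpy as np

n_neurons, n_trials = 60, 100

def transfer_accuracy(tolerant, rng):
    # One population pattern per (object, position). A "tolerant" population
    # (IT-like, in the paper's terms) reuses the same pattern across
    # positions; a position-sensitive one has unrelated patterns per position.
    patterns = rng.standard_normal((2, 2, n_neurons))
    if tolerant:
        patterns[:, 1] = patterns[:, 0]

    def trials(obj, pos):
        return patterns[obj, pos] + rng.standard_normal((n_trials, n_neurons))

    # Train a nearest-mean decoder at position 0, test it at position 1.
    means = [trials(0, 0).mean(axis=0), trials(1, 0).mean(axis=0)]
    correct = 0
    for obj in (0, 1):
        test = trials(obj, 1)
        d0 = np.linalg.norm(test - means[0], axis=1)
        d1 = np.linalg.norm(test - means[1], axis=1)
        correct += int(np.sum((d1 < d0) == (obj == 1)))
    return correct / (2 * n_trials)

print("tolerant population:", transfer_accuracy(True, np.random.default_rng(0)))
print("position-sensitive population:", transfer_accuracy(False, np.random.default_rng(0)))
```

The tolerant population transfers near-perfectly while the position-sensitive one falls to chance, mirroring the V4-to-IT tolerance gain the paper measures.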


The Journal of Neuroscience | 2005

Multiple Object Response Normalization in Monkey Inferotemporal Cortex

Davide Zoccolan; David Cox; James J. DiCarlo

The highest stages of the visual ventral pathway are commonly assumed to provide robust representation of object identity by disregarding confounding factors such as object position, size, illumination, and the presence of other objects (clutter). However, whereas neuronal responses in monkey inferotemporal cortex (IT) can show robust tolerance to position and size changes, previous work shows that responses to preferred objects are usually reduced by the presence of nonpreferred objects. More broadly, we do not yet understand multiple object representation in IT. In this study, we systematically examined IT responses to pairs and triplets of objects in three passively viewing monkeys across a broad range of object effectiveness. We found that, at least under these limited clutter conditions, a large fraction of the response of each IT neuron to multiple objects is reliably predicted as the average of its responses to the constituent objects in isolation. That is, multiple object responses depend primarily on the relative effectiveness of the constituent objects, regardless of object identity. This average effect becomes virtually perfect when populations of IT neurons are pooled. Furthermore, the average effect cannot simply be explained by attentional shifts but behaves as a primarily feedforward response property. Together, our observations are most consistent with mechanistic models in which IT neuronal outputs are normalized by summed synaptic drive into IT or spiking activity within IT and suggest that normalization mechanisms previously revealed at earlier visual areas are operating throughout the ventral visual stream.
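The averaging rule makes a concrete, testable prediction: a neuron's response to an object pair equals the mean of its isolated-object responses. A numerical illustration of that prediction, with made-up firing rates and noise (simulated to follow the rule, then scored against it and against a summation alternative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Isolated-object rates for a simulated IT population (spikes/s; made up).
n_neurons = 200
iso_a = rng.uniform(0, 50, n_neurons)   # rate to object A alone
iso_b = rng.uniform(0, 50, n_neurons)   # rate to object B alone

# Pair responses generated under the averaging rule plus trial noise.
pair = 0.5 * (iso_a + iso_b) + rng.normal(0, 3, n_neurons)

# Compare how well "average" and "sum" models recover the pair responses.
avg_err = np.mean(np.abs(0.5 * (iso_a + iso_b) - pair))
sum_err = np.mean(np.abs((iso_a + iso_b) - pair))
print("mean abs error, average model:", round(float(avg_err), 2))
print("mean abs error, sum model:", round(float(sum_err), 2))
```

The paper's finding is that real IT responses behave like the first model: pair responses track the mean, not the sum, of the constituent responses.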


PLOS Computational Biology | 2014

Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Charles F. Cadieu; Ha Hong; Daniel Yamins; Nicolas Pinto; Diego Ardila; Ethan A. Solomon; Najib J. Majaj; James J. DiCarlo

The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
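One of the complementary comparisons mentioned above, representational similarity, can be sketched by correlating the image-by-image dissimilarity matrices (RDMs) of two representations, e.g. a DNN layer and IT multi-unit responses. Both representations below are synthetic projections of a shared latent structure; dimensions and noise levels are illustrative, and this is a generic RDM comparison rather than the paper's kernel-analysis metric.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic representations of the same 30 images, sharing latent structure.
n_images = 30
latent = rng.standard_normal((n_images, 5))
rep_model = latent @ rng.standard_normal((5, 40)) + 0.1 * rng.standard_normal((n_images, 40))
rep_neural = latent @ rng.standard_normal((5, 20)) + 0.1 * rng.standard_normal((n_images, 20))

def rdm(rep):
    # 1 - Pearson correlation between every pair of image representations.
    return 1.0 - np.corrcoef(rep)

def rdm_similarity(a, b):
    # Correlate the upper triangles of the two dissimilarity matrices.
    iu = np.triu_indices(a.shape[0], k=1)
    return np.corrcoef(rdm(a)[iu], rdm(b)[iu])[0, 1]

print("model-vs-neural RDM similarity:",
      round(float(rdm_similarity(rep_model, rep_neural)), 3))
```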


Nature Neuroscience | 2016

Using goal-driven deep learning models to understand sensory cortex

Daniel Yamins; James J. DiCarlo

Fueled by innovation in the computer vision and artificial intelligence communities, recent developments in computational neuroscience have used goal-driven hierarchical convolutional neural networks (HCNNs) to make strides in modeling neural single-unit and population responses in higher visual cortical areas. In this Perspective, we review the recent progress in a broader modeling context and describe some of the key technical innovations that have supported it. We then outline how the goal-driven HCNN approach can be used to delve even more deeply into understanding the development and organization of sensory cortical processing.


PLOS Computational Biology | 2009

A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation

Nicolas Pinto; David Doukhan; James J. DiCarlo; David Cox

While many models of biological object recognition share a common set of “broad-stroke” properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model—e.g., the number of units per layer, the size of pooling kernels, exponents in normalization operations, etc. Since the number of such parameters (explicit or implicit) is typically large and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea or because the correct “parts” have not been tuned correctly, assembled at sufficient scale, or provided with enough training. Here, we present a high-throughput approach to the exploration of such parameter sets, leveraging recent advances in stream processing hardware (high-end NVIDIA graphics cards and the PlayStation 3's IBM Cell Processor). In analogy to high-throughput screening approaches in molecular biology and genetics, we explored thousands of potential network architectures and parameter instantiations, screening those that show promising object recognition performance for further analysis. We show that this approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature. As the scale of available computational power continues to expand, we argue that this approach has the potential to greatly accelerate progress in both artificial vision and our understanding of the computational underpinning of biological vision.
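The screening loop itself is simple: sample many parameter instantiations, score each, and keep the top performers for further analysis. A minimal sketch, in which the parameter names and the scoring function are illustrative stand-ins (the paper evaluates full vision models on object recognition tasks, on GPU hardware):

```python
import random

random.seed(0)

def evaluate(cfg):
    # Placeholder score that peaks at one parameter combination; a real
    # screen would train and test the instantiated model.
    return -(cfg["n_units"] - 256) ** 2 / 1e4 - (cfg["pool_size"] - 3) ** 2

def sample_config():
    # Hypothetical model parameters, sampled at random.
    return {
        "n_units": random.choice([64, 128, 256, 512]),
        "pool_size": random.choice([2, 3, 5, 7]),
        "norm_exponent": random.choice([1.0, 2.0]),
    }

# Screen many candidate instantiations and keep the best few.
candidates = [sample_config() for _ in range(200)]
scored = sorted(candidates, key=evaluate, reverse=True)
top = scored[:5]
print("best config:", top[0])
```

The paper's contribution is making this loop cheap enough, via stream-processing hardware, to run over thousands of full model instantiations.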

Collaboration

Top co-authors of James J. DiCarlo and their affiliations:

Ha Hong (Massachusetts Institute of Technology)
Nicolas Pinto (Massachusetts Institute of Technology)
Davide Zoccolan (International School for Advanced Studies)
Elias B. Issa (Massachusetts Institute of Technology)
Nuo Li (Howard Hughes Medical Institute)
Tomaso Poggio (Massachusetts Institute of Technology)