Publications


Featured research published by Aude Oliva.


International Journal of Computer Vision | 2001

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

Aude Oliva; Antonio Torralba

In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene is informative about its probable semantic category.
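
The "spectral and coarsely localized information" above maps naturally onto code: filter the image with oriented band-pass filters and pool the response energy over a coarse grid. Below is a minimal Python sketch of such a descriptor; the filter bank, grid size, and scale parameters are illustrative choices, not the paper's exact settings.

```python
# A GIST-style "spatial envelope" descriptor sketch, assuming grayscale input.
# Filter shapes and parameters are illustrative, not the published ones.
import numpy as np

def gist_descriptor(img, n_scales=3, n_orients=4, grid=4):
    """Spectral energy of oriented band-pass filters, pooled over a grid."""
    h, w = img.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.sqrt(fx**2 + fy**2) + 1e-12
    angle = np.arctan2(fy, fx)
    F = np.fft.fft2(img - img.mean())

    feats = []
    for s in range(n_scales):
        f0 = 0.25 / (2 ** s)                      # log-spaced radial band
        radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * 0.55 ** 2))
        for o in range(n_orients):
            theta = o * np.pi / n_orients
            # Orientation tuning, pi-periodic so opposite directions match.
            d = np.angle(np.exp(1j * 2 * (angle - theta))) / 2
            orient = np.exp(-(d ** 2) / (2 * 0.4 ** 2))
            resp = np.abs(np.fft.ifft2(F * radial * orient))
            # Coarsely localized: average energy in each grid cell.
            cells = [c.mean() for row in np.array_split(resp, grid, axis=0)
                     for c in np.array_split(row, grid, axis=1)]
            feats.extend(cells)
    return np.asarray(feats)                       # n_scales*n_orients*grid^2

img = np.random.rand(128, 128)                     # stand-in for a scene image
print(gist_descriptor(img).shape)                  # (3 * 4 * 16,) = (192,)
```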


Computer Vision and Pattern Recognition | 2010

SUN database: Large-scale scene recognition from abbey to zoo

Jianxiong Xiao; James Hays; Krista A. Ehinger; Aude Oliva; Antonio Torralba

Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently used databases, which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database, which contains 899 categories and 130,519 images. We use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. We measure human scene classification performance on the SUN database and compare this with computational methods. Additionally, we study a finer-grained scene representation to detect scenes embedded inside of larger scenes.
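
For context, the standard SUN evaluation fixes the number of training images per category and reports accuracy averaged per class, so large categories do not dominate the score. The sketch below illustrates that protocol on synthetic features with a nearest-centroid baseline; the class counts, feature dimension, and classifier are placeholders, not the paper's benchmark setup.

```python
# Sketch of a SUN-style evaluation: fixed train/test split per category,
# mean per-class accuracy. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 10, 60, 192      # toy stand-ins for 397 classes
X = rng.normal(size=(n_classes * n_per_class, dim))
X += np.repeat(rng.normal(scale=2.0, size=(n_classes, dim)), n_per_class, axis=0)
y = np.repeat(np.arange(n_classes), n_per_class)

n_train = 50                                    # e.g. 50 train per class
train_idx, test_idx = [], []
for c in range(n_classes):
    idx = rng.permutation(np.where(y == c)[0])
    train_idx.extend(idx[:n_train]); test_idx.extend(idx[n_train:])

# Nearest-centroid classifier as a simple baseline.
centroids = np.stack([X[train_idx][y[train_idx] == c].mean(axis=0)
                      for c in range(n_classes)])
pred = np.argmin(((X[test_idx][:, None] - centroids) ** 2).sum(-1), axis=1)

# Average accuracy per class, then across classes.
per_class = [np.mean(pred[y[test_idx] == c] == c) for c in range(n_classes)]
print("mean per-class accuracy:", np.mean(per_class))
```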


Psychological Review | 2006

Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search.

Antonio Torralba; Aude Oliva; Monica S. Castelhano; John M. Henderson

Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach to attentional guidance by global scene context. The model comprises two parallel pathways: one computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
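
The two-pathway combination lends itself to a compact sketch: a bottom-up saliency map is modulated by a context prior over likely target locations derived from the global pathway. Everything below is synthetic, and the exponent gamma is an illustrative choice rather than a value fitted in the paper.

```python
# Sketch of contextual guidance: local saliency modulated by a global
# context prior. Both maps are synthetic stand-ins.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
h, w = 64, 64

# Local pathway: bottom-up saliency (stand-in: smoothed feature contrast).
saliency = gaussian_filter(rng.random((h, w)), sigma=2)

# Global pathway: p(target location | scene gist), here a hypothetical
# horizontal band where people tend to appear in street scenes.
rows = np.exp(-((np.arange(h) - 40) ** 2) / (2 * 8.0 ** 2))
context_prior = np.tile(rows[:, None], (1, w))

# Contextual guidance: saliency (flattened by gamma < 1) times context prior.
gamma = 0.3
guidance = (saliency ** gamma) * context_prior
guidance /= guidance.sum()

iy, ix = np.unravel_index(np.argmax(guidance), guidance.shape)
print("predicted first fixation near row", iy, "col", ix)
```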


Progress in Brain Research | 2006

Building the gist of a scene: the role of global image features in recognition.

Aude Oliva; Antonio Torralba

Humans can recognize the gist of a novel image in a single glance, independent of its complexity. How is this remarkable feat accomplished? On the basis of behavioral and computational evidence, this paper describes a formal approach to the representation and the mechanism of scene gist understanding, based on scene-centered, rather than object-centered, primitives. We show that the structure of a scene image can be estimated by means of global image features, providing a statistical summary of the spatial layout properties (Spatial Envelope representation) of the scene. Global features are based on configurations of spatial scales and are estimated without invoking segmentation or grouping operations. The scene-centered approach is not an alternative to local image analysis but would serve as a feed-forward and parallel pathway of visual processing, able to quickly constrain local feature analysis and enhance object recognition in cluttered natural scenes.
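
One plausible form of such a global, segmentation-free summary is the image's power spectrum pooled into a few scale and orientation bins, complementing the grid-localized descriptor sketched earlier. The bin counts below are arbitrary and the random image stands in for a real scene.

```python
# Sketch of a global spectral summary: power pooled over scale (radial)
# and orientation (angular) bins. Bin counts are illustrative.
import numpy as np

def spectral_signature(img, n_radial=4, n_angular=6):
    h, w = img.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    yy, xx = np.indices((h, w))
    cy, cx = h // 2, w // 2
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    theta = np.mod(np.arctan2(yy - cy, xx - cx), np.pi)   # pi-periodic

    r_edges = np.linspace(1, r.max(), n_radial + 1)
    t_edges = np.linspace(0, np.pi, n_angular + 1)
    sig = np.zeros((n_radial, n_angular))
    for i in range(n_radial):
        for j in range(n_angular):
            mask = ((r >= r_edges[i]) & (r < r_edges[i + 1]) &
                    (theta >= t_edges[j]) & (theta < t_edges[j + 1]))
            sig[i, j] = power[mask].mean() if mask.any() else 0.0
    return sig / sig.sum()     # normalized scale-by-orientation energy profile

img = np.random.rand(128, 128)   # stand-in for a scene image
print(spectral_signature(img))
```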


Proceedings of the National Academy of Sciences of the United States of America | 2007

A feedforward architecture accounts for rapid categorization

Thomas Serre; Aude Oliva; Tomaso Poggio

Primates are remarkably good at recognizing objects. The level of performance of their visual system and its robustness to image degradations still surpass the best computer vision systems despite decades of engineering effort. In particular, the high accuracy of primates in ultra-rapid object categorization and rapid serial visual presentation tasks is remarkable. Given the number of processing stages involved and typical neural latencies, such rapid visual processing is likely to be mostly feedforward. Here we show that a specific implementation of a class of feedforward theories of object recognition (that extend the Hubel and Wiesel simple-to-complex cell hierarchy and account for many anatomical and physiological constraints) can predict the level and the pattern of performance achieved by humans on a rapid masked animal vs. non-animal categorization task.
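
The simple-to-complex motif is easy to illustrate: S units respond to oriented structure, and C units take a local max over position to gain tolerance to shift. The sketch below implements one such stage with toy 3x3 edge filters; it is a schematic of the motif, not the model's actual filter bank or parameters.

```python
# One simple-to-complex (S/C) stage of a Hubel-Wiesel-style feedforward
# hierarchy, with toy oriented filters.
import numpy as np
from scipy.ndimage import convolve, maximum_filter

def s_c_stage(img):
    # S layer: rectified responses to two toy oriented filters.
    vert = np.array([[-1, 0, 1]] * 3, dtype=float)   # vertical-edge detector
    horiz = vert.T                                    # horizontal-edge detector
    s = np.stack([np.abs(convolve(img, k)) for k in (vert, horiz)])
    # C layer: max over a local neighborhood, then subsample, for
    # position tolerance at reduced resolution.
    c = maximum_filter(s, size=(1, 8, 8))[:, ::8, ::8]
    return c

img = np.random.rand(64, 64)       # stand-in for an input image
print(s_c_stage(img).shape)         # (2 orientations, 8, 8)
```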


Network: Computation in Neural Systems | 2003

Statistics of natural image categories

Antonio Torralba; Aude Oliva

In this paper we study the statistical properties of natural images belonging to different categories and their relevance for scene and object categorization tasks. We discuss how second-order statistics are correlated with image categories, scene scale and objects. We propose how scene categorization could be computed in a feedforward manner in order to provide top-down and contextual information very early in the visual processing chain. Results show how visual categorization based directly on low-level features, without grouping or segmentation stages, can benefit object localization and identification. We show how simple image statistics can be used to predict the presence and absence of objects in the scene before exploring the image.
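
A classic second-order statistic in this literature is the slope of the radially averaged power spectrum on log-log axes, which tends to differ across scene categories. The sketch below estimates that slope; the random input is a stand-in, so expect a slope near zero rather than the roughly -2 typical of natural images.

```python
# Estimate the 1/f spectral slope of an image: radially average the power
# spectrum, then fit a line in log-log space.
import numpy as np

def spectral_slope(img):
    h, w = img.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    yy, xx = np.indices((h, w))
    r = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2).astype(int)
    # Radial average of power per integer frequency ring.
    counts = np.bincount(r.ravel())
    radial = np.bincount(r.ravel(), weights=power.ravel()) / np.maximum(counts, 1)
    f = np.arange(1, min(h, w) // 2)            # skip the DC bin
    slope, _ = np.polyfit(np.log(f), np.log(radial[f]), 1)
    return slope    # near -2 for natural images (power ~ 1/f^2)

img = np.random.rand(128, 128)                  # white noise stand-in
print("spectral slope:", spectral_slope(img))
```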


Psychological Science | 1994

From Blobs to Boundary Edges: Evidence for Time- and Spatial-Scale-Dependent Scene Recognition

Philippe G. Schyns; Aude Oliva

In very fast recognition tasks, scenes are identified as fast as isolated objects. How can this efficiency be achieved, considering the large number of component objects and interfering factors, such as cast shadows and occlusions? Scene categories tend to have distinct and typical spatial organizations of their major components. If human perceptual structures were tuned to extract this information early in processing, a coarse-to-fine process could account for efficient scene recognition. A coarse description of the input scene (oriented “blobs” in a particular spatial organization) would initiate recognition before the identity of the objects is processed. We report two experiments that contrast the respective roles of coarse and fine information in fast identification of natural scenes. The first experiment investigated whether coarse and fine information were used at different stages of processing. The second experiment tested whether coarse-to-fine processing accounts for fast scene categorization. The data suggest that recognition occurs at both coarse and fine spatial scales. By attending first to the coarse scale, the visual system can get a quick and rough estimate of the input to activate scene schemas in memory; attending to fine information allows refinement, or refutation, of the raw estimate.
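
The coarse and fine descriptions contrasted above correspond to low-pass and high-pass versions of the same image. A minimal sketch, with illustrative cutoffs rather than the experiments' actual spatial frequencies:

```python
# Coarse (blob-level) vs. fine (edge-level) views of a scene via Gaussian
# filtering. Cutoffs are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(256, 256)               # stand-in for a scene photograph

coarse = gaussian_filter(img, sigma=16)       # blob-level spatial organization
fine = img - gaussian_filter(img, sigma=2)    # boundary-edge detail (high-pass)

# A coarse-to-fine recognizer would match `coarse` against scene schemas
# first, then use `fine` to refine or refute that initial estimate.
print(coarse.std(), fine.std())
```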


Proceedings of the National Academy of Sciences of the United States of America | 2008

Visual long-term memory has a massive storage capacity for object details

Timothy F. Brady; Talia Konkle; George A. Alvarez; Aude Oliva

One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.


Computer Vision and Pattern Recognition | 2016

Learning Deep Features for Discriminative Localization

Bolei Zhou; Aditya Khosla; Àgata Lapedriza; Aude Oliva; Antonio Torralba

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving a classification task.
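
The class activation map itself is a one-line computation: a weighted sum of the final convolutional feature maps, using the classification weights of the target class. The sketch below shows the mechanics on synthetic activations and weights; the shapes mimic a typical network but are otherwise arbitrary.

```python
# Class activation map (CAM) mechanics on synthetic data:
# CAM_c(x, y) = sum_k w_k^c * f_k(x, y).
import numpy as np

rng = np.random.default_rng(2)
K, H, W, n_classes = 512, 14, 14, 1000

conv_maps = rng.random((K, H, W))             # last conv activations f_k(x, y)
fc_weights = rng.normal(size=(n_classes, K))  # weights after global avg pooling

# Prediction uses globally pooled features; CAM reuses the same weights per pixel.
scores = fc_weights @ conv_maps.mean(axis=(1, 2))
c = int(np.argmax(scores))

cam = np.tensordot(fc_weights[c], conv_maps, axes=1)       # (H, W) map
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)  # normalize to [0, 1]
iy, ix = np.unravel_index(np.argmax(cam), cam.shape)
print(f"class {c}: most discriminative region near ({iy}, {ix})")
```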


International Conference on Image Processing | 2003

Top-down control of visual attention in object detection

Aude Oliva; Antonio Torralba; Monica S. Castelhano; John M. Henderson

Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g. a person) should be located. Human eye movements show that regions chosen by the top-down model agree with regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
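
The top-down component can be sketched as a learned mapping from a global scene descriptor to the expected location of the target object. Below, a least-squares regressor predicts a vertical image position from synthetic gist-like features; the data, feature dimension, and linear model are all illustrative assumptions.

```python
# Sketch of a contextual prior: regress the expected row of a target object
# (e.g. a person) from global scene features. All data is synthetic.
import numpy as np

rng = np.random.default_rng(3)
n_scenes, dim = 200, 192                 # e.g. gist-sized global features

G = rng.normal(size=(n_scenes, dim))     # global features of training scenes
true_w = rng.normal(size=dim) / np.sqrt(dim)
y = G @ true_w + 0.1 * rng.normal(size=n_scenes)   # annotated object rows

# Least-squares fit of row position from global context.
w, *_ = np.linalg.lstsq(G, y, rcond=None)

g_new = rng.normal(size=dim)             # gist of a novel scene
print("search for the person near row", float(g_new @ w))
```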

Collaboration


Dive into Aude Oliva's collaborations.

Top Co-Authors

Antonio Torralba
Massachusetts Institute of Technology

Dimitrios Pantazis
McGovern Institute for Brain Research

Aditya Khosla
Massachusetts Institute of Technology

Wilma A. Bainbridge
Massachusetts Institute of Technology

Zoya Bylinskii
Massachusetts Institute of Technology

Krista A. Ehinger
Brigham and Women's Hospital