Adria Recasens
Massachusetts Institute of Technology
Publications
Featured research published by Adria Recasens.
european conference on computer vision | 2016
Zoya Bylinskii; Adria Recasens; Ali Borji; Aude Oliva; Antonio Torralba
Recently, large breakthroughs have been observed in saliency modeling. The top scores on saliency benchmarks have become dominated by neural network models of saliency, and some evaluation scores have begun to saturate. Large jumps in performance relative to previous models can be found across datasets, image types, and evaluation metrics. Have saliency models begun to converge on human performance? In this paper, we re-examine the current state-of-the-art using a fine-grained analysis on image types, individual images, and image regions. Through annotation experiments on high-density regions of human eye fixations in two established saliency datasets, MIT300 and CAT2000, we quantify up to 60% of the remaining errors of saliency models. We argue that to continue to approach human-level performance, saliency models will need to discover higher-level concepts in images: text, objects of gaze and action, locations of motion, and expected locations of people in images. Moreover, they will need to reason about the relative importance of image regions, such as focusing on the most important person in the room or the most informative sign on the road. More accurately tracking performance will require finer-grained evaluations and metrics. Pushing performance further will require higher-level image understanding.
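The gap analysis above depends on how saliency maps are scored against human fixations. For reference, here is a minimal sketch of one standard fixation-based metric, Normalized Scanpath Saliency (NSS), commonly reported on MIT300 and CAT2000; the epsilon guard and the boolean-mask convention are illustrative choices, not details from the paper.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """Normalized Scanpath Saliency: the mean of the z-scored saliency
    values sampled at human fixation locations (higher is better)."""
    z = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(z[fixation_map.astype(bool)].mean())
```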
computer vision and pattern recognition | 2017
Ronak Kosti; Jose M. Alvarez; Adria Recasens; Àgata Lapedriza
Understanding what a person is experiencing from her frame of reference is essential in our everyday life. For this reason, machines with this type of ability could be expected to interact better with people. However, no current systems are capable of understanding people's emotional states in detail. Previous computer vision research on emotion recognition has mainly focused on analyzing facial expressions, usually classifying them into the 6 basic emotions [11]. However, context plays an important role in emotion perception, and when context is incorporated, we can infer more emotional states. In this paper we present the Emotions in Context Database (EMCO), a dataset of images containing people in context in non-controlled environments. In these images, people are annotated with 26 emotional categories and also with the continuous dimensions valence, arousal, and dominance [21]. With the EMCO dataset, we trained a Convolutional Neural Network model that jointly analyzes the person and the whole scene to recognize rich information about emotional states. With this, we show the importance of considering context when recognizing people's emotions in images, and provide a benchmark for the task of emotion recognition in visual context.
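As a rough illustration of the kind of model described, a two-branch network can encode the cropped person and the full scene separately, then fuse the features to predict both the 26 discrete categories and the three continuous dimensions. The PyTorch skeleton below is a hypothetical sketch; the tiny branch backbones and layer sizes are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ContextEmotionNet(nn.Module):
    """Hypothetical two-branch model: one branch for the person crop,
    one for the whole scene; fused features feed two output heads."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim))
        self.person_branch = branch()   # encodes the cropped person
        self.scene_branch = branch()    # encodes the full image (context)
        self.discrete_head = nn.Linear(2 * feat_dim, 26)   # 26 emotion categories
        self.continuous_head = nn.Linear(2 * feat_dim, 3)  # valence, arousal, dominance

    def forward(self, person_crop, full_image):
        fused = torch.cat([self.person_branch(person_crop),
                           self.scene_branch(full_image)], dim=1)
        return self.discrete_head(fused), self.continuous_head(fused)
```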
european conference on machine learning | 2013
Adria Recasens; Ariadna Quattoni
In this paper we present a spectral algorithm for learning weighted finite-state sequence taggers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modelling paired input-output sequences and have numerous applications in real-world problems. Our approach is based on generalizing the class of weighted finite-state sequence taggers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions and the weights of the linear combination are determined by dynamic features of the continuous input sequence. The resulting learning algorithm is efficient and accurate.
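To make the construction concrete, the sketch below scores one paired input-output sequence under such a tagger: each output tag owns K elementary transition matrices, and the effective transition at step t is their linear combination weighted by dynamic features of the continuous input x_t. The names (phi, A, alpha0, alpha_inf) and the multiplication convention are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def sequence_score(x_seq, y_seq, A, phi, alpha0, alpha_inf):
    """Score a paired sequence under a weighted finite-state tagger whose
    transitions are feature-weighted combinations of elementary transitions.

    A[y]      : array of shape (K, n, n), elementary transitions for tag y
    phi(x)    : maps a continuous input frame to K dynamic features
    alpha0    : initial state weights, shape (n,)
    alpha_inf : terminal state weights, shape (n,)
    """
    state = alpha0
    for x_t, y_t in zip(x_seq, y_seq):
        feats = phi(x_t)                              # dynamic features, (K,)
        A_eff = np.tensordot(feats, A[y_t], axes=1)   # (K,)·(K,n,n) -> (n,n)
        state = A_eff.T @ state                       # advance the automaton
    return float(alpha_inf @ state)
```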
european conference on computer vision | 2018
David F. Harwath; Adria Recasens; Dídac Surís; Galen Chuang; Antonio Torralba; James R. Glass
In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to. We demonstrate that these audio-visual associative localizations emerge from network-internal representations learned as a by-product of training to perform an image-audio retrieval task. Our models operate directly on the image pixels and speech waveform, and do not rely on any conventional supervision in the form of labels, segmentations, or alignments between the modalities during training. We perform analysis using the Places 205 and ADE20k datasets demonstrating that our models implicitly learn semantically-coupled object and word detectors.
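One concrete way to realize this kind of audio-visual localization, consistent with the retrieval setup described, is a "matchmap" of similarities between every image location and every audio frame, pooled to a scalar for the retrieval loss. The sketch below assumes precomputed (H, W, D) image features and (T, D) audio features; the max-over-space, mean-over-time pooling is one plausible choice, not necessarily the exact variant used in the paper.

```python
import numpy as np

def matchmap(image_feats: np.ndarray, audio_feats: np.ndarray) -> np.ndarray:
    """Dot-product similarity between each image location and each audio
    frame. image_feats: (H, W, D); audio_feats: (T, D). Returns (H, W, T);
    its peaks localize which image region a spoken word refers to."""
    return np.einsum('hwd,td->hwt', image_feats, audio_feats)

def retrieval_score(mm: np.ndarray) -> float:
    """Pool the matchmap to a scalar image-audio similarity:
    max over spatial locations, then mean over audio frames."""
    return float(mm.max(axis=(0, 1)).mean())
```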
computer vision and pattern recognition | 2017
Ronak Kosti; Jose M. Alvarez; Adria Recasens; Àgata Lapedriza
Recognizing people's emotions from their frame of reference is very important in our everyday life. This capacity helps us to perceive or predict the subsequent actions of people, to interact effectively with them, and to be sympathetic and sensitive toward them. Hence, one should expect a machine to need a similar capability of understanding people's feelings in order to interact correctly with humans. Current research on emotion recognition has focused on the analysis of facial expressions. However, recognizing emotions also requires understanding the scene in which a person is immersed. The unavailability of suitable data for studying this problem has made research on emotion recognition in context difficult. In this paper, we present the EMOTIC database (from EMOTions In Context), a database of images of people in real environments, annotated with their apparent emotions. We defined an extended list of 26 emotion categories to annotate the images, and combined these annotations with three common continuous dimensions: Valence, Arousal, and Dominance. Images in the database were annotated using the Amazon Mechanical Turk (AMT) platform. The resulting set contains 18,313 images with 23,788 annotated people. The goal of this paper is to present the EMOTIC database, detailing how it was created and what information is available. We expect this dataset to help open up new horizons for creating systems capable of recognizing rich information about people's apparent emotional states.
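To make the annotation format concrete, the hypothetical record below captures the per-person information the paper describes: a bounding box, a subset of the 26 discrete emotion categories, and the three continuous dimensions. The field names and rating scale are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PersonAnnotation:
    """One annotated person in an EMOTIC-style image (hypothetical layout)."""
    bbox: List[int]        # [x1, y1, x2, y2] locating the person in the image
    categories: List[str]  # subset of the 26 discrete emotion categories
    valence: float         # continuous dimensions (scale is an assumption)
    arousal: float
    dominance: float
```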
neural information processing systems | 2015
Adria Recasens; Aditya Khosla; Carl Vondrick; Antonio Torralba
arXiv: Computer Vision and Pattern Recognition | 2017
Zoya Bylinskii; Sami Alsheikh; Spandan Madan; Adria Recasens; Kimberli Zhong; Hanspeter Pfister; Aude Oliva
european conference on computer vision | 2018
Adria Recasens; Petr Kellnhofer; Simon Stent; Wojciech Matusik; Antonio Torralba
arXiv: Computer Vision and Pattern Recognition | 2018
Spandan Madan; Zoya Bylinskii; Matthew Tancik; Adria Recasens; Kimberli Zhong; Sami Alsheikh; Hanspeter Pfister; Aude Oliva
international conference on computer vision | 2017
Adria Recasens; Carl Vondrick; Aditya Khosla; Antonio Torralba