Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Aaron C. Courville is active.

Publication


Featured research published by Aaron C. Courville.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Representation Learning: A Review and New Perspectives

Yoshua Bengio; Aaron C. Courville; Pascal Vincent

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.
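
As a toy illustration of the unsupervised feature learning the review discusses, the sketch below trains a minimal tied-weight autoencoder with numpy. The data, layer sizes, and learning rate are invented for the example; none of this is code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))           # toy data: 256 samples, 20 features
W = rng.normal(scale=0.1, size=(20, 5))  # encoder weights; decoder is tied (W.T)
lr = 0.01

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for _ in range(500):
    H = sigmoid(X @ W)                   # hidden representation (the learned "code")
    X_hat = H @ W.T                      # linear reconstruction through tied weights
    err = X_hat - X                      # reconstruction error
    # Backprop through decoder and encoder; tied weights sum both gradients.
    dH = err @ W * H * (1 - H)
    grad_W = X.T @ dH + err.T @ H
    W -= lr * grad_W / len(X)

print("final reconstruction MSE:", np.mean((sigmoid(X @ W) @ W.T - X) ** 2))
```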


International Conference on Machine Learning | 2007

An empirical evaluation of deep architectures on problems with many factors of variation

Hugo Larochelle; Dumitru Erhan; Aaron C. Courville; James Bergstra; Yoshua Bengio

Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many factors of variation. These models are compared with well-established algorithms such as Support Vector Machines and single hidden-layer feed-forward neural networks.
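
For readers who want the flavor of such a comparison, here is a hedged sketch using scikit-learn, not the paper's experimental setup: an RBF-kernel SVM against a single-hidden-layer network on a synthetic task whose many informative dimensions stand in for "factors of variation".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy stand-in for a "many factors of variation" task: many interacting
# informative dimensions, rather than clean digit images.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)

print("SVM accuracy:", svm.score(X_te, y_te))
print("1-hidden-layer net accuracy:", mlp.score(X_te, y_te))
```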


International Conference on Computer Vision | 2015

Describing Videos by Exploiting Temporal Structure

Li Yao; Atousa Torabi; Kyunghyun Cho; Nicolas Ballas; Chris Pal; Hugo Larochelle; Aaron C. Courville

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art on both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger, and more challenging dataset of paired video and natural-language descriptions.
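
A minimal sketch of the temporal-attention idea, assuming per-segment 3-D CNN features and a decoder state are already available; all array shapes and parameter names are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_feat, d_state = 12, 64, 32      # 12 temporal segments of video features
V = rng.normal(size=(T, d_feat))     # per-segment features (e.g. 3-D CNN output)
h = rng.normal(size=(d_state,))      # hidden state of the text-generating RNN
W_v = rng.normal(scale=0.1, size=(d_feat, d_state))
w = rng.normal(scale=0.1, size=(d_state,))

# Score each segment's relevance given the decoder state, then softmax.
scores = np.tanh(V @ W_v + h) @ w            # (T,) alignment scores
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                         # attention weights over time
context = alpha @ V                          # weighted summary fed to the RNN

print("attention weights:", np.round(alpha, 3))
```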


Trends in Cognitive Sciences | 2006

Bayesian theories of conditioning in a changing world

Aaron C. Courville; Nathaniel D. Daw; David S. Touretzky

The recent flowering of Bayesian approaches invites the re-examination of classic issues in behavior, even in areas as venerable as Pavlovian conditioning. A statistical account can offer a new, principled interpretation of behavior, and previous experiments and theories can inform many unexplored aspects of the Bayesian enterprise. Here we consider one such issue: the finding that surprising events provoke animals to learn faster. We suggest that, in a statistical account of conditioning, surprise signals change, and therefore uncertainty and the need for new learning. We discuss inference in a changing world and show how experimental results involving surprise can be interpreted from this perspective, and also how, thus understood, these phenomena help constrain statistical theories of animal and human learning.
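
One way to make the "surprise signals change" idea concrete is a Kalman-filter-style learner with a crude changepoint test: a surprising observation inflates the posterior variance, which raises the learning rate. This is an illustrative toy with made-up noise levels and threshold, not the model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Latent reward rate that abruptly changes (a "surprising" world).
true_w = np.concatenate([np.full(50, 1.0), np.full(50, -1.0)])
obs = true_w + rng.normal(scale=0.3, size=100)

w_hat, var = 0.0, 1.0          # posterior mean and variance over the latent rate
obs_var = 0.3 ** 2
for t, r in enumerate(obs):
    # Surprise is evidence the world changed: inflate posterior variance,
    # which in turn raises the Kalman gain (the effective learning rate).
    surprise = (r - w_hat) ** 2 / (var + obs_var)
    if surprise > 4.0:         # crude changepoint test (illustrative threshold)
        var += 1.0
    gain = var / (var + obs_var)
    w_hat += gain * (r - w_hat)
    var = (1 - gain) * var
    if t in (0, 49, 50, 51, 99):
        print(f"t={t:3d} estimate={w_hat:+.2f} gain={gain:.2f}")
```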


International Conference on Multimodal Interfaces | 2013

Combining modality specific deep neural networks for emotion recognition in video

Samira Ebrahimi Kahou; Chris Pal; Xavier Bouthillier; Pierre Froumenty; Caglar Gulcehre; Roland Memisevic; Pascal Vincent; Aaron C. Courville; Yoshua Bengio; Raul Chandias Ferrari; Mehdi Mirza; Sébastien Jean; Pierre-Luc Carrier; Yann N. Dauphin; Nicolas Boulanger-Lewandowski; Abhishek Aggarwal; Jeremie Zumer; Pascal Lamblin; Jean-Philippe Raymond; Guillaume Desjardins; Razvan Pascanu; David Warde-Farley; Atousa Torabi; Arjun Sharma; Emmanuel Bengio; Myriam Côté; Kishore Reddy Konda; Zhenzhou Wu

In this paper we present the techniques used for the University of Montréal team's submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions expressed by the primary human subject in short video clips extracted from feature-length movies. This involves the analysis of video clips of acted scenes lasting approximately one to two seconds, including the audio track, which may contain human voices as well as background music. Our approach combines multiple deep neural networks for different data modalities, including: (1) a deep convolutional neural network for the analysis of facial expressions within video frames; (2) a deep belief net to capture audio information; (3) a deep autoencoder to model the spatio-temporal information produced by the human actions depicted within the entire scene; and (4) a shallow network architecture focused on extracted features of the mouth of the primary human subject in the scene. We discuss each of these techniques, their performance characteristics, and different strategies to aggregate their predictions. Our best single model was a convolutional neural network trained to predict emotions from static frames using two large data sets, the Toronto Face Database and our own set of face images harvested from Google image search, followed by a per-frame aggregation strategy that used the challenge training data. This yielded a test set accuracy of 35.58%. Using our best strategy for aggregating our top-performing models into a single predictor, we were able to produce an accuracy of 41.03% on the challenge test set. These compare favorably to the challenge baseline test set accuracy of 27.56%.
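
The final aggregation step can be sketched independently of the networks themselves: given per-modality class probabilities, combine them with a convex weighting. The weights, shapes, and model names below are placeholders, not the challenge submission's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clips, n_emotions = 5, 7

# Stand-ins for per-clip class probabilities from three modality models
# (e.g. a face CNN, an audio net, an activity model); rows sum to one.
def fake_probs():
    p = rng.random((n_clips, n_emotions))
    return p / p.sum(axis=1, keepdims=True)

face, audio, activity = fake_probs(), fake_probs(), fake_probs()

# Simple aggregation: convex combination of the modality predictions.
# In practice the weights would be tuned on held-out validation data.
weights = np.array([0.5, 0.3, 0.2])
combined = weights[0] * face + weights[1] * audio + weights[2] * activity
print("predicted emotion per clip:", combined.argmax(axis=1))
```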


International Conference on Neural Information Processing | 2013

Challenges in Representation Learning: A Report on Three Machine Learning Contests

Ian J. Goodfellow; Dumitru Erhan; Pierre Carrier; Aaron C. Courville; Mehdi Mirza; Ben Hamner; Will Cukierski; Yichuan Tang; David Thaler; Dong-Hyun Lee; Yingbo Zhou; Chetan Ramaiah; Fangxiang Feng; Ruifan Li; Xiaojie Wang; Dimitris Athanasakis; John Shawe-Taylor; Maxim Milakov; John Park; Radu Tudor Ionescu; Marius Popescu; Cristian Grozea; James Bergstra; Jingjing Xie; Lukasz Romaszko; Bing Xu; Zhang Chuang; Yoshua Bengio

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.


European Conference on Computer Vision | 2012

Disentangling factors of variation for facial expression recognition

Salah Rifai; Yoshua Bengio; Aaron C. Courville; Pascal Vincent; Mehdi Mirza

We propose a semi-supervised approach to the task of emotion recognition in 2D face images, using recent ideas in deep learning to handle the factors of variation present in the data. An emotion classification algorithm should be robust both to (1) remaining variations due to the pose of the face in the image after centering and alignment, and to (2) the identity or morphology of the face. To achieve this invariance, we propose to learn a hierarchy of features in which we gradually filter out the factors of variation arising from both (1) and (2). We address (1) by using a multi-scale contractive convolutional network (CCNET) to obtain invariance to translations of the facial traits in the image. Using the feature representation produced by the CCNET, we train a Contractive Discriminative Analysis (CDA) feature extractor, a novel variant of the Contractive Auto-Encoder (CAE) designed to learn a representation that separates the emotion-related factors from the others (which mostly capture the subject identity, and what is left of pose after the CCNET). This system beats the state of the art on a recently proposed dataset for facial expression recognition, the Toronto Face Database, moving the state-of-the-art accuracy from 82.4% to 85.0%, while the CCNET and CDA improve the accuracy of a standard CAE by 8%.
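
The contraction penalty at the heart of the CAE (which the CDA builds on) is easy to state in code: for sigmoid hidden units, the squared Frobenius norm of the encoder Jacobian factorizes neatly. The sketch below is a generic CAE loss in numpy with invented sizes, not the paper's CCNET/CDA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 30))           # toy batch of inputs
W = rng.normal(scale=0.1, size=(30, 10)) # encoder weights (decoder tied)
b = np.zeros(10)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

H = sigmoid(X @ W + b)                   # hidden representation
recon_err = np.mean((H @ W.T - X) ** 2)  # tied-weight reconstruction loss

# CAE contraction penalty: squared Frobenius norm of the Jacobian dH/dX.
# For sigmoid units it factorizes as (h(1-h))^2 times column norms of W.
jacobian_pen = np.mean(((H * (1 - H)) ** 2) @ (W ** 2).sum(axis=0))
lam = 0.1                                # penalty strength (illustrative)
loss = recon_err + lam * jacobian_pen
print("reconstruction:", recon_err, "contraction penalty:", jacobian_pen)
```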


IEEE Transactions on Multimedia | 2015

Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks

Kyunghyun Cho; Aaron C. Courville; Yoshua Bengio

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input. In this paper we focus on the case where the input also has a rich structure and the input and output structures are somehow related. We describe systems that learn to attend to different places in the input, for each element of the output, for a variety of tasks: machine translation, image caption generation, video clip description, and speech recognition. All these systems are based on a shared set of building blocks: gated recurrent neural networks and convolutional neural networks, along with trained attention mechanisms. We report on experimental results with these systems, showing impressively good performance and the advantage of the attention mechanism.
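
A schematic of the shared building block the paper describes, soft attention inside an encoder-decoder loop: at each output step the decoder re-scores all input annotations and reads a fresh context vector. The shapes, the tanh stand-in for a GRU update, and the parameter names are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
L, d = 8, 16                            # 8 encoder annotations of dimension d
annotations = rng.normal(size=(L, d))   # e.g. source-word encodings from an RNN

def attend(state, annotations, U, v):
    """One soft-attention read: a new context vector for each decoder step."""
    e = np.tanh(annotations @ U + state) @ v   # alignment scores, shape (L,)
    a = np.exp(e - e.max())
    a /= a.sum()                               # normalized attention weights
    return a @ annotations, a                  # context vector, weights

U = rng.normal(scale=0.1, size=(d, d))
v = rng.normal(scale=0.1, size=(d,))
state = np.zeros(d)
for step in range(3):                   # three decoder steps, each re-attending
    context, a = attend(state, annotations, U, v)
    state = np.tanh(state + context)    # stand-in for a gated recurrent update
    print(f"step {step}: attended mostly to input position {a.argmax()}")
```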


The International Journal of Robotics Research | 2006

A Generative Model of Terrain for Autonomous Navigation in Vegetation

Carl Wellington; Aaron C. Courville; Anthony Stentz

Current approaches to off-road autonomous navigation are often limited in their ability to build a terrain model from sensor data. Available sensors make very indirect measurements of quantities of interest such as the supporting ground height and the location of obstacles, especially in domains where vegetation may hide the ground surface or partially obscure obstacles. A generative, probabilistic terrain model is introduced that exploits natural structure found in off-road environments to constrain the problem and use ambiguous sensor data more effectively. The model includes two Markov random fields that encode the assumptions that ground heights smoothly vary and terrain classes tend to cluster. The model also includes a latent variable that encodes the assumption that vegetation of a single type has a similar height. The model parameters can be trained by simply driving through representative terrain. Results from a number of challenging test scenarios in an agricultural domain reveal that exploiting the 3D structure inherent in outdoor domains significantly improves ground estimates and obstacle detection accuracy, and allows the system to infer the supporting ground surface even when it is hidden under dense vegetation.
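
The "ground heights smoothly vary" assumption can be illustrated with a 1-D Gaussian MRF smoothed by iterated conditional modes. This toy is only an analogy to one ingredient of the model, with made-up noise levels and prior strength.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
true_ground = np.cumsum(rng.normal(scale=0.05, size=n))  # smooth 1-D terrain
meas = true_ground + rng.normal(scale=0.4, size=n)       # noisy height returns

# ICM on a 1-D Gaussian MRF: each update balances the noisy measurement
# against a smoothness prior tying neighboring ground heights together.
h = meas.copy()
lam = 4.0                    # strength of the smoothness prior (illustrative)
for _ in range(100):
    for i in range(n):
        nbrs = [h[j] for j in (i - 1, i + 1) if 0 <= j < n]
        # Closed-form minimizer of (h_i - meas_i)^2 + lam * sum_j (h_i - h_j)^2
        h[i] = (meas[i] + lam * sum(nbrs)) / (1 + lam * len(nbrs))

print("raw RMSE:     ", np.sqrt(np.mean((meas - true_ground) ** 2)))
print("smoothed RMSE:", np.sqrt(np.mean((h - true_ground) ** 2)))
```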


Robotics: Science and Systems | 2005

Interacting Markov Random Fields for Simultaneous Terrain Modeling and Obstacle Detection

Carl Wellington; Aaron C. Courville; Anthony Stentz

Autonomous navigation in outdoor environments with vegetation is difficult because available sensors make very indirect measurements of quantities of interest such as the supporting ground height and the location of obstacles. We introduce a terrain model that includes spatial constraints on these quantities to exploit structure found in outdoor domains and use available sensor data more effectively. The model consists of a latent variable that establishes a prior favoring vegetation of a similar height, plus multiple Markov random fields that incorporate neighborhood interactions and impose a prior on smooth ground and class continuity. These Markov random fields interact through a hidden semi-Markov model that enforces a prior on the vertical structure of elements in the environment. The system runs in real time and has been trained and tested using real data from an agricultural setting. Results show that exploiting the 3D structure inherent in outdoor domains significantly improves ground height estimates and obstacle detection accuracy.

Collaboration


Dive into Aaron C. Courville's collaborations.

Top Co-Authors

Yoshua Bengio, Université de Montréal
Chris Pal, École Polytechnique de Montréal
Hugo Larochelle, Université de Sherbrooke
Mehdi Mirza, Université de Montréal
Pascal Vincent, Université de Montréal
James Bergstra, Université de Montréal