Publication


Featured research published by Hugo Larochelle.


International Conference on Machine Learning | 2008

Extracting and composing robust features with denoising autoencoders

Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol

Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.
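
To make the training principle concrete, here is a minimal NumPy sketch of denoising-autoencoder updates, assuming binary inputs, tied weights, masking noise, and a cross-entropy reconstruction loss; the sizes, learning rate, and corruption level are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr, corrupt = 64, 32, 0.1, 0.25
W = rng.normal(0.0, 0.01, (n_vis, n_hid))    # tied encoder/decoder weights
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

x = (rng.random(n_vis) > 0.5).astype(float)  # stand-in for one training example

for _ in range(100):
    x_tilde = x * (rng.random(n_vis) >= corrupt)  # corrupt: zero out a fraction
    h = sigmoid(x_tilde @ W + b_h)                # encode the corrupted input
    x_hat = sigmoid(h @ W.T + b_v)                # decode back to input space
    # Gradient of the cross-entropy loss measured against the *clean* input x.
    d_v = x_hat - x
    d_h = (d_v @ W) * h * (1.0 - h)
    W -= lr * (np.outer(x_tilde, d_h) + np.outer(d_v, h))
    b_v -= lr * d_v
    b_h -= lr * d_h
```

The essential point is in the last comment: the network sees a corrupted input but is penalized for failing to reconstruct the uncorrupted one, which is what forces robust features.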


Journal of Machine Learning Research | 2016

Domain-adversarial training of neural networks

Yaroslav Ganin; Evgeniya Ustinova; Hana Ajakan; Pascal Germain; Hugo Larochelle; François Laviolette; Mario Marchand; Victor S. Lempitsky

We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behavior can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for a descriptor learning task in the context of person re-identification.
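
The key architectural ingredient is the gradient reversal layer. Below is a minimal PyTorch sketch of it, assuming the usual formulation: the identity on the forward pass and a sign-flipped (and scaled) gradient on the backward pass. The tensor shapes and the lambda value are illustrative.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity in the forward direction

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign (and scale) of the gradient flowing back to the
        # features, so the feature extractor is trained to *confuse* the
        # domain classifier while the domain classifier itself trains normally.
        return -ctx.lambd * grad_output, None

features = torch.randn(8, 128, requires_grad=True)
domain_head_input = GradReverse.apply(features, 1.0)
# `domain_head_input` feeds the domain classifier head; the label-prediction
# head reads `features` directly, giving the two-head setup described above.
```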


International Conference on Machine Learning | 2007

An empirical evaluation of deep architectures on problems with many factors of variation

Hugo Larochelle; Dumitru Erhan; Aaron C. Courville; James Bergstra; Yoshua Bengio

Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many factors of variation. These models are compared with well-established algorithms such as Support Vector Machines and single hidden-layer feed-forward neural networks.
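
As an illustration of what "many factors of variation" means in practice, the sketch below composes two such factors (rotation and a random background) onto a toy digit image. The generation recipe here is an assumption for illustration, not the paper's exact benchmark construction.

```python
import numpy as np
from scipy.ndimage import rotate

def harden(image, rng):
    """Compose several factors of variation onto one 28x28 grayscale digit."""
    angle = rng.uniform(0.0, 360.0)                    # factor 1: rotation
    img = rotate(image, angle, reshape=False, order=1)
    background = rng.random(image.shape)               # factor 2: random background
    return np.clip(np.maximum(img, 0.5 * background), 0.0, 1.0)

rng = np.random.default_rng(0)
digit = np.zeros((28, 28)); digit[8:20, 12:16] = 1.0   # stand-in for a "1"
hard_example = harden(digit, rng)
```

A classifier must now disentangle digit identity from orientation and clutter, which is exactly the regime where shallow models tend to struggle.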


International Conference on Machine Learning | 2008

Classification using discriminative restricted Boltzmann machines

Hugo Larochelle; Yoshua Bengio

Recently, many applications for Restricted Boltzmann Machines (RBMs) have been developed for a large variety of learning problems. However, RBMs are usually used as feature extractors for another learning algorithm or to provide a good initialization for deep feed-forward neural network classifiers, and are not considered a standalone solution to classification problems. In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. We present an evaluation of different learning algorithms for RBMs that aim to introduce a discriminative component into RBM training and improve their performance as classifiers. This approach is simple in that RBMs are used directly to build a classifier, rather than as a stepping stone. Finally, we demonstrate how discriminative RBMs can also be successfully employed in a semi-supervised setting.
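
What makes the RBM usable directly as a classifier is that its class posterior has a closed form. A minimal NumPy sketch, assuming the standard classification-RBM parameterization (hidden-input weights W, hidden-class weights U, hidden biases c, class biases d); shapes are illustrative.

```python
import numpy as np

def softplus(a):
    return np.logaddexp(0.0, a)  # log(1 + exp(a)), numerically stable

def class_posterior(x, W, U, c, d):
    """p(y | x) for a classification RBM, computable exactly."""
    # For each class y: log-score = d_y + sum_j softplus(c_j + U[j, y] + (W @ x)_j)
    act = c[:, None] + U + (W @ x)[:, None]      # shape (n_hidden, n_classes)
    log_scores = d + softplus(act).sum(axis=0)   # shape (n_classes,)
    log_scores -= log_scores.max()               # stability before exponentiating
    p = np.exp(log_scores)
    return p / p.sum()

rng = np.random.default_rng(0)
n_vis, n_hid, n_cls = 20, 10, 3
p = class_posterior(rng.random(n_vis),
                    rng.normal(size=(n_hid, n_vis)),
                    rng.normal(size=(n_hid, n_cls)),
                    np.zeros(n_hid), np.zeros(n_cls))
print(p)  # a proper distribution over the 3 classes
```

Because this posterior is exact, its log-likelihood can be maximized directly, which is the discriminative training objective the abstract refers to.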


International Conference on Computer Vision | 2015

Describing Videos by Exploiting Temporal Structure

Li Yao; Atousa Torabi; Kyunghyun Cho; Nicolas Ballas; Chris Pal; Hugo Larochelle; Aaron C. Courville

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art for both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired videos and natural language descriptions.
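
A minimal NumPy sketch of soft temporal attention in this spirit: per-segment features are scored against the decoder's hidden state, and a softmax over the scores yields the weighted summary fed back to the RNN. The parameterization and sizes here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def temporal_attention(segments, h_dec, W_a, U_a, w_a):
    """Weighted summary of `segments` (T x D) given the decoder state."""
    # Unnormalized relevance score for each temporal segment.
    e = np.tanh(segments @ W_a + h_dec @ U_a) @ w_a   # shape (T,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                              # weights sum to 1
    return alpha @ segments, alpha                    # context vector, weights

rng = np.random.default_rng(0)
T, D, H, A = 12, 64, 32, 24   # segments, feature dim, state dim, attention dim
ctx, alpha = temporal_attention(rng.normal(size=(T, D)),
                                rng.normal(size=H),
                                rng.normal(size=(D, A)),
                                rng.normal(size=(H, A)),
                                rng.normal(size=A))
```

Because the weights alpha are recomputed at every decoding step, the model can attend to different parts of the video as it emits different words.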


Neural Computation | 2012

Learning where to attend with deep architectures for image tracking

Misha Denil; Loris Bazzani; Hugo Larochelle; Nando de Freitas

We discuss an attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of perception, the model consists of two interacting pathways, identity and control, intended to mirror the what and where pathways in neuroscience models. The identity pathway models object appearance and performs classification using deep (factored) restricted Boltzmann machines. At each point in time, the observations consist of foveated images, with decaying resolution toward the periphery of the gaze. The control pathway models the location, orientation, scale, and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the control pathway, we encounter an attentional mechanism that learns to select gazes so as to minimize tracking uncertainty. Unlike in our previous work, we introduce gaze selection strategies that operate in the presence of partial information and on a continuous action space. We show that a straightforward extension of the existing approach to the partial information setting results in poor performance, and we propose an alternative method based on modeling the reward surface as a Gaussian process. This approach gives good performance in the presence of partial information and allows us to expand the action space from a small, discrete set of fixation points to a continuous domain.
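
A minimal NumPy sketch of the Gaussian-process idea: fit a GP to the rewards observed at past gazes, then pick the next gaze from a densely sampled continuous action space. The RBF kernel and the upper-confidence-bound selection rule are illustrative assumptions, not necessarily the paper's exact acquisition strategy.

```python
import numpy as np

def rbf(A, B, ell=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def next_gaze(X, y, candidates, noise=1e-3, kappa=2.0):
    """GP posterior over rewards at candidate gazes; pick the UCB maximizer."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(candidates, X)
    alpha = np.linalg.solve(K, y)
    mu = Ks @ alpha                                   # posterior mean
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum('ij,ji->i', Ks, v)          # posterior variance
    ucb = mu + kappa * np.sqrt(np.maximum(var, 0.0))  # explore/exploit trade-off
    return candidates[np.argmax(ucb)]

rng = np.random.default_rng(0)
X = rng.random((5, 2))       # gazes tried so far (2-D fixation points)
y = rng.random(5)            # observed tracking rewards at those gazes
cand = rng.random((200, 2))  # continuous action space, sampled densely
print(next_gaze(X, y, cand))
```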


Computer Vision and Pattern Recognition | 2017

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries; Florian Strub; Sarath Chandar; Olivier Pietquin; Hugo Larochelle; Aaron C. Courville

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines for the introduced tasks.
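
A minimal sketch of the game protocol in Python, with hypothetical `ask`, `answer`, and `guess` callables standing in for the questioner, oracle, and guesser models; this illustrates the interaction loop only and is not the dataset's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Game:
    objects: List[str]  # candidate objects in the image scene
    target: int         # index of the hidden object (known to the oracle only)

def play(game: Game,
         ask: Callable[[List[Tuple[str, str]]], str],
         answer: Callable[[Game, str], str],
         guess: Callable[[List[Tuple[str, str]], List[str]], int],
         max_turns: int = 5) -> bool:
    dialogue: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        q = ask(dialogue)        # questioner model proposes a question
        a = answer(game, q)      # oracle model replies "yes" / "no" / "n/a"
        dialogue.append((q, a))
    return guess(dialogue, game.objects) == game.target  # success?

# Toy usage with trivial stand-in models:
g = Game(objects=["cat", "sofa", "lamp"], target=2)
won = play(g,
           ask=lambda d: "Is it furniture?",
           answer=lambda game, q: "no",
           guess=lambda d, objs: len(objs) - 1)
```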


Medical Image Analysis | 2017

ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI

Oskar Maier; Bjoern H. Menze; Janina von der Gablentz; Levin Häni; Mattias P. Heinrich; Matthias Liebrand; Stefan Winzeck; Abdul W. Basit; Paul Bentley; Liang Chen; Daan Christiaens; Francis Dutil; Karl Egger; Chaolu Feng; Ben Glocker; Michael Götz; Tom Haeck; Hanna Leena Halme; Mohammad Havaei; Khan M. Iftekharuddin; Pierre-Marc Jodoin; Konstantinos Kamnitsas; Elias Kellner; Antti Korvenoja; Hugo Larochelle; Christian Ledig; Jia-Hong Lee; Frederik Maes; Qaiser Mahmood; Klaus H. Maier-Hein

Ischemic stroke is the most common cerebrovascular disease, and its diagnosis, treatment, and study rely on non-invasive imaging. Algorithms for stroke lesion segmentation from magnetic resonance imaging (MRI) volumes are intensely researched, but the reported results are largely incomparable due to different datasets and evaluation schemes. We approached this urgent problem of comparability with the Ischemic Stroke Lesion Segmentation (ISLES) challenge, organized in conjunction with the MICCAI 2015 conference. In this paper we propose a common evaluation framework, describe the publicly available datasets, and present the results of the two sub-challenges: Sub-Acute Stroke Lesion Segmentation (SISS) and Stroke Perfusion Estimation (SPES). A total of 16 research groups participated with a wide range of state-of-the-art automatic segmentation algorithms. A thorough analysis of the obtained data enables a critical evaluation of the current state of the art, recommendations for further developments, and the identification of remaining challenges. The segmentation of acute perfusion lesions addressed in SPES was found to be feasible. However, algorithms applied to sub-acute lesion segmentation in SISS still lack accuracy. Overall, no algorithmic characteristic of any method was found to perform superior to the others. Instead, the characteristics of stroke lesion appearances, their evolution, and the observed challenges should be studied in detail. The annotated ISLES image datasets continue to be publicly available through an online evaluation system to serve as an ongoing benchmarking resource (www.isles-challenge.org).

Highlights:
- Evaluation framework for automatic stroke lesion segmentation from MRI
- Public multi-center, multi-vendor, multi-protocol databases released
- Ongoing fair and automated benchmark with expert-created ground truth sets
- Comparison of 14+7 groups who responded to an open challenge at MICCAI
- Segmentation feasible in acute and unsolved in sub-acute cases
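
For readers unfamiliar with segmentation benchmarks, the sketch below shows the Dice similarity coefficient, a standard overlap measure for comparing a predicted lesion mask against expert ground truth; it is given as an illustrative example of such an evaluation framework, not as ISLES's complete metric set.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """2|P ∩ T| / (|P| + |T|) for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom

p = np.zeros((4, 4)); p[1:3, 1:3] = 1   # predicted lesion mask
t = np.zeros((4, 4)); t[1:3, 2:4] = 1   # expert ground truth
print(dice(p, t))  # 0.5: half the lesion voxels overlap
```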


Computer Vision and Pattern Recognition | 2014

Topic Modeling of Multimodal Data: An Autoregressive Approach

Yin Zheng; Yu-Jin Zhang; Hugo Larochelle

Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the hidden topic features by incorporating label information into the training objective of the model, and show how to employ SupDocNADE to learn a joint representation from image visual words, annotation words and class label information. We also describe how to leverage information about the spatial position of the visual words for SupDocNADE to achieve better performance in a simple, yet effective manner. We test our model on the LabelMe and UIUC-Sports datasets and show that it compares favorably to other topic models such as the supervised variant of LDA and a Spatial Pyramid Matching (SPM) approach.
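
DocNADE's core idea is an autoregressive factorization of the document likelihood, p(v) = prod_i p(v_i | v_<i), with each word's conditional computed from a hidden layer over the preceding words. A minimal NumPy sketch, assuming sigmoid hidden units and a plain softmax over the vocabulary (real implementations typically speed this up); sizes and initialization are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def docnade_log_likelihood(words, W, V, b, c):
    """log p(v) for a document given as a sequence of word indices."""
    log_p, h_pre = 0.0, c.copy()
    for w in words:
        h = 1.0 / (1.0 + np.exp(-h_pre))        # hidden units from words so far
        log_p += np.log(softmax(V @ h + b)[w])  # p(v_i = w | v_<i)
        h_pre += W[:, w]                        # fold the observed word in
    return log_p

rng = np.random.default_rng(0)
n_hid, vocab = 16, 100
ll = docnade_log_likelihood([3, 17, 42],
                            rng.normal(0, 0.1, (n_hid, vocab)),
                            rng.normal(0, 0.1, (vocab, n_hid)),
                            np.zeros(vocab), np.zeros(n_hid))
```

The supervised (SupDocNADE) and spatial extensions described in the abstract add label and position terms to this training objective.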


Neural Computation | 2006

Nonlocal Estimation of Manifold Structure

Yoshua Bengio; Martin Monperrus; Hugo Larochelle

We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation invites an exploration of nonlocal manifold learning algorithms that attempt to discover shared structure in the tangent planes at different positions. A training criterion for such an algorithm is proposed, and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where local nonparametric methods fail.
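
A minimal NumPy sketch of such a training criterion, assuming a predicted tangent basis F(x) and a least-squares projection of neighbor differences onto it; the specific predictor and loss details are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def projection_residual(F, delta):
    """Squared error left after projecting `delta` onto the rows of F (d x D)."""
    # Optimal coefficients w* solve min_w ||delta - F^T w||^2.
    w, *_ = np.linalg.lstsq(F.T, delta, rcond=None)
    return float(((delta - F.T @ w) ** 2).sum())

rng = np.random.default_rng(0)
D, d = 5, 2                            # ambient and manifold dimensions
x = rng.normal(size=D)
x_nbr = x + 0.1 * rng.normal(size=D)   # a near neighbor on the manifold
F = rng.normal(size=(d, D))            # predicted tangent basis at x
# Training minimizes this residual summed over points and neighbors, with F
# produced by one predictor shared across all x — the "nonlocal" ingredient
# that lets tangent structure generalize far from the training data.
loss = projection_residual(F, x_nbr - x)
```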

Collaboration


Dive into Hugo Larochelle's collaborations.

Top Co-Authors

Yoshua Bengio
Université de Montréal

Chris Pal
École Polytechnique de Montréal

Mohammad Havaei
Université de Sherbrooke

Pascal Vincent
Université de Montréal

Iain Murray
University of Edinburgh