Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chris Pal is active.

Publication


Featured research published by Chris Pal.


Computer Vision and Pattern Recognition | 2007

Learning Conditional Random Fields for Stereo

Daniel Scharstein; Chris Pal

State-of-the-art stereo vision algorithms utilize color changes as important cues for object boundaries. Most methods impose heuristic restrictions or priors on disparities, for example by modulating local smoothness costs with intensity gradients. In this paper we seek to replace such heuristics with explicit probabilistic models of disparities and intensities learned from real images. We have constructed a large number of stereo datasets with ground-truth disparities, and we use a subset of these datasets to learn the parameters of conditional random fields (CRFs). We present experimental results illustrating the potential of our approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.
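
As a schematic illustration (not the paper's exact parameterization), a gradient-modulated stereo CRF energy of the kind being learned here can be written as

    E(d \mid I) = \sum_{p} \phi(d_p; I) + \sum_{(p,q) \in \mathcal{N}} \theta_{k(p,q)} \, \mathbf{1}[d_p \neq d_q]

where d is the disparity field, \phi is a data (matching-cost) term, \mathcal{N} is the pixel neighborhood, k(p,q) indexes a bin of the intensity gradient |I_p - I_q|, and the smoothness weights \theta_k are estimated from ground-truth disparities by maximizing the conditional likelihood p(d | I) rather than being hand-tuned.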


International Conference on Computer Vision | 2009

Activity recognition using the velocity histories of tracked keypoints

Ross Messing; Chris Pal; Henry A. Kautz

We present an activity recognition feature inspired by human psychophysical performance. This feature is based on the velocity history of tracked keypoints. We present a generative mixture model for video sequences using this feature, and show that it performs comparably to local spatio-temporal features on the KTH activity recognition dataset. In addition, we contribute a new activity recognition dataset, focusing on activities of daily living, with high resolution video sequences of complex actions. We demonstrate the superiority of our velocity history feature on high resolution video sequences of complicated activities. Further, we show how the velocity history feature can be extended, both with a more sophisticated latent velocity model, and by combining the velocity history feature with other useful information, like appearance, position, and high level semantic information. Our approach performs comparably to established and state of the art methods on the KTH dataset, and significantly outperforms all other methods on our challenging new dataset.
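
As a rough sketch of the kind of feature involved (the tracker, bin counts, and quantization here are illustrative assumptions, not the paper's exact pipeline), velocity histories can be accumulated from KLT-tracked keypoints roughly as follows:

    import cv2
    import numpy as np

    def velocity_histories(frames, max_corners=200):
        """Track keypoints with KLT and record each track's per-frame velocity."""
        prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev, max_corners, qualityLevel=0.01, minDistance=7)
        histories = [[] for _ in range(len(pts))]   # one velocity list per keypoint
        for frame in frames[1:]:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
            for i, (p0, p1, ok) in enumerate(zip(pts, nxt, status)):
                if ok[0]:
                    histories[i].append((p1 - p0).ravel())   # (dx, dy) for this frame
            prev, pts = gray, nxt
        # Quantize each velocity into magnitude/direction bins (illustrative choice).
        def quantize(v, mag_bins=8, dir_bins=8):
            mag = np.log1p(np.linalg.norm(v))
            ang = np.arctan2(v[1], v[0]) % (2 * np.pi)
            return (min(int(mag), mag_bins - 1), int(ang / (2 * np.pi) * dir_bins))
        return [[quantize(v) for v in h] for h in histories]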


International Conference on Computer Vision | 2015

Describing Videos by Exploiting Temporal Structure

Li Yao; Atousa Torabi; Kyunghyun Cho; Nicolas Ballas; Chris Pal; Hugo Larochelle; Aaron C. Courville

Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application to video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description model. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state-of-the-art on both the BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired videos and natural language descriptions.
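
As a generic illustration of soft temporal attention (the variable names and scoring MLP are assumptions; the paper's exact formulation may differ), the decoder can weight per-segment video features at each word step like this:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def temporal_attention(V, h, Wv, Wh, w):
        """Soft attention over T temporal segments.

        V  : (T, D) per-segment video features (e.g., 3-D CNN outputs)
        h  : (H,)   current hidden state of the text-generating RNN
        Wv : (A, D), Wh : (A, H), w : (A,)  -- learned projection parameters
        Returns the attention-weighted context vector and the weights.
        """
        scores = np.tanh(V @ Wv.T + Wh @ h) @ w   # (T,) relevance of each segment
        alpha = softmax(scores)                    # attention weights, sum to 1
        context = alpha @ V                        # (D,) weighted sum of segment features
        return context, alpha

    # Toy usage with random parameters:
    rng = np.random.default_rng(0)
    V, h = rng.normal(size=(12, 64)), rng.normal(size=32)
    Wv, Wh, w = rng.normal(size=(16, 64)), rng.normal(size=(16, 32)), rng.normal(size=16)
    ctx, alpha = temporal_attention(V, h, Wv, Wh, w)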


International Journal of Computer Vision | 2012

On Learning Conditional Random Fields for Stereo

Chris Pal; Jerod J. Weinman; Lam C. Tran; Daniel Scharstein

Until recently, the lack of ground truth data has hindered the application of discriminative structured prediction techniques to the stereo problem. In this paper we use ground truth data sets that we have recently constructed to explore different model structures and parameter learning techniques. To estimate parameters in Markov random fields (MRFs) via maximum likelihood one usually needs to perform approximate probabilistic inference. Conditional random fields (CRFs) are discriminative versions of traditional MRFs. We explore a number of novel CRF model structures including a CRF for stereo matching with an explicit occlusion model. CRFs require expensive inference steps for each iteration of optimization and inference is particularly slow when there are many discrete states. We explore belief propagation, variational message passing and graph cuts as inference methods during learning and compare with learning via pseudolikelihood. To accelerate approximate inference we have developed a new method called sparse variational message passing which can reduce inference time by an order of magnitude with negligible loss in quality. Learning using sparse variational message passing improves upon previous approaches using graph cuts and allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.
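
One common way to sparsify messages over many disparity states (shown here as a generic sketch, not necessarily the paper's exact scheme) is to keep only the states that cover most of the probability mass and renormalize, so subsequent message products touch far fewer states:

    import numpy as np

    def sparsify_message(msg, keep_mass=0.99):
        """Keep the smallest set of states covering `keep_mass` of the message.

        msg : (K,) unnormalized message/belief over K discrete states.
        Returns a renormalized message with the remaining states set to 0.
        """
        p = msg / msg.sum()
        order = np.argsort(p)[::-1]                    # states by decreasing probability
        cutoff = np.searchsorted(np.cumsum(p[order]), keep_mass) + 1
        sparse = np.zeros_like(p)
        sparse[order[:cutoff]] = p[order[:cutoff]]
        return sparse / sparse.sum()

    msg = np.random.default_rng(1).random(64) ** 4     # peaked message over 64 disparities
    print(np.count_nonzero(sparsify_message(msg)))     # typically far fewer than 64 states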


International Conference on Computer Graphics and Interactive Techniques | 2005

Panoramic video textures

Aseem Agarwala; Ke Colin Zheng; Chris Pal; Maneesh Agrawala; Michael F. Cohen; Brian Curless; David Salesin; Richard Szeliski

This paper describes a mostly automatic method for taking the output of a single panning video camera and creating a panoramic video texture (PVT): a video that has been stitched into a single, wide field of view and that appears to play continuously and indefinitely. The key problem in creating a PVT is that although only a portion of the scene has been imaged at any given time, the output must simultaneously portray motion throughout the scene. Like previous work in video textures, our method employs min-cut optimization to select fragments of video that can be stitched together both spatially and temporally. However, it differs from earlier work in that the optimization must take place over a much larger set of data. Thus, to create PVTs, we introduce a dynamic programming step, followed by a novel hierarchical min-cut optimization algorithm. We also use gradient-domain compositing to further smooth boundaries between video fragments. We demonstrate our results with an interactive viewer in which users can interactively pan and zoom on high-resolution PVTs.
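
For reference, the pairwise seam-matching cost used in graph-cut texture/video synthesis, which this style of min-cut stitching builds on (a standard formulation, not a claim about this paper's exact cost), is

    M(s, t, A, B) = \lVert A(s) - B(s) \rVert + \lVert A(t) - B(t) \rVert

for neighboring pixels (or space-time voxels) s and t, where A and B are the two overlapping video fragments; cutting along low-cost seams hides the transition, and gradient-domain compositing smooths whatever residual mismatch remains.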


International Conference on Multimodal Interfaces | 2013

Combining modality specific deep neural networks for emotion recognition in video

Samira Ebrahimi Kahou; Chris Pal; Xavier Bouthillier; Pierre Froumenty; Caglar Gulcehre; Roland Memisevic; Pascal Vincent; Aaron C. Courville; Yoshua Bengio; Raul Chandias Ferrari; Mehdi Mirza; Sébastien Jean; Pierre-Luc Carrier; Yann N. Dauphin; Nicolas Boulanger-Lewandowski; Abhishek Aggarwal; Jeremie Zumer; Pascal Lamblin; Jean-Philippe Raymond; Guillaume Desjardins; Razvan Pascanu; David Warde-Farley; Atousa Torabi; Arjun Sharma; Emmanuel Bengio; Myriam Côté; Kishore Reddy Konda; Zhenzhou Wu

In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions expressed by the primary human subject in short video clips extracted from feature-length movies. This involves the analysis of video clips of acted scenes lasting approximately one to two seconds, including the audio track, which may contain human voices as well as background music. Our approach combines multiple deep neural networks for different data modalities, including: (1) a deep convolutional neural network for the analysis of facial expressions within video frames; (2) a deep belief net to capture audio information; (3) a deep autoencoder to model the spatio-temporal information produced by the human actions depicted within the entire scene; and (4) a shallow network architecture focused on extracted features of the mouth of the primary human subject in the scene. We discuss each of these techniques, their performance characteristics, and different strategies to aggregate their predictions. Our best single model was a convolutional neural network trained to predict emotions from static frames using two large data sets, the Toronto Face Database and our own set of face images harvested from Google image search, followed by a per-frame aggregation strategy that used the challenge training data. This yielded a test set accuracy of 35.58%. Using our best strategy for aggregating our top-performing models into a single predictor, we were able to produce an accuracy of 41.03% on the challenge test set. These results compare favorably to the challenge baseline test set accuracy of 27.56%.
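
As a minimal illustration of combining per-modality predictions (the actual submissions used more elaborate aggregation strategies; the model names and weights below are placeholders), modality-specific class probabilities can be fused with a weighted average:

    import numpy as np

    def fuse_predictions(probs_by_model, weights):
        """Weighted average of per-model class-probability matrices.

        probs_by_model : dict mapping model name -> (n_clips, n_classes) probabilities
        weights        : dict mapping model name -> scalar weight (e.g., tuned on validation data)
        """
        total = sum(weights.values())
        fused = sum(w * probs_by_model[name] for name, w in weights.items()) / total
        return fused.argmax(axis=1)                 # predicted emotion index per clip

    # Hypothetical example with three modality-specific models and 7 emotion classes:
    rng = np.random.default_rng(0)
    probs = {m: rng.dirichlet(np.ones(7), size=5) for m in ("faces_cnn", "audio_dbn", "scene_ae")}
    labels = fuse_predictions(probs, {"faces_cnn": 0.5, "audio_dbn": 0.3, "scene_ae": 0.2})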


Human Factors in Computing Systems | 1997

A two-ball mouse affords three degrees of freedom

I. Scott MacKenzie; R. William Soukoreff; Chris Pal

We describe a prototype two-ball mouse containing the electronics and mechanics of two mice in a single chassis. Unlike a conventional mouse, which senses x-axis and y-axis displacement only, our mouse also senses z-axis angular motion. This is accomplished through simple calculations on the two sets of x-y displacement data. Our mouse looks and feels like a standard mouse; however, certain primitive operations are performed with much greater ease. The rotate tool, common in most drawing programs, becomes redundant as objects are easily moved with three degrees of freedom. Mechanisms for engaging the added degree of freedom and different interaction techniques are discussed.
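
As a sketch of the kinematics involved (the ball separation and small-angle setup are our assumptions, not details from the paper), two x-y displacement readings from sensors a fixed distance apart determine both the translation and the z-axis rotation of the rigid mouse body:

    import math

    def mouse_motion(d1, d2, separation_mm=60.0):
        """Recover translation and z-axis rotation from two ball displacements.

        d1, d2        : (dx, dy) displacements reported by the left and right balls
        separation_mm : assumed distance between the two balls along the mouse's x-axis
        """
        tx = (d1[0] + d2[0]) / 2.0                   # translation = average of the two motions
        ty = (d1[1] + d2[1]) / 2.0
        # Differential y-motion of the two balls implies rotation about the z-axis.
        dtheta = math.atan2(d2[1] - d1[1], separation_mm)
        return tx, ty, dtheta

    # Example: both balls move up, the right ball slightly more -> small CCW rotation.
    print(mouse_motion((1.0, 4.0), (1.0, 6.0)))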


International Conference on Computer Vision | 2005

Efficiently registering video into panoramic mosaics

Drew Steedly; Chris Pal; Richard Szeliski

We present an automatic and efficient method to register and stitch thousands of video frames into a large panoramic mosaic. Our method preserves the robustness and accuracy of image stitchers that match all pairs of images while utilizing the ordering information provided by video. We reduce the cost of searching for matches between video frames by adaptively identifying key frames based on the amount of image-to-image overlap. Key frames are matched to all other key frames, but intermediate video frames are only matched to temporally neighboring key frames and intermediate frames. Image orientations can be estimated from this sparse set of matches in time quadratic to cubic in the number of key frames but only linear in the number of intermediate frames. Additionally, the matches between pairs of images are compressed by replacing measurements within small windows in the image with a single representative measurement. We show that this approach substantially reduces the time required to estimate the image orientations with minimal loss of accuracy. Finally, we demonstrate both the efficiency and quality of our results by registering several long video sequences.
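
A simplified sketch of overlap-driven key-frame selection (the overlap estimate and threshold here are assumptions; the paper matches features and estimates orientations, which this stub does not do):

    def select_key_frames(frames, estimate_overlap, min_overlap=0.5):
        """Greedily pick a new key frame when overlap with the last one drops too low.

        frames           : ordered list of video frames
        estimate_overlap : callable(frame_a, frame_b) -> fraction of shared field of view
        min_overlap      : below this overlap with the last key frame, start a new one
        """
        key_frames = [0]                             # always keep the first frame
        for i in range(1, len(frames)):
            if estimate_overlap(frames[key_frames[-1]], frames[i]) < min_overlap:
                key_frames.append(i)
        return key_frames

    # Toy usage: pretend each frame pans 5% of the field of view to the right.
    frames = list(range(100))
    overlap = lambda a, b: max(0.0, 1.0 - 0.05 * abs(b - a))
    print(select_key_frames(frames, overlap))        # a new key frame about every 11 frames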


International Conference on Multimodal Interfaces | 2015

Recurrent Neural Networks for Emotion Recognition in Video

Samira Ebrahimi Kahou; Vincent Michalski; Kishore Reddy Konda; Roland Memisevic; Chris Pal

Deep learning based approaches to facial analysis and video analysis have recently demonstrated high performance on a variety of key tasks such as face recognition, emotion recognition and activity recognition. In the case of video, information often must be aggregated across a variable length sequence of frames to produce a classification result. Prior work using convolutional neural networks (CNNs) for emotion recognition in video has relied on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial aggregation of information. Recurrent neural networks (RNNs) have seen an explosion of recent interest as they yield state-of-the-art performance on a variety of sequence analysis tasks. RNNs provide an attractive framework for propagating information over a sequence using a continuous valued hidden layer representation. In this work we present a complete system for the 2015 Emotion Recognition in the Wild (EmotiW) Challenge. We focus our presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.
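
A minimal hybrid CNN-RNN sketch in PyTorch (the layer sizes and the choice of a GRU are assumptions for illustration, not the system's exact configuration): per-frame CNN features are fed to a recurrent layer whose final state classifies the clip.

    import torch
    import torch.nn as nn

    class CnnRnnEmotion(nn.Module):
        """Per-frame CNN features aggregated over time by a GRU, then classified."""
        def __init__(self, feat_dim=128, hidden=64, n_classes=7):
            super().__init__()
            self.cnn = nn.Sequential(                # tiny stand-in for a face CNN
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
            )
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, clips):                    # clips: (batch, time, 3, H, W)
            b, t = clips.shape[:2]
            feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)   # per-frame features
            _, h_last = self.rnn(feats)              # final hidden state summarizes the clip
            return self.head(h_last.squeeze(0))      # (batch, n_classes) emotion logits

    logits = CnnRnnEmotion()(torch.randn(2, 16, 3, 48, 48))   # 2 clips of 16 frames each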


Knowledge Discovery and Data Mining | 2007

Semi-supervised classification with hybrid generative/discriminative methods

Gregory Druck; Chris Pal; Andrew McCallum; Xiaojin Zhu

We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the tradeoff between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low density regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest overall mean accuracy across all tasks.
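
Schematically (the weighting and notation are ours, not the paper's exact objectives), both hybrid frameworks trade off a discriminative term on the labeled data against a generative term on all data, along the lines of

    \max_{\theta} \; \sum_{(x_i, y_i) \in \mathcal{L}} \log p_\theta(y_i \mid x_i) + \alpha \sum_{x_j \in \mathcal{L} \cup \mathcal{U}} \log p_\theta(x_j)

where \mathcal{L} and \mathcal{U} are the labeled and unlabeled sets and \alpha controls how strongly the unlabeled data shapes the shared parameters \theta; multi-conditional learning instead optimizes a weighted product of conditional likelihoods (for example p(y|x) and p(x|y)) over the same model.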

Collaboration


Dive into Chris Pal's collaborations.

Top Co-Authors

Yoshua Bengio
Université de Montréal

Andrew McCallum
University of Massachusetts Amherst

Hugo Larochelle
Université de Sherbrooke

Samira Ebrahimi Kahou
École Polytechnique de Montréal

Samuel Kadoury
École Polytechnique de Montréal

Christopher Beckham
École Polytechnique de Montréal

Eugene Vorontsov
École Polytechnique de Montréal