Publication


Featured research published by Louis-Philippe Morency.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

Hidden Conditional Random Fields

Ariadna Quattoni; Sy Bor Wang; Louis-Philippe Morency; Michael Collins; Trevor Darrell

We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.
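
As a rough sketch of this model family (standard HCRF notation, assumed rather than quoted from the paper), the label posterior marginalizes over hidden-variable assignments h:

P(y \mid \mathbf{x}; \theta) = \frac{\sum_{\mathbf{h}} \exp\big(\theta \cdot \Phi(y, \mathbf{h}, \mathbf{x})\big)}{\sum_{y'} \sum_{\mathbf{h}} \exp\big(\theta \cdot \Phi(y', \mathbf{h}, \mathbf{x})\big)}

Here \Phi(y, \mathbf{h}, \mathbf{x}) is a feature vector defined over the label, the hidden variables, and the graph of local observations, which is what allows overlapping, non-independent observations to enter the model.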


Computer Vision and Pattern Recognition | 2006

Hidden Conditional Random Fields for Gesture Recognition

Sy Bor Wang; Ariadna Quattoni; Louis-Philippe Morency; David Demirdjian; Trevor Darrell

We introduce a discriminative hidden-state approach for the recognition of human gestures. Gesture sequences often have a complex underlying structure, and models that can incorporate hidden structure have proven advantageous for recognition tasks. Most existing approaches to gesture recognition with hidden states employ a Hidden Markov Model or a suitable variant (e.g., a factored or coupled state model) to model gesture streams; a significant limitation of these models is the requirement of conditional independence of observations. In addition, hidden states in a generative model are selected to maximize the likelihood of generating all the examples of a given gesture class, which is not necessarily optimal for discriminating the gesture class against other gestures. Previous discriminative approaches to gesture sequence recognition have shown promising results, but have neither incorporated hidden states nor addressed the problem of predicting the label of an entire sequence. In this paper, we derive a discriminative sequence model with a hidden-state structure and demonstrate its utility in both detection and multi-way classification formulations. We evaluate our method on the task of recognizing human arm and head gestures, and compare its performance to both generative hidden-state and discriminative fully observable models.
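
To make the generative-versus-discriminative contrast concrete (a textbook formulation, not an equation from the paper): a class-conditional HMM is trained to maximize the likelihood of generating the observations, whereas the hidden-state discriminative model directly maximizes the conditional likelihood of the labels:

\theta^{*}_{\mathrm{HMM}} = \arg\max_{\theta} \sum_{i} \log p(\mathbf{x}_i \mid y_i; \theta)
\qquad
\theta^{*}_{\mathrm{disc}} = \arg\max_{\theta} \sum_{i} \log P(y_i \mid \mathbf{x}_i; \theta)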


Computer Vision and Pattern Recognition | 2007

Latent-Dynamic Discriminative Models for Continuous Gesture Recognition

Louis-Philippe Morency; Ariadna Quattoni; Trevor Darrell

Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence. In this paper, we develop a discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics. Our approach incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels. Each class label has a disjoint set of associated hidden states, which enables efficient training and inference in our model. We evaluated our method on the task of recognizing human gestures from unsegmented video streams and performed experiments on three different datasets of head and eye gestures. Our results demonstrate that our model compares favorably to Support Vector Machines, Hidden Markov Models, and Conditional Random Fields on visual gesture recognition tasks.
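
The disjoint-hidden-state idea can be sketched in the standard LDCRF notation (assumed from the published model, not reproduced from this summary): each label y is given its own hidden-state set \mathcal{H}_y, and the sequence posterior sums a hidden-state CRF over the compatible assignments:

P(\mathbf{y} \mid \mathbf{x}; \theta) = \sum_{\mathbf{h}\,:\,\forall j,\; h_j \in \mathcal{H}_{y_j}} P(\mathbf{h} \mid \mathbf{x}; \theta)

Because the sets \mathcal{H}_y are disjoint, each hidden-state sequence maps to exactly one label sequence, so training and inference reduce to standard CRF computations over the hidden layer.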


Workshop on Applications of Computer Vision | 2016

OpenFace: An open source facial behavior analysis toolkit

Tadas Baltrusaitis; Peter Robinson; Louis-Philippe Morency

Over the past few years, there has been increased interest in automatic facial behavior analysis and understanding. We present OpenFace, an open source tool intended for computer vision and machine learning researchers, the affective computing community, and people interested in building interactive applications based on facial behavior analysis. OpenFace is the first open source tool capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. The computer vision algorithms at the core of OpenFace demonstrate state-of-the-art results on all of the above tasks. Furthermore, our tool is capable of real-time performance and can run from a simple webcam without any specialist hardware. Finally, OpenFace allows for easy integration with other applications and devices through a lightweight messaging system.
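
A hypothetical usage sketch in Python: the binary name, the flags (-f, -out_dir), and the CSV column names below are assumptions based on OpenFace's public documentation and may differ across versions; consult the repository for the exact interface.

import csv
import subprocess
from pathlib import Path

video = "interview.avi"      # hypothetical input file
out_dir = Path("processed")

# Run OpenFace's command-line feature extractor over the whole video.
subprocess.run(["FeatureExtraction", "-f", video, "-out_dir", str(out_dir)],
               check=True)

# OpenFace writes one CSV per input with per-frame landmarks, head pose,
# action-unit intensities, and gaze estimates.
with open(out_dir / (Path(video).stem + ".csv"), newline="") as f:
    reader = csv.DictReader(f)
    # Headers may carry leading spaces in some versions; normalize them.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    for row in reader:
        pitch = float(row["pose_Rx"])  # head rotation about x, in radians
        smile = float(row["AU12_r"])   # AU12 (lip corner puller) intensity
        print(pitch, smile)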


Computer Vision and Pattern Recognition | 2012

3D Constrained Local Model for rigid and non-rigid facial tracking

Tadas Baltrusaitis; Peter Robinson; Louis-Philippe Morency

We present the 3D Constrained Local Model (CLM-Z) for robust facial feature tracking under varying pose. Our approach integrates both depth and intensity information in a common framework. Through experiments on publicly available datasets, we show the benefit of CLM-Z in both accuracy and convergence rate over the regular CLM formulation. Additionally, we demonstrate a way to combine a rigid head pose tracker with CLM-Z that improves rigid head tracking, and we show better performance than current state-of-the-art approaches in head pose tracking with our extension of the generalised adaptive view-based appearance model (GAVAM).
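
For context, CLM-style fitting can be sketched (generic CLM notation with a depth term added per this abstract's description; not the paper's exact equations) as maximizing per-landmark patch responses under a shape regulariser:

\mathbf{p}^{*} = \arg\max_{\mathbf{p}} \Big[ \sum_{i=1}^{n} \mathcal{D}_i\big(\mathbf{x}_i(\mathbf{p}); \mathcal{I}, \mathcal{Z}\big) - \mathcal{R}(\mathbf{p}) \Big]

where \mathbf{p} are the shape parameters, \mathbf{x}_i(\mathbf{p}) the resulting landmark positions, \mathcal{D}_i a patch-expert response computed from both intensity \mathcal{I} and depth \mathcal{Z}, and \mathcal{R} a prior penalising implausible shapes.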


Intelligent Virtual Agents | 2006

Virtual rapport

Jonathan Gratch; Anna Okhmatovskaia; Francois Lamothe; Stacy Marsella; Mathieu Morales; R. J. van der Werf; Louis-Philippe Morency

Effective face-to-face conversations are highly interactive. Participants respond to each other, engaging in nonconscious behavioral mimicry and backchanneling feedback. Such behaviors produce a subjective sense of rapport and are correlated with effective communication, greater liking and trust, and greater influence between participants. Creating rapport requires a tight sense-act loop that has been traditionally lacking in embodied conversational agents. Here we describe a system, based on psycholinguistic theory, designed to create a sense of rapport between a human speaker and virtual human listener. We provide empirical evidence that it increases speaker fluency and engagement.


International Conference on Multimodal Interfaces | 2005

Contextual recognition of head gestures

Louis-Philippe Morency; Candace L. Sidner; Christopher Lee; Trevor Darrell

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. We investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of user gestures. We present a recognition framework which (1) extracts contextual features from an ECA's dialog manager, (2) computes a prediction of head nods and head shakes, and (3) integrates the contextual predictions with the visual observations of a vision-based head gesture recognizer. We found a subset of lexical, punctuation, and timing features that are easily available in most ECA architectures and can be used to learn how to predict user feedback. Using a discriminative approach to contextual prediction and multi-modal integration, we were able to improve the performance of head gesture detection even when the topic of the test set was significantly different from that of the training set.
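
As an illustration of step (3) above (a minimal sketch with invented names and weights; the paper uses a learned discriminative integrator rather than fixed weights), contextual and visual evidence can be fused as a weighted score:

def fuse_head_nod_scores(visual_score: float,
                         context_score: float,
                         w_visual: float = 0.7,
                         w_context: float = 0.3) -> bool:
    """Combine the vision-based recognizer's margin with the contextual
    prediction from the dialog manager; weights are illustrative only."""
    combined = w_visual * visual_score + w_context * context_score
    return combined > 0.0  # positive combined margin => report a head nod

# Example: weak visual evidence is pushed over threshold by dialog context,
# e.g. the agent has just asked a yes/no question.
print(fuse_head_nod_scores(visual_score=-0.1, context_score=0.8))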


Computer Vision and Pattern Recognition | 2003

Adaptive view-based appearance models

Louis-Philippe Morency; Ali Rahimi; Trevor Darrell

We present a method for online rigid object tracking using an adaptive view-based appearance model. When the object's pose trajectory crosses itself, our tracker has bounded drift and can track objects undergoing large motion for long periods of time. Our tracker registers each incoming frame against the views of the appearance model using a two-frame registration algorithm. Using a linear Gaussian filter, we simultaneously estimate the pose of the object and adjust the view-based model as pose changes are recovered by the registration algorithm. The adaptive view-based model is populated online with views of the object as it undergoes different orientations in pose space, allowing us to capture non-Lambertian effects. We tested our approach on a real-time rigid object tracking task using stereo cameras and observed an RMS error within the accuracy limit of an attached inertial sensor.
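
The linear Gaussian filter can be pictured as a standard linear-Gaussian state-space model over a joint pose state (a generic sketch; the paper's exact state vector and noise models may differ):

\mathbf{x}_t = A\,\mathbf{x}_{t-1} + \mathbf{w}_t, \quad \mathbf{w}_t \sim \mathcal{N}(0, Q)
\mathbf{z}_t = C\,\mathbf{x}_t + \mathbf{v}_t, \quad \mathbf{v}_t \sim \mathcal{N}(0, R)

where the state \mathbf{x}_t stacks the current object pose with the poses of the stored views, and each two-frame registration supplies a relative-pose measurement \mathbf{z}_t linking the incoming frame to a view. The Kalman-style update then refines the current pose and the stored-view poses simultaneously, which is what bounds drift when the trajectory revisits earlier poses.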


International Conference on Computer Vision | 2013

Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild

Tadas Baltrusaitis; Peter Robinson; Louis-Philippe Morency

Facial feature detection algorithms have seen great progress in recent years. However, they still struggle in poor lighting conditions and in the presence of extreme pose or occlusion. We present the Constrained Local Neural Field model for facial landmark detection. Our model includes two main novelties. First, we introduce a probabilistic patch expert (landmark detector) that can learn non-linear and spatial relationships between the input pixels and the probability of a landmark being aligned. Second, our model is optimised using a novel Non-uniform Regularised Landmark Mean-Shift technique, which takes into account the reliability of each patch expert. We demonstrate the benefit of our approach over other state-of-the-art approaches on a number of publicly available datasets when performing landmark detection in unseen lighting conditions and in the wild.
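
Sketching the non-uniform update in notation assumed from the CLNF literature (not reproduced from this page): the regularised mean-shift parameter update is re-weighted by a diagonal reliability matrix W,

\Delta \mathbf{p} = -\big(J^{\top} W J + r\,\bar{\Lambda}^{-1}\big)^{-1} \big(r\,\bar{\Lambda}^{-1}\,\mathbf{p} - J^{\top} W \mathbf{v}\big)

where \mathbf{v} stacks the per-landmark mean-shift vectors, J is the shape-model Jacobian, and r\,\bar{\Lambda}^{-1} regularises towards plausible shapes. Setting W = I recovers the uniform update; the non-uniform weights let reliable patch experts dominate the fit.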


International Conference on Multimodal Interfaces | 2011

Towards multimodal sentiment analysis: harvesting opinions from the web

Louis-Philippe Morency; Rada Mihalcea; Payal Doshi

With more than 10,000 new videos posted online every day on social websites such as YouTube and Facebook, the internet is becoming an almost infinite source of information. One crucial challenge for the coming decade is to harvest relevant information from this constant flow of multimodal data. This paper addresses the task of multimodal sentiment analysis and conducts proof-of-concept experiments demonstrating that a joint model integrating visual, audio, and textual features can be effectively used to identify sentiment in web videos. The paper makes three important contributions. First, it addresses for the first time the task of tri-modal sentiment analysis and shows that it is a feasible task that benefits from the joint exploitation of visual, audio, and textual modalities. Second, it identifies a subset of audio-visual features relevant to sentiment analysis and presents guidelines on how to integrate these features. Finally, it introduces a new dataset of real online data, which will be useful for future research in this area.
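
A minimal sketch of the joint-model idea (feature dimensions, names, and the SVM choice are illustrative stand-ins, not the paper's pipeline): concatenate per-clip visual, audio, and textual features and train one classifier on the fused vector.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_clips = 100

# Stand-ins for per-clip descriptors: e.g. smile/gaze statistics (visual),
# pitch and energy statistics (audio), polarity-word counts (text).
visual = rng.normal(size=(n_clips, 4))
audio = rng.normal(size=(n_clips, 6))
text = rng.normal(size=(n_clips, 10))
labels = rng.integers(0, 2, size=n_clips)  # 0 = negative, 1 = positive

# Early (feature-level) fusion: one joint vector per video clip.
fused = np.hstack([visual, audio, text])

clf = SVC(kernel="rbf")
print("5-fold accuracy:", cross_val_score(clf, fused, labels, cv=5).mean())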

Collaboration


Dive into Louis-Philippe Morency's collaborations.

Top Co-Authors

Stefan Scherer, University of Southern California
Jonathan Gratch, University of Southern California
Trevor Darrell, University of California
Giota Stratou, University of Southern California
Amir Zadeh, Carnegie Mellon University
Albert A. Rizzo, University of Southern California
David R. Traum, University of Southern California
Sunghyun Park, University of Southern California