Emmanuel Dellandréa
École centrale de Lyon
Publications
Featured research published by Emmanuel Dellandréa.
Systems, Man and Cybernetics | 2011
Xi Zhao; Emmanuel Dellandréa; Liming Chen; Ioannis A. Kakadiaris
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called the 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
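For illustration, a minimal sketch of the "global configuration" half of such a statistical landmark model is given below: a PCA subspace is learned over flattened 3-D landmark configurations, and a candidate configuration is scored by its reconstruction error in that subspace. The synthetic data, the landmark count and the number of components are assumptions made for the example; the paper's 3-D statistical facial feature model additionally models local texture and geometry, which is omitted here.

# Sketch of the global-configuration part of a statistical landmark model.
# Synthetic placeholder data: 300 training faces, 15 landmarks x 3 coordinates.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_faces, n_landmarks = 300, 15
shapes = rng.normal(size=(n_faces, n_landmarks * 3))   # flattened (x, y, z) landmarks

pca = PCA(n_components=10).fit(shapes)                 # learn the configuration subspace

def configuration_error(candidate):
    """Distance between a candidate configuration and its PCA reconstruction."""
    coded = pca.transform(candidate.reshape(1, -1))
    rebuilt = pca.inverse_transform(coded)
    return float(np.linalg.norm(candidate - rebuilt.ravel()))

print(configuration_error(shapes[0]))                  # low error = plausible configuration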
International Conference on Pattern Recognition | 2010
Xi Zhao; Di Huang; Emmanuel Dellandréa; Liming Chen
Automatic facial expression recognition on 3D face data is still a challenging problem. In this paper, we propose a novel approach to perform expression recognition automatically and flexibly by combining a Bayesian Belief Net (BBN) and Statistical Facial Feature Models (SFAM). A novel BBN is designed for this specific problem, together with our proposed parameter computation method. By learning global variations in the face landmark configuration (morphology) and local variations in texture and shape around each landmark, the morphable Statistical Facial Feature Model (SFAM) not only performs automatic landmarking but also computes the beliefs that feed the BBN. Tested on the public 3D facial expression database BU-3DFE, our automatic approach recognizes expressions successfully, reaching an average recognition rate of over 82%.
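As a rough illustration of the belief-combination step, the sketch below fuses per-landmark feature likelihoods with a prior into a posterior over expression classes using Bayes' rule under a naive independence assumption. The class set and likelihood values are invented for the example and do not reflect the structure or parameters of the BBN proposed in the paper.

# Toy Bayes-rule fusion of local feature likelihoods into an expression posterior.
import numpy as np

expressions = ["happiness", "sadness", "surprise"]
prior = np.array([1/3, 1/3, 1/3])                    # uniform prior over classes

# Illustrative likelihoods of the local observations around two landmarks.
likelihood_mouth = np.array([0.8, 0.1, 0.4])
likelihood_brow = np.array([0.5, 0.3, 0.9])

posterior = prior * likelihood_mouth * likelihood_brow
posterior /= posterior.sum()                         # normalize to a probability vector

for expression, p in zip(expressions, posterior):
    print(f"{expression}: {p:.2f}")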
Advanced Video and Signal Based Surveillance | 2005
Zhongzhe Xiao; Emmanuel Dellandréa; Weibei Dou; Liming Chen
The classification of emotional speech is a topic of growing interest in speech recognition, with promising applications in a wide variety of fields. Selecting a proper feature set to describe emotional speech, and finding a proper definition of the emotions it conveys, are important preparations for automatic classification and recognition of emotions. The speech samples used in this paper come from the Berlin database, which contains 7 kinds of emotions, with 207 speech samples of male voice and 287 speech samples of female voice. A set of 50 potentially useful features is extracted and analyzed, and the best features are selected. A definition of emotions as three-state emotions is also proposed in this paper.
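For illustration, the sketch below computes two low-level descriptors of the kind typically found in such feature sets, frame energy and zero-crossing rate, over a synthetic one-second signal. The window settings and the signal itself are assumptions made for the example and are not taken from the paper's 50-feature set.

# Frame energy and zero-crossing rate over a synthetic one-second signal.
import numpy as np

sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) * np.hanning(sr)      # toy "speech" signal

frame_len, hop = 400, 160                                   # 25 ms windows, 10 ms hop
frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]

energy = [float(np.mean(f ** 2)) for f in frames]           # short-time energy
zcr = [float(np.mean(np.abs(np.diff(np.sign(f))) > 0)) for f in frames]  # zero-crossing rate

print(f"{len(frames)} frames, mean energy {np.mean(energy):.4f}, mean ZCR {np.mean(zcr):.4f}")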
Affective Computing and Intelligent Interaction | 2011
Ningning Liu; Emmanuel Dellandréa; Bruno Tellez; Liming Chen
Many images carry strong emotional semantics. In recent years, investigations have been conducted to automatically identify the induced emotions that may arise in viewers when looking at images, based on low-level image properties. Since these features can only capture the image atmosphere, they may fail when the emotional semantics are carried by objects. Additional information is therefore needed, and we propose in this paper to make use of textual information describing the image, such as tags. We have developed two textual features to capture the emotional meaning of the text: one is based on a semantic distance matrix between the text and an emotional dictionary, and the other carries the valence and arousal meanings of words. Experiments have been conducted on two datasets to evaluate visual and textual features and their fusion. The results show that our textual features can improve the classification accuracy of affective images.
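As a rough sketch of the second textual feature, the snippet below averages per-word valence and arousal scores over an image's tags. The lexicon, its 1-9 scale and the neutral fallback are hypothetical placeholders, not the emotional dictionary used in the paper.

# Toy valence/arousal text feature over image tags, using a hypothetical lexicon.
from statistics import mean

LEXICON = {                       # word -> (valence, arousal), illustrative 1-9 scores
    "storm": (3.0, 6.5),
    "sunset": (7.4, 4.2),
    "alone": (2.8, 4.8),
    "beach": (8.0, 5.5),
}

def valence_arousal_feature(tags):
    """Average the valence/arousal of the tags covered by the lexicon."""
    scored = [LEXICON[t.lower()] for t in tags if t.lower() in LEXICON]
    if not scored:                # no emotional word found: neutral midpoint
        return (5.0, 5.0)
    valences, arousals = zip(*scored)
    return (mean(valences), mean(arousals))

print(valence_arousal_feature(["Sunset", "beach", "holiday"]))   # -> (7.7, 4.85)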
Computer Vision and Image Understanding | 2014
Christian Wolf; Eric Lombardi; Julien Mille; Oya Celiktutan; Mingyuan Jiu; Emre Dogan; Gonen Eren; Moez Baccouche; Emmanuel Dellandréa; Charles-Edmond Bichot; Christophe Garcia; Bülent Sankur
Evaluating the performance of computer vision algorithms is classically done by reporting classification error or accuracy, if the problem at hand is the classification of an object in an image, the recognition of an activity in a video, or the categorization and labeling of the image or video. If, in addition, the detection of an item in an image or a video and/or its localization are required, frequently used metrics are Recall and Precision, as well as ROC curves. These metrics give quantitative performance values which are easy to understand and to interpret, even by non-experts. However, an inherent problem is the dependency of quantitative performance measures on the quality constraints that we need to impose on the detection algorithm. In particular, an important quality parameter of these measures is the spatial or spatio-temporal overlap between a ground-truth item and a detected item, and this needs to be taken into account when interpreting the results. We propose a new performance metric addressing and unifying the qualitative and quantitative aspects of the performance measures. The performance of a detection and recognition algorithm is illustrated intuitively by performance graphs which present quantitative performance values, such as Recall, Precision and F-Score, as a function of the quality constraints of the detection. In order to compare the performance of different computer vision algorithms, a representative single performance measure is computed from the graphs by integrating out all quality parameters. The evaluation method can be applied to different types of activity detection and recognition algorithms. The performance metric has been tested on several activity recognition algorithms participating in the ICPR 2012 HARL competition.
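The underlying idea can be sketched as follows: recompute Recall, Precision and F-Score while sweeping the overlap constraint imposed on detections, then integrate the resulting curve to obtain a single summary value. The IoU values and item counts below are invented for illustration, and the computation is a simplification of the metric actually proposed in the paper.

# Sketch: F-score as a function of the required detection/ground-truth overlap,
# integrated over the overlap threshold to give one summary number.
import numpy as np

ious = np.array([0.92, 0.75, 0.60, 0.41, 0.15])   # overlaps of matched detections (made up)
n_ground_truth = 6                                # total ground-truth items
n_detections = 7                                  # total detections produced

def f_score_at(threshold):
    tp = np.sum(ious >= threshold)                # detections counted as correct
    recall = tp / n_ground_truth
    precision = tp / n_detections
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

thresholds = np.linspace(0.0, 1.0, 101)
f_curve = np.array([f_score_at(t) for t in thresholds])
print(f"integrated F-score: {np.trapz(f_curve, thresholds):.3f}")   # quality integrated out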
Image and Vision Computing | 2013
Xi Zhao; Emmanuel Dellandréa; Jianhua Zou; Liming Chen
Textured 3D face models capture precise facial surfaces along with the associated textures, making an accurate description of facial activities possible. In this paper, we present a unified probabilistic framework based on a novel Bayesian Belief Network (BBN) for 3D facial expression and Action Unit (AU) recognition. The proposed BBN performs Bayesian inference based on Statistical Feature Models (SFM) and the Gibbs-Boltzmann distribution, and features a hybrid approach that fuses geometric and appearance features along with morphological ones. When combined with our previously developed morphable partial face model (SFAM), the proposed BBN is capable of conducting fully automatic facial expression analysis. We conducted extensive experiments on two public databases, namely the BU-3DFE dataset and the Bosphorus dataset. When using manually labeled landmarks, the proposed framework achieved average recognition rates of 94.2% and 85.6% for the 7 and 16 AUs, respectively, on face data from the Bosphorus dataset, and 89.2% for the six universal expressions on the BU-3DFE dataset. Using the landmarks automatically located by SFAM, the proposed BBN still achieved an average recognition rate of 84.9% for the six prototypical facial expressions. These experimental results demonstrate the effectiveness of the proposed approach and its robustness to landmark localization errors.
Affective Computing and Intelligent Interaction | 2013
Yoann Baveye; Jean-Noel Bettinelli; Emmanuel Dellandréa; Liming Chen; Christel Chamaret
To contribute to the need for emotional databases and affective tagging, LIRIS-ACCEDE is proposed in this paper. LIRIS-ACCEDE is an Annotated Creative Commons Emotional DatabasE composed of 9800 video clips extracted from 160 movies shared under Creative Commons licenses, which allows the database to be made publicly available without copyright issues. The 9800 video clips (each 8-12 seconds long) are sorted along the induced valence axis, from the video perceived the most negatively to the video perceived the most positively. The annotation was carried out by 1518 annotators from 89 different countries using crowdsourcing. A baseline late fusion scheme using the ground truth from the annotations is computed to predict emotion categories in video clips.
Computer Vision and Image Understanding | 2013
Ningning Liu; Emmanuel Dellandréa; Liming Chen; Chao Zhu; Yu Zhang; Charles-Edmond Bichot; Stéphane Bres; Bruno Tellez
The text associated with images provides valuable semantic meaning about image content that can hardly be described by low-level visual features. In this paper, we propose a novel multimodal approach to automatically predict the visual concepts of images through an effective fusion of textual features along with visual ones. In contrast to the classical Bag-of-Words approach, which simply relies on term frequencies, we propose a novel textual descriptor, namely the Histogram of Textual Concepts (HTC), which accounts for the relatedness of semantic concepts when accumulating the contributions of words from the image caption toward a dictionary. In addition to the popular SIFT-like features, we also evaluate a set of mid-level visual features, aiming at characterizing the harmony, dynamism and aesthetic quality of visual content, in relation to affective concepts. Finally, a novel selective weighted late fusion (SWLF) scheme is proposed to automatically select and weight the scores from the best features according to the concept to be classified. This scheme proves particularly useful for the image annotation task in a multi-label scenario. Extensive experiments were carried out on the MIR FLICKR image collection within the ImageCLEF 2011 photo annotation challenge. Our best model, a late fusion of textual and visual features, achieved a MiAP (Mean interpolated Average Precision) of 43.69% and ranked 2nd out of 79 runs. We also provide a comprehensive analysis of the experimental results and give some insights for future improvements.
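A minimal sketch of the HTC idea is given below, assuming a toy concept dictionary and a hand-made word-concept relatedness table (a real system would use a corpus- or lexicon-based similarity measure): each descriptor bin accumulates the relatedness of the caption's words to one dictionary concept.

# Toy Histogram of Textual Concepts: one bin per dictionary concept,
# accumulating the semantic relatedness of the caption words to that concept.
import numpy as np

DICTIONARY = ["animal", "nature", "people", "urban"]

RELATEDNESS = {                   # illustrative word-to-concept relatedness in [0, 1]
    ("dog", "animal"): 0.9, ("dog", "nature"): 0.3,
    ("park", "nature"): 0.8, ("park", "urban"): 0.5,
    ("friends", "people"): 0.9,
}

def relatedness(word, concept):
    return RELATEDNESS.get((word, concept), 0.0)

def htc(caption_words):
    hist = np.zeros(len(DICTIONARY))
    for word in caption_words:
        for i, concept in enumerate(DICTIONARY):
            hist[i] += relatedness(word, concept)
    return hist / max(len(caption_words), 1)   # normalize by caption length

print(htc(["dog", "park", "friends"]))         # -> approx. [0.3, 0.37, 0.3, 0.17]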
Affective Computing and Intelligent Interaction | 2009
Zhongzhe Xiao; Emmanuel Dellandréa; Liming Chen; Weibei Dou
This paper deals with speech emotion analysis within the context of increasing awareness of the wide application potential of affective computing. Unlike most works in the literature, which mainly rely on classical frequency- and energy-based features along with a single global classifier for emotion recognition, we propose in this paper new harmonic and Zipf-based features for better speech emotion characterization in the valence dimension, and a multistage classification scheme driven by a dimensional emotion model for better emotional class discrimination. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach shows its effectiveness, achieving a 68.60% classification rate and reaching 71.52% when a gender classification is first applied. Using the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance in the literature, to our knowledge, is 76.15% on the same dataset.
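The sketch below illustrates the multistage idea in a much simplified form: rather than a single global multi-class classifier, the emotion label is reached through successive decisions along dimensional axes (here arousal, then valence). The toy features, the SVM stages and the mapping from quadrants to emotion names are assumptions made for the example, not the scheme or features of the paper.

# Toy two-stage classification driven by a dimensional (arousal/valence) model.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # column 0 ~ arousal, column 1 ~ valence
arousal = (X[:, 0] > 0).astype(int)              # 1 = active, 0 = passive
valence = (X[:, 1] > 0).astype(int)              # 1 = positive, 0 = negative

stage_arousal = SVC().fit(X, arousal)            # first stage: arousal
stage_valence = {a: SVC().fit(X[arousal == a], valence[arousal == a])
                 for a in (0, 1)}                # second stage: one classifier per branch

def classify(x):
    a = stage_arousal.predict([x])[0]
    v = stage_valence[a].predict([x])[0]
    return {(1, 1): "joy", (1, 0): "anger", (0, 1): "calm", (0, 0): "sadness"}[(a, v)]

print(classify([1.2, -0.8]))                     # active + negative quadrant, expected "anger"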
Affective Computing and Intelligent Interaction | 2015
Yoann Baveye; Emmanuel Dellandréa; Christel Chamaret; Liming Chen
Recently, mainly due to advances in deep learning, performance in scene and object recognition has progressed rapidly. On the other hand, more subjective recognition tasks, such as emotion prediction, stagnate at moderate levels. In such a context, is it possible to make affective computational models benefit from the breakthroughs in deep learning? This paper proposes to introduce the strength of deep learning in the context of emotion prediction in videos. The two main contributions are as follows: (i) a new dataset, composed of 30 movies under Creative Commons licenses, continuously annotated along the induced valence and arousal axes (publicly available), is introduced, for which (ii) the performance of Convolutional Neural Networks (CNN) through supervised fine-tuning, of Support Vector Machines for Regression (SVR), and of the combination of both (transfer learning) is computed and discussed. To the best of our knowledge, this is the first approach in the literature using CNNs to predict dimensional affective scores from videos. The experimental results show that the limited size of the dataset prevents the learning or fine-tuning of CNN-based frameworks, but that transfer learning is a promising solution to improve the performance of affective movie content analysis frameworks as long as very large datasets annotated along affective dimensions are not available.
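As a sketch of the transfer-learning setup discussed here, the snippet below trains a Support Vector Regressor on fixed feature vectors standing in for activations of a pre-trained CNN. The random features, the toy valence target and the hyper-parameters are placeholders; only the overall pipeline (a frozen representation followed by an SVR) mirrors the idea.

# Sketch of transfer learning for continuous affect prediction:
# frozen "CNN" features (random placeholders here) + SVR on top.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
cnn_features = rng.normal(size=(500, 128))                 # one feature row per video clip
valence = np.tanh(cnn_features[:, 0] + 0.1 * rng.normal(size=500))   # toy target in [-1, 1]

X_tr, X_te, y_tr, y_te = train_test_split(cnn_features, valence, random_state=0)
svr = SVR(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("test MSE:", mean_squared_error(y_te, svr.predict(X_te)))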