Marie Tahon
Centre national de la recherche scientifique
Publications
Featured research published by Marie Tahon.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Marie Tahon; Laurence Devillers
The search for a small acoustic feature set for emotion recognition faces three main challenges: the feature set must be robust to the large diversity of contexts in real-life applications; model parameters must be optimized for reduced subsets; and the result of feature selection must be evaluated under cross-corpus conditions. The goal of the present study is to select a consensual set of acoustic features for valence recognition using classification-based and non-classification-based feature ranking together with cross-corpus experiments, and to optimize the emotional models simultaneously. Five realistic corpora are used in this study: three were collected in the framework of the French robotics project ROMEO, one is a game corpus (JEMO) and one is the well-known AIBO corpus. Combinations of features found with non-classification-based methods (information gain and Gaussian mixture models with the Bhattacharyya distance) through multi-corpus experiments are tested under cross-corpus conditions, simultaneously with SVM parameter optimization. Reducing the number of features goes hand in hand with optimizing model parameters. Experiments carried out on randomly selected features from two acoustic feature sets show that a feature-space reduction is needed to avoid over-fitting. Since a grid search tends to find non-standard values with small feature sets, the authors propose a multi-corpus optimization method based on different corpora and acoustic feature subsets which ensures more stability. The results show that the acoustic families selected with both feature-ranking methods are not relevant in cross-corpus experiments. Promising results have been obtained with a small set of 24 voiced cepstral coefficients, even though this family was only ranked in the 2nd and 5th positions by the two ranking methods. The proposed optimization method is more robust than the usual grid search for cross-corpus experiments with small feature sets.
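The non-classification-based ranking described above can be sketched as follows. This is an illustrative toy example, not the authors' code: synthetic data stands in for the real emotion corpora, information gain is approximated by scikit-learn's mutual information estimator, and the feature subset size is arbitrary.

```python
# Sketch: rank features by information gain (mutual information), then
# evaluate a reduced subset with an SVM, as in the feature-selection study.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features = 200, 40          # stand-in for a large acoustic feature set
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # valence label depends on 2 features only

# Non-classification-based ranking: information gain (mutual information).
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]

# Keep a small subset, as the paper does with 24 voiced cepstral coefficients.
subset = ranking[:5]
acc_small = cross_val_score(SVC(C=1.0, gamma="scale"), X[:, subset], y, cv=5).mean()
acc_full = cross_val_score(SVC(C=1.0, gamma="scale"), X, y, cv=5).mean()
```

On such synthetic data, the two truly informative features rank at the top, and the reduced subset avoids the over-fitting that the full noisy set invites.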
Proceedings of the International Workshop on Affective-Aware Virtual Agents and Social Robots | 2009
Agnes Delaborde; Marie Tahon; Claude Barras; Laurence Devillers
In this paper, we focus on the recording protocol for gathering emotional audio data during interactions between the Nao robot and children. The robot is operated by a Wizard-of-Oz, according to strategies meant to elicit vocal expressions of emotion in children. These recordings will provide data to develop a real-time emotion detection module for the robot, and will be a starting point for a study on emotional models of exchanges in human-robot interaction. This work is carried out in the context of the French ROMEO project, which aims to design a robot able to interact with people (children, elderly people) while taking the person's behavior into account. Two kinds of application are studied in this project: the robot acts as a game companion or as an assistant to disabled persons.
Conference of the International Speech Communication Association | 2016
Marie Tahon; Raheel Qader; Gwénolé Lecorvé; Damien Lolive
Text-to-speech (TTS) systems are built on speech corpora which are labeled with carefully checked and segmented phonemes. However, phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, leading to poor-quality synthetic speech signals. To solve this problem, the present work aims at adapting automatically generated pronunciations to the corpus. The main idea is to train corpus-specific phoneme-to-phoneme conditional random fields with a large set of linguistic, phonological, articulatory and acoustic-prosodic features. Features are first selected under cross-validation conditions, then combined to produce the final best feature set. Pronunciation models are evaluated in terms of phoneme error rate and through perceptual tests. Experiments carried out on a French speech corpus show an improvement in the quality of speech synthesis when pronunciation models are included in the phonetization process. Apart from improving TTS quality, the presented pronunciation adaptation method also opens interesting perspectives for expressive speech synthesis.
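The phoneme error rate used to evaluate the pronunciation models is a standard edit-distance metric; a minimal sketch is below. The example phoneme sequences and the simulated grapheme-to-phoneme error are hypothetical illustrations, not data from the paper.

```python
# Sketch of phoneme error rate (PER): Levenshtein distance between the
# predicted and reference phoneme sequences, normalized by reference length.
def phoneme_error_rate(reference, predicted):
    """(Substitutions + insertions + deletions) / len(reference)."""
    n, m = len(reference), len(predicted)
    # d[i][j] = edit distance between reference[:i] and predicted[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == predicted[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / n

# Hypothetical example: reference phonemes for French "bonjour" vs. a
# grapheme-to-phoneme output with one substituted phoneme (PER = 1/5).
per = phoneme_error_rate(["b", "ɔ̃", "ʒ", "u", "ʁ"], ["b", "o", "ʒ", "u", "ʁ"])
```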
International Conference on Social Robotics | 2015
Marie Tahon; Mohamed A. Sehili; Laurence Devillers
Social signal processing, such as laughter or emotion detection, is an important issue, particularly in the field of human-robot interaction (HRI). At the moment, very few studies exist on elderly people's voices and social markers in real-life HRI situations. This paper presents a cross-corpus study with two realistic corpora featuring elderly people (ROMEO2 and ARMEN) and two corpora collected in laboratory conditions with young adults (JEMO and OFFICE). The goal of this experiment is to assess how well data from one corpus can be used as a training set for another, with a specific focus on elderly people's voices. First, clear differences between elderly people's real-life data and young adults' laboratory data are shown in the acoustic feature distributions (such as \(F_0\) standard deviation or local jitter). Second, cross-corpus emotion recognition experiments show that elderly people's real-life corpora are much more complex than laboratory corpora. Surprisingly, modeling emotions with one elderly people corpus does not generalize to another elderly people corpus collected in the same acoustic conditions but with different speakers. Our last result is that laboratory laughter is quite homogeneous across corpora, but this is not the case for elderly people's real-life laughter.
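The cross-corpus protocol, training on one corpus and testing on another, can be sketched as follows. This is a hedged toy illustration under stated assumptions: synthetic Gaussian data with a distribution shift stands in for the laboratory and real-life corpora, and a generic SVM stands in for the emotion models of the study.

```python
# Sketch of a cross-corpus experiment: train an emotion classifier on one
# (synthetic) corpus and test it on another whose feature distribution is
# shifted, mimicking the laboratory vs. real-life mismatch.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def make_corpus(n, shift):
    # Same underlying labeling rule in both corpora, shifted feature values.
    X = rng.normal(loc=shift, size=(n, 10))
    y = (X[:, 0] > shift).astype(int)
    return X, y

X_lab, y_lab = make_corpus(300, shift=0.0)    # "laboratory" corpus
X_real, y_real = make_corpus(300, shift=1.5)  # shifted "real-life" corpus

clf = SVC().fit(X_lab, y_lab)
within = clf.score(X_lab, y_lab)    # within-corpus accuracy
cross = clf.score(X_real, y_real)   # cross-corpus accuracy, typically lower
```

The drop from `within` to `cross` is the phenomenon the paper measures on real corpora: models fitted to one recording condition transfer poorly to another.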
Speech Prosody | 2018
Marie Tahon; Damien Lolive
In the field of storytelling, speech synthesis is trying to move from a neutral, machine-like voice to an expressive one. For parametric and unit-selection systems, building new features or cost functions is necessary to allow better expressivity control. The present article investigates the classification task between direct and narrative discourse phrases in order to build a new expressivity score. Different models are trained on different speech units (syllables, words and discourse phrases) from an audiobook with 3 sets of features. Classification experiments are conducted on the Blizzard corpus, which features English children's audiobooks and contains various characters and emotional states. The experiments show that the fusion of SVM classifiers trained with different prosodic and phonological feature sets increases the classification rate from 67.4% with 14 prosodic features to 71.8% with the 3 merged sets. Furthermore, the addition of a decision threshold achieves promising results for expressive speech synthesis depending on the strength of the constraint required on expressivity: 71.8% with 100% of the words, 79.9% with 50% and 82.6% with 25%.
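The fusion-plus-threshold idea above can be sketched as follows. This is an illustrative assumption-laden toy, not the paper's code: synthetic features stand in for the prosodic and phonological sets, fusion is done by averaging SVM decision scores, and the threshold value is arbitrary.

```python
# Sketch: fuse the signed scores of SVM classifiers trained on different
# feature sets, then keep only predictions whose fused score exceeds a
# confidence threshold, trading coverage for accuracy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # direct vs. narrative, synthetic
feature_sets = [slice(0, 4), slice(4, 8), slice(8, 12)]  # stand-in feature sets

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
clfs = [SVC().fit(X_train[:, s], y_train) for s in feature_sets]

# Fusion: average the signed decision scores of the classifiers.
fused = np.mean([c.decision_function(X_test[:, s])
                 for c, s in zip(clfs, feature_sets)], axis=0)
pred = (fused > 0).astype(int)

# Decision threshold: classify only the most confident samples.
threshold = 0.2
kept = np.abs(fused) > threshold
coverage = kept.mean()                               # fraction of words kept
accuracy_kept = (pred[kept] == y_test[kept]).mean()  # accuracy on kept words
```

Raising `threshold` lowers `coverage` but tends to raise `accuracy_kept`, which mirrors the 100%/50%/25% trade-off reported in the experiments.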
Conference of the International Speech Communication Association | 2011
Marie Tahon; Agnes Delaborde; Laurence Devillers
International Journal of Social Robotics | 2015
Laurence Devillers; Marie Tahon; Mohamed A. Sehili; Agnes Delaborde
Affective Computing and Intelligent Interaction | 2013
Tom Giraud; Mariette Soury; Jiewen Hua; Agnes Delaborde; Marie Tahon; David Antonio Gómez Jáuregui; Victoria Eyharabide; Edith Filaire; Christine Le Scanff; Laurence Devillers; Brice Isableu; Jean-Claude Martin
Language Resources and Evaluation | 2012
Marie Tahon; Agnes Delaborde; Laurence Devillers
Language Resources and Evaluation | 2018
Aghilas Sini; Damien Lolive; Gaëlle Vidal; Marie Tahon; Elisabeth Delais-Roussarie