Marie Tahon
Centre national de la recherche scientifique
Publications
Featured research published by Marie Tahon.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Marie Tahon; Laurence Devillers
The search for a small acoustic feature set for emotion recognition faces three main challenges: the feature set must be robust to the large diversity of contexts in real-life applications; model parameters must be optimized for reduced subsets; and the result of feature selection must be evaluated under cross-corpus conditions. The goal of the present study is to select a consensual set of acoustic features for valence recognition using classification-based and non-classification-based feature ranking together with cross-corpus experiments, and to optimize the emotional models simultaneously. Five realistic corpora are used in this study: three were collected in the framework of the French robotics project ROMEO, one is a game corpus (JEMO) and one is the well-known AIBO corpus. Combinations of features found with non-classification-based methods (information gain and Gaussian mixture models with the Bhattacharyya distance) through multi-corpus experiments are tested under cross-corpus conditions, simultaneously with SVM parameter optimization. Reducing the number of features goes hand in hand with optimizing model parameters. Experiments carried out on randomly selected features from two acoustic feature sets show that a feature-space reduction is needed to avoid over-fitting. Since a grid search tends to find non-standard values with small feature sets, the authors propose a multi-corpus optimization method based on different corpora and acoustic feature subsets which ensures more stability. The results show that the acoustic families selected with both feature-ranking methods are not relevant in cross-corpus experiments. Promising results have been obtained with a small set of 24 voiced cepstral coefficients, even though this family was only ranked in the 2nd and 5th positions by the two ranking methods. The proposed optimization method is more robust than the usual grid search for cross-corpus experiments with small feature sets.
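The non-classification-based ranking described above can be sketched as follows. This is an illustrative toy example, not the authors' code: synthetic data stands in for the real emotion corpora, information gain is approximated by scikit-learn's mutual information estimator, and the feature subset size is arbitrary.

```python
# Sketch: rank features by information gain (mutual information), then
# evaluate a reduced subset with an SVM, as in the feature-selection study.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features = 200, 40          # stand-in for a large acoustic feature set
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # valence label depends on 2 features only

# Non-classification-based ranking: information gain (mutual information).
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]

# Keep a small subset, as the paper does with 24 voiced cepstral coefficients.
subset = ranking[:5]
acc_small = cross_val_score(SVC(C=1.0, gamma="scale"), X[:, subset], y, cv=5).mean()
acc_full = cross_val_score(SVC(C=1.0, gamma="scale"), X, y, cv=5).mean()
```

On such synthetic data, the two truly informative features rank at the top, and the reduced subset avoids the over-fitting that the full noisy set invites.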
Proceedings of the International Workshop on Affective-Aware Virtual Agents and Social Robots | 2009
Agnes Delaborde; Marie Tahon; Claude Barras; Laurence Devillers
In this paper, we focus on the recording protocol for gathering emotional audio data during interactions between the Nao robot and children. The robot is operated by a Wizard-of-Oz, according to strategies meant to elicit vocal expressions of emotion in children. These recordings will provide data to develop a real-time emotion detection module for the robot, and will be a starting point for a study on emotional models of exchanges in human-robot interaction. This work is carried out in the context of the French ROMEO project, which aims to design a robot able to interact with people (children, elderly people) while taking the person's behavior into account. Two kinds of application are studied in this project: the robot acts as a game companion or as an assistant to disabled persons.
Conference of the International Speech Communication Association | 2016
Marie Tahon; Raheel Qader; Gwénolé Lecorvé; Damien Lolive
Text-to-speech (TTS) systems are built on speech corpora which are labeled with carefully checked and segmented phonemes. However, phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, leading to poor-quality synthetic speech signals. To solve this problem, the present work aims at adapting automatically generated pronunciations to the corpus. The main idea is to train corpus-specific phoneme-to-phoneme conditional random fields with a large set of linguistic, phonological, articulatory and acoustic-prosodic features. Features are first selected under cross-validation conditions, then combined to produce the final best feature set. Pronunciation models are evaluated in terms of phoneme error rate and through perceptual tests. Experiments carried out on a French speech corpus show an improvement in the quality of speech synthesis when pronunciation models are included in the phonetization process. Apart from improving TTS quality, the presented pronunciation adaptation method also opens interesting perspectives for expressive speech synthesis.
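The phoneme error rate used to evaluate the pronunciation models is a standard edit-distance metric; a minimal sketch is below. The example phoneme sequences and the simulated grapheme-to-phoneme error are hypothetical illustrations, not data from the paper.

```python
# Sketch of phoneme error rate (PER): Levenshtein distance between the
# predicted and reference phoneme sequences, normalized by reference length.
def phoneme_error_rate(reference, predicted):
    """(Substitutions + insertions + deletions) / len(reference)."""
    n, m = len(reference), len(predicted)
    # d[i][j] = edit distance between reference[:i] and predicted[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == predicted[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / n

# Hypothetical example: reference phonemes for French "bonjour" vs. a
# grapheme-to-phoneme output with one substituted phoneme (PER = 1/5).
per = phoneme_error_rate(["b", "ɔ̃", "ʒ", "u", "ʁ"], ["b", "o", "ʒ", "u", "ʁ"])
```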
International Conference on Social Robotics | 2015
Marie Tahon; Mohamed A. Sehili; Laurence Devillers
Social signal processing, such as laughter or emotion detection, is an important issue, particularly in the field of human-robot interaction (HRI). At the moment, very few studies exist on elderly people's voices and social markers in real-life HRI situations. This paper presents a cross-corpus study with two realistic corpora featuring elderly people (ROMEO2 and ARMEN) and two corpora collected in laboratory conditions with young adults (JEMO and OFFICE). The goal of this experiment is to assess how well data from one corpus can be used as a training set for another, with a specific focus on elderly people's voices. First, clear differences between elderly people's real-life data and young adults' laboratory data are shown in the acoustic feature distributions (such as \(F_0\) standard deviation or local jitter). Second, cross-corpus emotion recognition experiments show that elderly people's real-life corpora are much more complex than laboratory corpora. Surprisingly, modeling emotions with one elderly people corpus does not generalize to another elderly people corpus collected in the same acoustic conditions but with different speakers. Our last result is that laboratory laughter is quite homogeneous across corpora, but this is not the case for elderly people's real-life laughter.
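The cross-corpus protocol, training on one corpus and testing on another, can be sketched as follows. This is a hedged toy illustration under stated assumptions: synthetic Gaussian data with a distribution shift stands in for the laboratory and real-life corpora, and a generic SVM stands in for the emotion models of the study.

```python
# Sketch of a cross-corpus experiment: train an emotion classifier on one
# (synthetic) corpus and test it on another whose feature distribution is
# shifted, mimicking the laboratory vs. real-life mismatch.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def make_corpus(n, shift):
    # Same underlying labeling rule in both corpora, shifted feature values.
    X = rng.normal(loc=shift, size=(n, 10))
    y = (X[:, 0] > shift).astype(int)
    return X, y

X_lab, y_lab = make_corpus(300, shift=0.0)    # "laboratory" corpus
X_real, y_real = make_corpus(300, shift=1.5)  # shifted "real-life" corpus

clf = SVC().fit(X_lab, y_lab)
within = clf.score(X_lab, y_lab)    # within-corpus accuracy
cross = clf.score(X_real, y_real)   # cross-corpus accuracy, typically lower
```

The drop from `within` to `cross` is the phenomenon the paper measures on real corpora: models fitted to one recording condition transfer poorly to another.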
Speech Prosody | 2018
Marie Tahon; Damien Lolive
In the field of storytelling, speech synthesis is trying to move from a neutral, machine-like voice to an expressive one. For parametric and unit-selection systems, building new features or cost functions is necessary to allow better expressivity control. The present article investigates the classification task between direct and narrative discourse phrases in order to build a new expressivity score. Different models are trained on different speech units (syllables, words and discourse phrases) from an audiobook with 3 sets of features. Classification experiments are conducted on the Blizzard corpus, which features English children's audiobooks and contains various characters and emotional states. The experiments show that the fusion of SVM classifiers trained with different prosodic and phonological feature sets increases the classification rate from 67.4% with 14 prosodic features to 71.8% with the 3 merged sets. Furthermore, the addition of a decision threshold achieves promising results for expressive speech synthesis depending on the strength of the constraint required on expressivity: 71.8% with 100% of the words, 79.9% with 50% and 82.6% with 25%.
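The fusion-plus-threshold idea above can be sketched as follows. This is an illustrative assumption-laden toy, not the paper's code: synthetic features stand in for the prosodic and phonological sets, fusion is done by averaging SVM decision scores, and the threshold value is arbitrary.

```python
# Sketch: fuse the signed scores of SVM classifiers trained on different
# feature sets, then keep only predictions whose fused score exceeds a
# confidence threshold, trading coverage for accuracy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 12))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # direct vs. narrative, synthetic
feature_sets = [slice(0, 4), slice(4, 8), slice(8, 12)]  # stand-in feature sets

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]
clfs = [SVC().fit(X_train[:, s], y_train) for s in feature_sets]

# Fusion: average the signed decision scores of the classifiers.
fused = np.mean([c.decision_function(X_test[:, s])
                 for c, s in zip(clfs, feature_sets)], axis=0)
pred = (fused > 0).astype(int)

# Decision threshold: classify only the most confident samples.
threshold = 0.2
kept = np.abs(fused) > threshold
coverage = kept.mean()                               # fraction of words kept
accuracy_kept = (pred[kept] == y_test[kept]).mean()  # accuracy on kept words
```

Raising `threshold` lowers `coverage` but tends to raise `accuracy_kept`, which mirrors the 100%/50%/25% trade-off reported in the experiments.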
Conference of the International Speech Communication Association | 2011
Marie Tahon; Agnes Delaborde; Laurence Devillers
International Journal of Social Robotics | 2015
Laurence Devillers; Marie Tahon; Mohamed A. Sehili; Agnes Delaborde
Affective Computing and Intelligent Interaction | 2013
Tom Giraud; Mariette Soury; Jiewen Hua; Agnes Delaborde; Marie Tahon; David Antonio Gómez Jáuregui; Victoria Eyharabide; Edith Filaire; Christine Le Scanff; Laurence Devillers; Brice Isableu; Jean-Claude Martin
Language Resources and Evaluation | 2012
Marie Tahon; Agnes Delaborde; Laurence Devillers
Language Resources and Evaluation | 2018
Aghilas Sini; Damien Lolive; Gaëlle Vidal; Marie Tahon; Elisabeth Delais-Roussarie