Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Christophe Savariaux is active.

Publications


Featured research published by Christophe Savariaux.


Cognition | 2004

Seeing to hear better: evidence for early audio-visual interactions in speech identification.

Jean-Luc Schwartz; Frédéric Berthommier; Christophe Savariaux

Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se, since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition, due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.


Journal of Phonetics | 2002

Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images.

Pierre Badin; Gérard Bailly; Lionel Revéret; Monica Baciu; Christoph Segebarth; Christophe Savariaux

In this study, previous articulatory midsagittal models of tongue and lips are extended to full three-dimensional models. The geometry of these vocal organs is measured on one subject uttering a corpus of sustained articulations in French. The 3D data are obtained from magnetic resonance imaging of the tongue, and from front and profile video images of the subject's face marked with small beads. The degrees of freedom of the articulators, i.e., the uncorrelated linear components needed to represent the 3D coordinates of these articulators, are extracted by linear component analysis from these data. In addition to a common jaw height parameter, the tongue is controlled by four parameters, while the lips and face are also driven by four parameters. These parameters are for the most part extracted from the midsagittal contours and are clearly interpretable in phonetic/biomechanical terms. This implies that most 3D features, such as the tongue groove or lateral channels, can be controlled by articulatory parameters defined for the midsagittal model. Similarly, the 3D geometry of the lips is determined by parameters such as lip protrusion or aperture that can be measured from a profile view of the face.
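
The abstract does not spell out the computation, but the core step, extracting uncorrelated linear components from the 3D coordinates of an articulator, can be sketched as a principal component analysis. The sketch below is a minimal illustration under assumed data shapes and variable names; the paper additionally factors out a common jaw height parameter before extracting the remaining components, which plain PCA does not do here.

```python
import numpy as np

# Hypothetical data: one row per articulation in the corpus, columns are the
# flattened 3D coordinates (x, y, z) of mesh vertices on the tongue surface.
# Shapes and variable names are illustrative, not the authors' actual data.
rng = np.random.default_rng(0)
n_articulations, n_vertices = 46, 500
tongue_xyz = rng.normal(size=(n_articulations, 3 * n_vertices))

# Center the data and extract uncorrelated linear components.
mean_shape = tongue_xyz.mean(axis=0)
centered = tongue_xyz - mean_shape
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Keep the first few components as articulatory "degrees of freedom" and
# resynthesize a tongue shape from a small parameter vector.
n_params = 4
params = np.array([1.0, -0.5, 0.2, 0.0])        # hypothetical control values
reconstructed = mean_shape + params @ components[:n_params]

explained = (singular_values**2 / np.sum(singular_values**2))[:n_params]
print("variance explained by first 4 components:", explained.round(3))
```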


Journal of the Acoustical Society of America | 1995

Compensation strategies for the perturbation of the rounded vowel [u] using a lip tube: A study of the control space in speech production

Christophe Savariaux; Pascal Perrier; Jean-Pierre Orliaguet

A labial perturbation of the French rounded vowel [u] was used to examine the respective weights of the articulatory and acoustic levels in the control of vowel production. A 20-mm-diameter lip tube was inserted between the lips of the speakers. Acoustic and X-ray articulatory data were obtained for isolated vowel productions by eleven native French speakers in normal and lip-tube conditions. Compensation abilities were evaluated through the accuracy of the F1-F2 pattern. Possible compensations were examined from nomograms computed with Fant's new model (Fant, 1992). Acoustic interpretations of the articulatory changes were made by generating area functions from midsagittal views, used together with a harmonic acoustic model. For the first perturbed trial, immediately after the insertion of the tube, no speaker was able to produce a complete compensation, but clear differences between speakers were observed: seven of them moved the tongue and hence limited the deterioration of the F1-F2 pattern, whereas the remaining four did not show any relevant articulatory change. These data support the idea of speaker-specific internal representations of the articulatory-to-acoustic relationships. The results for the following 19 perturbed trials indicate that speakers used the acoustic signal to elaborate an optimal compensation strategy. One speaker achieved complete compensation by changing his constriction location from the velo-palatal to the velo-pharyngeal region of the vocal tract. Six others moved their tongues in the right direction, achieving partial acoustic compensation, while the remaining four did not compensate. The control of speech production thus seems to be directed towards achieving an auditory goal, but completely achieving this goal may be impossible because of speaker-dependent articulatory constraints. It is suggested that these constraints are due more to speaker-specific internal representations of the articulatory-to-acoustic relationships than to purely physical limitations on articulation.
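
The abstract evaluates compensation through the accuracy of the F1-F2 pattern. As a rough illustration of what such a criterion can look like, the sketch below scores a produced vowel by its Euclidean distance from the speaker's unperturbed [u] target in the F1-F2 plane; the formant values and the exact metric are hypothetical, not those of the study.

```python
import numpy as np

def f1f2_deviation(f1_hz, f2_hz, target_f1_hz, target_f2_hz):
    """Euclidean distance in the F1-F2 plane between a produced vowel and the
    speaker's unperturbed [u] target. A hypothetical compensation metric
    consistent with the abstract, not the paper's exact criterion."""
    return float(np.hypot(f1_hz - target_f1_hz, f2_hz - target_f2_hz))

# Hypothetical formant values (Hz) for one speaker.
target = (280.0, 750.0)                    # unperturbed [u]
first_perturbed_trial = (330.0, 1150.0)    # right after tube insertion
last_perturbed_trial = (300.0, 820.0)      # after 19 further trials

for label, (f1, f2) in [("trial 1", first_perturbed_trial),
                        ("trial 20", last_perturbed_trial)]:
    print(label, round(f1f2_deviation(f1, f2, *target), 1), "Hz from target")
```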


PLOS Computational Biology | 2014

No, There Is No 150 ms Lead of Visual Speech on Auditory Speech, but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag

Jean-Luc Schwartz; Christophe Savariaux

An increasing number of neuroscience papers capitalize on the assumption, published in this journal, that visual speech is typically 150 ms ahead of auditory speech. In fact, the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases: for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call “preparatory gestures”. However, when syllables are chained in sequences, as they typically are in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is the case for what we call “comodulatory gestures”, which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally, we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
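
The asynchrony figures quoted above come from comparing acoustic and visual events on time-aligned recordings. The sketch below illustrates one simple way such an asynchrony could be measured, using onset times of an acoustic energy envelope and a lip-aperture trajectory; the signals, the sampling rate and the 50%-of-peak onset criterion are assumptions for illustration, not the paper's event definitions.

```python
import numpy as np

def onset_time(signal, times, frac=0.5):
    """Time at which a nonnegative signal first exceeds a fraction of its peak.
    A crude onset criterion; the paper's event definitions differ between
    'preparatory' and 'comodulatory' gestures."""
    threshold = frac * np.max(signal)
    idx = int(np.argmax(signal >= threshold))
    return times[idx]

# Hypothetical, time-aligned measurements for one syllable (100 Hz sampling):
# an acoustic energy envelope and a lip-aperture trajectory.
t = np.arange(0.0, 0.6, 0.01)
audio_envelope = np.clip(np.sin(2 * np.pi * (t - 0.20) / 0.4), 0, None) * (t > 0.20)
lip_aperture = np.clip(np.sin(2 * np.pi * (t - 0.18) / 0.4), 0, None) * (t > 0.18)

asynchrony_ms = 1000 * (onset_time(audio_envelope, t) - onset_time(lip_aperture, t))
print(f"audio onset minus lip onset: {asynchrony_ms:+.0f} ms (positive = audio lag)")
```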


Speech Communication | 2010

The word superiority effect in audiovisual speech perception

Mathilde Fort; Elsa Spinelli; Christophe Savariaux; Sonia Kandel

Seeing the facial gestures of a speaker enhances phonemic identification in noise. The goal of this study was to assess whether the visual information regarding consonant articulation activates lexical representations. We conducted a phoneme monitoring task with words and pseudo-words in audio-only (A) and audiovisual (AV) contexts, with two levels of white noise masking the acoustic signal. The results confirmed that visual information enhances consonant detection in noisy conditions and also revealed that it accelerates the phoneme detection process: consonants were detected faster in the AV than in the A-only condition. Furthermore, when the acoustic signal was degraded, consonant phonemes were better recognized when they were embedded in words rather than in pseudo-words in the AV condition. This provides evidence indicating that visual information on phoneme identity can contribute to lexical activation processes during word recognition.


Journal of the Acoustical Society of America | 2008

Compensation strategies for a lip-tube perturbation of French [u]: An acoustic and perceptual study of 4-year-old children

Lucie Ménard; Pascal Perrier; Jérôme Aubin; Christophe Savariaux; Mélanie Thibeault

The relations between production and perception in 4-year-old children were examined in a study of compensation strategies for a lip-tube perturbation. Acoustic and perceptual analyses of the rounded vowel [u] produced by twelve 4-year-old French speakers were conducted under two conditions: normal and with a 15-mm-diameter tube inserted between the lips. Recordings of isolated vowels were made in the normal condition before any perturbation (N1), immediately upon insertion of the tube and for the next 19 trials in this perturbed condition, with (P2) or without (P1) articulatory instructions, and in the normal condition after the perturbed trials (N2). The results of the acoustic analyses reveal speaker-dependent alterations of F1, F2, and/or F0 in the perturbed conditions and after the removal of the tube. For some subjects, the presence of the tube resulted in very little change; for others, an increase in F2 was observed in P1, which was reduced in some of the 20 repetitions, but not systematically and not continuously. The articulatory instructions provided in the P2 condition were detrimental to the achievement of a good acoustic target. Perceptual data are used to determine optimal combinations of F0, F1, and F2 (in bark) related to these patterns. The data are compared to a previous study conducted with adults [Savariaux et al., J. Acoust. Soc. Am. 106, 381-393 (1999)].
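
The perceptual analysis above works with F0, F1, and F2 expressed in bark. The abstract does not say which Hz-to-bark formula was used; the sketch below uses Traunmüller's (1990) approximation, a common choice, applied to hypothetical formant values.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's (1990) approximation of the bark scale. The abstract does
    not specify which formula the authors used; this is a common choice."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Hypothetical values (Hz) for a child's [u], before and during the lip-tube
# perturbation; the printed differences are in bark.
normal = {"F0": 260.0, "F1": 430.0, "F2": 1150.0}
perturbed = {"F0": 265.0, "F1": 450.0, "F2": 1500.0}

for name in ("F0", "F1", "F2"):
    delta = hz_to_bark(perturbed[name]) - hz_to_bark(normal[name])
    print(f"{name}: {delta:+.2f} bark")
```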


Language and Cognitive Processes | 2013

Seeing the initial articulatory gestures of a word triggers lexical access

Mathilde Fort; Sonia Kandel; Justine Chipot; Christophe Savariaux; Lionel Granjon; Elsa Spinelli

When the auditory information in a conversation is degraded by noise, watching the face of a speaker enhances speech intelligibility. Recent findings indicate that decoding the facial movements of a speaker accelerates word recognition. The objective of this study was to provide evidence that the mere presentation of the first two phonemes (that is, the articulatory gestures of the initial syllable) is enough visual information to activate a lexical unit and initiate the lexical access process. We used a priming paradigm combined with a lexical decision task. The primes were syllables that either shared the initial syllable with an auditory target or did not. In Experiment 1, the primes were displayed in audiovisual, auditory-only or visual-only conditions. There was a priming effect in all conditions. Experiment 2 investigated the locus (prelexical vs. lexical or postlexical) of the facilitation effect observed in the visual-only condition by manipulating the targets' word frequency. The facilitation produced by the visual prime was significant for low-frequency words but not for high-frequency words, indicating that the locus of the effect is not prelexical. This suggests that visual speech mostly contributes to the word recognition process when lexical access is difficult.


International Journal of Behavioral Development | 2012

Audiovisual vowel monitoring and the word superiority effect in children

Mathilde Fort; Elsa Spinelli; Christophe Savariaux; Sonia Kandel

The goal of this study was to explore whether viewing the speaker's articulatory gestures contributes to lexical access in children (ages 5–10) and in adults. We conducted a vowel monitoring task with words and pseudo-words in audio-only (AO) and audiovisual (AV) contexts with white noise masking the acoustic signal. The results indicated that children clearly benefited from visual speech from age 6–7 onwards. However, unlike in adults, the word superiority effect was not greater in the AV than in the AO condition for children, suggesting that visual speech mostly contributes to phonemic, rather than lexical, processing during childhood, at least until the age of 10.


PLOS Computational Biology | 2016

Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

Florent Bocquelet; Thomas Hueber; Laurent Girin; Christophe Savariaux; Blaise Yvert

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between the new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
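
A minimal sketch of the frame-by-frame articulatory-to-acoustic mapping described above follows. The layer sizes, feature dimensions, sensor count and the untrained random weights are illustrative assumptions, not the network reported in the paper; the sketch only shows how EMA coordinates can be mapped to vocoder parameters one frame at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EMA = 3 * 9        # e.g. x/y/z of 9 sensors on tongue, jaw, lips, velum (assumed)
N_ACOUSTIC = 25      # e.g. spectral parameters handed to a vocoder (assumed)
layer_sizes = [N_EMA, 256, 256, N_ACOUSTIC]

# Random stand-in weights; the paper's network is trained on EMA/speech data.
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def dnn_forward(ema_frame):
    """Map one frame of EMA coordinates to acoustic parameters."""
    h = ema_frame
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)            # hidden layers
    return h @ weights[-1] + biases[-1]   # linear output layer

# Real-time use would call this once per incoming EMA frame and pass the
# acoustic parameters to a vocoder for waveform synthesis.
ema_frame = rng.normal(size=N_EMA)        # stand-in for one measured frame
acoustic_params = dnn_forward(ema_frame)
print(acoustic_params.shape)              # (25,)
```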


Clinical Linguistics & Phonetics | 2014

Speech production after glossectomy: Methodological aspects

Audrey Acher; Pascal Perrier; Christophe Savariaux; Cécile Fougeron

This article focuses on methodological issues related to quantitative assessments of speech quality after glossectomy. Acoustic and articulatory data were collected for 8 consonants from two patients. The acoustic analysis is based on spectral moments and the Klatt VOT. Lingual movements are recorded with ultrasound, without calibration. The variations of acoustic and articulatory parameters across pre- and post-surgery conditions are analyzed in the light of perceptual evaluations of the stimuli. A parameter is considered relevant if its variation is congruent with the perceptual ratings. The most relevant acoustic parameters are the skewness and the center of gravity. The Klatt VOT explains differences that could not be explained by the spectral parameters. The SNTS ultrasound parameter provides information describing impairments not accounted for by the acoustic parameters. These results suggest that combining articulatory, perceptual and acoustic data provides comprehensive, complementary information for a quantitative assessment of speech after glossectomy.
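
Of the acoustic parameters mentioned above, the spectral moments (center of gravity and skewness) are straightforward to compute from a short-time spectrum. The sketch below shows one common way to do so; the windowing and the use of the magnitude spectrum are assumptions, and the paper's exact settings may differ.

```python
import numpy as np

def spectral_moments(frame, fs):
    """Center of gravity and skewness of a magnitude spectrum, the two
    spectral-moment parameters the abstract singles out. The window and the
    exact moment definitions are assumptions, not the paper's settings."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / spectrum.sum()                     # treat as a distribution
    cog = np.sum(freqs * p)                           # 1st moment (Hz)
    var = np.sum((freqs - cog) ** 2 * p)              # 2nd central moment
    skew = np.sum((freqs - cog) ** 3 * p) / var**1.5  # 3rd standardized moment
    return cog, skew

# Hypothetical frame: differentiated white noise (a crude high-pass) should
# show a high center of gravity.
fs = 22050
rng = np.random.default_rng(0)
frame = np.diff(rng.normal(size=2048), prepend=0.0)
print([round(x, 2) for x in spectral_moments(frame, fs)])
```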

Collaboration


Dive into Christophe Savariaux's collaborations.

Top Co-Authors

Pascal Perrier

Centre national de la recherche scientifique

Jean-Luc Schwartz

Centre national de la recherche scientifique

Pierre Badin

Centre national de la recherche scientifique

Gérard Bailly

Centre national de la recherche scientifique

Sonia Kandel

Centre national de la recherche scientifique

Marc Sato

University of Grenoble

Lucie Ménard

Université du Québec à Montréal
