Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Denis Beautemps is active.

Publication


Featured research published by Denis Beautemps.


Speech Communication | 1995

Deriving vocal-tract area functions from midsagittal profiles and formant frequencies: a new model for vowels and fricative consonants based on experimental data

Denis Beautemps; Pierre Badin; Rafael Laboissière

In order to achieve a better understanding of the articulatory-acoustic relationships, more data are still very much needed. The two-fold aim of the present study was thus (1) to provide a coherent set of midsagittal profiles, area functions and formant frequencies for a small corpus of vowels and fricative consonants produced by one subject, and (2) to derive a midsagittal profile to area function conversion model optimised for this given subject. Simultaneous tomography and sound recording were available for the subject, as well as some complementary data such as lip geometry or casts of the hard palate. The model is based on Heinz and Stevens' A = α·d^β area function model, modified so that α varies continuously along the vocal tract midline as a function of the midsagittal distance. The coefficients of the model have been determined with the help of an optimisation algorithm based on a gradient descent technique. The gradient of the error between actual and desired formant values was computed through a back-propagation network implementing both sagittal-to-area conversion and acoustic wave propagation. The fact that the model should work for sounds as different as vowels and consonants and be coherent at both the midsagittal and acoustic levels ensures the reliability of the area functions determined in this way.
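A minimal numerical sketch of the kind of conversion the abstract describes, assuming the Heinz and Stevens power-law form A = α·d^β with a position-dependent α; the profile values, coefficient range and exponent below are hypothetical placeholders, not the subject-specific values fitted in the paper:

```python
# Sketch of a midsagittal-to-area conversion A(x) = alpha(x) * d(x)**beta,
# with alpha varying continuously along the vocal-tract midline.
# All numbers are illustrative placeholders, not fitted values from the paper.
import numpy as np

def sagittal_to_area(d, alpha, beta=1.5):
    """Convert midsagittal distances d (cm) into cross-sectional areas (cm^2)."""
    return alpha * d ** beta

# Hypothetical 40-section tract from glottis to lips
x = np.linspace(0.0, 17.0, 40)                    # position along the midline (cm)
d = 1.0 + 0.5 * np.sin(2 * np.pi * x / 17.0)      # made-up midsagittal profile (cm)
alpha = np.linspace(1.2, 1.8, x.size)             # alpha varies along the tract
area = sagittal_to_area(d, alpha)
print(area.round(2))
```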


Speech Communication | 2004

A pilot study of temporal organization in Cued Speech production of French syllables: rules for a Cued Speech synthesizer

Virginie Attina; Denis Beautemps; Marie-Agnès Cathiard; Matthias Odisio

This study investigated the temporal coordination of the articulators involved in French Cued Speech. Cued Speech is a manual complement to lipreading: it uses handshapes and hand placements to disambiguate series of CV syllables. Hand movements, lip gestures and acoustic data were collected from a speaker certified in manual Cued Speech while uttering and coding CV sequences. Experiment I studied hand placement in relation to lip gestures and the corresponding sound. The results show that the hand movement begins up to 239 ms before the acoustic onset of the CV syllable. The target position is reached during the consonant, well before the vowel lip target. Experiment II used a data glove to collect finger gestures. It was designed to investigate handshape formation relative to lip gestures and the corresponding acoustic signal. The results show that the handshape formation gesture takes up a large part of the hand transition. Both experiments therefore reveal the anticipation of the hand gesture over the lips. The types of control for vocalic and consonantal information transmitted by the hand are discussed with reference to speech coarticulation. Finally, the temporal coordination observed between the Cued Speech articulators and the corresponding sound was used to derive rules for controlling an audiovisual system delivering Cued Speech for French CV syllables.
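As a rough illustration of the kind of timing rule the abstract derives for a Cued Speech synthesizer, the sketch below schedules the hand transition ahead of the acoustic onset of a CV syllable; only the 239 ms anticipation figure comes from the abstract, while the transition duration and the function names are assumptions:

```python
# Illustrative anticipation rule for scheduling a hand cue ahead of the acoustic
# onset of a CV syllable. Only the 239 ms anticipation comes from the abstract;
# the transition duration is a made-up placeholder.
from dataclasses import dataclass

@dataclass
class CueTiming:
    hand_onset: float    # start of the hand transition (s)
    hand_target: float   # hand placement reached (s), during the consonant

def schedule_hand_cue(acoustic_cv_onset: float,
                      anticipation: float = 0.239,
                      transition: float = 0.150) -> CueTiming:
    onset = acoustic_cv_onset - anticipation
    return CueTiming(hand_onset=onset, hand_target=onset + transition)

# e.g. a CV syllable whose acoustic onset is at t = 1.0 s
print(schedule_hand_cue(acoustic_cv_onset=1.0))
```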


Journal of Phonetics | 1995

Recovery of vocal tract geometry from formants for vowels and fricative consonants using a midsagittal-to-area function conversion model

Pierre Badin; Denis Beautemps; Rafael Laboissière; Jean-Luc Schwartz

This study deals with the ill-posed problem of inverting the articulatory-to-acoustic relationship, i.e. recovering the vocal tract geometry from formant frequencies. A small database of articulatory-acoustic data has been established for one subject. A midsagittal-to-area function conversion model, which works for both vowels and fricative consonants, has been developed from these data. This model has finally been used as a major constraint for an optimization algorithm based on a gradient descent technique, in order to regularize the ill-posed inversion problem. Other spatial and temporal smoothing constraints have also been used. Single vocal tract configurations, as well as entire [VC] sequences, could be recovered using adequate initial conditions.
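A compact sketch of a regularized inversion objective of the kind described here: a formant error term plus spatial and temporal smoothing penalties. The weights, the toy formant model and the array shapes are assumptions for illustration, not the paper's actual implementation:

```python
# Sketch of a regularized cost for articulatory-to-acoustic inversion:
# formant error plus spatial and temporal smoothing terms.
# Weights and the toy formant model are hypothetical placeholders.
import numpy as np

def inversion_cost(params, target_formants, formant_model,
                   w_spatial=0.1, w_temporal=0.1):
    """params: (T, P) midsagittal parameters over T frames and P sections."""
    formants = formant_model(params)                         # (T, n_formants)
    formant_err = np.mean((formants - target_formants) ** 2)
    spatial_smooth = np.mean(np.diff(params, axis=1) ** 2)   # smooth along the tract
    temporal_smooth = np.mean(np.diff(params, axis=0) ** 2)  # smooth over time
    return formant_err + w_spatial * spatial_smooth + w_temporal * temporal_smooth

# Toy usage with a linear stand-in for the sagittal-to-area-to-formant chain
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
toy_model = lambda p: p @ W                                  # (T, 8) -> (T, 3)
params = rng.normal(size=(5, 8))
targets = rng.normal(size=(5, 3))
print(inversion_cost(params, targets, toy_model))
```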


GW'05 Proceedings of the 6th international conference on Gesture in Human-Computer Interaction and Simulation | 2005

Temporal measures of hand and speech coordination during French Cued Speech production

Virginie Attina; Marie-Agnès Cathiard; Denis Beautemps

Cued Speech is an efficient method that allows orally educated deaf people to perceive a complete oral message through the visual channel. Using this system, speakers can clarify what they say with the complement of hand cues near the face; similar lip shapes are disambiguated by the addition of a manual cue. In this context, Cued Speech represents a unique system that closely links hand movements and speech, since it is based on spoken language. In a previous study, we investigated the temporal organization of French Cued Speech production for a single cueing talker. A specific pattern of coordination was found: the hand anticipates the lips and speech sounds. In the present study, we investigated the cueing behavior of three additional professional cueing talkers. The same pattern of hand cue anticipation was found. Results are discussed with respect to inter-subject variability. A general pattern of coordination is proposed.


Speech Communication | 2010

Cued Speech automatic recognition in normal-hearing and deaf subjects

Panikos Heracleous; Denis Beautemps; Noureddine Aboutabit

This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs). Cued Speech is a visual mode which, by using hand shapes in different positions and in combination with lip patterns of speech, makes all the sounds of a spoken language clearly understandable to deaf people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to understand spoken language completely. In the current study, the authors demonstrate that visible gestures are as discriminant as audible orofacial gestures. Phoneme recognition and isolated word recognition experiments have been conducted using data from a normal-hearing cuer. The results obtained were very promising, and the study has been extended by applying the proposed methods to a deaf cuer. The achieved results have not shown any significant differences compared to automatic Cued Speech recognition in a normal-hearing subject. In automatic recognition of Cued Speech, lip shape and gesture recognition are required. Moreover, the integration of the two modalities is of great importance. In this study, the lip shape component is fused with the hand component to realize Cued Speech recognition. Using concatenative feature fusion and multi-stream HMM decision fusion, vowel recognition, consonant recognition, and isolated word recognition experiments have been conducted. For vowel recognition, an 87.6% vowel accuracy was obtained, showing a 61.3% relative improvement compared to the sole use of lip shape parameters. In the case of consonant recognition, a 78.9% accuracy was obtained, showing a 56% relative improvement compared to the use of lip shape only. In addition to vowel and consonant recognition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination was conducted, obtaining a 74.4% phoneme accuracy. Isolated word recognition experiments in both normal-hearing and deaf subjects were also conducted, providing word accuracies of 94.9% and 89%, respectively. The obtained results were compared with those obtained using the audio signal, and comparable accuracies were observed.
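A minimal sketch of the two fusion schemes named in the abstract, concatenative feature fusion and weighted multi-stream decision fusion; the feature dimensions, stream weight and example scores are assumptions, not values from the study:

```python
# Sketch of concatenative feature fusion and multi-stream decision fusion
# for combining lip-shape and hand streams. Dimensions, weights and scores
# below are hypothetical.
import numpy as np

def concatenative_fusion(lip_feats, hand_feats):
    """Concatenate lip and hand feature vectors frame by frame."""
    return np.concatenate([lip_feats, hand_feats], axis=-1)

def multistream_fusion(loglik_lip, loglik_hand, w_lip=0.6):
    """Weighted combination of per-model log-likelihoods from the two streams."""
    return w_lip * loglik_lip + (1.0 - w_lip) * loglik_hand

# Feature-level fusion: (T, 6) lip features + (T, 4) hand features -> (T, 10)
fused = concatenative_fusion(np.zeros((100, 6)), np.zeros((100, 4)))

# Decision-level fusion: pick the model with the highest fused score
scores = {"a": multistream_fusion(-12.3, -10.1),
          "i": multistream_fusion(-15.0, -9.5)}
print(max(scores, key=scores.get))
```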


Virtual Environments, Human-Computer Interfaces and Measurement Systems | 2009

HMM-based vowel and consonant automatic recognition in Cued Speech for French

Panikos Heracleous; Noureddine Aboutabit; Denis Beautemps

In this paper, hidden Markov model (HMM)-based vowel and consonant automatic recognition in Cued Speech for French is presented. Cued Speech is a visual communication mode which uses handshapes in different positions, in combination with the lip patterns of speech, to make all the sounds of spoken language clearly understandable to deaf and hearing-impaired people. The aim of Cued Speech is to overcome the problems of lipreading and thus enable deaf children and adults to fully understand a spoken language. Previously, the authors have reported experimental results on vowel recognition in Cued Speech for French. This study further investigates vowel recognition and also reports automatic consonant recognition experiments in Cued Speech for French. In addition, isolated word recognition experiments in both normal-hearing and deaf subjects are presented, showing a promising word accuracy of 92% on average.


IEICE Electronics Express | 2009

Exploiting visual information for NAM recognition

Panikos Heracleous; Denis Beautemps; Viet-Anh Tran; Hélène Loevenbruck; Gérard Bailly

Non-audible murmur (NAM) is unvoiced speech received through body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. Although NAM has different frequency characteristics compared to normal speech, it is possible to perform automatic speech recognition (ASR) using conventional methods. When using a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter; as a result, higher frequency components are attenuated in the NAM signal. The decrease in NAM recognition performance is attributed to this spectral reduction. To address the loss of lip radiation, visual information extracted from the talker's facial movements is fused with NAM speech. Experimental results revealed a relative improvement of 39% when NAM speech was fused with facial information, compared to using NAM speech alone. Results also showed that improvements in the recognition rate depend on the place of articulation.


Archive | 2011

Towards Augmentative Speech Communication

Panikos Heracleous; Denis Beautemps; Hiroshi Ishiguro; Norihiro Hagita

Speech is the most natural form of communication for human beings and is often described as a unimodal communication channel. However, it is well known that speech is multimodal in nature and includes the auditive, visual, and tactile modalities. Other less natural modalities such as electromyographic signals, invisible articulator displays, or brain electrical or electromagnetic activity can also be considered. Therefore, in situations where audio speech is not available or is corrupted because of disability or adverse environmental conditions, people may resort to alternative methods such as augmented speech. In several automatic speech recognition systems, visual information from the lips/mouth and facial movements has been used in combination with audio signals. In such cases, visual information is used to complement the audio information to improve the system's robustness against acoustic noise (Potamianos et al., 2003). For orally educated deaf or hearing-impaired people, lip reading remains a crucial speech modality, though it is not sufficient to achieve full communication. Therefore, in 1967, Cornett developed the Cued Speech system as a supplement to lip reading (Cornett, 1967). Recently, studies have been presented on automatic Cued Speech recognition using hand gestures in combination with lip/mouth information (Heracleous et al., 2009). Several other studies have addressed the problem of alternative speech communication based on speech modalities other than audio speech. A method for communication based on inaudible speech received through body tissues has been introduced using the Non-Audible Murmur (NAM) microphone. NAM microphones have been used for receiving and automatically recognizing the sounds of speech-impaired people, for ensuring privacy in communication, and for achieving robustness against noise (Heracleous et al., 2007; Nakamura et al., 2008). Aside from automatic recognition of NAM speech, silicon NAM microphones have been used for NAM-to-speech conversion (Toda & Shikano, 2005; Tran et al., 2008).


IEICE Electronics Express | 2010

Cued Speech: A visual communication mode for the Deaf society

Panikos Heracleous; Denis Beautemps

Cued Speech is a visual mode of communication that uses handshapes and hand placements in combination with the mouth movements of speech to make the phonemes of a spoken language look different from each other and clearly understandable to deaf individuals. The aim of Cued Speech is to overcome the problems of lip reading and thus enable deaf persons to wholly understand spoken language. In this study, automatic phoneme recognition in Cued Speech for French based on hidden Markov models (HMMs) is introduced. The phoneme correctness for a normal-hearing cuer was 82.9%, and for a deaf cuer 81.5%. The results also showed that creating cuer-independent HMMs should not face any specific difficulties other than those occurring in audio speech recognition.


International Conference on Auditory-Visual Speech Processing (AVSP 2007) | 2007

Automatic identification of vowels in the Cued Speech context

Noureddine Aboutabit; Denis Beautemps; Laurent Besacier

Collaboration


Dive into Denis Beautemps's collaborations.

Top Co-Authors

Panikos Heracleous | Nara Institute of Science and Technology
Gérard Bailly | Centre national de la recherche scientifique
Pierre Badin | Centre national de la recherche scientifique
Virginie Attina | University of Western Sydney
Noureddine Aboutabit | Grenoble Institute of Technology
Laurent Besacier | Centre national de la recherche scientifique
Frédéric Elisei | Centre national de la recherche scientifique
Matthias Odisio | Centre national de la recherche scientifique
Joel Lienard | Grenoble Institute of Technology