Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Frédéric Berthommier is active.

Publication


Featured research published by Frédéric Berthommier.


EURASIP Journal on Advances in Signal Processing | 2002

Noise adaptive stream weighting in audio-visual speech recognition

Martin Heckmann; Frédéric Berthommier; Kristian Kroschel

It has been shown that integrating acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
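The abstract describes mapping an audio-reliability estimate onto the free weighting parameter of the fusion process. A minimal sketch, assuming a geometric (log-linear) combination of stream posteriors and a simple SNR-to-weight ramp; both the mapping and the function names are illustrative, not the authors' exact scheme:

```python
import numpy as np

def fuse_posteriors(p_audio, p_video, gamma):
    """Geometric (log-linear) fusion of audio and video phone posteriors.

    gamma in [0, 1] is the audio stream weight; the video stream gets 1 - gamma.
    Hypothetical sketch, not the paper's exact fusion rule.
    """
    fused = (p_audio ** gamma) * (p_video ** (1.0 - gamma))
    return fused / fused.sum()  # renormalize to a probability distribution

def weight_from_snr(snr_db, lo=-5.0, hi=20.0):
    """Map an estimated audio SNR (dB) to the fusion weight gamma.

    A simple clipped linear ramp; the paper instead derives the mapping from
    audio-reliability measurements.
    """
    return float(np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0))

# Example: at low SNR the decision shifts toward the video stream.
p_a = np.array([0.40, 0.35, 0.25])   # audio posteriors over 3 phone classes
p_v = np.array([0.70, 0.20, 0.10])   # video posteriors
print(fuse_posteriors(p_a, p_v, weight_from_snr(0.0)))
```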


Hearing Research | 1995

Neuronal correlates of perceptual amplitude-modulation detection

Christian Lorenzi; Christophe Micheyl; Frédéric Berthommier

The goal of the present paper is to relate the coding of amplitude modulation (AM) in the auditory pathway to the behavioral detection performance. To address this issue, the detectability of AM was estimated by modelling a single neuron located in the central nucleus of the inferior colliculus (IC). The computational model is based on cochlear nucleus responses and a coincidence detection mechanism. The model replicated the main feature of the neuronal AM transfer function, namely a bandpass function. The IC-unit model was initially tuned to a 200-Hz modulation frequency. A single neurometric function for AM detection at this modulation frequency was generated using a 2-interval, 2-alternative forced-choice paradigm. On each trial of the experiments, AM was taken to be correctly detected by the model if the number of spikes in response to the modulated signal exceeded the number of spikes in an otherwise identical interval that contained an unmodulated signal. Psychometric functions for 4 human subjects were also measured under the same stimulus conditions. Comparison of the simulated neurometric and psychometric functions suggested that there was sufficient information in the rate response of an IC neuron well-tuned in the modulation-frequency domain to support behavioral detection performance.
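The decision rule in the simulations is a spike-count comparison across the two intervals of a 2-interval, 2-alternative forced-choice trial. A minimal sketch, with Poisson spike counts standing in for the cochlear-nucleus/coincidence-detector model; rates and modulation depths are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def percent_correct(rate_modulated, rate_unmodulated, n_trials=1000):
    """Estimate percent correct in a 2-interval 2AFC AM-detection task.

    On each trial the model 'detects' AM if the spike count for the modulated
    interval exceeds the count for the unmodulated one (ties split at chance).
    Spike counts are drawn as Poisson variables here, a simplified stand-in for
    the IC-unit model described in the paper.
    """
    mod = rng.poisson(rate_modulated, n_trials)
    unmod = rng.poisson(rate_unmodulated, n_trials)
    correct = (mod > unmod) + 0.5 * (mod == unmod)
    return 100.0 * correct.mean()

# Sweep modulation depth: a deeper AM drives a higher rate and a steeper
# neurometric function.
for depth in (0.0, 0.25, 0.5, 1.0):
    print(depth, percent_correct(rate_modulated=20 * (1 + depth), rate_unmodulated=20))
```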


Conference of the International Speech Communication Association | 2002

Multichannel signal separation for cocktail party speech recognition: a dynamic recurrent network

Seungjin Choi; Heonseok Hong; Hervé Glotin; Frédéric Berthommier

This paper addresses a method of multichannel signal separation (MSS) with its application to cocktail party speech recognition. First, we present a fundamental principle for multichannel signal separation which uses the spatial independence of located sources as well as the temporal dependence of speech signals. Second, for practical implementation of the signal separation filter, we consider a dynamic recurrent network and develop a simple new learning algorithm. The performance of the proposed method is evaluated in terms of word recognition error rate (WER) in a large speech recognition experiment. The results show that our proposed method dramatically improves the word recognition performance in the case of two simultaneous speech inputs, and that a timing effect is involved in the segregation process.
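The separation filter described here is a dynamic recurrent network with its own learning rule. As an illustration of the feedback-network idea only, and not the authors' algorithm, here is a minimal Hérault-Jutten-style decorrelation network for two instantaneously mixed channels:

```python
import numpy as np

def feedback_separation(x, lr=1e-3, n_epochs=5):
    """Minimal sketch of a feedback (recurrent) decorrelation network for
    2-channel separation: y(t) = x(t) - W y(t), with an anti-Hebbian update
    that drives the cross-correlation of the outputs toward zero.
    Illustrative only; this is not the paper's learning algorithm.
    """
    n_ch, n_samp = x.shape
    W = np.zeros((n_ch, n_ch))                 # cross-channel feedback weights
    for _ in range(n_epochs):
        y = np.zeros_like(x)
        for t in range(n_samp):
            y[:, t] = np.linalg.solve(np.eye(n_ch) + W, x[:, t])  # solve y = x - W y
            dW = lr * np.outer(y[:, t], y[:, t])
            np.fill_diagonal(dW, 0.0)          # adapt only the cross terms
            W += dW
    return y, W

# Toy example: two sources mixed instantaneously onto two microphones.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 4000))                # super-Gaussian 'speech-like' sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])         # mixing matrix
y, W = feedback_separation(A @ s)
```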


International Conference on Acoustics, Speech, and Signal Processing | 2001

Optimal weighting of posteriors for audio-visual speech recognition

Martin Heckmann; Frédéric Berthommier; Kristian Kroschel

We investigate the fusion of audio and video a posteriori phonetic probabilities in a hybrid ANN/HMM audio-visual speech recognition system. Three basic conditions on the fusion process are stated and implemented in a linear and a geometric weighting scheme. These conditions are the assumption of conditional independence of the audio and video data, and the contribution of only one of the two paths when the SNR is very high or very low, respectively. In the case of the geometric weighting, a new weighting scheme is developed, whereas the linear weighting follows the full combination approach as employed in multi-stream recognition. We compare these two new concepts in audio-visual recognition to a rather standard approach known from the literature. Recognition tests were performed in a continuous number recognition task on a single speaker database containing 1712 utterances with two different types of noise added.
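For reference, the two weighting schemes contrasted in the abstract can be written as follows, with λ denoting the audio stream weight (the symbol is ours, not the paper's):

```latex
% Linear (weighted-sum) fusion of the stream posteriors
P_{\mathrm{lin}}(q \mid x_A, x_V) = \lambda\, P(q \mid x_A) + (1 - \lambda)\, P(q \mid x_V)

% Geometric (log-linear) fusion, motivated by conditional independence
P_{\mathrm{geo}}(q \mid x_A, x_V) \propto P(q \mid x_A)^{\lambda}\, P(q \mid x_V)^{\,1 - \lambda}
```

At λ = 1 or λ = 0 only the audio or only the video path contributes, matching the stated limit conditions at very high or very low SNR.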


PLOS ONE | 2017

Evidence of a Vocalic Proto-System in the Baboon (Papio papio) Suggests Pre-Hominin Speech Precursors

Louis-Jean Boë; Frédéric Berthommier; Thierry Legou; Guillaume Captier; Caralyn Kemp; Thomas R. Sawallis; Yannick Becker; Arnaud Rey; Joël Fagot

Language is a distinguishing characteristic of our species, and the course of its evolution is one of the hardest problems in science. It has long been generally considered that human speech requires a low larynx, and that the high larynx of nonhuman primates should preclude their producing the vowel systems universally found in human language. Examining the vocalizations through acoustic analyses, tongue anatomy, and modeling of acoustic potential, we found that baboons (Papio papio) produce sounds sharing the F1/F2 formant structure of the human [ɨ æ ɑ ɔ u] vowels, and that, as in humans, those vocalic qualities are organized as a system on two acoustic-anatomic axes. This confirms that hominoids can produce contrasting vowel qualities despite a high larynx. It suggests that spoken languages evolved from ancient articulatory skills already present in our last common ancestor with Cercopithecoidea, about 25 MYA.
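The acoustic analysis centers on the F1/F2 formant structure of the vocalizations. A rough, illustrative way to estimate F1/F2 from a voiced segment is LPC root-finding; this is not the study's analysis pipeline, and the model order and thresholds below are assumptions:

```python
import numpy as np

def estimate_formants(signal, fs, order=12, n_formants=2):
    """Rough F1/F2 estimation by LPC root-finding (illustrative only)."""
    # Pre-emphasis and windowing
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1]) * np.hamming(len(signal))
    # Autocorrelation method: solve the normal equations for the LPC coefficients
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Roots of A(z) = 1 - sum_k a_k z^{-k} give the pole frequencies
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]                 # keep upper half-plane poles
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    return freqs[freqs > 90][:n_formants]             # discard near-DC poles
```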


Speech Communication | 2004

A phonetically neutral model of the low-level audio-visual interaction

Frédéric Berthommier

The improvement in auditory detectability provided by visible speech cues, found by Grant and Seitz [2000. The use of visible speech cues for improving auditory detection of spoken sentences. JASA 108, 1197–1208], has been related to the degree of correlation between acoustic envelopes and visible movements. This suggests that audio and visual signals could interact early during the audio-visual perceptual process on the basis of audio envelope cues. On the other hand, acoustic-visual correlations were previously reported by Yehia et al. [1998. Quantitative association of vocal tract and facial behavior. Speech Commun. 26 (1), 23–43]. Taking into account these two main facts, the problem of extraction of the redundant audio-visual components is revisited: the video parametrization of natural images and three types of audio parameters are tested together, leading to new and realistic applications in video synthesis and audio-visual speech enhancement. Consistent with Grant and Seitz's prediction, the 4-subband envelope energy features are found to be optimal for encoding the redundant components available for the enhancement task. The proposed computational model of audio-visual interaction is based on the product, in the audio pathway, between the time-aligned audio envelopes and video-predicted envelopes. This interaction scheme is shown to be phonetically neutral, so that it will not bias phonetic identification. The low-level stage which is described is compatible with a late integration process, which may be used as a potential front-end for speech recognition applications.
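The interaction scheme is a product, in the audio pathway, between time-aligned audio subband envelopes and envelopes predicted from the video stream. A minimal sketch with assumed band edges and normalization, not the paper's parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_envelopes(audio, fs, bands=((100, 800), (800, 1500), (1500, 2500), (2500, 4000))):
    """Extract 4-subband amplitude envelopes (band edges are illustrative)."""
    envs = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        envs.append(np.abs(hilbert(sosfiltfilt(sos, audio))))
    return np.stack(envs)                              # shape: (4, n_samples)

def av_product_enhancement(noisy_subbands, audio_envs, video_pred_envs):
    """Multiplicative audio-visual interaction: each noisy subband is re-weighted
    by the product of its time-aligned audio envelope and the envelope predicted
    from the video stream (assumed normalization)."""
    gain = audio_envs * video_pred_envs
    gain /= gain.max(axis=1, keepdims=True) + 1e-12    # keep gains in [0, 1]
    return noisy_subbands * gain
```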


Journal of the Acoustical Society of America | 1999

Discrimination of amplitude-modulation phase spectrum

Christian Lorenzi; Frédéric Berthommier; Laurent Demany

Listeners were asked to discriminate between two amplitude-modulation functions imposed on white noise and consisting of the sum of two sinusoids. The frequency ratio of the sinusoids constituting each function was 2 or 3. In one function, the sinusoids had a constant relative phase. In the other function, their phase relation was continuously and cyclically changing, at a slow rate. For all listeners, the two functions with a frequency ratio of 2 were easily discriminated. However, discrimination was impossible when the frequency ratio was 3. Simulations were performed using an envelope-detector model and various decision statistics. The max/min statistic predicted discrimination above chance level when the frequency ratio was 3. It seems, therefore, that listeners are unable to use this statistic. In contrast, the crest factor and skewness of the envelope accounted well for the discrimination data.
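The decision statistics mentioned here can be computed directly on the modulation function. A small sketch comparing crest factor and skewness across phase relations for frequency ratios of 2 and 3; the stimulus parameters are illustrative:

```python
import numpy as np
from scipy.stats import skew

def am_function(duration=1.0, fs=16000, f1=4.0, ratio=2, phase=0.0):
    """Two-component modulation function m(t) = 1 + 0.5*sin(2*pi*f1*t) + 0.5*sin(2*pi*ratio*f1*t + phase)."""
    t = np.arange(int(duration * fs)) / fs
    return 1.0 + 0.5 * np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * ratio * f1 * t + phase)

def crest_factor(m):
    """Peak-to-RMS ratio of the modulation function."""
    return m.max() / np.sqrt(np.mean(m ** 2))

# For a ratio of 3, shifting time by half a period of the lower component inverts
# the waveform, so its skewness is zero for every phase; for a ratio of 2 the
# statistics vary with phase, consistent with the reported discrimination pattern.
for ratio in (2, 3):
    cf = [crest_factor(am_function(ratio=ratio, phase=p)) for p in (0.0, np.pi / 2, np.pi)]
    sk = [skew(am_function(ratio=ratio, phase=p)) for p in (0.0, np.pi / 2, np.pi)]
    print(ratio, np.round(cf, 3), np.round(sk, 3))
```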


Journal of the Acoustical Society of America | 2015

Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect

Olha Nahorna; Frédéric Berthommier; Jean-Luc Schwartz

While audiovisual interactions in speech perception have long been considered automatic, recent data suggest that this is not the case. In a previous study, Nahorna et al. [(2012). J. Acoust. Soc. Am. 132, 1061-1077] showed that the McGurk effect is reduced by a previous incoherent audiovisual context. This was interpreted as showing the existence of an audiovisual binding stage controlling the fusion process. Incoherence would produce unbinding and decrease the weight of the visual input in fusion. The present paper explores the audiovisual binding system to characterize its dynamics. A first experiment assesses the dynamics of unbinding, and shows that it is rapid: An incoherent context less than 0.5 s long (typically one syllable) suffices to produce a maximal reduction in the McGurk effect. A second experiment tests the rebinding process, by presenting a short period of either coherent material or silence after the incoherent unbinding context. Coherence provides rebinding, with a recovery of the McGurk effect, while silence provides no rebinding and hence freezes the unbinding process. These experiments are interpreted in the framework of an audiovisual speech scene analysis process assessing the perceptual organization of an audiovisual speech input before a decision takes place at a higher processing stage.


Journal of the Acoustical Society of America | 2014

Effects of aging on audio-visual speech integration

Aurélie Huyse; Jacqueline Leybaert; Frédéric Berthommier

This study investigated the impact of aging on audio-visual speech integration. A syllable identification task was presented in auditory-only, visual-only, and audio-visual congruent and incongruent conditions. Visual cues were either degraded or unmodified. Stimuli were embedded in stationary noise alternating with modulated noise. Fifteen young adults and 15 older adults participated in this study. Results showed that older adults had preserved lipreading abilities when the visual input was clear but not when it was degraded. The impact of aging on audio-visual integration also depended on the quality of the visual cues. In the visual clear condition, the audio-visual gain was similar in both groups and analyses in the framework of the fuzzy-logical model of perception confirmed that older adults did not differ from younger adults in their audio-visual integration abilities. In the visual reduction condition, the audio-visual gain was reduced in the older group, but only when the noise was stationary, suggesting that older participants could compensate for the loss of lipreading abilities by using the auditory information available in the valleys of the noise. The fuzzy-logical model of perception confirmed the significant impact of aging on audio-visual integration by showing an increased weight of audition in the older group.
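The fuzzy-logical model of perception referenced here combines unimodal supports multiplicatively and renormalizes. A minimal sketch of that prediction rule; the support values in the example are illustrative:

```python
import numpy as np

def flmp_predict(audio_support, video_support):
    """Fuzzy-Logical Model of Perception: the audio-visual response probability
    for each category is the normalized product of the unimodal supports,
    P(r | A, V) = a_r * v_r / sum_k a_k * v_k.
    """
    av = np.asarray(audio_support) * np.asarray(video_support)
    return av / av.sum()

# Example with three response categories (e.g. /ba/, /da/, /ga/).
print(flmp_predict([0.6, 0.3, 0.1], [0.2, 0.5, 0.3]))
```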


International Conference on Acoustics, Speech, and Signal Processing | 2004

Characterization and extraction of mouth opening parameters available for audiovisual speech enhancement

Frédéric Berthommier

The strong association existing between audio subband envelope parameters and video parameters extracted using the full DCT (discrete cosine transform) can be exploited for audiovisual speech enhancement, thanks to a good prediction of amplitude variations by a statistical model. Since the video parameter space is highly multidimensional, the causality of this association must be clarified. First, a new retro-marking method is proposed in order to build a transformation function of DCT parameters into explicit ABS mouth-opening parameters. Second, a reduction to single-parameter spaces is performed by selecting the best parameters. We show in two noisy conditions that the degradation of the enhancement performance due to the transformation and to the reduction is moderate.
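The association exploited here is a statistical prediction of audio subband envelope amplitudes from video DCT parameters. A minimal sketch using ridge-regularized linear regression as a stand-in for the paper's statistical model; the feature counts and regularization are assumptions, and the retro-marking and parameter-selection steps are not reproduced:

```python
import numpy as np

def fit_video_to_audio_map(dct_feats, subband_envs, n_keep=20, ridge=1e-3):
    """Linear regression from reduced DCT video parameters to subband envelope energies.

    dct_feats:    (n_frames, n_dct)   mouth-region DCT coefficients
    subband_envs: (n_frames, n_bands) audio subband envelope energies
    """
    X = dct_feats[:, :n_keep]                                 # crude dimensionality reduction
    X = np.hstack([X, np.ones((X.shape[0], 1))])              # bias term
    # Ridge-regularized least squares: W = (X^T X + ridge*I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ subband_envs)
    return W

def predict_envelopes(dct_feats, W, n_keep=20):
    X = dct_feats[:, :n_keep]
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return X @ W
```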

Collaboration


Dive into Frédéric Berthommier's collaborations.

Top Co-Authors

Jean-Luc Schwartz | Centre national de la recherche scientifique
Hervé Glotin | Aix-Marseille University
Christian Lorenzi | École Normale Supérieure
Olha Nahorna | Grenoble Institute of Technology
Louis-Jean Boë | Centre national de la recherche scientifique
Georg Meyer | University of Liverpool
Kristian Kroschel | Karlsruhe Institute of Technology
Christophe Savariaux | Centre national de la recherche scientifique