Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where David Sodoyer is active.

Publication


Featured research published by David Sodoyer.


International Conference on Acoustics, Speech, and Signal Processing | 2006

An Analysis of Visual Speech Information Applied to Voice Activity Detection

David Sodoyer; Bertrand Rivet; Laurent Girin; Jean-Luc Schwartz; Christian Jutten

We present a new approach to the voice activity detection (VAD) problem for speech signals embedded in non-stationary noise. The method is based on automatic lipreading: the objective is to detect voice activity or non-activity by exploiting the coherence between the acoustic speech signal and the speaker's lip movements. From a comprehensive analysis of lip-shape parameters during speech and non-speech events, we show that a single appropriate visual parameter, defined to characterize the lip movements, can be used to detect sections of voice activity or, more precisely, to detect silence sections. Detection scores obtained on spontaneous speech confirm the efficiency of the visual voice activity detector (VVAD).
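
A minimal sketch of the idea, assuming a per-frame lip-shape parameter (for example, inner-lip area) has already been extracted from the video; the smoothing window and thresholds are illustrative values, not those of the paper.

    import numpy as np

    def visual_vad(lip_param, threshold=0.15, win=5):
        """Label each video frame as speech (True) or silence (False)
        from a single lip-shape parameter such as inner-lip area."""
        kernel = np.ones(win) / win
        smoothed = np.convolve(lip_param, kernel, mode="same")  # suppress jitter
        motion = np.abs(np.gradient(smoothed))                  # frame-to-frame change
        # Silence: small, nearly constant lip opening; speech: open or moving lips.
        return (smoothed > threshold) | (motion > 0.5 * threshold)

    # Usage: lip_param holds one normalised value per video frame.
    speech_mask = visual_vad(np.random.rand(100))  # placeholder trajectory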


Speech Communication | 2004

Developing an audio-visual speech source separation algorithm

David Sodoyer; Laurent Girin; Christian Jutten; Jean-Luc Schwartz

Looking at the speaker's face helps a listener hear a speech signal better and extract it from competing sources before identification. This observation motivates new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms is presented, and its assessment is described. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence enables a focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm, to select the “best” sensor with reference to a target source.
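
A minimal sketch of the output-selection use mentioned at the end of the abstract: run any separation algorithm, then keep the output whose spectra best match the video under a pre-trained audio-visual model. The GMM coherence score is an assumption for illustration; the abstract does not name the statistical model.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def select_av_channel(separated_specs, visual_feats, av_model: GaussianMixture):
        """Pick the separation output most coherent with the speaker's video.
        separated_specs: (n_sources, n_frames, n_freq) log power spectra
        visual_feats:    (n_frames, n_vis) lip-shape parameters
        av_model:        GMM trained on joint [audio | visual] frames."""
        scores = [av_model.score(np.hstack([spec, visual_feats]))  # mean log-likelihood
                  for spec in separated_specs]
        return int(np.argmax(scores))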


EURASIP Journal on Advances in Signal Processing | 2002

Separation of audio-visual speech sources: a new approach exploiting the audio-visual coherence of speech stimuli

David Sodoyer; Jean-Luc Schwartz; Laurent Girin; Jacob Klinkisch; Christian Jutten

We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions on independence or non-Gaussian character. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audio-visual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging and are discussed with respect to their potential complementarity with traditional pure-audio separation or enhancement techniques.
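
To make the "maximize this probability" step concrete, here is a minimal sketch under strong simplifying assumptions: a linear instantaneous mixture, an av_model GMM already trained on joint [log-spectrum | lip-shape] frames, and audio and video frames already aligned to a common rate. None of these implementation details are given in the abstract.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.signal import stft

    def extract_av_source(mixtures, visual_feats, av_model):
        """Find a separating vector w so that s = w @ mixtures maximises
        the joint audio-visual likelihood under av_model.
        mixtures: (n_sensors, n_samples); visual_feats: (n_frames, n_vis)."""
        def neg_loglik(w):
            s = w @ mixtures                          # candidate extracted source
            _, _, Z = stft(s, nperseg=512)
            logspec = np.log(np.abs(Z.T) + 1e-8)      # (n_frames, n_freq)
            n = min(len(logspec), len(visual_feats))  # crude frame alignment
            joint = np.hstack([logspec[:n], visual_feats[:n]])
            return -av_model.score(joint)             # negative mean log-likelihood

        w0 = np.ones(mixtures.shape[0]) / mixtures.shape[0]
        res = minimize(neg_loglik, w0, method="Nelder-Mead")
        return res.x @ mixtures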


International Conference on Acoustics, Speech, and Signal Processing | 2016

Deep neural networks for automatic detection of screams and shouted speech in subway trains

Pierre Laffitte; David Sodoyer; Charles Tatkeu; Laurent Girin

Deep Neural Networks (DNNs) have recently become a popular technique for regression and classification problems. Their capacity to learn high-order correlations between input and output data proves to be very powerful for automatic speech recognition. In this paper we investigate the use of DNNs for automatic scream and shouted speech detection, within the framework of surveillance systems in public transportation. We recorded a database of sounds occurring in subway trains under real operating conditions and used DNNs to classify the sounds into screams, shouts, and other categories. We report encouraging results, given the difficulty of the task, especially when a high level of surrounding noise is present.
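
A minimal sketch of a frame-level classifier of the kind described, written in PyTorch; the 40-dimensional filterbank input, layer sizes, and three-class output (scream / shout / other) are illustrative assumptions rather than the paper's configuration.

    import torch
    import torch.nn as nn

    # Three-way classifier over per-frame acoustic features.
    model = nn.Sequential(
        nn.Linear(40, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 3),              # scream / shout / other
    )
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(features, labels):
        """features: (batch, 40) frames; labels: (batch,) class indices in {0, 1, 2}."""
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        return loss.item()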


Information Sciences, Signal Processing and Their Applications | 2003

Speech extraction based on ICA and audio-visual coherence

David Sodoyer; Laurent Girin; Christian Jutten; Jean-Luc Schwartz

We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker's face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We define a statistical model of the joint probability of visual and spectral audio input to quantify the audio-visual coherence. Separation can then be achieved by maximising this joint probability. Experiments on additive mixtures of 2, 3, and 5 sources show that the algorithm performs well, and systematically better than the classical BSS algorithm JADE.
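
A minimal sketch of the kind of pipeline compared against JADE, assuming a coherence_score function that wraps a trained audio-visual model such as the GMM sketched above; scikit-learn's FastICA stands in for JADE, which has no standard Python implementation.

    import numpy as np
    from sklearn.decomposition import FastICA

    def ica_then_av_pick(mixtures, coherence_score):
        """Separate with ICA, then keep the component with the highest
        audio-visual coherence. mixtures: (n_sensors, n_samples);
        coherence_score(source) -> float is assumed to be provided."""
        ica = FastICA(n_components=mixtures.shape[0], random_state=0)
        sources = ica.fit_transform(mixtures.T).T   # (n_sources, n_samples)
        best = max(range(len(sources)), key=lambda i: coherence_score(sources[i]))
        return sources[best]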


Expert Systems With Applications | 2019

Assessing the Performances of different Neural Network Architectures for the Detection of Screams and Shouts in Public Transportation

Pierre Laffitte; Yun Wang; David Sodoyer; Laurent Girin

As intelligent transportation systems become more prevalent, automatic surveillance systems grow more relevant. While such systems rely heavily on video signals, other types of signals can be used as well to monitor the security of passengers. This article proposes an audio-based intelligent system for surveillance in public transportation, investigating state-of-the-art artificial intelligence methods for the automatic detection of screams and shouts. We present test results produced on a database of sounds occurring in subway trains in real working conditions, classifying sounds into screams, shouts, and other categories with different neural network architectures. The suitability of these architectures for analysing audio signals is discussed. We report encouraging results, given the difficulty of the task, especially when a high level of surrounding noise is present.
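
Since the paper compares architectures, here is a sketch of a recurrent variant that classifies a whole feature sequence rather than isolated frames; the abstract does not list the architectures, so the LSTM, feature dimension, and hidden size are illustrative assumptions, written in PyTorch for consistency with the sketch above.

    import torch
    import torch.nn as nn

    class SequenceClassifier(nn.Module):
        """LSTM over a sequence of acoustic frames; the last hidden state
        is mapped to scream / shout / other logits."""
        def __init__(self, n_feats=40, hidden=128, n_classes=3):
            super().__init__()
            self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):            # x: (batch, time, n_feats)
            _, (h, _) = self.lstm(x)
            return self.head(h[-1])      # logits: (batch, n_classes)

    logits = SequenceClassifier()(torch.randn(8, 200, 40))  # 8 clips, 200 frames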


Information Sciences, Signal Processing and Their Applications | 2010

Supervised audio source localisation using microphone array

David Sodoyer; Sébastien Ambellouis

In this paper, we present a supervised method for speaker localisation in an echoic environment using a microphone array. An audio source is located within a set of areas centred on each microphone. Each area is characterised by a multi-dimensional Gaussian mixture model of the propagation channel between the source and several pairs of microphones. The distance between the sensors was chosen so that both level-difference and phase information between the microphones can be used. Evaluation results show that the algorithm achieves a very high localisation rate and does not depend on the speaker.
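
A minimal sketch of the classification step, assuming level-difference and phase features for each microphone pair are computed upstream: one GMM per candidate area is trained beforehand, and a new observation is assigned to the best-scoring area. The feature extraction and GMM sizes are assumptions, not the paper's settings.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_area_models(features_by_area, n_components=4):
        """features_by_area: {area_id: (n_obs, n_feats) level/phase features}."""
        return {area: GaussianMixture(n_components=n_components,
                                      covariance_type="diag").fit(feats)
                for area, feats in features_by_area.items()}

    def localise(models, observation):
        """Return the area whose channel model best explains one observation."""
        obs = np.atleast_2d(observation)
        return max(models, key=lambda area: models[area].score(obs))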


Journal of the Acoustical Society of America | 2009

A study of lip movements during spontaneous dialog and its application to voice activity detection

David Sodoyer; Bertrand Rivet; Laurent Girin; Christophe Savariaux; Jean-Luc Schwartz; Christian Jutten


Transportation Research Procedia | 2014

Towards a Resilient Railway Communication Network Against Electromagnetic Attacks

Marc Heddebaut; Souheir Mili; David Sodoyer; Eduardo Jacob; Marina Aguado; Christian Pinedo Zamalloa; Igor Lopez; Virginie Deniau


International Symposium on Electromagnetic Compatibility | 2013

Recognition process of jamming signals superimposed on GSM-R radiocommunications

Souheir Mili; David Sodoyer; Virginie Deniau; Marc Heddebaut; Henry Philippe; Flavio Canavero

Collaboration


Dive into David Sodoyer's collaborations.

Top Co-Authors

Virginie Deniau
Institut national de recherche sur les transports et leur sécurité

Jean-Luc Schwartz
Centre national de la recherche scientifique

Christian Jutten
Centre national de la recherche scientifique

Bertrand Rivet
Centre national de la recherche scientifique