Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Laurent Besacier is active.

Publication


Featured research published by Laurent Besacier.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2006

Information extraction from sound for medical telemonitoring

Dan Istrate; Eric Castelli; Michel Vacher; Laurent Besacier; Jean-François Serignat

Today, the growing aging population in Europe requires an increasing number of health-care professionals and facilities for elderly persons. Medical telemonitoring at home (and, more generally, telemedicine) improves patients' comfort and reduces hospitalization costs. Using sound surveillance as an alternative to video telemonitoring, this paper deals with the detection and classification of alarming sounds in a noisy environment. The proposed sound analysis system can detect distress or everyday sounds anywhere in the monitored apartment, and is connected to classical medical telemonitoring sensors through a data fusion process. The sound analysis system is divided into two stages: sound detection and classification. The first stage (sound detection) must extract significant sounds from a continuous signal flow. A new detection algorithm based on the discrete wavelet transform is proposed, which leads to accurate results when applied to nonstationary signals (such as impulsive sounds). The algorithm was evaluated in a noisy environment and compares favorably with state-of-the-art algorithms in the field. The second stage, sound classification, uses a statistical approach to identify unknown sounds. A statistical study was carried out to find the most discriminant acoustical parameters for the input of the classification module. New wavelet-based parameters, better adapted to noise, are proposed. The telemonitoring system is validated on various real and simulated test sets. The global sound-based system yields a 3% missed-alarm rate and could be fused with other medical sensors to improve performance.
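The detection stage's core idea, thresholding the energy of wavelet detail coefficients, can be sketched as follows. This is a minimal single-level Haar illustration in pure NumPy, not the paper's actual algorithm; the window size and threshold factor are arbitrary choices:

```python
import numpy as np

def haar_detail(x):
    """Single-level Haar DWT detail coefficients (scaled pair differences)."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def detect_impulses(signal, win=32, k=5.0):
    """Flag windows whose detail-coefficient energy exceeds k times the
    median window energy; impulsive sounds concentrate energy in the
    detail band, so they stand out against stationary background noise."""
    d = haar_detail(signal)
    n_win = len(d) // win
    energies = np.array([np.sum(d[i * win:(i + 1) * win] ** 2)
                         for i in range(n_win)])
    thresh = k * (np.median(energies) + 1e-12)
    return np.nonzero(energies > thresh)[0]
```

For example, a short spike buried in background noise is flagged in the window that contains it, while purely stationary noise stays below the median-based threshold.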


Computer Speech & Language | 2006

Step-by-step and integrated approaches in broadcast news speaker diarization

Sylvain Meignier; Daniel Moraru; Corinne Fredouille; Jean-François Bonastre; Laurent Besacier

This paper summarizes the collaboration of the LIA and CLIPS laboratories on speaker diarization of broadcast news during the spring NIST Rich Transcription 2003 evaluation campaign (NIST-RT03S). The speaker diarization task consists of segmenting a conversation into homogeneous segments which are then grouped into speaker classes. Two approaches are described and compared. The first relies on a classical two-step strategy based on detection of speaker turns followed by a clustering process, while the second uses an integrated strategy where both segment boundaries and speaker tying of the segments are extracted simultaneously and challenged during the whole process. These two methods are used to investigate various strategies for the fusion of diarization results. Furthermore, segmentation into acoustic macro-classes is proposed and evaluated as an a priori step to speaker diarization; the objective is to take advantage of the a priori acoustic information in the diarization process, while also enriching the resulting segmentation with information about speaker gender.
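The turn-detection step of the classical two-step strategy is commonly driven by a Bayesian Information Criterion comparison between adjacent segments. A minimal sketch with full-covariance Gaussians and a generic complexity penalty (an illustration of the ΔBIC test in general, not the exact LIA/CLIPS formulation):

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """ΔBIC for a speaker-change test between feature segments x and y:
    compare modeling the concatenation with one full-covariance Gaussian
    against one Gaussian per segment. ΔBIC > 0 suggests a speaker change."""
    z = np.vstack([x, y])
    d = z.shape[1]

    def logdet_cov(a):
        c = np.cov(a, rowvar=False) + 1e-6 * np.eye(d)  # regularized covariance
        return np.linalg.slogdet(c)[1]

    # Model-complexity penalty: d mean terms + d(d+1)/2 covariance terms.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(len(z))
    return 0.5 * (len(z) * logdet_cov(z)
                  - len(x) * logdet_cov(x)
                  - len(y) * logdet_cov(y)) - lam * penalty
```

Scanning candidate boundaries with this test gives the speaker turns; the same criterion can then drive the agglomerative clustering stage (merge while ΔBIC < 0).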


Speech Communication | 2014

Automatic speech recognition for under-resourced languages: A survey

Laurent Besacier; Etienne Barnard; Alexey Karpov; Tanja Schultz

Speech processing for under-resourced languages is an active field of research, which has experienced significant progress during the past decade. We propose, in this paper, a survey that focuses on automatic speech recognition (ASR) for these languages. The definition of under-resourced languages and the challenges associated with them are given first. The main part of the paper is a literature review of the recent (last eight years) contributions made in ASR for under-resourced languages. Examples of past projects and future trends when dealing with under-resourced languages are also presented. We believe that this paper will be a good starting point for anyone interested in initiating research in (or operational development of) ASR for one or several under-resourced languages. It should be clear, however, that many of the issues and approaches presented here apply to speech technology in general (text-to-speech synthesis, for instance).


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language

Viet Bac Le; Laurent Besacier

This paper presents our work in automatic speech recognition (ASR) in the context of under-resourced languages, with application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic-phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed that with only a few hours of target-language speech data, crosslingual context-independent modeling worked better than crosslingual context-dependent modeling; when more speech data were available, however, context-dependent modeling took the lead. In both cases, the crosslingual systems were better than the monolingual baseline systems. Grapheme-based acoustic modeling, which avoids building a phonetic dictionary, is also investigated in our work. Finally, since the use of sub-word units (morphemes, syllables, characters, etc.) can reduce the high out-of-vocabulary rate and mitigate the lack of text resources in statistical language modeling for under-resourced languages, we propose several methods to decompose, normalize, and combine word and sub-word lattices generated from different ASR systems. The proposed lattice combination scheme results in a relative syllable error rate reduction of 6.6% over the sentence MAP baseline method for a Vietnamese ASR task.
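A minimal illustration of the sub-word and grapheme ideas, assuming the common convention that Vietnamese multi-syllable words are written with syllables joined by underscores. Both helpers are hypothetical toys, not code from the paper:

```python
def syllable_units(word):
    """Split an underscore-joined multi-syllable word into its syllables;
    syllable-level units lower the out-of-vocabulary rate because the
    syllable inventory is far smaller than the word inventory."""
    return word.split("_")

def grapheme_lexicon(words):
    """Grapheme-based pronunciation lexicon: each word maps to its character
    sequence, avoiding the need for a hand-built phonetic dictionary."""
    return {w: list(w.replace("_", "")) for w in words}
```

The same decomposition applied to lattice arcs (rather than a word list) is what allows word and sub-word lattices to be normalized to common units before combination.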


International Conference on Acoustics, Speech, and Signal Processing | 2005

First steps in fast acoustic modeling for a new target language: application to Vietnamese

Viet Bac Le; Laurent Besacier

This paper presents our first steps in fast acoustic modeling for a new target language. Both knowledge-based and data-driven methods were used to obtain phone mapping tables between a source language (French) and a target language (Vietnamese). While acoustic models borrowed directly from the source language did not perform very well, we have shown that using a small amount of adaptation data in the target language (one or two hours) leads to very acceptable automatic speech recognition (ASR) performance. Our best continuous Vietnamese recognition system, adapted with only two hours of Vietnamese data, obtains a word accuracy of 63.9% on one hour of Vietnamese dialog speech.
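The data-driven side of phone mapping can be sketched as a confusion-count vote: given target/source phone pairs obtained from some alignment of decodings, each target phone is mapped to the source phone it is most often confused with. A toy illustration (the phone labels in the test are invented, and real systems would use distance measures on the acoustic models rather than raw counts):

```python
from collections import Counter, defaultdict

def phone_mapping(alignments):
    """Build a target->source phone mapping table from aligned
    (target_phone, source_phone) pairs, so that source-language acoustic
    models can be borrowed as seeds for the target language."""
    counts = defaultdict(Counter)
    for tgt, src in alignments:
        counts[tgt][src] += 1
    # Keep, for each target phone, its most frequent source partner.
    return {tgt: c.most_common(1)[0][0] for tgt, c in counts.items()}
```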


Speech Communication | 2000

Localization and selection of speaker-specific information with statistical modeling

Laurent Besacier; Jean-François Bonastre; Corinne Fredouille

Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in the laboratory but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which pieces of information the system actually takes into account. In order to solve this problem and to improve current systems, a better understanding of the nature of the information used by statistical methods is needed. This knowledge should make it possible to select only the relevant information or to add new sources of information. The first part of this paper presents experiments that aim at localizing the most useful acoustic events for speaker recognition. The relation between discriminant ability and the nature of the speech events is studied. In particular, the phonetic content, the signal stability, and the frequency domain are explored. Finally, the potential of the dynamic information contained in the relation between a frame and its p neighbours is investigated. In the second part, the authors suggest a new selection procedure designed to select the pertinent features. Conventional feature selection techniques (ascendant selection, knock-out) allow only global and a posteriori knowledge about the relevance of an information source. However, some speech clusters may be very efficient for recognizing a particular speaker while being non-informative for another. Moreover, some information classes may be corrupted or even missing under particular recording conditions. This need for speaker-specific processing and for adaptability to the environment (with no a priori knowledge of the degradation affecting the signal) leads the authors to propose a system that automatically selects the most discriminant parts of a speech utterance. The proposed architecture divides the signal into different time–frequency blocks, and the likelihood is calculated after dynamically selecting the most useful blocks. This information selection leads to a significant error rate reduction (up to a 41% relative error rate decrease on TIMIT) for short training and test durations. Finally, experiments with simulated noise degradation show that this approach is a very efficient way to deal with partially corrupted speech.


International Conference on Acoustics, Speech, and Signal Processing | 2000

GSM speech coding and speaker recognition

Laurent Besacier; Sara Grassi; Alain Dufaux; Michael Ansorge; Fausto Pellandini

This paper investigates the influence of GSM speech coding on text-independent speaker recognition performance. The three existing GSM speech coder standards were considered. The whole TIMIT database was passed through these coders, yielding three transcoded databases. In a first experiment, it was found that GSM coding significantly degrades identification and verification performance (the degradation corresponds to the perceptual speech quality of each coder). In a second experiment, the features for the speaker recognition system were calculated directly from the information available in the encoded bit stream. It was found that the low LPC order used in GSM coding is responsible for most of the performance degradation. By extracting the features directly from the encoded bit stream, we also obtained a speaker recognition system equivalent in performance to the original one, which decodes and reanalyzes speech before performing recognition.
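Deriving recognition features directly from the coder bit stream typically means converting the transmitted LPC coefficients to cepstral features. The standard LPC-to-cepstrum recursion for the all-pole model 1/(1 − Σ a_k z^(−k)) is sketched below (a textbook formulation, not code from the paper):

```python
def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstral coefficients from LPC coefficients a[0..p-1] of the
    model 1/(1 - sum_k a_k z^-k): c_m = a_m + sum_{k<m} (k/m) c_k a_{m-k},
    with a_m = 0 for m > p. Lets features be computed from the encoded
    bit stream without resynthesizing the speech."""
    p = len(a)
    c = [0.0] * n_ceps
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            acc += (k / m) * c[k - 1] * (a[m - k - 1] if m - k <= p else 0.0)
        c[m - 1] = acc
    return c
```

As a sanity check, a single pole at radius r gives the known closed form c_n = r^n / n.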


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Audio packet loss over IP and speech recognition

Pedro Mayorga; Laurent Besacier; Richard Lamy; Jean-François Serignat

This paper deals with the effects of packet loss on speech recognition over IP connections. The performance of our continuous French speech recognition system is evaluated for different transmission scenarios. A packet loss simulation model is first proposed in order to simulate different channel degradation conditions. The packet loss problem is also investigated in real transmissions over IP. Because the impact of packet loss may differ according to the speech coder used to transmit the data, different transmission conditions with different audio codecs are also investigated. Several reconstruction strategies to recover lost information are then proposed and tested. Another solution for dialog applications is also suggested, in which the relative weight of the language and acoustic models is changed according to the packet loss rate. The results show that speech recognition performance can be improved by the solutions presented here.
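Bursty packet loss on IP links is commonly simulated with a two-state Gilbert model, which is one plausible shape for the simulation model mentioned above. A minimal sketch (the transition probabilities are illustrative values, not the paper's settings):

```python
import random

def gilbert_losses(n_packets, p_gb=0.05, p_bg=0.5, seed=0):
    """Two-state Gilbert model for bursty packet loss: packets sent in the
    'good' state arrive, packets in the 'bad' state are lost; p_gb and
    p_bg are the good->bad and bad->good transition probabilities.
    Returns a boolean loss indicator per packet."""
    rng = random.Random(seed)
    lost, state = [], "good"
    for _ in range(n_packets):
        lost.append(state == "bad")
        if state == "good":
            state = "bad" if rng.random() < p_gb else "good"
        else:
            state = "good" if rng.random() < p_bg else "bad"
    return lost
```

The long-run loss rate is the stationary probability of the bad state, p_gb / (p_gb + p_bg), about 9% with the defaults above, and losses arrive in bursts of mean length 1 / p_bg packets.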


International Conference on Multimedia and Expo | 2012

From Text Detection in Videos to Person Identification

Johann Poignant; Laurent Besacier; Georges Quénot; Franck Thollard

We present in this article a video OCR system that detects and recognizes overlaid text in video, as well as its application to person identification in video documents. We proceed in several steps. First, text detection and temporal tracking are performed. After adapting the images to a standard OCR system, a final post-processing step combines multiple transcriptions of the same text box. The semi-supervised adaptation of this system to a particular video type (broadcast video from a French TV channel) is proposed and evaluated. The system is efficient, running 3 times faster than real time (including the OCR step) on a desktop Linux box. Both text detection and recognition are evaluated individually and through a person recognition task, where it is shown that combining OCR and audio (speaker) information can greatly improve the performance of a state-of-the-art audio-based person identification system.
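Combining multiple transcriptions of the same text box can be sketched as a per-position majority vote. This is a strong simplification (real post-processing would align transcriptions of unequal length first), shown here only to illustrate the idea:

```python
from collections import Counter

def combine_transcriptions(hyps):
    """Fuse several OCR readings of one text box by voting per character
    position; assumes the readings are already roughly aligned. Shorter
    readings are space-padded, and trailing padding is stripped."""
    width = max(len(h) for h in hyps)
    padded = [h.ljust(width) for h in hyps]
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*padded)).rstrip()
```

Voting across the multiple frames in which a tracked text box appears lets occasional per-frame OCR errors be outvoted by the correct readings.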


Multimedia Signal Processing | 2001

The effect of speech and audio compression on speech recognition performance

Laurent Besacier; Carole Bergamini; Dominique Vaufreydaz; Eric Castelli

This paper takes an in-depth look at the influence of different speech and audio codecs on the performance of our continuous speech recognition engine. GSM full-rate, G711, G723.1, and MPEG coders are investigated. It is shown that MPEG transcoding degrades speech recognition performance at low bitrates, whereas performance remains acceptable for specialized speech coders like GSM or G711. A new strategy is proposed to cope with the degradation due to low-bitrate coding: the acoustic models of the speech recognition system are trained on transcoded speech (one acoustic model for each speech/audio codec). First results show that this strategy allows one to recover acceptable performance.

Collaboration


Dive into Laurent Besacier's collaborations.

Top Co-Authors

Eric Castelli | Centre national de la recherche scientifique
Georges Quénot | Centre national de la recherche scientifique
Johann Poignant | Centre national de la recherche scientifique
Viet Bac Le | Centre national de la recherche scientifique
Hervé Blanchon | Centre national de la recherche scientifique
Tien-Ping Tan | Universiti Sains Malaysia
Brigitte Bigi | Aix-Marseille University