Martine Adda-Decker
University of Paris III: Sorbonne Nouvelle
Publications
Featured research published by Martine Adda-Decker.
Speech Communication | 1994
Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Martine Adda-Decker
In this paper we report on progress made at LIMSI in speaker-independent large vocabulary speech dictation using newspaper-based speech corpora in English and French. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and interword), phone duration models, and sex-dependent models. For English the ARPA Wall Street Journal-based CSR corpus is used and for French the BREF corpus containing recordings of texts from the French newspaper Le Monde is used. Experiments were carried out with both these corpora at the phone level and at the word level with vocabularies containing up to 20,000 words. Word recognition experiments are also described for the ARPA RM task which has been widely used to evaluate and compare systems.
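As a rough illustration (not the LIMSI implementation itself), the emission score of a single HMM state under a diagonal-covariance Gaussian mixture — the acoustic model described above — can be computed like this; the toy parameters are hypothetical:

```python
import math

def log_gmm_density(x, weights, means, variances):
    """Log-likelihood of a feature vector x under a diagonal-covariance
    Gaussian mixture, as used for HMM state emission densities."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        # log of the mixture weight times a diagonal Gaussian
        lp = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            lp += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_probs.append(lp)
    # log-sum-exp over mixture components for numerical stability
    m = max(log_probs)
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs))
```

In a real recognizer these densities would be evaluated per state over cepstral feature vectors inside the Viterbi search; here the point is only the mixture-of-Gaussians computation in the log domain.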
international conference on acoustics, speech, and signal processing | 1994
Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Martine Adda-Decker
We report progress made at LIMSI in speaker-independent large vocabulary speech dictation using the ARPA Wall Street Journal-based CSR corpus. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on the newspaper texts for language modeling. The recognizer uses a time-synchronous graph-search strategy which is shown to still be viable with vocabularies of up to 20k words when used with bigram back-off language models. A second forward pass, which makes use of a word graph generated with the bigram, incorporates a trigram language model. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and interword), phone duration models, and sex-dependent models. The recognizer has been evaluated in the Nov92 and Nov93 ARPA tests for vocabularies of up to 20,000 words.
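The back-off bigram scoring used in the first pass can be sketched in a few lines; the dictionaries of log-probabilities and back-off weights below (mirroring what an ARPA-format LM file stores) are hypothetical examples, not the actual LIMSI models:

```python
def backoff_bigram_logprob(w1, w2, bigram_lp, unigram_lp, backoff_wt):
    """Katz-style back-off bigram: use the bigram estimate if the pair
    was seen in training, otherwise back off to the unigram probability
    scaled (in the log domain, added) by the history's back-off weight."""
    if (w1, w2) in bigram_lp:
        return bigram_lp[(w1, w2)]
    return backoff_wt.get(w1, 0.0) + unigram_lp[w2]

# Hypothetical toy model
bigram_lp = {("the", "cat"): -1.0}
unigram_lp = {"cat": -2.0, "dog": -3.0}
backoff_wt = {"the": -0.5}
seen = backoff_bigram_logprob("the", "cat", bigram_lp, unigram_lp, backoff_wt)    # -1.0
unseen = backoff_bigram_logprob("the", "dog", bigram_lp, unigram_lp, backoff_wt)  # -3.5
```

The second pass described in the abstract would then rescore the bigram-generated word graph with trigram probabilities of the same back-off form, one more history word deep.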
international conference on acoustics, speech, and signal processing | 1995
Jean-Luc Gauvain; Lori Lamel; Martine Adda-Decker
We report on our recent development work in large vocabulary, American English continuous speech dictation. We have experimented with (1) alternative analyses for the acoustic front end, (2) the use of an enlarged vocabulary so as to reduce the number of errors due to out-of-vocabulary words, (3) extensions to the lexical representation, (4) the use of additional acoustic training data, and (5) modification of the acoustic models for telephone speech. The recognizer was evaluated on Hubs 1 and 2 of the fall 1994 ARPA NAB CSR Hub and Spoke Benchmark test. Experimental results for development and evaluation test data are given, as well as an analysis of the errors on the development data.
human language technology | 1994
Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Martine Adda-Decker
A major axis of research at LIMSI is directed at multilingual, speaker-independent, large vocabulary speech dictation. In this paper the LIMSI recognizer which was evaluated in the ARPA NOV93 CSR test is described, and experimental results on the WSJ and BREF corpora under closely matched conditions are reported. For both corpora word recognition experiments were carried out with vocabularies containing up to 20k words. The recognizer makes use of continuous density HMM with Gaussian mixture for acoustic modeling and n-gram statistics estimated on the newspaper texts for language modeling. The recognizer uses a time-synchronous graph-search strategy which is shown to still be viable with a 20k-word vocabulary when used with bigram back-off language models. A second forward pass, which makes use of a word graph generated with the bigram, incorporates a trigram language model. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and interword), phone duration models, and sex-dependent models.
Archive | 2000
Martine Adda-Decker; Lori Lamel
The lexicon plays a pivotal role in automatic speech recognition as it is the link between the acoustic-level representation and the word sequence output by the speech recognizer. The role of the lexicon is twofold: first, the lexicon specifies what words or lexical items are known by the system; second, the lexicon provides the means to build acoustic models for each entry. Lexical design thus entails two main parts: definition and selection of the vocabulary items, and representation of each pronunciation entry using the basic acoustic units of the recognizer. For large vocabulary speech recognition, the vocabulary is usually selected to maximize lexical coverage for a given lexicon size, and the elementary units of choice are usually phonemes or phone-like units.
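Selecting a vocabulary to maximize lexical coverage usually amounts to taking the most frequent words from a training corpus and measuring how many running tokens of held-out text they cover. A minimal sketch (the token lists are hypothetical placeholders):

```python
from collections import Counter

def lexical_coverage(train_tokens, vocab_size, test_tokens):
    """Pick the vocab_size most frequent training words and return the
    fraction of running test tokens they cover (1 - OOV rate)."""
    vocab = {w for w, _ in Counter(train_tokens).most_common(vocab_size)}
    covered = sum(1 for t in test_tokens if t in vocab)
    return covered / len(test_tokens)
```

Plotting this coverage against vocabulary size is the standard way to decide how large a recognition lexicon needs to be for a given target OOV rate.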
international conference on acoustics, speech, and signal processing | 1997
Jean-Luc Gauvain; Gilles Adda; Lori Lamel; Martine Adda-Decker
While significant improvements have been made in large vocabulary continuous speech recognition of large read-speech corpora such as the ARPA Wall Street Journal-based CSR corpus (WSJ) for American English and the BREF corpus for French, these tasks remain relatively artificial. In this paper we report on our development work in moving from laboratory read speech data to real-world speech data in order to build a system for the new ARPA broadcast news transcription task. The LIMSI Nov96 speech recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts. The acoustic models are trained on the WSJ0/WSJ1 corpora, and adapted using MAP estimation with task-specific training data. The overall word error on the Nov96 partitioned evaluation test was 27.1%.
Procedia Computer Science | 2016
David Blachon; Elodie Gauthier; Laurent Besacier; Guy-Noël Kouarata; Martine Adda-Decker; Annie Rialland
This paper reports on our ongoing efforts to collect speech data in under-resourced or endangered languages of Africa. Data collection is carried out using an improved version of the Android application Aikuma developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of the French-German ANR/DFG BULB (Breaking the Unwritten Language Barrier) project. The resulting app, called Lig-Aikuma, runs on various mobile phones and tablets and proposes a range of different speech collection modes (recording, respeaking, translation and elicitation). Lig-Aikuma's improved features include a smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping. It was used for field data collections in Congo-Brazzaville resulting in a total of over 80 hours of speech. Design issues of the mobile app, as well as its use during two recording campaigns, are further described in this paper.
international conference on acoustics, speech, and signal processing | 2004
Lori Lamel; Jean-Luc Gauvain; Gilles Adda; Martine Adda-Decker; L. Canseco; Langzhou Chen; Olivier Galibert; Abdelkhalek Messaoudi; Holger Schwenk
The paper summarizes recent work underway at LIMSI on speech-to-text transcription in multiple languages. The research has been oriented towards the processing of broadcast audio and conversational speech for information access. Broadcast news transcription systems have been developed for seven languages, and it is planned to address several other languages in the near term. Research on conversational speech has mainly focused on the English language, with some initial work on French, Arabic and Spanish. Automatic processing must take into account the characteristics of the audio data, such as the continuous data stream, the specificities of each language, and the use of an imperfect word transcription for accessing the information content. Our experience thus far indicates that at today's word error rates, the techniques used in one language can be successfully ported to other languages, and most of the language specificities concern lexical and pronunciation modeling.
international conference on acoustics speech and signal processing | 1999
Martine Adda-Decker; Gilles Adda; Jean-Luc Gauvain; Lori Lamel
We present some design considerations concerning our large vocabulary continuous speech recognition system in French. The impact of the epoch of the text training material on lexical coverage, language model perplexity and recognition performance on newspaper texts is demonstrated. The effectiveness of larger vocabulary sizes and larger text training corpora for language modeling is investigated. French is a highly inflected language producing large lexical variety and a high homophone rate. About 30% of recognition errors are shown to be due to substitutions between inflected forms of a given root form. When word error rates are analysed as a function of word frequency, a significant increase in the error rate can be measured for frequency ranks above 5000.
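The analysis of word error rate as a function of word frequency rank can be reproduced with a simple bucketing scheme; the per-word counts below are hypothetical stand-ins for scoring output, not the paper's data:

```python
from collections import Counter

def error_rate_by_rank(word_counts, word_errors, edges=(5000,)):
    """Split words into frequency-rank buckets (e.g. rank <= 5000 vs
    rank > 5000) and compute the error rate within each bucket."""
    ranked = [w for w, _ in Counter(word_counts).most_common()]
    buckets = [[0, 0] for _ in range(len(edges) + 1)]
    for rank, w in enumerate(ranked, start=1):
        b = sum(rank > e for e in edges)           # bucket index for this rank
        buckets[b][0] += word_errors.get(w, 0)     # errors involving this word
        buckets[b][1] += word_counts[w]            # occurrences of this word
    return [err / tot if tot else 0.0 for err, tot in buckets]
```

With a cut at rank 5000, a jump in the second bucket's error rate relative to the first would reproduce the kind of frequency-dependent degradation the abstract reports.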
international conference on acoustics speech and signal processing | 1996
Martine Adda-Decker; Gilles Adda; Lori Lamel; Jean-Luc Gauvain
We describe our large vocabulary continuous speech recognition system for the German language, the development of which was partly carried out within the context of the European LRE project 62-058 SQALE. The recognition system is the LIMSI recognizer originally developed for French and American English, which has been adapted to German. Specificities of German, as relevant to the recognition system, are presented. These specificities have been accounted for during the recognizer's adaptation process. We present experimental results on a first test set, ger-dev95, to measure progress in system development. Results are given with the final system using different acoustic model sets on two test sets, ger-dev95 and ger-eval95. This system achieved a word error rate of 17.3% (official word error rate of 16.1% after the SQALE adjudication process) on the ger-eval95 test set.