Marc El-Beze
IBM
Publication
Featured research published by Marc El-Beze.
international conference on acoustics, speech, and signal processing | 1990
Marc El-Beze; Anne-Marie Derouault
A morphological model, applicable to inflected languages, which combines the robustness of the TriPOS model with the predictive power of the lemma, is proposed. A semantic component acts at the lemma level, without taking into account the different inflections of a lemma, thus making it trainable even for 200000 words. The training corpus for the lemma model (consisting of 38 million words) is labeled in terms of lemma and part of speech, using a semiautomatic process. The results obtained with this new model are reported. The model shows another way to put knowledge in the pure probabilistic framework of hidden Markov models.
international conference on acoustics, speech, and signal processing | 1991
H. Cerf-Danon; Marc El-Beze
The authors outline the different problems that arise when using a statistical language model for speech recognition, especially for inflected languages such as French, Italian or German. After a brief review of two classical models (TriPOS and Trigram), the authors present a refinement of the morphological language model (Trilemma). They give the different methods used to evaluate performance. They discuss combination experiments between two of these three building blocks and present a model which takes advantage of all three models through a backing-off strategy. Assuming the same vocabulary (20000 forms), experiments show equivalent results using either a classical trigram language model or a trilemma model. The second model can be extended to a full dictionary containing all the inflected forms of each lemma, whereas the first needs a large amount of data to perform such a task.
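As a rough illustration of what a backing-off combination can look like, the sketch below queries several models in a fixed order and falls back to the next one when a model has no estimate. It is a simplified cascade with a constant penalty, not the authors' method: the model order, the `penalty` value, the `table_model` helper and the toy probabilities are all assumptions, and the returned scores are not normalized probabilities.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

History = Tuple[str, str]
Model = Callable[[History, str], Optional[float]]


def table_model(table: Dict[Tuple[History, str], float]) -> Model:
    """Wrap a lookup table as a model that returns None for unseen events (hypothetical helper)."""
    def model(history: History, word: str) -> Optional[float]:
        return table.get((history, word))
    return model


def backoff_score(history: History, word: str,
                  models: Sequence[Model], penalty: float = 0.4) -> float:
    """Return the estimate of the first model that knows the event.

    Each fallback step multiplies the score by `penalty` (an assumed constant),
    so estimates from less specific models count for less.
    """
    weight = 1.0
    for model in models:
        p = model(history, word)
        if p is not None and p > 0.0:
            return weight * p
        weight *= penalty
    return 0.0


# Toy tables standing in for trigram, trilemma and TriPOS estimates (invented numbers).
trigram = table_model({(("le", "chat"), "dort"): 0.5})
trilemma = table_model({(("le", "chat"), "dorment"): 0.2})
tripos = table_model({})
models = [trigram, trilemma, tripos]

seen_score = backoff_score(("le", "chat"), "dort", models)        # answered by the first model
fallback_score = backoff_score(("le", "chat"), "dorment", models)  # answered by the second model
```

A production back-off model would reserve probability mass through discounting so that the combined distribution sums to one; the constant penalty here only conveys the control flow.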
international conference on acoustics, speech, and signal processing | 1997
Frédéric Bimbot; Marc El-Beze; Michèle Jardino
Language models are usually evaluated on test texts using the perplexity derived directly from the model likelihood function. In order to use this measure in the framework of a comparative evaluation campaign, we have developed an alternative scheme for perplexity estimation. The method is derived from the Shannon (1951) game and based on a gambling approach on the next word to come in a truncated sentence. We also use entropy bounds proposed by Shannon and based on the rank of the correct answer, in order to estimate a perplexity interval for non-probabilistic language models. The relevance of the approach is assessed on an example.
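For reference, test-set perplexity is conventionally computed as 2 raised to the negative average log2 probability assigned to each word of the test text. The sketch below shows that standard formula only; the function name is an assumption, and treating the share of a bet placed on the correct word as its probability is one possible reading of the gambling setup, not a description of the paper's protocol.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity of a test text, given the probability assigned to each actual word.

    word_probs[i] is the model probability (or, in a gambling setup, the
    fraction of the stake bet) on the word that really occurred at position i.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 for p in word_probs):
        raise ValueError("a zero probability makes the perplexity infinite")
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** (-avg_log2)


# A model that always gives the correct word probability 1/4 behaves like
# a uniform choice among four words.
uniform_pp = perplexity([0.25, 0.25, 0.25, 0.25])
```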
Computer Speech & Language | 2001
Frédéric Bimbot; Marc El-Beze; Stéphane Igounet; Michèle Jardino; Kamel Smaïli; Imed Zitouni
Language models are usually evaluated on test texts using the perplexity derived from the model likelihood function computed on these texts (test set perplexity). In order to use this measure in the framework of a comparative evaluation campaign, we have developed an alternative scheme for estimating the test set perplexity. The method is derived from the Shannon game and based on a gambling approach on the next word to come in a truncated sentence. We also study the entropy bounds proposed by Shannon and based on the rank of the correct answer, in order to estimate a perplexity interval for non-probabilistic language models. The relevance of the approach is validated on an example. We then report the results of a preliminary comparative evaluation using the proposed scheme.
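The rank-based bounds can be sketched from Shannon's 1951 inequalities: if q_i is the relative frequency with which the correct word is found at rank i, the entropy H satisfies sum_i i (q_i - q_{i+1}) log2 i <= H <= -sum_i q_i log2 q_i. The code below is a minimal sketch of that computation; the function name and the toy rank lists are assumptions, and the lower bound presupposes that the q_i are non-increasing in i.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def shannon_rank_bounds(ranks: Sequence[int]) -> Tuple[float, float]:
    """Return (lower, upper) entropy bounds in bits per word from guess ranks.

    ranks[k] is the 1-based rank at which the correct word appeared in the
    ordered guess list for the k-th prediction.
    """
    if not ranks:
        raise ValueError("need at least one rank")
    n = len(ranks)
    counts = Counter(ranks)
    max_rank = max(ranks)
    # q[i - 1] is the relative frequency of rank i.
    q = [counts.get(i, 0) / n for i in range(1, max_rank + 1)]
    upper = -sum(p * math.log2(p) for p in q if p > 0.0)
    lower = 0.0
    for i in range(1, max_rank + 1):
        q_next = q[i] if i < max_rank else 0.0
        lower += i * (q[i - 1] - q_next) * math.log2(i)
    return lower, upper


lower, upper = shannon_rank_bounds([1, 1, 1, 2])
# The corresponding perplexity interval is obtained by exponentiating the bounds.
pp_interval = (2.0 ** lower, 2.0 ** upper)
```

Because only ranks are needed, this kind of interval can be computed for a system that orders its guesses without attaching probabilities to them.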
Archive | 1999
Marc El-Beze; Bernard Merialdo
In the previous chapters we have seen how the context of a word in a sentence is used to help identify the proper tag for that word. It is clear that such consultation of the context is necessary if we want the tagging to reach an acceptable level of correctness. It is not clear, however, that the mechanism used for this consultation should be rule-based.
international conference on computational linguistics | 1992
Jean-Pierre Chanod; Marc El-Beze; Sylvie Guillemin-Lanne
Automatic dictation systems (ADS) are nowadays powerful and reliable. However, some inadequacies of the underlying models still cause errors. In this paper, we are essentially interested in the language model implemented in the linguistic component, and we leave aside the acoustic module. More precisely, we aim at improving this linguistic model by coupling the ADS with a syntactic parser, able to diagnose and correct grammatical errors. We describe the characteristics of such a coupling, and show how the performance of the ADS improves with the actual coupling realized for French between the Tangora ADS and the grammar checker developed at the IBM France Scientific Center.
Proc. of the NATO Advanced Study Institute on Pattern recognition theory and applications | 1987
Helene Cerf-Danon; Anne-Marie Derouault; Marc El-Beze; Bernard Merialdo; Serge Soudoplatoff
Important progress has been achieved in Speech Recognition during the last ten years. Some small recognition tasks like vocal commands can now be accomplished, and more fundamental research involves the study and design of large vocabulary recognition. The research on Automatic Dictation is becoming very active, and recent realizations have shown very good performance. A key factor in the development of such Listening Typewriters is the ability to support Large Size Dictionaries (LSD, several thousand words), or even Very Large Size Dictionaries (VLSD, several hundred thousand words), because any restriction on the vocabulary is a restriction on potential users. This is even more important for inflected languages, such as French, because of the number of different forms for each lemma (on average: 2.2 for English, 5 for German, 7 for French).
international conference on acoustics, speech, and signal processing | 1986
Marc El-Beze
Between speech training and speech recognition, we provide speech teachers with an innovative tool for teaching deaf children how to master the articulation of voiced sounds. Two theoretical aspects of this work are important:
- Phonemes are seen as elements of an organized structure.
- The phonetic recognition is based on rejection principles.
The following developments derive from those two main ideas:
- In a set of N phonemes, a phoneme is defined by N - 1 relations which the N - 1 possible pairs determine.
- In the recognition phase, when confronting two phonemes, the refutation discards the least probable one. Eventually, if a phoneme is never rejected, its label is selected.
- The evaluation of the method takes the types of errors into account when measuring the recognition rate.
Last, we describe a speaker-independent program (with a high recognition rate) producing an attractive real-time feedback for vowel articulation.
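The rejection principle described above can be sketched as a pairwise elimination: every pair of candidate phonemes is confronted, the less probable member of each pair is rejected, and a label is returned only if exactly one candidate is never rejected. This is an illustrative reading of the abstract, not the original program; the `prefers` comparator, the function name and the toy scores are assumptions.

```python
from typing import Callable, Optional, Sequence


def recognize_by_rejection(candidates: Sequence[str],
                           prefers: Callable[[str, str], bool]) -> Optional[str]:
    """Confront every pair of candidates and reject the loser of each pair.

    prefers(a, b) must return True when a is judged more probable than b
    for the observed sound (hypothetical comparator). Each candidate takes
    part in N - 1 confrontations.
    """
    rejected = set()
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            loser = b if prefers(a, b) else a
            rejected.add(loser)
    survivors = [c for c in candidates if c not in rejected]
    # No decision when the pairwise judgements are inconsistent.
    return survivors[0] if len(survivors) == 1 else None


# Toy example: invented scores for three vowels on one analysis frame.
scores = {"a": 0.7, "i": 0.2, "u": 0.1}
label = recognize_by_rejection(["a", "i", "u"],
                               lambda x, y: scores[x] >= scores[y])
```

When the pairwise judgements form a cycle, every candidate is rejected at least once and the function returns `None`, which gives the caller a natural way to decline a decision.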
conference of the international speech communication association | 1989
Helene Cerf-Danon; Anne-Marie Derouault; Marc El-Beze; Bernard Merialdo
Archive | 1991
Helene Cerf-Danon; Marc El-Beze; Bernard Merialdo