Publication


Featured research published by Benoît Maison.


international conference on acoustics, speech, and signal processing | 2004

Combination of hidden Markov models with dynamic time warping for speech recognition

Scott Axelrod; Benoît Maison

We combine hidden Markov models of various topologies and nearest-neighbor classification techniques in an exponential modeling framework with a model selection algorithm to obtain significant error rate reductions on an isolated-word digit recognition task. This work is a preliminary investigation of large-scale modeling techniques to be applied to large-vocabulary continuous speech recognition.


multimedia signal processing | 1999

Audio-visual speaker recognition for video broadcast news: some fusion techniques

Benoît Maison; Chalapathy Neti; Andrew W. Senior

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data suggest that significant improvements can be achieved by the combination in acoustically degraded conditions.
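
The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes that each modality yields one score per enrolled speaker (higher is better), that the two are combined as a weighted sum, and that the audio weight is chosen by grid search on held-out trials. The function names, the score format, and the grid search are illustrative assumptions, not the method of the paper.

```python
from typing import Dict, List, Tuple

# Assumed format: speaker id -> score, higher means more likely.
Scores = Dict[str, float]


def fuse(audio: Scores, video: Scores, alpha: float) -> Scores:
    """Weighted sum of per-speaker scores; alpha is the audio weight in [0, 1]."""
    return {spk: alpha * audio[spk] + (1.0 - alpha) * video[spk] for spk in audio}


def identify(audio: Scores, video: Scores, alpha: float) -> str:
    """Return the speaker with the highest fused score."""
    fused = fuse(audio, video, alpha)
    return max(fused, key=fused.get)


def pick_weight(held_out: List[Tuple[Scores, Scores, str]], steps: int = 10) -> float:
    """Grid-search the audio weight that gets the most held-out trials right.

    Ties are resolved in favor of the smallest weight tried.
    """
    best_alpha, best_correct = 0.0, -1
    for i in range(steps + 1):
        alpha = i / steps
        correct = sum(identify(a, v, alpha) == truth for a, v, truth in held_out)
        if correct > best_correct:
            best_alpha, best_correct = alpha, correct
    return best_alpha


# Toy trial: degraded audio prefers the wrong speaker, video prefers the right one.
noisy_audio = {"A": 0.4, "B": 0.6}
clean_video = {"A": 0.9, "B": 0.1}
print(identify(noisy_audio, clean_video, alpha=1.0))  # audio only
print(identify(noisy_audio, clean_video, alpha=0.5))  # equal weights
```

In practice the weight would more plausibly depend on an estimate of the acoustic conditions than on a single global value; the fixed weight here only keeps the sketch short.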


ieee automatic speech recognition and understanding workshop | 2003

Pronunciation modeling for names of foreign origin

Benoît Maison; Stanley E Chen; Paul S. Cohen

The pronunciation of a proper name is influenced by both a speaker's native language and the language of origin of the name itself. Thus, creating suitable sets of pronunciations for names in speech recognition applications is extremely challenging. We investigate whether automatic language identification and grapheme-to-phoneme conversion algorithms can be effective for this task. We train grapheme-to-phoneme models for eight foreign languages and use automatic language identification to select the models with which to generate additional pronunciations for words in a baseline pronunciation dictionary. Compared to the baseline dictionary in a US name recognition task, we achieve a 25% reduction in sentence-error rate for foreign names spoken by native speakers of the language in question, and a 10% reduction in sentence-error rate for foreign names spoken by American speakers.


international conference on acoustics, speech, and signal processing | 2001

Robust confidence annotation and rejection for continuous speech recognition

Benoît Maison; Ramesh A. Gopinath

We are looking for confidence scoring techniques that perform well on a broad variety of tasks. Our main focus is on word-level error rejection, but most results apply to other scenarios as well. A variation of the normalized cross entropy that is adapted to that purpose is introduced. It is successfully used to automatically select features and optimize the word-level confidence measure on several test sets. Sentence-level confidence geared toward the rejection of out-of-grammar utterances is also investigated. The combination of a word-graph-based technique and the acoustic score shows excellent performance across all the tasks we considered.
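
For orientation, the sketch below computes the standard normalized cross entropy (NCE) commonly used to evaluate word confidence scores. The abstract describes a variation adapted to rejection, which is not reproduced here; this is only the baseline definition, and the function name and input format are assumptions made for the example.

```python
import math
from typing import List, Tuple


def normalized_cross_entropy(words: List[Tuple[float, bool]]) -> float:
    """Standard NCE for (confidence, is_correct) pairs.

    Assumes every confidence lies strictly between 0 and 1 and that the
    list contains at least one correct and one incorrect word.
    """
    n = len(words)
    n_correct = sum(1 for _, ok in words if ok)
    p = n_correct / n  # prior probability that a word is correct

    # Cost of assigning the same prior confidence p to every word.
    h_max = -(n_correct * math.log2(p) + (n - n_correct) * math.log2(1.0 - p))
    # Cost of the actual confidence scores.
    h_conf = -sum(math.log2(c if ok else 1.0 - c) for c, ok in words)
    return (h_max - h_conf) / h_max


informative = [(0.9, True), (0.9, True), (0.1, False), (0.1, False)]
prior_only = [(0.5, True), (0.5, True), (0.5, False), (0.5, False)]
print(normalized_cross_entropy(informative))
print(normalized_cross_entropy(prior_only))
```

Scores that carry no more information than the prior give an NCE of zero, informative scores give a positive value, and confidently wrong scores drive it negative.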


signal processing systems | 2001

Audio-Visual Speaker Recognition for Video Broadcast News

Benoît Maison; Chalapathy Neti; Andrew W. Senior

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions due either to channel or to noise. In this paper, we explore various techniques to combine video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data show that significant improvements can be achieved by the fusion in acoustically degraded conditions.


international conference on acoustics, speech, and signal processing | 2001

Automatic generation and selection of multiple pronunciations for dynamic vocabularies

Sabine Deligne; Benoît Maison; Ramesh A. Gopinath

We present a scheme for the acoustic modeling of speech recognition applications requiring dynamic vocabularies. It applies especially to the acoustic modeling of out-of-vocabulary words which need to be added to a recognition lexicon based on the observation of a few (say one or two) speech utterances of these words. Standard approaches to this problem derive a single pronunciation from each speech utterance by combining acoustic and phone transition scores. In our scheme, multiple pronunciations are generated from each speech utterance of a word to enroll by varying the relative weights assigned to the acoustic and phone transition models. In our experiments, the use of these multiple baseforms dramatically outperforms the standard approach with a relative decrease of the word error rate ranging from 20% to 40% on all our test sets.
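
The weight-varying step described above can be sketched on a toy scale. The example assumes a small, already enumerated set of candidate phone strings for one utterance, each with an acoustic log-score and a phone-transition log-score; a real system would search a phone graph instead. All names, scores, and the weight grid are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed format: phone string -> (acoustic log-score, transition log-score).
Candidates = Dict[str, Tuple[float, float]]


def best_baseform(cands: Candidates, w: float) -> str:
    """Best candidate when the acoustic model has weight w and the
    phone-transition model has weight 1 - w."""
    return max(cands, key=lambda p: w * cands[p][0] + (1.0 - w) * cands[p][1])


def multiple_baseforms(cands: Candidates, weights: Sequence[float]) -> List[str]:
    """Collect the distinct winners obtained across a sweep of weights."""
    found: List[str] = []
    for w in weights:
        winner = best_baseform(cands, w)
        if winner not in found:
            found.append(winner)
    return found


# Hypothetical scores for one utterance of a word being enrolled.
candidates = {
    "T AH M EY T OW": (-10.0, -2.0),  # favored by the transition model
    "T AH M AA T OW": (-6.0, -5.0),   # a compromise between both models
    "T M T": (-4.0, -12.0),           # favored by the acoustic model
}
print(multiple_baseforms(candidates, [0.0, 0.5, 1.0]))
```

A single fixed weight would return only one of these strings, which corresponds to the standard approach the abstract compares against.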


international conference on acoustics, speech, and signal processing | 2001

Toward island-of-reliability-driven very-large-vocabulary on-line handwriting recognition using character confidence scoring

John F. Pitrelli; Jayashree Subrahmonia; Benoît Maison

We explore a novel approach for handwriting recognition tasks whose intrinsic vocabularies are too large to be applied directly as constraints during recognition. Our approach makes use of vocabulary constraints, and addresses the issue that some parts of words may be written more recognizably than others. An initial pass is made with an HMM recognizer, without vocabulary constraints, generating a lattice of character-hypothesis arcs representing likely segmentations of the handwriting signal. Arc confidence scores are computed using a posteriori probabilities. The most confidently recognized characters are used to filter the overall vocabulary, generating a word subset manageable for constraining a second recognition pass. With a vocabulary of 273000 words, we can limit to 50000 words in the second pass and eliminate 39.3% of the word errors made by a one-pass recognizer without vocabulary constraints, and 18.3% of errors made using a fixed 30000-word set.
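
The vocabulary-filtering step can be pictured with a small sketch. It assumes the first pass yields a short string of confidently recognized characters in writing order, and that a word survives if it contains those characters in the same order. The matching rule, the names, and the simple truncation to a size limit are assumptions for illustration; the abstract does not specify how the filter is defined.

```python
from typing import Iterable, List


def contains_in_order(word: str, anchors: str) -> bool:
    """True if the anchor characters appear in the word in the same order,
    not necessarily adjacent."""
    remaining = iter(word)
    # Each membership test consumes the iterator up to the matched character.
    return all(ch in remaining for ch in anchors)


def filter_vocabulary(vocab: Iterable[str], anchors: str, limit: int) -> List[str]:
    """Keep words consistent with the confident characters, capped at limit.

    The cap is a plain truncation here, a placeholder for whatever ranking
    a real system would apply.
    """
    kept = [w for w in vocab if contains_in_order(w, anchors)]
    return kept[:limit]


vocabulary = ["recognition", "recognizer", "regression", "cognition", "random"]
print(filter_vocabulary(vocabulary, anchors="rgn", limit=50000))
```

The surviving subset would then serve as the vocabulary constraint for the second recognition pass.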


Journal of the Acoustical Society of America | 1999

Methods and apparatus for audio-visual speaker recognition and utterance verification

Sankar Basu; Homayoon S. M. Beigi; Stephane Herman Maes; Benoît Maison; Chalapathy Neti; Andrew W. Senior


Archive | 2008

Natural error handling in speech recognition

Ramesh A. Gopinath; Benoît Maison; Brian C. Wu


Journal of the Acoustical Society of America | 2005

Hierarchical transcription and display of input speech

Sara H. Basson; Dimitri Kanevsky; Benoît Maison
