Patrick Kenny
Institut national de la recherche scientifique
Publications
Featured research published by Patrick Kenny.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1990
Patrick Kenny; Matthew Lennig; Paul Mermelstein
The authors describe a new type of Markov model developed to account for the correlations between successive frames of a speech signal. The idea is to treat the sequence of frames as a nonstationary autoregressive process whose parameters are controlled by a hidden Markov chain. It is shown that this type of model performs better than the standard multivariate Gaussian HMM (hidden Markov model) when it is incorporated into a large-vocabulary isolated-word recognizer.
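The core idea above can be illustrated with a toy simulation (not the authors' implementation; all probabilities and coefficients below are made up): a hidden Markov chain selects the coefficient of a first-order autoregressive process, so each frame depends on the previous frame rather than being conditionally independent given the state.

```python
import random

# Hypothetical two-state chain: TRANS[s][0] is the probability of moving to
# state 0 from state s; AR_COEF[s] is the AR(1) coefficient used in state s.
TRANS = {0: [0.9, 0.1], 1: [0.2, 0.8]}
AR_COEF = {0: 0.3, 1: 0.9}

def sample_frames(n_frames, seed=0):
    """Generate frames from an AR(1) process driven by a hidden Markov chain."""
    rng = random.Random(seed)
    state, frame, frames = 0, 0.0, []
    for _ in range(n_frames):
        # advance the hidden state
        state = 0 if rng.random() < TRANS[state][0] else 1
        # the new frame is correlated with the previous one through AR_COEF
        frame = AR_COEF[state] * frame + rng.gauss(0.0, 1.0)
        frames.append(frame)
    return frames
```

In a plain Gaussian HMM the second line of the loop would ignore the previous frame entirely; the autoregressive term is what models the frame-to-frame correlation.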
IEEE Transactions on Speech and Audio Processing | 1993
Patrick Kenny; Rene Hollan; Vishwa Gupta; Matthew Lennig; Paul Mermelstein; Douglas D. O'Shaughnessy
A new class of A* algorithms for Viterbi phonetic decoding subject to lexical constraints is presented. This type of algorithm can be made to run substantially faster than the Viterbi algorithm in an isolated word recognizer having a vocabulary of 1600 words. In addition, multiple recognition hypotheses can be generated on demand, and the search can be constrained with respect to conditions on phone durations in such a way that computational requirements are substantially reduced. Results are presented on a 60000-word recognition task.
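The "multiple hypotheses on demand" property is a general feature of A* decoding: if the search keeps running after the first goal is popped, successive goal pops yield the next-best hypotheses. A minimal generic sketch (the graph, costs, and heuristic here are illustrative, not the paper's phonetic decoder):

```python
import heapq

def astar_nbest(graph, heuristic, start, goal, n_best):
    """Return up to n_best (cost, path) hypotheses, best first.

    graph: {node: [(successor, step_cost), ...]}
    heuristic: {node: admissible estimate of remaining cost}
    """
    frontier = [(heuristic[start], 0.0, start, [start])]
    results = []
    while frontier and len(results) < n_best:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            # each pop of the goal yields the next-best complete hypothesis
            results.append((cost, path))
            continue
        for nxt, step in graph.get(node, []):
            heapq.heappush(frontier,
                           (cost + step + heuristic[nxt],
                            cost + step, nxt, path + [nxt]))
    return results
```

With an admissible heuristic the first pop of the goal is the best path, the second pop the second best, and so on, which is what lets a recognizer produce an N-best list without restarting the search.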
IEEE Transactions on Signal Processing | 1991
Li Deng; Patrick Kenny; Matthew Lennig; Vishwa Gupta; Franz Seitz; Paul Mermelstein
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost in implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.
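The final observation can be checked numerically on toy parameters (the weights, means, and variances below are illustrative, not the paper's models): a K-component Gaussian mixture output density on a single arc scores identically to K parallel arcs between the same pair of states, each carrying one unimodal Gaussian, with the mixture weight folded into the arc probability.

```python
import math

WEIGHTS = [0.6, 0.4]                 # hypothetical mixture weights
MEANS, VARS = [0.0, 3.0], [1.0, 2.0]

def gauss(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_arc_density(x, trans_prob):
    # one arc with probability trans_prob and a mixture output density
    return trans_prob * sum(w * gauss(x, m, v)
                            for w, m, v in zip(WEIGHTS, MEANS, VARS))

def parallel_arcs_density(x, trans_prob):
    # K arcs, the k-th with probability trans_prob * WEIGHTS[k] and a
    # unimodal Gaussian; summing the arcs gives the same total score
    return sum((trans_prob * w) * gauss(x, m, v)
               for w, m, v in zip(WEIGHTS, MEANS, VARS))
```

Because the two expressions are algebraically identical, unimodal training machinery (Baum-Welch over transitions) can estimate mixture parameters without any mixture-specific code.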
IEEE Transactions on Signal Processing | 1992
Li Deng; Patrick Kenny; Matthew Lennig; Paul Mermelstein
The authors present a new type of hidden Markov model (HMM) for vowel-to-consonant (VC) and consonant-to-vowel (CV) transitions based on the locus theory of speech perception. The parameters of the model can be trained automatically using the Baum-Welch algorithm, and the training procedure does not require that instances of all possible CV and VC pairs be present. When incorporated into an isolated word recognizer with a 75000-word vocabulary it leads to a modest improvement in recognition rates. The authors give recognition results for the state interpolation HMM and compare them to those obtained by standard context-independent HMMs and generalized triphone models.
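A toy sketch of the locus idea (my illustration with made-up numbers, not the paper's model): the expected acoustic vector during a CV transition is interpolated between a consonant "locus" and the vowel target, which is why the model can cover CV pairs that never occurred together in training.

```python
def interpolated_target(locus, vowel_target, step, n_steps):
    """Interpolate between a consonant locus and a vowel target vector.

    step=0 gives the consonant locus, step=n_steps gives the vowel target;
    intermediate steps model the transition frames in between.
    """
    alpha = step / n_steps
    return [(1.0 - alpha) * c + alpha * v for c, v in zip(locus, vowel_target)]
```

Training only needs to estimate one locus per consonant and one target per vowel; any CV combination is then generated by interpolation rather than by a dedicated context-dependent model.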
international conference on acoustics, speech, and signal processing | 1991
Vishwa Gupta; Matthew Lennig; Paul Mermelstein; Patrick Kenny; F. Seitz; Douglas D. O'Shaughnessy
Minimum duration constraints and energy thresholds for phonemes were used to increase the recognition accuracy of an 86000-word speaker-trained isolated word recognizer. Minimum duration constraints force the phoneme models to map to acoustic segments longer than the duration minima for the phonemes. Such constraints result in significant lowering of likelihoods of many incorrect word choices, improving the accuracy of acoustic recognition and recognition with the language model. The phoneme models were also improved by correcting the segmentation of the phonemes in the training set. During training, the boundaries between phonemes are not marked accurately. Energy is used to correct these boundaries. Application of an energy threshold improves the segment boundaries between stops and sonorants (vowels, liquids, and glides), between fricatives and sonorants, between affricates and sonorants, and between breath noise and sonorants. On two speakers, the overall reduction in errors using minimum durations and energy thresholds is from 27.3% to 23.1% for acoustic recognition and from 14.3% to 8.8% with the language model.
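The pruning rule behind the duration constraints can be sketched as follows (the per-phoneme minima are hypothetical, not the paper's values): a candidate alignment is rejected when any phoneme maps to fewer acoustic frames than its minimum duration.

```python
# Illustrative per-phoneme minimum durations, in frames.
MIN_FRAMES = {'s': 2, 'ah': 3, 't': 2}

def satisfies_min_durations(alignment, default_min=1):
    """Check a candidate alignment against minimum duration constraints.

    alignment: list of (phoneme, n_frames) pairs from a candidate decoding.
    """
    return all(n >= MIN_FRAMES.get(ph, default_min) for ph, n in alignment)
```

Word hypotheses whose best alignment violates a minimum duration are discarded, which is how the constraints lower the likelihoods of many incorrect word choices.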
Computer Speech & Language | 1990
Philip F. Seitz; Vishwa Gupta; Matthew Lennig; Patrick Kenny; Li Deng; Douglas D. O'Shaughnessy; Paul Mermelstein
It is not too difficult to select a fairly small (on the order of 20 000 words) fixed recognition vocabulary that will cover over 99% of new input words when the task is limited to text in a specific knowledge domain and when one disregards names and acronyms. Achieving such a level of coverage is much more difficult when restrictions on knowledge domain and names are lifted, however. This report describes how we selected a 75 000-word English recognition vocabulary that covers over 98% of words in new newspaper text, including names and acronyms. Observations collected during the vocabulary selection process indicate the limiting factors for coverage of general knowledge domain text such as newspaper stories.
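The coverage statistic quoted above is simply the fraction of running words in new text that fall inside the fixed recognition vocabulary; a minimal sketch on toy data:

```python
def coverage(vocabulary, tokens):
    """Fraction of running words in tokens covered by the vocabulary."""
    vocab = set(vocabulary)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)
```

Note that coverage is measured over running words (tokens), not distinct words, which is why a modest vocabulary can cover a very large share of the text: frequent words dominate the token count.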
Speech Communication | 1994
Patrick Kenny; Gilles Boulianne; Harinath Garudadri; S. Trudelle; Rene Hollan; Matthew Lennig; Douglas D. O'Shaughnessy
We present a new search algorithm for very large vocabulary continuous speech recognition. Continuous speech recognition with this algorithm is only about 10 times more computationally expensive than isolated word recognition. We report preliminary recognition results obtained by testing our recognizer on books on tape using a 60 000 word dictionary.
human language technology | 1992
Patrick Kenny; Rene Hollan; Gilles Boulianne; Harinath Garudadri; Matthew Lennig; Douglas D. O'Shaughnessy
We present a new search algorithm for very large vocabulary continuous speech recognition. Continuous speech recognition with this algorithm is only about 10 times more computationally expensive than isolated word recognition. We report preliminary recognition results obtained by testing our recognizer on books on tape using a 60,000 word dictionary.
Computer Speech & Language | 2000
M. Barszcz; W. Chen; Gilles Boulianne; Patrick Kenny
We describe some new methods for constructing discrete acoustic phonetic hidden Markov models (HMMs) using tree quantizers having very large numbers (16–64 K) of leaf nodes and tree-structured smoothing techniques. We consider two criteria for constructing tree quantizers (minimum distortion and minimum entropy) and three types of smoothing (mixture smoothing, smoothing by adding 1 and Gaussian smoothing). We show that these methods are capable of achieving recognition accuracies which are generally comparable to those obtained with Gaussian mixture HMMs at a computational cost which is only marginally greater than that of conventional discrete HMMs. We present some evidence of superior performance in situations where the number of HMM distributions to be estimated is small compared with the amount of training data. We also show how our methods can accommodate feature vectors of much higher dimensionality than are traditionally used in speech recognition.
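Of the three smoothing variants named, "smoothing by adding 1" is the simplest to illustrate (toy counts below; the paper's tree-structured versions are more elaborate): every codeword in the codebook receives a nonzero probability even when it was never observed for the distribution being estimated, which matters when the codebook has tens of thousands of leaves.

```python
def add_one_smooth(counts, n_codewords):
    """Add-one smoothed output distribution over a discrete codebook.

    counts: {codeword_index: observed count}; returns a full distribution
    in which unseen codewords get probability 1 / (total + n_codewords).
    """
    total = sum(counts.values()) + n_codewords
    return {c: (counts.get(c, 0) + 1) / total for c in range(n_codewords)}
```

Without smoothing, a single unseen codeword in a test utterance would zero out the likelihood of the whole model, so some flattening of the estimated distribution is essential for very large codebooks.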
international conference on acoustics, speech, and signal processing | 1992
Yan Ming Cheng; Douglas D. O'Shaughnessy; Vishwa Gupta; Patrick Kenny; Matthew Lennig; Paul Mermelstein; S. Parthasarathy
The authors have assessed the possibility of modeling phone trajectories to accomplish speech recognition. This approach has been considered as one of the ways to model context-dependency in speech recognition based on the acoustic variability of phones in the current database. A hybrid segmental learning vector quantization/hidden Markov model (SLVQ/HMM) system has been developed and evaluated on a telephone speech database. The authors have obtained 85.27% correct phrase recognition with SLVQ alone. By combining the likelihoods issued by SLVQ and by HMM, the authors have obtained 94.5% correct phrase recognition, a small improvement over that obtained with HMM alone.
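The likelihood-combination step can be sketched as a log-domain interpolation of the two scores (the weight and the hypothesis scores below are illustrative; the paper does not specify this exact scheme):

```python
def combined_score(slvq_log_lik, hmm_log_lik, weight=0.5):
    """Interpolate SLVQ and HMM log likelihoods with a mixing weight."""
    return weight * slvq_log_lik + (1.0 - weight) * hmm_log_lik

def best_phrase(hypotheses, weight=0.5):
    """Pick the phrase whose combined score is highest.

    hypotheses: {phrase: (slvq_log_lik, hmm_log_lik)}
    """
    return max(hypotheses, key=lambda h: combined_score(*hypotheses[h], weight))
```

Combining the two streams helps when the classifiers make different errors: a hypothesis must score well under both models to win, which is consistent with the improvement over either system alone.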