Patrick Kenny
Institut national de la recherche scientifique
Publications
Featured research published by Patrick Kenny.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1990
Patrick Kenny; Matthew Lennig; Paul Mermelstein
The authors describe a new type of Markov model developed to account for the correlations between successive frames of a speech signal. The idea is to treat the sequence of frames as a nonstationary autoregressive process whose parameters are controlled by a hidden Markov chain. It is shown that this type of model performs better than the standard multivariate Gaussian HMM (hidden Markov model) when it is incorporated into a large-vocabulary isolated-word recognizer.
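The core idea above can be illustrated with a toy simulation (not the authors' implementation; all probabilities and coefficients below are made up): a hidden Markov chain selects the coefficient of a first-order autoregressive process, so each frame depends on the previous frame rather than being conditionally independent given the state.

```python
import random

# Hypothetical two-state chain: TRANS[s][0] is the probability of moving to
# state 0 from state s; AR_COEF[s] is the AR(1) coefficient used in state s.
TRANS = {0: [0.9, 0.1], 1: [0.2, 0.8]}
AR_COEF = {0: 0.3, 1: 0.9}

def sample_frames(n_frames, seed=0):
    """Generate frames from an AR(1) process driven by a hidden Markov chain."""
    rng = random.Random(seed)
    state, frame, frames = 0, 0.0, []
    for _ in range(n_frames):
        # advance the hidden state
        state = 0 if rng.random() < TRANS[state][0] else 1
        # the new frame is correlated with the previous one through AR_COEF
        frame = AR_COEF[state] * frame + rng.gauss(0.0, 1.0)
        frames.append(frame)
    return frames
```

In a plain Gaussian HMM the second line of the loop would ignore the previous frame entirely; the autoregressive term is what models the frame-to-frame correlation.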
IEEE Transactions on Speech and Audio Processing | 1993
Patrick Kenny; Rene Hollan; Vishwa Gupta; Matthew Lennig; Paul Mermelstein; Douglas D. O'Shaughnessy
A new class of A* algorithms for Viterbi phonetic decoding subject to lexical constraints is presented. This type of algorithm can be made to run substantially faster than the Viterbi algorithm in an isolated word recognizer having a vocabulary of 1600 words. In addition, multiple recognition hypotheses can be generated on demand, and the search can be constrained with respect to conditions on phone durations in such a way that computational requirements are substantially reduced. Results are presented on a 60000-word recognition task.
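The "multiple hypotheses on demand" property is a general feature of A* decoding: if the search keeps running after the first goal is popped, successive goal pops yield the next-best hypotheses. A minimal generic sketch (the graph, costs, and heuristic here are illustrative, not the paper's phonetic decoder):

```python
import heapq

def astar_nbest(graph, heuristic, start, goal, n_best):
    """Return up to n_best (cost, path) hypotheses, best first.

    graph: {node: [(successor, step_cost), ...]}
    heuristic: {node: admissible estimate of remaining cost}
    """
    frontier = [(heuristic[start], 0.0, start, [start])]
    results = []
    while frontier and len(results) < n_best:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            # each pop of the goal yields the next-best complete hypothesis
            results.append((cost, path))
            continue
        for nxt, step in graph.get(node, []):
            heapq.heappush(frontier,
                           (cost + step + heuristic[nxt],
                            cost + step, nxt, path + [nxt]))
    return results
```

With an admissible heuristic the first pop of the goal is the best path, the second pop the second best, and so on, which is what lets a recognizer produce an N-best list without restarting the search.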
IEEE Transactions on Signal Processing | 1991
Li Deng; Patrick Kenny; Matthew Lennig; Vishwa Gupta; Franz Seitz; Paul Mermelstein
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost in implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.
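The final observation can be checked numerically on toy parameters (the weights, means, and variances below are illustrative, not the paper's models): a K-component Gaussian mixture output density on a single arc scores identically to K parallel arcs between the same pair of states, each carrying one unimodal Gaussian, with the mixture weight folded into the arc probability.

```python
import math

WEIGHTS = [0.6, 0.4]                 # hypothetical mixture weights
MEANS, VARS = [0.0, 3.0], [1.0, 2.0]

def gauss(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_arc_density(x, trans_prob):
    # one arc with probability trans_prob and a mixture output density
    return trans_prob * sum(w * gauss(x, m, v)
                            for w, m, v in zip(WEIGHTS, MEANS, VARS))

def parallel_arcs_density(x, trans_prob):
    # K arcs, the k-th with probability trans_prob * WEIGHTS[k] and a
    # unimodal Gaussian; summing the arcs gives the same total score
    return sum((trans_prob * w) * gauss(x, m, v)
               for w, m, v in zip(WEIGHTS, MEANS, VARS))
```

Because the two expressions are algebraically identical, unimodal training machinery (Baum-Welch over transitions) can estimate mixture parameters without any mixture-specific code.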
IEEE Transactions on Signal Processing | 1992
Li Deng; Patrick Kenny; Matthew Lennig; Paul Mermelstein
The authors present a new type of hidden Markov model (HMM) for vowel-to-consonant (VC) and consonant-to-vowel (CV) transitions based on the locus theory of speech perception. The parameters of the model can be trained automatically using the Baum-Welch algorithm, and the training procedure does not require that instances of all possible CV and VC pairs be present. When incorporated into an isolated word recognizer with a 75000-word vocabulary it leads to a modest improvement in recognition rates. The authors give recognition results for the state interpolation HMM and compare them to those obtained by standard context-independent HMMs and generalized triphone models.
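A toy sketch of the locus idea (my illustration with made-up numbers, not the paper's model): the expected acoustic vector during a CV transition is interpolated between a consonant "locus" and the vowel target, which is why the model can cover CV pairs that never occurred together in training.

```python
def interpolated_target(locus, vowel_target, step, n_steps):
    """Interpolate between a consonant locus and a vowel target vector.

    step=0 gives the consonant locus, step=n_steps gives the vowel target;
    intermediate steps model the transition frames in between.
    """
    alpha = step / n_steps
    return [(1.0 - alpha) * c + alpha * v for c, v in zip(locus, vowel_target)]
```

Training only needs to estimate one locus per consonant and one target per vowel; any CV combination is then generated by interpolation rather than by a dedicated context-dependent model.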
international conference on acoustics, speech, and signal processing | 1991
Vishwa Gupta; Matthew Lennig; Paul Mermelstein; Patrick Kenny; F. Seitz; Douglas D. O'Shaughnessy
Minimum duration constraints and energy thresholds for phonemes were used to increase the recognition accuracy of an 86000-word speaker-trained isolated word recognizer. Minimum duration constraints force the phoneme models to map to acoustic segments longer than the duration minima for the phonemes. Such constraints result in significant lowering of likelihoods of many incorrect word choices, improving the accuracy of acoustic recognition and recognition with the language model. The phoneme models were also improved by correcting the segmentation of the phonemes in the training set. During training, the boundaries between phonemes are not marked accurately. Energy is used to correct these boundaries. Application of an energy threshold improves the segment boundaries between stops and sonorants (vowels, liquids, and glides), between fricatives and sonorants, between affricates and sonorants, and between breath noise and sonorants. On two speakers, the overall reduction in errors using minimum durations and energy thresholds is from 27.3% to 23.1% for acoustic recognition and from 14.3% to 8.8% with the language model.
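The pruning rule behind the duration constraints can be sketched as follows (the per-phoneme minima are hypothetical, not the paper's values): a candidate alignment is rejected when any phoneme maps to fewer acoustic frames than its minimum duration.

```python
# Illustrative per-phoneme minimum durations, in frames.
MIN_FRAMES = {'s': 2, 'ah': 3, 't': 2}

def satisfies_min_durations(alignment, default_min=1):
    """Check a candidate alignment against minimum duration constraints.

    alignment: list of (phoneme, n_frames) pairs from a candidate decoding.
    """
    return all(n >= MIN_FRAMES.get(ph, default_min) for ph, n in alignment)
```

Word hypotheses whose best alignment violates a minimum duration are discarded, which is how the constraints lower the likelihoods of many incorrect word choices.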
Computer Speech & Language | 1990
Philip F. Seitz; Vishwa Gupta; Matthew Lennig; Patrick Kenny; Li Deng; Douglas D. O'Shaughnessy; Paul Mermelstein
It is not too difficult to select a fairly small (on the order of 20 000 words) fixed recognition vocabulary that will cover over 99% of new input words when the task is limited to text in a specific knowledge domain and when one disregards names and acronyms. Achieving such a level of coverage is much more difficult when restrictions on knowledge domain and names are lifted, however. This report describes how we selected a 75 000-word English recognition vocabulary that covers over 98% of words in new newspaper text, including names and acronyms. Observations collected during the vocabulary selection process indicate the limiting factors for coverage of general knowledge domain text such as newspaper stories.
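The coverage statistic quoted above is simply the fraction of running words in new text that fall inside the fixed recognition vocabulary; a minimal sketch on toy data:

```python
def coverage(vocabulary, tokens):
    """Fraction of running words in tokens covered by the vocabulary."""
    vocab = set(vocabulary)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)
```

Note that coverage is measured over running words (tokens), not distinct words, which is why a modest vocabulary can cover a very large share of the text: frequent words dominate the token count.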
Speech Communication | 1994
Patrick Kenny; Gilles Boulianne; Harinath Garudadri; S. Trudelle; Rene Hollan; Matthew Lennig; Douglas D. O'Shaughnessy
We present a new search algorithm for very large vocabulary continuous speech recognition. Continuous speech recognition with this algorithm is only about 10 times more computationally expensive than isolated word recognition. We report preliminary recognition results obtained by testing our recognizer on books on tape using a 60 000 word dictionary.
human language technology | 1992
Patrick Kenny; Rene Hollan; Gilles Boulianne; Harinath Garudadri; Matthew Lennig; Douglas D. O'Shaughnessy
We present a new search algorithm for very large vocabulary continuous speech recognition. Continuous speech recognition with this algorithm is only about 10 times more computationally expensive than isolated word recognition. We report preliminary recognition results obtained by testing our recognizer on books on tape using a 60,000 word dictionary.
Computer Speech & Language | 2000
M. Barszcz; W. Chen; Gilles Boulianne; Patrick Kenny
We describe some new methods for constructing discrete acoustic phonetic hidden Markov models (HMMs) using tree quantizers having very large numbers (16–64 K) of leaf nodes and tree-structured smoothing techniques. We consider two criteria for constructing tree quantizers (minimum distortion and minimum entropy) and three types of smoothing (mixture smoothing, smoothing by adding 1 and Gaussian smoothing). We show that these methods are capable of achieving recognition accuracies which are generally comparable to those obtained with Gaussian mixture HMMs at a computational cost which is only marginally greater than that of conventional discrete HMMs. We present some evidence of superior performance in situations where the number of HMM distributions to be estimated is small compared with the amount of training data. We also show how our methods can accommodate feature vectors of much higher dimensionality than are traditionally used in speech recognition.
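Of the three smoothing variants named, "smoothing by adding 1" is the simplest to illustrate (toy counts below; the paper's tree-structured versions are more elaborate): every codeword in the codebook receives a nonzero probability even when it was never observed for the distribution being estimated, which matters when the codebook has tens of thousands of leaves.

```python
def add_one_smooth(counts, n_codewords):
    """Add-one smoothed output distribution over a discrete codebook.

    counts: {codeword_index: observed count}; returns a full distribution
    in which unseen codewords get probability 1 / (total + n_codewords).
    """
    total = sum(counts.values()) + n_codewords
    return {c: (counts.get(c, 0) + 1) / total for c in range(n_codewords)}
```

Without smoothing, a single unseen codeword in a test utterance would zero out the likelihood of the whole model, so some flattening of the estimated distribution is essential for very large codebooks.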
international conference on acoustics, speech, and signal processing | 1992
Yan Ming Cheng; Douglas D. O'Shaughnessy; Vishwa Gupta; Patrick Kenny; Matthew Lennig; Paul Mermelstein; S. Parthasarathy
The authors have assessed the possibility of modeling phone trajectories to accomplish speech recognition. This approach has been considered as one of the ways to model context-dependency in speech recognition based on the acoustic variability of phones in the current database. A hybrid segmental learning vector quantization/hidden Markov model (SLVQ/HMM) system has been developed and evaluated on a telephone speech database. The authors have obtained 85.27% correct phrase recognition with SLVQ alone. By combining the likelihoods issued by SLVQ and by HMM, the authors have obtained 94.5% correct phrase recognition, a small improvement over that obtained with HMM alone.
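The likelihood-combination step can be sketched as a log-domain interpolation of the two scores (the weight and the hypothesis scores below are illustrative; the paper does not specify this exact scheme):

```python
def combined_score(slvq_log_lik, hmm_log_lik, weight=0.5):
    """Interpolate SLVQ and HMM log likelihoods with a mixing weight."""
    return weight * slvq_log_lik + (1.0 - weight) * hmm_log_lik

def best_phrase(hypotheses, weight=0.5):
    """Pick the phrase whose combined score is highest.

    hypotheses: {phrase: (slvq_log_lik, hmm_log_lik)}
    """
    return max(hypotheses, key=lambda h: combined_score(*hypotheses[h], weight))
```

Combining the two streams helps when the classifiers make different errors: a hypothesis must score well under both models to win, which is consistent with the improvement over either system alone.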