
Publication


Featured research published by Erik McDermott.


Journal of the Acoustical Society of America | 2007

Speech production knowledge in automatic speech recognition

Simon King; Joe Frankel; Karen Livescu; Erik McDermott; Korin Richmond; Mirjam Wester

Although much is known about how speech is produced, and research into speech production has yielded measured articulatory data, feature systems of various kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech that cannot be easily analyzed from either the acoustic signal or the phonetic transcription alone. This article surveys a growing body of work in which such representations are used to improve automatic speech recognition.


IEEE Transactions on Speech and Audio Processing | 2001

An application of discriminative feature extraction to filter-bank-based speech recognition

Alain Biem; Shigeru Katagiri; Erik McDermott; Biing-Hwang Juang

A pattern recognizer is usually a modular system consisting of a feature extractor module and a classifier module. Traditionally, these two modules have been designed separately, which may not result in optimal recognition accuracy. To alleviate this fundamental problem, the authors have developed a design method, named discriminative feature extraction (DFE), that enables one to design the overall recognizer, i.e., both the feature extractor and the classifier, in a manner consistent with the objective of minimizing recognition errors. This paper investigates the application of this method to designing a speech recognizer that consists of a filter-bank feature extractor and a multi-prototype distance classifier. Carefully designed experiments demonstrate that DFE achieves a better recognizer design and provides an innovative recognition-oriented analysis of the filter-bank, as an alternative to conventional analysis based on psychoacoustic expertise or heuristics.
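As an illustration of the DFE idea, the toy sketch below jointly trains a per-dimension gain vector (standing in for the filter-bank) and a nearest-prototype classifier by gradient descent on a smoothed MCE-style loss. All names, the finite-difference gradients, and the loss form are illustrative, not the paper's actual formulation:

```python
import numpy as np

def mce_objective(params, data, alpha=4.0):
    """Smoothed MCE loss for a toy recognizer: a per-dimension gain vector
    (the 'feature extractor') feeding a nearest-prototype classifier."""
    gains, protos = params
    total = 0.0
    for x, y in data:
        f = gains * x                                 # feature extraction
        dists = np.sum((protos - f) ** 2, axis=1)     # prototype distances
        d = dists[y] - np.min(np.delete(dists, y))    # misclassification measure
        total += 1.0 / (1.0 + np.exp(-alpha * d))     # smoothed error count
    return total / len(data)

def dfe_train(data, gains, protos, lr=0.2, steps=300, eps=1e-5):
    """Jointly update the gains AND the prototypes by finite-difference
    gradient descent on the MCE loss -- the essence of DFE."""
    theta = np.concatenate([gains, protos.ravel()])
    k = gains.size

    def loss(t):
        return mce_objective((t[:k], t[k:].reshape(protos.shape)), data)

    for _ in range(steps):
        grad = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                         for e in np.eye(theta.size)])
        theta -= lr * grad
    return theta[:k], theta[k:].reshape(protos.shape)
```

Training lowers the loss by re-weighting feature dimensions and moving prototypes simultaneously, rather than fixing the front end first and tuning only the classifier.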


International Conference on Acoustics, Speech, and Signal Processing | 1989

Shift-invariant, multi-category phoneme recognition using Kohonen's LVQ2

Erik McDermott; Shigeru Katagiri

The authors describe a shift-tolerant neural network architecture for phoneme recognition. The system is based on LVQ2, an algorithm which pays close attention to approximating the optimal Bayes decision line in a discrimination task. Recognition performances in the 98-99% correct range were obtained for LVQ2 networks aimed at speaker-dependent recognition of phonemes in small but ambiguous Japanese phonemic classes. A correct recognition rate of 97.7% was achieved by a single, larger LVQ2 network covering all Japanese consonants. These recognition results are at least as high as those obtained in the time delay neural network system and suggest that LVQ2 could be the basis for a successful speech recognition system.
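For readers unfamiliar with LVQ2, the following is a minimal sketch of an LVQ2.1-style update, in which the two nearest prototypes are adjusted only when the token falls in a window around the decision border. The window test and parameter values are illustrative, not those of the paper:

```python
import numpy as np

def lvq2_update(prototypes, labels, x, y, lr=0.05, window=0.3):
    """One LVQ2.1-style update: adjust the two nearest prototypes when x
    falls near the decision border between a correct and a wrong class."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    i, j = np.argsort(dists)[:2]                      # two nearest prototypes
    s = (1 - window) / (1 + window)                   # window threshold
    # Exactly one of the pair must carry the correct class, and x must lie
    # inside the window around the midplane between the two prototypes.
    if (labels[i] == y) != (labels[j] == y) and \
            min(dists[i] / dists[j], dists[j] / dists[i]) > s:
        correct, wrong = (i, j) if labels[i] == y else (j, i)
        prototypes[correct] += lr * (x - prototypes[correct])  # attract
        prototypes[wrong] -= lr * (x - prototypes[wrong])      # repel
    return prototypes
```

Because only borderline tokens trigger an update, the prototypes drift toward the Bayes decision boundary rather than toward the class means.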


Computer Speech & Language | 1994

Prototype-based minimum classification error/generalized probabilistic descent training for various speech units

Erik McDermott; Shigeru Katagiri

In previous work we reported high classification rates for learning vector quantization (LVQ) networks trained to classify phoneme tokens shifted in time. It has since been shown that the framework of minimum classification error (MCE) and generalized probabilistic descent (GPD) can treat LVQ as a special case of a general method for gradient descent on a rigorously defined classification loss measure that closely reflects the misclassification rate. This framework allows us to extend LVQ into a prototype-based minimum error classifier (PBMEC) appropriate for the classification of various speech units which the original LVQ was unable to treat. Speech categories are represented using a prototype-based multi-state architecture incorporating a dynamic time warping procedure. We present results for the difficult E-set task, as well as for isolated word recognition for a vocabulary of 5240 words, that reveal clear gains in performance as a result of using PBMEC. In addition, we discuss the issue of smoothing the loss function from the perspective of increasing classifier robustness.
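The core of the MCE/GPD framework is a smoothed, differentiable stand-in for the 0/1 error count. A minimal sketch of the per-token loss (using a hard max over competitors rather than the soft-max form of the papers, and an illustrative alpha):

```python
import numpy as np

def mce_loss(scores, y, alpha=1.0):
    """Smoothed MCE loss for one token.

    scores: discriminant values g_k(x), one per class k
    y:      index of the correct class
    """
    # Misclassification measure: positive when the best wrong class wins.
    d = np.max(np.delete(scores, y)) - scores[y]
    # A sigmoid turns d into a smooth, differentiable 0/1 error count.
    return 1.0 / (1.0 + np.exp(-alpha * d))
```

Summing this loss over a training set gives a differentiable surrogate for the misclassification count, which GPD can then minimize by gradient descent.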


IEEE Transactions on Signal Processing | 1991

LVQ-based shift-tolerant phoneme recognition

Erik McDermott; Shigeru Katagiri

A shift-tolerant neural network architecture for phoneme recognition is described. The system is based on algorithms for learning vector quantization (LVQ), recently developed by Kohonen (1986, 1988), which pay close attention to approximating optimal decision lines in a discrimination task. Recognition performances in the 98%-99% correct range were obtained for LVQ networks aimed at speaker-dependent recognition of phonemes in small but ambiguous Japanese phonemic classes. A correct recognition rate of 97.7% was achieved by a large LVQ network covering all Japanese consonants. These recognition results are as good as those obtained in the time delay neural network system developed by Waibel et al. (1989), and suggest that LVQ could be the basis for a high-performance speech recognition system.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Discriminative training based on an integrated view of MPE and MMI in margin and error space

Erik McDermott; Shinji Watanabe; Atsushi Nakamura

Recent work has demonstrated that the Maximum Mutual Information (MMI) objective function is mathematically equivalent to a simple integral of recognition error, if the latter is expressed as a margin-based Minimum Phone Error (MPE) style error-weighted objective function. This led to the proposal of a general approach to discriminative training based on integrals of MPE-style loss, calculated using “differenced MMI” (dMMI), a finite difference of MMI functionals evaluated at the edges of a margin interval. This article aims to clarify the essence and practical consequences of the new framework. The recently proposed Error-Indexed Forward-Backward Algorithm is used to visualize the close agreement between dMMI and MPE statistics for narrow margin intervals, and to illustrate the flexible control of the weight that can be given to different error levels using broader intervals. New speech recognition results are presented for the MIT OpenCourseWare/MIT-World corpus, showing small performance gains for dMMI compared to MPE for some choices of margin interval. Evaluation with an expanded 44K word trigram language model confirms that dMMI with a narrow margin interval yields the same performance as MPE.
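The finite-difference construction can be sketched numerically. The margin-boosting form and sign conventions below are assumptions for illustration, not the paper's exact definitions:

```python
import numpy as np

def boosted_mmi(log_probs, errors, ref, sigma):
    """MMI objective for one utterance with a margin term: each hypothesis s
    is scaled by exp(sigma * E_s), where E_s is its error count relative to
    the reference. (Illustrative form; sign conventions vary by paper.)"""
    return log_probs[ref] - np.logaddexp.reduce(log_probs + sigma * errors)

def dmmi(log_probs, errors, ref, sigma1, sigma2):
    """differenced MMI (dMMI): a finite difference of the margin-based MMI
    objective evaluated at the two edges of the margin interval."""
    return (boosted_mmi(log_probs, errors, ref, sigma2)
            - boosted_mmi(log_probs, errors, ref, sigma1)) / (sigma2 - sigma1)
```

For a narrow interval around zero, the finite difference approaches the derivative of the margin-based objective, i.e. the (negated) posterior-weighted expected error, which is the MPE-style statistic; broader intervals re-weight the contribution of different error levels.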


International Conference on Acoustics, Speech, and Signal Processing | 2004

Minimum classification error training of landmark models for real-time continuous speech recognition

Erik McDermott; Timothy J. Hazen

Though many studies have shown the effectiveness of the minimum classification error (MCE) approach to discriminative training of HMM for speech recognition, few if any have reported MCE results for large (> 100 hours) training sets in the context of real-world, continuous speech recognition. Here we report large gains in performance for the MIT JUPITER weather information task as a result of MCE-based batch optimization of acoustic models. Investigation of word error rate versus computation time showed that small MCE models significantly outperform the maximum likelihood (ML) baseline at all points of equal computation time, resulting in up to 20% word error rate reduction for in-vocabulary utterances. The overall MCE loss function was minimized using Quickprop, a simple but effective second-order optimization method suited to parallelization over large training sets.
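Quickprop, the optimizer mentioned above, fits a parabola through the two most recent gradient evaluations of each parameter and jumps to the parabola's minimum. A one-parameter sketch (the growth cap mu and the fallback step are illustrative choices):

```python
def quickprop_step(w, grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """One Quickprop update for a single parameter.

    Fits a parabola through the last two gradient evaluations and jumps
    to its minimum; falls back to plain gradient descent on the first
    step, and caps growth at mu times the previous step for stability.
    """
    if prev_step != 0.0 and prev_grad != grad:
        step = prev_step * grad / (prev_grad - grad)   # parabola minimum
        if abs(step) > mu * abs(prev_step):
            step = mu * abs(prev_step) * (1.0 if step > 0 else -1.0)
    else:
        step = -lr * grad                              # gradient-descent fallback
    return w + step, step
```

Because each parameter's update depends only on its own current and previous gradient, the statistics parallelize cleanly over large training sets, which is what makes the method attractive for batch MCE optimization.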


Computer Speech & Language | 2004

A derivation of minimum classification error from the theoretical classification risk using Parzen estimation

Erik McDermott; Shigeru Katagiri

The minimum classification error (MCE) framework is an approach to discriminative training for pattern recognition that explicitly incorporates a smoothed version of classification performance into the recognizer design criterion. Many studies have confirmed the effectiveness of MCE for speech recognition. In this article, we present a theoretical analysis of the smoothness of the MCE loss function. Specifically, we show that the MCE criterion function is equivalent to a Parzen window-based estimate of the theoretical classification risk. In this analysis, each training token is mapped to the center of a Parzen kernel in the domain of a suitably defined random variable. The kernels are summed to produce a density estimate; this estimate in turn can easily be integrated over the domain of incorrect classifications, yielding the risk estimate. The expression of risk for each kernel corresponds directly to the usual MCE loss function. The specific form of the Parzen window corresponds to the specific form of the MCE loss function. The derivation presented here shows that the smooth MCE loss function, far from being an ad-hoc approximation of the true error, can be seen as the direct consequence of using a well-understood type of smoothing, Parzen estimation, to estimate the theoretical risk from a finite training set. This analysis provides a novel link between the MCE empirical cost measured on a finite training set and the theoretical classification risk.
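The equivalence can be checked numerically: with a logistic Parzen kernel, integrating the kernel density over the error region d > 0 reproduces the sigmoid MCE loss. The kernel choice and constants below are illustrative:

```python
import numpy as np

def mce_empirical_loss(d_vals, alpha=2.0):
    """MCE empirical cost: mean sigmoid loss over the misclassification
    measures d_i of a training set (d_i > 0 means token i is misclassified)."""
    d = np.asarray(d_vals, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(-alpha * d))))

def parzen_risk_estimate(d_vals, alpha=2.0, hi=50.0, n=200001):
    """Risk estimate: place a logistic Parzen kernel at each d_i, sum the
    kernels into a density, and integrate over the error region d > 0.
    (Numerical sketch; kernel width 1/alpha matches the sigmoid slope.)"""
    grid = np.linspace(0.0, hi, n)
    dens = np.zeros_like(grid)
    for di in np.asarray(d_vals, dtype=float):
        z = np.exp(-alpha * (grid - di))
        dens += alpha * z / (1.0 + z) ** 2          # logistic kernel pdf
    dens /= len(d_vals)
    # trapezoidal integration over [0, hi]
    return float(np.sum((dens[1:] + dens[:-1]) / 2.0) * (grid[1] - grid[0]))
```

The two quantities agree because the logistic kernel's tail mass above zero, for a kernel centred at d_i, is exactly the sigmoid of alpha times d_i.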


International Conference on Acoustics, Speech, and Signal Processing | 1992

Prototype-based discriminative training for various speech units

Erik McDermott; Shigeru Katagiri

In previous work, the authors reported high classification rates for learning vector quantization (LVQ) networks trained to classify phoneme tokens shifted in time. It has since been shown that LVQ is a special case of a more general method, generalized probabilistic descent (GPD), for gradient descent on a rigorously defined classification loss measure that closely reflects the misclassification rate. The authors extend LVQ into a prototype-based classifier appropriate for the classification of various long speech units. For word recognition, a dynamic time warping procedure is integrated into the GPD learning procedure. The resulting minimum error classifier (MEC) is no longer a purely LVQ-like method, and it is called the prototype-based minimum error classifier (PBMEC). Results for the difficult Bell Labs E-set task as well as for speaker-dependent isolated word recognition for a vocabulary of 5240 words are presented. They reveal clear gains in performance as a result of using PBMEC.


International Conference on Acoustics, Speech, and Signal Processing | 1991

Speaker-independent large vocabulary word recognition using an LVQ/HMM hybrid algorithm

Hitoshi Iwamida; Shigeru Katagiri; Erik McDermott

An LVQ-HMM (learning vector quantization/hidden Markov model) hybrid algorithm was evaluated. For this evaluation, large vocabulary (5240 word) Japanese word recognition and Japanese phrase recognition were examined. Experiments in both a speaker-dependent and a speaker-independent mode were conducted. Comparison with conventional HMMs showed that LVQ-HMM improved recognition rates for words and phrases as well as for phonemes. In the speaker-dependent mode, LVQ-HMM yielded clear increases in word/phrase recognition accuracy; improvements in the recognition rates ranged between 0.8% and 4.3%. In the speaker-independent mode, however, increases in the word/phrase accuracy were small, but the results suggested some points for further study, e.g., an increase of 8% in recognition rate was achieved for one of the unknown (test) speakers, and LVQ-HMM seemed particularly effective in improving the performance for the test speakers on which conventional HMMs performed poorly.

Collaboration


Erik McDermott's top co-authors:

Atsushi Nakamura (Nippon Telegraph and Telephone)
Shinji Watanabe (Mitsubishi Electric Research Laboratories)
Yasuhiro Minami (Nippon Telegraph and Telephone)
Ehsan Variani (Johns Hopkins University)
Takahiro Adachi (Graduate University for Advanced Studies)