Douglas B. Paul
Massachusetts Institute of Technology
Publications
Featured research published by Douglas B. Paul.
human language technology | 1992
Douglas B. Paul; Janet M. Baker
The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.
international conference on acoustics, speech, and signal processing | 1990
Richard C. Rose; Douglas B. Paul
A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented. The baseline keyword recognition system is described, and techniques for dealing with nonkeyword speech and linear channel effects are discussed. The training of acoustic models to provide an explicit representation of nonvocabulary speech is investigated. A likelihood ratio scoring procedure is used to account for sources of variability affecting keyword likelihood scores. An acoustic class-dependent spectral normalization procedure is used to provide explicit compensation for linear channel effects. On a standard conversational speech task with a 20-keyword vocabulary, the recognizer reaches an 82% probability of detection at 12 false alarms per keyword per hour.
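The likelihood-ratio scoring idea lends itself to a short sketch. The Python below is a minimal illustration, not the paper's implementation; keyword_model and filler_model are hypothetical objects whose log_likelihood methods are assumed to return HMM log-likelihoods of the observation frames.

    def keyword_score(frames, keyword_model, filler_model):
        """Likelihood-ratio score for a putative keyword segment.

        Subtracting the filler (non-keyword) log-likelihood cancels sources
        of variability that affect both models, such as the speaker and the
        linear channel.
        """
        ll_keyword = keyword_model.log_likelihood(frames)
        ll_filler = filler_model.log_likelihood(frames)
        # Per-frame normalization keeps scores comparable across segment lengths.
        return (ll_keyword - ll_filler) / max(len(frames), 1)

    def detect(frames, keyword_model, filler_model, threshold):
        # The threshold trades probability of detection against false-alarm rate.
        return keyword_score(frames, keyword_model, filler_model) >= threshold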
international conference on acoustics, speech, and signal processing | 1987
Richard P. Lippmann; Edward A. Martin; Douglas B. Paul
A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced talking styles. This technique was tested using a speech database that included speech produced under the stress of a workload task and while intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (5 normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). With multi-style training, the average error rate under stress and normal conditions fell by more than a factor of two, and the average error rate under conditions sampled during training fell by a factor of four.
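As a rough sketch of the data side of multi-style training, the Python below pools one token per talking style for each word instead of several normally spoken tokens; tokens_by_style and train_hmm are hypothetical stand-ins for the recorded data and the recognizer's HMM training routine.

    STYLES = ["normal", "fast", "clear", "loud", "question-pitch"]

    def multistyle_training_set(tokens_by_style, word):
        # tokens_by_style maps (word, style) -> one recorded token (feature frames).
        return [tokens_by_style[(word, style)] for style in STYLES]

    def train_multistyle(vocabulary, tokens_by_style, train_hmm):
        # train_hmm is an assumed routine that estimates a word HMM from a
        # list of example tokens (e.g., via Baum-Welch re-estimation).
        return {word: train_hmm(multistyle_training_set(tokens_by_style, word))
                for word in vocabulary}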
international conference on acoustics, speech, and signal processing | 1991
Douglas B. Paul
Two algorithms are presented for accelerating the operation of a stack decoder. The first is a method for computing the true least upper bound so that an optimal admissible A* search can be performed. The second is a set of methods for linearizing the computation required by a stack decoder. The A* search has been implemented in a continuous speech recognizer simulator and has demonstrated a significant speedup. The linearizing algorithm has been partially implemented in the simulator and has also shown significant computational savings.
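A best-first stack search driven by an admissible upper bound has the general shape sketched below in Python; expand, is_complete, and score_upper_bound are hypothetical callbacks standing in for hypothesis extension, the end-of-utterance test, and the least-upper-bound heuristic, so this is an outline of the search strategy rather than the paper's decoder.

    import heapq
    import itertools

    def stack_decode(initial_hyp, expand, is_complete, score_upper_bound):
        # heapq is a min-heap, so scores are negated to pop the best-scoring
        # hypothesis first; the counter breaks ties without comparing hypotheses.
        counter = itertools.count()
        stack = [(-score_upper_bound(initial_hyp), next(counter), initial_hyp)]
        while stack:
            _, _, hyp = heapq.heappop(stack)
            if is_complete(hyp):
                # With an admissible (true least upper) bound, the first
                # complete hypothesis popped is optimal.
                return hyp
            for next_hyp in expand(hyp):
                heapq.heappush(stack, (-score_upper_bound(next_hyp),
                                       next(counter), next_hyp))
        return None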
international conference on acoustics, speech, and signal processing | 1989
Douglas B. Paul
The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models.
international conference on acoustics, speech, and signal processing | 1988
Douglas B. Paul; Edward A. Martin
Most speech recognizers are sensitive to the speech style and the speaker's environment. This system extends an earlier robust continuous-observation HMM IWR system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database. Performance on a 207-word, perplexity-14 task is 0.9% word error rate under office conditions, and 2.5% (best speaker) and 5% (4-speaker average) for the normal test condition of the database.
international conference on acoustics, speech, and signal processing | 1987
Douglas B. Paul
Most current speech recognition systems are sensitive to variations in speaking style. The following describes an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant of such speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated word database.
international conference on acoustics, speech, and signal processing | 1985
Douglas B. Paul
Hidden Markov models (HMM) are the basis for some of the more successful systems for continuous and discrete utterance speech recognition. One of the reasons for the success of these models is their ability to train automatically from marked speech data. The currently known forward-backward and gradient training methods suffer from the problem that they converge to a local maximum rather than to the global maximum. Simulated annealing is a stochastic optimization procedure which can escape a local optimum in the hope of finding the global optimum when presented with a system which contains many local optima. This paper shows how simulated annealing may be used to train HMM systems. It is experimentally shown to locate what appears to be the global maximum with a higher probability than the forward-backward algorithm.
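A minimal sketch of the annealing loop is given below in Python, assuming hypothetical log_likelihood and perturb functions that score the training data under a parameter set and propose a random, properly normalized neighboring set; it shows the accept/reject rule that lets the search escape local maxima, not the paper's exact schedule.

    import math
    import random

    def anneal(params, log_likelihood, perturb,
               t0=1.0, cooling=0.95, steps_per_temp=50, t_min=1e-3):
        current = best = params
        cur_ll = best_ll = log_likelihood(params)
        t = t0
        while t > t_min:
            for _ in range(steps_per_temp):
                candidate = perturb(current)
                cand_ll = log_likelihood(candidate)
                delta = cand_ll - cur_ll
                # Always accept improvements; accept worse moves with
                # probability exp(delta / T) so the search can leave a local maximum.
                if delta >= 0 or random.random() < math.exp(delta / t):
                    current, cur_ll = candidate, cand_ll
                    if cur_ll > best_ll:
                        best, best_ll = current, cur_ll
            t *= cooling  # geometric cooling schedule (an assumption, not the paper's)
        return best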
human language technology | 1989
Douglas B. Paul
The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA Resource Management task (991 word vocabulary, perplexity 60 word-pair grammar) [1] is 3.4% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied mixture triphone models.
international conference on acoustics, speech, and signal processing | 1993
Douglas B. Paul; Burhan F. Necioglu
Recognition of the Wall Street Journal (WSJ) pilot database, a continuous-speech-recognition (CSR) database that supports 5K-, 20K-, and up to 64K-word CSR tasks, is examined. The original Lincoln tied-mixture hidden Markov model (HMM) CSR used a time-synchronous beam-pruned search of a static network, which does not extend well to this task because the recognition network would be too large. The recognizer has therefore been converted to a stack-decoder-based search strategy. This decoder has been shown to function effectively on up to 64K-word recognition of continuous speech. Recognition-time adaptation has also been added to the recognizer. The acoustic modeling techniques and the implementation of the stack decoder used to obtain these results are described.