Publication


Featured research published by Douglas B. Paul.


human language technology | 1992

The design for the wall street journal-based CSR corpus

Douglas B. Paul; Janet M. Baker

The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.
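
Several of the abstracts below quote perplexity figures (e.g. "perplexity 60 word-pair grammar"). As background, and not drawn from the paper itself, perplexity is the standard effective-branching-factor measure of a language model:

    PP = 2^{H}, \qquad H = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid w_1, \dots, w_{i-1})

so a perplexity of 60 means the recognizer faces, on average, a choice among roughly 60 equally likely words at each decision point.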


international conference on acoustics, speech, and signal processing | 1990

A hidden Markov model based keyword recognition system

Richard C. Rose; Douglas B. Paul

A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented. The baseline keyword recognition system is described, and techniques for dealing with nonkeyword speech and linear channel effects are discussed. The training of acoustic models to provide an explicit representation of nonvocabulary speech is investigated. A likelihood ratio scoring procedure is used to account for sources of variability affecting keyword likelihood scores. An acoustic class-dependent spectral normalization procedure is used to provide explicit compensation for linear channel effects. Keyword recognition results for a standard conversational speech task with a 20-keyword vocabulary reach 82% probability of detection at a false alarm rate of 12 false alarms per keyword per hour.
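
The likelihood ratio scoring procedure mentioned above can be illustrated with a minimal sketch: the keyword model's score over a putative segment is normalized by a filler (nonkeyword) model's score over the same frames, which cancels segment-level variability such as speaker and channel effects. The function names, the per-frame normalization, and the threshold below are illustrative assumptions, not details taken from the paper.

    def likelihood_ratio_score(log_p_keyword: float, log_p_filler: float,
                               num_frames: int) -> float:
        """Per-frame log likelihood ratio for a putative keyword segment.

        log_p_keyword : log P(observations | keyword HMM)
        log_p_filler  : log P(observations | filler/nonkeyword HMM), same frames
        """
        return (log_p_keyword - log_p_filler) / num_frames

    def detect(log_p_keyword: float, log_p_filler: float,
               num_frames: int, threshold: float = 0.0) -> bool:
        # A detection is declared when the normalized score exceeds a threshold
        # tuned to trade detection probability against false alarms per hour.
        return likelihood_ratio_score(log_p_keyword, log_p_filler, num_frames) > threshold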


international conference on acoustics, speech, and signal processing | 1987

Multi-style training for robust isolated-word speech recognition

Richard P. Lippmann; Edward A. Martin; Douglas B. Paul

A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced, talking styles. This technique was tested using a speech database that included stress speech produced during a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (5 normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training, and the average error rate under conditions sampled during training fell by a factor of four.
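
A minimal sketch of the data side of multi-style training, assuming a simple in-memory token store: the training set for each word pools one token from each talking style instead of several normally spoken tokens, and the pooled set is then fed to an otherwise unchanged HMM trainer. The data structure and function name are illustrative, not from the paper.

    STYLES = ["normal", "fast", "clear", "loud", "question-pitch"]

    def multi_style_training_set(tokens_by_style):
        """tokens_by_style[style][word] is a list of recorded tokens.

        Returns a dict mapping each word to one token per talking style,
        i.e. the multi-style training set described in the abstract.
        """
        training = {}
        for style in STYLES:
            for word, tokens in tokens_by_style[style].items():
                training.setdefault(word, []).append(tokens[0])
        return training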


international conference on acoustics, speech, and signal processing | 1991

Algorithms for an optimal A* search and linearizing the search in the stack decoder

Douglas B. Paul

Two algorithms are presented for accelerating the operation of a stack decoder. The first is a method for computing the true least upper bound so that an optimal admissible A* search can be performed. The second is a set of methods for linearizing the computation required by a stack decoder. The A* search has been implemented in a continuous speech recognizer simulator and has demonstrated a significant speedup. The linearizing algorithm has been partially implemented in the simulator and has also shown significant computational savings.
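
A minimal best-first sketch of the stack decoder's A* search, with the "true least upper bound" abstracted into an upper_bound callback: the hypothesis with the highest bound is extended first, and with an admissible bound the first complete hypothesis popped is optimal. The callback names and data layout are assumptions for illustration; the paper's linearization methods are not shown.

    import heapq
    import itertools

    def stack_decode(initial_hyp, expand, is_complete, upper_bound):
        """Best-first (A*) stack decoder skeleton.

        expand(h)      : yields successor partial hypotheses of h
        is_complete(h) : True when h accounts for the whole utterance
        upper_bound(h) : partial-path log likelihood plus an admissible
                         (never pessimistic) estimate of the best completion
        """
        counter = itertools.count()  # tie-breaker so the heap never compares hypotheses
        stack = [(-upper_bound(initial_hyp), next(counter), initial_hyp)]
        while stack:
            _, _, hyp = heapq.heappop(stack)  # best-scoring hypothesis first
            if is_complete(hyp):
                return hyp                    # optimal under an admissible bound
            for succ in expand(hyp):
                heapq.heappush(stack, (-upper_bound(succ), next(counter), succ))
        return None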


international conference on acoustics, speech, and signal processing | 1989

The Lincoln robust continuous speech recognizer

Douglas B. Paul

The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models.


international conference on acoustics, speech, and signal processing | 1988

Speaker stress-resistant continuous speech recognition

Douglas B. Paul; Edward A. Martin

Most speech recognizers are sensitive to the speech style and the speaker's environment. This system extends an earlier robust continuous-observation HMM IWR system to continuous speech using the DARPA-robust (multi-condition with a pilot's facemask) database. Performance on a 207-word, perplexity-14 task is 0.9% word error rate under office conditions, and 2.5% (best speaker) and 5% (4-speaker average) for the normal test condition of the database.


international conference on acoustics, speech, and signal processing | 1987

A speaker-stress resistant HMM isolated word recognizer

Douglas B. Paul

Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress. More than an order-of-magnitude reduction in the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated word database.


international conference on acoustics, speech, and signal processing | 1985

Training of HMM recognizers by simulated annealing

Douglas B. Paul

Hidden Markov models (HMM) are the basis for some of the more successful systems for continuous and discrete utterance speech recognition. One of the reasons for the success of these models is their ability to train automatically from marked speech data. The currently known forward-backward and gradient training methods suffer from the problem that they converge to a local maximum rather than to the global maximum. Simulated annealing is a stochastic optimization procedure which can escape a local optimum in the hope of finding the global optimum when presented with a system which contains many local optima. This paper shows how simulated annealing may be used to train HMM systems. It is experimentally shown to locate what appears to be the global maximum with a higher probability than the forward-backward algorithm.
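
A generic simulated-annealing loop for maximizing an HMM training-set likelihood, shown only to make the idea concrete: downhill moves are occasionally accepted, with a probability that shrinks as the temperature falls, so the search can escape local maxima. The perturbation operator, cooling schedule, and parameter values here are placeholder assumptions, not the procedure reported in the paper.

    import math
    import random

    def anneal_hmm(params, log_likelihood, perturb,
                   t_init=1.0, t_final=1e-3, cooling=0.95, steps_per_temp=50):
        """params         : current HMM parameter set (opaque to this loop)
        log_likelihood : objective to maximize, e.g. total forward-algorithm
                         log probability of the training data
        perturb        : returns a randomly modified copy of the parameters
        """
        best = current = params
        best_ll = current_ll = log_likelihood(params)
        t = t_init
        while t > t_final:
            for _ in range(steps_per_temp):
                candidate = perturb(current)
                cand_ll = log_likelihood(candidate)
                delta = cand_ll - current_ll
                # Accept improvements always; accept worsenings with
                # probability exp(delta / T) (Metropolis criterion).
                if delta >= 0 or random.random() < math.exp(delta / t):
                    current, current_ll = candidate, cand_ll
                    if current_ll > best_ll:
                        best, best_ll = current, current_ll
            t *= cooling
        return best, best_ll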


human language technology | 1989

The Lincoln Continuous Speech Recognition system: recent developments and results

Douglas B. Paul

The Lincoln stress-resistant HMM CSR has been extended to large vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA Resource Management task (991 word vocabulary, perplexity 60 word-pair grammar) [1] is 3.4% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied mixture triphone models.


international conference on acoustics, speech, and signal processing | 1993

The Lincoln large-vocabulary stack-decoder HMM CSR

Douglas B. Paul; Burhan F. Necioglu

Recognition of the Wall Street Journal (WSJ) pilot database, a continuous-speech-recognition (CSR) database which supports 5K-, 20K-, and up to 64K-word CSR tasks, is examined. The original Lincoln tied-mixture hidden Markov model (HMM) CSR was implemented using a time-synchronous beam-pruned search of a static network, which does not extend well to this task because the recognition network would be too large. Therefore, the recognizer has been converted to a stack-decoder-based search strategy. This decoder has been shown to function effectively on up to 64K-word recognition of continuous speech. Recognition-time adaptation has also been added to the recognizer. The acoustic modeling techniques and the implementation of the stack decoder used to obtain these results are described.

Collaboration


Dive into Douglas B. Paul's collaborations.

Top Co-Authors

Clifford J. Weinstein (Massachusetts Institute of Technology)
Edward A. Martin (Massachusetts Institute of Technology)
Richard P. Lippmann (Massachusetts Institute of Technology)
Burhan F. Necioglu (Massachusetts Institute of Technology)
Richard C. Rose (Massachusetts Institute of Technology)