
Publication


Featured research published by David B. Roe.


IEEE Communications Magazine | 1993

Whither speech recognition: the next 25 years

David B. Roe; Jay G. Wilpon

The fundamentals of speech recognition are reviewed. The dimensions of the speech recognition task, speech feature analysis, pattern classification using hidden Markov models, language processing, and the current accuracy of speech recognition systems are discussed. The applications of speech recognition in telecommunications, voice dictation, speech understanding for data retrieval, and consumer products are examined.


international conference on acoustics, speech, and signal processing | 1987

Speech recognition with a noise-adapting codebook

David B. Roe

Speech recognizers trained in quiet conditions but operating in noise usually have poor accuracy. This paper reports two methods for improving the accuracy of an LPC vector-quantization speech recognizer by adapting the vector codebook to noisy conditions. First, each codebook vector is changed to reflect the way people speak in noise. Second, the estimated spectrum of the background noise is added to the codebook vectors. These ideas have been tested on a total of 2400 utterances of digits recorded in a car by 4 speakers. A baseline word spotter similar to NTT's SPLIT system was modified by adapting its vector codebook to noise. This adapted codebook, when used with a new word decision criterion, yields error rates at least 4 times lower for noisy conditions. The accuracy is significantly better than without codebook adaptation techniques.
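The second adaptation method described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: function names are hypothetical, and the codebook is treated as a list of power-spectrum vectors so that the estimated noise spectrum can simply be added to each codeword (power spectra of uncorrelated signals add).

```python
# Illustrative sketch (not the paper's implementation) of adapting a
# VQ codebook to noise by adding the estimated background-noise
# spectrum to every codeword.

def adapt_codebook(codebook, noise_spectrum):
    """codebook: list of power-spectrum vectors; noise_spectrum: same length."""
    return [[s + n for s, n in zip(vec, noise_spectrum)]
            for vec in codebook]

def quantize(frame, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def dist(vec):
        return sum((f - v) ** 2 for f, v in zip(frame, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))
```

A noisy input frame that would be mis-quantized against the clean codebook can map to the intended codeword once the codewords themselves reflect the noise.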


international conference on acoustics, speech, and signal processing | 1992

Efficient grammar processing for a spoken language translation system

David B. Roe; Fernando Pereira; Richard Sproat; Michael Riley; Pedro J. Moreno; Alejandro Macarrón

A problem with many speech understanding systems is that grammars that are more suitable for representing the relation between sentences and their meanings, such as context-free grammars (CFGs) and augmented phrase structure grammars (APSGs), are computationally very demanding. On the other hand, finite-state grammars are efficient, but cannot directly represent the sentence-meaning relation. The authors describe how speech recognition and language analysis can be tightly coupled by developing an APSG for the analysis component and deriving automatically from it a finite-state approximation that is used as the recognition language model. Using this technique, the authors have built an efficient translation system that is fast compared to others with comparably sized language models.
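The general idea of a finite-state approximation can be illustrated with a toy over-approximation. The paper derives its approximation automatically from the APSG itself; the sketch below is much cruder and purely illustrative: it records which word bigrams the grammar can generate and then accepts any sentence whose bigrams are all allowed, yielding a finite-state superset of the original language that is cheap to apply during recognition, with over-generation pruned later by the full parser.

```python
# Toy illustration (not the paper's algorithm) of approximating a
# grammar's language with a finite-state model built from allowed
# word bigrams. The approximation over-generates: it accepts every
# grammatical sentence plus some ungrammatical ones.

def bigram_fsa(sentences):
    """Build the allowed-bigram set from grammar-generated sentences."""
    allowed = set()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        allowed.update(zip(padded, padded[1:]))
    return allowed

def accepts(allowed, words):
    """Accept a sentence iff every bigram in it is allowed."""
    padded = ["<s>"] + words + ["</s>"]
    return all(b in allowed for b in zip(padded, padded[1:]))
```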


conference of the international speech communication association | 1992

A spoken language translator for restricted-domain context-free languages

David B. Roe; Pedro J. Moreno; Richard Sproat; Fernando Pereira; Michael Riley; Alejandro Macarrón

Abstract An effort is underway at AT&T Bell Laboratories and Telefónica Investigación y Desarrollo to build a restricted-domain spoken language translation system, which we call VEST (Voice English/Spanish Translator). The eventual goal is a voice output translator which is speaker-independent and has a vocabulary of several thousand words covering a specific application. This paper describes the first step of our research, a system which recognizes two speakers in each of Spanish and English and is limited to some four hundred words. The key new idea is that the speech recognition and the language analysis are tightly coupled by using the same language model, an augmented phrase-structure grammar, for both.


IEEE Communications Magazine | 1990

A perspective on speech recognition

Stephen E. Levinson; David B. Roe

The authors outline the science behind speech recognition technology and describe briefly the contributions of engineering, computer science, and mathematics to it. They discuss the state of the art in both technique and performance, including some examples of successful applications. This is followed by a critical evaluation of the technology with respect to technical, commercial, and societal criteria. They conclude that even with today's suboptimal technology, there are types of applications in which useful deployment is possible and desirable, but that those applications that will transform our society must wait until speech recognizers have nearly the capabilities of humans.


IEEE Transactions on Parallel and Distributed Systems | 1994

Spoken language recognition on a DSP array processor

Stephen C. Glinski; David B. Roe

A new architecture is presented to support the general class of real-time large-vocabulary speaker-independent continuous speech recognizers incorporating language models. Many such recognizers require multiple high-performance central processing units (CPUs) as well as high interprocessor communication bandwidth. This array processor provides a peak CPU performance of 2.56 giga-floating point operations per second (GFLOPS) as well as a high-speed communication network. In order to efficiently utilize these resources, algorithms were devised for partitioning speech models for mapping into the array processor. Also, a novel scheme is presented for a functional partitioning of the speech recognizer computations. The recognizer is functionally partitioned into six stages, namely, the linear predictive coding (LPC) based feature extractor, mixture probability computer, (phone) state probability computer, word probability computer, phrase probability computer, and traceback computer. Each of these stages is further subdivided as many times as necessary to fit the individual processing elements (PEs). The functional stages are pipelined and synchronized with the frame rate of the incoming speech signal. This partitioning also allows a multistage stack decoder to be implemented for reduction of computation.
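The frame-synchronous data flow through the six functional stages can be sketched as below. This is a hypothetical illustration only: on the array processor each stage runs concurrently on its own group of processing elements, pipelined at the frame rate, whereas this sequential sketch simply chains placeholder stage functions per frame to show the flow of data.

```python
# Hypothetical sketch of the six-stage functional partition: each
# incoming frame is pushed through every stage in order. The stage
# functions here are placeholders standing in for feature extraction,
# mixture, state, word, and phrase probability computation, and
# traceback.

def run_pipeline(frames, stages):
    """Process each frame through all pipeline stages in order."""
    outputs = []
    for frame in frames:
        value = frame
        for stage in stages:
            value = stage(value)
        outputs.append(value)
    return outputs

# Six placeholder stages that tag the frame as it passes through.
stages = [lambda x, i=i: x + [f"stage{i}"] for i in range(6)]
```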


international conference on acoustics, speech, and signal processing | 1988

Parallel level-building on a tree machine (speech recognition)

Allen L. Gorin; David B. Roe

The authors describe a parallel frame-synchronous level-building algorithm, utilizing HMM word models, for connected-speech recognition on a tree-structured parallel computer. The algorithm is scalable in the sense that the source code and execution time remain essentially the same as vocabulary size increases, so long as the hardware is scaled proportionally. An illustrative sizing and timing analysis of a speaker-independent connected-digit recognizer on the ASPEN tree machine is described. This algorithm executes in real-time and achieves 98.3% string accuracy on the Texas Instruments digit database.
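A sequential version of the underlying level-building dynamic program can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `word_cost(w, s, t)` is a hypothetical stand-in for the HMM match cost of word `w` against frames `s..t`, and the minimization over the vocabulary at each level is the part that the tree machine distributes across processors.

```python
# Illustrative level-building DP for connected-word recognition:
# best[l][t] = lowest cost of matching frames 0..t with exactly l
# words, built level by level from word-level match costs.

def level_building(num_frames, words, max_words, word_cost):
    INF = float("inf")
    best = [[INF] * (num_frames + 1) for _ in range(max_words + 1)]
    best[0][0] = 0.0
    for level in range(1, max_words + 1):
        for end in range(1, num_frames + 1):
            for start in range(end):
                if best[level - 1][start] == INF:
                    continue
                for w in words:  # this loop is parallelized on the tree machine
                    c = best[level - 1][start] + word_cost(w, start, end)
                    best[level][end] = min(best[level][end], c)
    # best total cost over any permitted string length
    return min(best[l][num_frames] for l in range(1, max_words + 1))
```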


Journal of the Acoustical Society of America | 1988

Improved training procedures for hidden Markov models

Lawrence R. Rabiner; Chin-Hui Lee; Biing-Hwang Juang; David B. Roe; Jay G. Wilpon

Techniques for training hidden Markov model (HMM) parameters from a labeled training set of data are well established and include the forward-backward algorithm as well as the segmental K-means algorithm. These algorithms have been shown to be capable of estimating the parameters of an HMM based on mathematically well-founded techniques. In practice, however, difficulties are often encountered when estimating some of the HMM parameters. These difficulties are generally the result of having insufficient training data to give robust and reliable parameter estimates. Typically, the model parameters most affected by having insufficient training data are the spectral parameter variance estimates, and the estimates of parameters related to the modeling of state duration. Although techniques have been proposed for improving estimates of the variances due to the effects of insufficient training data, the results have not proven adequate in some cases. As such, improved training techniques (which give better recognition performance) have been devised for controlling the minimum variance estimate of any spectral parameter, and for thresholding and clipping state duration parameter estimates. These improved training methods have been tested on several databases with good success. In addition, advanced techniques for creating multiple HMMs from the training data (i.e., for speaker independent recognition) have been devised and have proven successful for modeling large databases of training material.
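The variance-control and duration-clipping ideas described above can be sketched in a few lines. This is a hypothetical illustration of the general technique, not the paper's implementation; the floor and clipping thresholds are made-up values.

```python
# Illustrative sketch of controlling sparsely trained HMM estimates:
# clamp each spectral variance to a minimum floor so no Gaussian
# becomes pathologically sharp, and clip state-duration estimates to
# a plausible range.

def floor_variances(variances, floor=1e-3):
    """Replace any variance estimate below `floor` with `floor`."""
    return [max(v, floor) for v in variances]

def clip_duration(d, d_min=1, d_max=50):
    """Clamp a state-duration estimate (in frames) to [d_min, d_max]."""
    return min(max(d, d_min), d_max)
```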


conference of the international speech communication association | 1992

Speaker independent recognition of spontaneously spoken connected digits

Padma Ramesh; Jay G. Wilpon; Maureen A. McGee; David B. Roe; Chin-Hui Lee; Lawrence R. Rabiner

Abstract An important area of speech recognition is automatic recognition of connected digit strings (i.e., sequences composed of the digits zero through nine, and oh). Applications of this technology include credit card authorization, catalog ordering, dialing of telephone numbers, and data entry. For the past two years AT&T has experimented with a system for automatic recognition of 10-digit merchant identification codes and 15-digit customer credit card numbers, for the purpose of authorizing purchases charged to a credit card. Our evaluation used data collected from about 1000 customers who provided 2000 connected digit strings over dialed-up 800-number telephone connections. The recognizer correctly recognized 97% of the digit strings with no rejections using constraints on the validity of both merchant identifications and credit card numbers. Several schemes for applying these task constraints in a practical implementation are discussed in this paper. Also, recognition of the dollar amount of the transaction is presented with some preliminary results.
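The abstract does not specify the validity checks used, but the standard Luhn checksum is the usual constraint on credit card numbers, so it serves here as an illustration of how a task constraint can prune recognition errors: a recognized digit string that fails the checksum can be rejected or re-scored.

```python
# Illustrative task constraint (not necessarily the one AT&T used):
# the Luhn checksum on a recognized credit-card digit string. From
# the right, every second digit is doubled (subtracting 9 if the
# result exceeds 9); the total must be divisible by 10.

def luhn_valid(digits):
    """digits: list of ints, most significant digit first."""
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```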


Proceedings of the National Academy of Sciences of the United States of America | 1994

Voice communication between humans and machines

David B. Roe; Jay G. Wilpon
