Publications


Featured research published by Mitch Weintraub.


International Conference on Acoustics, Speech, and Signal Processing | 1993

Large-vocabulary dictation using SRI's DECIPHER speech recognition system: progressive search techniques

Hy Murveit; John Butzberger; Vassilios Digalakis; Mitch Weintraub

The authors describe a technique called progressive search, which is useful for developing and implementing speech recognition systems with high computational requirements. The scheme iteratively uses more and more complex recognition schemes, where each iteration constrains the search space of the next. An algorithm called the forward-backward word-life algorithm is described. It can generate a word lattice in a progressive search that would be used as a language model embedded in a succeeding recognition pass to reduce computation requirements. It is shown that speed-ups of more than an order of magnitude are achievable with only minor costs in accuracy.
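
A minimal sketch of the word-lattice idea behind the forward-backward word-life algorithm: first-pass word hypotheses with start and end frames ("word lives") are turned into a successor map that a later, more expensive pass could use to constrain its search. The merging, pruning, and scoring details of the actual algorithm are omitted, and the data are invented.

```python
from collections import defaultdict

def build_word_lattice(word_lives, gap=1):
    """Toy illustration: turn first-pass word hypotheses
    (word, start frame, end frame) into a successor map."""
    successors = defaultdict(set)
    for w1, s1, e1 in word_lives:
        for w2, s2, e2 in word_lives:
            # w2 may follow w1 if it starts right after w1 ends.
            if e1 < s2 <= e1 + gap:
                successors[w1].add(w2)
    return successors

# Invented first-pass hypotheses from a cheap recognizer.
lives = [("show", 0, 12), ("me", 13, 18), ("the", 13, 20),
         ("ships", 19, 40), ("chips", 21, 40)]
print(dict(build_word_lattice(lives)))
```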


International Conference on Acoustics, Speech, and Signal Processing | 1997

Neural-network based measures of confidence for word recognition

Mitch Weintraub; Françoise Beaufays; Ze'ev Rivlin; Yochai Konig; Andreas Stolcke

This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and algorithms are illustrated with results on the Switchboard Corpus.
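
A sketch of the general approach: per-word knowledge sources are fed to a small neural network whose sigmoid output is read as the probability that the word hypothesis is correct. The feature set, network size, and data below are illustrative assumptions, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-word knowledge sources (not the paper's exact feature set):
# e.g. normalized acoustic score, language-model score, duration, N-best count.
X = rng.normal(size=(2000, 4))
# Synthetic "this word was recognized correctly" labels, only to make the
# example runnable; real labels come from scoring recognizer output.
y = (X @ np.array([1.5, 1.0, -0.5, 0.8])
     + rng.normal(scale=0.5, size=2000) > 0).astype(float)

# One hidden layer, sigmoid output read as estimated P(word is correct).
W1 = rng.normal(scale=0.1, size=(4, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(500):                       # plain batch gradient descent
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2).ravel()
    d_out = (p - y)[:, None] / len(y)      # cross-entropy gradient at the output
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    dW1, db1 = X.T @ d_h, d_h.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

print("mean confidence on correct words:", round(p[y == 1].mean(), 3))
print("mean confidence on errors:       ", round(p[y == 0].mean(), 3))
```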


Human Language Technology | 1992

Reduced channel dependence for speech recognition

Hy Murveit; John Butzberger; Mitch Weintraub

Speech recognition systems tend to be sensitive to unimportant steady-state variation in speech spectra (i.e., those caused by varying the microphone or channel characteristics). There have been many attempts to solve this problem; however, these techniques are often computationally burdensome, especially for real-time implementation. Recently, Hermansky et al. [1] and Hirsch et al. [2] have suggested a simple technique that removes slow-moving linear channel variation with little adverse effect on speech recognition performance. In this paper we examine this technique, known as RASTA filtering, and evaluate its performance when applied to SRI's DECIPHER™ speech recognition system [3]. We show that RASTA filtering succeeds in reducing DECIPHER™'s dependence on the channel.
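
A sketch of RASTA-style filtering: each log filter-bank channel is band-pass filtered along time, so a fixed (steady-state) channel offset is suppressed. The coefficients below are the commonly cited RASTA values and are not necessarily the exact configuration used with DECIPHER™.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_energies, pole=0.94):
    """Band-pass filter each log filter-bank channel along time,
    suppressing slowly varying (near-DC) components such as a fixed
    microphone/channel response.  Coefficients follow the commonly
    cited RASTA transfer function
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1);
    the exact constants used with DECIPHER may differ.
    log_energies: array of shape (num_frames, num_channels)."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    a = np.array([1.0, -pole])
    return lfilter(b, a, log_energies, axis=0)

# Toy demo: a constant channel offset added to every frame is (asymptotically)
# removed, so the two filtered versions agree once the transient has decayed.
rng = np.random.default_rng(0)
frames = np.log(np.abs(rng.standard_normal((200, 20))) + 1.0)
diff = rasta_filter(frames + 3.0) - rasta_filter(frames)
print("max difference after frame 100:", float(np.max(np.abs(diff[100:]))))
```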


International Conference on Acoustics, Speech, and Signal Processing | 1990

The DECIPHER speech recognition system

Michael Cohen; Hy Murveit; Jared Bernstein; Patti Price; Mitch Weintraub

A large-vocabulary, continuous-speech system, called DECIPHER, which is based on a hidden Markov model (HMM) approach and is designed to achieve high word accuracy in a speaker-independent mode, is described. The results of a series of experiments that test acoustic and phonological adaptation of the DECIPHER system to the pronunciations of a single speaker in a speaker-dependent task are presented. The estimation of probabilities for alternative pronunciations and of speaker-dependent phonology is also discussed.
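
One way to picture the pronunciation-probability estimation mentioned above, as a hedged toy: count which dictionary variant a forced alignment selects for each word token and convert the counts to smoothed relative frequencies. The alignment output and smoothing below are hypothetical, not the paper's procedure.

```python
from collections import Counter, defaultdict

def estimate_pronunciation_probs(aligned_variants, smoothing=0.5):
    """Estimate P(pronunciation variant | word) by smoothed relative
    frequency from (word, chosen variant) alignment tokens.  A toy
    stand-in for the estimation discussed in the paper."""
    counts = defaultdict(Counter)
    for word, variant in aligned_variants:
        counts[word][variant] += 1
    probs = {}
    for word, c in counts.items():
        total = sum(c.values()) + smoothing * len(c)
        probs[word] = {v: (n + smoothing) / total for v, n in c.items()}
    return probs

# Hypothetical forced-alignment output: (word, chosen pronunciation).
tokens = [("the", "dh ah"), ("the", "dh iy"), ("the", "dh ah"),
          ("tomato", "t ah m ey t ow"), ("tomato", "t ah m aa t ow")]
print(estimate_pronunciation_probs(tokens))
```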


Human Language Technology | 1989

SRI's DECIPHER system

Hy Murveit; Michael Cohen; Patti Price; Mitch Weintraub; Jared Bernstein

SRI has developed a speaker-independent, continuous-speech, large-vocabulary speech recognition system, DECIPHER, that provides state-of-the-art performance on the DARPA standard speaker-independent resource management training and testing materials. SRI's approach is to integrate speech and linguistic knowledge into the HMM framework. This paper describes performance improvements arising from detailed phonological modeling and from the incorporation of cross-word coarticulatory constraints.
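
A generic illustration of cross-word coarticulatory context (not DECIPHER's exact scheme): phone models are labeled with left and right contexts that run across word boundaries, so the final phone of one word is conditioned on the first phone of the next. Pronunciations below are invented.

```python
def crossword_triphones(words, lexicon):
    """Expand a word sequence into triphone labels whose left/right
    contexts cross word boundaries."""
    phones = []
    for w in words:
        phones.extend(lexicon[w])
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "sil"
        right = phones[i + 1] if i < len(phones) - 1 else "sil"
        labels.append(f"{left}-{p}+{right}")
    return labels

lexicon = {"what": ["w", "ah", "t"], "ships": ["sh", "ih", "p", "s"]}
# Note the boundary labels "ah-t+sh" and "t-sh+ih" spanning the two words.
print(crossword_triphones(["what", "ships"], lexicon))
```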


International Conference on Acoustics, Speech, and Signal Processing | 1990

Estimation using log-spectral-distance criterion for noise-robust speech recognition

Adoram Erell; Mitch Weintraub

A spectral-estimation algorithm designed to improve the noise robustness of speech-recognition systems is presented and evaluated. The algorithm is tailored for filter-bank-based systems, where the estimation seeks to minimize the distortion as measured by the recognizer's distance metric. This minimization is achieved by modeling the speech distribution as consisting of clusters; the energies at different frequency channels are assumed to be uncorrelated within each cluster. The algorithm was tested with a continuous-speech, speaker-independent hidden Markov model (HMM) recognition system using the NIST Resource Management Task speech database. When trained on a clean speech database and tested with additive white Gaussian noise, the recognition accuracy with the new algorithm is comparable to that under the ideal condition of training and testing at constant SNR. When trained on clean speech and tested with a desktop microphone in a noisy environment, the error rate is only slightly higher than that with a close-talking microphone.
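
A much-simplified sketch of the cluster-weighted MMSE idea: clean filter-bank log-energies are modeled as a mixture of clusters with uncorrelated channels, and the estimate is the posterior-weighted average of the per-cluster conditional means. The Gaussian observation model in the log domain below is a simplification of the paper's additive-noise model.

```python
import numpy as np

def mmse_cluster_estimate(y, means, variances, priors, obs_var):
    """Posterior-weighted MMSE estimate of clean log filter-bank energies.
    Clean speech is a mixture of clusters with diagonal (uncorrelated-channel)
    Gaussians; the noisy observation y is treated as clean-plus-Gaussian-error
    in the log domain, a simplification of the paper's noise model."""
    total_var = variances + obs_var
    # log p(y | cluster k): independent Gaussians across channels.
    log_lik = -0.5 * np.sum((y - means) ** 2 / total_var
                            + np.log(2.0 * np.pi * total_var), axis=1)
    log_post = np.log(priors) + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                       # P(cluster | y)
    # E[clean | y, cluster k], channel by channel (product of two Gaussians).
    cond_mean = (variances * y + obs_var * means) / total_var
    return post @ cond_mean

rng = np.random.default_rng(1)
means = rng.normal(size=(3, 4))              # 3 clusters, 4 filter-bank channels
variances = np.full((3, 4), 0.5)
priors = np.array([0.5, 0.3, 0.2])
y = means[1] + rng.normal(scale=1.0, size=4) # noisy observation near cluster 1
print("estimate:      ", mmse_cluster_estimate(y, means, variances, priors, 1.0))
print("cluster-1 mean:", means[1])
```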


Human Language Technology | 1993

Progressive-search algorithms for large-vocabulary speech recognition

Hy Murveit; John Butzberger; Vassilios Digalakis; Mitch Weintraub

We describe a technique we call Progressive Search, which is useful for developing and implementing speech recognition systems with high computational requirements. The scheme iteratively uses more and more complex recognition schemes, where each iteration constrains the search space of the next. An algorithm, the Forward-Backward Word-Life Algorithm, is described. It can generate a word lattice in a progressive search that would be used as a language model embedded in a succeeding recognition pass to reduce computation requirements. We show that speed-ups of more than an order of magnitude are achievable with only minor costs in accuracy.
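
A companion sketch to the lattice-construction example under the ICASSP 1993 entry above: the first-pass lattice, represented as a successor map, acts as a grammar for the later pass by ruling out word sequences whose bigrams it does not contain. Toy illustration only, with invented data.

```python
def allowed_by_lattice(successors, hypothesis):
    """Return True if every bigram in the word sequence appears as an
    arc in the first-pass lattice; this is the sense in which the
    lattice serves as a language model for the more expensive pass."""
    return all(b in successors.get(a, set())
               for a, b in zip(hypothesis, hypothesis[1:]))

successors = {"show": {"me", "the"}, "me": {"ships"}, "the": {"chips"}}
print(allowed_by_lattice(successors, ["show", "me", "ships"]))   # True
print(allowed_by_lattice(successors, ["show", "me", "chips"]))   # False: pruned
```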


Human Language Technology | 1990

Training set issues in SRI's DECIPHER speech recognition system

Hy Murveit; Mitch Weintraub; Mike Cohen

SRI has developed the DECIPHER system, a hidden Markov model (HMM) based continuous speech recognition system typically used in a speaker-independent manner. We first review the DECIPHER system, then show that DECIPHER's speaker-independent performance improved by 20% when the standard 3990-sentence speaker-independent training set was augmented with the 7200 sentences of the resource management speaker-dependent training data. We show a further improvement of over 20% when a version of corrective training was implemented. Finally, we show improvement from using parallel male- and female-trained models in DECIPHER. The word-error rate with all three improvements combined was 3.7% on DARPA's February 1989 speaker-independent test set using the standard perplexity-60 word-pair grammar.
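
A toy version of the parallel male-/female-model decoding mentioned above, with single diagonal Gaussians standing in for full gender-dependent HMM recognizers: the utterance is scored by both models and the higher-likelihood result is kept. Models and data are invented.

```python
import numpy as np

def log_gaussian(x, mean, var):
    """Frame log-likelihood under a diagonal Gaussian (a stand-in for a
    full HMM recognizer's acoustic score)."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2.0 * np.pi * var), axis=-1)

def decode_parallel(frames, models):
    """Score an utterance with each gender-dependent model in parallel
    and keep the answer from whichever scores higher."""
    scores = {name: log_gaussian(frames, m["mean"], m["var"]).sum()
              for name, m in models.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(2)
models = {"male":   {"mean": np.zeros(13),     "var": np.ones(13)},
          "female": {"mean": np.full(13, 1.0), "var": np.ones(13)}}
utterance = rng.normal(loc=1.0, size=(120, 13))   # closer to the "female" model
print(decode_parallel(utterance, models)[0])
```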


Journal of the Acoustical Society of America | 1989

Automatic evaluation of English spoken by Japanese students

Jared Bernstein; Mitch Weintraub; Mike Cohen; Hy Murveit

The paper describes the methods and results of a study of the feasibility of automatically grading the performance of Japanese students when reading English aloud. SRI recorded 31 adult Japanese speakers: 22 men and 9 women. Each speaker read six sentences aloud. All 186 recorded utterances were presented in a random order for rating by three expert listeners, who rated the utterances on two occasions. Speech-grading software was developed from an adaptive hidden-Markov-model (HMM) speech-recognition system. The grading procedure is a two-step process: first, the speech to be graded is aligned; then the located segments of the speech signal are compared with models of those segments developed from a database of speech from native speakers of English. Important points in the results are: (1) ratings of speech quality by expert listeners are extremely reliable, and (2) automatic grades from the system correlate well (>0.8) with those ratings.
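
A simplified sketch of the grading step and of how such grades are compared with listener ratings. All numbers below are invented solely to make the example run and do not reproduce the study's data.

```python
import numpy as np

def machine_grade(segment_scores, segment_frames):
    """Grade one utterance as the average per-frame log-likelihood of its
    aligned segments under native-speaker models (a simplified stand-in
    for the paper's grading step)."""
    return sum(segment_scores) / sum(segment_frames)

# Hypothetical aligned-segment log-likelihoods and frame counts for one utterance.
print("utterance grade:", machine_grade([-310.0, -145.5, -220.0], [40, 22, 31]))

# Comparing machine grades with expert ratings (both invented here)
# the way the study reports it: via a correlation coefficient.
grades  = np.array([-7.1, -6.3, -8.0, -5.9, -6.8, -7.5])
ratings = np.array([ 2.0,  3.5,  1.5,  4.0,  2.5,  2.0])
print("correlation:", np.corrcoef(grades, ratings)[0, 1])
```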


International Conference on Acoustics, Speech, and Signal Processing | 1991

Pitch-aided spectral estimation for noise-robust speech recognition

Adoram Erell; Mitch Weintraub

A method for utilizing the quasi-periodicity of speech in a minimum-mean-square-error (MMSE) estimation of the DFT log-amplitude, either for speech enhancement or for noise-robust speech recognition, is described. The estimator takes into account the periodicity by conditioning the estimate of voiced speech on the distance between the frequency of any given DFT coefficient and the nearest harmonic. The DFT estimator is also made conditional on the broadband spectrum, so that the correlation between distant frequencies is partially taken into account. The algorithm has been tested with computer-room noise using an MSE criterion for the spectral envelope, defined by Mel-scale filterbank log-energies, and in recognition experiments. The MSE for voiced speech is reduced significantly by the periodicity conditioning. Recognition accuracy is not improved because the overwhelming majority of errors occur in unvoiced speech.
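
A sketch of the conditioning variable used for voiced speech: for each DFT bin, the distance between the bin frequency and the nearest pitch harmonic. The normalization below is an assumption; the paper's exact parameterization may differ.

```python
import numpy as np

def harmonic_distance(freqs_hz, f0_hz):
    """For each DFT bin frequency, the distance (in Hz, and normalized by
    half the harmonic spacing) to the nearest multiple of the pitch f0,
    i.e. the quantity the voiced-speech estimator conditions on."""
    k = np.maximum(np.round(freqs_hz / f0_hz), 1.0)   # index of nearest harmonic
    dist = np.abs(freqs_hz - k * f0_hz)
    return dist, dist / (f0_hz / 2.0)

# 256-point DFT at 8 kHz sampling, pitch of 120 Hz.
freqs = np.fft.rfftfreq(256, d=1.0 / 8000.0)
dist_hz, dist_norm = harmonic_distance(freqs, 120.0)
print(dist_norm[:8].round(2))   # bins near a harmonic get values close to 0
```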

Collaboration


Dive into Mitch Weintraub's collaborations.

Top Co-Authors

Yochai Konig

University of California


Vassilios Digalakis

Technical University of Crete


David S. Pallett

National Institute of Standards and Technology


Doug Paul

Massachusetts Institute of Technology
