
Publication


Featured research published by Mitchel Weintraub.


Speech Communication | 2000

Automatic scoring of pronunciation quality

Leonardo Neumeyer; Horacio Franco; Vassilios Digalakis; Mitchel Weintraub

We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data are collected and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases and the reliability of human ratings. We then address pronunciation evaluation as a prediction problem, trying to predict the grade a human expert would assign to a particular skill. Using the speech and expert-ratings databases, we build statistical models and introduce different machine scores that can be used as predictor variables. We validate these machine scores on the Voice Interactive Language Training System (VILTS) corpus, evaluating the pronunciation of American speakers speaking French, and we show that certain machine scores, such as the log-posterior and the normalized duration, achieve a correlation with the targeted human grades that is comparable to the human-to-human correlation when a sufficient amount of speech data is available.
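
Below is a minimal sketch of the two predictor scores highlighted above, an average log-posterior and a normalized duration score, together with the Pearson correlation used to compare machine scores against human grades. The data layout, the duration normalization, and all names are illustrative assumptions, not SRI's implementation.

```python
# Hedged sketch: illustrative phone-level scoring, not the VILTS system.
import math

def log_posterior_score(phones):
    """Average frame-level log-posterior of the intended phones.

    `phones` is a list of per-phone lists of frame log-posteriors
    (assumed format); values closer to 0 suggest more native-like speech.
    """
    frames = [lp for phone in phones for lp in phone]
    return sum(frames) / len(frames)

def duration_score(durations, native_log_duration_means, rate):
    """Normalized duration score: log phone duration, normalized by the
    speaker's rate of speech, compared to native duration statistics
    (a plausible form; the exact normalization is an assumption)."""
    score = 0.0
    for dur, mean in zip(durations, native_log_duration_means):
        score += -abs(math.log(dur / rate) - mean)
    return score / len(durations)

def pearson(machine_scores, human_grades):
    """Correlation used to validate predictors against human grades."""
    n = len(machine_scores)
    mx = sum(machine_scores) / n
    my = sum(human_grades) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(machine_scores, human_grades))
    vx = sum((x - mx) ** 2 for x in machine_scores)
    vy = sum((y - my) ** 2 for y in human_grades)
    return cov / math.sqrt(vx * vy)
```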


international conference on spoken language processing | 1996

Automatic text-independent pronunciation scoring of foreign language student speech

Leonardo Neumeyer; Horacio Franco; Mitchel Weintraub; Patti Price

SRI International is currently involved in the development of a new generation of software systems for automatic scoring of pronunciation as part of the Voice Interactive Language Training System (VILTS) project. This paper describes the goals of the VILTS system, the speech corpus, and the algorithm development. The automatic grading system uses SRI's DECIPHER™ continuous speech recognition system to generate phonetic segmentations that are used to produce pronunciation scores at the end of each lesson. The scores produced by the system are similar to those of expert human listeners. Unlike previous approaches, in which models were built for specific sentences or phrases, we present a new family of algorithms designed to perform well even when knowledge of the exact text to be used is not available.
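
The sketch below illustrates the text-independent aspect: scores are aggregated over the recognizer's phonetic segmentation of whatever the student actually said in a lesson, so no sentence-specific models are needed. The record format and the frame normalization are assumptions for illustration.

```python
# Hedged sketch: lesson-level, text-independent scoring over recognizer
# segmentations; the data format is assumed, not SRI's.

def lesson_score(segmentations):
    """`segmentations`: one list of (phone, n_frames, log_likelihood)
    triples per utterance in the lesson, as produced by a recognizer's
    forced phonetic segmentation. Returns one score for the lesson."""
    total_ll, total_frames = 0.0, 0
    for utterance in segmentations:
        for phone, n_frames, log_lik in utterance:
            total_ll += log_lik
            total_frames += n_frames
    # Frame-normalizing removes the dependence on how much was spoken,
    # which is what lets the score work without knowing the text.
    return total_ll / total_frames
```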


international conference on acoustics, speech, and signal processing | 1994

Probabilistic optimum filtering for robust speech recognition

Leonardo Neumeyer; Mitchel Weintraub

We present a new mapping algorithm for speech recognition that relates the features of simultaneous recordings of clean and noisy speech. The model is a piecewise linear transformation applied to the noisy speech features. The transformation is a set of multidimensional linear least-squares filters whose outputs are combined using a conditional Gaussian model. The algorithm was tested using SRI's DECIPHER speech recognition system. Experimental results show how the mapping is used to reduce recognition errors when the training and testing acoustic environments do not match.
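
A minimal sketch of the probabilistic optimum filtering idea follows: partition the noisy feature space (here with a Gaussian mixture, an assumed concrete choice), fit one affine least-squares filter per region from the stereo clean/noisy data, and blend the filter outputs by the region posteriors at run time.

```python
# Hedged sketch of a piecewise-linear feature mapping; parameter shapes
# and the mixture-based partition are assumptions for illustration.
import numpy as np

class POFMapper:
    def __init__(self, means, variances, weights, W, b):
        # Gaussian mixture over noisy features: means (K, D), diagonal
        # variances (K, D), priors (K,); one affine filter per component:
        # W (K, D, D), b (K, D), fit by least squares on stereo data.
        self.means, self.variances, self.weights = means, variances, weights
        self.W, self.b = W, b

    def _posteriors(self, y):
        # Diagonal-Gaussian responsibilities for the noisy vector y.
        log_p = -0.5 * np.sum((y - self.means) ** 2 / self.variances
                              + np.log(2 * np.pi * self.variances), axis=1)
        log_p += np.log(self.weights)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()

    def map(self, y):
        # Blend the per-region linear estimates of the clean feature.
        post = self._posteriors(y)
        estimates = np.einsum('kij,j->ki', self.W, y) + self.b  # (K, D)
        return post @ estimates                                  # (D,)
```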


international conference on acoustics, speech, and signal processing | 1995

LVCSR log-likelihood ratio scoring for keyword spotting

Mitchel Weintraub

A new scoring algorithm has been developed for generating word-spotting hypotheses and their associated scores. This technique uses a large-vocabulary continuous speech recognition (LVCSR) system to generate the N-best answers along with their Viterbi alignments. The score for a putative hit is computed by summing the likelihoods of all hypotheses that contain the keyword and normalizing by the sum of all hypothesis likelihoods in the N-best list. Using a test set of conversational speech from Switchboard Credit Card conversations, we achieved an 81% figure of merit (FOM). Our word recognition error rate on this same test set is 54.7%.
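
The scoring rule lends itself to a direct sketch: the putative hit's score is the posterior mass, within the N-best list, of the hypotheses containing the keyword. The list format is an assumption; log-sum-exp keeps the normalization numerically stable since recognizers emit log likelihoods.

```python
# Hedged sketch of N-best keyword posterior scoring.
import math

def keyword_posterior(nbest, keyword):
    """`nbest`: list of (word_sequence, total_log_likelihood) pairs
    (assumed format). Returns the normalized score for `keyword`."""
    def logsumexp(vals):
        m = max(vals)
        return m + math.log(sum(math.exp(v - m) for v in vals))

    all_ll = [ll for _, ll in nbest]
    hit_ll = [ll for words, ll in nbest if keyword in words]
    if not hit_ll:
        return 0.0
    # Sum of likelihoods containing the keyword, divided by the sum of
    # all hypothesis likelihoods, computed in the log domain.
    return math.exp(logsumexp(hit_ll) - logsumexp(all_ll))
```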


international conference on acoustics, speech, and signal processing | 1997

Handset-dependent background models for robust text-independent speaker recognition

Larry P. Heck; Mitchel Weintraub

This paper studies the effects of handset distortion on telephone-based speaker recognition performance, resulting in the following observations: (1) the major factor in speaker recognition errors is whether the handset type (e.g., electret, carbon) differs between training and testing, not whether the telephone lines are mismatched; (2) the distribution of speaker recognition scores for true speakers is bimodal, with one mode dominated by matched-handset tests and the other by mismatched handsets; (3) cohort-based normalization methods derive much of their performance gains from implicitly selecting cohorts trained with the same handset type as the claimant; and (4) utilizing a handset-dependent background model matched to the handset type of the claimant's training data sharpens and separates the true- and false-speaker score distributions. Results on the 1996 NIST Speaker Recognition Evaluation corpus show that using handset-matched background models reduces false acceptances (at a 10% miss rate) by more than 60% over previously reported (handset-independent) approaches.
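
Observation (4) amounts to a small change in the verification score, sketched below: the likelihood-ratio denominator uses a background model matched to the handset type of the claimant's enrollment data rather than a single handset-independent background. The model objects and handset labels here are placeholders, not the paper's implementation.

```python
# Hedged sketch of handset-matched likelihood-ratio scoring; model
# objects with a `log_likelihood` method are assumed for illustration.

def verification_score(features, claimant_model, background_models,
                       claimant_train_handset):
    """Log-likelihood ratio with a handset-dependent background model.

    `background_models` maps a handset label (e.g. 'electret', 'carbon')
    to a background model trained on that handset type; the key is the
    handset type of the claimant's *training* data, per the paper.
    """
    background = background_models[claimant_train_handset]
    return (claimant_model.log_likelihood(features)
            - background.log_likelihood(features))
```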


international conference on acoustics, speech, and signal processing | 1989

Linguistic constraints in hidden Markov model based speech recognition

Mitchel Weintraub; Hy Murveit; Michael Cohen; Patti Price; Jared Bernstein; G. Baldwin; D. Bell

A speaker-independent, continuous-speech, large-vocabulary speech recognition system, DECIPHER, has been developed. It provides state-of-the-art performance on the DARPA standard speaker-independent resource management training and testing materials. The approach is to integrate speech and linguistic knowledge into the HMM (hidden Markov model) framework. Performance improvements arising from detailed phonological modeling and from the incorporation of cross-word coarticulatory constraints are described. It is concluded that speech and linguistic knowledge sources can be used to improve the performance of HMM-based speech recognition systems provided that care is taken to incorporate these knowledge sources appropriately.
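
As an illustration of cross-word coarticulatory constraints, the sketch below expands a word sequence into triphone units whose left and right contexts cross word boundaries; the toy lexicon and the unit notation are assumptions, not DECIPHER's actual inventory.

```python
# Hedged sketch: cross-word triphone expansion with a toy lexicon.

def crossword_triphones(words, lexicon):
    """Expand a word sequence into triphone units whose contexts span
    word boundaries. `lexicon` maps word -> phone list."""
    phones = [p for w in words for p in lexicon[w]]
    padded = ['sil'] + phones + ['sil']
    return [f'{padded[i-1]}-{padded[i]}+{padded[i+1]}'
            for i in range(1, len(padded) - 1)]

lexicon = {'what': ['w', 'ah', 't'], 'ships': ['sh', 'ih', 'p', 's']}
# The final 't' of "what" is modeled with right context 'sh' taken from
# "ships", rather than a generic word-boundary context:
print(crossword_triphones(['what', 'ships'], lexicon))
```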


international conference on acoustics, speech, and signal processing | 1993

Keyword-spotting using SRI's DECIPHER large-vocabulary speech-recognition system

Mitchel Weintraub

The application of the speaker-independent large-vocabulary CSR (continuous speech recognition) system DECIPHER to the keyword-spotting task is described. A transcription is generated for the incoming spontaneous speech by using a CSR system, and any keywords that occur in the transcription are hypothesized. It is shown that the use of improved models of nonkeyword speech with a CSR system can yield significantly improved keyword-spotting performance. The algorithm for computing the score of a keyword combines information from acoustics, language, and duration. One key limitation of this approach is that keywords are hypothesized only if they appear in the Viterbi backtrace, which prevents the system builder from operating effectively at high false-alarm levels if desired. Other algorithms are being considered for hypothesizing well-scoring keywords that lie on high-scoring paths. An algorithm for smoothing language-model probabilities was also introduced; it combines small task-specific language-model training data with large task-independent language training data and provided a 14% reduction in test-set perplexity.
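
The abstract does not spell out the smoothing rule, but a simple linear interpolation of the two language models, sketched below, captures the idea of backing up sparse task-specific estimates with a large task-independent model. The interpolation weight and the dict-based model representation are assumptions for illustration.

```python
# Hedged sketch: linear interpolation of a small task-specific n-gram
# model with a large task-independent one (one plausible smoothing form).

def smoothed_lm_prob(word, history, task_lm, general_lm, lam=0.5):
    """P(word | history) as a mixture of the two models.

    `task_lm` and `general_lm` map (history, word) -> probability;
    `lam` weights the task-specific estimate and would normally be
    tuned on held-out data.
    """
    return (lam * task_lm.get((history, word), 0.0)
            + (1 - lam) * general_lm.get((history, word), 0.0))
```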


international conference on acoustics, speech, and signal processing | 1986

A computational model for separating two simultaneous talkers

Mitchel Weintraub

This paper describes a computational model that attempts to separate two simultaneous talkers, with the goal of improving a speech recognition system's ability to recognize what each of the two talkers says. The model consists of the following stages: (1) an iterative dynamic programming algorithm to track the pitch period of each of the two talkers, (2) a Markov model to determine the characteristics (e.g., voiced/unvoiced) of each speaker's voice, (3) a recursive algorithm that uses both local periodicity information and local spectral-continuity constraints to compute a spectral estimate for each talker, (4) a resynthesis algorithm to convert the spectral estimate of each talker into a speech waveform, and (5) a speaker-independent continuous-digit-recognition system that attempts to recognize what each of the two talkers is saying. The system was trained and tested on a database of simultaneous digit strings spoken by a male and a female talker. An evaluation of the different stages of this model is presented.
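
To give a flavor of stage (3), the sketch below assigns each spectral bin to the talker whose pitch harmonics lie closest. The actual model additionally uses spectral-continuity constraints in a recursive estimator, so this nearest-harmonic mask is only an illustration of the periodicity cue, and the pitch values are placeholders.

```python
# Hedged sketch: nearest-harmonic assignment of spectral energy, a crude
# stand-in for the paper's recursive spectral estimator.
import numpy as np

def harmonic_masks(bin_freqs, f0_a, f0_b):
    """Return boolean masks assigning each frequency bin to talker A or B
    by distance to the nearest harmonic of each talker's pitch (Hz)."""
    dist_a = np.abs(bin_freqs - f0_a * np.round(bin_freqs / f0_a))
    dist_b = np.abs(bin_freqs - f0_b * np.round(bin_freqs / f0_b))
    mask_a = dist_a <= dist_b
    return mask_a, ~mask_a

freqs = np.arange(50, 4000, 10.0)  # analysis bin centers in Hz
mask_male, mask_female = harmonic_masks(freqs, f0_a=120.0, f0_b=210.0)
```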


international conference on acoustics, speech, and signal processing | 1995

Robust speech recognition in noise using adaptation and mapping techniques

Leonardo Neumeyer; Mitchel Weintraub

This paper compares three techniques for recognizing continuous speech in the presence of additive car noise: (1) transforming the noisy acoustic features using a mapping algorithm, (2) adaptation of the hidden Markov models (HMMs), and (3) a combination of mapping and adaptation. To make the signal processing robust to additive noise, we apply a technique called probabilistic optimum filtering. We show that at low signal-to-noise ratio (SNR) levels, compensating in the feature and model domains yields similar performance. We also show that adapting the HMMs with the mapped features produces the best performance. The algorithms were implemented using SRI's DECIPHER speech recognition system and were tested on Spoke 10 of the 1994 ARPA-sponsored CSR evaluation.
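
The sketch below lays out where each of the three compared techniques acts. It is a deliberately crude stand-in: a generic `pof_map` callable replaces the probabilistic optimum filtering transform, and a single global mean shift replaces the paper's HMM adaptation.

```python
# Hedged sketch contrasting feature-domain, model-domain, and combined
# compensation; all simplifications are assumptions for illustration.
import numpy as np

def compensate_features(noisy_frames, pof_map):
    # (1) Feature domain: map noisy features toward clean space, then
    # decode them with the unmodified clean-trained HMMs.
    return np.array([pof_map(y) for y in noisy_frames])

def compensate_models(hmm_means, environment_bias):
    # (2) Model domain: leave the features alone and shift the HMM
    # output-distribution means toward the noisy environment.
    return hmm_means + environment_bias

def compensate_combined(noisy_frames, pof_map, hmm_means, residual_bias):
    # (3) Combination: decode mapped features with HMMs adapted to the
    # residual mismatch left after mapping (the best condition above).
    mapped = compensate_features(noisy_frames, pof_map)
    return mapped, hmm_means + residual_bias
```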


IEEE Transactions on Speech and Audio Processing | 1993

Filterbank-energy estimation using mixture and Markov models for recognition of noisy speech

Adoram Erell; Mitchel Weintraub

An estimation algorithm for noise-robust speech recognition, the minimum mean log spectral distance (MMLSD), is presented. The estimation is matched to the recognizer by seeking to minimize the average distortion as measured by a Euclidean distance between filterbank log-energy vectors, approximating the weighted-cepstral distance used by the recognizer. The estimate is computed using a clean-speech spectral probability distribution, estimated from a database, and a stationary ARMA model for the noise. When trained on clean speech and tested with additive white noise at 10-dB SNR, the recognition accuracy with the MMLSD algorithm is comparable to that achieved by training the recognizer at the same constant 10-dB SNR. The algorithm is also highly effective with quasi-stationary environmental noise, recorded with a desktop microphone, and requires almost no tuning for the differences between this noise and the computer-generated white noise.
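
The MMLSD estimate has the familiar mixture-conditioned MMSE form sketched below: the clean log filterbank energies are estimated as a posterior-weighted combination of per-class conditional means. The per-class likelihoods and conditional means would come from the clean-speech distribution and the noise model; here they are simply inputs, so this is a structural sketch rather than the full algorithm.

```python
# Hedged sketch of the mixture-conditioned MMSE estimate behind MMLSD.
import numpy as np

def mmlsd_estimate(class_priors, class_lik_given_noisy, class_cond_means):
    """E[S | Y] = sum_i P(class i | Y) * E[S | Y, class i].

    class_priors:          (K,)   mixture weights of the clean-speech model
    class_lik_given_noisy: (K,)   p(Y | class i) under the noise model
    class_cond_means:      (K, D) E[S | Y, class i] per filterbank channel
    Returns the (D,) estimated clean log filterbank energies.
    """
    post = class_priors * class_lik_given_noisy
    post = post / post.sum()        # P(class i | Y) by Bayes' rule
    return post @ class_cond_means  # posterior-weighted conditional means
```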

Collaboration


Dive into Mitchel Weintraub's collaboration.

Top Co-Authors

Vassilios Digalakis

Technical University of Crete
