
Publication


Featured research published by Leonardo Neumeyer.


IEEE Transactions on Speech and Audio Processing | 1995

Speaker adaptation using constrained estimation of Gaussian mixtures

Vassilios Digalakis; Dimitry Rtischev; Leonardo Neumeyer

A trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, the authors propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.
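To make the constrained-estimation idea concrete, the following is a minimal NumPy sketch, under stated assumptions, of applying one shared affine transform to all component densities of a Gaussian mixture, so that only the transform (rather than every mean and covariance) needs to be estimated from the limited adaptation data. The array shapes, variable names, and example transform are illustrative; this is not the paper's estimation algorithm.

```python
import numpy as np

def adapt_gaussian_mixture(means, covs, A, b):
    """Constrained adaptation: map every component mean as mu' = A @ mu + b
    and every covariance as Sigma' = A @ Sigma @ A.T, so only the shared
    transform (A, b) has to be estimated from the adaptation data."""
    new_means = means @ A.T + b                            # (K, D)
    new_covs = np.einsum('ij,kjl,ml->kim', A, covs, A)     # (K, D, D)
    return new_means, new_covs

# Toy usage: a 2-component, 3-dimensional mixture and a hypothetical transform.
K, D = 2, 3
means = np.arange(K * D, dtype=float).reshape(K, D)
covs = np.stack([np.eye(D)] * K)
A, b = 0.9 * np.eye(D), 0.1 * np.ones(D)
adapted_means, adapted_covs = adapt_gaussian_mixture(means, covs, A, b)
print(adapted_means.shape, adapted_covs.shape)             # (2, 3) (2, 3, 3)
```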


Speech Communication | 2000

Automatic scoring of pronunciation quality

Leonardo Neumeyer; Horacio Franco; Vassilios Digalakis; Mitchel Weintraub

We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data are collected and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases and the reliability of human ratings. We then address pronunciation evaluation as a prediction problem, trying to predict the grade a human expert would assign to a particular skill. Using the speech and the expert-ratings databases, we build statistical models and introduce different machine scores that can be used as predictor variables. We validate these machine scores on the Voice Interactive Language Training System (VILTS) corpus, evaluating the pronunciation of American speakers speaking French, and we show that certain machine scores, like the log-posterior and the normalized duration, achieve a correlation with the targeted human grades that is comparable to the human-to-human correlation when a sufficient amount of speech data is available.
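As a toy illustration of the prediction framing described above, the sketch below maps two hypothetical machine scores (a log-posterior score and a normalized duration) to human grades with a least-squares fit and reports the machine-human Pearson correlation. The numbers are synthetic placeholders, not VILTS data, and the mapping is far simpler than the paper's statistical models.

```python
import numpy as np

# Hypothetical per-speaker machine scores and human expert grades.
log_posterior = np.array([-1.2, -0.8, -1.5, -0.6, -1.0, -0.9])
norm_duration = np.array([0.70, 0.90, 0.50, 1.00, 0.80, 0.85])
human_grade   = np.array([2.0, 3.5, 1.5, 4.5, 3.0, 3.5])

# Least-squares linear mapping from the machine scores to the human grade.
X = np.column_stack([log_posterior, norm_duration, np.ones_like(human_grade)])
w, *_ = np.linalg.lstsq(X, human_grade, rcond=None)
predicted = X @ w

# Pearson correlation between predicted and human grades, the figure of
# merit compared against human-to-human correlation.
r = np.corrcoef(predicted, human_grade)[0, 1]
print(f"machine-human correlation: r = {r:.2f}")
```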


international conference on acoustics, speech, and signal processing | 1997

Automatic pronunciation scoring for language instruction

Horacio Franco; Leonardo Neumeyer; Yoon Kim; Orith Ronen

This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce pronunciation scores for individual sentences or groups of sentences. Scores obtained from expert human listeners are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. In previous work we had found that duration-based scores outperformed HMM log-likelihood-based scores. In this paper we show that we can significantly improve HMM-based scores by using average phone segment posterior probabilities. Correlation between machine and human scores went up from r=0.50 with likelihood-based scores to r=0.88 with posterior-based scores. The new measures also outperformed duration-based scores in their ability to produce reliable scores from only a few sentences.
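A minimal sketch, assuming frame-level phone posteriors and a forced alignment are already available from a recognizer such as Decipher, of the average-phone-segment-posterior score: log posteriors of the aligned phone are averaged over each segment, and the segment scores are then averaged. The inputs below are random placeholders.

```python
import numpy as np

def posterior_score(frame_log_posteriors, segments):
    """Average phone-segment posterior score.

    frame_log_posteriors: (T, P) array of log P(phone | frame).
    segments: list of (start_frame, end_frame, phone_index) tuples from a
              forced alignment; end_frame is exclusive.
    Returns the mean over segments of the average log posterior of the
    aligned phone within its segment.
    """
    seg_scores = [frame_log_posteriors[start:end, phone].mean()
                  for start, end, phone in segments]
    return float(np.mean(seg_scores))

# Toy usage: random posteriors over 3 phones and 10 frames, 3 aligned segments.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
log_post = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
print(posterior_score(log_post, [(0, 4, 0), (4, 7, 2), (7, 10, 1)]))
```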


international conference on spoken language processing | 1996

Automatic text-independent pronunciation scoring of foreign language student speech

Leonardo Neumeyer; Horacio Franco; Mitchel Weintraub; Patti Price

SRI International is currently involved in the development of a new generation of software systems for automatic scoring of pronunciation as part of the Voice Interactive Language Training System (VILTS) project. This paper describes the goals of the VILTS system, the speech corpus and the algorithm development. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations that are used to produce pronunciation scores at the end of each lesson. The scores produced by the system are similar to those of expert human listeners. Unlike previous approaches, in which models were built for specific sentences or phrases, we present a new family of algorithms designed to perform well even when knowledge of the exact text to be used is not available.


Speech Communication | 2000

Combination of machine scores for automatic grading of pronunciation quality

Horacio Franco; Leonardo Neumeyer; Vassilios Digalakis; Orith Ronen

This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce different pronunciation scores for individual sentences or groups of sentences that can be used as predictors of the pronunciation quality. Different types of these machine scores can be combined to obtain a better prediction of the overall pronunciation quality. In this paper we review some of the best-performing machine scores and discuss the application of several methods based on linear and nonlinear mapping and combination of individual machine scores to predict the pronunciation quality grade that a human expert would have given. We evaluate these methods on a database that consists of pronunciation-quality-graded speech from American students speaking French. With predictors based on spectral match and on durational characteristics, we find that the combination of scores improved the prediction of the human grades and that nonlinear mapping and combination methods performed better than linear ones. Characteristics of the different nonlinear methods studied are discussed.
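To illustrate the nonlinear route, here is a hedged NumPy sketch that trains a tiny one-hidden-layer network to map two placeholder predictors (a spectral-match score and a duration score) to human grades. It stands in generically for the nonlinear mapping and combination methods discussed above and is not the paper's actual combiner.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder machine scores (spectral match, duration) and human grades.
X = rng.normal(size=(40, 2))
grades = 3.0 + 2.0 * np.tanh(X[:, 0] + 0.5 * X[:, 1])   # synthetic nonlinear target

# Tiny one-hidden-layer network used as a nonlinear score combiner.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
w2 = rng.normal(scale=0.5, size=8);      b2 = 0.0
lr, n = 0.05, len(X)
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    err = h @ w2 + b2 - grades               # prediction error
    # Full-batch gradient descent on the mean squared error.
    g_h = np.outer(err, w2) * (1.0 - h ** 2)
    W1 -= lr * X.T @ g_h / n;  b1 -= lr * g_h.mean(axis=0)
    w2 -= lr * h.T @ err / n;  b2 -= lr * err.mean()

pred = np.tanh(X @ W1 + b1) @ w2 + b2
print(f"correlation after nonlinear combination: r = {np.corrcoef(pred, grades)[0, 1]:.2f}")
```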


IEEE Transactions on Speech and Audio Processing | 1996

Speaker adaptation using combined transformation and Bayesian methods

Vassilios Digalakis; Leonardo Neumeyer

Adapting the parameters of a statistical speaker-independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.
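Below is a minimal sketch of the combination idea for a single Gaussian mean, under illustrative assumptions: the transformation-adapted mean acts as the Bayesian prior, and a MAP-style estimate interpolates it with the sample mean of the adaptation frames, so the estimate moves toward the data as more adaptation speech arrives. The weight tau is a hypothetical hyperparameter, not the paper's exact prior formulation.

```python
import numpy as np

def map_adapted_mean(prior_mean, adaptation_frames, tau=10.0):
    """MAP-style interpolation between a prior mean and the adaptation data.

    prior_mean: mean after transformation-based adaptation (acts as the prior).
    adaptation_frames: (N, D) observations assigned to this Gaussian.
    tau: prior weight in "equivalent frames"; larger tau trusts the prior more.
    """
    n = len(adaptation_frames)
    if n == 0:
        return prior_mean                      # no data: keep the transformed mean
    sample_mean = adaptation_frames.mean(axis=0)
    return (n * sample_mean + tau * prior_mean) / (n + tau)

# With little data the estimate stays near the transformed prior;
# with a lot of data it approaches the conventional ML estimate.
prior = np.array([0.0, 0.0])
few, many = np.array([[1.0, 1.0]]), np.ones((1000, 2))
print(map_adapted_mean(prior, few), map_adapted_mean(prior, many))
```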


IEEE Journal on Selected Areas in Communications | 1999

Quantization of cepstral parameters for speech recognition over the World Wide Web

Vassilios Digalakis; Leonardo Neumeyer; Manolis Perakakis

We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare a server-only processing model where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.
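As a rough illustration of the client-side encoding, here is a hedged split-vector-quantization sketch: each cepstral vector is divided into subvectors, each subvector is quantized against its own codebook, and at about 100 frames per second a 20-bit frame budget corresponds to the 2000 bits per second figure quoted above. The subvector split, bit allocation, and random codebooks are assumptions for illustration, not the scheme evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# (start, end, bits) per subvector of a 13-dimensional cepstral vector:
# 8 + 6 + 6 = 20 bits per frame, roughly 2000 bits/second at 100 frames/second.
splits = [(0, 5, 8), (5, 9, 6), (9, 13, 6)]
# Random placeholder codebooks; in practice they would be trained (e.g. k-means).
codebooks = [rng.normal(size=(2 ** bits, end - start)) for start, end, bits in splits]

def encode_frame(cepstra):
    """Return one codeword index per subvector (nearest neighbour in L2)."""
    indices = []
    for (start, end, _), cb in zip(splits, codebooks):
        sub = cepstra[start:end]
        indices.append(int(np.argmin(((cb - sub) ** 2).sum(axis=1))))
    return indices

def decode_frame(indices):
    """Reconstruct the cepstral vector the recognition server would use."""
    return np.concatenate([cb[i] for cb, i in zip(codebooks, indices)])

frame = rng.normal(size=13)
print(encode_frame(frame), decode_frame(encode_frame(frame)).shape)
```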


international conference on acoustics, speech, and signal processing | 1994

Probabilistic optimum filtering for robust speech recognition

Leonardo Neumeyer; Mitchel Weintraub

We present a new mapping algorithm for speech recognition that relates the features of simultaneous recordings of clean and noisy speech. The model is a piecewise linear transformation applied to the noisy speech feature. The transformation is a set of multidimensional linear least-squares filters whose outputs are combined using a conditional Gaussian model. The algorithm was tested using SRI's DECIPHER speech recognition system. Experimental results show how the mapping is used to reduce recognition errors when the training and testing acoustic environments do not match.
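A minimal sketch of the piecewise-linear mapping idea, under stated assumptions: the noisy-feature space is modeled by a small Gaussian mixture, each component owns an affine filter from noisy to clean features, and the filter outputs are blended by the components' posterior probabilities. The mixture, filters, and dimensions below are placeholders rather than a trained probabilistic optimum filtering model.

```python
import numpy as np

D, K = 13, 4                                    # feature dimension, regions
rng = np.random.default_rng(0)

# Placeholder conditional-Gaussian model over the noisy-feature space
# (unit-variance components for simplicity).
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))

# Placeholder per-region affine filters; in the paper these are
# multidimensional least-squares filters trained on simultaneous
# recordings of clean and noisy speech.
A = np.stack([np.eye(D)] * K)
b = rng.normal(scale=0.1, size=(K, D))

def pof_map(noisy):
    """Posterior-weighted combination of the per-region linear filters."""
    log_like = np.log(weights) - 0.5 * ((noisy - means) ** 2).sum(axis=1)
    post = np.exp(log_like - np.logaddexp.reduce(log_like))   # region posteriors
    per_region = A @ noisy + b                                # (K, D) estimates
    return post @ per_region                                  # cleaned feature

print(pof_map(rng.normal(size=D)).shape)                      # -> (13,)
```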


Computer Speech & Language | 2000

Efficient speech recognition using subvector quantization and discrete-mixture HMMS

Vassilios Digalakis; Stavros Tsakalidis; Costas Harizakis; Leonardo Neumeyer

This paper introduces a new form of observation distributions for hidden Markov models (HMMs), combining subvector quantization and mixtures of discrete distributions. Despite what is generally believed, we show that discrete-distribution HMMs can outperform continuous-density HMMs at significantly faster decoding speeds. Performance of the discrete HMMs is improved by using product-code vector quantization (VQ) and mixtures of discrete distributions. The decoding speed of the discrete HMMs is also improved by quantizing subvectors of coefficients, since this reduces the number of table lookups needed to compute the output probabilities. We present efficient training and decoding algorithms for the discrete-mixture HMMs (DMHMMs). Our experimental results in the air-travel information domain show that the high level of recognition accuracy of continuous-mixture-density HMMs (CDHMMs) can be maintained at significantly faster decoding speeds. Moreover, we show that when the same number of mixture components is used in DMHMMs and CDHMMs, the new models exhibit superior recognition performance.
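To show why the discrete-mixture formulation decodes quickly, here is a hedged sketch of a single state's output probability: each subvector of the frame has already been quantized to a codeword index, so evaluating b(o) = sum_m c_m * prod_s P_{m,s}(q_s(o)) reduces to table lookups. The codebook sizes, mixture count, and random tables are illustrative assumptions, not a trained DMHMM state.

```python
import numpy as np

rng = np.random.default_rng(0)
S, M = 3, 4                     # number of subvectors, mixture components
codebook_sizes = [64, 32, 32]   # entries per subvector codebook

# Placeholder discrete-mixture state: mixture weights and, for each mixture
# component and subvector, a discrete distribution over codewords.
mix_weights = np.full(M, 1.0 / M)
tables = [rng.dirichlet(np.ones(n), size=M) for n in codebook_sizes]   # each (M, n)

def output_prob(codeword_indices):
    """b(o) = sum_m c_m * prod_s P_{m,s}(q_s(o)); only table lookups needed."""
    per_mix = np.ones(M)
    for s, idx in enumerate(codeword_indices):
        per_mix *= tables[s][:, idx]          # lookup for every mixture component
    return float(mix_weights @ per_mix)

# The frame has already been subvector-quantized (e.g. by a split-VQ encoder
# like the one sketched for the previous paper); the indices are arbitrary.
print(output_prob([12, 5, 30]))
```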


international conference on acoustics, speech, and signal processing | 1997

Development of dialect-specific speech recognizers using adaptation methods

Vassilios Diakoloukas; Vassilios Digalakis; Leonardo Neumeyer; Jaan Kaja

Several adaptation approaches have been proposed in an effort to improve speech recognition performance in mismatched conditions. However, the application of these approaches has mostly been constrained to speaker or channel adaptation tasks. We first investigate the effect of mismatched dialects between training and testing speakers in an automatic speech recognition (ASR) system. We find that a mismatch in dialects significantly influences the recognition accuracy. Consequently, we apply several adaptation approaches to develop a dialect-specific recognition system using a dialect-dependent system trained on a different dialect and a small number of training sentences from the target dialect. We show that adaptation improves the recognition performance dramatically with small amounts of training sentences. We further show that, although the recognition performance of traditionally trained systems degrades sharply as we decrease the number of training speakers, the performance of adapted systems is affected much less.

Collaboration


Dive into Leonardo Neumeyer's collaborations.

Top Co-Authors


Vassilios Digalakis

Technical University of Crete
