Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Horacio Franco is active.

Publication


Featured research published by Horacio Franco.


IEEE Transactions on Speech and Audio Processing | 1994

Connectionist probability estimators in HMM speech recognition

Steve Renals; Nelson Morgan; Michael Cohen; Horacio Franco

The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. Issues necessary to the construction of a connectionist HMM recognition system are discussed, including choice of connectionist probability estimator. They describe the performance of such a system using a multilayer perceptron probability estimator evaluated on the speaker-independent DARPA Resource Management database. In conclusion, they show that a connectionist component improves a state-of-the-art HMM system.
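The statistical interpretation described above reduces to a simple identity: an MLP trained to classify HMM states estimates the posterior P(q|x), and dividing by the state prior P(q) yields a scaled likelihood usable as an HMM emission score. A minimal sketch in Python (the probability values are made up for illustration):

```python
import math

def scaled_log_likelihood(log_posterior, log_prior):
    """Bayes' rule: p(x|q) = P(q|x) * p(x) / P(q).  The MLP gives
    P(q|x); dividing by the prior P(q) leaves a likelihood scaled
    by p(x), which is constant across competing states."""
    return log_posterior - log_prior

# Hypothetical state with posterior 0.6 and prior 0.2
score = scaled_log_likelihood(math.log(0.6), math.log(0.2))
print(round(score, 3))  # log(3) ≈ 1.099
```

Because p(x) is the same for every state at a given frame, dropping it does not change the best path found during Viterbi decoding.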


Speech Communication | 2000

Automatic scoring of pronunciation quality

Leonardo Neumeyer; Horacio Franco; Vassilios Digalakis; Mitchel Weintraub

We present a paradigm for the automatic assessment of pronunciation quality by machine. In this scoring paradigm, both native and nonnative speech data are collected and a database of human-expert ratings is created to enable the development of a variety of machine scores. We first discuss issues related to the design of speech databases and the reliability of human ratings. We then address pronunciation evaluation as a prediction problem, trying to predict the grade a human expert would assign to a particular skill. Using the speech and the expert-ratings databases, we build statistical models and introduce different machine scores that can be used as predictor variables. We validate these machine scores on the Voice Interactive Language Training System (VILTS) corpus, evaluating the pronunciation of American speakers speaking French, and we show that certain machine scores, like the log-posterior and the normalized duration, achieve a correlation with the targeted human grades that is comparable to the human-to-human correlation when a sufficient amount of speech data is available.
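The validation criterion above is the correlation between machine scores and human grades; a self-contained sketch of the Pearson correlation underlying that comparison (the score and grade values are invented):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical machine scores vs. human grades (1-5 scale)
machine = [0.2, 0.5, 0.4, 0.9, 0.7]
human = [1.0, 3.0, 2.0, 5.0, 4.0]
print(round(pearson_r(machine, human), 3))
```

A human-to-human correlation computed the same way between two raters' grades gives the ceiling against which the machine scores are judged.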


international conference on acoustics, speech, and signal processing | 1997

Automatic pronunciation scoring for language instruction

Horacio Franco; Leonardo Neumeyer; Yoon Kim; Orith Ronen

This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce pronunciation scores for individual or groups of sentences. Scores obtained from expert human listeners are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. In previous work we had found that duration-based scores outperformed HMM log-likelihood-based scores. In this paper we show that we can significantly improve HMM-based scores by using average phone segment posterior probabilities. Correlation between machine and human scores went up from r=0.50 with likelihood-based scores to r=0.88 with posterior-based scores. The new measures also outperformed duration-based scores in their ability to produce reliable scores from only a few sentences.
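A posterior-based score of the kind described can be sketched as the average log posterior of the intended phone over the frames of its segment, averaged again over the sentence; this is a plausible form for illustration, not necessarily the paper's exact normalization:

```python
import math

def phone_posterior_score(frame_posteriors):
    """Average log posterior of the intended phone over its segment."""
    return sum(math.log(p) for p in frame_posteriors) / len(frame_posteriors)

def sentence_score(segments):
    """Average the per-phone scores over all phone segments in a sentence."""
    return sum(phone_posterior_score(s) for s in segments) / len(segments)

# Hypothetical per-frame posteriors for two phone segments
segments = [[0.9, 0.8, 0.85], [0.4, 0.5]]
print(round(sentence_score(segments), 3))
```

Well-pronounced phones draw high posteriors from the recognizer's phone classifier, so the score approaches zero from below; mispronunciations pull it toward large negative values.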


international conference on spoken language processing | 1996

Automatic text-independent pronunciation scoring of foreign language student speech

Leonardo Neumeyer; Horacio Franco; Mitchel Weintraub; Patti Price

SRI International is currently involved in the development of a new generation of software systems for automatic scoring of pronunciation as part of the Voice Interactive Language Training System (VILTS) project. This paper describes the goals of the VILTS system, the speech corpus and the algorithm development. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations that are used to produce pronunciation scores at the end of each lesson. The scores produced by the system are similar to those of expert human listeners. Unlike previous approaches, in which models were built for specific sentences or phrases, we present a new family of algorithms designed to perform well even when knowledge of the exact text to be used is not available.


Speech Communication | 2000

Combination of machine scores for automatic grading of pronunciation quality

Horacio Franco; Leonardo Neumeyer; Vassilios Digalakis; Orith Ronen

This work is part of an effort aimed at developing computer-based systems for language instruction; we address the task of grading the pronunciation quality of the speech of a student of a foreign language. The automatic grading system uses SRI's Decipher™ continuous speech recognition system to generate phonetic segmentations. Based on these segmentations and probabilistic models, we produce different pronunciation scores for individual or groups of sentences that can be used as predictors of the pronunciation quality. Different types of these machine scores can be combined to obtain a better prediction of the overall pronunciation quality. In this paper we review some of the best-performing machine scores and discuss the application of several methods based on linear and nonlinear mapping and combination of individual machine scores to predict the pronunciation quality grade that a human expert would have given. We evaluate these methods in a database that consists of pronunciation-quality-graded speech from American students speaking French. With predictors based on spectral match and on durational characteristics, we find that the combination of scores improved the prediction of the human grades and that nonlinear mapping and combination methods performed better than linear ones. Characteristics of the different nonlinear methods studied are discussed.
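To illustrate the linear-versus-nonlinear distinction, a toy combiner can map a weighted sum of machine scores either directly or through a sigmoid squashed onto a grade scale; the weights, bias, and mapping below are invented, not the paper's fitted models:

```python
import math

def linear_combo(scores, weights, bias):
    """Linear combination of machine scores into one predicted grade."""
    return bias + sum(w * s for w, s in zip(weights, scores))

def nonlinear_combo(scores, weights, bias, lo=1.0, hi=5.0):
    """Sigmoid-squashed combination mapped onto a bounded grade range,
    a simple stand-in for the nonlinear mappings studied."""
    z = linear_combo(scores, weights, bias)
    return lo + (hi - lo) / (1.0 + math.exp(-z))

# Hypothetical spectral-match and duration scores with made-up weights
scores = [0.7, -0.2]
print(round(linear_combo(scores, [2.0, 1.5], 0.5), 3))
print(round(nonlinear_combo(scores, [2.0, 1.5], 0.5), 3))
```

One practical motivation for a nonlinear mapping is visible even in this toy: human grades live on a bounded scale, which an unconstrained linear predictor can overshoot.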


international conference on acoustics, speech, and signal processing | 2012

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition

Vikramjit Mitra; Horacio Franco; Martin Graciarena; Arindam Mandal

Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared against state-of-the-art noise-robust features on Aurora-2 and a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both a clean and an artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all the training-test conditions of the renoised WSJ data and also improved digit recognition accuracies for Aurora-2 compared to the MFCCs and state-of-the-art noise-robust features.
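The processing chain named in the abstract (a Teager-style energy estimate, power-law compression, and a cosine transform) can be sketched on a toy amplitude envelope; the 1/15 compression exponent and the input values below are assumptions for illustration, not the published NMCC recipe:

```python
import math

def teager_energy(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def dct2(values, num_coeffs):
    """DCT-II, as commonly used to decorrelate compressed subband energies."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

# Hypothetical subband amplitude envelope -> TEO energy -> power-law
# compression (the 1/15 root is an assumption) -> cepstral coefficients
envelope = [0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1, 0.05]
energies = teager_energy(envelope)
compressed = [max(e, 1e-12) ** (1 / 15) for e in energies]
nmcc_like = dct2(compressed, 3)
print([round(c, 3) for c in nmcc_like])
```

The power-law compression plays the role that the logarithm plays in MFCC extraction, but it saturates less aggressively for small energies, which is one common rationale for its noise robustness.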


international conference on acoustics, speech, and signal processing | 1992

Connectionist probability estimation in the DECIPHER speech recognition system

Steve Renals; Nelson Morgan; Michael Cohen; Horacio Franco

The authors have previously demonstrated that feedforward networks can be used to estimate local output probabilities in hidden Markov model (HMM) speech recognition systems (Renals et al., 1991). These connectionist techniques are integrated into the DECIPHER system, with experiments being performed using the speaker-independent DARPA RM database. The results indicate that: connectionist probability estimation can improve performance of a context-independent maximum-likelihood-trained HMM system; performance of the connectionist system is close to what can be achieved using (context-dependent) HMM systems of much higher complexity; and mixing connectionist and maximum-likelihood estimates can improve the performance of the state-of-the-art context-independent HMM system.


Language Testing | 2010

EduSpeak®: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

Horacio Franco; Harry Bratt; Romain Rossier; Venkata Ramana Rao Gadde; Elizabeth Shriberg; Victor Abrash; Kristin Precoda

SRI International’s EduSpeak® system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to specific production problems. We review our approach to pronunciation scoring, where our aim is to estimate the grade that a human expert would assign to the pronunciation quality of a paragraph or a phrase. Using databases of nonnative speech and corresponding human ratings at the sentence level, we evaluate different machine scores that can be used as predictor variables to estimate pronunciation quality. For more specific feedback on pronunciation, the EduSpeak toolkit supports a phone-level mispronunciation detection functionality that automatically flags specific phone segments that have been mispronounced. Phone-level information makes it possible to provide the student with feedback about specific pronunciation mistakes. Two approaches to mispronunciation detection were evaluated in a phonetically transcribed database of 130,000 phones uttered in continuous speech sentences by 206 nonnative speakers. Results show that the classification error of the best system, for the phones that can be reliably transcribed, is only slightly higher than the average pairwise disagreement between the human transcribers.
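Phone-level mispronunciation detection of the kind described can be sketched as thresholding a per-phone score; the phone labels, scores, and thresholds below are all hypothetical:

```python
def detect_mispronunciations(phone_scores, thresholds):
    """Flag phone segments whose score falls below a per-phone threshold.
    phone_scores: list of (phone, score) pairs, e.g. average log posteriors.
    thresholds: per-phone cutoffs, typically tuned on transcribed data."""
    return [(phone, score) for phone, score in phone_scores
            if score < thresholds.get(phone, -1.0)]

# Hypothetical average log-posterior scores for one utterance
phone_scores = [("r", -0.3), ("ah", -1.8), ("n", -0.2)]
thresholds = {"r": -0.5, "ah": -0.6, "n": -0.5}
print(detect_mispronunciations(phone_scores, thresholds))
```

Tuning the threshold per phone trades off false alarms against misses, which is why the evaluation above compares the detector's classification error to inter-transcriber disagreement.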


International Journal of Pattern Recognition and Artificial Intelligence | 1993

HYBRID NEURAL NETWORK/HIDDEN MARKOV MODEL SYSTEMS FOR CONTINUOUS SPEECH RECOGNITION

Nelson Morgan; Steve Renals; Michael Cohen; Horacio Franco

MultiLayer Perceptrons (MLP) are an effective family of algorithms for the smooth estimation of highly dimensioned probability density functions that are useful in continuous speech recognition. Hidden Markov Models (HMM) provide a structure for the mapping of a temporal sequence of acoustic vectors to a generating sequence of states. For HMMs that are independent of phonetic context, the MLP approaches have consistently provided significant improvements (once we learned how to use them). Recently, these results have been extended to context-dependent models. In this paper, after having reviewed the basic principles of our hybrid HMM/MLP approach, we describe a series of experiments with continuous speech recognition. The hybrid methods directly trade off computational complexity for reduced requirements of memory and memory bandwidth. Results are presented on the widely used Resource Management speech database that is distributed by the National Institute of Standards and Technology. These results demonstrate…


international conference on acoustics, speech, and signal processing | 2014

Medium-duration modulation cepstral feature for robust speech recognition

Vikramjit Mitra; Horacio Franco; Martin Graciarena; Dimitra Vergyri

Studies have shown that the performance of state-of-the-art automatic speech recognition (ASR) systems deteriorates significantly with increased noise levels and channel degradations, when compared to human speech recognition capability. Traditionally, noise-robust acoustic features are deployed to improve speech recognition performance under varying background conditions to compensate for the performance degradations. In this paper, we present the Modulation of Medium Duration Speech Amplitude (MMeDuSA) feature, which is a composite feature capturing subband speech modulations and a summary modulation. We analyze MMeDuSA's speech recognition performance using SRI International's DECIPHER® large vocabulary continuous speech recognition (LVCSR) system on noise- and channel-degraded Levantine Arabic speech distributed through the Defense Advanced Research Projects Agency (DARPA) Robust Automatic Speech Transcription (RATS) program. We also analyzed MMeDuSA's performance against the Aurora-4 noise-and-channel degraded English corpus. Our results from all these experiments suggest that the proposed MMeDuSA feature improved recognition performance under both noisy and channel-degraded conditions in almost all the recognition tasks.
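The idea of per-band amplitude modulations plus a "summary modulation" can be caricatured with a moving-average envelope per subband and an across-band average; this is a rough sketch of the concept only, not the published MMeDuSA pipeline:

```python
def moving_average(x, w):
    """Crude low-pass filter: moving average over up to w past samples."""
    return [sum(x[max(0, i - w + 1): i + 1]) / min(w, i + 1)
            for i in range(len(x))]

def subband_modulations(subbands, w=3):
    """Per-band modulation estimates (rectify, then smooth) plus a
    summary track averaged across bands at each time step."""
    per_band = [moving_average([abs(s) for s in band], w) for band in subbands]
    summary = [sum(vals) / len(vals) for vals in zip(*per_band)]
    return per_band, summary

# Hypothetical rectified subband signals (two bands, four samples each)
bands = [[0.1, 0.4, 0.4, 0.1], [0.3, 0.6, 0.6, 0.3]]
per_band, summary = subband_modulations(bands, w=2)
print(summary)
```

The summary track is what makes the feature "composite": it captures broadband modulation behavior that any single subband can miss.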

Collaboration


Dive into Horacio Franco's collaboration.

Top Co-Authors


Nelson Morgan

University of California
