Publication


Featured research published by Fileno A. Alleva.


Computer Speech & Language | 1992

The SPHINX-II Speech Recognition System: An Overview

Xuedong Huang; Fileno A. Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld

In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.


international conference on acoustics, speech, and signal processing | 1993

Predicting unseen triphones with senones

Mei-Yuh Hwang; Xuedong Huang; Fileno A. Alleva

In large-vocabulary speech recognition, there are always new triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context-independent monophones. It is proposed that decision-tree-based senones be used to generate needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. A Markov state of any triphone traverses the corresponding tree until it reaches a leaf to find the senone it is to be associated with. The DARPA 5000-word speaker-independent Wall Street Journal dictation task is used to evaluate the proposed method. The word error rate is reduced by more than 10% when unseen triphones are modeled by the decision-tree-based senones.
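
The mapping step described in this abstract reduces to a simple tree walk. The sketch below is a minimal illustration rather than the SPHINX-II implementation: the Node class, the toy linguistic question, and the senone IDs are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Internal nodes carry a yes/no question about the (left, right) phone context;
    # leaves carry a senone ID.
    question: Optional[Callable[[str, str], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    senone_id: Optional[int] = None


def map_state_to_senone(tree: Node, left_ctx: str, right_ctx: str) -> int:
    """Walk one state's tree until a leaf and return the senone ID stored there."""
    node = tree
    while node.senone_id is None:
        node = node.yes if node.question(left_ctx, right_ctx) else node.no
    return node.senone_id


# Toy tree for one Markov state of a base phone: "is the left context a nasal?"
toy_tree = Node(
    question=lambda left, right: left in {"m", "n", "ng"},
    yes=Node(senone_id=42),
    no=Node(senone_id=7),
)

# A triphone never observed in training still reaches a leaf, e.g. n-ih+z -> senone 42.
print(map_state_to_senone(toy_tree, "n", "z"))
```

Because each question tests only the left and right phonetic context, a context never observed in training still reaches some leaf, which is what lets unseen triphones share senones estimated from seen ones.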


Journal of the Acoustical Society of America | 1998

System and method for speech recognition using dynamically adjusted confidence measure

Fileno A. Alleva; Douglas H. Beeferman; Xuedong Huang

A computer-implemented method of recognizing an input speech utterance compares the input speech utterance to a plurality of hidden Markov models to obtain a constrained acoustic score that reflects the probability that the hidden Markov model matches the input speech utterance. The method computes a confidence measure for each hidden Markov model that reflects the probability of the constrained acoustic score being correct. The computed confidence measure is then used to adjust the constrained acoustic score. Preferably, the confidence measure is computed based on a difference between the constrained acoustic score and an unconstrained acoustic score that is computed independently of any language context. In addition, a new confidence measure preferably is computed for each input speech frame from the input speech utterance so that the constrained acoustic score is adjusted for each input speech frame.
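
As a rough illustration of the scoring idea, the following sketch combines a per-frame constrained score with a confidence term derived from an unconstrained score. The function name, the simple linear adjustment, and the example numbers are assumptions for illustration, not the formula claimed in the patent.

```python
from typing import Sequence


def adjusted_score(constrained: Sequence[float],
                   unconstrained: Sequence[float],
                   weight: float = 0.5) -> float:
    """Sum per-frame constrained log-scores, each corrected by a confidence term.

    The confidence term here is the gap between the constrained score (computed
    against a specific HMM in its language context) and an unconstrained score
    (best acoustic match with no language constraint).
    """
    total = 0.0
    for c, u in zip(constrained, unconstrained):
        confidence = c - u  # near zero when the hypothesis is acoustically well supported
        total += c + weight * confidence
    return total


# Per-frame log-likelihoods for one candidate hypothesis over a short utterance.
constrained_frames = [-4.1, -3.8, -5.0]
unconstrained_frames = [-3.9, -3.7, -4.2]
print(adjusted_score(constrained_frames, unconstrained_frames))
```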


human language technology | 1993

An overview of the SPHINX-II speech recognition system

Xuedong Huang; Fileno A. Alleva; Mei-Yuh Hwang; Ronald Rosenfeld

In the past year at Carnegie Mellon steady progress has been made in the area of acoustic and language modeling. The result has been a dramatic reduction in speech recognition errors in the SPHINX-II system. In this paper, we review SPHINX-II and summarize our recent efforts on improved speech recognition. Recently SPHINX-II achieved the lowest error rate in the November 1992 DARPA evaluations. For 5000-word, speaker-independent, continuous speech recognition, the error rate was reduced to 5%.


Journal of the Acoustical Society of America | 1999

Senone tree representation and evaluation

Fileno A. Alleva; Xuedong Huang; Mei-Yuh Hwang

A speech recognition method provides improved modeling and recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree beginning with the root node is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone. As a result, unseen triphones not encountered in the training data can be modeled with senones created using the triphones actually found in the training data.
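
A compact way to picture the training-time procedure described above is a greedy splitting loop over context questions. The sketch below is only an illustration of that loop under assumed names and a toy split criterion; the actual clustering criterion and stopping rule are not specified here.

```python
from typing import Callable, Dict, List, Tuple

# One training observation: ((left_phone, right_phone), pooled output distribution).
Sample = Tuple[Tuple[str, str], List[float]]
Question = Callable[[str, str], bool]


def split_gain(samples: List[Sample], question: Question) -> int:
    """Toy criterion: favour questions that split the node into two non-empty halves."""
    yes = sum(1 for (ctx, _) in samples if question(*ctx))
    return min(yes, len(samples) - yes)


def build_senone_tree(samples: List[Sample],
                      questions: List[Question],
                      min_leaf: int = 2) -> Dict:
    """Greedily split on the best context question; small nodes become leaves (senones)."""
    if len(samples) <= min_leaf or not questions:
        return {"senone": samples}
    best = max(questions, key=lambda q: split_gain(samples, q))
    if split_gain(samples, best) == 0:  # no question separates the data
        return {"senone": samples}
    yes_side = [s for s in samples if best(*s[0])]
    no_side = [s for s in samples if not best(*s[0])]
    return {"question": best,
            "yes": build_senone_tree(yes_side, questions, min_leaf),
            "no": build_senone_tree(no_side, questions, min_leaf)}


# Output distributions for one state of one phone, keyed by (left, right) context.
data: List[Sample] = [(("n", "t"), [0.2, 0.8]), (("m", "k"), [0.3, 0.7]),
                      (("s", "t"), [0.6, 0.4]), (("f", "k"), [0.7, 0.3])]
is_left_nasal: Question = lambda left, right: left in {"m", "n", "ng"}
tree = build_senone_tree(data, [is_left_nasal], min_leaf=1)
print(tree.keys())  # dict_keys(['question', 'yes', 'no'])
```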


CSREAHCI | 1996

From Sphinx-II to Whisper — Making Speech Recognition Usable

Xuedong Huang; Alejandro Acero; Fileno A. Alleva; Mei-Yuh Hwang; Li Jiang; Milind Mahajan

In this chapter, we first review Sphinx-II, a large-vocabulary speaker-independent continuous speech recognition system developed at Carnegie Mellon University, summarizing the techniques that helped Sphinx-II achieve state-of-the-art recognition performance. We then review Whisper, a system we developed at Microsoft Corporation, focusing on recognition accuracy, efficiency, and usability issues. These three issues are critical to the success of commercial speech applications, and Whisper has significantly improved its performance in all three areas. It can be configured as a spoken language front-end (telephony or desktop) or as a dictation application.


Journal of the Acoustical Society of America | 1988

The Carnegie‐Mellon Portable Speech Library

Fileno A. Alleva; Eric H. Thayer

In order to promote the dissemination of research results and to help advance the general state of the art in speech recognition, a public domain library of plug‐compatible subroutines, modules, and programs has been created. This paper describes the Carnegie‐Mellon Portable Speech Library (CMPSL). CMPSL provides algorithms written in C that can be used to quickly prototype state‐of‐the‐art speech recognition systems that are useful in their ability to perform the speech recognition task while also providing a basis from which to pursue further research. CMPSL's chief contributions are to make available recent advances in speech recognition at Carnegie‐Mellon in the context of Lee's large‐vocabulary, speaker‐independent recognition system, SPHINX [K.‐F. Lee and H.‐W. Hon, Proc. IEEE ICASSP‐88, 123–126 (1988)], and to suggest a framework within which speech researchers can make the results of their work available to other members of the automatic speech recognition community. The scope of CMPSL includes di...


Archive | 2000

Methods and apparatus for automatically synchronizing electronic audio files with electronic text files

David Heckerman; Fileno A. Alleva; Robert L. Rounthwaite; Daniel Rosen; Mei-Yuh Hwang; Yoram Yaacovi; John L. Manferdelli


Archive | 2000

Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process

David Heckerman; Fileno A. Alleva; Robert L. Rounthwaite; Daniel Rosen; Mei-Yuh Hwang; Yoram Yaacovi; John L. Manferdelli


Archive | 2005

Method and apparatus for constructing and using syllable-like unit language models

Mei-Yuh Hwang; Fileno A. Alleva; Rebecca C. Weiss
