Publications


Featured research published by Mei-Yuh Hwang.


Computer Speech & Language | 1992

The SPHINX-II Speech Recognition System: An Overview

Xuedong Huang; Fileno A. Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld

In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.


IEEE Transactions on Speech and Audio Processing | 1993

Shared-distribution hidden Markov models for speech recognition

Mei-Yuh Hwang; Xuedong Huang

A shared-distribution hidden Markov model (HMM) is presented for speaker-independent continuous speech recognition. The output distributions across different phonetic HMMs are shared with each other when they exhibit acoustic similarity. This sharing provides the freedom to use a larger number of Markov states for each phonetic model. Although an increase in the number of states will increase the total number of free parameters, with distribution sharing one can collapse redundant states while maintaining necessary ones. The shared-distribution model reduced the word error rate on the DARPA Resource Management task by 20% in comparison with the generalized-triphone model.
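
The tying step can be sketched compactly. The snippet below is a hypothetical illustration, not the paper's procedure: it assumes diagonal-covariance Gaussian output distributions, measures acoustic similarity with a symmetric KL divergence, and greedily ties any state that falls within a threshold of an existing cluster.

```python
import numpy as np

def sym_kl_diag_gauss(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1.0)
    return kl12 + kl21

def tie_states(states, threshold):
    """Greedy tying sketch: each state joins the first cluster whose centroid
    is closer than `threshold`, otherwise it starts a new cluster.
    `states` is a list of (mean, var) array pairs; returns a cluster index
    per state, so states sharing an index share one output distribution."""
    centroids, assignment = [], []
    for m, v in states:
        best, best_d = None, threshold
        for i, (cm, cv) in enumerate(centroids):
            d = sym_kl_diag_gauss(m, v, cm, cv)
            if d < best_d:
                best, best_d = i, d
        if best is None:
            centroids.append((m, v))
            assignment.append(len(centroids) - 1)
        else:
            assignment.append(best)
    return assignment
```

States assigned the same index collapse into one shared distribution, which is what lets the model afford more states per phone without a proportional growth in free parameters.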


international conference on acoustics, speech, and signal processing | 1993

Predicting unseen triphones with senones

Mei-Yuh Hwang; Xuedong Huang; Fileno A. Alleva

In large-vocabulary speech recognition, there are always new triphones that are not covered in the training data. These unseen triphones are usually represented by corresponding diphones or context-independent monophones. It is proposed that decision-tree-based senones be used to generate needed senonic baseforms for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. A Markov state of any triphone traverses the corresponding tree until it reaches a leaf to find the senone it is to be associated with. The DARPA 5000-word speaker-independent Wall Street Journal dictation task is used to evaluate the proposed method. The word error rate is reduced by more than 10% when unseen triphones are modeled by the decision-tree-based senones.
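
The traversal the abstract describes is easy to sketch. Below is a minimal, hypothetical Python illustration (the questions, phone sets, and senone IDs are made up, not taken from the paper): each internal node asks a yes/no question about the triphone's phonetic context, and each leaf names a senone.

```python
class Node:
    def __init__(self, question=None, yes=None, no=None, senone_id=None):
        self.question = question    # predicate over (left, right) context; None at a leaf
        self.yes, self.no = yes, no
        self.senone_id = senone_id  # set only at leaves

def find_senone(tree, left_ctx, right_ctx):
    """Walk the tree for one Markov state of a (possibly unseen) triphone."""
    node = tree
    while node.question is not None:
        node = node.yes if node.question(left_ctx, right_ctx) else node.no
    return node.senone_id

# Example: a two-question tree for, say, state 1 of one phone.
is_nasal_left = lambda l, r: l in {"m", "n", "ng"}
is_liquid_right = lambda l, r: r in {"l", "r"}
tree = Node(question=is_nasal_left,
            yes=Node(senone_id=17),
            no=Node(question=is_liquid_right,
                    yes=Node(senone_id=42),
                    no=Node(senone_id=8)))

print(find_senone(tree, "n", "t"))  # 17: nasal left context
print(find_senone(tree, "s", "r"))  # 42: liquid right context
```

Because the questions depend only on phonetic context, the lookup works even for triphones never observed in training, which is the point of the method.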


international conference on acoustics, speech, and signal processing | 1989

The SPHINX speech recognition system

Kai-Fu Lee; H.-W. Hon; Mei-Yuh Hwang; Sanjoy Mahajan; Raj Reddy

A description is given of SPHINX, an accurate large-vocabulary speaker-independent continuous speech recognition system. The authors have made several recent enhancements, including generalized triphone models, word duration modeling, function-phrase modeling, between-word coarticulation modeling, and corrective training. On the 997-word resource management task, SPHINX attained a word accuracy of 96% with a grammar (perplexity 60), and 82% without grammar (perplexity 997).
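
For readers unfamiliar with the perplexity figures quoted above: perplexity is the exponentiated entropy of the next-word distribution, i.e., the effective branching factor the recognizer faces at each word. A minimal sketch makes the "no grammar" number concrete: with no grammar, all 997 words are equally likely, so the perplexity is exactly 997.

```python
import math

def perplexity(probs):
    """Perplexity = exp(entropy) of a next-word distribution; for a uniform
    distribution over N words it equals N, the effective branching factor."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(h)

uniform = [1.0 / 997] * 997
print(round(perplexity(uniform)))  # 997: the "no grammar" condition above
```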


international conference on acoustics, speech, and signal processing | 2006

Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons

Andreas Stolcke; Frantisek Grezl; Mei-Yuh Hwang; Xin Lei; Nelson Morgan; Dimitra Vergyri

Recent results with phone-posterior acoustic features estimated by multilayer perceptrons (MLPs) have shown that such features can effectively improve the accuracy of state-of-the-art large vocabulary speech recognition systems. MLP features are trained discriminatively to perform phone classification and are therefore, like acoustic models, tuned to a particular language and application domain. In this paper we investigate how portable such features are across domains and languages. We show that even without retraining, English-trained MLP features can provide a significant boost to recognition accuracy in new domains within the same language, as well as in entirely different languages such as Mandarin and Arabic. We also show the effectiveness of feature-level adaptation in porting MLP features to new domains.
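
MLP features of this kind are commonly used in the Tandem style; a minimal sketch is below. It assumes (these are assumptions, not details from the paper) a trained MLP exposed as a callable `mlp_forward` and a precomputed PCA matrix: the phone posteriors are logged to compress their skew, decorrelated, and appended to the standard cepstra.

```python
import numpy as np

def tandem_features(cepstra, mlp_forward, pca_matrix):
    """Tandem-style feature augmentation (a sketch under the assumptions
    above, not necessarily this paper's exact recipe)."""
    posteriors = mlp_forward(cepstra)        # (T, n_phones), rows sum to 1
    log_post = np.log(posteriors + 1e-10)    # log compresses skewed posteriors
    mlp_feats = log_post @ pca_matrix        # (T, n_kept) decorrelated features
    return np.hstack([cepstra, mlp_feats])   # augmented acoustic features
```

Cross-language portability then amounts to reusing `mlp_forward` trained on one language while retraining only the downstream acoustic models.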


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Andreas Stolcke; Barry Y. Chen; H. Franco; Venkata Ramana Rao Gadde; Martin Graciarena; Mei-Yuh Hwang; Katrin Kirchhoff; Arindam Mandal; Nelson Morgan; Xin Lei; Tim Ng; Mari Ostendorf; M. Kemal Sönmez; Anand Venkataraman; Dimitra Vergyri; Wen Wang; Jing Zheng; Qifeng Zhu

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.
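
Of the front-end changes listed, the cepstral-normalization variant is the easiest to sketch. The macro-averaged version below is a hedged guess at the general idea (the paper's exact definition is not given here): instead of pooling all frames into one mean, per-phone means are averaged with equal weight, so frequent phones do not dominate the normalization offset.

```python
import numpy as np

def cmn_pooled(frames):
    """Standard cepstral mean normalization: subtract the frame-pooled mean."""
    return frames - frames.mean(axis=0)

def cmn_macro(frames, phone_labels):
    """Phone-level macro-averaged normalization (an assumption for
    illustration, not the paper's exact formulation): average the per-phone
    means with equal weight, then subtract that macro mean."""
    labels = np.asarray(phone_labels)
    per_phone_means = [frames[labels == p].mean(axis=0) for p in set(phone_labels)]
    macro_mean = np.mean(per_phone_means, axis=0)
    return frames - macro_mean
```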


international conference on acoustics, speech, and signal processing | 1992

Subphonetic modeling with Markov states - senone

Mei-Yuh Hwang; Xuedong Huang

There will never be sufficient training data to model all the various acoustic-phonetic phenomena. How to capture important clues and estimate those needed parameters reliably is one of the central issues in speech recognition. Successful examples include subword models, fenones and many other smoothing techniques. In comparison with subword models, subphonetic modeling may provide a finer level of detail. The authors propose to model subphonetic events with Markov states and treat the state in phonetic hidden Markov models as the basic subphonetic unit, the senone. Senones generalize fenones in several ways. A word model is a concatenation of senones, and senones can be shared across different word models. Senone models not only allow parameter sharing, but also enable pronunciation optimization. The authors report preliminary senone modeling results, which have significantly reduced the word error rate for speaker-independent continuous speech recognition.
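
The sharing argument can be made concrete with a toy sketch. All words, senone IDs, and parameter names below are invented for illustration: each word model is a list of senone IDs, and words that sound alike resolve to the same shared output distributions.

```python
# Hypothetical senone inventory: each ID maps to one output distribution.
SENONE_PARAMS = {7: "gauss_7", 12: "gauss_12", 31: "gauss_31", 44: "gauss_44"}

# Word models as concatenations of senone IDs; IDs recur across words,
# which is exactly the parameter sharing the abstract describes.
WORD_MODELS = {
    "two": [7, 12, 31],   # one senonic baseform
    "to":  [7, 12, 31],   # identical baseform, fully shared
    "tea": [7, 12, 44],   # shares its first two senones with "two"
}

def output_distributions(word):
    """Resolve a word model to its (shared) output distributions."""
    return [SENONE_PARAMS[s] for s in WORD_MODELS[word]]

print(output_distributions("tea"))  # ['gauss_7', 'gauss_12', 'gauss_44']
```

Pronunciation optimization then amounts to choosing, per word, the senone sequence that best fits the data rather than keeping a fixed dictionary baseform.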


international conference on acoustics, speech, and signal processing | 1995

Microsoft Windows highly intelligent speech recognizer: Whisper

Xuedong Huang; Alex Acero; Fil Alleva; Mei-Yuh Hwang; Li Jiang; Milind Mahajan

Since January 1993, the authors have been working to refine and extend Sphinx-II technologies in order to develop practical speech recognition at Microsoft. The result of that work is Whisper (Windows Highly Intelligent Speech Recognizer). Whisper offers significantly improved recognition efficiency, usability, and accuracy compared with the Sphinx-II system. In addition, Whisper provides speech input capabilities for Microsoft Windows and can be scaled to meet different PC platform configurations. It provides features such as continuous speech recognition, speaker independence, on-line adaptation, noise robustness, and dynamic vocabularies and grammars. For typical Windows command-and-control applications (fewer than 1000 words), Whisper provides a software-only solution on PCs equipped with a 486DX, 4 MB of memory, a standard sound card, and a desktop microphone.


human language technology | 1993

An overview of the SPHINX-II speech recognition system

Xuedong Huang; Fileno A. Alleva; Mei-Yuh Hwang; Ronald Rosenfeld

In the past year at Carnegie Mellon, steady progress has been made in the area of acoustic and language modeling. The result has been a dramatic reduction in speech recognition errors in the SPHINX-II system. In this paper, we review SPHINX-II and summarize our recent efforts on improved speech recognition. Recently SPHINX-II achieved the lowest error rate in the November 1992 DARPA evaluations. For 5000-word, speaker-independent, continuous speech recognition, the error rate was reduced to 5%.


international conference on acoustics, speech, and signal processing | 1991

Improved acoustic modeling with the SPHINX speech recognition system

Xuedong Huang; Kai-Fu Lee; H.-W. Hon; Mei-Yuh Hwang

The authors report recent efforts to further improve the performance of the SPHINX system for speaker-independent continuous speech recognition. They adhere to the basic architecture of the SPHINX system and use the DARPA resource management task and training corpus. The improvements are evaluated on the 600 sentences that comprise the DARPA February and October 1989 test sets. Several techniques that substantially reduced SPHINX's error rate are presented. These techniques include dynamic features, semicontinuous hidden Markov models, speaker clustering, and shared-distribution modeling. The error rate of the baseline system was reduced by 45%.
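
"Dynamic features" here conventionally means delta (time-derivative) cepstra. A minimal sketch of the standard regression-based formulation, under the assumption that this matches the paper's usage, is:

```python
import numpy as np

def delta(features, window=2):
    """Regression-based delta features over a +/-`window` frame context.
    `features` is a float array of shape (T, D); edges are padded by
    repeating the first/last frame."""
    T, _ = features.shape
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features)
    for k in range(1, window + 1):
        # k-weighted difference of frames k steps ahead and behind
        out += k * (padded[window + k : window + k + T]
                    - padded[window - k : window - k + T])
    return out / denom
```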

Collaboration


Dive into Mei-Yuh Hwang's collaborations.

Top Co-Authors

Mari Ostendorf (University of Washington)
Fil Alleva (Carnegie Mellon University)
Kai-Fu Lee (Carnegie Mellon University)
Xin Lei (University of Washington)