
Publication


Featured research published by Barry Y. Chen.


IEEE Signal Processing Magazine | 2005

Pushing the envelope - aside [speech recognition]

Nelson Morgan; Qifeng Zhu; Andreas Stolcke; K. Sonmez; S. Sivadas; T. Shinozaki; Mari Ostendorf; P. Jain; Hynek Hermansky; Daniel P. W. Ellis; G. Doddington; Barry Y. Chen; O. Cretin; H. Bourlard; M. Athineos

Despite successes, there are still significant limitations to speech recognition performance, particularly for conversational speech and/or for speech with significant acoustic degradation from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that we and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization (by cepstral mean subtraction, as in B. Atal (1974), or relative spectral analysis (RASTA), as in H. Hermansky and N. Morgan (1994)). The authors have also proposed features based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
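The cepstral mean subtraction mentioned above as a long-time-range normalization can be sketched in a few lines. This is a minimal pure-Python illustration, not code from the paper; the function and variable names are assumptions:

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over an utterance from each
    cepstral frame, removing stationary (channel) components.

    frames: list of equal-length cepstral vectors (lists of floats).
    """
    n = len(frames)
    dim = len(frames[0])
    # Long-term mean of each cepstral coefficient over the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

# Toy example: a constant offset in the first coefficient (e.g. a fixed
# channel) is removed, while frame-to-frame variation is preserved.
frames = [[3.0, -1.0], [5.0, 1.0]]
normalized = cepstral_mean_subtraction(frames)
```

Because the mean is computed over the whole utterance, this is exactly the kind of processing that uses information far beyond a single short-time analysis window.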


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Andreas Stolcke; Barry Y. Chen; H. Franco; Venkata Ramana Rao Gadde; Martin Graciarena; Mei-Yuh Hwang; Katrin Kirchhoff; Arindam Mandal; Nelson Morgan; Xin Lei; Tim Ng; Mari Ostendorf; M. Kemal Sönmez; Anand Venkataraman; Dimitra Vergyri; Wen Wang; Jing Zheng; Qifeng Zhu

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Trapping conversational speech: extending TRAP/tandem approaches to conversational telephone speech recognition

Nelson Morgan; Barry Y. Chen; Qifeng Zhu; Andreas Stolcke

Temporal patterns (TRAP) and tandem MLP/HMM approaches incorporate feature streams computed from longer time intervals than the conventional short-time analysis. These methods have been used for challenging small- and medium-vocabulary recognition tasks, such as Aurora and SPINE. Conversational telephone speech recognition is a difficult large-vocabulary task, with current systems giving incorrect output for 20-40% of the words, depending on the system complexity and test set. Training and test times for this problem also tend to be relatively long, making rapid development quite difficult. In this paper we report experiments with a reduced conversational speech task that led to the adoption of a number of engineering decisions for the design of an acoustic front end. We then describe our results with this front end on a full-vocabulary conversational telephone speech task. In both cases the front end yielded significant improvements over the baseline.


International Conference on Machine Learning | 2004

Tandem connectionist feature extraction for conversational speech recognition

Qifeng Zhu; Barry Y. Chen; Nelson Morgan; Andreas Stolcke

Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. A particular application of this tool over the last few years has been the Tandem approach, as described in [7] and other more recent publications. Here we discuss the characteristics of the MLP-based features used for the Tandem approach, and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by taking the logarithm to make the distribution easier to model by a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
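The transformations described above (posteriors from MLP outputs, a log transform to make them easier for a Gaussian-HMM to model, and stream combination without growing the feature dimension) can be sketched roughly as follows. This is an illustrative pure-Python sketch under stated assumptions, not the paper's implementation; the softmax output layer and the weighted log-domain combination are assumptions:

```python
import math

def softmax(activations):
    """Convert MLP output activations into class posterior probabilities."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def tandem_features(activations, eps=1e-10):
    """Log of the class posteriors: spreads out the sharply peaked
    posterior distribution so a Gaussian-HMM can model it more easily."""
    return [math.log(p + eps) for p in softmax(activations)]

def combine_streams(feat_a, feat_b, weight=0.5):
    """Combine two feature vectors over the same classes element-wise,
    so the combined vector keeps the original dimension."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(feat_a, feat_b)]
```

In a full tandem system these log-posterior vectors would additionally be decorrelated (e.g. by PCA) before being handed to the Gaussian-HMM; that step is omitted here for brevity.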


International Conference on Acoustics, Speech, and Signal Processing | 2000

Data-driven RASTA filters in reverberation

Michael L. Shire; Barry Y. Chen

In this work we test the performance of RASTA-style modulation filters derived under reverberant conditions. The modulation filters are constructed through linear discriminant analysis of log critical band energies in a manner described by van Vuuren and Hermansky (1997). In previous work we had observed the properties of the resultant filters under a number of acoustic conditions that were artificially applied to the training speech. Here, we present automatic speech recognition results that compare the performance of these filters under some training and testing reverberant conditions. We also test the effectiveness and robustness of a multi-stream combination using probability streams trained under different reverberant environments. The experiments reveal some performance improvement in severe reverberation.
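Operationally, a RASTA-style modulation filter is an FIR filter applied along the time trajectory of each log critical band energy. The sketch below illustrates only that mechanism; the filter taps shown are a simple difference filter for illustration, not the LDA-derived filters from the paper:

```python
def filter_trajectory(lcbe, taps):
    """Apply an FIR modulation filter along the time trajectory of one
    log critical band energy (LCBE). Output has len(taps)-1 fewer points.
    """
    n = len(taps)
    return [sum(taps[k] * lcbe[t - k] for k in range(n))
            for t in range(n - 1, len(lcbe))]

# A first-difference filter suppresses the constant (0 Hz modulation)
# component of the trajectory, the same effect that makes RASTA-style
# filtering robust to stationary channel and reverberation components.
traj = [5.0, 5.0, 6.0, 7.0, 7.0]
filtered = filter_trajectory(traj, taps=[1.0, -1.0])
```

Deriving the taps by linear discriminant analysis, as in the paper, replaces the hand-picked difference filter with one optimized to separate speech classes under the training (e.g. reverberant) conditions.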


International Conference on Machine Learning | 2004

Long-term temporal features for conversational speech recognition

Barry Y. Chen; Qifeng Zhu; Nelson Morgan

The automatic transcription of conversational speech, both from telephone and in-person interactions, is still an extremely challenging task. Our efforts to recognize speech from meetings are likely to benefit from any advances we achieve with conversational telephone speech, a topic of considerable focus for our research. Towards both of these ends, we have developed, in collaboration with our colleagues at SRI and IDIAP, techniques to incorporate long-term (~500 ms) temporal information using multi-layered perceptrons (MLPs). Much of this work is based on prior achievements in recent years at the former lab of Hynek Hermansky at the Oregon Graduate Institute (OGI), where the TempoRAl Pattern (TRAP) approach was developed. The contribution here is to present experiments showing: 1) that simply widening acoustic context by using more frames of full-band speech energies as input to the MLP is suboptimal compared to a more constrained two-stage approach that first focuses on long-term temporal patterns in each critical band separately and then combines them; 2) that the best two-stage approach studied utilizes hidden activation values of MLPs trained on the log critical band energies (LCBEs) of 51 consecutive frames; and 3) that combining the best two-stage approach with conventional short-term features significantly reduces word error rates on the 2001 NIST Hub-5 conversational telephone speech (CTS) evaluation set with models trained using the Switchboard Corpus.
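The input side of the two-stage approach described above can be sketched as a data-layout step: for each time index, one 51-frame window per critical band is extracted and would feed that band's first-stage MLP. This is a hypothetical sketch of the window extraction only (no MLPs); the function and parameter names are assumptions:

```python
def band_windows(lcbe_frames, center, context=25):
    """Build the per-critical-band inputs for the first-stage MLPs:
    2*context + 1 = 51 consecutive frames of a single band's log
    critical band energy, centered on the given frame.

    lcbe_frames: list of frames, each a list of per-band LCBE values.
    Returns one 51-value window per critical band.
    """
    num_bands = len(lcbe_frames[0])
    lo, hi = center - context, center + context + 1
    assert 0 <= lo and hi <= len(lcbe_frames), "window must fit in utterance"
    return [[lcbe_frames[t][b] for t in range(lo, hi)]
            for b in range(num_bands)]

# 60 frames, 3 critical bands; windows centered on frame 30.
frames = [[float(t + b) for b in range(3)] for t in range(60)]
windows = band_windows(frames, center=30)
```

In the second stage, the hidden activations produced from these per-band windows would be concatenated and merged by a combining MLP, in contrast to feeding one wide full-band window to a single network.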


Conference of the International Speech Communication Association | 2004

On using MLP features in LVCSR.

Qifeng Zhu; Barry Y. Chen; Nelson Morgan; Andreas Stolcke


Conference of the International Speech Communication Association | 2005

Using MLP features in SRI's conversational speech recognition system.

Qifeng Zhu; Andreas Stolcke; Barry Y. Chen; Nelson Morgan


Conference of the International Speech Communication Association | 2004

Learning long-term temporal features in LVCSR using neural networks.

Barry Y. Chen; Qifeng Zhu; Nelson Morgan


Conference of the International Speech Communication Association | 2001

Robust ASR front-end using spectral-based and discriminant features: experiments on the Aurora tasks

M. Carmen Benítez; Lukas Burget; Barry Y. Chen; Stéphane Dupont; Harinath Garudadri; Hynek Hermansky; Pratibha Jain; Sachin S. Kajarekar; Nelson Morgan; Sunil Sivadas

Collaboration


Dive into Barry Y. Chen's collaborations.

Top Co-Authors

Nelson Morgan (University of California)
Qifeng Zhu (International Computer Science Institute)
Mari Ostendorf (University of Washington)
Carmen Peláez-Moreno (Complutense University of Madrid)