Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chao Weng is active.

Publications


Featured research published by Chao Weng.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Recurrent deep neural networks for robust speech recognition

Chao Weng; Dong Yu; Shinji Watanabe; Biing Hwang Fred Juang

In this work, we propose recurrent deep neural networks (DNNs) for robust automatic speech recognition (ASR). Full recurrent connections are added to a certain hidden layer of a conventional feedforward DNN, allowing the model to capture temporal dependencies in the deep representations. A new backpropagation through time (BPTT) algorithm is introduced to make minibatch stochastic gradient descent (SGD) on the proposed recurrent DNNs more efficient and effective. We evaluate the proposed recurrent DNN architecture under the hybrid setup on both the 2nd CHiME challenge (track 2) and Aurora-4 tasks. Experimental results on the CHiME challenge data show that the proposed system obtains consistent 7% relative word error rate (WER) improvements over the DNN systems, achieving state-of-the-art performance without front-end preprocessing, speaker adaptive training, or multiple decoding passes. On Aurora-4, the proposed system achieves a 4% relative WER improvement over a strong DNN baseline system.
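To make the architecture concrete, here is a minimal numpy sketch of the forward pass of a feedforward DNN whose middle hidden layer also receives full recurrent connections from its own previous-frame activations, as the abstract describes. The layer sizes, initialization, and single-recurrent-layer placement are illustrative assumptions, not the paper's configuration, and the BPTT training step is omitted.

```python
# Sketch: feedforward DNN with one fully recurrent hidden layer (forward pass only).
# Sizes are illustrative (e.g. spliced filterbank input -> senone posteriors).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 440, 1024, 2000

W1 = rng.normal(scale=0.01, size=(n_hid, n_in))    # input -> hidden
Wr = rng.normal(scale=0.01, size=(n_hid, n_hid))   # hidden(t-1) -> hidden(t): the recurrent part
W2 = rng.normal(scale=0.01, size=(n_out, n_hid))   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(frames):
    """frames: (T, n_in) acoustic features; returns (T, n_out) senone posteriors."""
    h_prev = np.zeros(n_hid)
    posts = []
    for x in frames:
        h = sigmoid(W1 @ x + Wr @ h_prev)          # temporal dependency enters here
        z = W2 @ h
        e = np.exp(z - z.max())
        posts.append(e / e.sum())                  # softmax over senone classes
        h_prev = h
    return np.stack(posts)
```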


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Deep neural networks for single-channel multi-talker speech recognition

Chao Weng; Dong Yu; Michael L. Seltzer; Jasha Droppo

We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach contains five key ingredients: a multi-style training strategy on artificially mixed speech data, a separate DNN to estimate senone posterior probabilities of the louder and softer speakers at each frame, a weighted finite-state transducer (WFST)-based two-talker decoder to jointly estimate and correlate the speaker and speech, a speaker switching penalty estimated from the energy pattern change in the mixed speech, and a confidence-based system combination strategy. Experiments on the 2006 speech separation and recognition challenge task demonstrate that our proposed DNN-based system has remarkable noise robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an average word error rate (WER) of 18.8% across different SNRs and outperforms the state-of-the-art IBM superhuman system by 2.8% absolute with fewer assumptions.
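The first ingredient, multi-style training on artificially mixed data, amounts to summing two single-speaker waveforms at controlled target-to-masker ratios. The following sketch shows one standard way to do this; the function name and SNR grid are my assumptions for illustration, not the authors' exact recipe.

```python
# Sketch: mix two single-speaker waveforms at a target SNR for multi-style training.
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so `target` is snr_db louder, then sum the waveforms."""
    n = min(len(target), len(interferer))
    target, interferer = target[:n], interferer[:n]
    p_t = np.mean(target ** 2)
    p_i = np.mean(interferer ** 2) + 1e-12
    gain = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10.0)))   # SNR = 10*log10(p_t / (gain^2 * p_i))
    return target + gain * interferer

# e.g. build training mixtures across a range of target-to-masker ratios:
# snrs = [-9, -6, -3, 0, 3, 6]   # dB; illustrative grid
```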


International Conference on Acoustics, Speech, and Signal Processing | 2014

Single-channel mixed speech recognition using deep neural networks

Chao Weng; Dong Yu; Michael L. Seltzer; Jasha Droppo

In this work, we study the problem of single-channel mixed speech recognition using deep neural networks (DNNs). Using a multi-style training strategy on artificially mixed speech data, we investigate several different training setups that enable the DNN to generalize to corresponding similar patterns in the test data. We also introduce a WFST-based two-talker decoder to work with the trained DNNs. Experiments on the 2006 speech separation and recognition challenge task demonstrate that the proposed DNN-based system has remarkable noise robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an overall WER of 19.7% which improves upon the results obtained by the state-of-the-art IBM superhuman system by 1.9% absolute, with fewer assumptions and lower computational complexity.
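The core of the two-talker decoder is a search over pairs of states, one per talker, with each frame scored jointly by the two speakers' models. Below is a toy dynamic-programming version of that idea; the real system uses a WFST decoder, and this simplified sketch assumes talker A stays the louder stream throughout and keeps the full pair grid in memory (fine only for small state counts).

```python
# Toy sketch: joint Viterbi over state *pairs* for two-talker decoding.
import numpy as np

def joint_viterbi(logp_loud, logp_soft, trans_a, trans_b):
    """
    logp_loud, logp_soft: (T, S) per-frame log-scores for the two talkers.
    trans_a, trans_b:     (S, S) log transition matrices, trans[i, j] = i -> j.
    Returns the best joint log-score over all end-state pairs.
    """
    T, S = logp_loud.shape
    score = logp_loud[0][:, None] + logp_soft[0][None, :]        # pair grid (a, b)
    for t in range(1, T):
        # advance talker B, then talker A (transitions factorize across talkers)
        m = (score[:, :, None] + trans_b[None, :, :]).max(axis=1)    # (a', b)
        score = (m[:, None, :] + trans_a[:, :, None]).max(axis=0)    # (a, b)
        score += logp_loud[t][:, None] + logp_soft[t][None, :]
    return score.max()
```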


International Conference on Acoustics, Speech, and Signal Processing | 2013

Adaptive boosted non-uniform MCE for keyword spotting on spontaneous speech

Chao Weng; Biing-Hwang Juang

In this work, we present a complete framework of discriminative training using non-uniform criteria for keyword spotting on spontaneous speech: adaptive boosted non-uniform minimum classification error (MCE). To further boost spotting performance and to tackle the potential over-training issue in the non-uniform MCE proposed in our prior work, we make two improvements to the fundamental MCE optimization procedure. Furthermore, motivated by AdaBoost, we introduce an adaptive scheme that embeds error cost functions together with model combination during the decoding stage. The proposed framework is comprehensively validated on two challenging large-scale spontaneous conversational telephone speech (CTS) tasks in different languages (English and Mandarin), and the experimental results show that it achieves significant and consistent figure of merit (FOM) gains over both maximum-likelihood (ML) and discriminatively trained systems.
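As a rough illustration of what "non-uniform MCE" means, the sketch below shows the standard sigmoid-smoothed MCE misclassification loss scaled by a token-dependent error cost, so mistakes on keywords are penalized more heavily than mistakes elsewhere. The variable names and the idea of a simple multiplicative cost are my assumptions; the paper's exact formulation may differ.

```python
# Sketch: MCE loss with a non-uniform (keyword-weighted) error cost.
import numpy as np

def nonuniform_mce_loss(g_correct, g_competing, error_cost, alpha=1.0):
    """
    g_correct:   log-likelihood of the reference transcription
    g_competing: log-likelihood of the best competing hypothesis
    error_cost:  non-uniform cost (e.g. > 1 for keyword tokens, 1 otherwise)
    alpha:       slope of the sigmoid smoothing
    """
    d = -g_correct + g_competing                  # misclassification measure
    smoothed = 1.0 / (1.0 + np.exp(-alpha * d))   # sigmoid loss in [0, 1]
    return error_cost * smoothed                  # cost enters multiplicatively
```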


International Conference on Acoustics, Speech, and Signal Processing | 2014

Deep learning vector quantization for acoustic information retrieval

Zhen Huang; Chao Weng; Kehuang Li; You-Chi Cheng; Chin-Hui Lee

We propose a novel deep learning vector quantization (DLVQ) algorithm based on deep neural networks (DNNs). Utilizing the strong representational power of this deep learning framework, and with any vector quantization (VQ) method as an initializer, the proposed DLVQ technique is capable of learning a code-constrained codebook and thus improves over conventional VQ when used in classification problems. Tested on an audio information retrieval task, the proposed DLVQ achieves quite promising performance when initialized by the k-means VQ technique. A 10.5% relative gain in mean average precision (MAP) is obtained after fusing the k-means and DLVQ results.
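The initialization step the abstract mentions can be sketched as follows: run k-means to get an initial codebook and use the hard cluster assignments as classification targets for the network that then refines the codebook. The function name, sizes, and the specific way the refined codebook is read off are illustrative assumptions, not the paper's setup.

```python
# Sketch: k-means initialization for a DLVQ-style codebook.
import numpy as np
from sklearn.cluster import KMeans

def dlvq_init(features, n_codes=256):
    """features: (N, D) acoustic vectors. Returns initial codebook and DNN targets."""
    km = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit(features)
    codebook = km.cluster_centers_       # (n_codes, D): the k-means initializer
    targets = km.labels_                 # per-vector code index, used as
    return codebook, targets             # classification targets for the DNN

# A DNN is then trained to predict `targets`; its learned representation
# (e.g. the output-layer weights) serves as the discriminatively refined codebook.
```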


International Conference on Acoustics, Speech, and Signal Processing | 2013

Latent semantic rational kernels for topic spotting on spontaneous conversational speech

Chao Weng; Biing-Hwang Juang

In this work, we propose latent semantic rational kernels (LSRK) for topic spotting on spontaneous conversational speech. Rather than mapping the input weighted finite-state transducers (WFSTs) onto a high-dimensional n-gram feature space as in n-gram rational kernels, the proposed LSRK maps the WFSTs onto a latent semantic space. Moreover, within the LSRK framework, all available external knowledge can be flexibly incorporated to boost topic spotting performance. Experiments on a spontaneous conversational task, Switchboard, show that our method achieves a significant performance gain over the baselines, from 27.33% to 57.56% accuracy, and almost doubles the classification accuracy of the n-gram rational kernels in all cases.
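For intuition about the latent semantic mapping underneath LSRK, the sketch below projects word-count vectors through a truncated SVD of a term-document matrix and compares documents by cosine similarity in the latent space. The full method composes such a mapping with WFST lattices; plain count vectors stand in here purely for illustration, and `k` is an assumed latent dimension.

```python
# Sketch: latent semantic similarity via truncated SVD (LSA-style).
import numpy as np

def latent_semantic_kernel(term_doc, k=50):
    """term_doc: (V, D) term-by-document count matrix; returns a (D, D) kernel.
    Assumes k <= min(V, D)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k]).T                            # (D, k) latent vectors
    docs /= np.linalg.norm(docs, axis=1, keepdims=True) + 1e-12   # unit-normalize
    return docs @ docs.T                                          # cosine similarities
```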


International Conference on Acoustics, Speech, and Signal Processing | 2011

Recent development of discriminative training using non-uniform criteria for cross-level acoustic modeling

Chao Weng; Biing-Hwang Juang

In this paper, we extend our previous study on discriminative training using non-uniform criteria for speech recognition. The work puts emphasis on how the acoustic modeling interacts with risk at a higher level, which is more relevant to the most commonly used evaluation measures, e.g., word error rate (WER). Specifically, the non-uniform error cost is first derived at the word level to minimize the risk with respect to WER, and is then computed on the word lattice using the forward-backward algorithm. With the statistics obtained from the forward-backward algorithm, the competing hypotheses for each label word are found by performing dynamic programming between the label word sequence and the word lattice at the phone level. To alleviate the level inconsistency between the acoustic model (phone level) and the evaluation measure (word level), the derived error cost is embedded into the overall objective function in a cross-level fashion. Experiments on the large-vocabulary WSJ0 task demonstrate the effectiveness of the overall approach, showing that it outperforms two prevalent discriminative training methods and achieves about a 13% relative improvement over the baseline system.
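The lattice forward-backward step mentioned above can be illustrated with a small self-contained routine that computes arc posteriors on a word lattice, which is where per-word error costs would then be attached. The lattice encoding as a list of `(src, dst, log_score, word)` arcs over topologically numbered nodes, sorted by source node, is a simplifying assumption of this sketch.

```python
# Sketch: forward-backward arc posteriors on a word lattice (a DAG).
import math

def arc_posteriors(arcs, n_nodes):
    """arcs: list of (src, dst, log_score, word), sorted by src; nodes are
    topologically numbered 0 .. n_nodes-1, with 0 initial and n_nodes-1 final."""
    NEG = float("-inf")
    def logadd(a, b):
        if a == NEG: return b
        if b == NEG: return a
        m = max(a, b)
        return m + math.log1p(math.exp(min(a, b) - m))
    fwd = [NEG] * n_nodes; fwd[0] = 0.0
    bwd = [NEG] * n_nodes; bwd[-1] = 0.0
    for s, d, w, _ in arcs:                 # forward pass in topological order
        fwd[d] = logadd(fwd[d], fwd[s] + w)
    for s, d, w, _ in reversed(arcs):       # backward pass in reverse order
        bwd[s] = logadd(bwd[s], bwd[d] + w)
    total = fwd[-1]                         # total lattice log-score
    return [(word, math.exp(fwd[s] + w + bwd[d] - total))
            for s, d, w, word in arcs]      # posterior probability of each arc
```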


International Conference on Acoustics, Speech, and Signal Processing | 2012

A comparative study of discriminative training using non-uniform criteria for cross-layer acoustic modeling

Chao Weng; Biing-Hwang Juang

This work focuses on a comparative study of discriminative training using non-uniform criteria for cross-layer acoustic modeling. Two kinds of discriminative training (DT) frameworks, minimum classification error-like (MCE-like) and minimum phone error-like (MPE-like), are each augmented to allow error cost embedding at the phoneme (model) level. To facilitate this comparative study, we implement both augmented DT frameworks under the same umbrella, using the error cost derived from the same cross-layer confusion matrix. Experiments on the large-vocabulary WSJ0 task demonstrate the effectiveness of both DT frameworks with the formulated non-uniform error cost embedded. Several preliminary investigations into the effect of the dynamic range of the error cost are also presented.
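To make the comparison concrete, one way the non-uniform error cost can enter the two objectives is sketched below: multiplicatively scaling the smoothed MCE loss per utterance, versus weighting the accuracy term inside the MPE expectation. The notation is mine and the paper's exact formulation may differ.

```latex
% Hedged sketch (notation mine): a non-uniform error cost \epsilon(\cdot)
% embedded in MCE-like and MPE-like objectives over utterances r.
\mathcal{L}_{\mathrm{MCE}}(\Lambda)
  = \sum_r \epsilon(W_r)\,\frac{1}{1 + e^{-\alpha\, d_r(O_r;\Lambda)}},
\qquad
\mathcal{F}_{\mathrm{MPE}}(\Lambda)
  = \sum_r \sum_{W} P_{\Lambda}(W \mid O_r)\, A_{\epsilon}(W, W_r),
```

where $d_r$ is the misclassification measure for utterance $r$ and $A_{\epsilon}$ is a cost-weighted phone accuracy.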


Conference of the International Speech Communication Association | 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks

Zhen Huang; Jinyu Li; Sabato Marco Siniscalchi; I-Fan Chen; Chao Weng; Chin-Hui Lee


Archive | 2016

Mixed speech recognition

Dong Yu; Chao Weng; Michael L. Seltzer; James G. Droppo

Collaboration


Dive into Chao Weng's collaborations.

Top Co-Authors

Biing-Hwang (Fred) Juang
Georgia Institute of Technology

Chin-Hui Lee
Georgia Institute of Technology

Zhen Huang
Georgia Institute of Technology