Publication


Featured research published by Rongqing Huang.


IEEE Transactions on Speech and Audio Processing | 2005

SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul

In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for a new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, entitled "SpeechFind," is presented, which allows audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.
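
The retrieval chain the abstract outlines (segment the audio, transcribe it, index the transcripts, answer text queries) can be summarized as a small pipeline skeleton. Everything below is a hypothetical sketch: the class and method names are placeholders for illustration, not SpeechFind's actual API, and a real system would populate the index from an ASR decoder run over automatically segmented audio.

```python
# Hypothetical skeleton of a spoken-document-retrieval pipeline in the
# spirit of SpeechFind: transcripts of audio segments are indexed so that
# text queries can be mapped back to positions in the recordings.

from dataclasses import dataclass, field


@dataclass
class SpokenDocumentIndex:
    # Maps each term to the (recording_id, segment_start) positions where it occurs.
    postings: dict = field(default_factory=dict)

    def add_transcript(self, recording_id: str, start_s: float, text: str) -> None:
        for term in text.lower().split():
            self.postings.setdefault(term, []).append((recording_id, start_s))

    def query(self, phrase: str) -> list:
        # Naive conjunctive match: segments that contain every query term.
        results = None
        for term in phrase.lower().split():
            hits = set(self.postings.get(term, []))
            results = hits if results is None else results & hits
        return sorted(results or [])


# Usage with toy transcripts standing in for ASR output.
index = SpokenDocumentIndex()
index.add_transcript("speech_001", 0.0, "ask not what your country can do")
index.add_transcript("speech_002", 12.5, "the only thing we have to fear")
print(index.query("your country"))   # [('speech_001', 0.0)]
```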


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora

Rongqing Huang; John H. L. Hansen

The problem of unsupervised audio classification and segmentation continues to be a challenging research problem which significantly impacts automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper addresses novel advances in 1) audio classification for speech recognition and 2) audio segmentation for unsupervised multispeaker change detection. A new algorithm is proposed for audio classification, based on weighted GMM networks (WGN). Two new extended-time features, variance of the spectrum flux (VSF) and variance of the zero-crossing rate (VZCR), are used to pre-classify the audio and supply weights to the output probabilities of the GMM networks; the classification is then carried out by the weighted GMM networks. Since historically there have been no features specifically designed for audio segmentation, we evaluate 16 candidate features, including three newly proposed ones: perceptual minimum variance distortionless response (PMVDR), smoothed zero-crossing rate (SZCR), and filterbank log energy coefficients (FBLC), in 14 noisy environments to determine the features that are most robust on average across these conditions. Next, a new distance metric, T²-mean, is proposed, which is intended to improve segmentation for short segment turns (i.e., 1-5 s). A new false alarm compensation procedure is implemented, which reduces the false alarm rate significantly at little cost to the miss rate. Evaluations on a standard data set, the Defense Advanced Research Projects Agency (DARPA) Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over a 50% improvement versus the GMM network baseline algorithm, and the proposed compound segmentation algorithm achieves 10%-23% improvement in all metrics versus the baseline Mel-frequency cepstral coefficient (MFCC) and traditional Bayesian information criterion (BIC) algorithm. The new classification and segmentation algorithms also obtain very satisfactory results on the more diverse and challenging National Gallery of the Spoken Word (NGSW) corpus.
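
The two extended-time features are straightforward to compute: spectrum flux is the frame-to-frame change in the magnitude spectrum, the zero-crossing rate counts sign changes within a frame, and VSF/VZCR are the variances of those curves over a longer stretch of audio. A minimal numpy sketch, assuming 16 kHz audio with 25 ms frames and a 10 ms hop; these parameter choices are illustrative and may differ from the paper's exact analysis setup:

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Slice a 1-D signal into overlapping frames (n_frames x frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def vsf_vzcr(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> tuple:
    """Variance of spectrum flux (VSF) and variance of zero-crossing rate
    (VZCR) over the whole excerpt; illustrative parameter choices."""
    frames = frame_signal(x, frame_len, hop)
    # Spectrum flux: L2 distance between consecutive magnitude spectra.
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    flux = np.linalg.norm(np.diff(mags, axis=0), axis=1)
    # Zero-crossing rate: fraction of sign changes within each frame.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return float(np.var(flux)), float(np.var(zcr))

# Speech tends to show higher variance in both curves than steady music or
# noise, which is what makes these features useful for pre-classification.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)          # 1 s of white noise at 16 kHz
print(vsf_vzcr(noise))
```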


International Conference on Acoustics, Speech, and Signal Processing | 2004

Advances in unsupervised audio segmentation for the broadcast news and NGSW corpora

Rongqing Huang; John H. L. Hansen

The problem of unsupervised audio segmentation continues to be a challenging research problem which significantly impacts automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper addresses novel advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation, including PMVDR (perceptual minimum variance distortionless response), SZCR (smoothed zero-crossing rate), and FBLC (filterbank log coefficients); next, we consider a new distance metric, T²-mean, which is intended to improve segmentation for short segments (< 5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional EER and frame-accuracy approaches. Employing these advances within our new scheme results in more than a 30% improvement in segmentation performance on the 3-hour Hub4 Broadcast News 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
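
Distance-based change detection of this kind compares the feature statistics on the two sides of a candidate boundary. The classical two-sample Hotelling T² statistic is the natural starting point; the paper's T²-mean variant modifies it to cope with short segments, and those details are not reproduced here. A minimal sketch of the standard statistic:

```python
import numpy as np

def hotelling_t2(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Two-sample Hotelling T^2 distance between feature segments, each of
    shape (n_frames, n_dims); larger values suggest a speaker change."""
    n1, n2 = len(seg_a), len(seg_b)
    mu1, mu2 = seg_a.mean(axis=0), seg_b.mean(axis=0)
    # Pooled covariance of the two segments.
    s1 = np.cov(seg_a, rowvar=False)
    s2 = np.cov(seg_b, rowvar=False)
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    diff = mu1 - mu2
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))

# Sliding a boundary through a window and taking the T^2 peak gives a
# candidate change point; short segments make the mean and covariance
# estimates noisy, which is the case the T^2-mean metric targets.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(100, 13))   # e.g., 13-dim cepstral frames
b = rng.normal(0.5, 1.0, size=(100, 13))
print(hotelling_t2(a, b))
```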


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Dialect/Accent Classification Using Unrestricted Audio

Rongqing Huang; John H. L. Hansen; Pongtep Angkititrakul

This study addresses novel advances in English dialect/accent classification. A word-based modeling technique is proposed that is shown to outperform a large vocabulary continuous speech recognition (LVCSR)-based system at significantly lower computational cost. The new algorithm, named Word-based Dialect Classification (WDC), converts the text-independent decision problem into a text-dependent decision problem and produces multiple combined decisions at the word level rather than making a single decision at the utterance level. The basic WDC algorithm also provides options for further modeling and decision strategy improvement. Two sets of classifiers are employed for WDC: a word classifier D_W(k) and an utterance classifier D_u. D_W(k) is boosted via the AdaBoost algorithm directly in the probability space instead of the traditional feature space. D_u is boosted via the dialect-dependency information of the words. For a small training corpus, it is difficult to obtain a robust statistical model for each word and each dialect. Therefore, a context adapted training (CAT) algorithm is formulated, which adapts the universal phoneme Gaussian mixture models (GMMs) to dialect-dependent word hidden Markov models (HMMs) via linear regression. Three separate dialect corpora are used in the evaluations: the Wall Street Journal (American and British English), NATO N4 (British, Canadian, Dutch, and German-accented English), and IViE (eight British dialects). Significant improvement in dialect classification is achieved for all corpora tested.
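
The core WDC idea, scoring each recognized word against dialect-dependent word models and fusing those word-level decisions over the utterance, can be sketched in a few lines. The log-likelihood scores below stand in for the dialect-dependent word HMM scores, and the uniform fusion weights are illustrative placeholders, not the paper's boosted weights:

```python
import math
from collections import defaultdict

def classify_utterance(word_scores: list, dialects: list) -> str:
    """word_scores: one dict per recognized word, mapping each dialect to
    that word model's log-likelihood. Fuse word-level decisions by summing
    (optionally weighted) log-likelihoods across the utterance."""
    totals = defaultdict(float)
    for scores in word_scores:
        # A per-word weight could reflect how dialect-dependent the word is
        # (this is what WDC boosts); here every word counts equally.
        weight = 1.0
        for dialect in dialects:
            totals[dialect] += weight * scores[dialect]
    return max(dialects, key=lambda d: totals[d])

# Toy example: two words, each scored against two dialect word models.
utterance = [
    {"american": math.log(0.7), "british": math.log(0.3)},
    {"american": math.log(0.4), "british": math.log(0.6)},
]
print(classify_utterance(utterance, ["american", "british"]))  # "american"
```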


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Unsupervised Discriminative Training With Application to Dialect Classification

Rongqing Huang; John H. L. Hansen

Automatic dialect classification has gained interest in the field of speech research because of its importance in characterizing speaker traits and knowledge estimation, which could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous dialect classification in English and Spanish. The problem considers the case where no transcripts are available for training and test data, and speakers are talking spontaneously. The Gaussian mixture model (GMM) is used for unsupervised dialect classification in our study. Techniques are proposed which aim to deal with confused acoustic regions in the GMMs, where the confused regions are identified through data-driven methods. The first technique excludes confused regions by finding dialect dependence in the untranscribed audio and selecting the most discriminative Gaussian mixtures [mixture selection (MS)]. The second technique includes the confused regions in the model but balances them over all classes; it is implemented by identifying discriminative frames and confused frames in the audio data [frame selection (FS)]. The confused regions then contribute to model representation but do not impact classification performance. The third technique reduces the confused regions in the original model; minimum classification error (MCE) training is applied to achieve this objective. All three techniques implement discriminative training for GMM-based classification. Both the first technique (MS-GMM, GMM trained with mixture selection) and the second technique (FS-GMM, GMM trained with frame selection) improve dialect classification performance. Further improvement is achieved by applying the third technique (MCE training) before the first or second. The system is evaluated on British English dialects and Latin American Spanish dialects, with measurable improvement achieved on both corpora. Finally, the system is compared with human listener performance and shown to outperform human listeners in terms of classification accuracy.
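
The mixture-selection idea can be illustrated with per-component responsibilities: a Gaussian component that fires about equally often on every dialect's data is "confused" and contributes little discrimination, so scoring can be restricted to components whose occupancy differs most across dialects. A sketch using scikit-learn's GaussianMixture; the selection criterion below (an occupancy ratio) is a simplification chosen for illustration, not the paper's exact data-driven method:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def discriminative_mixtures(gmm: GaussianMixture,
                            own_data: np.ndarray,
                            other_data: np.ndarray,
                            keep: int) -> np.ndarray:
    """Return indices of the `keep` components whose average responsibility
    on the model's own dialect most exceeds that on competing dialects."""
    occ_own = gmm.predict_proba(own_data).mean(axis=0)
    occ_other = gmm.predict_proba(other_data).mean(axis=0)
    score = occ_own / (occ_other + 1e-8)      # >1 means dialect-specific
    return np.argsort(score)[-keep:]

# Toy data: two "dialects" with partially overlapping distributions.
rng = np.random.default_rng(2)
dialect_a = rng.normal(0.0, 1.0, size=(500, 2))
dialect_b = np.vstack([rng.normal(0.0, 1.0, size=(250, 2)),   # shared region
                       rng.normal(4.0, 1.0, size=(250, 2))])  # b-specific

gmm_b = GaussianMixture(n_components=4, random_state=0).fit(dialect_b)
print(discriminative_mixtures(gmm_b, dialect_b, dialect_a, keep=2))
```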


International Conference on Acoustics, Speech, and Signal Processing | 2005

Dialect/accent classification via boosted word modeling

Rongqing Huang; John H. L. Hansen

The paper addresses novel advances in English dialect/accent classification/identification. A word-level modeling technique is proposed that is shown to outperform an LVCSR-based system at significantly lower computational cost. The new algorithm, named WDC (word-based dialect classification), converts the text-independent decision problem into a text-dependent problem and produces multiple combined decisions at the word level rather than making a single decision at the utterance level. Two sets of classifiers are employed for WDC: a word classifier, D_W(k), and an utterance classifier, D_u. D_W(k) is boosted via the real AdaBoost.MH algorithm directly in the probability space instead of the feature space. D_u is boosted via the dialect-dependency information of the words. Two dialect corpora are used in the evaluation, and significant improvement in dialect classification is achieved for both.
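
For intuition on the boosting step: classical discrete AdaBoost reweights training examples toward those the current weak classifier gets wrong and combines the weak classifiers with accuracy-derived weights. The paper boosts with real AdaBoost.MH directly in the probability space; the binary, discrete variant below with decision stumps is only meant to show the reweighting mechanics:

```python
import numpy as np

def adaboost(X: np.ndarray, y: np.ndarray, rounds: int = 10) -> list:
    """Discrete AdaBoost with decision stumps; y must be in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # example weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively search stumps: one feature, one threshold, one sign.
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for pol in (+1, -1):
                    pred = pol * np.where(X[:, f] > t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, pred)
        err, f, t, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against 0 or 1
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)            # upweight mistakes
        w /= w.sum()
        ensemble.append((f, t, pol, alpha))
    return ensemble

def predict(ensemble: list, X: np.ndarray) -> np.ndarray:
    votes = sum(a * p * np.where(X[:, f] > t, 1, -1)
                for f, t, p, a in ensemble)
    return np.sign(votes)

# Toy binary problem separable on the first feature.
X = np.array([[0.1, 1.0], [0.2, 0.5], [0.8, 0.4], [0.9, 0.9]])
y = np.array([-1, -1, 1, 1])
print(predict(adaboost(X, y, rounds=5), X))       # matches y
```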


International Conference on Acoustics, Speech, and Signal Processing | 2007

Dialect Classification on Printed Text using Perplexity Measure and Conditional Random Fields

Rongqing Huang; John H. L. Hansen

Studies have shown that dialect variation has a significant impact on speech recognition performance, and it is therefore important to be able to perform effective dialect classification to improve speech systems. Dialects differ at the acoustic, grammar, and vocabulary levels. In this study, topic-specific printed-text dialect data are collected from the ten major newspapers in Australia, the United Kingdom, and the United States. An n-gram language model is trained for each topic in each country/dialect, and the perplexity measure is applied to classify the dialect-dependent documents. Beyond n-gram information, further features can be extracted from text structure; the conditional random field (CRF) is a model which can extract features at different levels while remaining mathematically tractable. The CRF is applied to train the language model and classify documents. Significant improvement in dialect classification is achieved by the CRF-based classifier, especially on small documents (10% to 22% relative error reduction). Text classification on variable-size documents is explored, and a document of several hundred words is shown to be sufficient for dialect classification. The vocabulary differences among text documents from different countries are explored, and the dialect differences are found to be closely connected with these vocabulary differences. Five document topics are evaluated, and performance for cross-topic dialect classification is explored.
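
The perplexity-based classifier is simple to sketch: train one language model per dialect and label a document with the dialect whose model assigns it the lowest perplexity. A minimal unigram version with add-one smoothing is given below for illustration; the paper uses higher-order n-grams (and, beyond that, CRFs), and the toy training texts are invented:

```python
import math
from collections import Counter

class UnigramLM:
    """Add-one-smoothed unigram language model; illustrative only."""
    def __init__(self, text: str):
        self.counts = Counter(text.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1          # +1 for unseen words

    def perplexity(self, text: str) -> float:
        tokens = text.lower().split()
        log_prob = sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab))
            for t in tokens
        )
        return math.exp(-log_prob / len(tokens))

def classify(document: str, models: dict) -> str:
    # Lowest perplexity: the dialect whose LM finds the text least surprising.
    return min(models, key=lambda d: models[d].perplexity(document))

models = {
    "uk": UnigramLM("the lorry drove along the motorway to the flat"),
    "us": UnigramLM("the truck drove along the highway to the apartment"),
}
print(classify("a lorry on the motorway", models))   # "uk"
```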


Nordic Signal Processing Symposium | 2006

A Preliminary Study on Applying the Conditional Modeling to Automatic Dialect Classification

Rongqing Huang; John H. L. Hansen

This paper addresses advances in unsupervised dialect classification, where no transcripts are available for either the training or the test data. In this study, we view the classification problem in a recognition-based way instead of the conventional generative-model-based approach, and thereby bypass the unknown-transcript problem. The new algorithm is based on a conditional model and has two notable advantages: first, it can train a statistical model without transcripts, so it works in our transcript-free classification problem; second, the conditional model allows arbitrary feature representations, so it can encode more discriminative features than generative models such as the hidden Markov model (HMM), which must rely on independent, local features due to model restrictions. The conditional model used in this study is the conditional random field (CRF). A further study on combining the generative and conditional models is presented. In the Spanish dialect classification evaluation, the CRF and the combined modeling technique show some interesting results.
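
The generative-versus-conditional contrast the abstract draws can be illustrated without a full CRF: a conditional model scores P(class | features) directly and is free to use arbitrary, overlapping features, while a generative model must model P(features | class) under independence-style assumptions. In the hypothetical sketch below, logistic regression stands in as the conditional model and Gaussian naive Bayes as the generative one; this is only meant to show the distinction, not to reproduce the paper's CRF implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
# Toy "acoustic" features for two dialect classes; the third column is a
# redundant copy of the first, i.e., deliberately correlated features.
X0 = rng.normal(0.0, 1.0, size=(200, 2))
X1 = rng.normal(1.0, 1.0, size=(200, 2))
X = np.vstack([X0, X1])
X = np.hstack([X, X[:, :1]])            # add an overlapping feature
y = np.array([0] * 200 + [1] * 200)

# Generative: models p(x | class) with per-feature independence assumptions,
# which the duplicated feature deliberately violates.
gen = GaussianNB().fit(X, y)
# Conditional: models p(class | x) directly; correlated features are fine.
cond = LogisticRegression().fit(X, y)

print("generative accuracy: ", gen.score(X, y))
print("conditional accuracy:", cond.score(X, y))
```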


Conference of the International Speech Communication Association | 2004

Dialect analysis and modeling for automatic classification.

John H. L. Hansen; Umit H. Yapanel; Rongqing Huang; Ayako Ikeno


Nordic Signal Processing Symposium | 2004

SPEECHFIND: spoken document retrieval for a national gallery of the spoken word

John H. L. Hansen; Rongqing Huang; Praful Mangalath; Bowen Zhou; Michael Seadle; John R. Deller

Collaboration


Dive into Rongqing Huang's collaborations.

Top Co-Authors

John H. L. Hansen (University of Texas at Dallas)
Bowen Zhou (University of Colorado Boulder)
John R. Deller (Michigan State University)
Michael Seadle (Humboldt University of Berlin)
Aparna Gurijala (Michigan State University)
Ayako Ikeno (University of Texas at Dallas)
Praful Mangalath (University of Colorado Boulder)
Umit H. Yapanel (University of Colorado Boulder)