Publication


Featured research published by Qiang Huo.


IEEE Transactions on Speech and Audio Processing | 1995

Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition

Qiang Huo; Chorkin Chan; Chin-Hui Lee

A theoretical framework for Bayesian adaptive training of the parameters of a discrete hidden Markov model (DHMM) and of a semi-continuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forward-backward MAP (maximum a posteriori) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally efficient segmental quasi-Bayes algorithm for estimating the state-specific mixture coefficients in SCHMM is developed. For estimating the parameters of the prior densities, a new empirical Bayes method based on moment estimates is also proposed. The MAP algorithms and the prior parameter specification are directly applicable to training speaker-adaptive HMMs. Practical issues in using the proposed techniques for HMM-based speaker adaptation are studied. The proposed MAP algorithms are shown to be effective, especially when the training or adaptation data are limited.
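To make the MAP idea concrete, here is a minimal sketch of the two closed-form updates that MAP re-estimation typically reduces to in the M-step: mixture weights under a Dirichlet prior, and a Gaussian mean under a conjugate normal prior. It assumes the occupancy counts and per-frame posteriors come from an E-step (forward-backward or segmental Viterbi) that is not shown. The function names and the hyperparameters `nu` and `tau` are illustrative assumptions, and the paper's empirical Bayes prior estimation and quasi-Bayes variant are not reproduced.

```python
from typing import List, Sequence


def map_mixture_weights(counts: Sequence[float], nu: Sequence[float]) -> List[float]:
    """MAP mixture weights under a Dirichlet(nu) prior.

    counts[k]: expected number of frames assigned to component k (from the E-step).
    nu[k] > 1: Dirichlet hyperparameters; nu[k] - 1 acts as a prior pseudo-count.
    """
    if len(counts) != len(nu):
        raise ValueError("counts and nu must have the same length")
    numer = [c + v - 1.0 for c, v in zip(counts, nu)]
    total = sum(numer)
    return [x / total for x in numer]


def map_mean(frames: Sequence[float], gammas: Sequence[float],
             prior_mean: float, tau: float) -> float:
    """MAP estimate of a scalar Gaussian mean under a conjugate normal prior.

    gammas[t] is the posterior probability of the component at frame t; tau is
    the prior weight in pseudo-frames. Small occupancy keeps the estimate near
    prior_mean; large occupancy moves it toward the weighted (ML) mean.
    """
    occ = sum(gammas)
    weighted = sum(g * x for g, x in zip(gammas, frames))
    return (tau * prior_mean + weighted) / (tau + occ)


if __name__ == "__main__":
    # Few adaptation frames: the uniform prior pulls the weights away from the ML [0.75, 0.25].
    print(map_mixture_weights([3.0, 1.0], nu=[11.0, 11.0]))
    # Three frames averaging 1.0, prior mean 0.0 with weight 3: the estimate lands about halfway.
    print(map_mean([1.0, 1.2, 0.8], [1.0, 1.0, 1.0], prior_mean=0.0, tau=3.0))
```

With only a handful of frames the estimates stay close to the prior; as occupancy grows they approach the maximum likelihood values, which is the limited-data behaviour the abstract highlights.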


International Conference on Acoustics, Speech, and Signal Processing | 2001

High performance Chinese OCR based on Gabor features, discriminative feature extraction and model training

Qiang Huo; Yong Ge; Zhi-Dan Feng

We have developed a Chinese OCR engine for machine-printed documents. Currently, our OCR engine supports a vocabulary of 6921 characters: 6707 simplified Chinese characters in GB2312-80, 12 frequently used GBK Chinese characters, 62 alphanumeric characters, and 140 punctuation marks and symbols. The supported font styles include Song, Fang Song, Kai, Hei, Yuan, LiShu, WeiBei, XingKai, etc. The average character recognition accuracy is above 99% for newspaper-quality documents, at a recognition speed of about 250 characters per second on a Pentium III 450 MHz PC, while consuming less than 2 MB of memory. We describe the key technologies used to construct the recognizer. Among them, we highlight three techniques that contribute to the high recognition accuracy: Gabor features, discriminative feature extraction, and minimum classification error (MCE) as the criterion for model training.
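The abstract does not give the filter-bank parameters, so the following is only a generic sketch of Gabor feature extraction for a character image: a bank of real (even-symmetric) Gabor kernels at several orientations, sampled on a regular grid. The kernel size, `sigma`, `wavelength`, orientation count, and grid `step` are all hypothetical, and the discriminative feature extraction and MCE training stages are not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def gabor_kernel(half: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 1.0) -> Matrix:
    """Real Gabor kernel of size (2*half+1) x (2*half+1) oriented at angle theta."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gabor_response(image: Matrix, kernel: Matrix, cx: int, cy: int) -> float:
    """Correlate the kernel with the patch centred at (cx, cy); pixels outside count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for ky, krow in enumerate(kernel):
        for kx, k in enumerate(krow):
            iy, ix = cy + ky - half, cx + kx - half
            if 0 <= iy < len(image) and 0 <= ix < len(image[0]):
                total += k * image[iy][ix]
    return total


def gabor_features(image: Matrix, half: int = 3, sigma: float = 2.0,
                   wavelength: float = 6.0, n_orient: int = 4, step: int = 4) -> List[float]:
    """Concatenate responses of an n_orient-filter bank sampled every `step` pixels."""
    feats = []
    for i in range(n_orient):
        k = gabor_kernel(half, sigma, math.pi * i / n_orient, wavelength)
        for cy in range(0, len(image), step):
            for cx in range(0, len(image[0]), step):
                feats.append(gabor_response(image, k, cx, cy))
    return feats


if __name__ == "__main__":
    # Hypothetical 16x16 binary glyph containing a single vertical stroke.
    glyph = [[1.0 if x == 8 else 0.0 for x in range(16)] for _ in range(16)]
    print(len(gabor_features(glyph)))  # 4 orientations x 4 x 4 grid points
```

A vertical stroke responds strongly to the kernel whose carrier varies horizontally (`theta = 0`) and weakly to the orthogonal one, which is why orientation-tuned responses are informative for stroke-based scripts.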


IEEE Transactions on Speech and Audio Processing | 1999

Robust speech recognition based on a Bayesian prediction approach

Hui Jiang; Keikichi Hirose; Qiang Huo

We study a category of robust speech recognition problems in which mismatches exist between training and testing conditions and no accurate knowledge of the mismatch mechanism is available. The only available information is the test data along with a set of pretrained Gaussian mixture continuous density hidden Markov models (CDHMMs). We investigate the problem from the viewpoint of Bayesian prediction. A simple prior distribution, namely a constrained uniform distribution, is adopted to characterize the uncertainty of the mean vectors of the CDHMMs. Two methods are studied: a model compensation technique based on the Bayesian predictive density, and a robust decision strategy called Viterbi Bayesian predictive classification. The proposed methods are compared with the conventional Viterbi decoding algorithm in speaker-independent recognition experiments on isolated digits and TI connected digit strings (TIDIGITS), where the mismatches between training and testing conditions are caused by (1) additive white Gaussian noise, (2) each of 25 types of actual additive ambient noise, and (3) gender difference. The experimental results show that the adopted prior distribution and the proposed techniques improve robustness under the examined mismatch conditions.
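The Bayesian predictive density has a closed form when a Gaussian's mean is given a box-shaped uniform prior. The sketch below assumes a diagonal-covariance Gaussian, with each mean component independently uniform on `[mean_d - r_d, mean_d + r_d]`. The per-dimension `radius` values are hypothetical, and the paper's exact constraint and its Viterbi BPC decoding are not reproduced.

```python
import math
from typing import Sequence


def _std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_plugin_density(x: Sequence[float], mean: Sequence[float],
                       var: Sequence[float]) -> float:
    """Log N(x; mean, diag(var)) with the mean fixed at its point estimate."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_predictive_density(x: Sequence[float], mean: Sequence[float],
                           var: Sequence[float], radius: Sequence[float]) -> float:
    """Log predictive density with mean_d ~ Uniform[mean_d - r_d, mean_d + r_d].

    Integrating N(x_d; mu_d, var_d) over the uniform prior gives, per dimension,
    [Phi((x_d - mean_d + r_d) / s_d) - Phi((x_d - mean_d - r_d) / s_d)] / (2 r_d).
    """
    total = 0.0
    for xi, m, v, r in zip(x, mean, var, radius):
        s = math.sqrt(v)
        mass = _std_normal_cdf((xi - m + r) / s) - _std_normal_cdf((xi - m - r) / s)
        # Crude guard against log(0) far in the tails; a log-domain CDF would be better.
        total += math.log(max(mass, 1e-300) / (2.0 * r))
    return total
```

For a test point two standard deviations from the trained mean, the predictive log density is higher than the plug-in one, so the mismatch is penalized less. As the radius shrinks toward zero, the two densities coincide.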


International Conference on Acoustics, Speech, and Signal Processing | 1997

A Bayesian predictive classification approach to robust speech recognition

Qiang Huo; Hui Jiang; Chin-Hui Lee

We introduce a new Bayesian predictive classification (BPC) approach to robust speech recognition and apply the BPC framework to Gaussian mixture continuous density hidden Markov model based speech recognition. We propose, and focus on, one approximate BPC approach called quasi-Bayesian predictive classification (QBPC). Compared with standard plug-in maximum a posteriori decoding, QBPC achieves around 14% relative recognition error rate reduction when applied to speaker-independent recognition of a confusable vocabulary, namely the 26 English letters, where a broad range of mismatches between training and testing conditions exists. When applied to cross-gender testing on a less confusable vocabulary, namely 20 English digits and commands, QBPC achieves around 24% relative recognition error rate reduction.
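As a toy illustration of the general BPC decision rule (not the paper's QBPC approximation), the sketch below contrasts plug-in scoring with predictive scoring for classes modeled by a single Gaussian. Each class mean is given a Gaussian uncertainty `mean_var` that is integrated out jointly over all frames. The `ClassModel` fields are hypothetical. HMM states, mixtures, and the quasi-Bayes posterior are not modeled.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ClassModel:
    mean: float      # point estimate of the class mean
    var: float       # frame-level observation variance sigma^2
    mean_var: float  # hypothetical uncertainty tau^2 of the mean (0 gives plug-in)


def plugin_score(frames: Sequence[float], m: ClassModel) -> float:
    """Log-likelihood with the mean fixed at its point estimate."""
    return sum(-0.5 * (math.log(2.0 * math.pi * m.var) + (x - m.mean) ** 2 / m.var)
               for x in frames)


def bpc_score(frames: Sequence[float], m: ClassModel) -> float:
    """Log predictive density with the shared mean ~ N(m.mean, m.mean_var) integrated out.

    The frames are then jointly Gaussian with covariance var*I + mean_var*11^T,
    whose determinant and inverse have closed forms (Sherman-Morrison).
    """
    t = len(frames)
    s2, tau2 = m.var, m.mean_var
    d = [x - m.mean for x in frames]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    logdet = (t - 1) * math.log(s2) + math.log(s2 + t * tau2)
    quad = (sum_d2 - tau2 * sum_d ** 2 / (s2 + t * tau2)) / s2
    return -0.5 * (t * math.log(2.0 * math.pi) + logdet + quad)


def classify(frames: Sequence[float], models: Dict[str, ClassModel],
             predictive: bool = True) -> str:
    """Pick the class with the highest predictive (BPC) or plug-in score."""
    score = bpc_score if predictive else plugin_score
    return max(models, key=lambda name: score(frames, models[name]))
```

Setting `mean_var` to zero recovers plug-in decoding. Increasing it widens each class's predictive density, which makes the decision less sensitive to a shifted test condition.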


International Symposium on Chinese Spoken Language Processing | 2006

An HMM compensation approach using unscented transformation for noisy speech recognition

Yu Hu; Qiang Huo

The performance of current HMM-based automatic speech recognition (ASR) systems degrades significantly in real-world applications where there are mismatches between training and testing conditions, caused by factors such as mismatched signal capturing and transmission channels and additive environmental noise. Among the many approaches previously proposed for this robust ASR problem, two notable HMM compensation approaches are Parallel Model Combination (PMC) and Vector Taylor Series (VTS). In this paper, we introduce a new HMM compensation approach using a technique called the Unscented Transformation (UT). As a first step, we have studied three implementations of the UT approach with different computational complexities for noisy speech recognition and evaluated their performance on the Aurora2 connected-digits database. The UT approaches achieve significant improvements in recognition accuracy compared with log-normal-approximation-based PMC and first-order-approximation-based VTS.
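The abstract does not detail the three UT implementations, so here is only a basic unscented transform with 2n+1 symmetric sigma points and a `kappa` weight. It is applied to an assumed scalar log-spectral mismatch function y = log(exp(x) + exp(n)). Clean speech x and noise n are assumed to be independent Gaussians. The cepstral-domain processing, channel term, and dynamic features of a real compensation system are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple


def unscented_transform(mean: Sequence[float], var: Sequence[float],
                        f: Callable[[List[float]], float],
                        kappa: float = 1.0) -> Tuple[float, float]:
    """Approximate mean and variance of f(z), z ~ N(mean, diag(var)), via 2n+1 sigma points."""
    n = len(mean)
    scale = n + kappa  # kappa = 3 - n is a common heuristic, i.e. 1.0 for n = 2
    points = [list(mean)]
    weights = [kappa / scale]
    for i in range(n):
        # With a diagonal covariance the matrix square root is element-wise.
        delta = math.sqrt(scale * var[i])
        for sign in (1.0, -1.0):
            p = list(mean)
            p[i] += sign * delta
            points.append(p)
            weights.append(1.0 / (2.0 * scale))
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def log_add(z: List[float]) -> float:
    """Assumed mismatch function for one log-spectral channel: y = log(exp(x) + exp(n))."""
    x, noise = z
    hi = max(x, noise)
    return hi + math.log(math.exp(x - hi) + math.exp(noise - hi))


if __name__ == "__main__":
    # Clean-speech channel ~ N(2.0, 0.5), noise channel ~ N(1.0, 0.1) (illustrative values).
    print(unscented_transform([2.0, 1.0], [0.5, 0.1], log_add))
```

The appeal compared with a first-order (VTS-style) expansion is that the nonlinearity is evaluated exactly at a few deterministic points rather than linearized. For a linear function the UT reproduces the true mean and variance.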


Archive | 2006

Chinese Spoken Language Processing

Qiang Huo; Bin Ma; Eng Siong Chng

Plenary
- Interactive Computer Aids for Acquiring Proficiency in Mandarin
- The Affective and Pragmatic Coding of Prosody
- Challenges in Machine Translation
- Automatic Indexing and Retrieval of Large Broadcast News Video Collections - The TRECVID Experience

Tutorial
- An HMM-Based Approach to Flexible Speech Synthesis
- Text Information Extraction and Retrieval

Topics in Speech Science
- Mechanisms of Question Intonation in Mandarin
- Comparison of Perceived Prosodic Boundaries and Global Characteristics of Voice Fundamental Frequency Contours in Mandarin Speech
- Linguistic Markings of Units in Spontaneous Mandarin
- Phonetic and Phonological Analysis of Focal Accents of Disyllabic Words in Standard Chinese
- Focus, Lexical Stress and Boundary Tone: Interaction of Three Prosodic Features

Speech Analysis
- A Robust Voice Activity Detection Based on Noise Eigenspace Projection
- Pitch Mean Based Frequency Warping
- A Study of Knowledge-Based Features for Obstruent Detection and Classification in Continuous Mandarin Speech
- Speaker-and-Environment Change Detection in Broadcast News Using Maximum Divergence Common Component GMM
- UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection
- Design of Cubic Spline Wavelet for Open Set Speaker Classification in Marathi

Speech Synthesis and Generation
- Rhythmic Organization of Mandarin Utterances - A Two-Stage Process
- Prosodic Boundary Prediction Based on Maximum Entropy Model with Error-Driven Modification
- Prosodic Words Prediction from Lexicon Words with CRF and TBL Joint Method
- Prosodic Word Prediction Using a Maximum Entropy Approach
- Predicting Prosody from Text
- Nonlinear Emotional Prosody Generation and Annotation
- A Unified Framework for Text Analysis in Chinese TTS
- Speech Synthesis Based on a Physiological Articulatory Model
- An HMM-Based Mandarin Chinese Text-To-Speech System
- HMM-Based Emotional Speech Synthesis Using Average Emotion Model
- A Hakka Text-To-Speech System

Speech Enhancement
- Adaptive Null-Forming Algorithm with Auditory Sub-bands
- Multi-channel Noise Reduction in Noisy Environments

Acoustic Modeling for Automatic Speech Recognition
- Minimum Phone Error (MPE) Model and Feature Training on Mandarin Broadcast News Task
- State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition
- Non-uniform Kernel Allocation Based Parsimonious HMM
- Consistent Modeling of the Static and Time-Derivative Cepstrums for Speech Recognition Using HSPTM

Robust Speech Recognition
- Vector Autoregressive Model for Missing Feature Reconstruction
- Auditory Contrast Spectrum for Robust Speech Recognition
- Signal Trajectory Based Noise Compensation for Robust Speech Recognition
- An HMM Compensation Approach Using Unscented Transformation for Noisy Speech Recognition
- Noisy Speech Recognition Performance of Discriminative HMMs
- Distributed Speech Recognition of Mandarin Digits String

Speech Adaptation/Normalization
- Unsupervised Speaker Adaptation Using Reference Speaker Weighting
- Automatic Construction of Regression Class Tree for MLLR Via Model-Based Hierarchical Clustering

General Topics in Speech Recognition
- A Minimum Boundary Error Framework for Automatic Phonetic Segmentation

Large Vocabulary Continuous Speech Recognition
- Advances in Mandarin Broadcast Speech Transcription at IBM Under the DARPA GALE Program
- Improved Large Vocabulary Continuous Chinese Speech Recognition by Character-Based Consensus Networks
- All-Path Decoding Algorithm for Segmental Based Speech Recognition
- Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models
- On Using Entropy Information to Improve Posterior Probability-Based Confidence Measures
- Vietnamese Automatic Speech Recognition: The FLaVoR Approach

Multilingual Recognition and Identification
- Language Identification by Using Syllable-Based Duration Classification on Code-Switching Speech

Speaker Recognition and Characterization
- CCC Speaker Recognition Evaluation 2006: Overview, Methods, Data, Results and Perspective
- The IIR Submission to CSLP 2006 Speaker Recognition Evaluation
- A Novel Alternative Hypothesis Characterization Using Kernel Classifiers for LLR-Based Speaker Verification
- Speaker Verification Using Complementary Information from Vocal Source and Vocal Tract
- ISCSLP SR Evaluation, UVA-CS_es System Description. A System Based on ANNs
- Evaluation of EMD-Based Speaker Recognition Using ISCSLP2006 Chinese Speaker Recognition Evaluation Corpus
- Integrating Complementary Features with a Confidence Measure for Speaker Identification
- Discriminative Transformation for Sufficient Adaptation in Text-Independent Speaker Verification
- Fusion of Acoustic and Tokenization Features for Speaker Recognition

Spoken Language Understanding
- Contextual Maximum Entropy Model for Edit Disfluency Detection of Spontaneous Speech

Human Language Acquisition, Development and Learning
- Automatic Detection of Tone Mispronunciation in Mandarin
- Towards Automatic Tone Correction in Non-native Mandarin

Spoken and Multimodal Dialog Systems
- A Corpus-Based Approach for Cooperative Response Generation in a Dialog System
- A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion
- The Implementation of Service Enabling with Spoken Language of a Multi-modal System Ozone
- Spoken Correction for Chinese Text Entry

Speech Data Mining and Document Retrieval
- Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models
- Meeting Segmentation Using Two-Layer Cascaded Subband Filters
- A Multi-layered Summarization System for Multi-media Archives by Understanding and Structuring of Chinese Spoken Documents
- Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities

Machine Translation of Speech
- Some Improvements in Phrase-Based Statistical Machine Translation
- Automatic Spoken Language Translation Template Acquisition Based on Boosting Structure Extraction and Alignment

Spoken Language Resources and Annotation
- HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
- The Paradigm for Creating Multi-lingual Text-To-Speech Voice Databases
- Multilingual Speech Corpora for TTS System Development
- Construct Trilingual Parallel Corpus on Demand
- The Contribution of Lexical Resources to Natural Language Processing of CJK Languages
- Multilingual Spoken Language Corpus Development for Communication Research
- Development of Multi-lingual Spoken Corpora of Indian Languages


IEEE Transactions on Speech and Audio Processing | 2001

Online adaptive learning of continuous-density hidden Markov models based on multiple-stream prior evolution and posterior pooling

Qiang Huo; Bin Ma

We introduce a new adaptive Bayesian learning framework, called multiple-stream prior evolution and posterior pooling, for online adaptation of continuous density hidden Markov model (CDHMM) parameters. Of the three architectures we propose for this framework, we study in detail a two-stream system in which linear transformations are applied to the mean vectors of the CDHMMs to control the evolution of their prior distribution. This new stream of prior distribution can be combined with another stream evolved without any constraints. In a series of speaker adaptation experiments on continuous Mandarin speech recognition, we show that the new adaptation algorithm achieves fast-adaptation performance similar to that of incremental maximum likelihood linear regression (MLLR) when only a small amount of adaptation data is available, while maintaining the good asymptotic convergence property of our previously proposed quasi-Bayes adaptation algorithms.
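The sketch below illustrates the two ideas in the framework's name for a single scalar Gaussian mean with known variance. First, prior evolution: after each block of adaptation data, the conjugate posterior becomes the prior for the next block, as in quasi-Bayes online adaptation. Second, posterior pooling: two such streams are combined. The forgetting factor `forget` and the log-linear `pool` are stand-ins chosen for illustration, not necessarily the paper's formulation, and the transformation-constrained stream is not sketched.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MeanPosterior:
    m: float    # current estimate of the Gaussian mean
    tau: float  # strength in pseudo-frames; the mean's variance is var / tau


def evolve(prior: MeanPosterior, frames: Sequence[float], gammas: Sequence[float],
           forget: float = 1.0) -> MeanPosterior:
    """One quasi-Bayes step: the posterior after this block becomes the next prior.

    forget in (0, 1] down-weights old evidence (a hypothetical forgetting factor).
    """
    tau = forget * prior.tau
    occ = sum(gammas)
    acc = sum(g * x for g, x in zip(gammas, frames))
    return MeanPosterior(m=(tau * prior.m + acc) / (tau + occ), tau=tau + occ)


def pool(a: MeanPosterior, b: MeanPosterior, weight: float) -> MeanPosterior:
    """Log-linear pooling of two Gaussian posteriors over the same mean (shared frame variance).

    Stand-in for the paper's posterior pooling; weight in [0, 1] favours stream a.
    """
    tau = weight * a.tau + (1.0 - weight) * b.tau
    m = (weight * a.tau * a.m + (1.0 - weight) * b.tau * b.m) / tau
    return MeanPosterior(m=m, tau=tau)


if __name__ == "__main__":
    post = MeanPosterior(m=0.0, tau=5.0)         # speaker-independent prior
    for block in ([1.0, 1.2], [0.9, 1.1, 1.0]):  # adaptation data arriving in blocks
        post = evolve(post, block, [1.0] * len(block))
    print(post)
```

With `forget = 1.0`, processing the data block by block gives the same result as processing it all at once. This is what gives the sequential scheme its batch-like asymptotic behaviour.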


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

A discrete contextual stochastic model for the off-line recognition of handwritten Chinese characters

Yan Xiong; Qiang Huo; Chorkin Chan

We study a discrete contextual stochastic (CS) model for complex, highly variable patterns such as handwritten Chinese characters. Three fundamental problems in using CS models for character recognition are discussed, and several practical techniques for solving them are investigated. A formulation for discriminative training of CS model parameters is also introduced and its practical use investigated. To illustrate the characteristics of the various algorithms, comparative experiments are performed on a recognition task whose vocabulary consists of 50 pairs of highly similar handwritten Chinese characters. The experimental results confirm that discriminative training improves recognition performance.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Irrelevant variability normalization in learning HMM state tying from data based on phonetic decision-tree

Qiang Huo; Bin Ma

We propose to apply the concept of irrelevant variability normalization to the general problem of learning structure from data. Because the training data set may be diverse and there may be acoustic mismatches between training and testing conditions, a structure learned from the training data with a maximum likelihood training method does not necessarily generalize well to mismatched tasks. We apply this concept to the structural learning problem of phonetic decision-tree based hidden Markov model (HMM) state tying. We present a new method that integrates a linear-transformation based normalization mechanism into the decision-tree construction process, so that the learned structure has better modeling capability and generalizes better. The viability and efficacy of the proposed method are confirmed in a series of continuous speech recognition experiments on Mandarin Chinese.
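For context, phonetic decision-tree state tying commonly grows the tree greedily. At each node it picks the yes/no phonetic question whose split most increases the log-likelihood of the pooled data under single ML Gaussians. The sketch below shows that baseline criterion with scalar features. The states, question set, and variance floor are hypothetical. The paper's contribution, folding a linear-transformation based normalization into this likelihood computation, is not shown.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Stats = Tuple[float, float, float]  # (frame count, sum of x, sum of x^2) for one HMM state


def stats_from(values: Sequence[float]) -> Stats:
    return (float(len(values)), sum(values), sum(v * v for v in values))


def cluster_loglik(stats: Sequence[Stats], var_floor: float = 1e-3) -> float:
    """Log-likelihood of the pooled frames under a single ML-estimated 1-D Gaussian."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)
    s2 = sum(s[2] for s in stats)
    mean = s1 / n
    ml_var = max(s2 / n - mean * mean, 0.0)
    var = max(ml_var, var_floor)
    # Sum of squared deviations equals n * ml_var, so no per-frame data is needed.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + ml_var / var)


def best_split(states: Dict[str, Stats], questions: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the phonetic question with the largest log-likelihood gain, and that gain."""
    parent = cluster_loglik(list(states.values()))
    best_q, best_gain = "", -math.inf
    for q, members in questions.items():
        yes = [s for ctx, s in states.items() if ctx in members]
        no = [s for ctx, s in states.items() if ctx not in members]
        if not yes or not no:
            continue  # the question does not split this node
        gain = cluster_loglik(yes) + cluster_loglik(no) - parent
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain


if __name__ == "__main__":
    # Hypothetical triphone states of one phone, keyed by left context.
    states = {
        "m": stats_from([0.9, 1.1]), "n": stats_from([1.0, 1.2]),
        "p": stats_from([4.9, 5.1]), "t": stats_from([5.0, 5.2]),
    }
    print(best_split(states, {"L-Nasal": {"m", "n"}, "L-Labial": {"m", "p"}}))
```

Because only sufficient statistics are stored per state, each candidate split is scored without revisiting the frames. This is also why a normalization transform can be inserted at the statistics level.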


International Conference on Acoustics, Speech, and Signal Processing | 2002

Supervised adaptation of MCE-trained CDHMMs using minimum classification error linear regression

Jian Wu; Qiang Huo

In this paper, we present a formulation of minimum classification error linear regression (MCELR) for adapting Gaussian mixture continuous density HMM (CDHMM) parameters. We demonstrate that MCELR can be used to adapt MCE-trained HMM parameters under a consistent criterion. In a supervised speaker adaptation application, we observe that models adapted this way perform better than those adapted using MLLR from ML-trained seed models. We also observe that MCELR performs consistently better than MLLR for either set of seed models.
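Both MCE training and MCELR optimize the minimum classification error criterion. The sketch below shows the commonly used smoothed MCE loss: a misclassification measure that compares the correct class score with a soft maximum of the competing scores, passed through a sigmoid. In an HMM system the scores would be per-class log-likelihoods. The defaults for `eta`, `gamma`, and `theta` are hypothetical, and the gradient step through the MCELR regression matrices is not shown.

```python
import math
from typing import Sequence


def misclassification_measure(scores: Sequence[float], correct: int,
                              eta: float = 1.0) -> float:
    """d_k = -g_k + (1/eta) * log(mean over j != k of exp(eta * g_j)).

    d_k < 0 means the correct class beats a soft maximum of its competitors;
    as eta grows, the soft maximum approaches the best competing score.
    """
    competitors = [g for j, g in enumerate(scores) if j != correct]
    top = max(competitors)
    # Log-mean-exp shifted by the best competitor for numerical stability.
    soft_max = top + math.log(
        sum(math.exp(eta * (g - top)) for g in competitors) / len(competitors)) / eta
    return -scores[correct] + soft_max


def mce_loss(scores: Sequence[float], correct: int, eta: float = 1.0,
             gamma: float = 1.0, theta: float = 0.0) -> float:
    """Smoothed 0-1 loss sigmoid(gamma * d_k - theta), computed without overflow."""
    z = gamma * misclassification_measure(scores, correct, eta) - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The loss is near 0 for confidently correct tokens and near 1 for clearly wrong ones, so its gradient concentrates on tokens near the decision boundary. This is what distinguishes the criterion from the likelihood objective behind MLLR.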

Collaboration



Top Co-Authors

Chorkin Chan

University of Hong Kong

Jian Wu

University of Hong Kong

Donglai Zhu

University of Hong Kong

Eng Siong Chng

Nanyang Technological University

Yu Hu

University of Hong Kong

Zhi-Dan Feng

University of Hong Kong
