Qianhua He
South China University of Technology
Publication
Featured research published by Qianhua He.
IEEE Signal Processing Magazine | 1996
K. S. Tang; K.F. Man; Sam Kwong; Qianhua He
This article introduces the genetic algorithm (GA) as an emerging optimization algorithm for signal processing. After a discussion of traditional optimization techniques, it reviews the fundamental operations of a simple GA and discusses procedures to improve its functionality. The properties of the GA that relate to signal processing are summarized, and a number of applications, such as IIR adaptive filtering, time delay estimation, active noise control, and speech processing, that are being successfully implemented are described.
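The fundamental operations of a simple GA mentioned in this abstract (selection, crossover, mutation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the binary encoding, tournament selection, one-point crossover, the parameter values, and the function name `simple_ga`.

```python
import random

def simple_ga(fitness, n_bits=16, pop_size=30, generations=60,
              crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over fixed-length bit strings (illustrative only)."""
    rng = random.Random(seed)
    # Random initial population of bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        def select():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_rate:
                # One-point crossover: swap the tails of the two parents.
                cut = rng.randint(1, n_bits - 1)
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # Bit-flip mutation applied independently to each gene.
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        # Keep track of the best chromosome seen in any generation.
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
    return best

# Toy objective ("OneMax"): the fitness is the number of 1 bits.
best = simple_ga(sum)
print(best, sum(best))
```

In a signal processing setting the bit string would instead encode, for example, quantized filter coefficients, and `fitness` would score the resulting filter; that mapping is problem-specific and is not shown here.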
International Journal of Pattern Recognition and Artificial Intelligence | 1998
Sam Kwong; Qianhua He; K.F. Man; K. S. Tang; C. W. Chau
Dynamic Time Warping (DTW) is a common technique widely used for nonlinear time normalization of different utterances in many speech recognition systems. Two major problems are usually encountered when DTW is applied to recognizing speech utterances: (i) the normalization factors used in a warping path; and (ii) finding the K-best warping paths. Although DTW has been modified to compute multiple warping paths by using the Tree-Trellis Search (TTS) algorithm, the use of the actual normalization factor still remains a major problem for DTW. In this paper, a Parallel Genetic Time Warping (PGTW) algorithm is proposed to solve these problems. A database of 95 isolated words extracted from the TIMIT speech database is set up for evaluating the performance of the PGTW. In the database, each of the first 15 words has 70 different utterances, and the remaining 80 words have only one utterance each. For each of the 15 words, one utterance is arbitrarily selected as the test template for recognition. The distance from each test template to the utterances of the same word and to those of the 80 words is calculated with three different time warping algorithms: TTS, PGTW and Sequential Genetic Time Warping (SGTW). A Normal Distribution Model based on Rabiner [23] is used to evaluate the performance of the three algorithms analytically. The results show that the PGTW performs better than the TTS. They also show that the PGTW gives results very similar to those of the SGTW, while saving about 30% of the CPU time on a single-processor system.
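For readers unfamiliar with the baseline, the classic DTW recurrence can be sketched as follows. This is textbook dynamic-programming DTW on one-dimensional sequences, not the PGTW, SGTW, or TTS algorithms of the paper; the function name `dtw_distance`, the absolute-difference local distance, and the three-way step pattern are assumptions chosen for brevity.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best warping path between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# The second sequence is the first one with a repeated frame.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The sketch returns the unnormalized accumulated cost. Dividing by a path-dependent normalization factor is exactly where problem (i) above arises, because the true path length is not known until the path has been found.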
Pattern Recognition | 1998
Sam Kwong; Qianhua He; Kim-Fung Man; Kit-Sang Tang
This paper presents a new approach for HMM training which is based on the maximum model distance (MMD) criterion for similar utterances. This approach differs from the traditional maximum likelihood (ML) approach in that ML only considers the likelihood P(Oν|λν) for a single utterance, while the MMD compares the likelihood P(Oν|λν) against those of similar utterances and maximizes their likelihood differences. Theoretical and practical issues concerning this approach are investigated. In addition, the corrective training [Bahl, Brown, de Souza and Mercer, IEEE Trans. Speech Audio Process. 1(1), (1993)] of the MMD is also included in this paper, and we prove that the corrective training proposed by Bahl et al. (1993) is a special case of our MMD approach. Both speaker-dependent and multi-speaker experiments have been carried out on the Chinese An-set syllables and also the 599 most common utterances from the TIMIT database. Experimental results show that significant error reduction can be achieved through the proposed approach.
Signal Processing | 2002
Sam Kwong; Qianhua He; K. W. Ku; Tak-Ming Chan; Kim-Fung Man; Kit-Sang Tang
In this paper, we present a genetic approach for training hidden Markov models using minimum classification error (MCE) as the reestimation criterion. This approach is discriminative and has proved to be better than non-discriminative approaches such as the maximum likelihood (ML) method. The major problem of using the MCE is to formulate the error rate estimate as a smooth continuous loss function such that gradient search techniques can be applied to search for the solutions. A genetic approach for this particular classification error method, aimed at finding the global solution or better solutions, is proposed. Comparing our approach with the ML and MCE approaches, the experimental results show that it is superior to both the MCE and ML methods.
Signal Processing | 2009
Yanxiong Li; Qianhua He; Sam Kwong; Tao Li; Jichen Yang
Applause frequently occurs in multi-participant meeting speech. Detecting applause is important for meeting speech recognition, semantic inference, highlight extraction, etc. In this paper, we first study the characteristic differences between applause and speech, such as duration, pitch, spectrogram and occurrence locations. Then, an effective algorithm based on these characteristics is proposed for detecting applause in a meeting speech stream. In the algorithm, the non-silence signal segments are first extracted by using voice activity detection. Afterward, applause segments are detected from the non-silence signal segments based on the characteristic differences between applause and speech, without using any complex statistical models such as hidden Markov models. The proposed algorithm can accurately determine the boundaries of applause in a meeting speech stream, and is also computationally efficient. In addition, it can extract applause sub-segments from mixed segments. Experimental evaluations show that the proposed algorithm achieves satisfactory results in detecting applause in meeting speech. Precision rate, recall rate, and F1-measure are 94.34%, 98.04%, and 96.15%, respectively. When compared with the traditional algorithm under the same experimental conditions, a 3.62% improvement in F1-measure is achieved, and about 35.78% of the computational time is saved.
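The three reported figures are mutually consistent under the standard definition of the F1-measure as the harmonic mean of precision and recall. A short check, where the helper name `f1_measure` is introduced here only for illustration:

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract above.
f1 = f1_measure(0.9434, 0.9804)
print(f"{f1:.2%}")  # 96.15%
```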
International Journal of Pattern Recognition and Artificial Intelligence | 1996
Sam Kwong; Qianhua He; Kim-Fung Man
In this paper, a Genetic Time Warping (GTW) algorithm for isolated word recognition is proposed. Relative representation techniques, fitness techniques and reproduction techniques are described, and genetic operators are also discussed in detail. Unlike conventional genetic algorithms with a fixed number of genes, every chromosome has its own number of genes. A modified order-based crossover operator is introduced in order to deal with chromosomes with different numbers of genes. Besides the mutation and crossover operators, a new heuristic local optimum operator is also built, which can alter part of a chromosome based on a function of the local distance and the average distortion of the paths. Finally, experimental investigations were carried out to test the performance of GTW. Based on Rabiner's normal assumptions [23] on the distributions of the distances, the overall probability of making a word error could be calculated experimentally. Results demonstrated that GTW performed better or much better than the DTW method for most of the tested words.
Annual ACIS International Conference on Computer and Information Science | 2008
Yanxiong Li; Qianhua He; Tao Li
The filled pause is one of the hesitation phenomena that current speech recognizers cannot handle effectively. Detecting filled pauses is important in spontaneous speech dialogue systems because they play valuable roles in oral communication, such as helping a speaker keep a conversational turn. In this paper, a novel detection method for filled pauses is proposed, based on a two-level decision strategy. First, hypothetical filled pauses are extracted from non-silence signal segments. Then HMMs are trained and employed to recognize filled pauses among the hypothetical filled pauses. Experimental results show that the average precision and recall rates for filled pauses are 80.66% and 92.59%, respectively. Moreover, this filled pause detector can distinguish filled pauses from elongated words, which was not achieved in previous work.
Pattern Recognition | 2000
Qianhua He; Sam Kwong; Kim-Fung Man; Kit-Sang Tang
This paper proposes an improved maximum model distance (IMMD) approach for HMM-based speech recognition systems based on our previous work [S. Kwong, Q.H. He, K.F. Man, K.S. Tang. A maximum model distance approach for HMM-based speech recognition, Pattern Recognition 31 (3) (1998) 219–229]. It defines a more realistic model distance for HMM training, and utilizes the limited training data in a more effective manner. Discriminative information contained in the training data is used to improve the performance of the recognizer. HMM parameter adjustment rules are derived in detail. Theoretical and practical issues concerning this approach are also discussed and investigated in this paper. Both isolated word and continuous speech recognition experiments showed that a significant error reduction could be achieved by IMMD when compared with the maximum model distance (MMD) criterion and other training methods using the minimum classification error (MCE) and the maximum mutual information (MMI) approaches.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Ling Zou; Qianhua He; Xiaohui Feng
Source recording device recognition is an important emerging research field of digital media forensics. Most of the prior literature focuses on the recording device identification problem. In this study we propose a source cell phone verification scheme based on sparse representation. We employ Gaussian supervectors (GSVs) based on Mel-frequency cepstral coefficients (MFCCs) extracted from the speech recordings to characterize the intrinsic fingerprint of the cell phone. For the sparse representation, both an exemplar-based dictionary and a dictionary learned by the K-SVD algorithm were examined for this problem. Evaluation experiments were conducted on a corpus consisting of speech recordings made with 14 cell phones. The achieved equal error rate (EER) demonstrated the feasibility of the proposed scheme.
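Since the verification scheme is evaluated by its equal error rate, a minimal sketch of how an EER can be estimated from verification scores may be helpful. The threshold sweep below is one common approximation and is not the evaluation code of the paper; the function name `equal_error_rate` and the toy scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: the operating point where the false acceptance
    rate (FAR) and the false rejection rate (FRR) are closest."""
    # Candidate thresholds are the observed scores themselves.
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores (hypothetical): a higher score means "same phone".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

With few trials the FAR and FRR curves are step functions and may never cross exactly, which is why the sketch reports the average of the two rates at the threshold where they are closest.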
International Journal of Speech Technology | 2010
Yanxiong Li; Sam Kwong; Qianhua He; Jun He; Jichen Yang
Feature subsets and hidden Markov model (HMM) parameters are the two major factors that affect the classification accuracy (CA) of an HMM-based classifier. This paper proposes a genetic algorithm based approach for simultaneously optimizing both feature subsets and HMM parameters with the aim of obtaining the best HMM-based classifier. Experimental data extracted from three spontaneous speech corpora were used to evaluate the effectiveness of the proposed approach and of three other approaches (i.e. optimization of feature subsets only, optimization of HMM parameters only, and no optimization of either feature subsets or HMM parameters) that were adopted in previous work for discrimination between speech and non-speech events (e.g. filled pause, laughter, applause). The experimental results show that the proposed approach obtains a CA of 91.05%, while the three other approaches obtain CAs of 86.11%, 87.05%, and 83.16%, respectively. The results suggest that the proposed approach is superior to the previous approaches.