Kwok-Kwong Yiu

Hong Kong Polytechnic University

Publications

Featured research published by Kwok-Kwong Yiu.

Computer Speech & Language | 2007

Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning

Kwok-Kwong Yiu; Man-Wai Mak; Sun-Yuan Kung

In speaker verification over public telephone networks, utterances may be recorded through different types of handsets, and different handsets introduce different degrees of distortion into the speech signal. To reduce the effect of this acoustic distortion, this paper combines a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation. Specifically, during training, the clean speaker models and background models are first transformed by MLLR-based handset-specific transformations estimated from a small amount of distorted speech data. Reinforced learning is then applied to adapt the transformed models into handset-dependent speaker models and handset-dependent background models, using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier identifies the handset most likely used by the claimant, and the corresponding pair of handset-dependent speaker and background models is used for verification. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation combining MLLR, reinforced learning, and feature transformation outperforms CMS, Hnorm, Tnorm, and speaker model synthesis.
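
The MLLR step can be pictured as one shared affine map applied to every Gaussian mean, so that a small amount of handset data moves all means at once. The sketch below estimates a single global mean transform under simplifying assumptions that are not from the paper: one regression class, identity covariances for every component, and frame-to-component posteriors already computed from the clean model. Under those assumptions the maximum-likelihood transform reduces to a weighted least-squares solution. The function names and toy data are hypothetical.

```python
import numpy as np


def estimate_global_mllr(frames, means, posteriors):
    """Estimate W (d x (d+1)) so that each adapted mean is W @ [1, mu_k].

    frames:     (T, d) adaptation features recorded through one handset
    means:      (K, d) Gaussian means of the clean model
    posteriors: (T, K) occupation probabilities gamma_tk
    Assumption: every component has identity covariance, so the ML
    solution is a weighted least-squares fit.
    """
    num_comp = means.shape[0]
    xi = np.hstack([np.ones((num_comp, 1)), means])  # extended means, (K, d+1)
    occupancy = posteriors.sum(axis=0)               # n_k, (K,)
    weighted_sum = posteriors.T @ frames             # sum_t gamma_tk x_t, (K, d)
    z = weighted_sum.T @ xi                          # (d, d+1)
    g = (xi * occupancy[:, None]).T @ xi             # (d+1, d+1), symmetric
    # W = Z G^{-1}; solve G W^T = Z^T instead of inverting G explicitly.
    return np.linalg.solve(g, z.T).T


def adapt_means(means, w):
    """Apply the MLLR mean transform to every component mean."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ w.T


# Toy check: frames sit exactly on the transformed means (hard posteriors).
clean_means = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_true = np.array([[0.5, 1.2, 0.1], [-0.3, 0.0, 0.9]])
shifted_means = adapt_means(clean_means, w_true)
frames = np.repeat(shifted_means, 3, axis=0)   # 3 frames per component
posteriors = np.repeat(np.eye(4), 3, axis=0)   # one-hot occupancies
w_est = estimate_global_mllr(frames, clean_means, posteriors)
print(np.round(w_est, 3))
```

With diagonal (rather than identity) covariances, standard MLLR solves one such weighted system per feature dimension, with each component's contribution scaled by its inverse variance in that dimension.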

Neural Computing and Applications | 1999

Gaussian Mixture Models and Probabilistic Decision-Based Neural Networks for Pattern Classification: A Comparative Study

Kwok-Kwong Yiu; Man-Wai Mak; Chi-Kwong Li

Probabilistic Decision-Based Neural Networks (PDBNNs) can be considered a special form of Gaussian Mixture Models (GMMs) with trainable decision thresholds. This paper provides detailed illustrations comparing the recognition accuracy and decision boundaries of PDBNNs with those of GMMs on two pattern recognition tasks: the noisy XOR problem and the classification of two-dimensional vowel data. The paper highlights the strengths of PDBNNs by demonstrating that their thresholding mechanism is very effective in detecting data that do not belong to any known class. The original PDBNNs use elliptical basis functions with diagonal covariance matrices, which may be inappropriate for modelling feature vectors with correlated components. This paper overcomes this limitation by using full covariance matrices and shows that they are effective in characterising non-spherical clusters.
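
To make the thresholding idea concrete, the sketch below scores a point against one full-covariance Gaussian per class, picks the class with the highest log-likelihood, and rejects the point as belonging to no known class when that score falls below the winning class's threshold. It simplifies heavily: one Gaussian per class instead of a mixture, and hand-set thresholds (the `-6.0` values are hypothetical) rather than the trainable thresholds that distinguish PDBNNs from GMMs.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a full-covariance Gaussian evaluated at one point x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis)


def classify_with_rejection(x, classes):
    """Return the best-scoring class name, or None if its score is below threshold.

    classes maps name -> (mean, cov, threshold). Thresholds are fixed by
    hand here; a PDBNN would learn them from data instead.
    """
    scores = {name: gaussian_logpdf(x, m, c) for name, (m, c, _) in classes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= classes[best][2] else None


classes = {
    # Class "a" has strongly correlated features, so it uses a full covariance.
    "a": (np.array([0.0, 0.0]), np.array([[1.0, 0.8], [0.8, 1.0]]), -6.0),
    "b": (np.array([4.0, 4.0]), np.eye(2), -6.0),
}
print(classify_with_rejection(np.array([0.5, 0.4]), classes))    # a
print(classify_with_rejection(np.array([4.0, 4.0]), classes))    # b
print(classify_with_rejection(np.array([10.0, -10.0]), classes)) # None
```

The last point is closest to class "b" but far from both, so its best log-likelihood falls below the threshold and it is rejected rather than forced into a class, which is the behaviour a plain maximum-likelihood GMM classifier lacks.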

Neurocomputing | 2007

Probabilistic feature-based transformation for speaker verification over telephone networks

Man-Wai Mak; Kwok-Kwong Yiu; Sun-Yuan Kung

Feature transformation aims to reduce the effects of channel and handset distortion in telephone-based speaker verification. This paper compares several feature transformation techniques and evaluates their verification performance and computation time under the 2000 NIST speaker recognition evaluation protocol. The techniques compared include feature mapping (FM), stochastic feature transformation (SFT), blind stochastic feature transformation (BSFT), feature warping (FW), and short-time Gaussianization (STG). The paper proposes a probabilistic feature mapping (PFM) in which the mapped features depend not only on the top-1 decoded Gaussian but also on the posterior probabilities of the other Gaussians in the root model. It also proposes speeding up the computation of the PFM and BSFT parameters by considering only the top few Gaussians. Results show that PFM performs slightly better than FM and that the fast approach can reduce computation time substantially. Among the approaches investigated, the fast BSFT (fBSFT) strikes a good balance between computational complexity and error rate, while FW and STG achieve the lowest error rates at a higher computational cost. Fusing the scores of the fBSFT- and STG-based systems was also found to reduce the error rate further. The study concludes that fBSFT, FW, and STG have the greatest potential for robust speaker verification over telephone networks because they achieve good performance without any a priori knowledge of the communication channel.
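
Conventional feature mapping decodes the top-1 Gaussian c of a handset-dependent model and maps a frame as y = (x - mu_c^hs) * sigma_c^root / sigma_c^hs + mu_c^root. The sketch below shows one plausible reading of PFM, not necessarily the paper's exact formula: every component proposes such a mapping and the proposals are blended by the frame's posterior probabilities, with an optional `top_n` cut-off that mimics the speed-up of keeping only the top few Gaussians. Diagonal covariances, a one-to-one correspondence between handset-dependent and root components, and all names and toy values are assumptions.

```python
import numpy as np


def diag_gauss_logpdf(x, means, variances):
    """Log N(x_t; mu_k, diag(var_k)) for every frame t and component k -> (T, K)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (log_norm[None, :] + np.sum(diff ** 2 / variances[None], axis=2))


def probabilistic_feature_map(x, weights, hs_means, hs_vars, root_means, root_vars, top_n=None):
    """Map handset-dependent frames toward the root model (hypothetical PFM form).

    Each component proposes the usual feature-mapping output; proposals are
    weighted by posterior probability. With top_n set, only the top_n
    components per frame keep nonzero posteriors (ties at the cut-off may
    keep a few more).
    """
    log_post = np.log(weights)[None, :] + diag_gauss_logpdf(x, hs_means, hs_vars)
    if top_n is not None:
        cutoff = -np.sort(-log_post, axis=1)[:, top_n - 1:top_n]  # (T, 1)
        log_post = np.where(log_post >= cutoff, log_post, -np.inf)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # (T, K)
    scale = np.sqrt(root_vars / hs_vars)                         # (K, d)
    proposals = (x[:, None, :] - hs_means[None]) * scale[None] + root_means[None]
    return np.einsum("tk,tkd->td", post, proposals)


weights = np.array([0.5, 0.5])
hs_means, hs_vars = np.array([[0.0], [10.0]]), np.array([[1.0], [1.0]])
root_means, root_vars = np.array([[1.0], [12.0]]), np.array([[4.0], [1.0]])
frames = np.array([[0.0], [5.0]])
print(probabilistic_feature_map(frames, weights, hs_means, hs_vars, root_means, root_vars))
```

With `top_n=1` the function falls back to ordinary top-1 feature mapping; the second toy frame lies exactly between the two components, so both proposals receive equal weight and are averaged.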

Signal Processing Systems | 2006

Blind Stochastic Feature Transformation for Channel Robust Speaker Verification

Kwok-Kwong Yiu; Man-Wai Mak; Ming-Cheung Cheung; Sun-Yuan Kung

To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations while enhancing interspeaker distinction. This paper addresses this problem with a blind feature-based transformation approach in which the transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a composite statistical model formed by fusing a speaker model and a background model is used to represent the characteristics of the enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic-matching-type approach is proposed to transform the claimant's speech to a region close to the enrollment speech. The algorithm can therefore estimate the transformation online without having to detect the handset type. Experimental results based on the 2001 NIST evaluation set show that the proposed approach achieves significant improvements in both equal error rate and minimum detection cost compared with cepstral mean subtraction and Znorm.
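
The stochastic-matching idea can be sketched as an EM search for a transformation that makes the claimant's frames most likely under the composite model. The sketch below is a deliberate simplification, not the paper's BSFT: it pools the speaker and background GMMs with a hypothetical mixing weight `alpha` and estimates only an additive bias b (the paper's transformation may be more general), assuming diagonal covariances. No handset label is used anywhere, which is the "blind" aspect described above.

```python
import numpy as np


def fuse_gmms(speaker, background, alpha=0.5):
    """Pool two diagonal GMMs given as (weights, means, variances).

    alpha is a hypothetical mixing weight for the speaker model.
    """
    weights = np.concatenate([alpha * speaker[0], (1.0 - alpha) * background[0]])
    means = np.vstack([speaker[1], background[1]])
    variances = np.vstack([speaker[2], background[2]])
    return weights, means, variances


def estimate_bias(x, gmm, n_iter=10):
    """EM estimate of b maximizing sum_t log p(x_t + b | composite GMM)."""
    weights, means, variances = gmm
    b = np.zeros(x.shape[1])
    for _ in range(n_iter):
        # E-step: component posteriors for the currently shifted frames.
        diff = (x + b)[:, None, :] - means[None]
        log_p = np.log(weights)[None] - 0.5 * np.sum(
            np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)  # (T, K)
        # M-step: closed-form bias for diagonal covariances.
        num = np.einsum("tk,tkd->d", gamma, (means[None] - x[:, None, :]) / variances[None])
        den = np.einsum("tk,kd->d", gamma, 1.0 / variances)
        b = num / den
    return b


speaker = (np.array([1.0]), np.array([[10.0, 0.0]]), np.array([[1.0, 1.0]]))
background = (np.array([1.0]), np.array([[-10.0, 0.0]]), np.array([[1.0, 1.0]]))
composite = fuse_gmms(speaker, background)
# Toy claimant frames: the composite means shifted by an unknown channel offset (-2, +1).
claimant = np.array([[8.0, 1.0], [-12.0, 1.0]])
bias = estimate_bias(claimant, composite)
print(np.round(bias, 3))  # recovers the offset that moves the frames back
```

Because the bias is re-estimated from each test utterance, this style of compensation needs no handset classifier, at the cost of running a few EM iterations per claim.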

International Journal of Neural Systems | 2002

A comparative study on kernel-based probabilistic neural networks for speaker verification

Kwok-Kwong Yiu; Man-Wai Mak; Sun-Yuan Kung

This paper compares kernel-based probabilistic neural networks for speaker verification on 138 speakers of the YOHO corpus. Experiments were conducted using probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs), and elliptical basis function networks (EBFNs) as speaker models. The original PDBNN training algorithm was also modified to make PDBNNs suitable for speaker verification. Results show that the equal error rate obtained by PDBNNs and GMMs is lower than that of EBFNs (0.33% vs. 0.48%), suggesting that GMM- and PDBNN-based speaker models outperform EBFN-based ones. This work also finds that the globally supervised learning of PDBNNs is able to find decision thresholds that not only keep false acceptance rates low but also reduce their variation, whereas the ad hoc threshold-determination approach used with EBFNs and GMMs causes a large variation in the error rates. This property makes the performance of PDBNN-based systems more predictable.
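
The figures quoted above are equal error rates (EERs): the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). Below is a minimal sketch for computing an empirical EER from lists of genuine and impostor scores, assuming a claim is accepted when its score is at least the threshold. With finitely many trials the two rates rarely coincide exactly, so this version reports their mean at the threshold where they are closest, which is one of several common conventions. The scores are made up.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) from genuine and impostor score lists.

    A claim is accepted when score >= threshold. Candidate thresholds are
    the observed scores; the one where FAR and FRR are closest is kept and
    the EER is reported as the mean of the two rates there.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, best_eer, best_thr = np.inf, None, None
    for thr in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= thr)  # fraction of impostor claims accepted
        frr = np.mean(genuine < thr)    # fraction of genuine claims rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2.0, thr
    return best_eer, best_thr


eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75])
print(f"EER = {eer:.2%} at threshold {thr}")
```

The EER summarizes a whole score distribution but says nothing about where a deployed system's threshold will land, which is why the abstract separately discusses how thresholds are chosen and how much the resulting FAR varies.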

Pacific Rim Conference on Multimedia | 2001

A GMM-Based Handset Selector for Channel Mismatch Compensation with Applications to Speaker Identification

Kwok-Kwong Yiu; Man-Wai Mak; Sun-Yuan Kung

In telephone-based speaker identification, variation in handset characteristics can introduce severe variability even in speech uttered by the same speaker. This paper proposes a method in which a number of Gaussian mixture models are independently trained to identify the most likely handset for a given test utterance. The identified handset is used to select a compensation vector from a set of pre-computed vectors, each being the average frame-by-frame difference between clean and distorted utterances. The clean features are then recovered by subtracting the selected compensation vector from the distorted feature vectors. Experimental results based on 138 speakers of the YOHO and telephone YOHO corpora show that the proposed approach is computationally efficient and increases identification accuracy from 17% (without compensation) to 85% (with compensation).
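
The selector-plus-compensation pipeline described above can be sketched in a few functions: score the test utterance under each handset GMM, keep the handset with the highest average log-likelihood, and subtract that handset's pre-computed compensation vector. This is a minimal illustration, assuming diagonal-covariance GMMs whose parameters are already trained (training is omitted; the toy parameters and handset names are hypothetical) and clean and distorted frames that are time-aligned, as a frame-by-frame difference implies.

```python
import numpy as np


def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood of x under a diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None]
    comp = np.log(weights)[None] - 0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)
    peak = comp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(comp - peak).sum(axis=1))  # log-sum-exp
    return frame_ll.mean()


def compensation_vector(clean, distorted):
    """Average frame-by-frame difference between time-aligned distorted and clean frames."""
    return np.mean(distorted - clean, axis=0)


def select_and_compensate(x, handset_gmms, comp_vectors):
    """Pick the most likely handset, then subtract its compensation vector."""
    scores = {h: gmm_avg_loglik(x, *gmm) for h, gmm in handset_gmms.items()}
    best = max(scores, key=scores.get)
    return best, x - comp_vectors[best]


clean = np.array([[0.0, 0.0], [1.0, 1.0]])
distorted = {"handset_A": clean + [3.0, 0.0], "handset_B": clean + [-3.0, 0.0]}
comp_vectors = {h: compensation_vector(clean, d) for h, d in distorted.items()}
handset_gmms = {
    "handset_A": (np.array([1.0]), np.array([[3.5, 0.5]]), np.array([[1.0, 1.0]])),
    "handset_B": (np.array([1.0]), np.array([[-2.5, 0.5]]), np.array([[1.0, 1.0]])),
}
best, recovered = select_and_compensate(distorted["handset_A"], handset_gmms, comp_vectors)
print(best, recovered)
```

At test time the cost is one GMM scoring pass per handset plus a vector subtraction, which is consistent with the abstract's point that the approach is computationally efficient.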

International Conference on Neural Information Processing | 2002

Speaker verification with a priori threshold determination using kernel-based probabilistic neural networks

Kwok-Kwong Yiu; Man-Wai Mak; Sun-Yuan Kung

This paper compares kernel-based probabilistic neural networks for speaker verification. Experimental evaluations were conducted on 138 speakers of the YOHO corpus using probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs), and elliptical basis function networks (EBFNs) as speaker models. The original PDBNN training algorithm was also modified to make PDBNNs suitable for speaker verification. Results show that the equal error rate obtained by PDBNNs and GMMs is about half that of EBFNs (1.19% vs. 2.73%), suggesting that GMM- and PDBNN-based speaker models outperform EBFN-based ones. This work also finds that the globally supervised learning of PDBNNs is able to find a set of decision thresholds that reduce the variation in false acceptance rate (FAR), whereas the ad hoc approach used with EBFNs and GMMs cannot. This property makes the performance of PDBNN-based systems more predictable.

International Conference on Signal Processing | 1998

Probabilistic decision-based neural networks for speech pattern classification

Kwok-Kwong Yiu; Man-Wai Mak; Chi-Kwong Li

Probabilistic decision-based neural networks (PDBNNs) were originally proposed by Lin, Kung and Lin (1997) for human face recognition. Although they achieved high recognition accuracy, few illustrations were given of the characteristics of their decision boundaries. This paper provides detailed illustrations comparing the decision boundaries of PDBNNs with those of Gaussian mixture models on a pattern recognition task, namely the classification of two-dimensional vowel data. The original PDBNNs use elliptical basis functions with diagonal covariance matrices, which may be inefficient for modeling feature vectors with correlated components. This paper tackles this problem by using full covariance matrices. It also highlights the strengths of PDBNNs by demonstrating that their thresholding mechanism is very effective in rejecting data that do not belong to any known class.
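
Why diagonal covariances can be inefficient for correlated features is easy to check numerically. The sketch below fits a single Gaussian by maximum likelihood to synthetic, strongly correlated 2-D data (hypothetical data, not the vowel set used in the paper), once with a full covariance and once keeping only the diagonal, and compares the average log-likelihood. It illustrates the modelling argument only; it is not PDBNN training.

```python
import numpy as np


def avg_gaussian_loglik(x, mean, cov):
    """Average log-likelihood of the rows of x under N(mean, cov)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = np.einsum("ti,ti->t", diff @ np.linalg.inv(cov), diff)
    return np.mean(-0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis))


rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])          # strongly correlated features
x = rng.multivariate_normal([0.0, 0.0], true_cov, size=500)

mean = x.mean(axis=0)
full_cov = np.cov(x, rowvar=False, bias=True)           # ML full covariance
diag_cov = np.diag(np.diag(full_cov))                   # same variances, correlation dropped
ll_full = avg_gaussian_loglik(x, mean, full_cov)
ll_diag = avg_gaussian_loglik(x, mean, diag_cov)
print(f"full: {ll_full:.3f}  diagonal: {ll_diag:.3f}")
```

For ML fits the Mahalanobis term averages to d in both cases, so the gap comes entirely from the log-determinants; by Hadamard's inequality the diagonal model can never fit the training data better, and the gap widens as the correlation grows. A diagonal model can compensate only by spending more mixture components on the same tilted cluster.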

Pacific Rim Conference on Multimedia | 2002

Kernel-Based Probabilistic Neural Networks with Integrated Scoring Normalization for Speaker Verification

Kwok-Kwong Yiu; Man-Wai Mak; Sun-Yuan Kung

This paper investigates kernel-based probabilistic neural networks for speaker verification in clean and noisy environments. In particular, it compares the performance and characteristics of speaker verification systems that use probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs), and elliptical basis function networks (EBFNs) as speaker models. Experimental evaluations were conducted on 138 speakers of the YOHO corpus and its noisy variants, and the original PDBNN training algorithm was modified to make PDBNNs suitable for speaker verification. The evaluations, together with visualizations of the decision boundaries, indicate that GMM- and PDBNN-based speaker models are superior to EBFN-based ones in terms of performance and generalization capability. This work also finds that PDBNNs and GMMs are more robust than EBFNs in verifying speakers in noisy environments.

Conference of the International Speech Communication Association | 2003