Publication


Featured research published by K. K. Chin.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models

Liang Lu; K. K. Chin; Arnab Ghoshal; Stephen Renals

Joint uncertainty decoding (JUD) is a model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. Unlike vector Taylor series (VTS) compensation, which operates on the individual Gaussian components in an acoustic model, JUD clusters the Gaussian components into a smaller number of classes, sharing the compensation parameters across the set of Gaussians in a given class. This significantly reduces the computational cost. In this paper, we investigate noise compensation for subspace Gaussian mixture model (SGMM) based speech recognition systems using JUD. The total number of Gaussian components in an SGMM is typically very large, so direct compensation of the individual components, as performed by VTS, is computationally expensive. We show that JUD-based noise compensation can be applied to SGMMs in a computationally efficient way. We evaluate the JUD/SGMM technique on the standard Aurora 4 corpus. Our experimental results indicate that the JUD/SGMM system achieves lower word error rates than a conventional GMM system with either VTS-based or JUD-based noise compensation.
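
As a sketch of the decoding rule that makes this sharing explicit (the standard JUD form from the literature; the notation is assumed here, not copied from the paper), each Gaussian component m in regression class r is evaluated through a shared affine feature transform plus a shared bias variance:

```latex
% Hedged sketch: standard joint uncertainty decoding rule.
% r = r(m) is the regression class of Gaussian component m;
% (A_r, b_r, Sigma_{b,r}) are shared by all components in class r.
p(\mathbf{y} \mid m) = |\mathbf{A}_r|\,
  \mathcal{N}\!\bigl(\mathbf{A}_r \mathbf{y} + \mathbf{b}_r;\;
  \boldsymbol{\mu}_m,\; \boldsymbol{\Sigma}_m + \boldsymbol{\Sigma}_{b,r}\bigr)
```

Because the compensation parameters grow with the number of regression classes rather than the number of components, the scheme stays tractable even for the very large component count of an SGMM.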


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Joint Uncertainty Decoding With Predictive Methods for Noise Robust Speech Recognition

Haitian Xu; Mark J. F. Gales; K. K. Chin

Model-based noise compensation techniques are a powerful approach to improving speech recognition performance in noisy environments. However, one of the major issues with these schemes is that they are computationally expensive. Though techniques have been proposed to address this problem, they often degrade performance. This paper proposes a new, highly flexible approach that allows the computational load required for noise compensation to be controlled while maintaining good performance. The scheme applies improved joint uncertainty decoding within the predictive linear transform framework. The final compensation is implemented as a set of linear transforms of the features, decoupling the computational cost of compensation from the complexity of the recognition system's acoustic models. Furthermore, by using linear transforms, changes in the correlations in the feature vector can also be efficiently modeled. The proposed methods can easily be applied in an adaptive training scheme, including discriminative adaptive training. The performance of the approach is compared to a number of standard schemes on Aurora 2 as well as on in-car speech recognition tasks. Results indicate that the proposed scheme is an attractive alternative to existing approaches.
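
A minimal sketch of the decoupling idea, under assumed names and shapes (this is not the paper's implementation): once the per-class transforms have been estimated, compensation at decode time is a single affine map per frame and regression class, whatever the size of the acoustic model.

```python
import numpy as np

# Hedged sketch: compensation as a set of linear feature transforms.
# The per-frame cost is one matrix-vector product per regression class,
# independent of how many Gaussians the recognition system contains.

def compensate_frame(y, class_id, A, b):
    """Apply the affine transform of one regression class to a frame."""
    return A[class_id] @ y + b[class_id]

# Illustrative sizes: 39-dim features, 16 regression classes.
rng = np.random.default_rng(0)
dim, n_classes = 39, 16
A = np.tile(np.eye(dim), (n_classes, 1, 1)) \
    + 0.01 * rng.standard_normal((n_classes, dim, dim))
b = 0.01 * rng.standard_normal((n_classes, dim))

y = rng.standard_normal(dim)              # one observed (noisy) frame
x_hat = compensate_frame(y, class_id=3, A=A, b=b)
```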


international conference on acoustics, speech, and signal processing | 2009

Joint uncertainty decoding with the second order approximation for noise robust speech recognition

Haitian Xu; K. K. Chin

Joint uncertainty decoding has recently achieved promising results by integrating front-end uncertainty into the back-end in a mathematically consistent framework. In this paper, joint uncertainty decoding is compared with the widely used vector Taylor series (VTS) approach. We show that the two methods are identical except that joint uncertainty decoding applies the Taylor expansion to each regression class whereas VTS applies it to each HMM mixture component. The coarser expansion points used in joint uncertainty decoding make it computationally cheaper than VTS but inevitably worse in recognition accuracy. To overcome this drawback, this paper proposes an improved joint uncertainty decoding algorithm that employs a second-order Taylor expansion on each regression class to reduce the expansion errors. The overall computational cost is further limited by adopting different numbers of regression classes for the different orders of the Taylor expansion. Experiments on the Aurora 2 database show that the proposed method beats VTS on both recognition accuracy and computational cost, with relative improvements of up to 6% and 60%, respectively.
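
A sketch of the expansion being referred to, written generically (the paper's exact mismatch function and notation may differ): with z stacking the clean speech and noise variables and mu^(r) the expansion point of regression class r, each dimension of the corrupted observation is expanded to second order as

```latex
% Hedged sketch: per-dimension second-order Taylor expansion of the
% mismatch function y = f(z) around the class-r expansion point mu^{(r)}.
y_i \approx f_i\bigl(\boldsymbol{\mu}^{(r)}\bigr)
  + \nabla f_i\bigl(\boldsymbol{\mu}^{(r)}\bigr)^{\top}
    \bigl(\mathbf{z} - \boldsymbol{\mu}^{(r)}\bigr)
  + \tfrac{1}{2}\,
    \bigl(\mathbf{z} - \boldsymbol{\mu}^{(r)}\bigr)^{\top}
    \nabla^2 f_i\bigl(\boldsymbol{\mu}^{(r)}\bigr)
    \bigl(\mathbf{z} - \boldsymbol{\mu}^{(r)}\bigr)
```

The quadratic term shrinks the error introduced by the coarse class-level expansion points; since it is the expensive part, it can be computed with fewer regression classes than the first-order term.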


international conference on acoustics, speech, and signal processing | 2011

Rapid joint speaker and noise compensation for robust speech recognition

K. K. Chin; Haitian Xu; Mark J. F. Gales; Catherine Breslin; Katherine Mary Knill

For speech recognition, mismatches between training and testing conditions for speaker and noise are normally handled separately. The work presented in this paper jointly applies speaker adaptation and model-based noise compensation by embedding speaker adaptation in the noise mismatch function. The proposed method gives faster and better adaptation than compensating for the two factors separately, and is more consistent with the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method.
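
A hedged sketch of what embedding speaker adaptation in the mismatch function can look like, assuming the standard cepstral-domain VTS mismatch function and a CMLLR-style speaker transform (A^(s), b^(s)) applied to the clean speech (the paper's exact parameterisation is not reproduced here):

```latex
% Hedged sketch: speaker transform embedded in the VTS mismatch function.
% x = clean speech, n = additive noise, h = convolutional channel,
% C = DCT matrix; the speaker-transformed speech replaces x throughout.
\mathbf{y} = \mathbf{A}^{(s)}\mathbf{x} + \mathbf{b}^{(s)} + \mathbf{h}
  + \mathbf{C}\,\ln\!\Bigl(\mathbf{1} + \exp\bigl(\mathbf{C}^{-1}
    (\mathbf{n} - \mathbf{A}^{(s)}\mathbf{x} - \mathbf{b}^{(s)} - \mathbf{h})\bigr)\Bigr)
```

Estimating the speaker and noise parameters against this single function is what makes the joint scheme consistent, rather than applying the two compensations one after the other.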


ieee automatic speech recognition and understanding workshop | 2009

Improving joint uncertainty decoding performance by predictive methods for noise robust speech recognition

Haitian Xu; Mark J. F. Gales; K. K. Chin

Model-based noise compensation techniques, such as vector Taylor series (VTS) compensation, have been applied to a range of noise robustness tasks. However, one of the issues with these approaches is that they are computationally expensive for large speech recognition systems. To address this problem, schemes such as joint uncertainty decoding (JUD) have been proposed. Though computationally more efficient, they typically degrade performance. This paper proposes an alternative scheme, related to JUD but making fewer approximations, called VTS-JUD. Unfortunately, this approach also removes some of the computational advantages of JUD. To address this, rather than using VTS-JUD directly, it is used to obtain statistics from which a predictive linear transform, PCMLLR, is estimated. This is both computationally efficient and limits some of the issues associated with the diagonal covariance matrices typically used with schemes such as VTS. PCMLLR can also be used simply within an adaptive training framework (PAT). The performance of the VTS-JUD, PCMLLR and PAT systems was compared to a number of standard approaches on an in-car speech recognition task. The proposed scheme is an attractive alternative to existing approaches.
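
As an illustration of the predictive step, here is a least-squares stand-in for the transform estimation (PCMLLR itself is estimated in a maximum-likelihood fashion; the names, weighting, and sizes below are assumptions for the sketch): predicted corrupted-speech statistics are used to fit one affine feature transform per regression class.

```python
import numpy as np

# Hedged sketch of the "predictive" idea: fit a feature transform (A, b)
# so that transformed *predicted* corrupted-speech means match the clean
# model means, weighted by component occupancy. This weighted least-squares
# fit stands in for the maximum-likelihood PCMLLR estimation in the paper.

def predictive_transform(mu_clean, mu_corrupt, weights):
    """Fit y -> A y + b such that A mu_corrupt + b ~= mu_clean."""
    n, d = mu_corrupt.shape
    X = np.hstack([mu_corrupt, np.ones((n, 1))])   # append bias column
    W = np.diag(weights)                           # component occupancies
    Ab = np.linalg.solve(X.T @ W @ X, X.T @ W @ mu_clean).T
    return Ab[:, :d], Ab[:, d]                     # A (d x d), b (d,)

# Illustrative sizes: 64 Gaussians in one regression class, 13-dim features.
rng = np.random.default_rng(1)
mu_clean = rng.standard_normal((64, 13))
mu_corrupt = mu_clean + 0.3 * rng.standard_normal((64, 13)) + 0.5
A, b = predictive_transform(mu_clean, mu_corrupt, weights=rng.random(64))
```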


international conference on acoustics, speech, and signal processing | 2011

Constrained discriminative mapping transforms for unsupervised speaker adaptation

Langzhou Chen; Mark J. F. Gales; K. K. Chin

Discriminative mapping transforms (DMTs) are an approach for robustly adding discriminative training to unsupervised linear adaptation transforms. In unsupervised adaptation, DMTs are more robust to unreliable transcriptions than directly estimating adaptation transforms in a discriminative fashion. They were previously proposed for use with MLLR transforms, with the associated need to explicitly transform the model parameters. In this work the DMT is extended to CMLLR transforms. As these operate in the feature space, it is only necessary to apply a different linear transform at the front-end rather than modifying the model parameters, which is useful for rapidly changing speakers and environments. The performance of DMTs with CMLLR was evaluated on the WSJ 20k task. Experimental results show that DMTs based on constrained linear transforms yield 3% to 6% relative gains over MLE transforms in unsupervised speaker adaptation.
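
Since both transforms operate in the feature space, applying a DMT on top of a speaker CMLLR transform composes into a single affine map, which is why no model parameters need rewriting when the speaker or environment changes. A minimal sketch, with illustrative names and sizes:

```python
import numpy as np

# Hedged sketch: composing a feature-space DMT with a speaker CMLLR
# transform. x -> A_dmt (A_spk x + b_spk) + b_dmt collapses to one (A, b),
# so adapting to a new speaker only swaps a front-end transform.

def compose_affine(A_dmt, b_dmt, A_spk, b_spk):
    """Compose two affine feature transforms into one."""
    return A_dmt @ A_spk, A_dmt @ b_spk + b_dmt

dim = 39
rng = np.random.default_rng(2)
A_spk = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))  # per-speaker
b_spk = 0.01 * rng.standard_normal(dim)
A_dmt = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))  # mapping transform
b_dmt = 0.01 * rng.standard_normal(dim)

A, b = compose_affine(A_dmt, b_dmt, A_spk, b_spk)
x = rng.standard_normal(dim)
assert np.allclose(A @ x + b, A_dmt @ (A_spk @ x + b_spk) + b_dmt)
```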


international conference on acoustics, speech, and signal processing | 2008

Efficient language model look-ahead probabilities generation using lower order LM look-ahead information

Langzhou Chen; K. K. Chin

In this paper, an efficient method for language model look-ahead (LMLA) probability generation is presented. Traditional methods generate LMLA probabilities for each node in the LMLA tree recursively in a bottom-up manner. The new method presented in this paper exploits the sparseness of the n-gram model and starts the generation of an n-gram LMLA tree from a backoff LMLA tree, so that only a small number of nodes need to be updated with explicitly estimated LM probabilities. This speeds up bigram and trigram LMLA tree generation by factors of 3 and 12, respectively.
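
A minimal sketch of the backoff trick, under an assumed prefix-tree representation (not the paper's data structures): every node is first given the backoff-scaled lower-order score in one cheap pass, and only the root-to-leaf paths of explicitly stored n-grams are then revisited. The sketch also assumes each explicit probability is at least its backoff estimate, the common case.

```python
# Hedged sketch: n-gram LMLA tree generation from a backoff LMLA tree.
# The look-ahead score of a node is the maximum LM probability of any
# word reachable below it in the pronunciation prefix tree.

def lmla_scores(parents, word_node, lower_scores, backoff_wt, explicit_probs):
    """parents[n]    : parent of node n (root has parent -1)
       word_node[w]  : leaf node of word w in the prefix tree
       lower_scores  : lower-order LMLA score of each node
       backoff_wt    : backoff weight of the history
       explicit_probs: {word: P(word | history)} for stored n-grams"""
    # One cheap pass: initialise every node with the backoff approximation.
    scores = [backoff_wt * s for s in lower_scores]
    # Update only the few paths carrying explicit n-gram probabilities.
    for w, p in explicit_probs.items():
        n = word_node[w]
        while n != -1 and p > scores[n]:
            scores[n] = p
            n = parents[n]
    return scores

# Toy tree: root(0) -> node 1 -> leaves "cat"(2), "cab"(3).
parents = [-1, 0, 1, 1]
word_node = {"cat": 2, "cab": 3}
lower = [0.2, 0.2, 0.2, 0.1]      # lower-order look-ahead scores
print(lmla_scores(parents, word_node, lower, 0.4, {"cat": 0.15}))
# -> roughly [0.15, 0.15, 0.15, 0.04]
```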


conference of the international speech communication association | 2012

Speech factorization for HMM-TTS based on cluster adaptive training

Javier Latorre; Vincent Wan; Mark J. F. Gales; Langzhou Chen; K. K. Chin; Kate Knill; Masami Akamine


conference of the international speech communication association | 2010

Prior Information for Rapid Speaker Adaptation

Catherine Breslin; K. K. Chin; Mark J. F. Gales; Kate Knill; Haitian Xu


Archive | 2013

TEXT TO SPEECH METHOD AND SYSTEM

Javier Latorre-Martinez; Vincent Wan; K. K. Chin; Mark J. F. Gales; Katherine Mary Knill; Masami Akamine; Byung Ha Chung

Collaboration


Dive into K. K. Chin's collaboration.

Top Co-Authors

Kate Knill

University of Cambridge
