Publications

Featured research published by Simon Ka-Lung Ho.


IEEE Transactions on Speech and Audio Processing | 2005

Kernel eigenvoice speaker adaptation

Brian Mak; James Tin-Yau Kwok; Simon Ka-Lung Ho

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 s, is available. At the heart of the method is principal component analysis (PCA), employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is the use of composite kernels in such a way that state observation likelihoods can be computed using only kernel functions, without the need for a speaker-adapted model in the input supervector space. In this paper, we investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, it is found that KEV speaker adaptation using either form of composite Gaussian kernel is equally effective, and both outperform a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1 and 4.1 s of speech. For example, with 2.1 s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptation are not effective at all.
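For context, the conventional linear-PCA eigenvoice scheme that KEV generalizes can be sketched in a few lines. The supervector sizes and the least-squares weight fit below are illustrative stand-ins (real systems estimate the weights by maximizing the likelihood of the adaptation speech):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 training speakers, each represented by a
# 50-dimensional "supervector" (stacked Gaussian mean parameters).
S = rng.normal(size=(20, 50))

# Linear PCA on the supervectors: the leading right singular vectors
# of the centered data are the eigenvoices.
mean = S.mean(axis=0)
_, _, Vt = np.linalg.svd(S - mean, full_matrices=False)
K = 5                      # keep the K most important eigenvoices
eigenvoices = Vt[:K]       # shape (K, 50)

# A new speaker's model is constrained to mean + weighted eigenvoices.
# Here we least-squares fit a target supervector as a stand-in for the
# maximum-likelihood weight estimation used with real adaptation data.
target = rng.normal(size=50)
w, *_ = np.linalg.lstsq(eigenvoices.T, target - mean, rcond=None)
adapted = mean + eigenvoices.T @ w   # the adapted speaker supervector
```

Because only K weights are estimated, very little adaptation data suffices; KEV replaces the linear PCA step with kernel PCA.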


International Conference on Acoustics, Speech, and Signal Processing | 2004

A study of various composite kernels for kernel eigenvoice speaker adaptation

Brian Mak; James Tin-Yau Kwok; Simon Ka-Lung Ho

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when the amount of adaptation data is small, say, less than 10 seconds. In traditional eigenvoice (EV) speaker adaptation, linear principal component analysis (PCA) is used to derive the eigenvoices. Recently, we proposed that eigenvoices found by nonlinear kernel PCA could be more effective, and the eigenvoices thus derived were called kernel eigenvoices (KEV). One of our novelties is the use of a composite kernel that makes it possible to compute state observation likelihoods via kernel functions. We investigate two different composite kernels for KEV adaptation: the direct sum kernel and the tensor product kernel. In an evaluation on the TIDIGITS task, it is found that KEV speaker adaptation using either form of composite kernel is equally effective, and both outperform a speaker-independent model and the adapted models from EV, MAP, or MLLR adaptation using 2.1s and 4.1s of speech. For example, with 2.1s of adaptation data, KEV adaptation outperforms the speaker-independent model by 27.5%, whereas EV, MAP, and MLLR adaptations are not effective at all.
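The composite-kernel idea can be illustrated with toy Gram matrices. The data and kernel parameters below are made up, but the check shows why the direct sum and tensor (elementwise) product of two valid kernels are again valid kernels:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(6, 3))   # hypothetical features for stream 1
X2 = rng.normal(size=(6, 4))   # hypothetical features for stream 2

def rbf_gram(X, gamma=0.5):
    """Gram matrix of a Gaussian (RBF) kernel over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K1, K2 = rbf_gram(X1), rbf_gram(X2)

# Direct sum kernel: k1 + k2. Tensor product kernel: k1 * k2
# (elementwise on the Gram matrices). Both constructions preserve
# positive semidefiniteness, so each is again a valid kernel.
K_sum = K1 + K2
K_prod = K1 * K2
for K in (K_sum, K_prod):
    assert np.min(np.linalg.eigvalsh(K)) > -1e-10
```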


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting

Brian Mak; Roger Hsiao; Simon Ka-Lung Ho; James Tin-Yau Kwok

Recently, we proposed an improvement to conventional eigenvoice (EV) speaker adaptation using kernel methods. In our novel kernel eigenvoice (KEV) speaker adaptation, speaker supervectors are mapped to a kernel-induced high-dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, MAP, and MLLR adaptation in a TIDIGITS task with less than 10 s of adaptation speech. Nonetheless, due to many kernel evaluations, both adaptation and subsequent recognition in KEV adaptation are considerably slower than conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model found by KEV adaptation in the feature space; we call our new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation makes use of the multidimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is related to the reference speaker weighting (RSW) adaptation method that is based on speaker clustering. Our experimental results on Wall Street Journal show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting the way we choose the subset of reference speakers for eKEV adaptation, we may also improve RSW adaptation so that it performs as well as our eKEV adaptation.
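The classical multidimensional scaling step that eKEV builds on can be sketched as follows, with hypothetical points standing in for reference-speaker supervectors. Given only pairwise distances, MDS recovers coordinates that reproduce them:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical: 8 reference speakers with known pairwise squared
# distances (here computed from ground-truth points for checking).
P = rng.normal(size=(8, 3))
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)

# Classical MDS: double-center the squared-distance matrix to get an
# inner-product (Gram) matrix, then eigendecompose it.
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ D2 @ J                        # centered Gram matrix
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:3]             # top 3 dimensions
Y = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# The embedding reproduces the original distances (up to rotation).
D2_hat = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2, D2_hat, atol=1e-6)
```

Working in such a recovered low-dimensional coordinate system is what lets eKEV express the adapted model in the span of the chosen reference speakers without further kernel evaluations.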


International Conference on Acoustics, Speech, and Signal Processing | 2005

Various reference speakers determination methods for embedded kernel eigenvoice speaker adaptation

Brian Mak; Simon Ka-Lung Ho

Recently, we proposed two improvements to eigenvoice (EV) speaker adaptation using kernel methods: kernel eigenvoice (KEV) speaker adaptation and embedded kernel eigenvoice (eKEV) speaker adaptation. In both KEV and eKEV adaptation, kernel eigenvoices are computed using kernel PCA, and an implicit speaker-adapted model is defined as a linear combination of the leading kernel eigenvoices in the kernel-induced feature space. eKEV adaptation further finds an approximate pre-image of the implicit speaker-adapted model so that all online kernel evaluations involving any acoustic vectors are eliminated during adaptation and subsequent recognition. The pre-image finding algorithm is cast as a constrained optimization problem using the distances between the expected pre-image and a set of pre-determined reference speakers as constraints. In this paper, we investigate two different ways to determine the reference speakers and the effect of their number on eKEV adaptation performance.


International Conference on Machine Learning and Cybernetics | 2004

Using kernel PCA to improve eigenvoice speaker adaptation

Brian Mak; James Tin-Yau Kwok; Simon Ka-Lung Ho

Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data is available. Conventionally, these methods employ linear principal component analysis (PCA) to find the most important eigenvoices. Recently, in what we called kernel eigenvoice (KEV) speaker adaptation, we suggested the use of kernel PCA to compute the eigenvoices so as to exploit possible nonlinearity in the data. The major challenge is that unlike the standard eigenvoice (EV) method, an adapted speaker model found by KEV adaptation resides in the high-dimensional kernel-induced feature space; it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is the use of composite kernels in such a way that state observation likelihoods can be computed using only kernel functions. In an evaluation on the TIDIGITS task using less than 10s of adaptation speech, it is found that KEV speaker adaptation using composite Gaussian kernels outperforms a speaker-independent model and adapted models found by EV, MAP, or MLLR adaptation using 2.1s and 4.1s of speech.
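A minimal sketch of the kernel PCA computation underlying the kernel eigenvoices, with random vectors standing in for speaker supervectors and an assumed Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.normal(size=(20, 50))   # hypothetical speaker supervectors

def rbf_gram(X, gamma=0.01):
    """Gaussian-kernel Gram matrix over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel PCA: eigendecompose the centered Gram matrix; each
# eigenvector defines a nonlinear principal component (a "kernel
# eigenvoice") in the induced feature space.
K = rbf_gram(S)
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                               # center in feature space
vals, vecs = np.linalg.eigh(Kc)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Normalize so each feature-space eigenvector has unit length, then
# project the training supervectors onto the leading components.
m = 5
alphas = vecs[:, :m] / np.sqrt(vals[:m])
projections = Kc @ alphas                    # (20, m) KPCA coordinates
```

Unlike linear PCA, these components live only implicitly in the feature space, which is why the paper's composite kernels are needed to evaluate state observation likelihoods.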


North American Chapter of the Association for Computational Linguistics | 2003

PLASER: pronunciation learning via automatic speech recognition

Brian Mak; Man-Hung Siu; Mimi Ng; Yik Cheung Tam; Yu Chung Chan; Kin Wah Chan; Ka Yee Leung; Simon Ka-Lung Ho; Fong Ho Chong; Jimmy Wong; Jacqueline Lo


Chemosphere | 2008

PCDD/F and dioxin-like PCB in Hong Kong air in relation to their regional transport in the Pearl River Delta region.

M.P.K. Choi; Simon Ka-Lung Ho; Benny K.L. So; Zongwei Cai; Alexis Kai-Hon Lau; Ming Hung Wong


Neural Information Processing Systems | 2003

Eigenvoice speaker adaptation via composite kernel PCA

James Tin-Yau Kwok; Brian Mak; Simon Ka-Lung Ho


Conference of the International Speech Communication Association | 2004

Speedup of Kernel Eigenvoice Speaker Adaptation by Embedded Kernel PCA

Brian Mak; Simon Ka-Lung Ho; James Tin-Yau Kwok



Collaboration


Dive into Simon Ka-Lung Ho's collaborations.

Top Co-Authors

Brian Mak, Hong Kong University of Science and Technology
James Tin-Yau Kwok, Hong Kong University of Science and Technology
Alexis Kai-Hon Lau, Hong Kong University of Science and Technology
Fong Ho Chong, Hong Kong University of Science and Technology
Jacqueline Lo, Hong Kong University of Science and Technology
Jimmy Wong, Hong Kong University of Science and Technology
Ka Yee Leung, Hong Kong University of Science and Technology
Kin Wah Chan, Hong Kong University of Science and Technology
M.P.K. Choi, Hong Kong Baptist University
Mimi Ng, Hong Kong University of Science and Technology