Publication


Featured research published by Yik-Cheung Tam.


Machine Translation | 2007

Bilingual LSA-based adaptation for statistical machine translation

Yik-Cheung Tam; Ian R. Lane; Tanja Schultz

We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation can be performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and to the translation lexicon via marginal adaptation. The background phrase table is enhanced with additional phrase scores computed using the adapted translation lexicon. The proposed framework also supports rapid bootstrapping of LSA models for new languages from a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using a medium-scale SMT system and the GALE SMT system, measured in BLEU and NIST scores. Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon are applied individually; when they are applied simultaneously, the gains are additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE SMT system.
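The marginal adaptation step above can be sketched at the unigram level. This is a minimal illustration, assuming a background unigram distribution p_bg and an in-domain LSA marginal p_lsa; the scaling exponent beta and the function name are assumptions, not details from the paper:

```python
import numpy as np

def marginal_adapt(p_bg, p_lsa, beta=0.5):
    """Scale a background unigram distribution toward an LSA topic marginal.

    Standard marginal-adaptation form (illustrative, not the paper's code):
    p_adapted(w) is proportional to p_bg(w) * (p_lsa(w) / p_bg(w)) ** beta,
    where beta controls the adaptation strength.
    """
    scores = p_bg * (p_lsa / p_bg) ** beta
    return scores / scores.sum()            # renormalize to a distribution

# Toy example with a 5-word vocabulary.
p_bg  = np.array([0.40, 0.30, 0.15, 0.10, 0.05])   # background unigram
p_lsa = np.array([0.10, 0.20, 0.30, 0.25, 0.15])   # topic-inferred marginal
print(marginal_adapt(p_bg, p_lsa))
```

The abstract applies this kind of scaling to an n-gram language model and a translation lexicon; the unigram case above only illustrates the form.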


International Conference on Acoustics, Speech, and Signal Processing | 2007

Correlated Latent Semantic Model for Unsupervised LM Adaptation

Yik-Cheung Tam; Tanja Schultz

We propose a latent Dirichlet-tree allocation (LDTA) model, a correlated latent semantic model, for unsupervised language model adaptation. The LDTA model extends the latent Dirichlet allocation (LDA) model by replacing the Dirichlet prior over the topic proportions with a Dirichlet-tree prior. Latent topics under the same subtree are expected to be more correlated than topics under different subtrees. The LDTA model falls back to the LDA model for a depth-one Dirichlet-tree, and it fits into the variational Bayes inference framework employed for the LDA model. Empirical results show that the LDTA model converges faster in training than the LDA model given the same initial flat model. Experimental results show that the LDTA-adapted LM performed better than the LDA-adapted LM on the Mandarin RT04-eval set when the models were trained on a small text corpus, while both models gave the same recognition performance when trained on a large text corpus. We observed a 0.4% absolute CER reduction after LM adaptation using LSA marginals.
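To make the Dirichlet-tree prior concrete, here is a hedged sketch of drawing topic proportions from such a prior; the tree layout, parameter values, and function name are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dirichlet_tree(tree, node="root"):
    """Sample topic proportions from a Dirichlet-tree prior.

    Each internal node holds Dirichlet parameters over its children.
    A leaf's probability is the product of the branch probabilities on
    its path from the root, so leaves (topics) under the same subtree
    co-vary, which is exactly the correlation the LDTA model adds.
    """
    children, alphas = tree[node]
    branch = rng.dirichlet(alphas)             # one Dirichlet draw per node
    probs = {}
    for child, p in zip(children, branch):
        if child in tree:                      # internal node: recurse
            for leaf, q in sample_dirichlet_tree(tree, child).items():
                probs[leaf] = p * q
        else:                                  # leaf = a latent topic
            probs[child] = p
    return probs

# A depth-two tree with two subtrees of two correlated topics each.
# A depth-one tree (root pointing directly at topics) recovers plain LDA.
tree = {
    "root": (["subA", "subB"], [1.0, 1.0]),
    "subA": (["topic1", "topic2"], [2.0, 2.0]),
    "subB": (["topic3", "topic4"], [2.0, 2.0]),
}
print(sample_dirichlet_tree(tree))             # proportions sum to 1
```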


International Conference on Acoustics, Speech, and Signal Processing | 2014

ASR error detection using recurrent neural network language model and complementary ASR

Yik-Cheung Tam; Yun Lei; Jing Zheng; Wen Wang

Detecting automatic speech recognition (ASR) errors can play an important role in an effective human-computer spoken dialogue system, as recognition errors can hinder accurate system understanding of user intents. Our goal is to locate errors in an utterance so that the dialogue manager can pose appropriate clarification questions to the users. We propose two approaches to improve ASR error detection: (1) using recurrent neural network language models to capture long-distance word context within and across previous utterances; (2) using a complementary ASR system. The intuition is that when two complementary ASR systems disagree on a region of an utterance, that region is most likely an error. We train a neural network predictor of errors using a variety of features. We performed experiments on both English and Iraqi Arabic ASR and observed significant improvement in error detection using the proposed methods.
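The two signals can be combined into per-word features for an error classifier, as in this hedged sketch; the feature layout and names are assumptions for illustration, not the paper's actual feature set:

```python
import numpy as np

def error_features(lm_logprobs, hyp_primary, hyp_complementary):
    """Build per-word features for ASR error detection.

    lm_logprobs       : language-model log-probability of each hypothesized
                        word given its (long-distance) context
    hyp_primary       : word sequence from the primary ASR system
    hyp_complementary : word sequence from the complementary ASR system,
                        aligned to the same length as hyp_primary

    Returns one feature row per word: [LM score, disagreement flag].
    A downstream classifier would be trained on rows like these.
    """
    disagree = np.array([a != b for a, b in zip(hyp_primary, hyp_complementary)],
                        dtype=float)
    return np.column_stack([lm_logprobs, disagree])

# Toy aligned hypotheses: the systems disagree on the third word.
feats = error_features(
    np.array([-1.2, -0.8, -6.5, -1.0]),
    ["show", "me", "flights", "please"],
    ["show", "me", "fights",  "please"],
)
print(feats)   # low LM score plus disagreement flags a likely error region
```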


International Conference on Acoustics, Speech, and Signal Processing | 2002

Discriminative auditory features for robust speech recognition

Brian Mak; Yik-Cheung Tam; Qi Li

Recently, Li et al. proposed a new auditory feature for robust speech recognition in noisy environments. The feature was derived by closely mimicking the human auditory process. Several filters were used to model the outer ear, middle ear, and cochlea, and the initial filter parameters and shapes were obtained from crude psychoacoustic results, experience, or experiments. Although one may adjust the feature parameters by hand to get better performance, the resulting parameters still may not be optimal in the sense of minimal recognition error, especially across different tasks. To further improve the auditory feature, in this paper we apply discriminative training to optimize the auditory feature parameters, with some guidance from psychoacoustic evidence but otherwise in a data-driven fashion, so as to minimize recognition errors. One significant contribution over similar past efforts, such as discriminative feature extraction, is that we make no assumption about the parametric form of the auditory filters. Instead, we only require the filters to be smooth and triangular-like, as suggested by psychoacoustics research. Our approach is evaluated on the Aurora database and achieves a word error reduction of 19.2%.
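One way to read the "smooth and triangular-like, but nonparametric" constraint is as a gradient update on raw filter-bin weights with a smoothness regularizer. The sketch below is a speculative illustration of that idea, not the paper's actual optimization; all names and constants are assumptions:

```python
import numpy as np

def smoothness_penalty_grad(w):
    """Gradient of sum_i (w[i+1] - w[i])**2, a simple smoothness regularizer."""
    g = np.zeros_like(w)
    d = np.diff(w)
    g[:-1] -= 2 * d
    g[1:]  += 2 * d
    return g

def update_filter(w, task_grad, lr=0.01, lam=0.1):
    """One gradient step on a nonparametric filter shape.

    task_grad stands in for the gradient of a discriminative criterion
    (e.g. an MCE-style loss); the smoothness term keeps the learned shape
    triangular-like without assuming any parametric filter form.
    """
    w = w - lr * (task_grad + lam * smoothness_penalty_grad(w))
    return np.clip(w, 0.0, None)       # keep filter gains non-negative

# Toy: start from a triangular filter, apply a dummy task gradient.
w = np.concatenate([np.linspace(0, 1, 5), np.linspace(1, 0, 5)[1:]])
dummy_grad = np.random.default_rng(0).normal(0.0, 0.1, w.size)
print(update_filter(w, dummy_grad))
```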


International Conference on Acoustics, Speech, and Signal Processing | 2009

Generalized Baum-Welch algorithm for discriminative training on large vocabulary continuous speech recognition system

Roger Hsiao; Yik-Cheung Tam; Tanja Schultz

We propose a new optimization algorithm, the Generalized Baum-Welch (GBW) algorithm, for discriminative training of hidden Markov models (HMMs). GBW is based on Lagrange relaxation of a transformed optimization problem. We show that both the Baum-Welch (BW) algorithm for ML estimation of HMM parameters and the popular extended Baum-Welch (EBW) algorithm for discriminative training are special cases of GBW. We compare the performance of GBW and EBW on Farsi large vocabulary continuous speech recognition (LVCSR).
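For reference, the EBW mean update that the abstract names as a special case of GBW has a standard closed form; this sketch shows that form with illustrative statistics (the values and variable names are not from the paper):

```python
import numpy as np

def ebw_mean_update(num_x, num_c, den_x, den_c, mu_old, D):
    """Standard extended Baum-Welch update for a Gaussian mean.

    num_x, num_c : first-order statistics and occupancy counts from the
                   numerator (reference) lattice
    den_x, den_c : the same statistics from the denominator (competitor)
                   lattice
    D            : smoothing constant that keeps the update stable

    mu_new = (num_x - den_x + D * mu_old) / (num_c - den_c + D)

    With den_x = den_c = 0 and D -> 0 this reduces to the plain
    Baum-Welch (ML) re-estimate, mirroring how GBW subsumes both.
    """
    return (num_x - den_x + D * mu_old) / (num_c - den_c + D)

print(ebw_mean_update(num_x=np.array([5.0, 6.0]), num_c=4.0,
                      den_x=np.array([3.0, 2.0]), den_c=2.5,
                      mu_old=np.array([0.0, 1.0]), D=2.0))
```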


International Conference on Acoustics, Speech, and Signal Processing | 2014

An autoencoder with bilingual sparse features for improved statistical machine translation

Bing Zhao; Yik-Cheung Tam; Jing Zheng

Though sparse features have produced significant gains over traditional dense features in statistical machine translation, careful feature selection and feature engineering are necessary to avoid over-fitting during optimization. However, many sparse features are highly overlapping with each other; that is, they cover the same or similar information about translational equivalence from slightly different points of view, and they easily overfit given only very few training samples per bilingual stochastic context-free grammar (SCFG) rule. We propose an autoencoder that naturally maps all the discrete, overlapping sparse features for each SCFG rule into a continuous vector, so that the information encoded in the sparse feature vectors becomes a dense vector that enjoys more samples during training and avoids overfitting. Our experiments showed that, for a statistical machine translation system with 33 million bilingual SCFG rules, the autoencoder generalizes much better than the sparse features alone under the same optimization framework.
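A minimal sketch of the idea: a tied-weight autoencoder that compresses a sparse binary rule-feature vector into a small dense code. The architecture details here (single tanh layer, squared-error loss, dimensions) are assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseFeatureAutoencoder:
    """Map a high-dimensional sparse feature vector to a dense code.

    A single-hidden-layer autoencoder with tied weights, trained by
    squared-error reconstruction; the hidden activation serves as the
    dense representation used in place of raw overlapping sparse features.
    """
    def __init__(self, n_sparse, n_dense, lr=0.05):
        self.W = rng.normal(0.0, 0.1, (n_sparse, n_dense))
        self.lr = lr

    def encode(self, x):
        return np.tanh(x @ self.W)

    def train_step(self, x):
        h = self.encode(x)
        x_hat = h @ self.W.T                   # tied decoder weights
        err = x_hat - x
        dh = (err @ self.W) * (1.0 - h ** 2)   # backprop through tanh
        self.W -= self.lr * (np.outer(x, dh) + np.outer(err, h))
        return float((err ** 2).mean())

# Toy: 20 overlapping sparse features compressed to a 4-dim dense code.
ae = SparseFeatureAutoencoder(n_sparse=20, n_dense=4)
x = (rng.random(20) < 0.2).astype(float)       # a sparse binary rule vector
for _ in range(300):
    loss = ae.train_step(x)
print(loss, ae.encode(x))                      # reconstruction error, dense code
```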


International Conference on Acoustics, Speech, and Signal Processing | 2014

Feature fusion for high-accuracy keyword spotting

Vikramjit Mitra; Julien van Hout; Horacio Franco; Dimitra Vergyri; Yun Lei; Martin Graciarena; Yik-Cheung Tam; Jing Zheng

This paper assesses the role of robust acoustic features in spoken term detection (also known as keyword spotting, KWS) under heavily degraded channel and noise-corrupted conditions. A number of noise-robust acoustic features were used, both in isolation and in combination, to train large vocabulary continuous speech recognition (LVCSR) systems, with the resulting word lattices used for spoken term detection. Results indicate that the use of robust acoustic features improved KWS performance with respect to a highly optimized state-of-the-art baseline system. It has been shown that fusion of multiple systems improves KWS performance; however, the number of systems that can be trained is constrained by the number of front-end features. This work shows that, given a number of front-end features, it is possible to train several systems by using the front-end features both by themselves and with different feature fusion techniques, which provides a richer set of individual systems. Results show that KWS performance improves over individual feature-based systems when multiple features are fused with one another, and improves further when multiple such systems are combined. Finally, this work shows that fusing both fused-feature and single-feature systems provides a significant improvement in KWS performance compared to fusing single-feature systems alone.
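The two fusion levels the abstract distinguishes can be sketched as follows; the function names, dimensions, and the weighted-average combination rule are illustrative assumptions:

```python
import numpy as np

def feature_fusion(feat_a, feat_b):
    """Frame-level fusion: concatenate two front-end feature streams.

    feat_a, feat_b : (num_frames, dim) arrays from different robust
    front ends; the fused stream can train an additional LVCSR system
    on top of the single-feature systems, enriching the system pool.
    """
    assert feat_a.shape[0] == feat_b.shape[0], "streams must be frame-aligned"
    return np.hstack([feat_a, feat_b])

def score_fusion(kws_scores, weights=None):
    """System-level fusion: weighted average of per-keyword detection scores.

    kws_scores : (num_systems, num_keywords) detection scores
    """
    kws_scores = np.asarray(kws_scores, dtype=float)
    weights = np.ones(len(kws_scores)) if weights is None else np.asarray(weights)
    return weights @ kws_scores / weights.sum()

# Toy: two 13-dim streams over 100 frames, then three systems' scores.
a, b = np.zeros((100, 13)), np.ones((100, 13))
print(feature_fusion(a, b).shape)                        # (100, 26)
print(score_fusion([[0.9, 0.2], [0.8, 0.4], [0.7, 0.1]]))
```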


International Conference on Acoustics, Speech, and Signal Processing | 2002

An alternative approach of finding competing hypotheses for better minimum classification error training

Yik-Cheung Tam; Brian Mak

During minimum-classification-error (MCE) training, competing hypotheses against the correct one are commonly derived by the N-best algorithm. One problem with the N-best algorithm is that, in practice, some misclassified data can have very large misclassification distances from the N-best competitors and fall outside the steep, trainable region of the sigmoid function, and thus cannot be utilized effectively. Although one may alleviate the problem by adjusting the shape of the sigmoid and then using an appropriate learning rate, this requires careful tuning of the training parameters. In this paper, we propose using the nearest competing hypothesis instead of the traditional N-best hypotheses for MCE training. The aim is to keep the training data as close to the trainable region as possible, which increases the amount of "effective" training data. Furthermore, by progressively beating the nearest competitors, training appears to be more stable. We also design an approximation algorithm based on beam search to locate the nearest competing hypothesis efficiently. Comparing MCE training using 1-nearest against 1-best competing hypotheses on the Aurora database, we find that the new approach (using 1-nearest hypotheses) reduces the word error rate by 5.1% over the 1-best approach and by 17.8% over the official Aurora baseline.
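The effect of choosing the nearest rather than the 1-best competitor on the sigmoid loss can be shown directly. A minimal sketch, with the scores, the smoothness constant alpha, and the function name chosen for illustration:

```python
import numpy as np

def mce_loss(score_correct, competitor_scores, use_nearest=True, alpha=1.0):
    """Sigmoid MCE loss for one training token.

    The misclassification measure d is a competitor score minus the
    correct-hypothesis score. The *nearest* competitor gives the smallest
    gap, keeping d inside the steep, trainable region of the sigmoid;
    the 1-best competitor can leave badly misclassified tokens on the
    flat tail, where gradients vanish and the data is wasted.
    """
    gaps = np.asarray(competitor_scores) - score_correct
    d = gaps.min() if use_nearest else gaps.max()
    return 1.0 / (1.0 + np.exp(-alpha * d))

# Toy: the correct hypothesis scores -10; competitors score -9.5 and -2.
print(mce_loss(-10.0, [-9.5, -2.0], use_nearest=True))   # ~0.62, trainable
print(mce_loss(-10.0, [-9.5, -2.0], use_nearest=False))  # ~1.00, flat region
```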


International Conference on Acoustics, Speech, and Signal Processing | 2009

Incorporating monolingual corpora into bilingual latent semantic analysis for crosslingual LM adaptation

Yik-Cheung Tam; Tanja Schultz

The major limitation of bilingual latent semantic analysis (bLSA) is its requirement of parallel training corpora. Motivated by semi-supervised learning, we propose a cluster-based bLSA training approach that incorporates monolingual corpora. Treating each parallel document pair as the centroid of a parallel document cluster, each monolingual document is assigned to the closest centroid according to topic similarity. The resulting parallel document clusters are used as constraints to enforce a one-to-one topic correspondence in variational EM. A slight performance improvement in crosslingual language model adaptation is observed compared to the baseline without monolingual corpora.
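The cluster-assignment step can be sketched as follows; the abstract does not specify the similarity measure, so the cosine similarity here, like the function name and the toy sizes, is an assumption:

```python
import numpy as np

def assign_to_centroids(mono_topics, centroid_topics):
    """Attach monolingual documents to parallel-document centroids.

    mono_topics     : (num_mono_docs, num_topics) topic posteriors of
                      the monolingual documents
    centroid_topics : (num_pairs, num_topics) topic posteriors of the
                      parallel document pairs acting as centroids

    Each monolingual document joins the centroid with the highest
    similarity between topic distributions; the resulting clusters then
    constrain the one-to-one topic correspondence in variational EM.
    """
    def unit_rows(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = unit_rows(mono_topics) @ unit_rows(centroid_topics).T
    return sims.argmax(axis=1)

rng = np.random.default_rng(0)
mono = rng.dirichlet(np.ones(4), size=6)     # 6 monolingual docs, 4 topics
cents = rng.dirichlet(np.ones(4), size=3)    # 3 parallel-pair centroids
print(assign_to_centroids(mono, cents))      # cluster index per document
```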


Conference of the International Speech Communication Association | 2005

Dynamic Language Model Adaptation using Variational Bayes Inference

Yik-Cheung Tam; Tanja Schultz

Collaboration


Dive into Yik-Cheung Tam's collaboration.

Top Co-Authors

Brian Mak
Hong Kong University of Science and Technology

Roger Hsiao
Carnegie Mellon University

Qin Jin
Renmin University of China

Ian R. Lane
Carnegie Mellon University

Yun Lei
University of Texas at Dallas