Boon Pang Lim
Agency for Science, Technology and Research
Publication
Featured research published by Boon Pang Lim.
international conference on acoustics, speech, and signal processing | 2014
Nancy F. Chen; Sunil Sivadas; Boon Pang Lim; Hoang Gia Ngo; Haihua Xu; Van Tung Pham; Bin Ma; Haizhou Li
We propose strategies for a state-of-the-art Vietnamese keyword search (KWS) system developed at the Institute for Infocomm Research (I2R). The KWS system exploits acoustic features characterizing creaky voice quality peculiar to lexical tones in Vietnamese, a minimal-resource transliteration framework to alleviate out-of-vocabulary issues from foreign loan words, and a proposed system combination scheme, FusionX. We show that the proposed creaky voice quality features complement pitch-related features, reaching fusion gains of 17.7% relative (6.9% absolute). To the best of our knowledge, the proposed transliteration framework is the first reported rule-based system for Vietnamese; it outperforms statistical-approach baselines by 14.93-36.73% relative on foreign loan word search tasks. Using FusionX to combine 3 sub-systems, the actual term-weighted value (ATWV) reaches 0.4742, exceeding the ATWV=0.3 benchmark for IARPA Babel participants in the NIST OpenKWS13 Evaluation.
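For readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of how a term-weighted value can be computed from per-keyword counts, assuming the usual NIST formulation TWV = 1 - mean(P_miss + beta * P_FA) with beta = 999.9. ATWV is this value at the system's actual decision threshold. The `KeywordCounts` type and the function name are hypothetical, and the numbers in the usage note are made up for illustration; none of this is taken from the paper.

```python
from dataclasses import dataclass

BETA = 999.9  # weighting constant assumed from the NIST OpenKWS definition


@dataclass
class KeywordCounts:
    """Hypothetical container for one keyword's scoring counts."""
    n_true: int     # occurrences of the keyword in the reference
    n_correct: int  # correct detections (hits)
    n_false: int    # false alarms


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """Return 1 - mean over keywords of (P_miss + beta * P_fa)."""
    losses = []
    for c in counts:
        if c.n_true == 0:
            continue  # keywords with no reference occurrence are skipped
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated by seconds of speech minus n_true.
        p_fa = c.n_false / (speech_seconds - c.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no keyword occurs in the reference")
    return 1.0 - sum(losses) / len(losses)
```

As a usage illustration with invented counts, `term_weighted_value([KeywordCounts(10, 8, 1), KeywordCounts(5, 5, 0)], 36000)` scores one keyword with two misses and one false alarm and one keyword detected perfectly. A system that returns nothing scores 0, and a perfect system scores 1.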
international conference on acoustics, speech, and signal processing | 2015
Nancy F. Chen; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Van Tung Pham; Haihua Xu; Xiong Xiao; Tze Siong Lau; Su Jun Leow; Boon Pang Lim; Cheung-Chi Leung; Lei Wang; Chin-Hui Lee; Alvina Goh; Eng Siong Chng; Bin Ma; Haizhou Li
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) Keyword-aware language modeling; (3) Subword modeling of morphemes and homophones.
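The data selection idea in item (1) can be illustrated with a generic greedy maximization of a coverage function, which is one common submodular objective. This is a minimal sketch under stated assumptions, not the SINGA team's objective: each utterance is assumed to be already reduced to a sequence of integer Gaussian component indices, and the helper names `ngrams` and `greedy_select` are hypothetical.

```python
def ngrams(indices, n=2):
    """Return the set of n-grams over a sequence of component indices."""
    return {tuple(indices[i:i + n]) for i in range(len(indices) - n + 1)}


def greedy_select(utterances, budget, n=2):
    """Greedily pick up to `budget` utterances that cover the most new n-grams.

    utterances: dict mapping utterance id -> list of component indices.
    The objective (number of distinct n-grams covered) is monotone
    submodular, so greedy selection is a standard approximation.
    """
    features = {uid: ngrams(seq, n) for uid, seq in utterances.items()}
    covered, selected = set(), []
    while features and len(selected) < budget:
        # Marginal gain = n-grams this utterance adds beyond what is covered.
        best = max(features, key=lambda uid: len(features[uid] - covered))
        if not features[best] - covered:
            break  # nothing left adds acoustic diversity
        covered |= features.pop(best)
        selected.append(best)
    return selected
```

With toy input such as `{"u1": [1, 2, 3], "u2": [1, 2], "u3": [4, 5, 6, 7]}` and a budget of 2, the sketch first picks the utterance with the most distinct bigrams and then the one that adds the most bigrams not yet covered.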
international symposium on chinese spoken language processing | 2014
I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee
A novel spoken keyword search grammar representation framework is proposed to combine the advantages of conventional keyword-filler based keyword search (KWS) and the LVCSR-based KWS systems. The proposed grammar representation allows keyword search systems to be flexible on keyword target settings as in the LVCSR-based keyword search. In low-resource scenarios it also provides the system with the ability to achieve high keyword detection accuracies as in the keyword-filler based KWS systems and to attain a low false alarm rate inherent in the LVCSR-based KWS systems. In this paper the proposed grammar is realized in three ways by modifying the language models used in LVCSR-based KWS. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese tasks, experimental results indicate that the combined approaches achieve a significant ATWV improvement of more than 50% relative (from 0.2093 to 0.3287) on the limited-language-pack task, while a 20% relative ATWV improvement (from 0.4578 to 0.5486) is observed on the full-language-pack task.
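The relative improvements quoted in this abstract follow directly from the reported ATWV pairs. A short arithmetic check, where `relative_gain` is a hypothetical helper name:

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# ATWV pairs as reported in the abstract above.
limited_lp = relative_gain(0.2093, 0.3287)  # limited language pack
full_lp = relative_gain(0.4578, 0.5486)     # full language pack

print(f"limited language pack: {limited_lp:.1%}")  # about 57.0%
print(f"full language pack:    {full_lp:.1%}")     # about 19.8%
```

The first figure is consistent with "more than 50% relative" and the second with the rounded "20% relative" in the text.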
international conference on acoustics, speech, and signal processing | 2014
Van Tung Pham; Haihua Xu; Nancy F. Chen; Sunil Sivadas; Boon Pang Lim; Eng Siong Chng; Haizhou Li
Many keyword search (KWS) systems make “hit/false alarm (FA)” decisions based on the lattice-based posterior probability, which is incomparable across keywords. Therefore, score normalization is essential for a KWS system. In this paper, we investigate the integration of two novel features, ranking-score and relative-to-max, into a discriminative score normalization method. These features are extracted by considering all competing hypotheses of a putative detection. A metric-based normalization method is also applied as a post-processing step to further optimize the term-weighted value (TWV) evaluation metric. We report empirical improvements over standard baselines using the Vietnamese data from IARPA's Babel program in the NIST OpenKWS13 Evaluation setup.
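To make the motivation concrete, the sketch below shows sum-to-one normalization, a simple and widely used baseline in which each detection score is divided by the total score of all detections of the same keyword. It is swapped in here purely as an illustration of why per-keyword normalization makes scores comparable; it is not the discriminative method of this paper, nor its ranking-score or relative-to-max features. The function name and the input layout are assumptions.

```python
from collections import defaultdict


def sum_to_one(detections):
    """Normalize detection scores so that each keyword's scores sum to one.

    detections: list of (keyword, raw_posterior) pairs.
    Returns a list of (keyword, normalized_score) in the same order.
    """
    totals = defaultdict(float)
    for keyword, score in detections:
        totals[keyword] += score
    return [
        (keyword, score / totals[keyword] if totals[keyword] > 0 else 0.0)
        for keyword, score in detections
    ]
```

For instance, a keyword whose raw posteriors are 0.01 and 0.03 and another whose raw posteriors are 0.2 and 0.6 both end up with normalized scores of 0.25 and 0.75, so a single global threshold treats them alike.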
international conference on acoustics, speech, and signal processing | 2015
I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee
In this paper, we propose a method to realize the recently developed keyword-aware grammar for LVCSR-based keyword search using weighted finite-state automata (WFSA). The approach creates a compact and deterministic grammar WFSA by inserting keyword paths into an existing n-gram WFSA. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese and OpenKWS14 Tamil limited language pack tasks, the experimental results indicate that the proposed keyword-aware framework achieves significant improvement, with about 50% relative actual term-weighted value (ATWV) enhancement for both languages. Comparisons between the keyword-aware grammar and our previously proposed n-gram LM based approximation approach for the grammar also show that the KWS performances of these two realizations are complementary.
international conference on acoustics, speech, and signal processing | 2014
Rong Tong; Boon Pang Lim; Nancy F. Chen; Bin Ma; Haizhou Li
In computer-assisted language learning (CALL), speech data from non-native speakers are usually insufficient for acoustic modeling. Subspace Gaussian Mixture Models (SGMM) have been effective in training automatic speech recognition (ASR) systems with limited amounts of training data. Therefore, in this work, we propose to use SGMM to improve the fluency assessment performance. In particular, the contributions of this work are: (i) The proposed SGMM acoustic model trained with native data outperforms the MMI-GMM/HMM baseline by 25% relative, (ii) when incorporating a small amount of non-native training data, the SGMM acoustic model further improves the performance of fluency assessment by 47% relative.
international conference on human-computer interaction | 2015
Andreea I. Niculescu; Mei Quin Lim; Seno A. Wibowo; Kheng Hui Yeo; Boon Pang Lim; Michael Popow; Dan Chia; Rafael E. Banchs
A current problem modern cities are facing is increased traffic flow and heavily congested parking places. To reduce the time and traffic caused by searching for available parking, we propose IDA, an Intelligent Driver Assistant. The main objective of IDA is to help drivers find suitable parking places, to monitor car park availability online, and to redirect drivers when the number of free spots drops to a critical level. Unlike other parking applications, IDA uses speech to interact with the driver and becomes an active helper during the navigation process by dynamically adjusting the parking decisions based on the traffic situation. The paper presents the current work in progress, interaction design aspects, use cases, as well as initial user feedback received during a public event where IDA was showcased.
conference of the international speech communication association | 2016
Van Hai Do; Nancy F. Chen; Boon Pang Lim; Mark Hasegawa-Johnson
When speech data with native transcriptions are scarce in an under-resourced language, automatic speech recognition (ASR) must be trained using other methods. Semi-supervised learning first labels the speech using ASR from other languages, then re-trains the ASR using the generated labels. Mismatched crowdsourcing asks crowd-workers unfamiliar with the language to transcribe it. In this paper, self-training and mismatched crowdsourcing are compared under exactly matched conditions. Specifically, speech data of the target language are decoded by the source language ASR systems into source language phone/word sequences. We find that (1) human mismatched crowdsourcing and cross-lingual ASR have similar error patterns, but different specific errors. (2) These two sources of information can be usefully combined in order to train a better target-language ASR. (3) The differences between the error patterns of non-native human listeners and non-native ASR are small, but when differences are observed, they provide information about the relationship between the phoneme systems of the annotator/source language (Mandarin) and the target language (Vietnamese).
international conference on acoustics, speech, and signal processing | 2015
Rong Tong; Nancy F. Chen; Boon Pang Lim; Bin Ma; Haizhou Li
Tone error is commonly observed in tonal language acquisition. Correct tone production is especially challenging for native speakers of non-tonal languages. In this paper, we exploit the fundamental frequency variation (FFV) feature for Mandarin tone error detection. We propose to use FFV through two approaches: (1) Concatenating FFVs alongside standard speech recognition features; (2) Token FFV: Characterizing pitch variation with longer temporal context through GMM tokenization and n-gram language modeling. Our results show that tone error detection improves by incorporating FFV features and that the two approaches are complementary to each other.
international conference on asian language processing | 2016
Van Hai Do; Nancy F. Chen; Boon Pang Lim; Mark Hasegawa-Johnson
Mismatched crowdsourcing is a technique to derive speech transcriptions using crowd-workers unfamiliar with the language being spoken. This technique is especially useful for under-resourced languages since it is hard to hire native transcribers. In this paper, we demonstrate that using mismatched transcription for adaptation improves the performance of speech recognition under limited matched training data conditions. In addition, we show that data augmentation not only improves the performance of the monolingual system but also makes mismatched transcription adaptation more effective.