
Publication


Featured research published by Chongjia Ni.


international conference on acoustics, speech, and signal processing | 2015

Low-resource keyword search strategies for tamil

Nancy F. Chen; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Van Tung Pham; Haihua Xu; Xiong Xiao; Tze Siong Lau; Su Jun Leow; Boon Pang Lim; Cheung-Chi Leung; Lei Wang; Chin-Hui Lee; Alvina Goh; Eng Siong Chng; Bin Ma; Haizhou Li

We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed n-grams; (2) keyword-aware language modeling; (3) subword modeling of morphemes and homophones.
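The submodular data selection in point (1) can be illustrated with a minimal coverage-style sketch. This is not the paper's exact objective; it assumes each utterance is already represented as a set of Gaussian-component-index n-gram tokens (names and data here are hypothetical), and greedily picks the utterance with the largest marginal coverage gain.

```python
# Sketch of coverage-style submodular data selection (assumption: each
# utterance is represented by its set of Gaussian-component-index n-grams).
def greedy_select(utterances, budget):
    """Greedily pick utterances that maximize marginal n-gram coverage."""
    covered = set()
    selected = []
    remaining = dict(utterances)  # utt_id -> set of n-gram tokens
    for _ in range(budget):
        # Marginal gain of an utterance = new n-grams it would add.
        best = max(remaining, key=lambda u: len(remaining[u] - covered))
        if not remaining[best] - covered:
            break  # no remaining utterance adds new coverage
        covered |= remaining[best]
        selected.append(best)
        del remaining[best]
    return selected, covered

utts = {
    "u1": {"g1_g2", "g2_g3"},
    "u2": {"g2_g3", "g3_g4", "g4_g5"},
    "u3": {"g1_g2"},
}
picked, cov = greedy_select(utts, budget=2)  # picks u2 first (gain 3)
```

Greedy maximization of a monotone submodular function is the standard approximation with a (1 - 1/e) guarantee, which is why this family of objectives is attractive for data selection.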


international conference on acoustics, speech, and signal processing | 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili

Nancy F. Chen; Van Tung Pham; Haihua Xu; Xiong Xiao; Van Hai Do; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Chin-Hui Lee; Eng Siong Chng; Bin Ma; Haizhou Li

We present exemplar-inspired low-resource spoken keyword search strategies for acoustic modeling, keyword verification, and system combination. This state-of-the-art system was developed by the SINGA team in the context of the 2015 NIST Open Keyword Search Evaluation (OpenKWS15) using conversational Swahili provided by the IARPA Babel program. In this work, we elaborate on the following: (1) exploiting exemplar training samples to construct a non-parametric acoustic model using kernel density estimation at test time; (2) rescoring hypothesized keyword detections by quantifying their acoustic similarity with exemplar training samples; (3) extending our previously proposed system combination approach to incorporate prosody features of exemplar keyword samples.
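The kernel-density idea in point (1) can be sketched in one dimension. This is a simplifying assumption for illustration only; the paper's exemplar model operates on real acoustic feature vectors, and the bandwidth and exemplar values here are hypothetical.

```python
import math

# Sketch of a kernel-density acoustic score (assumption: 1-D features and
# a Gaussian kernel; the actual system uses high-dimensional features).
def kde_log_likelihood(x, exemplars, bandwidth=1.0):
    """Parzen-window estimate of log p(x) from exemplar training frames."""
    n = len(exemplars)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = norm * sum(
        math.exp(-0.5 * ((x - e) / bandwidth) ** 2) for e in exemplars
    )
    return math.log(density)

# Score a test frame against exemplars of two classes and pick the class
# whose exemplars place higher density at the test point.
score_a = kde_log_likelihood(0.1, [0.0, 0.2, -0.1])
score_b = kde_log_likelihood(0.1, [3.0, 3.2, 2.9])
best_class = "a" if score_a > score_b else "b"
```

Because the density is built directly from stored exemplars at test time, no parametric model has to be retrained when new transcribed samples arrive.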


international symposium on chinese spoken language processing | 2014

A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

A novel spoken keyword search grammar representation framework is proposed to combine the advantages of conventional keyword-filler based keyword search (KWS) systems and LVCSR-based KWS systems. The proposed grammar representation allows keyword search systems to be flexible in keyword target settings, as in LVCSR-based keyword search. In low-resource scenarios it also gives the system the ability to achieve the high keyword detection accuracies of keyword-filler based KWS systems and the low false alarm rates inherent in LVCSR-based KWS systems. In this paper the proposed grammar is realized in three ways by modifying the language models used in LVCSR-based KWS. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese tasks, experimental results indicate that the combined approaches achieve a significant relative ATWV improvement of more than 50% (from 0.2093 to 0.3287) on the limited-language-pack task, while a 20% relative ATWV improvement (from 0.4578 to 0.5486) is observed on the full-language-pack task.
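The quoted relative gains follow directly from the reported absolute ATWV numbers; a quick check of the arithmetic:

```python
# Verifying the relative ATWV improvements quoted in the abstract.
def rel_improvement(before, after):
    """Relative change: (after - before) / before."""
    return (after - before) / before

llp = rel_improvement(0.2093, 0.3287)  # limited language pack: > 50%
flp = rel_improvement(0.4578, 0.5486)  # full language pack: about 20%
```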


international conference on acoustics, speech, and signal processing | 2015

A keyword-aware grammar framework for LVCSR-based spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

In this paper, we propose a method to realize the recently developed keyword-aware grammar for LVCSR-based keyword search using weighted finite-state automata (WFSA). The approach creates a compact and deterministic grammar WFSA by inserting keyword paths into an existing n-gram WFSA. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese and OpenKWS14 Tamil limited-language-pack tasks, the experimental results indicate that the proposed keyword-aware framework achieves significant improvement, with about 50% relative actual term weighted value (ATWV) enhancement for both languages. Comparisons between the keyword-aware grammar and our previously proposed n-gram LM based approximation of the grammar also show that the KWS performances of these two realizations are complementary.
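The keyword-path insertion step can be sketched with a toy arc list standing in for a real WFSA toolkit such as OpenFst. The arc encoding, weights, and the idea of looping keyword paths through state 0 are illustrative assumptions, not the paper's construction.

```python
# Sketch of the keyword-aware idea: add an explicit path spelling each
# keyword to an existing n-gram automaton so the decoder can favor it.
# Arcs are (src_state, dst_state, label, weight); weights are toy values.
def add_keyword_path(arcs, keyword_words, bonus=-2.0):
    """Append a linear state sequence for the keyword, entered from state 0
    and returning to state 0, with a weight bonus on each arc."""
    next_state = max(max(s, d) for s, d, _, _ in arcs) + 1
    src = 0
    for i, word in enumerate(keyword_words):
        dst = 0 if i == len(keyword_words) - 1 else next_state
        arcs.append((src, dst, word, bonus))
        src, next_state = dst, next_state + 1
    return arcs

fsa = [(0, 0, "the", 0.5), (0, 0, "cat", 1.2)]  # toy unigram automaton
fsa = add_keyword_path(fsa, ["hot", "spot"])
```

In a real system the inserted paths would carry properly normalized language-model weights and the result would be determinized and minimized with a WFST toolkit.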


international conference on acoustics, speech, and signal processing | 2015

Submodular data selection with acoustic and phonetic features for automatic speech recognition

Chongjia Ni; Lei Wang; Haibo Liu; Cheung-Chi Leung; Li Lu; Bin Ma

In this paper, we propose to use acoustic feature based submodular function optimization to select a subset of untranscribed data for manual transcription, and retrain the initial acoustic model with the additional transcribed data. The acoustic features are obtained from an unsupervised Gaussian mixture model. We also integrate the acoustic features with the phonetic features, which are obtained from an initial ASR system, in the submodular function. Submodular function optimization has been theoretically shown to provide a near-optimal guarantee in terms of the objective being optimized. We performed the experiments on 1000 hours of Mandarin mobile phone speech, in which 300 hours of initial data was used for training an initial acoustic model. The experimental results show that the acoustic feature based approach, which does not rely on an initial ASR system, performs as well as the phonetic feature based approach. Moreover, there is a complementary effect between acoustic feature based and phonetic feature based data selection. The submodular function with the combined features provides a relative 4.8% character error rate (CER) reduction over the corresponding ASR system using random selection. We also include the desired feature distribution obtained from a development set in a generalized function, but the improvement is insignificant.
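One simple way to combine the two feature views in a single submodular objective is a weighted sum of their marginal coverage gains; each term is monotone submodular, so the weighted sum is too. The weighting and the tiny feature sets below are illustrative assumptions, not the paper's exact function.

```python
# Sketch of combining acoustic and phonetic coverage in one submodular
# objective (alpha and the feature sets are illustrative assumptions).
def combined_gain(utt_acoustic, utt_phonetic, cov_acoustic, cov_phonetic,
                  alpha=0.5):
    """Weighted sum of marginal coverage gains from the two feature views."""
    gain_a = len(utt_acoustic - cov_acoustic)   # new acoustic tokens
    gain_p = len(utt_phonetic - cov_phonetic)   # new phonetic tokens
    return alpha * gain_a + (1.0 - alpha) * gain_p

# An utterance adding one new acoustic token and one new phonetic token:
g = combined_gain({"a1", "a2"}, {"p1"}, {"a1"}, set())
```

A greedy selector would evaluate this combined gain for every candidate utterance at each step and update both coverage sets after each pick.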


spoken language technology workshop | 2014

System and keyword dependent fusion for spoken term detection

Van Tung Pham; Nancy F. Chen; Sunil Sivadas; Haihua Xu; I-Fan Chen; Chongjia Ni; Eng Siong Chng; Haizhou Li

System combination (or data fusion) is known to provide significant improvement for spoken term detection (STD). The key issue in system combination is how to effectively fuse the various scores of the participant systems. Currently, most system combination methods are system and keyword independent, i.e. they use the same arithmetic functions to combine scores for all keywords. Although such strategies improve keyword search performance, the improvement is limited. In this paper we first propose an arithmetic-based system combination method that incorporates system and keyword characteristics into the fusion procedure to enhance the effectiveness of system combination. The method incorporates a system-and-keyword-dependent property, the number of acceptances in this paper, into the combination procedure. We then introduce a discriminative model to combine various useful system and keyword characteristics in a general framework. Improvements over standard baselines are observed on the Vietnamese data from the IARPA Babel program with the NIST OpenKWS13 Evaluation setup.
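A minimal sketch of keyword-dependent fusion, assuming per-system detection scores and per-keyword acceptance counts are given. The specific weighting (each system's share of acceptances) is an illustrative stand-in, not the paper's exact formula.

```python
# Sketch of system-and-keyword-dependent score fusion (the weighting
# scheme here is an illustrative assumption, not the paper's formula).
def fuse_scores(scores, acceptances):
    """scores[i]: detection score from system i for one keyword hit;
    acceptances[i]: acceptance count for this keyword in system i.
    Weight each system by its share of the total acceptances."""
    total = sum(acceptances)
    if total == 0:
        return sum(scores) / len(scores)  # fall back to plain averaging
    return sum(w * s for w, s in zip(acceptances, scores)) / total

# System 1 accepted this keyword 3 times, system 2 only once, so the
# fused score leans toward system 1's score.
fused = fuse_scores([0.9, 0.4], [3, 1])
```

The point of making the weights keyword dependent is that a system which fires reliably on one keyword may be unreliable on another, which a single global fusion function cannot capture.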


conference of the international speech communication association | 2016

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis.

Cheung-Chi Leung; Lei Wang; Haihua Xu; Jingyong Hou; Van Tung Pham; Hang Lv; Lei Xie; Xiong Xiao; Chongjia Ni; Bin Ma; Eng Siong Chng; Haizhou Li

This paper documents the significant components of a state-of-the-art language-independent query-by-example spoken term detection system designed for the Query by Example Search on Speech Task (QUESST) in MediaEval 2015. We developed exact and partial matching DTW systems, and WFST-based symbolic search systems to handle different types of search queries. To handle the noisy and reverberant speech in the task, we trained tokenizers using data augmented with different noise and reverberation conditions. Our post-evaluation analysis showed that the phone boundary labels provided by the improved tokenizers bring more accurate speech activity detection in DTW systems. We argue that acoustic condition mismatch is possibly a more important factor than language mismatch for obtaining consistent gain from stacked bottleneck features. Our post-evaluation system, involving a smaller number of component systems, can outperform our submitted systems, which performed the best for the task.
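The DTW matching at the core of such query-by-example systems can be sketched with the textbook recurrence. As a simplifying assumption, frames are 1-D numbers with absolute-difference local cost; real systems compare posterior-feature vectors per frame.

```python
# Minimal DTW sketch for query-by-example matching (assumption: 1-D frame
# features and absolute-difference local cost; real systems use vectors).
def dtw_distance(query, utterance):
    """Dynamic time warping cost between two feature sequences."""
    inf = float("inf")
    n, m = len(query), len(utterance)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - utterance[j - 1])
            # Best of: skip a query frame, skip an utterance frame, match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A query matches a time-stretched version of itself at zero cost.
dist = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Partial-matching variants relax the boundary conditions so the query may align to any subsegment of the utterance rather than the whole of it.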


international conference on acoustics, speech, and signal processing | 2015

Unsupervised data selection and word-morph mixed language model for tamil low-resource keyword search

Chongjia Ni; Cheung-Chi Leung; Lei Wang; Nancy F. Chen; Bin Ma

This paper considers an unsupervised data selection problem for the training data of an acoustic model and the vocabulary coverage of a keyword search system in low-resource settings. We propose to use Gaussian component index based n-grams as acoustic features in a submodular function for unsupervised data selection. The submodular function provides a near-optimal solution in terms of the objective being optimized. Moreover, to further resolve the high out-of-vocabulary (OOV) rate for morphologically-rich languages like Tamil, word-morph mixed language modeling is also considered. Our experiments are conducted on the Tamil speech provided by the IARPA Babel program for the 2014 NIST Open Keyword Search Evaluation (OpenKWS14). We show that the selection of data plays an important role in the word error rate of the speech recognition system and the actual term weighted value (ATWV) of the keyword search system. The 10 hours of speech selected from the full language pack (FLP) using the proposed algorithm provides a relative 23.2% and 20.7% ATWV improvement over two other data subsets, the 10-hour data from the limited language pack (LLP) defined by IARPA and the 10 hours of speech randomly selected from the FLP, respectively. The proposed algorithm also increases the vocabulary coverage, implicitly alleviating the OOV problem: the number of OOV search terms drops from 1,686 and 1,171 in the two baseline conditions to 972.
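The word-morph mixed-vocabulary idea can be sketched as keeping in-vocabulary words whole and decomposing OOV words into known morphs. The greedy longest-prefix splitter and the toy morph inventory below are purely illustrative; real systems learn morph inventories with tools such as Morfessor.

```python
# Sketch of the word-morph mixed-vocabulary idea: frequent words stay
# whole, OOV words back off to a morph decomposition (naive longest-prefix
# matching here, purely illustrative).
def to_word_morph_units(word, vocab, morphs):
    """Return [word] if in-vocabulary, else a greedy morph decomposition."""
    if word in vocab:
        return [word]
    units, rest = [], word
    while rest:
        for k in range(len(rest), 0, -1):  # try longest matching morph first
            if rest[:k] in morphs:
                units.append(rest[:k])
                rest = rest[k:]
                break
        else:
            units.append(rest[0])  # unknown character falls back to itself
            rest = rest[1:]
    return units

units = to_word_morph_units("unhappily", {"cat"}, {"un", "happi", "ly"})
```

Because morphs recombine productively, a morph-level vocabulary covers many surface forms a word-level vocabulary of the same size would miss, which is exactly what shrinks the OOV search-term count.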


international conference on acoustics, speech, and signal processing | 2016

Cross-lingual deep neural network based submodular unbiased data selection for low-resource keyword search

Chongjia Ni; Cheung-Chi Leung; Lei Wang; Haibo Liu; Feng Rao; Li Lu; Nancy F. Chen; Bin Ma; Haizhou Li

In this paper, we propose a cross-lingual deep neural network (DNN) based submodular unbiased data selection approach for low-resource keyword search (KWS). A small amount (e.g. one hour) of transcribed data is used to conduct cross-lingual transfer. The frame-level senone sequence activated by the cross-lingual DNN is used to represent each untranscribed speech utterance. The proposed submodular function considers utterance length normalization and the feature distribution matched to a development set. Experiments are conducted by selecting 9 hours of Tamil speech for the 2014 NIST Open Keyword Search Evaluation (OpenKWS14). The proposed data selection approach provides 35.8% relative actual term weighted value (ATWV) improvement over random selection on the OpenKWS14 Evalpart1 data set. Further analysis of the experimental results shows that both utterance length normalization and the feature distribution estimated from a development set, deployed in the submodular function, can suppress the preference to select long utterances. The selected utterances can cover a more diverse range of tri-phones, words, and acoustic variations from a wider set of utterances. Moreover, the wider coverage of words also benefits the acquired linguistic knowledge, which further contributes to improving KWS performance.
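The effect of utterance length normalization can be sketched by dividing the marginal coverage gain by duration. The per-second normalization below is an illustrative assumption standing in for the paper's formulation; the senone labels and durations are hypothetical.

```python
# Sketch of utterance-length normalization in the selection objective:
# dividing the marginal senone-coverage gain by duration discourages
# picking long utterances merely because they contain more tokens.
def normalized_gain(utt_senones, duration_sec, covered):
    """Marginal senone coverage gain per second of speech."""
    return len(utt_senones - covered) / duration_sec

# A long utterance with 4 new senones over 8 s scores lower than a short
# one with 3 new senones over 3 s, despite the smaller absolute gain.
long_utt = normalized_gain({"s1", "s2", "s3", "s4"}, 8.0, set())
short_utt = normalized_gain({"s1", "s2", "s3"}, 3.0, set())
```

Without such normalization a greedy selector tends to exhaust its time budget on a few long utterances, reducing the diversity of the selected set.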


international symposium on chinese spoken language processing | 2014

Investigation of using different Chinese word segmentation standards and algorithms for automatic speech recognition

Chongjia Ni; Cheung-Chi Leung

Chinese word segmentation (CWS) is a necessary step in Mandarin Chinese automatic speech recognition (ASR), and it has an impact on the results of ASR. However, there are few works on the relation between CWS and ASR. CWS settings, including segmentation standards and algorithms, are involved in building a segmenter. In this paper, four CWS standards and three CWS algorithms, namely maximum matching, term frequency based and conditional random field (CRF) based algorithms, are investigated in terms of ASR performance. Our experiments on the second Sighan Bakeoff data and Mandarin Chinese conversational telephone speech show that a better segmentation performance does not necessarily lead to a better ASR performance. Maximum matching and the term frequency based algorithm, which are classified as lexicon-based algorithms, are more flexible in updating their vocabulary inventories according to application needs. We find that these two algorithms can provide similar ASR performance to the CRF-based algorithm. Motivated by the availability of huge amounts of web text data, we investigate whether such data can improve the term frequency based algorithm and thus the ASR performance. Lastly, we find that combining the two lexicon-based algorithms through language model interpolation can further improve the ASR performance.
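Forward maximum matching, one of the lexicon-based algorithms studied above, can be sketched in a few lines; the toy lexicon is illustrative only.

```python
# Sketch of forward maximum matching: repeatedly take the longest
# lexicon entry starting at the current position (toy lexicon below).
def forward_max_match(text, lexicon, max_len=4):
    """Segment text greedily, longest lexicon match first."""
    words, i = [], 0
    while i < len(text):
        for k in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + k] in lexicon or k == 1:
                words.append(text[i:i + k])  # single chars always fall back
                i += k
                break
    return words

# Classic ambiguity: greedy matching yields 研究生/命/科学 even though
# 研究/生命/科学 is the intended reading.
segs = forward_max_match("研究生命科学", {"研究", "研究生", "生命", "科学", "命"})
```

The example shows why segmentation accuracy and downstream ASR accuracy can diverge: a greedy lexicon match is easy to update with new vocabulary but can commit to locally plausible, globally wrong segmentations.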

Collaboration

Top co-authors of Chongjia Ni:

Haizhou Li, National University of Singapore
Eng Siong Chng, Nanyang Technological University
Haihua Xu, Nanyang Technological University
Xiong Xiao, Nanyang Technological University