Nancy F. Chen

Agency for Science, Technology and Research

Publication

Featured research published by Nancy F. Chen.

international conference on acoustics, speech, and signal processing | 2014

Strategies for Vietnamese keyword search

Nancy F. Chen; Sunil Sivadas; Boon Pang Lim; Hoang Gia Ngo; Haihua Xu; Van Tung Pham; Bin Ma; Haizhou Li

We propose strategies for a state-of-the-art Vietnamese keyword search (KWS) system developed at the Institute for Infocomm Research (I2R). The KWS system exploits acoustic features characterizing creaky voice quality peculiar to lexical tones in Vietnamese, a minimal-resource transliteration framework to alleviate out-of-vocabulary issues from foreign loan words, and a proposed system combination scheme FusionX. We show that the proposed creaky voice quality features complement pitch-related features, reaching fusion gains of 17.7% relative (6.9% absolute). To the best of our knowledge, the proposed transliteration framework is the first reported rule-based system for Vietnamese; it outperforms statistical-approach baselines by 14.93-36.73% relative on foreign loan word search tasks. Using FusionX to combine 3 sub-systems, the actual term-weighted value (ATWV) reaches 0.4742, exceeding the ATWV=0.3 benchmark for IARPA Babel participants in the NIST OpenKWS13 Evaluation.

international conference on acoustics, speech, and signal processing | 2015

Low-resource keyword search strategies for Tamil

Nancy F. Chen; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Van Tung Pham; Haihua Xu; Xiong Xiao; Tze Siong Lau; Su Jun Leow; Boon Pang Lim; Cheung-Chi Leung; Lei Wang; Chin-Hui Lee; Alvina Goh; Eng Siong Chng; Bin Ma; Haizhou Li

We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) Keyword-aware language modeling; (3) Subword modeling of morphemes and homophones.

international conference on acoustics, speech, and signal processing | 2016

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling

Wei Li; Sabato Marco Siniscalchi; Nancy F. Chen; Chin-Hui Lee

We propose the use of speech attributes, such as voicing and aspiration, to address two key research issues in computer assisted pronunciation training (CAPT) for L2 learners, namely detecting mispronunciation and providing diagnostic feedback. To improve performance, we focus on mispronunciations occurring at the segmental and sub-segmental levels. In this study, speech attribute scores are first used to measure the pronunciation quality at a sub-segmental level, such as manner and place of articulation. These speech attribute scores are integrated by neural network classifiers to generate segmental pronunciation scores. Compared with the conventional phone-based GOP (Goodness of Pronunciation) system we implement with our dataset, the proposed framework reduces the equal error rate by 8.78% relative. Moreover, it attains results comparable to a phone-based classifier approach to mispronunciation detection while providing comprehensive feedback, including segmental and sub-segmental diagnostic information, to help L2 learners.
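The equal error rate (EER) quoted in this abstract is the operating point at which the false-acceptance rate and the false-rejection rate coincide. The following is a minimal sketch of how such a figure can be approximated, not the authors' evaluation code: the function name, the choice to sweep only observed scores as thresholds, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores: detector scores for trials that should be accepted
    (here, hypothetically, mispronounced segments).
    nontarget_scores: scores for trials that should be rejected.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
print(f"EER on toy scores: {toy_eer:.2f}")
```

A relative reduction such as the one reported compares two EER values computed this way on the same trials, one per system.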

international conference on acoustics, speech, and signal processing | 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili

Nancy F. Chen; Van Tung Pham; Haihua Xu; Xiong Xiao; Van Hai Do; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Chin-Hui Lee; Eng Siong Chng; Bin Ma; Haizhou Li

We present exemplar-inspired low-resource spoken keyword search strategies for acoustic modeling, keyword verification, and system combination. This state-of-the-art system was developed by the SINGA team in the context of the 2015 NIST Open Keyword Search Evaluation (OpenKWS15) using conversational Swahili provided by the IARPA Babel program. In this work, we elaborate on the following: (1) exploiting exemplar training samples to construct a non-parametric acoustic model using kernel density estimation at test time; (2) rescoring hypothesized keyword detections through quantifying their acoustic similarity with exemplar training samples; (3) extending our previously proposed system combination approach to incorporate prosody features of exemplar keyword samples.
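To make the idea of a non-parametric acoustic model in item (1) more concrete, here is a minimal sketch of kernel density estimation over exemplar feature vectors. It is not the SINGA system: the isotropic Gaussian kernel, the fixed bandwidth, the function name, and the two-dimensional toy exemplars are all assumptions chosen for illustration.

```python
import math


def kde_log_likelihood(frame, exemplars, bandwidth=1.0):
    """Log density of `frame` under a Gaussian-kernel estimate.

    The density is the average of isotropic Gaussians centred on each
    exemplar; the kernel and bandwidth are illustrative assumptions.
    """
    dim = len(frame)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2)
    log_terms = []
    for exemplar in exemplars:
        sq_dist = sum((a - b) ** 2 for a, b in zip(frame, exemplar))
        log_terms.append(log_norm - sq_dist / (2 * bandwidth ** 2))
    # Log-sum-exp keeps the average numerically stable.
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_sum - math.log(len(exemplars))


# Hypothetical exemplar frames for two acoustic classes.
class_a = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]
class_b = [(3.0, 3.1), (2.9, 3.0), (3.2, 2.8)]
test_frame = (0.1, 0.0)
score_a = kde_log_likelihood(test_frame, class_a)
score_b = kde_log_likelihood(test_frame, class_b)
print("class A" if score_a > score_b else "class B")
```

Because the density is built directly from stored exemplars, no parametric model is trained beforehand; the cost is paid at test time, when each frame is compared against the exemplars.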

international symposium on chinese spoken language processing | 2014

A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

A novel spoken keyword search grammar representation framework is proposed to combine the advantages of conventional keyword-filler based keyword search (KWS) and the LVCSR-based KWS systems. The proposed grammar representation allows keyword search systems to be flexible on keyword target settings as in the LVCSR-based keyword search. In low-resource scenarios it also provides the system with the ability to achieve high keyword detection accuracies as in the keyword-filler based KWS systems and to attain a low false alarm rate inherent in the LVCSR-based KWS systems. In this paper the proposed grammar is realized in three ways by modifying the language models used in LVCSR-based KWS. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese tasks, experimental results indicate that the combined approaches achieve a significant ATWV improvement of more than 50% relative (from 0.2093 to 0.3287) on the limited-language-pack task, while a 20% relative ATWV improvement (from 0.4578 to 0.5486) is observed on the full-language-pack task.
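The relative improvements quoted in this abstract can be checked from the ATWV pairs it reports. The short sketch below does that arithmetic; the helper name is hypothetical, and the only inputs are the four ATWV values given in the abstract.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# ATWV pairs as reported in the abstract.
limited = relative_gain(0.2093, 0.3287)
full = relative_gain(0.4578, 0.5486)
print(f"limited-language-pack: {limited:.1f}% relative")
print(f"full-language-pack: {full:.1f}% relative")
```

The results come to about 57% and just under 20%, consistent with the abstract's "more than 50%" and "20%" figures.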

international conference on acoustics, speech, and signal processing | 2014

Discriminative score normalization for keyword search decision

Van Tung Pham; Haihua Xu; Nancy F. Chen; Sunil Sivadas; Boon Pang Lim; Eng Siong Chng; Haizhou Li

Many keyword search (KWS) systems make “hit/false alarm (FA)” decisions based on the lattice-based posterior probability, which is incomparable across keywords. Therefore, score normalization is essential for a KWS system. In this paper, we investigate the integration of two novel features, ranking-score and relative-to-max, into a discriminative score normalization method. These features are extracted by considering all competing hypotheses of a putative detection. A metric-based normalization method is also applied as a post-processing step to further optimize the term-weighted value (TWV) evaluation metric. We report empirical improvements over standard baselines using the Vietnamese data from IARPA's Babel program in the NIST OpenKWS13 Evaluation setup.
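For context, the TWV metric named in this abstract follows the NIST spoken term detection definition: one minus the keyword-averaged sum of the miss probability and a weighted false-alarm probability, with weight 999.9 and one non-target trial per second of speech. Every keyword counts equally in that average regardless of how often it occurs, which is one reason detection scores need to be comparable across keywords. The sketch below is an illustration of that definition, not the paper's scoring tool; the class and function names are hypothetical and the counts are invented.

```python
from dataclasses import dataclass

BETA = 999.9  # false-alarm weight in the NIST definition


@dataclass
class KeywordCounts:
    n_true: int      # reference occurrences of the keyword
    n_correct: int   # detections matching a reference occurrence
    n_spurious: int  # detections matching nothing (false alarms)


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """TWV = 1 - mean over keywords of (P_miss + beta * P_FA)."""
    # Keywords with no reference occurrence are excluded from the average.
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword has a reference occurrence")
    total = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # One non-target trial per second of speech, minus true occurrences.
        p_fa = c.n_spurious / (speech_seconds - c.n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)


# Invented counts for two keywords over ten hours of speech.
toy_counts = [KeywordCounts(10, 8, 1), KeywordCounts(2, 1, 0)]
toy_twv = term_weighted_value(toy_counts, speech_seconds=36000.0)
print(f"TWV on toy counts: {toy_twv:.3f}")
```

ATWV is this quantity evaluated at the system's own decision threshold, so a rare keyword with one miss lowers the score as much as many misses on a frequent keyword.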

international conference on acoustics, speech, and signal processing | 2015

A keyword-aware grammar framework for LVCSR-based spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

In this paper, we propose a method to realize the recently developed keyword-aware grammar for LVCSR-based keyword search using weighted finite-state automata (WFSA). The approach creates a compact and deterministic grammar WFSA by inserting keyword paths into an existing n-gram WFSA. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese and OpenKWS14 Tamil limited language pack tasks, the experimental results indicate the proposed keyword-aware framework achieves significant improvement, with about 50% relative actual term weighted value (ATWV) enhancement for both languages. Comparisons between the keyword-aware grammar and our previously proposed n-gram LM based approximation approach for the grammar also show that the KWS performances of these two realizations are complementary.

spoken language technology workshop | 2014

System and keyword dependent fusion for spoken term detection

Van Tung Pham; Nancy F. Chen; Sunil Sivadas; Haihua Xu; I-Fan Chen; Chongjia Ni; Eng Siong Chng; Haizhou Li

System combination (or data fusion) is known to provide significant improvement for spoken term detection (STD). The key issue in system combination is how to effectively fuse the various scores of the participating systems. Currently, most system combination methods are system and keyword independent, i.e. they use the same arithmetic functions to combine scores for all keywords. Although such strategies improve keyword search performance, the improvement is limited. In this paper we first propose an arithmetic-based system combination method to incorporate the system and keyword characteristics into the fusion procedure to enhance the effectiveness of system combination. The method incorporates a system-keyword dependent property, which is the number of acceptances in this paper, into the combination procedure. We then introduce a discriminative model to combine various useful system and keyword characteristics into a general framework. Improvements over standard baselines are observed on the Vietnamese data from the IARPA Babel program with the NIST OpenKWS13 Evaluation setup.

Speech Communication | 2016

Large-scale characterization of non-native Mandarin Chinese spoken by speakers of European origin

Nancy F. Chen; Darren Wee; Rong Tong; Bin Ma; Haizhou Li

In this work, we analyze phonetic and prosodic pronunciation patterns from iCALL, a speech corpus designed to evaluate Mandarin mispronunciations by non-native speakers of European origin and to address the lack of large-scale, non-native corpora with comprehensive annotations for applications in CAPT (computer-assisted pronunciation training). iCALL consists of 90,841 utterances from 305 speakers with a total duration of 142 hours. The speakers are from diverse linguistic backgrounds (spanning Germanic, Romance, and Slavic native languages). The read utterances are phonetically balanced with phonetic, tonal, and fluency annotations. Our findings on iCALL reveal that lexical tone errors are over six times more prevalent than phonetic errors, French speakers are twice as likely to mispronounce Tones 2, 3, and 4 when compared to English speakers, native Romance language speakers are more likely to make de-aspiration and aspiration mistakes, and fluency scores correlate inversely with tone and phone error rates.

Procedia Computer Science | 2016

Mismatched Crowdsourcing based Language Perception for Under-resourced Languages

Wenda Chen; Mark Hasegawa-Johnson; Nancy F. Chen

Mismatched crowdsourcing is a technique for acquiring automatic speech recognizer training data in under-resourced languages by decoding the transcriptions of workers who don’t know the target language using a noisy-channel model of cross-language speech perception. All previous mismatched crowdsourcing studies have used English transcribers; this study is the first to recruit transcribers with a different native language, in this case, Mandarin Chinese. Using these data we are able to compute statistical models of cross-language perception of the tones and phonemes from transcribers based on phone distinctive features and tone features. By analyzing the phonetic and tonal variation mappings and coverages compared with the dictionary of the target language, we evaluate the effect of different native languages on the transcribers’ performance.