Publication


Featured research published by I-Fan Chen.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Low-resource keyword search strategies for Tamil

Nancy F. Chen; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Van Tung Pham; Haihua Xu; Xiong Xiao; Tze Siong Lau; Su Jun Leow; Boon Pang Lim; Cheung-Chi Leung; Lei Wang; Chin-Hui Lee; Alvina Goh; Eng Siong Chng; Bin Ma; Haizhou Li

We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) submodular optimization for data selection to maximize acoustic diversity through Gaussian-component-indexed N-grams; (2) keyword-aware language modeling; (3) subword modeling of morphemes and homophones.
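
As a rough illustration of point (1), the sketch below greedily selects utterances to maximize a concave coverage function over Gaussian-component-indexed N-grams. The sqrt coverage objective, the per-second gain normalization, and the data layout are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: greedy submodular data selection over Gaussian-component-indexed
# N-grams. Each candidate utterance is a (utt_id, duration_sec, Counter of GC
# N-gram tokens) triple; objective and cost normalization are illustrative.
import math
from collections import Counter

def coverage(counts):
    # Monotone submodular objective with diminishing returns per repeated N-gram.
    return sum(math.sqrt(c) for c in counts.values())

def greedy_select(utterances, budget_seconds):
    selected, pool, acc, used = [], list(utterances), Counter(), 0.0
    while pool:
        best, best_gain = None, 0.0
        for utt_id, dur, ngrams in pool:
            if used + dur > budget_seconds:
                continue
            gain = (coverage(acc + ngrams) - coverage(acc)) / dur  # marginal gain per second
            if gain > best_gain:
                best, best_gain = (utt_id, dur, ngrams), gain
        if best is None:
            break  # nothing fits the remaining budget (or adds diversity)
        selected.append(best[0])
        acc += best[2]
        used += best[1]
        pool.remove(best)
    return selected
```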


International Conference on Acoustics, Speech, and Signal Processing | 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili

Nancy F. Chen; Van Tung Pham; Haihua Xu; Xiong Xiao; Van Hai Do; Chongjia Ni; I-Fan Chen; Sunil Sivadas; Chin-Hui Lee; Eng Siong Chng; Bin Ma; Haizhou Li

We present exemplar-inspired low-resource spoken keyword search strategies for acoustic modeling, keyword verification, and system combination. This state-of-the-art system was developed by the SINGA team in the context of the 2015 NIST Open Keyword Search Evaluation (OpenKWS15) using conversational Swahili provided by the IARPA Babel program. In this work, we elaborate on the following: (1) exploiting exemplar training samples to construct a non-parametric acoustic model using kernel density estimation at test time; (2) rescoring hypothesized keyword detections by quantifying their acoustic similarity with exemplar training samples; (3) extending our previously proposed system combination approach to incorporate prosody features of exemplar keyword samples.
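
The sketch below illustrates the general idea behind points (1) and (2): scoring a test frame against exemplar frames with a Gaussian-kernel density estimate, and rescoring a detection by its average frame-level similarity to exemplar keyword frames. The kernel choice, fixed bandwidth, and array layouts are illustrative assumptions.

```python
# Hedged sketch: non-parametric acoustic scoring via kernel density estimation
# over exemplar frames, plus exemplar-based rescoring of a keyword detection.
import numpy as np

def kde_log_likelihood(frame, exemplars, bandwidth=1.0):
    """frame: (D,) test feature; exemplars: (N, D) exemplar frames of one class."""
    diffs = exemplars - frame                     # (N, D)
    sq_dist = np.sum(diffs * diffs, axis=1)       # squared Euclidean distances
    log_kernels = -0.5 * sq_dist / bandwidth**2
    m = np.max(log_kernels)
    # log-mean-exp over exemplars gives the (unnormalized) KDE log-density.
    return m + np.log(np.mean(np.exp(log_kernels - m)))

def exemplar_rescore(detection_frames, keyword_exemplars, bandwidth=1.0):
    """Average frame-level similarity of a hypothesized detection to exemplar keyword frames."""
    return float(np.mean([kde_log_likelihood(f, keyword_exemplars, bandwidth)
                          for f in detection_frames]))
```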


International Symposium on Chinese Spoken Language Processing | 2014

A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

A novel spoken keyword search grammar representation framework is proposed to combine the advantages of conventional keyword-filler based keyword search (KWS) and LVCSR-based KWS systems. The proposed grammar representation allows keyword search systems to remain flexible in their keyword target settings, as in LVCSR-based keyword search. In low-resource scenarios it also enables the system to achieve the high keyword detection accuracies of keyword-filler based KWS systems while attaining the low false alarm rates inherent in LVCSR-based KWS systems. In this paper the proposed grammar is realized in three ways by modifying the language models used in LVCSR-based KWS. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese tasks, experimental results indicate that the combined approaches achieve a significant relative ATWV improvement of more than 50% (from 0.2093 to 0.3287) on the limited-language-pack task, while a 20% relative ATWV improvement (from 0.4578 to 0.5486) is observed on the full-language-pack task.
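
Both results are reported as actual term-weighted value (ATWV). The sketch below follows the standard NIST STD/OpenKWS term-weighted value definition with beta = 999.9; the trial counting is simplified for illustration and is not taken from the paper itself.

```python
# Hedged sketch of term-weighted value (TWV) scoring, simplified from the
# standard NIST STD/OpenKWS definition.
def atwv(per_keyword_stats, speech_seconds, beta=999.9):
    """per_keyword_stats: list of (n_true, n_correct, n_false_alarm) per keyword."""
    twv_terms = []
    for n_true, n_corr, n_fa in per_keyword_stats:
        if n_true == 0:
            continue  # keywords with no reference occurrences are excluded
        p_miss = 1.0 - n_corr / n_true
        # Non-target trials are approximated by seconds of speech minus true occurrences.
        p_fa = n_fa / (speech_seconds - n_true)
        twv_terms.append(1.0 - (p_miss + beta * p_fa))
    return sum(twv_terms) / len(twv_terms)
```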


International Conference on Acoustics, Speech, and Signal Processing | 2015

A keyword-aware grammar framework for LVCSR-based spoken keyword search

I-Fan Chen; Chongjia Ni; Boon Pang Lim; Nancy F. Chen; Chin-Hui Lee

In this paper, we propose a method to realize the recently developed keyword-aware grammar for LVCSR-based keyword search using weighted finite-state automata (WFSA). The approach creates a compact and deterministic grammar WFSA by inserting keyword paths into an existing n-gram WFSA. Tested on the evalpart1 data of the IARPA Babel OpenKWS13 Vietnamese and OpenKWS14 Tamil limited-language-pack tasks, the experimental results indicate that the proposed keyword-aware framework achieves significant improvements, with about 50% relative actual term-weighted value (ATWV) enhancement for both languages. Comparisons between the keyword-aware grammar and our previously proposed n-gram LM based approximation of the grammar also show that the KWS performances of the two realizations are complementary.
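
A toy sketch of the core construction, inserting a dedicated keyword path into an existing n-gram grammar automaton, is given below. The plain-dict automaton representation, the backoff-state anchoring, and the keyword weight are illustrative assumptions rather than the paper's exact WFSA operations.

```python
# Hedged sketch: splicing a keyword path into a toy n-gram grammar automaton.
import math

def add_keyword_path(arcs, backoff_state, keyword_words, keyword_logprob, new_state_base):
    """arcs: dict state -> list of (word, cost, next_state); costs are -log probabilities."""
    prev = backoff_state
    per_word_cost = -keyword_logprob / len(keyword_words)  # spread the keyword weight over its words
    for i, word in enumerate(keyword_words):
        last = (i == len(keyword_words) - 1)
        nxt = backoff_state if last else new_state_base + i  # return to the grammar after the keyword
        arcs.setdefault(prev, []).append((word, per_word_cost, nxt))
        prev = nxt
    return arcs

# Usage: splice a two-word keyword into a toy unigram grammar anchored at state 0.
grammar = {0: [("xin", 2.3, 0), ("chao", 2.5, 0)]}
grammar = add_keyword_path(grammar, backoff_state=0,
                           keyword_words=["ha", "noi"],
                           keyword_logprob=math.log(1e-3), new_state_base=100)
```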


Spoken Language Technology Workshop | 2014

System- and keyword-dependent fusion for spoken term detection

Van Tung Pham; Nancy F. Chen; Sunil Sivadas; Haihua Xu; I-Fan Chen; Chongjia Ni; Eng Siong Chng; Haizhou Li

System combination (or data fusion) is known to provide significant improvements for spoken term detection (STD). The key issue in system combination is how to effectively fuse the scores of the participating systems. Currently, most system combination methods are system- and keyword-independent, i.e., they use the same arithmetic functions to combine scores for all keywords. Although such a strategy improves keyword search performance, the improvement is limited. In this paper we first propose an arithmetic-based system combination method that incorporates system and keyword characteristics into the fusion procedure to enhance the effectiveness of system combination. The method incorporates a system- and keyword-dependent property, the number of acceptances, into the combination procedure. We then introduce a discriminative model to combine various useful system and keyword characteristics into a general framework. Improvements over standard baselines are observed on the Vietnamese data from the IARPA Babel program with the NIST OpenKWS13 evaluation setup.
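
The sketch below shows one simple, keyword-dependent way to use the number of acceptances during fusion (a CombMNZ-style weighting, used here purely for illustration; the paper's arithmetic method and discriminative model differ).

```python
# Hedged sketch: fusing per-system detection scores with a weight that depends
# on how many systems accepted the detection.
def fuse(detection_scores):
    """detection_scores: dict system_name -> normalized score in [0, 1], or None if the
    system did not accept (detect) this keyword occurrence."""
    accepted = [s for s in detection_scores.values() if s is not None]
    if not accepted:
        return 0.0
    n_accept = len(accepted)  # the system- and keyword-dependent property used here
    return n_accept * sum(accepted) / len(detection_scores)

# Usage: a detection accepted by two of three systems.
print(fuse({"sysA": 0.8, "sysB": 0.6, "sysC": None}))
```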


International Conference on Acoustics, Speech, and Signal Processing | 2014

Attribute-based lattice rescoring in spontaneous speech recognition

I-Fan Chen; Sabato Marco Siniscalchi; Chin-Hui Lee

In this paper we extend attribute-based lattice rescoring to spontaneous speech recognition. The technique is based on two key components: (i) an attribute-based frontend, which consists of a bank of speech attribute detectors followed by an evidence merger that generates confidence scores (e.g., sub-word posterior probabilities), and (ii) a rescoring module that integrates the information generated by the frontend into an existing ASR engine through lattice rescoring. The speech attributes used in this work are phonetic features, such as frication and palatalization. Experimental results on the Switchboard part of the NIST 2000 Hub5 data set demonstrate that the proposed approach outperforms Gaussian mixture model/hidden Markov model (GMM/HMM) based LVCSR systems that do not use attribute-related information. Furthermore, a small yet promising improvement is also observed when rescoring word lattices generated by a state-of-the-art ASR system based on deep neural networks. Different frontend configurations are investigated and tested.
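
A minimal sketch of the rescoring step is shown below: each lattice arc's original acoustic and language model scores are interpolated with a frame-averaged attribute confidence for the arc's span. The arc representation and the interpolation weight are illustrative assumptions; the paper's frontend and evidence merger are not reproduced here.

```python
# Hedged sketch: interpolating original lattice arc scores with attribute evidence.
def rescore_arc(am_score, lm_score, attribute_logposts, lam=0.3):
    """attribute_logposts: per-frame log posteriors from the attribute frontend for this arc."""
    attr_score = sum(attribute_logposts) / max(len(attribute_logposts), 1)
    # Linear interpolation between the original ASR score and the attribute evidence.
    return (1.0 - lam) * (am_score + lm_score) + lam * attr_score

def rescore_lattice(arcs, lam=0.3):
    """arcs: list of dicts with 'am', 'lm' (log scores) and 'attr' (frame log posteriors)."""
    return [rescore_arc(a["am"], a["lm"], a["attr"], lam) for a in arcs]
```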


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2016

Towards a direct Bayesian adaptation framework for deep models

Zhen Huang; Sabato Marco Siniscalchi; I-Fan Chen; Chin-Hui Lee

We attempt to formulate Bayesian speaker adaptation for deep models and explore two different solutions. In the first, “indirect” approach, Bayesian adaptation is applied to context-dependent, Gaussian-mixture-model based hidden Markov models (CD-GMM-HMMs) with bottleneck (BN) features derived from deep neural networks (DNNs). The second method directly formulates Bayesian adaptation for CD-DNN-HMMs by casting the adaptation step into a generative framework to formulate maximum-likelihood (ML) and maximum a posteriori (MAP) adaptation schemes. Experiments on the Wall Street Journal task demonstrate that both MAP and structural MAP (SMAP) adaptation schemes are effective even with discriminative BN features. Furthermore, SMAP attains a meaningful word error reduction (WERR) of 7.3% even when 80 hours of data and 284 different speakers are available at training time. We also observed a notable performance improvement with the indirect approach, which supports the plausibility of the proposed solution in this novel direction.
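
For context, the sketch below shows the classical MAP update of a Gaussian mean that underlies the indirect CD-GMM-HMM adaptation described above: the adapted mean interpolates between the prior (speaker-independent) mean and the statistics of the adaptation data, controlled by a relevance factor tau. This is a textbook formulation, not the paper's direct CD-DNN-HMM derivation.

```python
# Hedged sketch: classical MAP adaptation of a Gaussian mean.
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """prior_mean: (D,); frames: (T, D) adaptation features; posteriors: (T,) occupation counts."""
    gamma_sum = np.sum(posteriors)
    weighted_sum = posteriors @ frames  # sum_t gamma_t * x_t, shape (D,)
    # With little data (small gamma_sum) the prior dominates; with more data the
    # estimate moves toward the sample statistics.
    return (tau * prior_mean + weighted_sum) / (tau + gamma_sum)
```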


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

An experimental study on structural-MAP approaches to implementing very large vocabulary speech recognition systems for real-world tasks

I-Fan Chen; Sabato Marco Siniscalchi; Seokyong Moon; Daejin Shin; Myoung-Wan Koo; Minhwa Chung; Chin-Hui Lee

In this paper we present an experimental study exploiting structural Bayesian adaptation to handle potential mismatches between training and test conditions for the real-world applications to be realized in our multilingual very large vocabulary speech recognition (VLVSR) system project sponsored by MOTIE (the Ministry of Trade, Industry and Energy), Republic of Korea. The goal of the project is to construct a nationwide VLVSR cloud service platform for mobile applications. Besides system architecture design issues, at such a large scale, performance robustness problems caused by mismatches in speakers, tasks, environments, domains, etc. need to be taken into account very carefully as well. We adopt adaptation techniques, in particular structural MAP, to reduce the system accuracy degradation caused by these mismatches. As part of an ongoing project, we describe how structural MAP approaches can be used to adapt both the acoustic and language models of our VLVSR systems, and provide convincing experimental results to demonstrate how adaptation can be utilized to bridge the performance gap between current state-of-the-art and deployable VLVSR systems.


Conference of the International Speech Communication Association | 2015

Rapid adaptation for deep neural networks through multi-task learning

Zhen Huang; Jinyu Li; Sabato Marco Siniscalchi; I-Fan Chen; Ji Wu; Chin-Hui Lee


Conference of the International Speech Communication Association | 2015

Maximum a posteriori adaptation of network parameters in deep models

Zhen Huang; Sabato Marco Siniscalchi; I-Fan Chen; Jinyu Li; Jiadong Wu; Chin-Hui Lee

Collaboration


Dive into I-Fan Chen's collaborations.

Top Co-Authors

Chin-Hui Lee (Georgia Institute of Technology)
Sabato Marco Siniscalchi (Georgia Institute of Technology)
Zhen Huang (Georgia Institute of Technology)
Eng Siong Chng (Nanyang Technological University)
Haihua Xu (Nanyang Technological University)
Haizhou Li (National University of Singapore)