Han-Ping Shen
National Cheng Kung University
Publications
Featured research published by Han-Ping Shen.
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011
Han-Ping Shen; Chung-Hsien Wu; Yan-Ting Yang; Chun-Shan Hsu
With increasing demand for code-switching automatic speech recognition (ASR), the design and development of a code-switching speech database has become highly desirable. However, it is not easy to collect sufficient code-switched utterances to train models for code-switching ASR. This study presents the procedure and experience of designing and developing a Chinese-English COde-switching Speech database (CECOS). Two different methods for collecting Chinese-English code-switched utterances are employed in this work, and applications of the collected database are introduced. CECOS contains not only speech data with code-switching properties but also accented speech from non-native speakers. The database can be applied to several tasks, such as code-switching speech recognition, language identification, and named entity detection.
ACM Transactions on Asian Language Information Processing | 2011
Chung-Hsien Wu; Hung-Yu Su; Han-Ping Shen
This article presents a novel approach to speaker-adaptive recognition of speech from articulation-disordered speakers without a large amount of adaptation data. An unsupervised, incremental adaptation method is adopted for personalized model adaptation based on the recognized syllables with high recognition confidence from an automatic speech recognition (ASR) system. For articulation pattern discovery, the manually transcribed syllables and the corresponding recognized syllables are associated with each other using articulatory features. The Apriori algorithm is applied to discover the articulation patterns in the corpus, which are then used to construct a personalized pronunciation dictionary to improve the recognition accuracy of the ASR. The experimental results indicate that the proposed adaptation method achieves a syllable error rate reduction of 6.1%, outperforming the conventional adaptation methods that have a syllable error rate reduction of 3.8%. In addition, an average syllable error rate reduction of 5.04% is obtained for the ASR using the expanded pronunciation dictionary.
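The Apriori step described above can be sketched as follows. This is a minimal, generic frequent-itemset miner, not the paper's implementation; in this setting each "transaction" would pair a manually transcribed articulatory feature with the corresponding recognized one, and the support threshold is an assumed parameter:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find all itemsets whose support (fraction of
    transactions containing them) is at least min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    freq, k = {}, 1
    while current:
        # count support of each candidate k-itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        freq.update(survivors)
        # generate (k+1)-itemset candidates from surviving k-itemsets
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return freq
```

Frequent pairs of (transcribed, recognized) features found this way would then seed entries in the personalized pronunciation dictionary.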
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Han-Ping Shen; Jui-Feng Yeh; Chung-Hsien Wu
This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Since pitch features are highly speaker-dependent and beneficial for speaker identification, decision trees based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously, are constructed. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models to speaker-adapted phone cluster models according to the input speech segment. Finally, the agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
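The final agglomerative step might be sketched as below. This is an illustration under strong simplifying assumptions: each segment's adapted model is stood in for by a simple feature-mean vector, Euclidean distance replaces the paper's model-based distance, the merge is a count-weighted parameter average, and the stopping threshold is an assumed parameter:

```python
import numpy as np

def agglomerative_cluster(segments, threshold):
    """Bottom-up clustering: each speech segment starts as its own cluster;
    repeatedly merge the closest pair of cluster representatives until the
    smallest inter-cluster distance exceeds the threshold."""
    clusters = [[i] for i in range(len(segments))]
    means = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        # find the closest pair of cluster representatives
        best, bi, bj = None, -1, -1
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                d = np.linalg.norm(means[i] - means[j])
                if best is None or d < best:
                    best, bi, bj = d, i, j
        if best > threshold:
            break
        # merge: size-weighted combination of the two representatives
        ni, nj = len(clusters[bi]), len(clusters[bj])
        means[bi] = (ni * means[bi] + nj * means[bj]) / (ni + nj)
        clusters[bi] += clusters[bj]
        del clusters[bj], means[bj]
    return clusters
```

Segments grouped into the same cluster would be attributed to the same speaker.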
International Conference on Acoustics, Speech, and Signal Processing | 2012
Chung-Hsien Wu; Han-Ping Shen; Yan-Ting Yang
Bilingual speakers are known for their ability to code-switch, or mix their languages, during communication. This phenomenon occurs when bilinguals substitute a word or phrase from one language with one from another language. For code-switching speech recognition, it is essential to collect a large-scale code-switching speech database for model training. To ease the negative effect of data sparseness in training code-switching speech recognizers, this study proposes a data-driven approach to phone set construction that integrates acoustic features and cross-lingual, context-sensitive articulatory features into the distance measure between phone units. KL-divergence and a hierarchical phone unit clustering algorithm are used to cluster similar phone units and thereby reduce the amount of training data needed for model construction. The experimental results show that the proposed method outperforms traditional phone set construction methods.
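A rough sketch of KL-divergence-based hierarchical phone clustering follows, under the simplifying assumption that each phone unit is a single diagonal Gaussian over acoustic features (the paper's models and distance, which also incorporate articulatory features, are richer). The symmetric KL formula for Gaussians is standard; the greedy merge rule, threshold, and cluster naming are illustrative assumptions:

```python
import numpy as np

def gauss_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence between two diagonal Gaussians."""
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    kl_pq = 0.5 * np.sum(np.log(var_q / var_p)
                         + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    kl_qp = 0.5 * np.sum(np.log(var_p / var_q)
                         + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl_pq + kl_qp

def cluster_phones(models, threshold):
    """Greedy hierarchical clustering: repeatedly merge the two closest
    phone models until no pair is closer than the threshold.
    `models` maps phone name -> (mean, variance)."""
    models = dict(models)
    while len(models) > 1:
        pairs = [(gauss_kl(*models[a], *models[b]), a, b)
                 for i, a in enumerate(models) for b in list(models)[i + 1:]]
        d, a, b = min(pairs)
        if d > threshold:
            break
        (ma, va), (mb, vb) = models.pop(a), models.pop(b)
        # merged cluster: simple parameter average (illustrative only)
        models[a + "+" + b] = ((np.asarray(ma) + mb) / 2,
                               (np.asarray(va) + vb) / 2)
    return set(models)
```

The resulting clusters define a reduced, shared phone set across the two languages, so fewer code-switched utterances are needed per model.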
ACM Transactions on Asian and Low-Resource Language Information Processing | 2015
Han-Ping Shen; Chung-Hsien Wu; Pei-shan Tsai
Nowadays, bilingual and multilingual speech recognition is confronted with accent-related problems caused by non-native speech in a variety of real-world applications. Accent modeling of non-native speech is challenging because the acoustic properties of highly accented speech produced by non-native speakers are quite divergent. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to automatically extract the state-level, highly accented speech segments. Acoustic features and articulatory features are successively used for robust verification of the extracted segments. Second, Gaussian components of the highly accented speech models are generated from the corresponding Gaussian components of the native speech models using a linear transformation function. A decision tree is constructed to categorize the transformation functions and is used for transformation function retrieval to deal with the data sparseness problem. Third, a discrimination function is further applied to verify the generated accented acoustic models. Finally, the successfully verified accented English models are integrated into the native bilingual phone model set for Mandarin-English bilingual speech recognition. Experimental results show that the proposed approach effectively alleviates recognition performance degradation due to accents, obtaining absolute improvements in word accuracy of 4.1%, 1.8%, and 2.7% for bilingual speech recognition compared to traditional, MAP-adapted, and MLLR-adapted ASR methods, respectively.
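The decision-tree retrieval of a linear transformation might look like the following sketch. The tree layout, the `question` predicates, and the affine form (accented mean = A @ native mean + b, in the spirit of MLLR-style mean adaptation) are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def retrieve_transform(tree, context):
    """Walk a binary question tree until a leaf (A, b) is reached.
    Internal nodes are dicts with a `question` predicate on the
    phonetic context and `yes`/`no` children; leaves are (A, b)."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if node["question"](context) else node["no"]
    return node

def accent_mean(native_mean, tree, context):
    """Map a native Gaussian mean to its accented counterpart using the
    transform retrieved for this phonetic context."""
    A, b = retrieve_transform(tree, context)
    return A @ np.asarray(native_mean, float) + b
```

Sharing one transform per tree leaf is what lets contexts with little adaptation data borrow a transformation estimated from similar, better-covered contexts.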
Archive | 2009
Huan-Chung Li; Chung-Hsien Wu; Han-Ping Shen; Chun-Kai Wang; Chia-Hsin Hsieh
International Conference on Acoustics, Speech, and Signal Processing | 2014
Shun Kasahara; S. Kitahara; Nobuaki Minematsu; Han-Ping Shen; Takehiko Makino; Daisuke Saito; K. Hirose
Symposium on Languages, Applications and Technologies | 2013
Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu
IEEE Automatic Speech Recognition and Understanding Workshop | 2013
Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu
International Conference on Information Science and Technology | 2014
Shun Kasahara; Nobuaki Minematsu; Han-Ping Shen; Daisuke Saito; Keikichi Hirose