Han-Ping Shen
National Cheng Kung University
Publications
Featured research published by Han-Ping Shen.
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011
Han-Ping Shen; Chung-Hsien Wu; Yan-Ting Yang; Chun-Shan Hsu
With increasing demand for code-switching automatic speech recognition (ASR), the design and development of a code-switching speech database has become highly desirable. However, it is not easy to collect sufficient code-switched utterances to train models for code-switching ASR. This study presents the procedure and experience of designing and developing a Chinese-English COde-switching Speech database (CECOS). Two different methods for collecting Chinese-English code-switched utterances are employed in this work, and applications of the collected database are introduced. CECOS contains not only speech data with code-switching properties but also accented speech from non-native speakers. The database can be applied to several tasks, such as code-switching speech recognition, language identification, and named entity detection.
ACM Transactions on Asian Language Information Processing | 2011
Chung-Hsien Wu; Hung-Yu Su; Han-Ping Shen
This article presents a novel approach to speaker-adaptive recognition of speech from articulation-disordered speakers without a large amount of adaptation data. An unsupervised, incremental adaptation method is adopted for personalized model adaptation based on the recognized syllables with high recognition confidence from an automatic speech recognition (ASR) system. For articulation pattern discovery, the manually transcribed syllables and the corresponding recognized syllables are associated with each other using articulatory features. The Apriori algorithm is applied to discover the articulation patterns in the corpus, which are then used to construct a personalized pronunciation dictionary to improve the recognition accuracy of the ASR. The experimental results indicate that the proposed adaptation method achieves a syllable error rate reduction of 6.1%, outperforming the conventional adaptation methods that have a syllable error rate reduction of 3.8%. In addition, an average syllable error rate reduction of 5.04% is obtained for the ASR using the expanded pronunciation dictionary.
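The Apriori step described above can be sketched as follows. This is a minimal, generic frequent-itemset miner, not the paper's implementation; in this setting each "transaction" would pair a manually transcribed articulatory feature with the corresponding recognized one, and the support threshold is an assumed parameter:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find all itemsets whose support (fraction of
    transactions containing them) is at least min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    freq, k = {}, 1
    while current:
        # count support of each candidate k-itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        freq.update(survivors)
        # generate (k+1)-itemset candidates from surviving k-itemsets
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return freq
```

Frequent pairs of (transcribed, recognized) features found this way would then seed entries in the personalized pronunciation dictionary.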
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Han-Ping Shen; Jui-Feng Yeh; Chung-Hsien Wu
This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate the acoustic characteristics of different speakers. Since pitch features are highly speaker-dependent and beneficial for speaker identification, decision trees based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously, are constructed. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models to speaker-adapted phone cluster models according to the input speech segment. Finally, the agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
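The final agglomerative step might be sketched as below. This is an illustration under strong simplifying assumptions: each segment's adapted model is stood in for by a simple feature-mean vector, Euclidean distance replaces the paper's model-based distance, the merge is a count-weighted parameter average, and the stopping threshold is an assumed parameter:

```python
import numpy as np

def agglomerative_cluster(segments, threshold):
    """Bottom-up clustering: each speech segment starts as its own cluster;
    repeatedly merge the closest pair of cluster representatives until the
    smallest inter-cluster distance exceeds the threshold."""
    clusters = [[i] for i in range(len(segments))]
    means = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        # find the closest pair of cluster representatives
        best, bi, bj = None, -1, -1
        for i in range(len(means)):
            for j in range(i + 1, len(means)):
                d = np.linalg.norm(means[i] - means[j])
                if best is None or d < best:
                    best, bi, bj = d, i, j
        if best > threshold:
            break
        # merge: size-weighted combination of the two representatives
        ni, nj = len(clusters[bi]), len(clusters[bj])
        means[bi] = (ni * means[bi] + nj * means[bj]) / (ni + nj)
        clusters[bi] += clusters[bj]
        del clusters[bj], means[bj]
    return clusters
```

Segments grouped into the same cluster would be attributed to the same speaker.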
International Conference on Acoustics, Speech, and Signal Processing | 2012
Chung-Hsien Wu; Han-Ping Shen; Yan-Ting Yang
Bilingual speakers are known for their ability to code-switch, or mix their languages, during communication. This phenomenon occurs when bilinguals substitute a word or phrase from one language with one from another language. For code-switching speech recognition, it is essential to collect a large-scale code-switching speech database for model training. To ease the negative effect of data sparseness in training code-switching speech recognizers, this study proposes a data-driven approach to phone set construction that integrates acoustic features and cross-lingual, context-sensitive articulatory features into the distance measure between phone units. KL-divergence and a hierarchical phone unit clustering algorithm are used to cluster similar phone units and thereby reduce the amount of training data needed for model construction. The experimental results show that the proposed method outperforms traditional phone set construction methods.
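A rough sketch of KL-divergence-based hierarchical phone clustering follows, under the simplifying assumption that each phone unit is a single diagonal Gaussian over acoustic features (the paper's models and distance, which also incorporate articulatory features, are richer). The symmetric KL formula for Gaussians is standard; the greedy merge rule, threshold, and cluster naming are illustrative assumptions:

```python
import numpy as np

def gauss_kl(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence between two diagonal Gaussians."""
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    kl_pq = 0.5 * np.sum(np.log(var_q / var_p)
                         + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    kl_qp = 0.5 * np.sum(np.log(var_p / var_q)
                         + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl_pq + kl_qp

def cluster_phones(models, threshold):
    """Greedy hierarchical clustering: repeatedly merge the two closest
    phone models until no pair is closer than the threshold.
    `models` maps phone name -> (mean, variance)."""
    models = dict(models)
    while len(models) > 1:
        pairs = [(gauss_kl(*models[a], *models[b]), a, b)
                 for i, a in enumerate(models) for b in list(models)[i + 1:]]
        d, a, b = min(pairs)
        if d > threshold:
            break
        (ma, va), (mb, vb) = models.pop(a), models.pop(b)
        # merged cluster: simple parameter average (illustrative only)
        models[a + "+" + b] = ((np.asarray(ma) + mb) / 2,
                               (np.asarray(va) + vb) / 2)
    return set(models)
```

The resulting clusters define a reduced, shared phone set across the two languages, so fewer code-switched utterances are needed per model.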
ACM Transactions on Asian and Low-Resource Language Information Processing | 2015
Han-Ping Shen; Chung-Hsien Wu; Pei-shan Tsai
Nowadays, bilingual and multilingual speech recognition is confronted with accent-related problems caused by non-native speech in a variety of real-world applications. Accent modeling of non-native speech is challenging because the acoustic properties of highly accented speech produced by non-native speakers are quite divergent. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to automatically extract the state-level, highly accented speech segments. Acoustic features and articulatory features are successively used for robust verification of the extracted segments. Second, Gaussian components of the highly accented speech models are generated from the corresponding Gaussian components of the native speech models using a linear transformation function. A decision tree is constructed to categorize the transformation functions and is used for transformation function retrieval to deal with the data sparseness problem. Third, a discrimination function is further applied to verify the generated accented acoustic models. Finally, the successfully verified accented English models are integrated into the native bilingual phone model set for Mandarin-English bilingual speech recognition. Experimental results show that the proposed approach effectively alleviates recognition performance degradation due to accents, obtaining absolute improvements in word accuracy of 4.1%, 1.8%, and 2.7% for bilingual speech recognition compared to traditional, MAP-adapted, and MLLR-adapted ASR methods, respectively.
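The decision-tree retrieval of a linear transformation might look like the following sketch. The tree layout, the `question` predicates, and the affine form (accented mean = A @ native mean + b, in the spirit of MLLR-style mean adaptation) are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def retrieve_transform(tree, context):
    """Walk a binary question tree until a leaf (A, b) is reached.
    Internal nodes are dicts with a `question` predicate on the
    phonetic context and `yes`/`no` children; leaves are (A, b)."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if node["question"](context) else node["no"]
    return node

def accent_mean(native_mean, tree, context):
    """Map a native Gaussian mean to its accented counterpart using the
    transform retrieved for this phonetic context."""
    A, b = retrieve_transform(tree, context)
    return A @ np.asarray(native_mean, float) + b
```

Sharing one transform per tree leaf is what lets contexts with little adaptation data borrow a transformation estimated from similar, better-covered contexts.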
Archive | 2009
Huan-Chung Li; Chung-Hsien Wu; Han-Ping Shen; Chun-Kai Wang; Chia-Hsin Hsieh
International Conference on Acoustics, Speech, and Signal Processing | 2014
Shun Kasahara; S. Kitahara; Nobuaki Minematsu; Han-Ping Shen; Takehiko Makino; Daisuke Saito; K. Hirose
Symposium on Languages, Applications and Technologies | 2013
Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu
IEEE Automatic Speech Recognition and Understanding Workshop | 2013
Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu
International Conference on Information Science and Technology | 2014
Shun Kasahara; Nobuaki Minematsu; Han-Ping Shen; Daisuke Saito; Keikichi Hirose