
Publication


Featured research published by Han-Ping Shen.


2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011

CECOS: A Chinese-English code-switching speech database

Han-Ping Shen; Chung-Hsien Wu; Yan-Ting Yang; Chun-Shan Hsu

With the increasing demand for code-switching automatic speech recognition (ASR), the design and development of code-switching speech databases has become highly desirable. However, it is not easy to collect sufficient code-switched utterances to train models for code-switching ASR. This study presents the procedure and experience of designing and developing a Chinese-English COde-switching Speech database (CECOS). Two different methods for collecting Chinese-English code-switched utterances are employed in this work, and applications of the collected database are also introduced. The CECOS database contains not only speech data with code-switching properties but also accents due to non-native speakers. It can be applied to several tasks, such as code-switching speech recognition, language identification, and named entity detection.


ACM Transactions on Asian Language Information Processing | 2011

Articulation-Disordered Speech Recognition Using Speaker-Adaptive Acoustic Models and Personalized Articulation Patterns

Chung-Hsien Wu; Hung-Yu Su; Han-Ping Shen

This article presents a novel approach to speaker-adaptive recognition of speech from articulation-disordered speakers without a large amount of adaptation data. An unsupervised, incremental adaptation method is adopted for personalized model adaptation based on the recognized syllables with high recognition confidence from an automatic speech recognition (ASR) system. For articulation pattern discovery, the manually transcribed syllables and the corresponding recognized syllables are associated with each other using articulatory features. The Apriori algorithm is applied to discover the articulation patterns in the corpus, which are then used to construct a personalized pronunciation dictionary to improve the recognition accuracy of the ASR. The experimental results indicate that the proposed adaptation method achieves a syllable error rate reduction of 6.1%, outperforming the conventional adaptation methods that have a syllable error rate reduction of 3.8%. In addition, an average syllable error rate reduction of 5.04% is obtained for the ASR using the expanded pronunciation dictionary.
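The Apriori step described in the abstract, mining frequent patterns from pairs of transcribed and recognized syllables, can be sketched as follows. This is a minimal illustration: the "transactions" (sets of hypothetical articulatory-feature mismatch labels) and the support threshold are invented for the example, not taken from the paper's corpus.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find itemsets appearing in >= min_support transactions."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent, k_sets = {}, items
    while k_sets:
        # Count support of each candidate itemset
        counts = {s: sum(1 for t in transactions if s <= t) for s in k_sets}
        survivors = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(survivors)
        # Join step: build (k+1)-itemsets from pairs of surviving k-itemsets
        keys = list(survivors)
        k_sets = {a | b for a, b in combinations(keys, 2)
                  if len(a | b) == len(a) + 1}
    return frequent

# Toy "transactions": articulatory-feature mismatches between a manually
# transcribed syllable and the syllable recognized by the ASR (labels are
# hypothetical, for illustration only).
transactions = [
    frozenset({"stop->fricative", "alveolar->velar"}),
    frozenset({"stop->fricative", "alveolar->velar", "voiced->unvoiced"}),
    frozenset({"stop->fricative"}),
]
patterns = apriori(transactions, min_support=2)
```

Patterns surviving the support threshold would then seed entries in a personalized pronunciation dictionary, as the paper describes.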


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Speaker Clustering Using Decision Tree-Based Phone Cluster Models With Multi-Space Probability Distributions

Han-Ping Shen; Jui-Feng Yeh; Chung-Hsien Wu

This paper presents an approach to speaker clustering using decision tree-based phone cluster models (DT-PCMs). In this approach, phone clustering is first applied to construct universal phone cluster models that accommodate acoustic characteristics from different speakers. Since pitch features are highly speaker-related and beneficial for speaker identification, decision trees based on multi-space probability distributions (MSDs), which can model both pitch and cepstral features for voiced and unvoiced speech simultaneously, are constructed. In speaker clustering based on DT-PCMs, the contextual, phonetic, and prosodic features of each input speech segment are used to select the speaker-related MSDs from the MSD decision trees to construct the initial phone cluster models. The maximum-likelihood linear regression (MLLR) method is then employed to adapt the initial models into speaker-adapted phone cluster models according to the input speech segment. Finally, the agglomerative clustering algorithm is applied to all speaker-adapted phone cluster models, each representing one input speech segment, for speaker clustering. In addition, an efficient estimation method for phone model merging is proposed for model parameter combination. Experimental results show that the MSD-based DT-PCMs outperform conventional GMM- and HMM-based approaches for speaker clustering on the RT09 tasks.
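The final agglomerative step, merging the closest speaker-adapted models until no pair is similar enough, can be sketched generically. The distance values below are hypothetical stand-ins for whatever model-to-model distance the system computes; average linkage and the stopping threshold are illustrative choices, not the paper's exact configuration.

```python
def agglomerative_cluster(dist, threshold):
    """Bottom-up clustering: repeatedly merge the closest pair of clusters
    (average linkage) until the smallest inter-cluster distance exceeds
    the stopping threshold. `dist` is a symmetric segment-distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):  # average pairwise distance between two clusters
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(linkage(a, b), x, y)
                 for x, a in enumerate(clusters)
                 for y, b in enumerate(clusters) if x < y]
        d, x, y = min(pairs)
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Toy distances between 4 speech segments (hypothetical values):
# segments 0 and 1 are close, segments 2 and 3 are close, groups are far apart.
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.15],
        [0.8, 0.9, 0.15, 0.0]]
clusters = agglomerative_cluster(dist, threshold=0.5)
```

With these toy distances the procedure stops with two clusters, {0, 1} and {2, 3}, i.e. two hypothesized speakers.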


International Conference on Acoustics, Speech, and Signal Processing | 2012

Phone set construction based on context-sensitive articulatory attributes for code-switching speech recognition

Chung-Hsien Wu; Han-Ping Shen; Yan-Ting Yang

Bilingual speakers are known for their ability to code-switch, or mix their languages, during communication. This phenomenon occurs when bilinguals substitute a word or phrase from one language with one from another language. For code-switching speech recognition, it is essential to collect a large-scale code-switching speech database for model training. To ease the negative effect of data sparseness in training code-switching speech recognizers, this study proposes a data-driven approach to phone set construction that integrates acoustic features and cross-lingual, context-sensitive articulatory features into a distance measure between phone units. KL-divergence and a hierarchical phone unit clustering algorithm are used to cluster similar phone units, reducing the amount of training data needed for model construction. The experimental results show that the proposed method outperforms traditional phone set construction methods.
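A KL-divergence-based distance between phone units can be illustrated with the closed-form KL divergence between Gaussians. This is a simplified sketch: it models each phone as a single diagonal-covariance Gaussian and symmetrizes the divergence, whereas the paper's measure also folds in articulatory features; all parameter values below are invented for the example.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """Closed-form KL divergence KL(N(m1,v1) || N(m2,v2)) for 1-D Gaussians."""
    return 0.5 * (v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + math.log(v2 / v1))

def phone_distance(p, q):
    """Symmetrized KL summed over per-dimension (mean, variance) pairs,
    i.e. diagonal-covariance Gaussian phone models."""
    return sum(kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)
               for (m1, v1), (m2, v2) in zip(p, q))

# Hypothetical single-Gaussian phone models over 2 acoustic dimensions:
phone_a = [(0.0, 1.0), (1.0, 0.5)]
phone_b = [(0.1, 1.0), (1.1, 0.5)]   # acoustically close to phone_a
phone_c = [(3.0, 1.0), (-2.0, 0.5)]  # far from both
```

Under such a distance, phone_a and phone_b would be merged into one shared unit long before either is merged with phone_c, which is the effect the hierarchical clustering exploits to shrink the bilingual phone set.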


ACM Transactions on Asian and Low-Resource Language Information Processing | 2015

Model Generation of Accented Speech using Model Transformation and Verification for Bilingual Speech Recognition

Han-Ping Shen; Chung-Hsien Wu; Pei-shan Tsai

Nowadays, bilingual and multilingual speech recognition is confronted with accent-related problems caused by non-native speech in a variety of real-world applications. Accent modeling of non-native speech is challenging because the acoustic properties of highly accented speech produced by non-native speakers are quite divergent. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to automatically extract the state-level, highly accented speech segments; acoustic features and articulatory features are successively used for robust verification of the extracted segments. Second, Gaussian components of the highly accented speech models are generated from the corresponding Gaussian components of the native speech models using a linear transformation function. A decision tree is constructed to categorize the transformation functions and is used for transformation function retrieval to deal with the data sparseness problem. Third, a discrimination function is further applied to verify the generated accented acoustic models. Finally, the successfully verified accented English models are integrated into the native bilingual phone model set for Mandarin-English bilingual speech recognition. Experimental results show that the proposed approach effectively alleviates the recognition performance degradation caused by accents, obtaining absolute word accuracy improvements of 4.1%, 1.8%, and 2.7% for bilingual speech recognition compared to traditional, MAP-adapted, and MLLR-adapted ASR methods, respectively.
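The core generation step, mapping a native-model Gaussian to an accented one through a linear transformation, reduces to an affine map on the mean vector. The sketch below is a minimal illustration: the matrix W and bias b are hypothetical stand-ins for a transformation function that would, per the paper, be estimated from accented data and retrieved via the decision tree.

```python
def transform_mean(mean, W, b):
    """Affine transform of a native-model Gaussian mean vector:
    accented_mean[i] = sum_j W[i][j] * mean[j] + b[i].
    W and b are hypothetical, standing in for a transformation function
    retrieved from the decision tree described in the paper."""
    return [sum(W[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(W))]

# Toy 3-dimensional native-model mean and a diagonal-scaling transform:
native_mean = [0.5, -1.2, 0.3]
W = [[0.9, 0.0, 0.0],
     [0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9]]
b = [0.2, 0.1, -0.05]
accented_mean = transform_mean(native_mean, W, b)
```

Applying one such transform per Gaussian component (and per transformation class from the decision tree) yields the accented model set that is then verified and merged into the bilingual phone inventory.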


Archive | 2009

Phonetic Variation Model Building Apparatus and Method and Phonetic Recognition System and Method Thereof

Huan-Chung Li; Chung-Hsien Wu; Han-Ping Shen; Chun-Kai Wang; Chia-Hsin Hsieh


International Conference on Acoustics, Speech, and Signal Processing | 2014

Improved and robust prediction of pronunciation distance for individual-basis clustering of World Englishes pronunciation

Shun Kasahara; S. Kitahara; Nobuaki Minematsu; Han-Ping Shen; Takehiko Makino; Daisuke Saito; Keikichi Hirose


Symposium on Languages, Applications and Technologies | 2013

Speaker-based Accented English Clustering Using a World English Archive

Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis

Han-Ping Shen; Nobuaki Minematsu; Takehiko Makino; Steven H. Weinberger; Teeraphon Pongkittiphan; Chung-Hsien Wu


International Conference on Information Science and Technology | 2014

Structure-based prediction of English pronunciation distances and its analytical investigation

Shun Kasahara; Nobuaki Minematsu; Han-Ping Shen; Daisuke Saito; Keikichi Hirose

Collaboration


Han-Ping Shen's most frequent co-authors.

Top Co-Authors

Chung-Hsien Wu, National Cheng Kung University

Chia-Hsin Hsieh, National Cheng Kung University

Pei-shan Tsai, National Cheng Kung University

Yan-Ting Yang, National Cheng Kung University