Yu-Hsien Chiu
National Cheng Kung University
Publication
Featured research published by Yu-Hsien Chiu.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007
Yu-Hsien Chiu; Chung-Hsien Wu; Hung-Yu Su; Chih-Jen Cheng
This work proposes a novel approach to translating Chinese into Taiwanese Sign Language (TSL) and synthesizing sign videos. An aligned bilingual corpus of Chinese and TSL with linguistic and signing information is also presented for sign language translation. A two-pass alignment at the syntax and phrase levels is developed to obtain the optimal alignment between Chinese sentences and Taiwanese sign sequences. For sign video synthesis, a scoring function is presented to develop motion transition-balanced sign videos with rich combinations of intersign transitions. Finally, the maximum a posteriori (MAP) algorithm is employed for sign video synthesis based on joint optimization of two-pass word alignment and intersign epenthesis generation. Several experiments were conducted in an educational environment to evaluate comprehension of the synthesized sign expressions. The proposed approach outperforms IBM Model 2 in sign language translation. Moreover, deaf students rated the sign videos generated by the proposed method as satisfactory.
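The transition-balanced idea above can be illustrated with a toy dynamic-programming (Viterbi-style) search: one candidate clip is chosen per sign so that the total inter-sign transition cost is minimal. The clip identifiers and the cost function here are invented for illustration; the paper's actual scoring function is richer.

```python
# Hypothetical sketch: pick one video clip per sign so that inter-sign
# transitions are as smooth as possible. Clip ids and the transition_cost
# function are illustrative placeholders, not taken from the paper.

def select_clips(candidates, transition_cost):
    """candidates: list of lists of clip ids, one inner list per sign.
    Returns the clip sequence with minimal total transition cost
    (transition costs only; no per-clip unary score in this toy version)."""
    # best holds (accumulated_cost, path) pairs, one per candidate of the
    # current sign position.
    best = [(0.0, [c]) for c in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for c in candidates[i]:
            cost, path = min(
                (b_cost + transition_cost(b_path[-1], c), b_path)
                for b_cost, b_path in best
            )
            new_best.append((cost, path + [c]))
        best = new_best
    return min(best)[1]
```

With numeric clip ids and an absolute-difference cost, `select_clips([[1, 5], [2, 6], [3]], lambda a, b: abs(a - b))` picks the smoothest chain `[1, 2, 3]`.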
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Chung-Hsien Wu; Yu-Hsien Chiu; Chi-Jiun Shia; Chun-Yu Lin
This paper proposes an approach to segmenting and identifying mixed-language speech. A delta Bayesian information criterion (delta-BIC) is first applied to segment the input speech utterance into a sequence of language-dependent segments using acoustic features. A VQ-based bigram model is used to characterize the acoustic-phonetic dynamics of two consecutive codewords in a language; accordingly, the language-specific acoustic-phonetic properties of phone sequences are integrated into the identification process. A Gaussian mixture model (GMM) is used to model codeword occurrence vectors, orthonormally transformed using latent semantic analysis (LSA), for each language-dependent segment. A filtering method is used to smooth the hypothesized language sequence and thus eliminate noise-like components of the detected language sequence generated by maximum likelihood estimation. Finally, a dynamic programming method is used to determine the language boundaries globally. Experimental results show that for Mandarin, English, and Taiwanese, a recall rate of 0.87 for language boundary segmentation was obtained. Based on this recall rate, the proposed approach achieved language identification accuracies of 92.1% and 74.9% for single-language and mixed-language speech, respectively.
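The delta-BIC boundary criterion can be sketched on a one-dimensional feature stream: a split point is accepted when modeling the two halves with separate Gaussians beats one Gaussian by more than a complexity penalty. This is a minimal single-Gaussian toy, whereas the paper operates on real multidimensional acoustic features.

```python
import math

# Minimal sketch of delta-BIC change-point detection on a 1-D feature
# stream, assuming one Gaussian per segment. The penalty weight lam and
# the variance floor are illustrative choices.

def log_lik(xs):
    """Maximized log-likelihood of xs under a single Gaussian
    (a small variance floor avoids log(0) on constant data)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n + 1e-6
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def delta_bic(xs, t, lam=1.0):
    """Delta-BIC for splitting xs at index t; positive favours a boundary."""
    gain = log_lik(xs[:t]) + log_lik(xs[t:]) - log_lik(xs)
    penalty = 0.5 * lam * 2 * math.log(len(xs))  # 2 extra params: mean, var
    return gain - penalty

def best_boundary(xs):
    """Return the best split index, or None if no split is supported."""
    t = max(range(2, len(xs) - 2), key=lambda t: delta_bic(xs, t))
    return t if delta_bic(xs, t) > 0 else None
```

On a stream that jumps from one level to another, the detector recovers the change point, e.g. `best_boundary([0.0] * 20 + [5.0] * 20)` returns `20`.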
Speech Communication | 2002
Yeou-Jiunn Chen; Chung-Hsien Wu; Yu-Hsien Chiu; Hsiang-Chuan Liao
A phonetic representation of a language is used to describe the pronunciation of, and synthesize the acoustic model for, any vocabulary. Phonetic representations with small phonetic units, such as SAMPA-C for Mandarin Chinese, together with decision trees for parameter sharing, are broadly applied to deal with the problem of large numbers of recognition units. However, confusable phonetic units in SAMPA-C generally degrade recognition performance. In this paper, a statistical method based on chi-square testing is used to identify confusable phonetic units and develop a more reliable phonetic set, named modified SAMPA-C. A corresponding question set for the modified SAMPA-C and a two-level splitting criterion are also proposed to construct the decision trees effectively and efficiently. Experiments on continuous Mandarin telephone speech recognition were conducted. Experimental results show that an encouraging improvement in recognition performance can be obtained. The proposed approaches represent a good compromise between the demands of accurate acoustic modeling and the limitations imposed by insufficient training data.
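One plausible reading of the chi-square test above can be sketched with a 2x2 contingency table: rows are the true phonetic unit (A or B), columns are the recognized unit. When the statistic is small, the two units' realizations are statistically indistinguishable, making them candidates for merging. The counts and the 3.841 threshold (chi-square, 1 d.o.f., p = 0.05) are illustrative; the paper's exact test design may differ.

```python
# Hedged sketch: chi-square statistic over a 2x2 confusion table.
#   a = count(true A, recognized A)   b = count(true A, recognized B)
#   c = count(true B, recognized A)   d = count(true B, recognized B)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def confusable(a, b, c, d, threshold=3.841):
    """Units are treated as confusable (merge candidates) when the test
    cannot reject independence at the 5% level (1 degree of freedom)."""
    return chi_square_2x2(a, b, c, d) < threshold
```

A uniformly confused table like `(10, 10, 10, 10)` yields a statistic of 0 (confusable), while a clean diagonal table like `(30, 5, 5, 30)` yields a large statistic (distinct units).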
international conference on acoustics, speech, and signal processing | 2004
Chi-Jiun Shia; Yu-Hsien Chiu; Jia-Hsin Hsieh; Chung-Hsien Wu
The paper proposes a maximum a posteriori (MAP) based approach to jointly segmenting and identifying an utterance with mixed languages. A statistical framework for language boundary detection and language identification is proposed. First, MAP estimation is used to determine the boundary number and positions. Then, an LSA-based GMM and a VQ-based bigram language model are proposed to characterize each language and are used for language identification. Finally, a likelihood ratio test is used to determine the optimal number of language boundaries. Experimental results show that the proposed approach exhibits encouraging potential in mixed-language segmentation and identification.
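The likelihood-ratio stopping rule for the boundary count can be sketched as follows: keep adding boundaries while the log-likelihood gain of one more boundary exceeds a threshold. The log-likelihood values and threshold below are placeholders, not the paper's figures.

```python
# Illustrative sketch of a likelihood-ratio stopping rule for choosing the
# number of language boundaries. ll_by_k maps a boundary count k to the
# best model log-likelihood achieved with k boundaries (assumed given).

def choose_boundary_count(ll_by_k, threshold=5.0):
    """Return the smallest k whose successor's likelihood-ratio statistic
    2 * (ll(k+1) - ll(k)) falls below the threshold."""
    k = 0
    while (k + 1) in ll_by_k and 2 * (ll_by_k[k + 1] - ll_by_k[k]) > threshold:
        k += 1
    return k
```

For example, with `{0: -100.0, 1: -80.0, 2: -78.0}` the first boundary gives a statistic of 40 (accepted) and the second only 4 (rejected), so one boundary is chosen.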
ACM Transactions on Asian Language Information Processing | 2007
Chung-Hsien Wu; Hung-Yu Su; Yu-Hsien Chiu; Chia-Hung Lin
This article presents a transfer-based statistical model for Chinese to Taiwanese Sign Language (TSL) translation. Two sets of probabilistic context-free grammars (PCFGs) are derived from a Chinese Treebank and a bilingual parallel corpus. In this approach, a three-stage translation model is proposed. First, the input Chinese sentence is parsed into possible phrase structure trees (PSTs) based on the Chinese PCFGs. Second, the Chinese PSTs are transferred into TSL PSTs according to the transfer probabilities between the context-free grammar (CFG) rules of Chinese and TSL, derived from the bilingual parallel corpus. Finally, the TSL PSTs are used to generate the possible translation results. The Viterbi algorithm is adopted to obtain the best translation result via the three-stage translation. For evaluation, three objective metrics (AER, Top-N, and BLEU) and one subjective metric (MOS) were used. Experimental results show that the proposed approach outperforms IBM Model 3 in the task of Chinese to sign-language translation.
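The rule-transfer step can be illustrated in miniature: each Chinese CFG rule maps to candidate TSL rules with transfer probabilities estimated from the aligned corpus, and the sketch below greedily maximizes the joint transfer probability rule by rule (the full system instead runs a Viterbi search over whole phrase structure trees). All rules and probabilities here are invented for illustration.

```python
import math

# Hypothetical transfer table: Chinese CFG rule -> {TSL rule: probability}.
transfer = {
    "S -> NP VP": {"S -> NP VP": 0.7, "S -> VP NP": 0.3},
    "VP -> V NP": {"VP -> NP V": 0.8, "VP -> V NP": 0.2},
}

def best_transfer(chinese_rules):
    """Per-rule maximization of the joint transfer probability.
    Returns (list of TSL rules, total log-probability)."""
    log_p, tsl_rules = 0.0, []
    for rule in chinese_rules:
        tsl, p = max(transfer[rule].items(), key=lambda kv: kv[1])
        tsl_rules.append(tsl)
        log_p += math.log(p)
    return tsl_rules, log_p
```

For the toy input `["S -> NP VP", "VP -> V NP"]`, the chosen TSL rules reorder the verb phrase (`VP -> NP V`), with joint probability 0.7 x 0.8 = 0.56.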
IEEE Transactions on Neural Systems and Rehabilitation Engineering | 2004
Chung-Hsien Wu; Yu-Hsien Chiu; Chi-Shiang Guo
This paper proposes a novel approach to the generation of Chinese sentences from ill-formed Taiwanese Sign Language (TSL) for people with hearing impairments. First, a sign icon-based virtual keyboard is constructed to provide a visualized interface for retrieving sign icons from a sign database. A proposed language model (LM), based on a predictive sentence template (PST) tree, integrates a statistical variable n-gram LM and linguistic constraints to deal with the translation problem from ill-formed sign sequences to grammatical written sentences. The PST tree, trained on a corpus collected from deaf schools, was used to model the correspondence between signed and written Chinese. In addition, a set of phrase formation rules, based on trigger pair categories, was derived for sentence pattern expansion. These approaches improved the efficiency of text generation and the accuracy of word prediction and, therefore, improved the input rate. To assess its value as a practical communication aid, a reading-comprehension training program with ten profoundly deaf students was undertaken in a deaf school in Tainan, Taiwan. Evaluation results show that literacy aptitude test scores and subjective satisfaction levels improved significantly.
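The word-prediction side of such a language model can be sketched with a plain bigram model: given the signs entered so far, candidate next words are ranked by conditional probability. The tiny corpus is illustrative only; the paper's PST tree additionally encodes sentence templates and linguistic constraints.

```python
from collections import Counter

# Toy training corpus of glossed sentences (illustrative, not from the paper).
corpus = [["I", "want", "drink", "water"], ["I", "want", "eat", "rice"]]

# Bigram and unigram counts from the corpus.
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
unigrams = Counter(w for s in corpus for w in s)

def predict_next(prev, k=2):
    """Rank candidate next words after `prev` by bigram probability
    count(prev, w) / count(prev); return the top k."""
    cands = [(w2, c / unigrams[prev])
             for (w1, w2), c in bigrams.items() if w1 == prev]
    return [w for w, _ in sorted(cands, key=lambda x: -x[1])][:k]
```

After typing "I" the model offers "want"; after "want" it offers both observed continuations, "drink" and "eat".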
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004
Chung-Hsien Wu; Yu-Hsien Chiu; Kung-Wei Cheng
This paper proposes an efficient error-tolerant approach to retrieving sign words from a Taiwanese Sign Language (TSL) database. This database is tagged with visual gesture features and organized as a multilist code tree. These features are defined in terms of the visual characteristics of sign gestures, by which signs are indexed for retrieval and displayed using an anthropomorphic interface. Maximum a posteriori estimation is exploited to retrieve the most likely sign word given the input feature sequence. An error-tolerant mechanism based on a mutual information criterion is proposed to retrieve a sign word of interest efficiently and robustly. A user-friendly anthropomorphic interface is also developed to assist in learning TSL. Several experiments were performed in an educational environment to investigate the system's retrieval accuracy. Our proposed approach outperformed a dynamic programming algorithm in this task and showed tolerance to user input errors.
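The MAP retrieval step can be sketched as a noisy-channel match: each sign word is indexed by a gesture-feature sequence, and a possibly erroneous query is scored by P(query | word) x P(word), tolerating per-feature mismatches. The feature codes, priors, and the single confusion probability below are made up for illustration; the paper's mechanism uses a mutual information criterion over a multilist code tree.

```python
import math

# Hypothetical lexicon: word -> indexed gesture-feature sequence.
lexicon = {
    "thanks": ["flat", "chin", "down"],
    "water":  ["w", "chin", "tap"],
}
prior = {"thanks": 0.5, "water": 0.5}  # illustrative uniform prior

def log_score(query, feats, p_match=0.9):
    """log P(query | word) under a toy per-feature confusion model;
    only same-length matches are considered in this sketch."""
    if len(query) != len(feats):
        return float("-inf")
    p_err = 1 - p_match
    return sum(math.log(p_match if q == f else p_err)
               for q, f in zip(query, feats))

def retrieve(query):
    """MAP choice: argmax over words of log P(query|word) + log P(word)."""
    return max(lexicon,
               key=lambda w: log_score(query, lexicon[w]) + math.log(prior[w]))
```

An exact query retrieves its word, and a query with one wrong feature (e.g. "cheek" instead of "chin") still retrieves the nearest entry.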
pacific rim conference on multimedia | 2001
Chung-Hsien Wu; Yu-Hsien Chiu; Kung-Wei Cheng
This paper addresses multi-modal sign icon retrieval and prediction technology for generating sentences from ill-formed Taiwanese Sign Language (TSL) for people with speech or hearing impairments. The design and development of this PC-based TSL augmentative and alternative communication (AAC) system aims to improve the input rate and accuracy of communication aids. This study focuses on 1) developing an effective TSL icon retrieval method, 2) investigating TSL prediction strategies for input rate enhancement, and 3) using a predictive sentence template (PST) tree for sentence generation. The proposed system assists people with language disabilities in sentence formation. To evaluate the performance of our approach, a pilot study for clinical evaluation and educational training was undertaken. The evaluation results show that the retrieval rate and subjective satisfaction level for sentence generation were significantly improved.
international conference on acoustics, speech, and signal processing | 2002
Chung-Hsien Wu; Yu-Hsien Chiu; Huigan Lim
This paper proposes a perceptual modeling approach with two-stage recognition to address recognition degradation in noisy environments. The auditory masking effect is used for speech enhancement and acoustic modeling in order to overcome model inconsistencies between training speech and noisy input. In the two-stage recognition, a maximum a posteriori (MAP) based adaptation algorithm is used to incrementally adapt the noise model. To evaluate the proposed approach, a Mandarin keyword spotting system was constructed. The experimental results show that our method achieves a better recognition rate than the audible noise suppression (ANS) and parallel model combination (PMC) methods in both 70 km/hr (10.3 dB) and 90 km/hr (6.4 dB) car environments.
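The incremental MAP adaptation of a noise model can be illustrated in its simplest form: the MAP estimate of a single Gaussian's mean shrinks the prior mean toward the sample mean of newly observed noise frames, weighted by a relevance factor. The relevance factor tau and the data are illustrative; the paper adapts a full acoustic noise model, not a scalar.

```python
# Minimal sketch of MAP adaptation of a single-Gaussian noise model's mean.
# tau acts as a prior count: large tau trusts the prior, small tau the data.

def map_adapt_mean(mu_prior, frames, tau=10.0):
    """MAP mean estimate: (tau * mu_prior + n * sample_mean) / (tau + n)."""
    n = len(frames)
    xbar = sum(frames) / n
    return (tau * mu_prior + n * xbar) / (tau + n)
```

With a prior mean of 0.0 and ten frames at 1.0 under `tau=10.0`, the adapted mean lands halfway, at 0.5; as more frames arrive, the estimate moves further toward the observed noise statistics.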
conference of the international speech communication association | 2002
Chung-Hsien Wu; Yu-Hsien Chiu; Kung-Wei Cheng