Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chung-Hsien Wu is active.

Publication


Featured research published by Chung-Hsien Wu.


Speech Communication | 2001

Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis

Chung-Hsien Wu; Jau-Hung Chen

In this paper, some approaches to the generation of synthesis units and prosodic information are proposed for Mandarin Chinese text-to-speech (TTS) conversion. The monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database based on two cost functions, which minimize the inter- and intra-syllable distortion. The speech database is also employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a phrase. This template tree stores the prosodic features including pitch contour, average energy, and syllable duration of a word for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. The experimental results showed that the synthesized prosodic features matched quite well with their original counterparts. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches.
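The two-cost selection described above can be illustrated with a toy dynamic-programming search; the cost functions, features, and weights below are invented stand-ins for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch of cost-based unit selection: for each target
# syllable, pick one candidate unit so that the summed intra-syllable
# (target) cost plus inter-syllable (join) cost is minimized via
# dynamic programming. Costs here are toy absolute differences on
# one-dimensional stand-in features.
def select_units(targets, candidates, w_intra=1.0, w_inter=1.0):
    # targets: list of target feature values (floats)
    # candidates: per-syllable lists of candidate feature values
    n = len(targets)
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(w_intra * abs(c - targets[0]), -1) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for j, c in enumerate(candidates[i]):
            intra = w_intra * abs(c - targets[i])
            # join cost to every previous candidate; keep the cheapest path
            cost, back = min(
                (best[i - 1][k][0] + w_inter * abs(c - p) + intra, k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # backtrack from the cheapest final state
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return path[::-1]
```

The Viterbi-style recursion keeps the search linear in sentence length rather than exponential in the number of candidate combinations.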


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis

Chung-Hsien Wu; Chi-Chun Hsia; Te-Hsien Liu; Jhing-Fa Wang

This paper presents an expressive voice conversion model (DeBi-HMM) as the post-processing of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for its duration-embedded characteristic of the two HMMs for modeling the source and target speech signals, respectively. Joint estimation of source and target HMMs is exploited for spectrum conversion from neutral to expressive speech. A Gamma distribution is embedded as the duration model for each state in the source and target HMMs. Expressive style-dependent decision trees achieve prosodic conversion. The STRAIGHT algorithm is adopted for the analysis and synthesis process. A set of small-sized speech databases for each expressive style is designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing are conducted to evaluate the quality of synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method exhibits encouraging potential in expressive speech synthesis.


IEEE Transactions on Signal Processing | 1991

A hierarchical neural network model based on a C/V segmentation algorithm for isolated Mandarin speech recognition

Jhing-Fa Wang; Chung-Hsien Wu; Shih-Hung Chang; Jau-Yien Lee

A novel algorithm simultaneously performing consonant/vowel (C/V) segmentation and pitch detection is proposed. Based on this algorithm, a consonant enhancement method and a hierarchical neural network scheme are explored for Mandarin speech recognition. As a result, an improvement of 12% in the consonant recognition rate is obtained and the number of recognition candidates is reduced from 1300 to 63. A series of experiments over all Mandarin syllables (about 1300) is conducted in the speaker-dependent mode. Comparisons with the decoder timer waveform algorithm show that the performance is satisfactory. An overall recognition rate of 90.14% is obtained.


IEEE Transactions on Multimedia | 2012

Error Weighted Semi-Coupled Hidden Markov Model for Audio-Visual Emotion Recognition

Jen-Chun Lin; Chung-Hsien Wu; Wen-Li Wei

This paper presents an approach to the automatic recognition of human emotions from audio-visual bimodal signals using an error weighted semi-coupled hidden Markov model (EWSC-HMM). The proposed approach combines an SC-HMM with a state-based bimodal alignment strategy and a Bayesian classifier weighting scheme to obtain the optimal emotion recognition result based on audio-visual bimodal fusion. The state-based bimodal alignment strategy in SC-HMM is proposed to align the temporal relation between audio and visual streams. The Bayesian classifier weighting scheme is then adopted to explore the contributions of the SC-HMM-based classifiers for different audio-visual feature pairs in order to obtain the emotion recognition output. For performance evaluation, two databases are considered: the MHMC posed database and the SEMAINE naturalistic database. Experimental results show that the proposed approach not only outperforms other fusion-based bimodal emotion recognition methods for posed expressions but also provides satisfactory results for naturalistic expressions.
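The classifier weighting idea can be approximated with a simple sketch. The reliability weights and fusion rule below are illustrative assumptions (weighting each classifier's posterior by one minus its validation error rate), not the paper's EWSC-HMM formulation.

```python
import numpy as np

def error_weighted_fusion(posteriors, error_rates):
    # Hypothetical sketch of error-weighted fusion: each classifier's
    # posterior over emotion classes is weighted by its reliability
    # (here, 1 - validation error rate), then the weighted sum is
    # renormalized and the top-scoring class is chosen.
    w = 1.0 - np.asarray(error_rates)       # reliability per classifier
    w = w / w.sum()                          # normalize weights
    fused = np.einsum('c,ck->k', w, np.asarray(posteriors))
    return fused / fused.sum(), int(np.argmax(fused))
```

For example, two classifiers with posteriors `[0.9, 0.1]` and `[0.2, 0.8]` and error rates `0.1` and `0.4` fuse to a decision favoring the first class, since the more reliable classifier dominates.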


international conference on multimedia and expo | 2004

Emotion recognition using acoustic features and textual content

Ze-Jing Chuang; Chung-Hsien Wu

The paper presents an approach to emotion recognition from speech signals and textual content. In the analysis of speech signals, thirty-three acoustic features are extracted from the speech input. After principal component analysis (PCA), 14 principal components are selected for discriminative representation. In this representation, each principal component is a combination of the 33 original acoustic features and forms a feature subspace. Support vector machines (SVMs) are adopted to classify the emotional states. In text analysis, all emotional keywords and emotion modification words are manually defined. The emotion intensity levels of emotional keywords and emotion modification words are estimated from a collected emotion corpus. The final emotional state is determined based on the emotion outputs from the acoustic and textual approaches. The experimental results show that the emotion recognition accuracy of the integrated system is better than that of either individual approach.
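The dimensionality-reduction step of the acoustic pipeline can be sketched with a plain SVD-based PCA; this is a generic illustration (the paper then feeds such projections to SVMs, which is omitted here), not the authors' implementation.

```python
import numpy as np

def pca_project(X, n_components=14):
    # Center the acoustic feature matrix and project onto the top
    # principal components via SVD; in the paper, 33 acoustic
    # features are reduced to 14 principal components.
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T             # scores in the subspace
```

Each projected dimension is a linear combination of the original features, matching the "feature subspace" interpretation in the abstract.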


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

Joint Optimization of Word Alignment and Epenthesis Generation for Chinese to Taiwanese Sign Synthesis

Yu-Hsien Chiu; Chung-Hsien Wu; Hung-Yu Su; Chih-Jen Cheng

This work proposes a novel approach to translate Chinese to Taiwanese sign language and to synthesize sign videos. An aligned bilingual corpus of Chinese and Taiwanese sign language (TSL) with linguistic and signing information is also presented for sign language translation. A two-pass alignment at the syntax level and phrase level is developed to obtain the optimal alignment between Chinese sentences and Taiwanese sign sequences. For sign video synthesis, a scoring function is presented to develop motion transition-balanced sign videos with rich combinations of intersign transitions. Finally, the maximum a posteriori (MAP) algorithm is employed for sign video synthesis based on joint optimization of two-pass word alignment and intersign epenthesis generation. Several experiments are conducted in an educational environment to evaluate the performance on the comprehension of sign expression. The proposed approach outperforms IBM Model 2 in sign language translation. Moreover, deaf students perceived sign videos generated by the proposed method to be satisfactory.


ACM Transactions on Asian Language Information Processing | 2005

Domain-specific FAQ retrieval using independent aspects

Chung-Hsien Wu; Jui-Feng Yeh; Ming-Jun Chen

This investigation presents an approach to domain-specific FAQ (frequently-asked question) retrieval using independent aspects. The data analysis classifies the questions in the collected QA (question-answer) pairs into ten question types in accordance with question stems. The answers in the QA pairs are then paragraphed and clustered using latent semantic analysis and the K-means algorithm. For semantic representation of the aspects, a domain-specific ontology is constructed based on WordNet and HowNet. A probabilistic mixture model is then used to interpret the query and QA pairs based on independent aspects; hence the retrieval process can be viewed as the maximum likelihood estimation problem. The expectation-maximization (EM) algorithm is employed to estimate the optimal mixing weights in the probabilistic mixture model. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in medical FAQ retrieval.
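The EM estimation of mixing weights mentioned above can be sketched generically: given fixed per-aspect likelihoods for each QA pair, EM alternates between computing responsibilities and re-averaging them. The matrix layout and variable names here are assumptions for illustration, not the paper's notation.

```python
import numpy as np

def em_mixture_weights(P, n_iter=50):
    # P: (n_samples, n_aspects) matrix of fixed per-aspect
    # likelihoods p(q | a_k). EM estimates the mixing weights
    # lambda_k maximizing sum_i log sum_k lambda_k * P[i, k].
    n, K = P.shape
    lam = np.full(K, 1.0 / K)                    # uniform initialization
    for _ in range(n_iter):
        resp = lam * P                           # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True)  # normalize per sample
        lam = resp.mean(axis=0)                  # M-step: update weights
    return lam
```

Because the per-aspect likelihoods are held fixed, each iteration only re-estimates the weights, which is the classic EM update for mixture proportions.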


ACM Transactions on Asian Language Information Processing | 2002

Meaningful term extraction and discriminative term selection in text categorization via unknown-word methodology

Yu-Sheng Lai; Chung-Hsien Wu

In this article, an approach based on unknown words is proposed for meaningful term extraction and discriminative term selection in text categorization. For meaningful term extraction, a phrase-like unit (PLU)-based likelihood ratio is proposed to estimate the likelihood that a word sequence is an unknown word. On the other hand, a discriminative measure is proposed for term selection and is combined with the PLU-based likelihood ratio to determine the text category. We conducted several experiments on a news corpus, called MSDN. The MSDN corpus is collected from an online news website maintained by the Min-Sheng Daily News, Taiwan. The corpus contains 44,675 articles with over 35 million words. The experimental results show that the system using a simple classifier achieved 95.31% accuracy. When using a state-of-the-art classifier, kNN, the average accuracy is 96.40%, outperforming all the other systems evaluated on the same collection, including the traditional term-word by kNN (88.52%); sleeping-experts (82.22%); sparse phrase by four-word sleeping-experts (86.34%); and Boolean combinations of words by RIPPER (87.54%). A proposed purification process can effectively reduce the dimensionality of the feature space from 50,576 terms in the word-based approach to 19,865 terms in the unknown word-based approach. In addition, more than 80% of automatically extracted terms are meaningful. Experiments also show that the proportion of meaningful terms extracted from training data correlates with the classification accuracy in outside testing.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis

Chung-Hsien Wu; Chi-Chun Hsia; Chung-Han Lee; Mai-Chun Lin

This paper presents an approach to hierarchical prosody conversion for emotional speech synthesis. The pitch contour of the source speech is decomposed into a hierarchical prosodic structure consisting of sentence, prosodic word, and subsyllable levels. The pitch contour at the higher level is encoded by the discrete Legendre polynomial coefficients. The residual, the difference between the source pitch contour and the pitch contour decoded from the discrete Legendre polynomial coefficients, is then used for pitch modeling at the lower level. For prosody conversion, Gaussian mixture models (GMMs) are used for sentence- and prosodic word-level conversion. At the subsyllable level, the pitch feature vectors are clustered via a proposed regression-based clustering method to generate the prosody conversion functions for selection. Linguistic and symbolic prosody features of the source speech are adopted to select the most suitable function using the classification and regression tree for prosody conversion. Three small-sized emotional parallel speech databases with happy, angry, and sad emotions, respectively, were designed and collected for training and evaluation. Objective and subjective evaluations were conducted; compared with the GMM-based method for prosody conversion, the hierarchical prosodic structure and the proposed regression-based clustering method achieved improved performance.
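The Legendre encoding and residual step described above can be sketched with NumPy's Legendre utilities; the sampling grid and polynomial order are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np
from numpy.polynomial import legendre

def encode_pitch(contour, order=3):
    # Fit low-order Legendre polynomial coefficients to a pitch
    # contour resampled onto [-1, 1]; the residual is what the next
    # (lower) level of the hierarchy would model.
    x = np.linspace(-1.0, 1.0, len(contour))
    coefs = legendre.legfit(x, contour, order)   # least-squares fit
    approx = legendre.legval(x, coefs)           # decoded contour
    return coefs, contour - approx               # coefficients, residual
```

A contour the polynomial can represent exactly (e.g. a linear pitch rise with `order >= 1`) yields a near-zero residual, so all modeling burden falls on the higher level; richer contours leave structure in the residual for the lower levels.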


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Chi-Chun Hsia; Chung-Hsien Wu; Jung-Yun Wu

This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in the prediction of the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into the prosodic features (static features) at the prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating pitch contour. Minimum description length (MDL) is adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and the results were compared to those of conventional HMM-based pitch modeling; the comparison confirms the improved performance of the proposed method.
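Dynamic features of the kind mentioned above are commonly computed as regression (delta) coefficients over a sliding window; the following is a generic sketch of that standard computation, not the paper's exact estimator.

```python
import numpy as np

def delta_features(static, width=2):
    # Regression-based delta (dynamic) features over a window of
    # +/- width frames, with edge padding at the sequence boundaries.
    # delta[t] = sum_w w * (x[t+w] - x[t-w]) / (2 * sum_w w^2)
    padded = np.pad(static, width, mode='edge')
    n = len(static)
    num = sum(w * (padded[width + w : n + width + w]
                   - padded[width - w : n + width - w])
              for w in range(1, width + 1))
    denom = 2 * sum(w * w for w in range(1, width + 1))
    return num / denom
```

On a linearly rising sequence the interior deltas equal the constant slope, which is the sanity check usually applied to this formula.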

Collaboration


Dive into Chung-Hsien Wu's collaborations.

Top Co-Authors

Jhing-Fa Wang, National Cheng Kung University
Chien-Lin Huang, National Cheng Kung University
Jui-Feng Yeh, National Cheng Kung University
Ming-Hsiang Su, National Cheng Kung University
Kun-Yi Huang, National Cheng Kung University
Chia-Hsin Hsieh, National Cheng Kung University
Yeou-Jiunn Chen, National Cheng Kung University
Yu-Hsien Chiu, National Cheng Kung University