
Publications


Featured research published by Chi-Chun Hsia.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis

Chung-Hsien Wu; Chi-Chun Hsia; Te-Hsien Liu; Jhing-Fa Wang

This paper presents an expressive voice conversion model (DeBi-HMM) as a post-processing stage of a text-to-speech (TTS) system for expressive speech synthesis. DeBi-HMM is named for the duration-embedded characteristic of its two HMMs, which model the source and target speech signals, respectively. Joint estimation of the source and target HMMs is exploited for spectrum conversion from neutral to expressive speech. A gamma distribution is embedded as the duration model for each state in the source and target HMMs. Expressive-style-dependent decision trees perform the prosodic conversion. The STRAIGHT algorithm is adopted for the analysis and synthesis process. A set of small-sized speech databases, one for each expressive style, is designed and collected to train the DeBi-HMM voice conversion models. Several experiments with statistical hypothesis testing are conducted to evaluate the quality of the synthetic speech as perceived by human subjects. Compared with previous voice conversion methods, the proposed method exhibits encouraging potential in expressive speech synthesis.
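
The duration-embedded part of DeBi-HMM can be pictured with a small sketch: each HMM state carries a gamma duration model that can score or sample state durations in frames. The snippet below is a minimal illustration, assuming per-state (shape, scale) parameters estimated elsewhere; the values, state count, and function names are hypothetical, not taken from the paper.

```python
# Minimal sketch of gamma-distributed state durations; parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One (shape, scale) pair per HMM state, e.g. estimated from forced alignments.
duration_params = [(4.0, 2.5), (6.0, 1.8), (3.0, 3.2)]  # hypothetical values

def sample_state_durations(params, rng):
    """Draw one duration (in frames) per state from its gamma model."""
    return [max(1, int(round(stats.gamma(a=shape, scale=scale).rvs(random_state=rng))))
            for shape, scale in params]

def duration_log_likelihood(durations, params):
    """Score an observed duration sequence under the gamma duration models."""
    return sum(stats.gamma(a=shape, scale=scale).logpdf(d)
               for d, (shape, scale) in zip(durations, params))

durs = sample_state_durations(duration_params, rng)
print(durs, duration_log_likelihood(durs, duration_params))
```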


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis

Chung-Hsien Wu; Chi-Chun Hsia; Chung-Han Lee; Mai-Chun Lin

This paper presents an approach to hierarchical prosody conversion for emotional speech synthesis. The pitch contour of the source speech is decomposed into a hierarchical prosodic structure consisting of sentence, prosodic word, and subsyllable levels. The pitch contour at the higher level is encoded by discrete Legendre polynomial coefficients. The residual, i.e., the difference between the source pitch contour and the contour decoded from the discrete Legendre polynomial coefficients, is then used for pitch modeling at the lower level. For prosody conversion, Gaussian mixture models (GMMs) are used for sentence- and prosodic-word-level conversion. At the subsyllable level, the pitch feature vectors are clustered via a proposed regression-based clustering method to generate the prosody conversion functions, and linguistic and symbolic prosody features of the source speech are used with a classification and regression tree to select the most suitable conversion function. Three small-sized emotional parallel speech databases, with happy, angry, and sad emotions, respectively, were designed and collected for training and evaluation. Objective and subjective evaluations were conducted, and comparisons with the GMM-based prosody conversion method show improved performance from the hierarchical prosodic structure and the proposed regression-based clustering method.
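
One level of this decomposition can be sketched directly: fit low-order Legendre coefficients to the contour and pass the residual down to the next level. The snippet below is a rough illustration using numpy's continuous Legendre basis as a stand-in for the paper's discrete Legendre polynomials; the polynomial order and the toy contour are assumptions.

```python
# Sketch of one level of the hierarchical pitch decomposition.
import numpy as np
from numpy.polynomial import legendre

def encode_level(pitch, order=3):
    """Fit Legendre coefficients on normalized time; return (coeffs, residual)."""
    t = np.linspace(-1.0, 1.0, len(pitch))        # normalized time axis
    coeffs = legendre.legfit(t, pitch, order)     # encoding at this level
    residual = pitch - legendre.legval(t, coeffs) # passed to the lower level
    return coeffs, residual

# Toy contour standing in for a sentence-level F0 track.
f0 = 120 + 15 * np.sin(np.linspace(0, np.pi, 200)) \
     + np.random.default_rng(1).normal(0, 2, 200)
sent_coeffs, residual = encode_level(f0)          # sentence level
# The residual would then be segmented into prosodic words and re-encoded.
```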


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis

Chi-Chun Hsia; Chung-Hsien Wu; Jung-Yun Wu

This paper proposes a method for modeling and generating pitch in hidden Markov model (HMM)-based Mandarin speech synthesis by exploiting the prosody hierarchy and dynamic pitch features. The prosodic structure of a sentence is represented by a prosody hierarchy, which is constructed from the predicted prosodic breaks using a supervised classification and regression tree (S-CART). The S-CART is trained by maximizing the proportional reduction of entropy to minimize the errors in predicting the prosodic breaks. The pitch contour of a speech sentence is estimated using the STRAIGHT algorithm and decomposed into prosodic (static) features at the prosodic word, syllable, and frame layers, based on the predicted prosodic structure. Dynamic features at each layer are estimated to preserve the temporal correlation between adjacent units. A hierarchical prosody model is constructed using an unsupervised CART (U-CART) for generating the pitch contour, with the minimum description length (MDL) criterion adopted in U-CART training. Objective and subjective evaluations with statistical hypothesis testing were conducted, and comparison with HMM-based pitch modeling confirms the improved performance of the proposed method.
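
Dynamic features of this kind are conventionally computed as regression deltas over a symmetric window. The sketch below assumes the standard HTK-style delta formula and a window of two frames; both are assumptions, since the paper's exact estimator is not reproduced here.

```python
# Sketch of first-order dynamic (delta) features over a symmetric window.
import numpy as np

def delta_features(static, window=2):
    """HTK-style regression deltas over +/- `window` frames, edge-padded."""
    padded = np.pad(static, (window, window), mode="edge")
    denom = 2 * sum(w * w for w in range(1, window + 1))
    return np.array([
        sum(w * (padded[t + window + w] - padded[t + window - w])
            for w in range(1, window + 1)) / denom
        for t in range(len(static))
    ])

log_f0 = np.log(120 + 10 * np.sin(np.linspace(0, 6, 100)))  # toy static features
deltas = delta_features(log_f0)   # appended to the statics at each prosodic layer
```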


IEEE Transactions on Computers | 2007

Conversion Function Clustering and Selection Using Linguistic and Spectral Information for Emotional Voice Conversion

Chi-Chun Hsia; Chung-Hsien Wu; Jian-Qi Wu

In emotional speech synthesis, a large speech database is usually required for high-quality speech output, whereas voice conversion needs only a compact speech database for each emotion. This study designs and collects a set of phonetically balanced, small-sized emotional parallel speech databases to construct conversion functions. The Gaussian mixture bigram model (GMBM) is adopted as the conversion function to characterize the temporal and spectral evolution of the speech signal. A conversion function is initially constructed for each instance of parallel subsyllable pairs in the collected speech database. To reduce the total number of conversion functions and select an appropriate conversion function, this study presents a framework that incorporates linguistic and spectral information for conversion function clustering and selection. Subjective and objective evaluations with statistical hypothesis testing are conducted to evaluate the quality of the converted speech. The proposed method compares favorably with previous methods in conversion-based emotional speech synthesis.
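
As a rough illustration of the kind of conversion function involved, the sketch below implements the classic joint-density GMM spectral mapping, which the GMBM extends with bigram temporal context. This is not the paper's exact formulation; the data, dimensions, and mixture count are synthetic placeholders.

```python
# Sketch of a joint-density GMM spectral conversion function (toy data).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
d = 4                                     # toy spectral-feature dimension
X = rng.normal(size=(500, d))             # source (e.g. neutral) frames
Y = X @ rng.normal(size=(d, d)) * 0.5 + rng.normal(size=(500, d)) * 0.1  # targets

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))                # model the joint source-target density

def convert(x):
    """MMSE mapping of one source frame into the target feature space."""
    # Posterior over mixtures given the source part of each joint Gaussian.
    resp = np.array([
        w * multivariate_normal.pdf(x, mean=mu[:d], cov=cov[:d, :d])
        for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_)
    ])
    resp /= resp.sum()
    out = np.zeros(d)
    for r, mu, cov in zip(resp, gmm.means_, gmm.covariances_):
        out += r * (mu[d:] + cov[d:, :d] @ np.linalg.solve(cov[:d, :d], x - mu[:d]))
    return out

print(convert(X[0]))                      # converted (target-domain) frame
```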


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Variable-Length Unit Selection in TTS Using Structural Syntactic Cost

Chung-Hsien Wu; Chi-Chun Hsia; Jiun-Fu Chen; Jhing-Fa Wang

This paper presents a variable-length unit selection scheme based on syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG) and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure, with the inside probability of the PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. A dynamic programming algorithm is adopted to obtain the concatenated unit sequence with minimum cost. A speech database rich in syntactic properties is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms a synthesizer that does not consider syntactic properties, and the structural syntax estimates the substitution cost better than acoustic features alone.
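
The minimum-cost search can be sketched as a Viterbi-style dynamic program over candidate units, with a weighted cosine distance standing in for the syntactic substitution cost. Everything below (vectors, weights, the concatenation cost) is a synthetic placeholder, not the paper's actual cost definitions.

```python
# Sketch of dynamic-programming unit selection with a weighted cosine cost.
import numpy as np

def weighted_cosine_dist(a, b, w):
    """1 - cosine similarity with per-dimension weights (e.g. inside probabilities)."""
    aw, bw = a * w, b * w
    return 1.0 - aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-12)

def select_units(targets, candidates, weights, concat_cost):
    """Viterbi-style search: one candidate per target slot, minimum total cost."""
    T = len(targets)
    best = [{} for _ in range(T)]   # best[t][j]: cheapest cost ending in unit j
    back = [{} for _ in range(T)]   # backpointers for path recovery
    for j, c in enumerate(candidates[0]):
        best[0][j] = weighted_cosine_dist(targets[0], c, weights)
    for t in range(1, T):
        for j, c in enumerate(candidates[t]):
            sub = weighted_cosine_dist(targets[t], c, weights)
            prev, score = min(
                ((i, best[t - 1][i] + concat_cost(candidates[t - 1][i], c))
                 for i in best[t - 1]),
                key=lambda p: p[1])
            best[t][j] = score + sub
            back[t][j] = prev
    j = min(best[-1], key=best[-1].get)
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]

rng = np.random.default_rng(0)
targets = rng.random((4, 8))                         # 4 target syntactic vectors
candidates = [rng.random((5, 8)) for _ in range(4)]  # 5 candidate units per slot
weights = rng.random(8)                              # stand-in inside probabilities
concat = lambda a, b: 0.1 * np.linalg.norm(a - b)    # toy concatenation cost
print(select_units(targets, candidates, weights, concat))
```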


international conference on acoustics, speech, and signal processing | 2007

Conversion Function Clustering and Selection for Expressive Voice Conversion

Chi-Chun Hsia; Chung-Hsien Wu; Jian-Qi Wu

In this study, a conversion function clustering and selection approach to conversion-based expressive speech synthesis is proposed. First, a set of small-sized emotional parallel speech databases is designed and collected to train the conversion functions. The Gaussian mixture bigram model (GMBM) is adopted as the conversion function to model the temporal and spectral evolution of speech. Conversion functions, initially constructed from the parallel subsyllable pairs in the speech database, are clustered based on linguistic and spectral information. Subjective and objective evaluations with statistical hypothesis testing were conducted to evaluate the quality of the converted speech. The results show that the proposed method exhibits encouraging potential in conversion-based expressive speech synthesis.
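
The clustering step can be pictured with a small sketch: summarize each initial conversion function by a parameter vector and group the vectors, with each cluster then retrained into one shared function. The k-means grouping, cluster count, and parameter summary below are assumptions chosen for illustration; the paper clusters using linguistic and spectral information.

```python
# Sketch of grouping per-subsyllable conversion functions by parameter summary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# One row per initial conversion function, e.g. its stacked mixture means (toy).
function_params = rng.normal(size=(300, 16))

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(function_params)
# Each cluster would be retrained into a single shared conversion function,
# and a selector would pick among the 8 clusters at synthesis time.
print(np.bincount(km.labels_))
```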


international conference on orange technologies | 2013

SVM-based IADL score correlation and classification with EEG/ECG signals

Yang-Yen Ou; Chi-Chun Hsia; Jhing-Fa Wang; Ta-Wen Kuan; Cheng-Hsun Hsieh

This paper explores the correlation between the subjective IADL (instrumental activities of daily living) assessment and objective EEG/ECG signal measurements. Thirty elderly participants are scored with the IADL assessment and classified into three groups (high, medium, and low score), and each participant's collected EEG/ECG signals are then assigned to the corresponding group. Six feature extraction methods, five for EEG and one for ECG, are applied to the EEG/ECG signals from each participant. The extracted features are then used to train SVMs, and classification into groups is performed with the one-against-all method. The experiments show that the proposed feature extraction methods and framework achieve 82% accuracy.
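
The classification pipeline is standard enough to sketch: one-against-all SVMs over the six extracted features. The snippet below uses random stand-ins for the EEG/ECG features and balanced toy labels; only the overall structure (six features, three groups, one-vs-rest SVMs) follows the paper, and the kernel and regularization settings are assumptions.

```python
# Sketch of one-against-all SVM classification of IADL-score groups (toy data).
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))       # 30 participants x 6 extracted features (toy)
y = np.repeat([0, 1, 2], 10)       # low / medium / high IADL-score groups

clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())               # the paper reports 82% with the real features
```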


international conference on acoustics, speech, and signal processing | 2009

Regression-based clustering for hierarchical pitch conversion

Chung-Han Lee; Chi-Chun Hsia; Chung-Hsien Wu; Mai-Chun Lin

This study presents a hierarchical pitch conversion method using regression-based clustering for conversion function modeling. The pitch contour of a speech utterance is first extracted and decomposed into sentence-, word-, and subsyllable-level features in a top-down manner. The paired source and target pitch feature vectors at each level are then clustered to generate the pitch conversion functions. Regression-based clustering, which clusters the feature vectors to minimize the conversion error between the predicted and actual feature vectors, is proposed for conversion function generation. A classification and regression tree (CART), incorporating linguistic, phonetic, and source prosodic features, is adopted to select the most suitable function for pitch conversion. Several objective and subjective evaluations were conducted, and comparisons with GMM-based pitch conversion methods confirm the performance of the proposed regression-based clustering approach.
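
Regression-based clustering can be sketched as an alternating procedure: fit a conversion function per cluster, then reassign each source-target pair to the cluster whose function predicts its target best. The sketch below assumes linear conversion functions and synthetic data; the paper's actual feature definitions and conversion form may differ.

```python
# Sketch of regression-based clustering of (source, target) pitch feature pairs.
import numpy as np

def regression_clustering(X, Y, k=4, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    Xb = np.hstack([X, np.ones((len(X), 1))])     # bias column
    for _ in range(iters):
        W = []
        for c in range(k):
            m = labels == c
            if m.sum() < Xb.shape[1]:             # keep degenerate clusters usable
                W.append(np.zeros((Xb.shape[1], Y.shape[1])))
                continue
            W.append(np.linalg.lstsq(Xb[m], Y[m], rcond=None)[0])
        errs = np.stack([((Xb @ w - Y) ** 2).sum(axis=1) for w in W], axis=1)
        labels = errs.argmin(axis=1)              # reassign by conversion error
    return labels, W

X = np.random.default_rng(4).normal(size=(200, 3))                # source features
Y = X @ np.array([[1.2, 0, 0], [0, 0.8, 0], [0, 0, 1.1]]) + 0.1   # toy targets
labels, W = regression_clustering(X, Y)
```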


international symposium on chinese spoken language processing | 2004

Variable-length unit selection using LSA-based syntactic structure cost

Chung-Hsien Wu; Chi-Chun Hsia; Jiun-Fu Chen; Te-Hsien Liu

This paper introduces a variable-length unit selection method for concatenative speech synthesis using a syntactic structure cost based on latent semantic analysis (LSA). First, a probabilistic context-free grammar (PCFG) based parser is used to construct the syntactic structure of the input text sentence. Second, the synthesizer selects the candidate units for each node of the syntactic structure. LSA is then adopted to estimate the syntactic cost between the target unit and the candidate units in the database. Finally, the concatenation of units with minimum cost is selected using a dynamic programming algorithm. Experimental results show that variable-length unit selection based on syntactic structure outperforms a synthesizer that does not consider syntactic structure, and that the LSA-based syntactic cost provides a better estimate of the substitution cost than acoustic features alone.
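
The LSA step here amounts to a truncated SVD of a unit-by-syntactic-feature matrix, with units then compared by cosine similarity in the reduced space. A minimal sketch, with the matrix sizes and rank chosen arbitrarily:

```python
# Sketch of LSA dimensionality reduction for syntactic unit vectors (toy sizes).
import numpy as np

rng = np.random.default_rng(5)
V = rng.random((1000, 50))               # unit-by-syntactic-feature matrix (toy)

U, s, Vt = np.linalg.svd(V, full_matrices=False)
k = 10                                   # assumed LSA rank
reduced = U[:, :k] * s[:k]               # each row: a unit in LSA space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

cost = 1.0 - cosine(reduced[0], reduced[1])   # LSA-based syntactic cost
```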


conference of the international speech communication association | 2005

Duration-embedded bi-HMM for expressive voice conversion.

Chi-Chun Hsia; Chung-Hsien Wu; Te-Hsien Liu

Collaboration


Chi-Chun Hsia's top co-authors and their affiliations.

Top Co-Authors

Chung-Hsien Wu (National Cheng Kung University)
Jhing-Fa Wang (National Cheng Kung University)
Jiun-Fu Chen (National Cheng Kung University)
Te-Hsien Liu (National Cheng Kung University)
Chung-Han Lee (National Cheng Kung University)
Jian-Qi Wu (National Cheng Kung University)
Mai-Chun Lin (National Cheng Kung University)
Cheng-Hsun Hsieh (National Cheng Kung University)
Jung-Yun Wu (National Cheng Kung University)
Ren-Ying Fang (National Cheng Kung University)