Publications


Featured research published by Ren-Hua Wang.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Minimum Generation Error Training for HMM-Based Speech Synthesis

Yi-Jian Wu; Ren-Hua Wang

In HMM-based speech synthesis, two issues are critical to MLE-based HMM training: the inconsistency between training and synthesis, and the lack of mutual constraints between static and dynamic features. In this paper, we propose a minimum generation error (MGE) based HMM training method to solve these two issues. In this method, an appropriate generation error is defined, and the HMM parameters are optimized using the generalized probabilistic descent (GPD) algorithm with the aim of minimizing the generation error. Experimental results show that the generation errors were reduced after MGE-based HMM training and that the quality of the synthetic speech was improved.
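
To make the training criterion concrete, here is a minimal numerical sketch, not the paper's implementation: with unit variances, ML parameter generation reduces to c = (W^T W)^{-1} W^T mu, where W stacks static and delta windows, and the means mu are updated by plain gradient descent on the squared generation error, in the spirit of GPD. The frame count, window shape, and step size are all invented.

```python
import numpy as np

# Minimal sketch of MGE-style training on a toy linear-Gaussian model.
T = 5                                   # number of frames (invented)
W = np.zeros((2 * T, T))
for t in range(T):
    W[2 * t, t] = 1.0                   # static row: picks out c_t
    lo, hi = max(t - 1, 0), min(t + 1, T - 1)
    W[2 * t + 1, hi] += 0.5             # delta row: (c_{t+1} - c_{t-1}) / 2
    W[2 * t + 1, lo] -= 0.5

c_natural = np.sin(np.linspace(0, np.pi, T))   # "natural" trajectory
mu = np.zeros(2 * T)                           # model means (static + delta)
A = np.linalg.solve(W.T @ W, W.T)              # generation operator: c = A mu

for step in range(200):
    err = A @ mu - c_natural                   # generation error
    mu -= 0.5 * (A.T @ err)                    # gradient step on 0.5 * ||err||^2
print("final generation error:", np.linalg.norm(A @ mu - c_natural))
```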


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis

Zhen-Hua Ling; Korin Richmond; Junichi Yamagishi; Ren-Hua Wang

This paper presents an investigation into ways of integrating articulatory features into hidden Markov model (HMM)-based parametric speech synthesis. In broad terms, this may be achieved by estimating the joint distribution of acoustic and articulatory features during training. This may in turn be used in conjunction with a maximum-likelihood criterion to produce acoustic synthesis parameters for generating speech. Within this broad approach, we explore several variations that are possible in the construction of an HMM-based synthesis system which allow articulatory features to influence acoustic modeling: model clustering, state synchrony and cross-stream feature dependency. Performance is evaluated using the RMS error of generated acoustic parameters as well as formal listening tests. Our results show that the accuracy of acoustic parameter prediction and the naturalness of synthesized speech can be improved when shared clustering and asynchronous-state model structures are adopted for combined acoustic and articulatory features. Most significantly, however, our experiments demonstrate that modeling the dependency between these two feature streams can make speech synthesis systems more flexible. The characteristics of synthetic speech can be easily controlled by modifying generated articulatory features as part of the process of producing acoustic synthesis parameters.
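
As a rough illustration of the cross-stream dependency idea (not the paper's actual model structure), the sketch below treats the acoustic stream of one state as a Gaussian whose mean depends linearly on the articulatory features; with fixed variances, the ML acoustic parameters are just these means, so editing the generated articulation directly shifts the synthesized acoustics. All dimensions and values are invented.

```python
import numpy as np

# Toy cross-stream dependency: p(x | a) = N(x; mu + A a, Sigma).
rng = np.random.default_rng(0)
dim_ac, dim_art = 3, 2
mu = rng.normal(size=dim_ac)            # state-dependent acoustic mean
A = rng.normal(size=(dim_ac, dim_art))  # cross-stream dependency matrix

art_neutral = np.array([0.0, 0.0])      # generated articulatory features
art_modified = np.array([0.5, -0.3])    # manually edited articulation

# With fixed variances, the ML acoustic parameters are just the means,
# so modifying the articulation directly shifts the acoustics.
print("neutral acoustics: ", mu + A @ art_neutral)
print("modified acoustics:", mu + A @ art_modified)
```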


Speech Communication | 2009

A new method for mispronunciation detection using Support Vector Machine based on Pronunciation Space Models

Si Wei; Guoping Hu; Yu Hu; Ren-Hua Wang

This paper presents two new ideas for text-dependent mispronunciation detection. First, mispronunciation detection is formulated as a classification problem so that various predictive features can be integrated. A Support Vector Machine (SVM) is used as the classifier, and the log-likelihood ratios between all the acoustic models and the model corresponding to the given text are employed as its features. Second, Pronunciation Space Models (PSMs) are proposed to enhance the discriminative capability of the acoustic models for pronunciation variations. In PSMs, each phone is modeled with several parallel acoustic models representing pronunciation variants of that phone at different proficiency levels, and an unsupervised method is proposed for constructing the PSMs. Experiments on a database consisting of more than 500,000 Mandarin syllables collected from 1335 Chinese speakers show that the proposed methods significantly outperform the traditional posterior probability based method. The overall recall rates for the 13 most frequently mispronounced phones increase from 17.2%, 7.6% and 0% to 58.3%, 44.3% and 29.5% at precision levels of 60%, 70% and 80%, respectively. The improvement is also demonstrated by a subjective experiment with 30 subjects, in which 53.3% of the subjects judged the proposed method better than the traditional one and 23.3% judged the two methods comparable.
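
A minimal sketch of the classifier setup described above, on entirely synthetic data: each phone segment is represented by the vector of log-likelihood ratios between every acoustic model and the model of the canonical (text) phone, and an SVM separates correct from mispronounced segments. scikit-learn's SVC stands in for whatever SVM implementation the authors used.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, n_models = 200, 10
# Correct pronunciations: competing models score below the canonical one,
# so the log-likelihood ratios are negative on average.
good = rng.normal(-2.0, 1.0, size=(n, n_models))
# Mispronunciations: one random competing model scores markedly higher.
bad = rng.normal(-2.0, 1.0, size=(n, n_models))
bad[np.arange(n), rng.integers(0, n_models, size=n)] += 4.0

X = np.vstack([good, bad])
y = np.array([0] * n + [1] * n)         # 1 = mispronounced
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```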


International Conference on Acoustics, Speech, and Signal Processing | 2007

HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion

Zhen-Hua Ling; Ren-Hua Wang

This paper presents a hidden Markov model (HMM) based unit selection method using hierarchical units under a statistical criterion. In our previous work, we used frame-sized speech segments and a maximum likelihood criterion to improve on traditional concatenative synthesis systems that use phone-sized units and a cost-function criterion. In this paper, hierarchical units consisting of phone-level and frame-level units are adopted to achieve a better balance between the coverage rate of candidate units and the number of concatenation points during synthesis. In addition, the Kullback-Leibler divergence (KLD) between candidate and target phoneme HMMs is introduced as part of the final unit selection criterion. Listening test results show that these two approaches effectively improve the quality of the synthetic speech.
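
The KLD part of the criterion has a closed form for Gaussian state distributions. Below is a sketch assuming diagonal-covariance states and a simple weighted combination with the log likelihood; the interpolation weight and all numbers are invented, and the paper's actual combination may differ.

```python
import numpy as np

def kld_diag(mu1, var1, mu2, var2):
    """KL(N1 || N2) for diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var2 / var1)
                        + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def symmetric_kld(mu1, var1, mu2, var2):
    return kld_diag(mu1, var1, mu2, var2) + kld_diag(mu2, var2, mu1, var1)

mu_t, var_t = np.array([0.0, 1.0]), np.array([1.0, 0.5])  # target state
mu_c, var_c = np.array([0.2, 0.8]), np.array([0.9, 0.6])  # candidate state

log_likelihood = -12.3   # candidate's log likelihood (placeholder value)
w = 0.1                  # hypothetical interpolation weight
score = log_likelihood - w * symmetric_kld(mu_t, var_t, mu_c, var_c)
print("unit selection score:", score)
```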


International Symposium on Chinese Spoken Language Processing | 2004

A comparative study on various confidence measures in large vocabulary speech recognition

Gang Guo; Chao Huang; Hui Jiang; Ren-Hua Wang

In this paper, we conduct a comparative study of several confidence measures (CMs) for large vocabulary speech recognition. First, we propose a novel high-level CM based on inter-word mutual information (MI). Second, we experimentally investigate several popular low-level CMs, such as word posterior probabilities, N-best counting, and likelihood ratio testing (LRT). Finally, we study a simple linear interpolation strategy to combine the best low-level CM with the best high-level CM. All of these CMs are examined on two large vocabulary ASR tasks, the Switchboard task and a Mandarin dictation task, to verify the recognition errors of baseline recognition systems. Experimental results show that: (1) the proposed MI-based CM greatly surpasses an existing high-level CM based on the LSA technique; (2) among all low-level CMs, word posterior probabilities give the best verification performance; and (3) when combining word posterior probabilities with the MI-based CM, the equal error rate is reduced from 24.4% to 23.9% on the Switchboard task and from 17.5% to 16.2% on the Mandarin dictation task.
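
The combination strategy itself is a one-liner. A sketch, assuming both CMs are normalized to [0, 1]; the weight alpha would be tuned on development data, and 0.8 is purely illustrative.

```python
# Linear interpolation of a low-level CM (word posterior probability)
# with the high-level MI-based CM.
def combined_cm(word_posterior: float, mi_score: float,
                alpha: float = 0.8) -> float:
    """Linearly interpolate a low-level and a high-level CM."""
    return alpha * word_posterior + (1.0 - alpha) * mi_score

# A word with a high posterior but little contextual support:
print(combined_cm(word_posterior=0.9, mi_score=0.2))  # -> 0.76
```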


International Conference on Acoustics, Speech, and Signal Processing | 2008

Automatic mispronunciation detection for Mandarin

Feng Zhang; Chao Huang; Frank K. Soong; Min Chu; Ren-Hua Wang

This paper presents methods to improve the performance of syllable-level mispronunciation detection for Mandarin from two aspects: proposing the scaled log-posterior probability (SLPP) and weighted phone SLPP to obtain a better measure of pronunciation quality, and introducing speaker normalization through speaker adaptive training (SAT) and speaker adaptation through selective maximum likelihood linear regression (SMLLR) to obtain a better statistical model. Experiments on a database consisting of 8000 syllables pronounced by 40 speakers with varied pronunciation proficiency confirm the effectiveness of these strategies, reducing the FAR from 41.1% to 31.4% at 90% FRR and from 36.0% to 16.3% at 95% FRR.
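
As a sketch of what an SLPP-style score looks like (the paper's exact scaling and phone weighting are not reproduced here), the snippet below computes the frame-averaged log posterior of the canonical phone against a set of competing phone models, assuming equal phone priors.

```python
import numpy as np

def slpp_like(frame_loglikes: np.ndarray, canonical: int) -> float:
    """frame_loglikes: (T, P) array of log p(o_t | phone_p)."""
    log_post = (frame_loglikes[:, canonical]
                - np.logaddexp.reduce(frame_loglikes, axis=1))
    return float(np.mean(log_post))

rng = np.random.default_rng(2)
ll = rng.normal(-5.0, 1.0, size=(20, 8))  # 20 frames, 8 phone models
ll[:, 3] += 2.0                           # canonical phone scores best
print("score:", slpp_like(ll, canonical=3))
```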


International Conference on Acoustics, Speech, and Signal Processing | 2008

Minimum generation error linear regression based model adaptation for HMM-based speech synthesis

Long Qin; Yi-Jian Wu; Zhen-Hua Ling; Ren-Hua Wang; Li-Rong Dai

Due to the inconsistency between maximum likelihood (ML) based training and the synthesis application in HMM-based speech synthesis, a minimum generation error (MGE) criterion was previously proposed for HMM training. This paper applies the MGE criterion to model adaptation for HMM-based speech synthesis. We propose an MGE linear regression (MGELR) based model adaptation algorithm, in which the regression matrices used to transform source models to target models are optimized to minimize the generation errors for the input speech data uttered by the target speaker. The proposed MGELR approach was compared with maximum likelihood linear regression (MLLR) based model adaptation. Experimental results indicate that the generation errors were reduced after MGELR-based model adaptation, and a subjective listening test showed that the synthesized speech using MGELR was better than that using MLLR in both discrimination and quality.
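
A toy sketch of the MGELR idea, heavily simplified: instead of updating the means directly, an affine regression transform M of the source means, mu_adapted = M @ [mu; 1], is estimated by gradient descent on the generation error for the target speaker's data. Parameter generation is collapsed to an identity map here, a gross simplification of the real system.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
mu_src = rng.normal(size=d)                   # source-model means
c_target = 1.3 * mu_src + 0.5                 # target speaker's trajectory

M = np.hstack([np.eye(d), np.zeros((d, 1))])  # init: identity transform
ext = np.append(mu_src, 1.0)                  # extended vector [mu; 1]
for step in range(500):
    err = M @ ext - c_target                  # generation error (toy)
    M -= 0.05 * np.outer(err, ext)            # gradient step on 0.5 * ||err||^2
print("residual error:", np.linalg.norm(M @ ext - c_target))
```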


International Conference on Acoustics, Speech, and Signal Processing | 2004

A complexity reduction of ETSI advanced front-end for DSR

Jinyu Li; Bo Liu; Ren-Hua Wang; Li-Rong Dai

In October 2002, the advanced front-end (AFE) for distributed speech recognition (DSR) was standardized by ETSI. To make the AFE feature usable on devices with limited computational resources, we propose a novel approach to improve its computational efficiency. In our new algorithm, the structure of the two-stage mel-warped Wiener filtering algorithm, the main part of the AFE, is modified: a Wiener filter is constructed and applied directly in the mel-warped filter-bank domain. This makes many time-consuming operations of the original algorithm unnecessary, including the recalculation of the power spectrum and the time-domain convolution operations, so a large amount of computation is saved. Experiments show that the new approach substantially reduces the computational load while preserving the excellent performance of the ETSI AFE.
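
A sketch of the core trick, not the actual AFE processing chain: form the Wiener gain per mel band as H = P_speech / (P_speech + P_noise) and apply it directly to the filter-bank energies, with no return to the power spectrum or the time domain. The noise estimate from the leading frames is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_frames, n_bands = 100, 23
fbank = np.abs(rng.normal(1.0, 0.3, size=(n_frames, n_bands)))  # fake energies

noise = fbank[:10].mean(axis=0)           # crude noise estimate
speech = np.maximum(fbank - noise, 1e-3)  # spectral-subtraction style estimate
gain = speech / (speech + noise)          # mel-domain Wiener gain
denoised = gain * fbank                   # applied per band, per frame
print("mean gain:", gain.mean())
```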


International Conference on Acoustics, Speech, and Signal Processing | 2005

Optimal clustering and non-uniform allocation of Gaussian kernels in scalar dimension for HMM compression [speech recognition applications]

Xiao-Bing Li; Frank K. Soong; Tor Andre Myrvoll; Ren-Hua Wang

We propose an algorithm for the optimal clustering and non-uniform allocation of Gaussian kernels in the scalar (feature) dimension to compress complex, Gaussian mixture-based, continuous density HMMs into computationally efficient, small-footprint models. The symmetric Kullback-Leibler divergence (KLD) is used as the universal distortion measure and is minimized in both the kernel clustering and allocation procedures. The algorithm was tested on the Resource Management (RM) database. The original context-dependent HMMs can be compressed to any resolution, measured by the total number of clustered scalar kernel components. Good trade-offs between recognition performance and model complexity were obtained: the HMM can be compressed to 15-20% of the original model size, requiring only 1-5% of the multiplication/division operations, with almost negligible degradation in recognition performance.
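
A much-simplified sketch of clustering scalar Gaussian kernels under symmetric KLD: assign each kernel to the nearest centroid Gaussian, then re-estimate each centroid as the moment-matched merge of its members. Kernel counts, initialization, and the stopping rule are invented, and the paper's non-uniform allocation step is omitted.

```python
import numpy as np

def skld(m1, v1, m2, v2):
    """Symmetric KLD between two scalar Gaussians."""
    kl = lambda ma, va, mb, vb: 0.5 * (np.log(vb / va)
                                       + (va + (ma - mb) ** 2) / vb - 1.0)
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

rng = np.random.default_rng(5)
means = rng.normal(0.0, 3.0, 100)              # 100 scalar kernels
vars_ = rng.uniform(0.5, 2.0, 100)
c_m, c_v = means[:8].copy(), vars_[:8].copy()  # 8 initial centroids

for _ in range(10):
    d = np.array([[skld(m, v, cm, cv) for cm, cv in zip(c_m, c_v)]
                  for m, v in zip(means, vars_)])
    assign = d.argmin(axis=1)
    for k in range(len(c_m)):
        sel = assign == k
        if sel.any():                          # moment-matched merge
            c_m[k] = means[sel].mean()
            c_v[k] = (vars_[sel] + (means[sel] - c_m[k]) ** 2).mean()
print("cluster sizes:", np.bincount(assign, minlength=len(c_m)))
```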


International Symposium on Chinese Spoken Language Processing | 2004

A superposed prosodic model for Chinese text-to-speech synthesis

Gao Peng Chen; Gérard Bailly; Qingfeng Liu; Ren-Hua Wang

The paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tone, stress, prosodic boundaries, and part of speech. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.), which is represented by a neural network; the observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. The model handles the concatenation of neighboring units well. The RMSE of F0 prediction is 2.34 st (referenced to 200 Hz) and the correlation is 0.86. Perceptual experiments show that the predicted prosody is appropriate and fluent.
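
The superposition principle can be illustrated in a few lines. In the sketch below, each prosodic function contributes an elementary contour and the predicted F0 is their sum; fixed closed-form contours stand in for the trained neural networks, and every shape and scale is invented.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100)                   # normalized utterance time
tone = 2.0 * np.sin(2 * np.pi * 4 * t)           # fast, tone-like movements
boundary = -3.0 * t                              # slow declination / final fall
stress = 1.5 * np.exp(-((t - 0.5) ** 2) / 0.01)  # local stress bump

f0_st = tone + boundary + stress                 # superposed contour (semitones)
print("F0 range (st): %.2f to %.2f" % (f0_st.min(), f0_st.max()))
```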

Collaboration


Dive into Ren-Hua Wang's collaborations.

Top Co-Authors

Yu Hu (University of Science and Technology of China)
Zhen-Hua Ling (University of Science and Technology of China)
Li-Rong Dai (University of Science and Technology of China)
Guoping Hu (University of Science and Technology of China)
Xiaoru Wu (University of Science and Technology of China)
Si Wei (University of Science and Technology of China)