Publication


Featured research published by Yuan-Fu Liao.


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Modeling of speaking rate influences on Mandarin speech prosody and its application to speaking rate-controlled TTS

Sin-Horng Chen; Chiao-Hua Hsieh; Chen-Yu Chiang; Hsi-Chun Hsiao; Yih-Ru Wang; Yuan-Fu Liao; Hsiu-Min Yu

A new data-driven approach is proposed for building a speaking rate-dependent hierarchical prosodic model (SR-HPM) directly from a large prosody-unlabeled speech database containing utterances of various speaking rates, in order to describe the influences of speaking rate on Mandarin speech prosody. It extends the existing HPM, which contains 12 sub-models describing various relationships among prosodic-acoustic features of the speech signal, linguistic features of the associated text, and prosodic tags representing the prosodic structure of speech. Two main modifications are suggested. One is designing proper normalization functions from the statistics of the whole database to compensate for the influences of speaking rate on all prosodic-acoustic features. The other is modifying the HPM training so that its parameters are speaking-rate dependent. Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM, together with these feature normalization functions, interpreted the effects of speaking rate on Mandarin speech prosody very well. An application of the SR-HPM to the design and implementation of a speaking rate-controlled Mandarin TTS system is demonstrated. The system can generate natural synthetic speech for any given speaking rate in a wide range of 3.4-6.8 syllables/sec. Two subjective tests, a MOS test and a preference test, were conducted to compare the proposed system with the popular HTS system. The MOS scores of the proposed system were in the range of 3.58-3.83 for eight different speaking rates, versus 3.09-3.43 for HTS. In addition, the proposed system had higher preference scores (49.8%-79.6%) than HTS (9.8%-30.7%). This confirmed the effectiveness of the speaking rate control method of the proposed TTS system.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Prosody modeling and eigen-prosody analysis for robust speaker recognition

Zi-He Chen; Yuan-Fu Liao; Yau-Tarng Juang

Unseen handset mismatch and limited training/test data are the major sources of performance degradation for speaker identification in telecommunication environments. In this paper, vector quantization (VQ)-based prosody modeling and eigen-prosody analysis (EPA) are integrated to transform the closed-set speaker identification problem into a task similar to full-text document retrieval. The prosody modeling labels the prosodic feature contours of a speaker's speech into sequences of prosody states. EPA then constructs a compact eigen-prosody space to represent the constellation of speakers. Furthermore, EPA is fused with a lower-level a priori knowledge interpolation (AKI) handset distortion compensator so that the two complement each other. Experimental results on the HTIMIT database showed relative error rate reductions of about 41.0% and 32.8% for seen and unseen handsets, respectively, compared with the maximum a posteriori-adapted Gaussian mixture model/cepstral mean subtraction (MAP-GMM/CMS) baseline.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Latent Prosody Analysis for Robust Speaker Identification

Yuan-Fu Liao; Zi-He Chen; Yau-Tarng Juang

Handsets that are not seen in the training phase (unseen handsets) are significant sources of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, a novel latent prosody analysis (LPA) approach is proposed to automatically extract the most discriminative prosodic cues for assisting conventional spectral feature-based SID. The concept of the LPA approach is to transform the SID problem into a full-text document retrieval-like task via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech, handset-TIMIT (HTIMIT) database demonstrated that fusing the LPA prosodic feature-based SID system with the maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID system outperformed both the pitch and energy Gaussian mixture model (Pitch-GMM) and the bigram of the prosodic state (Bigram) counterparts, both when all handsets are counted and when only unseen handsets are counted.
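The three steps named in this abstract (tokenization, latent analysis, retrieval) resemble a latent semantic analysis pipeline. The following is only a minimal sketch of that general idea, not the authors' method: the prosody-state sequences, the number of states `N_STATES`, the bag-of-states counts, the truncated SVD, and the cosine scoring are all assumptions made for illustration.

```python
import numpy as np

N_STATES = 4  # hypothetical number of prosody states (the "vocabulary")

def count_vector(tokens, n_states=N_STATES):
    """Bag-of-states histogram for one tokenized prosody contour."""
    return np.bincount(np.asarray(tokens), minlength=n_states).astype(float)

def fit_latent_space(train_tokens, rank):
    """Truncated SVD of the state-by-speaker count matrix (LSA-style)."""
    A = np.stack([count_vector(t) for t in train_tokens], axis=1)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :rank]        # basis of the latent space
    speakers = U_k.T @ A     # latent coordinates, shape (rank, n_speakers)
    return U_k, speakers

def retrieve(query_tokens, U_k, speakers):
    """Project a test utterance and rank speakers by cosine similarity."""
    q = U_k.T @ count_vector(query_tokens)
    norms = np.linalg.norm(speakers, axis=0) * np.linalg.norm(q) + 1e-12
    sims = (speakers.T @ q) / norms
    return int(np.argmax(sims)), sims

# Toy data (invented): one token sequence per enrolled speaker.
train_tokens = [
    [0, 0, 0, 1, 0, 0],  # speaker 0 mostly uses state 0
    [2, 2, 1, 2, 2, 2],  # speaker 1 mostly uses state 2
    [3, 3, 3, 3, 1, 3],  # speaker 2 mostly uses state 3
]
U_k, speakers = fit_latent_space(train_tokens, rank=3)
best, sims = retrieve([2, 2, 2, 1], U_k, speakers)
print("retrieved speaker:", best)
```

In this toy setup the query is dominated by state 2, so it is matched to speaker 1. A real system would derive the tokens from pitch and energy contours and would likely count longer token patterns than single states.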


International Conference on Acoustics, Speech, and Signal Processing | 2010

Subband minimum classification error beamforming for speech recognition in reverberant environments

Yuan-Fu Liao; I-Yun Xu

In this paper, subband minimum classification error beamforming (S-MCEBEAM) is investigated as an alternative to the subband likelihood maximizing beamforming (S-LIMABEAM) proposed by Seltzer, in order to closely integrate the microphone array and the speech recognizer for robust speech recognition in reverberant environments. The main idea is to apply the minimum classification error (MCE) criterion so as to directly match the goal of automatic speech recognition (ASR) and to simultaneously adjust both the beamformer parameters and the recognizer's acoustic models. Experimental results on a Mandarin reverberation corpus, created from a Mandarin spontaneous speech corpus (TCC300) and the RWCP sound scene database, show that S-MCEBEAM leads to better recognition results than S-LIMABEAM in reverberant environments.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Latent Prosody Model of Continuous Mandarin Speech

Chen-Yu Chiang; Xiao-Dong Wang; Yuan-Fu Liao; Yih-Ru Wang; Sin-Horng Chen; Keikichi Hirose

The major difficulty of prosody modeling and automatic tone recognition of continuous Mandarin speech is the complex interaction of tones and prosody/intonation on F0 contours. In this study, we propose a latent prosody model (LPM) that aims to jointly model the effects of tone and prosody state on F0. The purposes are twofold: (1) automatic prosody state labeling and (2) improving tone recognition accuracy. The basic idea is to introduce latent prosody state variables into an additive statistical model of F0 that already accounts for the affecting factors of tone and speaker. Experiments on the Tree-Bank corpus showed that LPM not only gave meaningful prosody state labeling results but also improved the average tone recognition rate from 80.86% for a multi-layer perceptron (MLP) baseline to 82.55%.


International Symposium on Chinese Spoken Language Processing | 2006

Distributed speech recognition of Mandarin digits string

Yih-Ru Wang; Bo-Xuan Lu; Yuan-Fu Liao; Sin-Horng Chen

In this paper, the performance of the pitch detection algorithm in the ETSI ES-202-212 XAFE standard is evaluated on a Mandarin digit string recognition task. Experimental results showed that the performance of the pitch detection algorithm degraded seriously when the SNR of the speech signal was lower than 10 dB. As a result, the recognizer using pitch information performs worse than the original recognizer without pitch information in low-SNR environments. A modification of the pitch detection algorithm is therefore proposed to improve pitch detection in low-SNR environments. The recognition performance can be improved for most SNR levels by integrating the recognizers with and without pitch information. Overall recognition rates of 82.1% and 86.8% were achieved for the clean and multi-condition training cases, respectively.


2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) | 2016

A preliminary study on cross-language knowledge transfer for low-resource Taiwanese Mandarin ASR

Chien-Ting Lin; Yih-Ru Wang; Sin-Horng Chen; Yuan-Fu Liao

Deep neural networks (DNNs) are the state-of-the-art automatic speech recognition (ASR) technique nowadays. However, the key to their success is that a large amount of speech data in the target language is required to train DNNs well. Unfortunately, only a few small Taiwanese Mandarin speech corpora are available in Taiwan. Therefore, in this paper, two cross-language knowledge transfer approaches are evaluated for building a high-performance Taiwanese Mandarin ASR system: (1) a borrowed-hidden-layer method and (2) a shared-hidden-layer method. Experimental results show that (1) the shared-hidden-layer method achieved the best performance and (2) the system is robust to different speakers and phones.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Maximum intelligibility-based close-loop speech synthesis framework for noisy environments

Yuan-Fu Liao; Ming-Long Wu; Jia-Chi Lin

This paper proposes a maximum intelligibility (MI)-based close-loop speech synthesis framework to actively compensate for the distortion caused by background noise. In this framework, an extra environmental noise-sensing microphone and an automatic speech recognition (ASR) module are used to approximate a subjective intelligibility measure. The hidden Markov model-based speech synthesis system (HTS) is then adjusted online using the MI-based model adaptation algorithm. Experimental results of two subjective listening tests in noisy environments show that the proposed approach obtains 64% of the votes in an A/B preference test and helps the participants reduce word dictation errors by a relative 26% compared with an HTS baseline.


International Symposium on Chinese Spoken Language Processing | 2008

Reference Eigen-Environment and Speaker Weighting for Robust Speech Recognition

Yuan-Fu Liao; Hung-Hsiang Fang; Chih-Min Yang

In this paper, a reference eigen-environment and speaker weighting (RESW) method is proposed for online HMM adaptation. RESW establishes multiple eigen-MLLR subspaces as the set of a priori knowledge according to certain affecting factors, such as noise type, SNR, and speaker gender. It then projects an input test utterance simultaneously into the set of eigen-subspaces and optimally synthesizes a set of suitable HMMs. The proposed RESW was evaluated on the Aurora 2 multi-condition training task. Experimental results showed that an average word error rate (WER) of 6.11% was achieved. RESW outperformed not only the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%) and histogram equalization (HEQ, 8.66%) approaches, as well as the non-blind reference model weighting (RMW, 7.29%) and Eigen-MLLR (6.14%) approaches.
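To put the WER figures quoted in this abstract on a common scale, the short sketch below computes the relative WER reduction of RESW against each listed system. The numbers come from the abstract; the helper name `relative_reduction` and the choice to report relative rather than absolute reduction are assumptions made for illustration.

```python
def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction (in percent) of a system over a baseline."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

RESW_WER = 6.11  # average WER (%) reported for RESW

# Average WERs (%) of the comparison systems, as quoted in the abstract.
baselines = {
    "Multi-Con.": 13.72,
    "ETSI-Adv.": 8.65,
    "HEQ": 8.66,
    "RMW": 7.29,
    "Eigen-MLLR": 6.14,
}

reductions = {
    name: relative_reduction(wer, RESW_WER) for name, wer in baselines.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```

Read this way, the gain over the multi-condition baseline is roughly 55% relative, while the gain over Eigen-MLLR is under 1% relative.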


Electronics Letters | 2004

Eigen-prosody analysis for robust speaker recognition under mismatch handset environment

Zi-He Chen; Yuan-Fu Liao; Yau-Tarng Juang

Collaboration


Dive into Yuan-Fu Liao's collaboration.

Top Co-Authors

Yih-Ru Wang

National Chiao Tung University

Yau-Tarng Juang

National Central University

Zi-He Chen

National Central University

Sin-Horng Chen

National Chiao Tung University

Chen-Yu Chiang

National Chiao Tung University

Jyh-Her Yang

National Chiao Tung University

Chiao-Hua Hsieh

National Chiao Tung University

Hsi-Chun Hsiao

National Chiao Tung University

Ming-Chieh Liu

National Chiao Tung University
