Publication


Featured research published by Wenhu Wu.


international symposium on chinese spoken language processing | 2006

Pitch mean based frequency warping

Jian Liu; Thomas Fang Zheng; Wenhu Wu

In this paper, a novel pitch mean based frequency warping (PMFW) method is proposed to reduce the pitch variability in speech signals at the front-end of speech recognition. The warping factors used in this process are calculated from the average pitch of a speech segment. Two functions describing the relation between the frequency warping factor and the pitch mean are defined and compared. We use a simple method to perform frequency warping on the Mel-filter bank frequencies based on different warping factors. To solve the problem of bandwidth mismatch between the original and the warped spectra, a Mel-filter selection strategy is proposed. Finally, the PMFW mel-frequency cepstral coefficient (MFCC) is extracted based on the regular MFCC with several modifications. Experimental results show that the new PMFW MFCCs are more distinctive than the regular MFCCs.
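The idea of warping Mel-filter bank frequencies by a pitch-dependent factor can be sketched as follows. This is only an illustration: the abstract does not give the two warping functions, so the linear rule `alpha = pitch_mean / ref_pitch`, the reference pitch of 150 Hz, and the function names are all assumptions rather than the authors' method. Only the Hz-to-Mel conversion is the standard formula.

```python
import math

def hz_to_mel(f):
    """Standard Hz-to-Mel conversion."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]

def warp_centers(centers, pitch_mean, ref_pitch=150.0):
    """Scale each center frequency by a pitch-dependent factor.

    The linear factor below is a hypothetical stand-in for the
    warping functions compared in the paper.
    """
    alpha = pitch_mean / ref_pitch
    return [alpha * c for c in centers]

centers = mel_centers(4, 4000.0)
warped = warp_centers(centers, pitch_mean=300.0)
```

With this assumed rule, a segment whose mean pitch is twice the reference has every filter center doubled, which shows why some warped filters can fall outside the original bandwidth.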


Speech Communication | 2006

A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification

Zhenyu Xiong; Thomas Fang Zheng; Zhanjiang Song; Frank K. Soong; Wenhu Wu

We propose a tree-based kernel selection (TBKS) algorithm as a computationally efficient approach to Gaussian mixture model–universal background model (GMM–UBM) based speaker identification. All Gaussian components in the universal background model are first clustered hierarchically into a tree, and the corresponding acoustic space is mapped into structurally partitioned regions. When identifying a speaker, each test input feature vector is scored against only a small subset of all Gaussian components. As a result of this TBKS process, computational complexity can be significantly reduced. We further improve the efficiency of the proposed system by applying a previously proposed observation reordering based pruning (ORBP) to screen out unlikely candidate speakers. The approach is evaluated on a speech database of 1031 speakers, in both clean and noisy conditions. The experimental results show that by integrating TBKS and ORBP we can speed up the computation by a factor of 15.8 with only a very slight degradation of identification performance, i.e., a relative error rate increase of 1%, compared with a baseline GMM–UBM system. The improved search efficiency is also robust to additive noise.
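The core idea, scoring each feature vector against only a subset of Gaussians chosen through a tree, can be sketched as below. This is a minimal sketch under stated assumptions: the two-level tree, the toy one-dimensional parameters, and the names `log_gauss` and `select_and_score` are illustrative and not taken from the paper, which does not describe its tree depth or clustering procedure in the abstract.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def select_and_score(x, tree):
    """Pick the best-matching cluster node, then score only its leaves."""
    best_node = max(tree, key=lambda node: log_gauss(x, node["mean"], node["var"]))
    return {
        leaf["id"]: log_gauss(x, leaf["mean"], leaf["var"])
        for leaf in best_node["leaves"]
    }

# Hypothetical two-level tree: each node summarizes the leaf Gaussians below it.
toy_tree = [
    {"mean": [0.0], "var": [4.0], "leaves": [
        {"id": 0, "mean": [-1.0], "var": [1.0]},
        {"id": 1, "mean": [1.0], "var": [1.0]},
    ]},
    {"mean": [10.0], "var": [4.0], "leaves": [
        {"id": 2, "mean": [9.0], "var": [1.0]},
        {"id": 3, "mean": [11.0], "var": [1.0]},
    ]},
]

scores = select_and_score([0.8], toy_tree)
best_id = max(scores, key=scores.get)
```

Here only two of the four leaf Gaussians are evaluated for the input. In a large UBM the saving grows with the number of components per node, at the cost of occasionally missing a high-scoring Gaussian in another branch.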


international conference on acoustics, speech, and signal processing | 2007

Session Variability Subspace Projection Based Model Compensation for Speaker Verification

Jing Deng; Thomas Fang Zheng; Wenhu Wu

In this paper, a session variability subspace projection (SVSP) based model compensation method for speaker verification is proposed. During the training phase, the session variability is removed from speaker models by projection, while during the testing phase, the session variability in a test utterance is used to compensate speaker models. Finally, the compensated speaker models and the UBM are used to recognize the identity of the test utterance. Compared with the conventional GMM-UBM system, the relative equal error rate reduction of SVSP is 16.2% on the NIST 2006 single-side one-conversation training, single-side one-conversation test condition.
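Removing a subspace component by projection can be illustrated with a generic orthogonal projection. This is a sketch only: the abstract does not specify how the session subspace is estimated or how the projection is applied to GMM parameters, so the orthonormal `session_basis`, the toy `supervector`, and the function name `project_out` are assumptions.

```python
def project_out(vector, basis):
    """Remove from `vector` its component in the span of `basis`.

    `basis` must be a list of orthonormal vectors (an assumption of
    this sketch); the result is x - sum_k (u_k . x) u_k.
    """
    result = list(vector)
    for u in basis:
        coeff = sum(v * ui for v, ui in zip(vector, u))
        result = [r - coeff * ui for r, ui in zip(result, u)]
    return result

# Toy example: the "session" direction is the first coordinate axis.
supervector = [3.0, 4.0, 5.0]
session_basis = [[1.0, 0.0, 0.0]]
cleaned = project_out(supervector, session_basis)
```

After projection, `cleaned` has no component along the assumed session direction, while the remaining coordinates are unchanged.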


international symposium on chinese spoken language processing | 2006

State-dependent phoneme-based model merging for dialectal chinese speech recognition

Linquan Liu; Thomas Fang Zheng; Wenhu Wu

To build a dialectal Chinese speech recognizer from a standard Chinese speech recognizer using only a small amount of dialectal Chinese speech, a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated. In this method, a tied state of standard triphone(s) is merged with a state of the dialectal monophone that is identical to the central phoneme of the triphone(s). The proposed method performs well; however, it introduces a Gaussian mixture expansion problem. To deal with this, an acoustic model distance measure, named the pseudo-divergence based distance measure, is proposed based on the difference measurement of Gaussian mixture models and then used to reduce the model size with almost no performance degradation for dialectal speech. With only 40 minutes of Shanghai-dialectal Chinese speech, the proposed SDPBMM achieves a significant absolute syllable error rate (SER) reduction of 5.9% for dialectal Chinese and almost no performance degradation for standard Chinese. In combination with a certain existing adaptation method, a further absolute SER reduction of 1.9% can be achieved.


international conference on acoustics, speech, and signal processing | 2005

Combining selection tree with observation reordering pruning for efficient speaker identification using GMM-UBM

Zhenyu Xiong; Thomas Fang Zheng; Zhanjiang Song; Wenhu Wu

In this paper, a new method of reducing the computational load for Gaussian mixture model universal background model (GMM-UBM) based speaker identification is proposed. In order to speed up the selection of N-best Gaussian mixtures in a UBM, a selection tree (ST) structure and its relevant operations are proposed. Combined with the existing observation reordering pruning (ORP) method, which was proposed for rapid pruning of unlikely speaker model candidates, the proposed method achieves a much larger computation reduction factor than either method alone. Experimental results show that a GMM-UBM system used in conjunction with ST and ORP can speed up the computation by a factor of about 16 with an error rate increase of only about 1% compared with a baseline GMM-UBM system.


international conference natural language processing | 2005

Language model adaptation based on the classification of a trigram's language style feature

Qi Liang; Thomas Fang Zheng; Mingxing Xu; Wenhu Wu

In this paper, a method for adapting the language style of a language model is proposed based on the differences between spoken and written language. Several interpolation methods based on trigram counts are used for adaptation. An interpolation method considering Katz smoothing computes weights according to the confidence score of a trigram. An adaptation method based on the classification of a trigram's style feature computes weights dynamically according to the trigram's language style tendency, and several weight generation functions are proposed. Experiments for spoken language on Chinese corpora show that these methods, especially the method considering both a trigram's confidence and style tendency, can achieve a reduction in the Chinese character error rate for pinyin-to-character conversion.
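Count interpolation with a per-trigram weight might look like the sketch below. The abstract does not give the weight generation functions, so the `style_tendency` definition here (the spoken share of a trigram's relative frequency), the toy counts, and all function names are hypothetical and serve only to show how a dynamic weight enters the interpolation.

```python
def style_tendency(trigram, spoken, written):
    """Share of the trigram's relative frequency that comes from the spoken corpus.

    Hypothetical weight function; returns 0.5 for trigrams unseen in both corpora.
    """
    s = spoken.get(trigram, 0) / sum(spoken.values())
    w = written.get(trigram, 0) / sum(written.values())
    return s / (s + w) if s + w > 0 else 0.5

def interpolated_count(trigram, spoken, written):
    """Mix spoken and written counts using the trigram's own style weight."""
    lam = style_tendency(trigram, spoken, written)
    return lam * spoken.get(trigram, 0) + (1.0 - lam) * written.get(trigram, 0)

# Toy trigram counts from a spoken and a written corpus.
spoken_counts = {("a", "b", "c"): 3, ("x", "y", "z"): 1}
written_counts = {("a", "b", "c"): 1, ("x", "y", "z"): 3}

lam_abc = style_tendency(("a", "b", "c"), spoken_counts, written_counts)
count_abc = interpolated_count(("a", "b", "c"), spoken_counts, written_counts)
```

In this toy setup a trigram that leans toward the spoken corpus receives a weight above 0.5, so its adapted count stays closer to the spoken count than a fixed global weight would allow.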


international symposium on chinese spoken language processing | 2006

UBM based speaker segmentation and clustering for 2-speaker detection

Jing Deng; Thomas Fang Zheng; Wenhu Wu

In this paper, a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM) and a speaker clustering method based on the difference of log-likelihood scores between two speaker models are proposed. During the segmentation process, the LLRS between two adjacent speech segments over the UBM is used as a distance measure, while during the clustering process, the difference of log-likelihood scores between two speaker models is used as a speaker classification criterion. A complete system for the NIST 2002 2-speaker task is presented using the methods mentioned above. Experimental results on the NIST 2002 Switchboard Cellular speaker segmentation corpus, 1-speaker evaluation corpus and 2-speaker evaluation corpus show the potential of the proposed algorithms.


international conference on machine learning and cybernetics | 2005

Using predictive differential power spectrum and subband mel-spectrum centroid for robust speaker recognition in stationary noises

Jing Deng; Thomas Fang Zheng; Zhanjiang Song; Jian Liu; Wenhu Wu

In state-of-the-art speaker recognition systems, mel-scaled frequency cepstral coefficients (MFCCs) are perhaps the most widely used front-ends. One of the major issues with MFCCs is that they are very sensitive to additive noise. In this paper, two methods for robust speech front-ends are proposed. One is to use a predictive difference function to calculate the differential power spectrum (DPS) as precisely as possible in order to restore the power spectrum of the original clean speech. The spectrum in the traditional MFCC calculation is then replaced with this estimated spectrum, and the features extracted from it are referred to as predictive differential power spectrum (PDPS) based cepstral coefficients (PDPSCCs). The other is to incorporate subband power information with subband mel-spectrum centroid information after the outputs of traditional mel-filter banks. The features extracted in this way are referred to as subband mel-spectrum centroid (SMSC) based cepstral coefficients (SMSCCCs). PDPSCCs and SMSCCCs are compared with cepstral mean subtraction (CMS) based, spectral subtraction (SS) based, and DPS based cepstral coefficients at different noise levels. Experimental results show that the PDPSCCs and SMSCCCs are more effective in enhancing the robustness of a speaker recognition system, where with the CMS method the average error rate can be reduced by 12.2% in comparison with DPS based cepstral coefficients.
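A subband centroid is, in its generic form, the power-weighted mean frequency within a band. The sketch below computes it over bin indices of a plain power spectrum. This is an assumption-laden illustration: the paper works on mel-filter bank outputs and combines the centroid with subband power, details the abstract does not spell out, and the function name `subband_centroids` and the equal-width split are hypothetical.

```python
def subband_centroids(power, n_subbands):
    """Return the power-weighted mean bin index of each equal-width subband.

    An empty (zero-power) subband falls back to its midpoint.
    """
    size = len(power) // n_subbands
    centroids = []
    for b in range(n_subbands):
        lo, hi = b * size, (b + 1) * size
        total = sum(power[lo:hi])
        weighted = sum(k * p for k, p in zip(range(lo, hi), power[lo:hi]))
        centroids.append(weighted / total if total > 0 else (lo + hi - 1) / 2)
    return centroids

# Toy 8-bin power spectrum split into two subbands of four bins each.
spectrum = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0]
centroids = subband_centroids(spectrum, 2)
```

Because the centroid depends on where the power is concentrated rather than on its absolute level, it tends to shift less than the band energy when roughly flat noise is added, which is the usual motivation for centroid-type features.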


conference of the international speech communication association | 1999

EASYTALK: A LARGE-VOCABULARY SPEAKER-INDEPENDENT CHINESE DICTATION MACHINE

Fang Zheng; Zhanjiang Song; Mingxing Xu; Jian Wu; Yinfei Huang; Wenhu Wu; Cheng Bi


conference of the international speech communication association | 2005

Real-time pitch tracking based on combined SMDSF.

Jian Liu; Thomas Fang Zheng; Jing Deng; Wenhu Wu
