Publication


Featured research published by Xiongwei Zhang.


international conference on wireless communications and signal processing | 2011

An improved spectral subtraction speech enhancement algorithm under non-stationary noise

Luying Sui; Xiongwei Zhang; Jianjun Huang; Bin Zhou

The main limitation of the conventional spectral subtraction algorithm is that it assumes stationary noise, whereas most noise types encountered in the real world are non-stationary. Moreover, the method requires a voice activity detector, which may not work well at very low signal-to-noise ratios. In this paper, we propose an improved spectral subtraction algorithm for speech enhancement in non-stationary noise. The proposed algorithm has two steps. First, prior information about the speech and noise spectra is modeled with an autoregressive model, and speech and noise codebooks are constructed. Second, the speech and noise are estimated in each time frame by solving a log-spectral distortion minimization problem. Consequently, the algorithm can adapt to varying noise levels even while speech is present. In addition, autoregressive modeling yields smooth frequency spectra and thus reduces musical noise. Experimental results show that the proposed algorithm outperforms both the conventional and the multiband spectral subtraction algorithms.
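For context, the conventional baseline that this abstract criticizes can be sketched in a few lines. The following is a minimal illustration of power spectral subtraction with a spectral floor, not the codebook-based method of the paper; the function name and the default values of `alpha` and `beta` are assumptions chosen for the example.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Conventional power spectral subtraction for a single frame.

    noisy_power: power spectrum of the noisy frame, one value per bin
    noise_power: estimated noise power spectrum, one value per bin
    alpha: over-subtraction factor (assumed default)
    beta: spectral floor as a fraction of the noisy power (assumed default)
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        # Subtract the noise estimate, but never drop below the floor.
        enhanced.append(max(y - alpha * n, beta * y))
    return enhanced


frame = [4.0, 1.0, 0.5, 9.0]
noise = [1.0, 1.0, 1.0, 1.0]
clean_estimate = spectral_subtraction(frame, noise)
```

Because `noise_power` is a single fixed estimate, usually averaged over speech pauses found by a voice activity detector, this baseline cannot follow noise that changes while speech is present. That is the gap the per-frame estimation described in the abstract is meant to close.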


international conference on wireless communications and signal processing | 2010

Research on speaker feature dimension reduction based on CCA and PCA

Yuhuan Zhou; Xiongwei Zhang; Jinming Wang; Yong Gong

A method for reducing feature dimension based on CCA and PCA is proposed. First, CCA is used to fuse the LPC features, which are based on a channel model, with the MFCC features, which are based on an auditory model, to improve the relevance of the two feature types. Second, PCA is used to remove the remaining redundant features and reduce the dimension of the effective features. To verify the method, experiments are run on a GMM speaker recognition system with 16-dimensional LPC and 13-dimensional MFCC as speaker features. Compared with traditional dimension reduction methods such as CCA alone, PCA alone and manual methods, the experiments show that the CCA+PCA method further improves the effect of dimension reduction.
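The PCA stage of such a pipeline can be illustrated with a small standard-library sketch. It finds only the first principal component by power iteration and projects feature vectors onto it; the CCA fusion step is not shown, and the function names and the tiny two-dimensional data set are assumptions made for the example.

```python
import math


def first_principal_component(rows, iters=100):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # Power iteration: repeatedly apply the covariance and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("features have no variance along the start vector")
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Reduce one feature vector to a single coordinate."""
    return sum((x - m) * c for x, m, c in zip(row, mean, direction))


features = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
mean, direction = first_principal_component(features)
reduced = [project(r, mean, direction) for r in features]
```

In the setting of the abstract the input rows would be the fused LPC and MFCC features rather than this toy data, and more than one component would be kept.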


international conference on wireless communications and signal processing | 2010

An improved wavelet-based speech enhancement method using adaptive block thresholding

Bin Zhou; Xiongwei Zhang; Xia Zou

An improved wavelet-based speech enhancement method using adaptive block thresholding is proposed. Adaptive block thresholding was introduced by G. Yu [1] to eliminate musical noise artifacts, but its performance on unvoiced speech is not satisfactory. To solve this problem, a voiced/unvoiced decision is applied and a small block size is set for unvoiced speech. Furthermore, the threshold is adjusted by considering both the intrascale and interscale dependencies of the wavelet coefficients. Experimental results show that the improved method achieves better denoising performance than the method proposed by G. Yu, especially for unvoiced speech.
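The idea of block thresholding with a voicing-dependent block size can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a one-level Haar transform, a block attenuation rule of the form max(0, 1 - lam * noise_var / mean block energy), and block sizes of 8 and 2 for voiced and unvoiced frames. The block sizes, the value of `lam` and all function names are assumptions, and the intrascale and interscale adjustments mentioned in the abstract are not modeled.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform; len(signal) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def block_threshold(coeffs, noise_var, block_size, lam=2.5):
    """Attenuate each block of coefficients by one shared gain."""
    out = []
    for start in range(0, len(coeffs), block_size):
        block = coeffs[start:start + block_size]
        energy = sum(c * c for c in block) / len(block)
        # Blocks whose mean energy is close to the noise level are zeroed.
        gain = max(0.0, 1.0 - lam * noise_var / energy) if energy > 0 else 0.0
        out.extend(c * gain for c in block)
    return out


def denoise(signal, noise_var, voiced, lam=2.5):
    """Threshold the detail band, using a smaller block for unvoiced frames."""
    block_size = 8 if voiced else 2  # assumed sizes, for illustration only
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, block_threshold(detail, noise_var, block_size, lam))
```

A smaller block lets the gain follow the short, noise-like bursts of unvoiced speech more closely, at the cost of a noisier energy estimate per block.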


international conference on computer modeling and simulation | 2010

Research on Speaker Recognition Based on Multifractal Spectrum Feature

Yuhuan Zhou; Jinming Wang; Xiongwei Zhang

In this paper, a new nonlinear feature extraction method based on the wavelet transform modulus-maxima method (WTMM) is proposed, which greatly facilitates extraction of the multifractal spectrum feature (MSF) from speech signals. Combining the MSF with traditional linear features clearly improves the performance of a speaker recognition system. Experimental results show that combining a 6-dimensional MSF with LPC increases recognition accuracy by 6.4 percentage points, and that combining a 6-dimensional MSF with MFCC and LPC increases recognition accuracy by 1.6 percentage points, reaching 98.8% in short-speech (2-second) speaker recognition.


international conference on systems | 2012

Speech enhancement based on sparse nonnegative matrix factorization with priors

Luying Sui; Xiongwei Zhang; Jianjun Huang; Gaihua Zhao; Yan Yang

A speech enhancement method based on sparse nonnegative matrix factorization with noise priors is proposed for speech contaminated by non-stationary noise. The proposed algorithm has two steps. First, prior information about the noise spectrum is modeled with sparse nonnegative matrix factorization to construct a noise dictionary, and the statistical information of the noise is estimated to serve as the priors. Second, the spectrum of the noisy speech is analyzed with sparse nonnegative matrix factorization; the noise dictionary, the priors and an iterative formulation are then combined to estimate the dictionary and coding matrix of the speech and to reconstruct the enhanced speech. Experimental results show that the proposed method produces better speech quality than multiband spectral subtraction, nonnegative matrix factorization and nonnegative sparse coding.
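A much simplified version of dictionary-based enhancement can be sketched with the standard library alone. In this illustration both the speech and noise dictionaries are assumed to be given, only the coding matrix is estimated (with the usual multiplicative update for a Euclidean cost plus an L1 penalty `mu`), and the speech is recovered with a Wiener-style mask. The paper estimates the speech dictionary iteratively and uses noise priors, which this sketch does not reproduce; all names and default values are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sparse_codes(v, w, mu=0.1, iters=200, eps=1e-12):
    """Estimate H >= 0 in V ~= W H with an L1 sparsity penalty mu on H."""
    k, n = len(w[0]), len(v[0])
    h = [[1.0] * n for _ in range(k)]
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    for _ in range(iters):
        wtwh = matmul(wtw, h)
        # Multiplicative update keeps every entry of H nonnegative.
        h = [[h[i][j] * wtv[i][j] / (wtwh[i][j] + mu + eps) for j in range(n)]
             for i in range(k)]
    return h


def enhance(v, w_speech, w_noise, mu=0.1):
    """Separate speech from a noisy magnitude spectrogram v (bins x frames)."""
    # Concatenate the dictionaries column-wise: W = [W_speech, W_noise].
    w = [rs + rn for rs, rn in zip(w_speech, w_noise)]
    h = sparse_codes(v, w, mu)
    ks = len(w_speech[0])
    speech = matmul(w_speech, h[:ks])
    total = matmul(w, h)
    # Wiener-style mask: keep the fraction of each bin explained by speech.
    return [[v[f][t] * speech[f][t] / (total[f][t] + 1e-12)
             for t in range(len(v[0]))] for f in range(len(v))]
```

With a noise dictionary learned in advance, the coding matrix can track changes in noise level from frame to frame, which is what makes this family of methods attractive for non-stationary noise.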


international conference on communication technology | 2010

Voice Conversion based on GMM and Artificial Neural Network

Danwen Peng; Xiongwei Zhang; Jian Sun

Voice conversion (VC) transforms the voice of a source speaker so that it is perceived as uttered by a target speaker. In this paper, a novel VC method combining a Gaussian mixture model (GMM) and an artificial neural network (ANN) is proposed. To overcome the over-smoothing problem of GMM-based mapping, we convert the basic spectral envelope with the GMM and the residual envelope with the ANN. Compared with the traditional GMM-based method, the proposed method effectively improves the quality and naturalness of the converted speech. Objective tests and listening tests both show the superiority of the new method.


international conference on wireless communications and signal processing | 2009

A 300bps speech coding algorithm based on multi-mode matrix quantization

Xia Zou; Xiongwei Zhang; Yafei Zhang

A 300 bps speech coder based on a multi-frame structure and multi-mode matrix quantization is presented. A multi-frame structure consisting of six frames is adopted to reduce the algorithm delay. The parameter matrices are classified into different modes based on the voicing vector information of the superframe. To improve speech quality, a dynamic bit allocation scheme is developed. Experimental results show that the speech produced by the proposed vocoder is intelligible with good naturalness.
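The bit budget that such a coder has to work with follows from simple arithmetic. The sketch below computes the number of bits available per superframe; the frame length of 25 ms is an assumption for illustration, since the abstract does not state it, and the function name is hypothetical.

```python
def bits_per_superframe(bit_rate_bps, frames_per_superframe, frame_ms):
    """Bits available to encode one superframe at a fixed bit rate."""
    superframe_ms = frames_per_superframe * frame_ms
    return bit_rate_bps * superframe_ms / 1000.0


# Six-frame superframe at 300 bps, with an assumed 25 ms frame length.
budget = bits_per_superframe(300, 6, 25)
```

Under that assumption the whole superframe, covering pitch, voicing, gain and spectral parameters for six frames, must fit in a few dozen bits, which is why joint (matrix) quantization and dynamic bit allocation are used instead of coding each frame separately.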


international conference on wireless communications and signal processing | 2016

Experimental study on noise pre-processing for a low bit rate speech coder

Wenhua Shi; Xiongwei Zhang; Xia Zou; Xiaodong Song

This paper focuses on the quality of speech coding parameter extraction under noisy and clean conditions. The influence of speech enhancement on the quality of the parameters extracted for a low bit rate speech coder is addressed. The MELP vocoder is used to estimate three parameters: the fundamental frequency, voicing and linear prediction coefficients. The de-noising methods of the MELPe vocoder and of SMV are each adopted as the preprocessor under different noise environments. Pitch accuracy rate, voicing decision error rate and average spectral distortion are used to quantitatively evaluate the quality and intelligibility improvements for degraded speech with and without the noise pre-processing system. The experimental results show that noise pre-processing improves parameter estimation, especially at low SNR, and that the MELPe speech enhancement algorithm gives better parameter extraction performance than that of SMV. The research will be helpful in designing dedicated noise pre-processing algorithms for low bit rate parametric coding.
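The three evaluation measures named in the abstract can be sketched as follows. The exact definitions used in the paper are not given, so these are common textbook forms under stated assumptions: a frame counts as voiced when its pitch value is positive, a pitch estimate counts as accurate when it is within a relative tolerance `tol` of the reference (20% here, an assumed value), and spectral distortion is the root-mean-square difference of two log power spectra in dB. All function names are hypothetical.

```python
import math


def voicing_error_rate(ref_f0, est_f0):
    """Fraction of frames whose voiced/unvoiced decision differs."""
    errors = sum(1 for r, e in zip(ref_f0, est_f0) if (r > 0) != (e > 0))
    return errors / len(ref_f0)


def pitch_accuracy_rate(ref_f0, est_f0, tol=0.2):
    """Fraction of frames voiced in both tracks whose pitch is within tol."""
    pairs = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both tracks")
    good = sum(1 for r, e in pairs if abs(e - r) / r <= tol)
    return good / len(pairs)


def spectral_distortion_db(ref_power, est_power):
    """RMS log-spectral difference (dB) between two power spectra."""
    diffs = [10.0 * math.log10(r / e) for r, e in zip(ref_power, est_power)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Pitch tracks in Hz, with 0 marking an unvoiced frame.
reference = [100.0, 0.0, 200.0, 150.0]
estimate = [105.0, 0.0, 0.0, 300.0]
v_err = voicing_error_rate(reference, estimate)
p_acc = pitch_accuracy_rate(reference, estimate)
```

Averaging `spectral_distortion_db` over all frames of an utterance gives an average spectral distortion figure; comparing the three measures with and without a pre-processor is the kind of evaluation the abstract describes.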


international conference on wireless communications and signal processing | 2012

A new speech enhancement algorithm with generalized Gamma speech model

Gaihua Zhao; Bin Zhou; Xiongwei Zhang; Luying Sui

In this paper, we present a new speech enhancement algorithm based on a generalized Gamma speech model, which is more flexible in capturing the statistical behavior of speech signals than the conventional Gaussian and super-Gaussian speech models. Assuming a generalized Gamma distribution for the clean speech spectral amplitudes and additive Gaussian noise, we derive a minimum mean-square error (MMSE) estimator of the log-spectral amplitude for speech signals. Furthermore, a speech presence probability consistent with the new model is derived to modify the MMSE estimator. The experimental results show that, compared with conventional short-time spectral amplitude estimators based on Gaussian and super-Gaussian speech models, the proposed algorithm yields improvements in segmental signal-to-noise ratio (SSNR), less residual noise and better perceived speech quality.
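Segmental SNR, the objective measure cited in this abstract, can be computed as in the sketch below. It uses a common definition: the per-frame SNR in dB, clamped to a fixed range and averaged over frames. The frame length of 160 samples and the clamping range of -10 dB to 35 dB are conventional choices assumed here, not values taken from the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=160, lo=-10.0, hi=35.0):
    """Average per-frame SNR in dB between clean and enhanced signals."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # skip silent frames, which carry no speech
        if error_energy == 0.0:
            snr = hi  # perfect reconstruction is clamped to the upper limit
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(hi, max(lo, snr)))
    if not values:
        raise ValueError("no non-silent frames to evaluate")
    return sum(values) / len(values)
```

Clamping keeps a few very quiet or very clean frames from dominating the average, which is why SSNR often tracks perceived quality better than a single global SNR.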


international congress on image and signal processing | 2009

A 450bps Speech Coding Algorithm Based on Multi-Mode Matrix Quantization

Xia Zou; Xiongwei Zhang

A 450 bps speech coder based on a multi-frame structure and multi-mode matrix quantization is presented. A multi-frame structure consisting of four frames is adopted to reduce the algorithm delay. The parameter matrices are classified into different modes based on the voicing vector information of the superframe. To improve speech quality, a dynamic bit allocation scheme is developed. Experimental results show that the speech produced by the proposed vocoder is intelligible with good naturalness.
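The mode-switched quantization step can be illustrated with a short sketch. Here the mode is simply the number of voiced frames in the superframe, and each mode has its own codebook of parameter matrices searched by squared error. The mode rule, the codebook contents and the function names are assumptions for illustration; the abstract does not specify how the voicing vector maps to modes.

```python
def select_mode(voicing):
    """Map a superframe voicing vector to a mode (assumed: count of voiced frames)."""
    return sum(1 for v in voicing if v)


def squared_error(a, b):
    """Sum of squared differences between two equally sized matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def quantize(matrix, voicing, codebooks):
    """Return (mode, index) of the closest code matrix in the mode's codebook."""
    mode = select_mode(voicing)
    best_index, best_err = None, float("inf")
    for index, code in enumerate(codebooks[mode]):
        err = squared_error(matrix, code)
        if err < best_err:
            best_index, best_err = index, err
    return mode, best_index


# Toy codebooks for a four-frame superframe with two parameters per frame.
codebooks = {
    0: [[[0.0, 0.0]] * 4],
    4: [[[1.0, 1.0]] * 4, [[2.0, 2.0]] * 4],
}
params = [[1.9, 2.2], [2.1, 1.8], [2.0, 2.0], [1.7, 2.3]]
mode, index = quantize(params, [True, True, True, True], codebooks)
```

Only the index is transmitted, and since the decoder can derive the mode from the voicing vector it also receives, codebooks of different sizes per mode give a natural hook for dynamic bit allocation.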

Collaboration

Top Co-Authors

Jianjun Huang, University of Science and Technology
Xia Zou, University of Science and Technology
Yafei Zhang, University of Science and Technology
Jian Sun, University of Science and Technology
Jinming Wang, University of Science and Technology
Jibin Yang, University of Science and Technology
Xinjian Sun, University of Science and Technology
Yuhuan Zhou, University of Science and Technology
Gaihua Zhao, University of Science and Technology
Li Zeng, University of Science and Technology