Publication


Featured research published by Xia Zou.


international workshop on acoustic signal enhancement | 2016

Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement

Gang Min; Xiongwei Zhang; Xia Zou; Meng Sun

Mask estimation is regarded as the main goal of using the computational auditory scene analysis method to enhance speech contaminated by noise. This paper presents extended robust principal component analysis (RPCA) methods, referred to as NRPCA and ISNRPCA, to estimate the mask effectively. The perceptually motivated cochleagram is decomposed into sparse and low-rank components via NRPCA or ISNRPCA, which correspond to speech and noise, respectively. Unlike classical RPCA, NRPCA imposes nonnegativity constraints to regularize the decomposed components. Furthermore, ISNRPCA uses the perceptually meaningful Itakura-Saito measure as its optimization objective function. We use the alternating direction method of multipliers to solve the corresponding optimization problem. NRPCA and ISNRPCA are fully unsupervised: neither a speech nor a noise model needs to be trained beforehand. Experimental results demonstrate that NRPCA and ISNRPCA show promising results for speech enhancement. Compared with state-of-the-art baselines, the proposed methods achieve better noise suppression and at least comparable intelligibility and overall quality.
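The sparse plus low-rank decomposition described in this abstract can be illustrated with a minimal sketch of classical RPCA solved by ADMM. This is not the NRPCA or ISNRPCA method of the paper: it has no nonnegativity constraint and uses a Frobenius-norm data term rather than the Itakura-Saito measure. The function names and the default values of `lam` and `mu` are assumptions taken from common RPCA practice.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def rpca_admm(m, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split a nonzero float matrix m into low-rank and sparse parts.

    Solves min ||L||_* + lam * ||S||_1 subject to L + S = m.
    The defaults for lam and mu are common heuristics (an assumption here).
    """
    rows, cols = m.shape
    lam = 1.0 / np.sqrt(max(rows, cols)) if lam is None else lam
    mu = rows * cols / (4.0 * np.abs(m).sum()) if mu is None else mu
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(n_iter):
        # Update each component with the other held fixed.
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        # Dual ascent on the constraint L + S = m.
        resid = m - low - sparse
        dual = dual + mu * resid
        if np.linalg.norm(resid) <= tol * np.linalg.norm(m):
            break
    return low, sparse
```

In the setting of the abstract, `m` would be a cochleagram, with the sparse part attributed to speech and the low-rank part to noise.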


international conference on wireless communications and signal processing | 2009

Probabilistic path selection in wireless sensor networks with controlled mobility

Xiwei Zhang; Lili Zhang; Guihai Chen; Xia Zou

We consider the problem of planning the path of a “data mule” in a cluster-based sensor network. Recent research shows that significant energy savings can be achieved by using such mobile data collection nodes. Although a data mule (DM) can reduce the energy consumption of the cluster head (CH), it increases the latency from the time the CH obtains the data to the time the base station receives it. To address this issue, we propose a dynamic data mule path selection algorithm called Probabilistic Path Selection (PPS). In surveillance sensor networks, sensor nodes transmit data to the CH only when an event is detected. Clusters that did not detect the event have no data to transmit, which saves power. In this circumstance, the DM skips these CHs to shorten the path. The PPS algorithm can reselect the path based on the probability of a CH obtaining data. The simulation shows that our algorithm can significantly reduce the data latency while satisfying the user's delay requirement.


international conference on wireless communications and signal processing | 2009

A 300bps speech coding algorithm based on multi-mode matrix quantization

Xia Zou; Xiongwei Zhang; Yafei Zhang

A 300 bps speech coder based on a multi-frame structure and multi-mode matrix quantization is presented. The multi-frame structure consisting of six frames is adopted to reduce the algorithm delay. The parameter matrices are classified into different modes based on the voicing vector information of the superframe. To improve speech quality, a dynamic bit allocation scheme is developed. Experimental results show that the speech produced by the proposed vocoder is intelligible with good naturalness.


annual acis international conference on computer and information science | 2017

Auditory mask estimation by RPCA for monaural speech enhancement

Wenhua Shi; Xiongwei Zhang; Xia Zou; Wei Han; Gang Min

Mask estimation has shown a lot of promise in speech enhancement for its simplicity and large speech intelligibility improvement. In this paper, gammachirp filter banks are applied to the contaminated speech signal to obtain the auditory time-frequency representation. Robust principal component analysis with a non-negative constraint is employed to decompose the auditory time-frequency representation into sparse and low-rank components using the alternating direction method of multipliers optimization algorithm. The auditory mask is estimated from these two parts, which correspond to the speech and the noise. Because a binary mask produces separated sources with more distortion than a soft mask, the auditory mask estimation is based on ideal ratio mask estimation. Experimental results show that the proposed method achieves better performance in terms of PESQ and LSD than multiband spectral subtraction and robust principal component analysis methods.
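The difference between a soft (ratio) mask and a binary mask mentioned in this abstract can be sketched as follows. This is a simple magnitude-ratio form, assumed for illustration; the exact mask definition used in the paper is not given here. The function names and the `eps` guard are hypothetical.

```python
import numpy as np


def ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Soft mask in [0, 1]: the share of each time-frequency bin attributed to speech."""
    speech_mag = np.abs(speech_mag)
    noise_mag = np.abs(noise_mag)
    # eps avoids division by zero in bins where both parts are empty.
    return speech_mag / (speech_mag + noise_mag + eps)


def binary_mask(speech_mag, noise_mag):
    """Hard mask: 1 where speech dominates the bin, 0 otherwise."""
    return (np.abs(speech_mag) > np.abs(noise_mag)).astype(float)
```

Multiplying the noisy time-frequency representation element-wise by either mask gives the enhanced representation. The soft mask scales each bin gradually, whereas the binary mask keeps or discards it entirely.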


international conference on wireless communications and signal processing | 2016

Experimental study on noise pre-processing for a low bit rate speech coder

Wenhua Shi; Xiongwei Zhang; Xia Zou; Xiaodong Song

This paper focuses on the quality of speech coding parameter extraction under noisy and clean conditions. The influence of speech enhancement on the quality of the extracted parameters for a low bit rate speech coder is addressed. The MELP vocoder is used to estimate three parameters: the fundamental frequency, voicing, and linear prediction coefficients. The de-noising methods of the MELPe vocoder and SMV are each adopted as a preprocessor under different noise environments. Pitch accuracy rate, voicing decision error rate, and average spectral distortion are employed to quantitatively evaluate the quality and intelligibility improvements for the degraded speech with and without the noise pre-processing system. The experimental results show that noise pre-processing can improve parameter estimation, especially at low SNR. The MELPe speech enhancement algorithm has better parameter extraction performance than SMV. The research will be helpful in designing specific noise pre-processing algorithms for low bit rate parametric coding.


international conference on multimedia and expo | 2016

A perceptually motivated approach via sparse and low-rank model for speech enhancement

Gang Min; Xiongwei Zhang; Jibin Yang; Wei Han; Xia Zou

A perceptually motivated speech enhancement approach is proposed in this paper. Unlike conventional approaches based on sparse and low-rank models, this new approach takes into account the perceptual differences between frequency bands of the human auditory system, and separates speech from background noise in the Mel spectral domain. After two propositions for the Mel frequency weighted spectrogram are proved, speech enhancement can be modeled as a sparse and low-rank constrained optimization problem, which is solved efficiently by the alternating direction method of multipliers (ADMM). The proposed approach is fully unsupervised: neither the speech nor the noise dictionary needs to be trained beforehand. The experimental results show promising performance under strong background noise. The performance can be further improved by an information fusion technique at high input SNRs.


Modern Physics Letters B | 2017

Deep neural network and noise classification-based speech enhancement

Wenhua Shi; Xiongwei Zhang; Xia Zou; Wei Han

In this paper, a speech enhancement method using noise classification and a deep neural network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames. The DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained with mel-frequency cepstrum coefficients (MFCC), and its parameters are estimated with an iterative expectation-maximization (EM) algorithm. The noise type is updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under stationary and non-stationary conditions.


IEEE Signal Processing Letters | 2016

Perceptually Weighted Analysis-by-Synthesis Vector Quantization for Low Bit Rate MFCC Codec

Gang Min; Xiongwei Zhang; Xia Zou; Jibin Yang

This letter presents a perceptually weighted analysis-by-synthesis vector quantization (VQ) algorithm for a low bit rate MFCC codec. Unlike conventional VQ of the mel-frequency cepstral coefficient (MFCC) vector, this algorithm uses an analysis-by-synthesis technique and aims to minimize the perceptually weighted spectral reconstruction distortion rather than the distortion of the MFCC vector itself. Also, to reduce the computational complexity, we propose a practical suboptimal codebook searching technique and embed it into the split and multistage VQ framework. Objective and subjective experimental results on Mandarin speech show that the proposed algorithm yields intelligible and natural sounding speech for speech coding at 600-2400 bit/s. Compared to current VQ in MFCC codecs, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility score, perceptual evaluation of speech quality score, and mean opinion score.


multimedia signal processing | 2015

Speech reconstruction from mel-frequency cepstral coefficients via ℓ1-norm minimization

Gang Min; Xiongwei Zhang; Jibin Yang; Xia Zou

This paper presents a high quality speech reconstruction method from Mel-frequency cepstral coefficients (MFCC). Because the power spectrum of speech is sparse, the ℓ1-norm minimization method is used to tackle the under-determined nature of the speech reconstruction problem. The phase spectrum is recovered by the well-known LSE-ISTFTM algorithm. Experimental results demonstrate that the quality of the reconstructed speech is dramatically better than with the common ℓ2-norm minimization method. The reconstructed speech sounds very close to the original when high-resolution MFCCs are used, with the PESQ score reaching 4.0.


international conference on wireless communications and signal processing | 2014

An improved Bayesian NMF-based speech enhancement method using multivariate Laplace distribution

Liwei Zhang; Xiongwei Zhang; Xia Zou; Gang Min

The Bayesian NMF (BNMF) algorithm combines nonnegative matrix factorization (NMF) with a statistical framework, and performs well in speech enhancement. However, the dependencies between atoms in a speech frame are not considered in the method. In order to exploit the dependencies of the speech and noise signals, we introduce a multivariate Laplace distribution for the basis matrix W and the NMF coefficient matrix H. In this paper, we propose a novel speech enhancement method based on an improved Bayesian NMF (IBNMF) algorithm using the multivariate Laplace distribution. The experimental results show that the proposed algorithm yields improvements in log-spectral distance (LSD) and Perceptual Evaluation of Speech Quality (PESQ) compared to two other algorithms based on the NMF and BNMF methods.
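The log-spectral distance (LSD) used as an evaluation metric here can be sketched as below. This follows one common definition, the root-mean-square difference of the log power spectra in dB per frame, averaged over frames. Whether the paper uses exactly this variant is an assumption, and the function name and `eps` floor are hypothetical.

```python
import numpy as np


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Mean LSD in dB between two power spectrograms.

    Both inputs have shape (frequency_bins, frames).
    """
    # Floor the power to avoid taking the log of zero.
    ref_db = 10.0 * np.log10(np.maximum(ref_power, eps))
    est_db = 10.0 * np.log10(np.maximum(est_power, eps))
    # RMS difference over frequency for each frame, then average over frames.
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=0))
    return float(np.mean(per_frame))
```

A lower LSD means that the enhanced spectrum is closer to the clean reference. Identical spectrograms give a distance of zero.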

Collaboration


Dive into Xia Zou's collaborations.

Top Co-Authors

Xiongwei Zhang

University of Science and Technology

Gang Min

University of Science and Technology

Meng Sun

University of Science and Technology

Wenhua Shi

University of Science and Technology

Yafei Zhang

University of Science and Technology

Jibin Yang

University of Science and Technology

Wei Han

University of Science and Technology

Jianjun Huang

University of Science and Technology

Yonggang Hu

University of Science and Technology
