Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Meng Sun is active.

Publication


Featured research published by Meng Sun.


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Speech enhancement under low SNR conditions via noise estimation using sparse and low-rank NMF with Kullback–Leibler divergence

Meng Sun; Yinan Li; Jort F. Gemmeke; Xiongwei Zhang

A key stage in speech enhancement is noise estimation, which usually requires prior models for speech, noise, or both. However, prior models can sometimes be difficult to obtain. In this paper, without any prior knowledge of speech and noise, sparse and low-rank nonnegative matrix factorization (NMF) with Kullback-Leibler divergence is proposed for noise and speech estimation by decomposing the input noisy magnitude spectrogram into a low-rank noise part and a sparse speech-like part. This initial unsupervised speech-noise estimation allows us to set up a subsequent regularized version of NMF or convolutional NMF to reconstruct the noise and speech spectrograms, either by estimating a speech dictionary on the fly (categorized as unsupervised approaches) or by using a speech dictionary pre-trained on utterances from disjoint speakers (categorized as semi-supervised approaches). Information fusion was investigated by taking the geometric mean of the outputs from multiple enhancement algorithms. The performance of the algorithms was evaluated on five metrics (PESQ, SDR, SNR, STOI, and OVERALL) in experiments on TIMIT with 15 noise types. The geometric means of the proposed unsupervised approaches outperformed spectral subtraction (SS) and minimum mean square error (MMSE) estimation under low input SNR conditions. All the proposed semi-supervised approaches showed superiority over SS and MMSE and also obtained better performance than state-of-the-art algorithms that utilize a prior noise or speech dictionary under low SNR conditions.
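The core decomposition described above can be sketched numerically. The following is a minimal illustration, not the paper's exact algorithm: it fits a noisy magnitude spectrogram `V` as a low-rank product `W @ H` (noise) plus a sparse matrix `S` (speech-like) under KL divergence with an L1 penalty on `S`, using standard multiplicative updates. The function name, the rank, and the penalty weight `lam` are illustrative choices.

```python
import numpy as np

def sparse_lowrank_nmf(V, rank=2, lam=0.5, n_iter=100, eps=1e-9):
    """Decompose non-negative V into a low-rank part W @ H (noise) plus a
    sparse part S (speech-like) under KL divergence; `lam` weighs the
    L1 penalty on S. Multiplicative updates throughout."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    S = rng.random((F, T)) + eps
    cost = []
    for _ in range(n_iter):
        R = V / (W @ H + S + eps)                       # ratio V / V_hat
        W *= (R @ H.T) / (np.ones((F, T)) @ H.T + eps)
        R = V / (W @ H + S + eps)
        H *= (W.T @ R) / (W.T @ np.ones((F, T)) + eps)
        R = V / (W @ H + S + eps)
        S *= R / (1.0 + lam)                            # KL gradient split + L1
        Vh = W @ H + S + eps
        cost.append(np.sum(V * np.log((V + eps) / Vh) - V + Vh) + lam * S.sum())
    return W, H, S, np.array(cost)
```

Each update is the standard majorization-minimization step for its block, so the penalized KL cost is non-increasing across iterations.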


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Unseen noise estimation using separable deep auto encoder for speech enhancement

Meng Sun; Xiongwei Zhang; Hugo Van hamme; Thomas Fang Zheng

Unseen noise estimation is a key yet challenging step in making a speech enhancement algorithm work in adverse environments. At worst, the only prior knowledge we have about the encountered noise is that it is different from the involved speech. Therefore, by subtracting the components which cannot be adequately represented by a well-defined speech model, the noise can be estimated and removed. Given the good performance of deep learning in signal representation, a deep autoencoder (DAE) is employed in this work for accurately modeling the clean speech spectrum. In the subsequent stage of speech enhancement, an extra DAE is introduced to represent the residual part obtained by subtracting the estimated clean speech spectrum (using the pre-trained DAE) from the noisy speech spectrum. By adjusting the estimated clean speech spectrum and the unknown parameters of the noise DAE, one can reach a stationary point that minimizes the total reconstruction error of the noisy speech spectrum. The enhanced speech signal is then obtained by transforming the estimated clean speech spectrum back into the time domain. The proposed technique is called the separable deep autoencoder (SDAE). Given the under-determined nature of the above optimization problem, the clean speech reconstruction is confined to the convex hull spanned by a pre-trained speech dictionary. New learning algorithms are investigated to respect the non-negativity of the parameters in the SDAE. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over the conventional baselines.
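The convex-hull constraint mentioned above means the clean-speech estimate is a convex combination of dictionary atoms: `D @ h` with `h >= 0` and `sum(h) == 1`. As a rough stand-in for the paper's learning algorithms, the sketch below enforces this with a multiplicative non-negative least-squares step followed by renormalization onto the simplex; the function name and iteration scheme are illustrative assumptions.

```python
import numpy as np

def convex_hull_fit(y, D, n_iter=200, eps=1e-9):
    """Approximate y by D @ h with h >= 0 and sum(h) == 1, i.e. a point
    in the convex hull of the dictionary atoms (columns of D)."""
    K = D.shape[1]
    h = np.full(K, 1.0 / K)                            # start at simplex center
    for _ in range(n_iter):
        h *= (D.T @ y + eps) / (D.T @ (D @ h) + eps)   # non-negative LS step
        h /= h.sum()                                   # project back to simplex
    return h
```

When `y` truly lies in the hull, the zero-residual solution is itself on the simplex, so the renormalization is consistent with the fixed point.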


international workshop on acoustic signal enhancement | 2016

Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement

Gang Min; Xiongwei Zhang; Xia Zou; Meng Sun

Mask estimation is regarded as the main goal when using computational auditory scene analysis methods to enhance speech contaminated by noise. This paper presents extended robust principal component analysis (RPCA) methods, referred to as NRPCA and ISNRPCA, to estimate the mask effectively. The perceptually motivated cochleagram is decomposed into sparse and low-rank components via NRPCA or ISNRPCA, which correspond to speech and noise, respectively. Unlike classical RPCA, NRPCA imposes non-negativity constraints to regularize the decomposed components. Furthermore, ISNRPCA uses the perceptually meaningful Itakura-Saito measure as its optimization objective. We use the alternating direction method of multipliers to solve the corresponding optimization problem. NRPCA and ISNRPCA are totally unsupervised: neither a speech nor a noise model needs to be trained beforehand. Experimental results demonstrate that NRPCA and ISNRPCA show promising results for speech enhancement. Compared with state-of-the-art baselines, the proposed methods achieve better noise suppression and demonstrate at least comparable intelligibility and overall quality.
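Once the cochleagram is split into a sparse speech part `S` and a low-rank noise part `L`, a mask assigns each time-frequency unit to speech or noise. The two small helpers below show the generic soft-ratio and CASA-style binary masking rules; the exact mask used in the paper may differ, and the threshold `theta` is a placeholder.

```python
import numpy as np

def ratio_mask(S, L, eps=1e-9):
    """Soft mask: fraction of each time-frequency unit attributed to speech."""
    return S / (S + L + eps)

def binary_mask(S, L, theta=1.0):
    """CASA-style binary-mask rule: keep units whose local
    speech-to-noise energy ratio exceeds the threshold `theta`."""
    return (S > theta * L).astype(float)
```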


international conference on signal and information processing | 2015

Unsupervised monaural speech enhancement using robust NMF with low-rank and sparse constraints

Yinan Li; Xiongwei Zhang; Meng Sun; Gang Min

Non-negative spectrogram decomposition and its variants have been extensively investigated for speech enhancement due to their efficiency in extracting perceptually meaningful components from mixtures. Usually, these approaches assume that training samples for one or more sources are available beforehand. However, in many real-world scenarios, prior training is impossible. To solve this problem, we propose an approach that directly extracts representations of the background noise from the noisy speech by imposing non-negativity constraints on the low-rank and sparse decomposition of the noisy spectrogram. The noise representations are subsequently utilized when estimating the clean speech. With this technique, potential spectral structural regularity can be discovered for better reconstruction of the clean speech. Evaluations on the Noisex-92 and TIMIT databases showed that the proposed method achieves significant improvements over state-of-the-art methods in unsupervised speech enhancement.
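Once a noise representation `N_hat` has been extracted from the noisy spectrogram, one simple way to use it when estimating clean speech is floored subtraction. This is a generic illustration, not the paper's reconstruction rule; the floor parameter `beta` is a hypothetical choice that keeps the output non-negative and limits musical-noise artifacts.

```python
import numpy as np

def subtract_noise(V, N_hat, beta=0.1):
    """Remove the extracted noise representation N_hat from the noisy
    magnitude spectrogram V, flooring at beta*V so the estimate stays
    non-negative even where the noise estimate overshoots."""
    return np.maximum(V - N_hat, beta * V)
```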


international conference on acoustics, speech, and signal processing | 2016

Adaptive extraction of repeating non-negative temporal patterns for single-channel speech enhancement

Yinan Li; Xiongwei Zhang; Meng Sun; Gang Min; Jibin Yang

Estimating unknown background noise from single-channel noisy speech is a key yet challenging problem for speech enhancement. Given that background noises typically have a repeating structure while the foreground speech is sparse and time-variant, many methods decompose the noisy spectrogram directly in an unsupervised fashion when no isolated training examples of the target speaker or particular noise types are available beforehand. However, recently proposed methods suffer from uninterpretable decomposed patterns, neglect the temporal structure of the background noise, or are constrained by pre-fixed parameters. To address these issues, we propose a novel method based on the autocorrelation technique and convolutive non-negative matrix factorization. The proposed method adaptively estimates the underlying non-negative repeating temporal patterns from noisy speech and identifies the clean speech spectrogram simultaneously. Experiments on the NOIZEUS dataset mixed with various real-world background noises showed that the proposed method performs better than several state-of-the-art methods.
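The autocorrelation step can be illustrated as follows: average the lag-correlation of each frequency bin across time and pick the smallest lag where it peaks, which gives the repeating period of the background in frames. This is a generic beat-spectrum-style sketch, not the paper's exact estimator; the 0.99 peak tolerance is an illustrative choice.

```python
import numpy as np

def repeating_period(V, max_lag=None):
    """Estimate the repeating period (in frames) of the background in a
    non-negative spectrogram V (freq x time) from the mean lag-correlation
    averaged across frequency bins."""
    F, T = V.shape
    max_lag = max_lag or T // 2
    scores = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        scores[lag] = np.mean(V[:, :T - lag] * V[:, lag:])
    # smallest lag whose score is (near) maximal -> fundamental period,
    # since multiples of the period score just as high
    peak = scores.max()
    period = int(np.argmax(scores >= 0.99 * peak))
    return period, scores
```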


multimedia signal processing | 2015

Speech enhancement based on robust NMF solved by alternating direction method of multipliers

Yinan Li; Xiongwei Zhang; Meng Sun; Jingfeng Pan

A robust version of non-negative matrix factorization (RNMF) with generalized Kullback-Leibler divergence, designed for unsupervised monaural speech enhancement, is proposed. RNMF tackles the unsupervised speech enhancement problem by factorizing the magnitude spectrum of the mixture into the sum of a non-negative sparse matrix and a non-negative low-rank matrix. The parameters of the non-negative components are estimated by minimizing the reconstruction error defined by the divergence. The closed-form updating formulae of RNMF are derived using the alternating direction method of multipliers (ADMM). Experimental results demonstrate that the proposed algorithm yields superior results compared with multiplicative updates, at the expense of higher computational complexity.
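For context, the classical ADMM iteration for a low-rank-plus-sparse split alternates two proximal steps and a dual update. The sketch below solves the standard RPCA problem (nuclear norm plus L1, squared-error coupling) rather than the paper's non-negative KL variant, so it only illustrates the ADMM machinery the closed-form updates build on; `lam` and `mu` defaults are conventional choices.

```python
import numpy as np

def rpca_admm(M, lam=None, mu=1.0, n_iter=100):
    """Split M into low-rank L and sparse S by ADMM:
        min ||L||_* + lam * ||S||_1   s.t.  M = L + S
    Each iteration applies singular-value thresholding (prox of the
    nuclear norm), soft thresholding (prox of the L1 norm), and a dual
    ascent step on the multiplier Y."""
    lam = lam or 1.0 / np.sqrt(max(M.shape))
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt   # shrink singular values
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)  # soft threshold
        Y += mu * (M - L - S)                            # dual update
    return L, S
```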


pacific rim conference on multimedia | 2016

Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation

Wei Han; Xiongwei Zhang; Jibin Yang; Meng Sun; Gang Min

Due to its powerful feature-extraction ability, deep learning has become a new trend in solving speech separation problems. In this paper, we present a novel deep neural network (DNN) architecture for monaural speech separation. Taking into account the masking properties of the human auditory system, a perceptual modified Wiener filtering masking function is applied in the proposed DNN architecture to make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the perceptual modified Wiener filtering mask and the DNN. Evaluation experiments on the TIMIT database with 20 noise types under different signal-to-noise ratio (SNR) conditions demonstrate the superiority of the proposed method over the reference DNN-based separation methods, whether or not the noise appeared in the training database.
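The objective being optimized jointly with the DNN is a signal-approximation loss on the masked output. In the paper the mask comes from the network; the stripped-down sketch below instead descends on per-bin mask values directly, just to show the objective `||M * Y - X||^2` that the joint training minimizes. Function name, learning rate, and iteration count are illustrative.

```python
import numpy as np

def optimize_mask(Y, X, lr=0.05, n_iter=300):
    """Signal-approximation objective: find mask M in [0, 1] minimizing
    || M * Y - X ||^2, where Y is the noisy and X the clean magnitude.
    Plain per-bin gradient descent with clipping to [0, 1]."""
    M = np.full_like(Y, 0.5)
    for _ in range(n_iter):
        grad = 2.0 * (M * Y - X) * Y        # d/dM of the squared error
        M = np.clip(M - lr * grad, 0.0, 1.0)
    return M
```

At the unconstrained optimum each bin reaches `M = X / Y`, i.e. the ideal ratio mask, which is what a well-trained masking network approximates.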


pacific rim conference on multimedia | 2016

Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints

Yinan Li; Xiongwei Zhang; Meng Sun; Xushan Chen; Lin Qiao

Conventional sparse and low-rank decomposition based speech enhancement algorithms seldom consider the non-negativity and continuity of the enhanced speech spectrum simultaneously. In this paper, an unsupervised algorithm for enhancing noisy speech in a single-channel recording is presented. The algorithm can be viewed as an extension of non-negative matrix factorization (NMF) which approximates the magnitude spectrum of noisy speech by the superposition of a low-rank non-negative matrix and a sparse non-negative matrix. The temporal continuity of speech is also considered by adding the sum of squared differences between adjacent frames to the cost function. We prove that, by iteratively updating the parameters with the derived multiplicative update rules, the cost function converges to a local minimum. Simulation experiments on the NOIZEUS database with various noise types demonstrate that the proposed algorithm outperforms recently proposed state-of-the-art methods under low signal-to-noise ratio (SNR) conditions.
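The regularized cost function described above can be written out directly: squared reconstruction error of the noisy spectrum by the low-rank plus sparse model, plus a weighted sum of squared differences between adjacent speech frames. The weight `alpha` and function name are illustrative; the paper's update rules minimize a cost of this shape.

```python
import numpy as np

def continuity_cost(V, L, S, alpha=0.1):
    """Squared reconstruction error of V by low-rank L plus sparse S,
    plus alpha times the sum of squared differences between adjacent
    speech frames (columns of S), penalizing temporal discontinuity."""
    recon = np.sum((V - (L + S)) ** 2)
    cont = np.sum((S[:, 1:] - S[:, :-1]) ** 2)
    return recon + alpha * cont
```

A temporally smooth speech estimate incurs no continuity penalty, while a rapidly fluctuating one is penalized even at equal reconstruction error.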


international workshop on acoustic signal enhancement | 2016

Perceptual improvement of deep neural networks for monaural speech enhancement

Wei Han; Xiongwei Zhang; Meng Sun; Wenhua Shi; Xushan Chen; Yonggang Hu

Monaural speech enhancement is a key yet challenging problem for many important real-world applications. Recently, deep neural network (DNN)-based speech enhancement methods, which extract useful features from complex inputs, have demonstrated remarkable performance improvements. In this paper, we present a novel DNN architecture for monaural speech enhancement. Taking into account the masking properties of the human auditory system, a piecewise gain function is applied in the proposed DNN architecture to reduce the noise and make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the piecewise gain function and the DNN. Systematic experiments on the TIMIT corpus with 20 noise types at various signal-to-noise ratio (SNR) conditions demonstrate the superiority of the proposed DNN over the reference speech enhancement methods, in both matched and unmatched noise conditions.
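The general shape of a piecewise gain is easy to illustrate, though the paper's actual function and its learned parameters are not given here: unity gain for clearly speech-dominated bins, a small floor for noise-dominated bins (so residual noise is attenuated toward inaudibility rather than zeroed, avoiding musical noise), and a ramp in between. All thresholds below are placeholder values.

```python
import numpy as np

def piecewise_gain(snr_db, low=-5.0, high=15.0, floor=0.1):
    """Illustrative piecewise gain: unity above `high` dB, a small floor
    below `low` dB, and a linear ramp in between. The thresholds are
    placeholders, not values from the paper."""
    t = np.clip((np.asarray(snr_db, dtype=float) - low) / (high - low), 0.0, 1.0)
    return floor + (1.0 - floor) * t
```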


international conference on signal processing | 2016

Speech enhancement based on improved deep neural networks with MMSE pretreatment features

Wei Han; Congming Wu; Xiongwei Zhang; Meng Sun; Gang Min

Speech enhancement plays an important role in robust speech processing, and deep learning has become a new trend in solving speech enhancement problems. The input features are a key aspect of deep learning and affect the enhancement performance. In this paper, we explore new features extracted through minimum mean square error (MMSE) estimator pretreatment. Incorporating these MMSE pretreatment features, we propose a novel deep neural network (DNN) for the speech enhancement task. Evaluation experiments on the TIMIT database with 20 noise types at different signal-to-noise ratio (SNR) conditions demonstrate the effectiveness of the proposed approach compared with the reference DNN-based enhancement approaches, whether or not the noise matched the training set.
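An MMSE pretreatment in the classical sense applies a statistical gain to the noisy magnitudes before further processing. The sketch below uses the standard decision-directed a priori SNR estimate with a Wiener-type gain `xi / (1 + xi)` as a simplified stand-in; the paper's exact estimator (e.g. a full MMSE-STSA gain) may differ, and the function name and smoothing factor are assumptions.

```python
import numpy as np

def mmse_pretreat(Y, noise_psd, alpha=0.98, eps=1e-9):
    """Decision-directed a priori SNR estimation followed by a Wiener-type
    gain xi/(1+xi), applied frame by frame to the noisy magnitudes Y
    (freq x time). A simplified stand-in for a full MMSE-STSA estimator."""
    F, T = Y.shape
    out = np.zeros_like(Y)
    A_prev = np.zeros(F)                    # previous enhanced amplitudes
    for t in range(T):
        gamma = Y[:, t] ** 2 / (noise_psd + eps)             # a posteriori SNR
        xi = alpha * A_prev ** 2 / (noise_psd + eps) \
             + (1 - alpha) * np.maximum(gamma - 1.0, 0.0)    # decision-directed
        G = xi / (1.0 + xi)                                  # Wiener-type gain
        out[:, t] = G * Y[:, t]
        A_prev = out[:, t]
    return out
```

The pretreated magnitudes `out` (or features derived from them) would then be fed to the DNN in place of the raw noisy spectra.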

Collaboration


Dive into Meng Sun's collaborations.

Top Co-Authors

Xiongwei Zhang (University of Science and Technology)
Gang Min (University of Science and Technology)
Yinan Li (University of Science and Technology)
Wei Han (University of Science and Technology)
Jibin Yang (University of Science and Technology)
Xia Zou (University of Science and Technology)
Xushan Chen (University of Science and Technology)
Yonggang Hu (University of Science and Technology)
Wenhua Shi (University of Science and Technology)
Li Li (University of Science and Technology)