Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Gang Min is active.

Publication


Featured research published by Gang Min.


International Workshop on Acoustic Signal Enhancement | 2016

Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement

Gang Min; Xiongwei Zhang; Xia Zou; Meng Sun

Mask estimation is the main goal when using computational auditory scene analysis to enhance speech contaminated by noise. This paper presents extended robust principal component analysis (RPCA) methods, referred to as NRPCA and ISNRPCA, to estimate the mask effectively. The perceptually motivated cochleagram is decomposed into sparse and low-rank components via NRPCA or ISNRPCA, which correspond to speech and noise, respectively. Unlike classical RPCA, NRPCA imposes nonnegativity constraints to regularize the decomposed components. Furthermore, ISNRPCA uses the perceptually meaningful Itakura-Saito measure as its optimization objective. We use the alternating direction method of multipliers (ADMM) to solve the corresponding optimization problem. NRPCA and ISNRPCA are fully unsupervised; neither a speech nor a noise model needs to be trained beforehand. Experimental results demonstrate that NRPCA and ISNRPCA show promising results for speech enhancement. Compared with state-of-the-art baselines, the proposed methods achieve better noise suppression and at least comparable intelligibility and overall quality.
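The classical RPCA baseline that NRPCA and ISNRPCA extend can be sketched in a few lines of numpy. The alternating shrinkage scheme and the mask formula below are illustrative assumptions, not the paper's exact solvers:

```python
import numpy as np

def rpca_mask(S, lam=None, mu=1.0, n_iter=50):
    """Decompose a magnitude spectrogram S into a low-rank part N
    (repeating noise) and a sparse part X (speech) by alternating
    singular-value thresholding and soft-thresholding, then form a
    soft time-frequency mask from the two parts."""
    m, n = S.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # standard RPCA sparsity weight
    X = np.zeros_like(S)
    N = np.zeros_like(S)
    for _ in range(n_iter):
        # low-rank update: shrink the singular values of S - X
        U, s, Vt = np.linalg.svd(S - X, full_matrices=False)
        N = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # sparse update: soft-threshold the residual S - N
        R = S - N
        X = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
    mask = np.abs(X) / (np.abs(X) + np.abs(N) + 1e-12)  # values in [0, 1]
    return mask, X, N
```

Applying `mask` to the noisy spectrogram (and reusing the noisy phase) gives the enhanced magnitude, the standard way such masks are used.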


International Conference on Signal and Information Processing | 2015

Unsupervised monaural speech enhancement using robust NMF with low-rank and sparse constraints

Yinan Li; Xiongwei Zhang; Meng Sun; Gang Min

Non-negative spectrogram decomposition and its variants have been extensively investigated for speech enhancement because of their efficiency in extracting perceptually meaningful components from mixtures. Usually, these approaches assume that training samples for one or more sources are available beforehand. However, in many real-world scenarios, no prior training is possible. To solve this problem, we propose an approach that directly extracts representations of the background noise from noisy speech by imposing non-negativity constraints on the low-rank and sparse decomposition of the noisy spectrogram. The noise representations are subsequently used when estimating the clean speech, so that latent spectral structure can be exploited for better reconstruction. Evaluations on the Noisex-92 and TIMIT databases showed that the proposed method achieves significant improvements over state-of-the-art unsupervised speech enhancement methods.
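A minimal sketch of the underlying idea, assuming plain Euclidean-distance NMF with multiplicative updates (the paper's robust NMF adds low-rank and sparsity constraints that are omitted here): the rank-limited product W @ H models the repeating background, and the nonnegative residual stands in for the sparse speech part.

```python
import numpy as np

def nmf_noise_model(V, rank=4, n_iter=100, eps=1e-9):
    """Plain NMF with Euclidean multiplicative updates, V ~ W @ H.
    The low-rank product W @ H plays the role of the repeating
    background noise; the nonnegative residual approximates the
    sparse speech component."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update spectral bases
    noise = W @ H
    speech = np.maximum(V - noise, 0.0)
    return speech, noise
```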


International Conference on Acoustics, Speech, and Signal Processing | 2016

Adaptive extraction of repeating non-negative temporal patterns for single-channel speech enhancement

Yinan Li; Xiongwei Zhang; Meng Sun; Gang Min; Jibin Yang

Estimating unknown background noise from single-channel noisy speech is a key yet challenging problem for speech enhancement. Because background noise is typically repetitive while foreground speech is sparse and time-variant, many works decompose the noisy spectrogram directly in an unsupervised fashion when no isolated training examples of the target speaker or of particular noise types are available. However, recently proposed methods suffer from uninterpretable decomposed patterns, neglect the temporal structure of the background noise, or are constrained by pre-fixed parameters. To address these issues, we propose a novel method based on the autocorrelation technique and convolutive non-negative matrix factorization. The proposed method adaptively estimates the underlying non-negative repeating temporal patterns from noisy speech and identifies the clean speech spectrogram simultaneously. Experiments on the NOIZEUS dataset mixed with various real-world background noises showed that the proposed method outperforms several state-of-the-art methods.
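The autocorrelation step can be illustrated with a toy estimator of the repeating period. Using the biased autocorrelation of per-frame energy is an assumption for illustration, not the paper's exact procedure:

```python
import numpy as np

def repeating_period(spec, min_lag=2):
    """Estimate the repeating period (in frames) of the background
    from the autocorrelation of the spectrogram's per-frame energy:
    the strongest autocorrelation peak beyond min_lag is taken as
    the period of the repeating temporal pattern."""
    energy = spec.sum(axis=0)
    energy = energy - energy.mean()
    ac = np.correlate(energy, energy, mode="full")[len(energy) - 1:]
    ac[:min_lag] = -np.inf  # ignore the trivial zero/small lags
    return int(np.argmax(ac))
```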


World Congress on Intelligent Control and Automation | 2016

Perceptual weighting deep neural networks for single-channel speech enhancement

Wei Han; Xiongwei Zhang; Gang Min; Xingyu Zhou; Wei Zhang

Improving the perceptual quality of speech signals is a key yet challenging problem for many real-world applications. In this paper, we propose a perceptually motivated approach based on deep neural networks (DNNs) for the speech enhancement task. The proposed approach takes into account the masking properties of the human auditory system and reduces the perceptual effect of the residual noise. Given the good performance of deep learning in signal representation, a DNN is employed to accurately model the clean speech spectrum. In the DNN training stage, a perceptual weighting matrix is used to adjust the weight of the error when the back-propagation algorithm transfers the error from the DNN output layer to the earlier layers. Evaluations on various real-world background noises show that the proposed method outperforms several competitive methods.
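A hypothetical sketch of the perceptually weighted training objective: each frequency bin's squared error is scaled by a weight before back-propagation. The weight vector here is an arbitrary placeholder; the paper derives its weighting matrix from auditory masking properties.

```python
import numpy as np

def weighted_mse_and_grad(pred, target, w):
    """Perceptually weighted MSE: the squared error in each
    frequency bin is scaled by a weight w before averaging, and the
    returned gradient is what back-propagation would push into the
    network in place of the plain MSE gradient."""
    err = pred - target
    loss = 0.5 * np.mean(w * err ** 2)
    grad = w * err / err.size
    return loss, grad
```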


International Conference on Multimedia and Expo | 2016

A perceptually motivated approach via sparse and low-rank model for speech enhancement

Gang Min; Xiongwei Zhang; Jibin Yang; Wei Han; Xia Zou

A perceptually motivated speech enhancement approach is proposed in this paper. Unlike conventional approaches based on sparse and low-rank models, this approach takes into account the perceptual differences among frequency bands of the human auditory system and separates speech from background noise in the Mel spectral domain. After two propositions about the Mel-frequency-weighted spectrogram are proved, speech enhancement can be modeled as a sparse and low-rank constrained optimization problem, which is solved efficiently by the alternating direction method of multipliers (ADMM). The proposed approach is fully unsupervised; neither a speech nor a noise dictionary needs to be trained beforehand. Experimental results show promising performance under strong background noise, and performance can be further improved by an information fusion technique at high input SNRs.
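Separating in the Mel spectral domain requires a Mel filterbank to weight the linear spectrogram. A standard triangular filterbank (an assumption; the paper's exact weighting is not reproduced here) can be built as follows:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft_bins, sr):
    """Triangular Mel filterbank matrix of shape (n_mels, n_fft_bins).
    Multiplying a linear magnitude spectrogram by this matrix gives
    the Mel-frequency-weighted spectrogram."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # filter edge frequencies, equally spaced on the Mel scale
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    bins = np.floor((n_fft_bins - 1) * hz_pts / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft_bins))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):          # rising slope
            fb[i, k] = (k - l) / (c - l)
        for k in range(c, r):          # falling slope
            fb[i, k] = (r - k) / (r - c)
    return fb
```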


Pacific Rim Conference on Multimedia | 2016

Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation

Wei Han; Xiongwei Zhang; Jibin Yang; Meng Sun; Gang Min

Due to its powerful feature extraction ability, deep learning has become a new trend for solving speech separation problems. In this paper, we present a novel deep neural network (DNN) architecture for monaural speech separation. Taking into account the masking properties of the human auditory system, a perceptually modified Wiener filtering masking function is applied in the proposed DNN architecture to make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the perceptually modified Wiener filtering mask and the DNN. Evaluation experiments on the TIMIT database with 20 noise types at different signal-to-noise ratios (SNRs) demonstrate the superiority of the proposed method over the reference DNN-based separation methods, regardless of whether the noise appeared in the training database.


International Conference on Multimedia and Expo | 2016

Joint optimization of audible noise suppression and deep neural networks for single-channel speech enhancement

Wei Han; Xiongwei Zhang; Gang Min; Meng Sun; Jibin Yang

Improving the perceptual quality of speech signals is a key yet challenging problem for many real-world applications. Taking into account the good performance of deep learning in signal representation, a novel single-channel speech enhancement technique is presented that jointly trains deep neural networks and audible noise suppression as a whole network architecture. The network jointly learns an audible noise suppression function that estimates the magnitude spectrum of the clean speech while shaping the spectrum of the audible noise. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over the baselines, regardless of whether the noise conditions are included in the training set.


IEEE Signal Processing Letters | 2016

Perceptually Weighted Analysis-by-Synthesis Vector Quantization for Low Bit Rate MFCC Codec

Gang Min; Xiongwei Zhang; Xia Zou; Jibin Yang

This letter presents a perceptually weighted analysis-by-synthesis vector quantization (VQ) algorithm for a low bit rate MFCC codec. Unlike conventional VQ of mel-frequency cepstral coefficient (MFCC) vectors, this algorithm uses an analysis-by-synthesis technique and aims to minimize the perceptually weighted spectral reconstruction distortion rather than the distortion of the MFCC vector itself. To reduce the computational complexity, we also propose a practical suboptimal codebook searching technique and embed it into the split and multistage VQ framework. Objective and subjective experiments on Mandarin speech show that the proposed algorithm yields intelligible and natural-sounding speech at 600-2400 bit/s. Compared with conventional VQ in MFCC codecs, the output speech quality is substantially improved in terms of frequency-weighted segmental SNR, short-time objective intelligibility, perceptual evaluation of speech quality, and mean opinion score.
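The core of any VQ search is picking the codeword with minimum weighted distortion. The sketch below measures the distortion on the vector itself for simplicity; the letter instead synthesizes the spectrum from each candidate codeword and weights the spectral reconstruction error:

```python
import numpy as np

def weighted_vq_search(x, codebook, w):
    """Exhaustive codebook search minimizing the weighted squared
    distortion sum_k w[k] * (x[k] - c[k])^2 over all codewords c.
    Returns the winning index and codeword."""
    d = ((codebook - x) ** 2 * w).sum(axis=1)
    i = int(np.argmin(d))
    return i, codebook[i]
```

A split VQ scheme would apply this search independently to each sub-vector of the MFCC vector, trading optimality for a much smaller codebook.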


IEEE International Conference on Progress in Informatics and Computing | 2015

A novel single channel speech enhancement based on joint Deep Neural Network and Wiener Filter

Wei Han; Xiongwei Zhang; Gang Min; Xingyu Zhou

In this paper, we present a novel single-channel speech enhancement method that joins a deep neural network (DNN) and a Wiener filter into a whole network, named the Wiener Deep Neural Network (WDNN). The proposed method contains two stages: a training stage and an enhancement stage. In the training stage, the WDNN predicts the clean speech magnitude spectra and the noise magnitude spectra from noisy speech features simultaneously. A Wiener filter is then placed on top of the two outputs of the network as an extra layer to generate the enhanced speech magnitude spectra, and the phase of the noisy speech is used to reconstruct the clean speech. In the enhancement stage, the well-trained WDNN is fed with the features of noisy speech to obtain the enhanced speech. Extensive experiments show that the proposed method outperforms state-of-the-art methods such as non-negative matrix factorization (NMF) and traditional DNN-based methods.
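The Wiener layer on top of the two DNN outputs can be sketched directly: form the Wiener gain from the estimated speech and noise magnitudes, apply it to the noisy magnitude, and reuse the noisy phase. This is a generic Wiener mask, assumed here to match the paper's extra layer:

```python
import numpy as np

def wiener_enhance(noisy_stft, speech_mag, noise_mag, eps=1e-12):
    """Wiener-filter layer: given estimates of the clean-speech and
    noise magnitude spectra, form the gain |S|^2 / (|S|^2 + |N|^2),
    apply it to the noisy magnitude, and keep the noisy phase."""
    gain = speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + eps)
    phase = np.angle(noisy_stft)
    return gain * np.abs(noisy_stft) * np.exp(1j * phase)
```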


IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences | 2015

Speech Enhancement Combining NMF Weighted by Speech Presence Probability and Statistical Model

Yonggang Hu; Xiongwei Zhang; Xia Zou; Gang Min; Meng Sun; Yunfei Zheng

Collaboration


Dive into Gang Min's collaborations.

Top Co-Authors

Xiongwei Zhang
University of Science and Technology

Meng Sun
University of Science and Technology

Wei Han
University of Science and Technology

Xia Zou
University of Science and Technology

Jibin Yang
University of Science and Technology

Xingyu Zhou
University of Science and Technology

Yinan Li
University of Science and Technology

Yonggang Hu
University of Science and Technology

Yunfei Zheng
University of Science and Technology