Guangzhao Bao
University of Science and Technology of China
Publication
Featured research published by Guangzhao Bao.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Guangzhao Bao; Zhongfu Ye; Xu Xu; Yingyue Zhou
This paper discusses underdetermined blind source separation (BSS) using a compressed sensing (CS) approach, which consists of two stages. In the first stage we exploit a modified K-means method to estimate the unknown mixing matrix. In the second stage we separate the sources from the mixed signals using the mixing matrix estimated in the first stage, with a two-layer sparsity model. The two-layer sparsity model assumes that the low-frequency components of speech signals are sparse over a K-SVD dictionary and the high-frequency components are sparse over a discrete cosine transform (DCT) dictionary. This model, taking advantage of two dictionaries, can produce effective separation performance even if the sources are not sparse in the time-frequency (TF) domain.
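The first stage above, estimating the mixing matrix by clustering mixture-sample directions, can be sketched as follows. This is a minimal illustration using plain K-means on normalized two-channel samples (the paper uses a modified K-means variant in the TF domain); the function name and thresholds are ours.

```python
import numpy as np

def estimate_mixing_matrix(X, n_sources, n_iter=30, seed=0):
    """Estimate a 2 x n_sources mixing matrix from 2-channel mixtures
    X (shape 2 x T) by clustering sample directions.  Under the
    sparsity assumption, at high-energy samples a single source
    dominates, so each such sample points along one mixing-matrix
    column."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=0)
    mask = norms > 0.1 * norms.max()            # keep high-energy samples
    U = X[:, mask] / norms[mask]
    U = U * np.where(U[0] >= 0, 1.0, -1.0)      # fold antipodal directions
    # farthest-point initialization, then K-means on the unit circle
    centers = U[:, [rng.integers(U.shape[1])]]
    for _ in range(n_sources - 1):
        sim = np.max(centers.T @ U, axis=0)     # similarity to nearest center
        centers = np.hstack([centers, U[:, [int(np.argmin(sim))]]])
    for _ in range(n_iter):
        labels = np.argmax(centers.T @ U, axis=0)
        for k in range(n_sources):
            if np.any(labels == k):
                m = U[:, labels == k].mean(axis=1)
                centers[:, k] = m / np.linalg.norm(m)
    return centers
```

The columns of the returned matrix approximate the mixing-matrix columns up to permutation and sign.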
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Guangzhao Bao; Yangfei Xu; Zhongfu Ye
This paper presents a novel dictionary learning (DL) method to improve the performance of sparsity-based single-channel speech separation (SCSS). Conventional approaches regard the sub-dictionaries as independent units and learn them separately in the short-time Fourier transform (STFT) domain using their corresponding training sets. In contrast, we take the relationship between the sub-dictionaries into account and optimize them jointly in the time domain. By satisfying a designed discrimination constraint, a structured dictionary, whose atoms have better correspondence to the speaker labels, is learned so that the sources can be recovered by the corresponding reconstruction after sparse coding. An algorithm, consisting of a sparse coding stage and a dictionary updating stage, is proposed to solve this DL optimization problem. Two strategies, direct learning and adaptive learning, are presented to select the training sets used to learn the discriminative dictionary. Experimental results show that the proposed SCSS algorithms outperform the other tested approaches.
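The "sparse coding followed by corresponding reconstruction" step can be sketched as below: the mixture is coded over the composite dictionary and each source is rebuilt from the coefficients of its own sub-dictionary. This is a minimal sketch using plain OMP with arbitrary sub-dictionaries; the paper's discriminatively trained dictionaries and learning procedure are not reproduced here.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: greedily pick the atom most
    correlated with the residual, then refit on the chosen support."""
    residual, support = x.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        c, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ c
    coef = np.zeros(D.shape[1])
    coef[support] = c
    return coef

def separate(mixture, D_a, D_b, n_nonzero=4):
    """Code the mixture over [D_a | D_b], then reconstruct each source
    from the coefficients of its own sub-dictionary (the
    'corresponding reconstruction' step)."""
    c = omp(np.hstack([D_a, D_b]), mixture, n_nonzero)
    k = D_a.shape[1]
    return D_a @ c[:k], D_b @ c[k:]
```

Source confusion, as discussed in the abstract, shows up here as coefficients of one speaker's signal landing on the other speaker's sub-dictionary.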
IEEE Signal Processing Letters | 2015
Renjie Tong; Guangzhao Bao; Zhongfu Ye
In this letter, we propose a tensor factorization approach for multichannel speech enhancement that performs well even at high noise levels. Specifically, we extend the well-known subspace approach to arbitrary orders and present a higher-order subspace approach for multichannel speech enhancement. Unlike previous algorithms, the proposed approach constructs a third-order tensor from the noisy data and then applies a tensor operation to reduce the noise. In this way it preserves the original data structure and makes full use of the spatial and temporal correlations in the multichannel data. The proposed approach adopts an iterative, step-wise procedure that usually converges in a few iterations. At each step, a subspace filter sharing the same form as the conventional subspace approach is updated. Experiments show considerable segmental signal-to-noise ratio improvement on white Gaussian noise, and rapid convergence of the proposed approach is also reported.
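The core idea, stacking the multichannel data into a third-order tensor and projecting each mode onto a low-dimensional subspace, can be illustrated with a one-shot truncated HOSVD. This is only a structural sketch: the paper's method is an iterative, step-wise subspace filter, not this single projection.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    shape = list(T.shape)
    shape[mode] = M.shape[0]
    return fold(M @ unfold(T, mode), mode, shape)

def hosvd_denoise(T, ranks):
    """Project each mode of T onto its leading singular subspace
    (truncated HOSVD), suppressing noise outside those subspaces."""
    out = T
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        P = U[:, :r] @ U[:, :r].T            # mode-wise orthogonal projector
        out = mode_product(out, P, mode)
    return out
```

For multichannel speech the three modes would correspond to channel, time frame, and within-frame samples; here the ranks are assumed known.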
Publications of the Astronomical Society of Australia | 2011
Zhangqin Zhu; Jia Zhu; Sheng Wang; Shenghong Cao; Guangzhao Bao; Zhongfu Ye
A novel spectrum-extraction method based on a 2-D Gaussian model is proposed in this paper. First, flat images are used to fit the model parameters in the spatial orientation, and calibration-lamp images are used to fit the model parameters in the wavelength orientation. Normalized 2-D models are then obtained by combining the parameters of the two orientations. The flux-extraction algorithm is based on least-squares theory and the 2-D model. Experiments show that spectra extracted by our method suppress noise more effectively than those from the 1-D spectrum-extraction method.
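The least-squares flux-extraction step can be illustrated in one spatial dimension: with a normalized profile held fixed, the flux in each wavelength column is the least-squares fit of that profile to the column. This is a simplified sketch — in the paper the profile is 2-D and its parameters are fitted from the flat and calibration-lamp frames, whereas here they are simply given.

```python
import numpy as np

def gaussian_profile(n_pix, center, sigma):
    """Normalized spatial profile; the parameters stand in for those
    the paper fits from flat and calibration-lamp frames."""
    x = np.arange(n_pix)
    p = np.exp(-0.5 * ((x - center) / sigma) ** 2)
    return p / p.sum()

def extract_flux(frame, center, sigma):
    """Least-squares flux per wavelength column: for a fixed profile p,
    the flux minimizing ||column - f * p||^2 is (p . column) / (p . p)."""
    p = gaussian_profile(frame.shape[0], center, sigma)
    return (p @ frame) / (p @ p)
```

Weighting the fit by the full profile is what gives this estimator its noise-suppression advantage over summing a fixed aperture.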
IEEE Signal Processing Letters | 2016
You Luo; Guangzhao Bao; Yangfei Xu; Zhongfu Ye
Sparse representation is one of the most widely used methods for monaural speech enhancement. To make full use of the relationships among speech, noise, and mixture in sparse representation for speech enhancement, this letter proposes a novel sparsity model that consists of a pair of joint sparse representations (JSRs). One JSR uses the mapping relationship between mixture and speech, while the other uses that between mixture and noise. Both relationships are used to constrain joint dictionary learning, which effectively addresses the source confusion problem of traditional methods. Moreover, the latter JSR can complement the former, depending on how structured the noise is. We therefore propose a Gini-index-based weighting parameter to exploit their complementary advantages. Experimental results show that the proposed method outperforms state-of-the-art methods under various objective measures.
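The Gini index used for the weighting parameter is a normalized sparsity measure: it is 0 for a perfectly flat vector and approaches 1 for a maximally sparse one. A minimal sketch of the index itself follows; how the index is mapped to the actual weighting parameter is the paper's contribution and is not reproduced here.

```python
import numpy as np

def gini_index(v):
    """Gini index of |v|: 0 for a flat vector, approaching 1 for a
    one-hot vector, so larger values indicate more structured
    (sparser) content."""
    a = np.sort(np.abs(np.asarray(v, dtype=float)))  # ascending order
    n = a.size
    total = a.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(1 - 2 * np.sum((a / total) * (n - k + 0.5) / n))
```

A highly structured noise (large Gini index of its representation) would let the mixture-to-noise JSR contribute more, which is the intuition behind the weighting.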
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Renjie Tong; Yingyue Zhou; Long Zhang; Guangzhao Bao; Zhongfu Ye
In this paper, we propose a robust time-frequency decomposition (RTFD) model to restore audio signals degraded by sparse impulse noise mixed with small dense Gaussian noise, a kind of degradation that is very common in old recordings. The proposed RTFD model is based on the observation that such degraded audio signals mainly contain four parts: the quasi-periodic, voiced part; the aperiodic, transient part; the arbitrarily large impulse noise; and the small dense Gaussian noise. Sparsity and local correlations of the corresponding parts are exploited to solve the RTFD model. We also heuristically develop a discriminative orthogonal matching pursuit (DOMP) algorithm to estimate the sparse representing vectors more precisely. Specifically, the DOMP algorithm divides the whole atom set into two subsets, an active subset and a passive subset, and treats atoms in the two subsets discriminatively by weighting their sparsity regularization terms unequally. Based on RTFD and DOMP, we develop two algorithms: a fidelity-oriented algorithm and an articulation-oriented algorithm. The proposed algorithms achieve considerable performance on both synthetic and real noisy signals, and the articulation-oriented algorithm using DOMP clearly outperforms the other algorithms under heavier impulse noise.
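The active/passive atom split in DOMP can be sketched as an OMP whose atom-selection step is biased toward the active subset. This is a loose structural stand-in: the paper implements the discrimination through unequally weighted sparsity regularization terms, not through this exact selection rule, and the bias value here is arbitrary.

```python
import numpy as np

def discriminative_omp(D, x, active, n_nonzero, bias=2.0):
    """OMP variant: atoms whose indices are in `active` have their
    correlation with the residual scaled by `bias`, so they are
    chosen preferentially over atoms in the passive subset."""
    w = np.ones(D.shape[1])
    w[list(active)] = bias                      # favor the active subset
    residual, support = x.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(w * np.abs(D.T @ residual))))
        c, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ c
    coef = np.zeros(D.shape[1])
    coef[support] = c
    return coef
```

With `bias=1.0` this reduces to ordinary OMP, which makes the discriminative treatment easy to toggle.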
Speech Communication | 2016
Long Zhang; Guangzhao Bao; Jing Zhang; Zhongfu Ye
A novel structure combining the advantages of ratio masks (RMs) and joint dictionary learning (JDL) is proposed for single-channel speech enhancement in this paper. The structure makes full use of the training data and overcomes some shortcomings of the generative dictionary learning (GDL) algorithm. RMs of speech and interferer are introduced to provide discriminative information in both the training and enhancement stages. In the training stage, the signals and their corresponding ideal RMs (IRMs) are used to learn the signal and IRM dictionaries jointly via the K-SVD algorithm. In the enhancement stage, the mixture signal and mixture RM are sparsely represented over composite dictionaries composed of the learned signal and IRM dictionaries, formulating a joint sparse coding (JSC) problem. The estimated RMs (ERMs) of speech and interferer in the mixture are then used to develop two soft mask (SM) filters. The proposed SM filters incorporate the ideal-binary-mask technique and a Wiener-type filter to make full use of the discriminative information provided by the ERMs, and are used both to strengthen the speech and to suppress the interferer in the mixture. Experimental evaluations show that the proposed algorithms improve both speech intelligibility and quality, achieve performance comparable to a deep neural network (DNN) based mask estimator at lower computational cost, and outperform the other tested algorithms.
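The Wiener-type part of the soft-mask filtering can be sketched as below. This shows only the standard Wiener combination of the two estimated ratio masks; the paper's SM filters additionally incorporate the ideal-binary-mask technique, and the epsilon is our numerical guard.

```python
import numpy as np

def wiener_soft_mask(erm_speech, erm_interferer, eps=1e-8):
    """Wiener-type soft mask from estimated ratio masks: close to 1
    where speech dominates, close to 0 where the interferer does."""
    ps = erm_speech ** 2
    pn = erm_interferer ** 2
    return ps / (ps + pn + eps)

def enhance(mixture_mag, erm_speech, erm_interferer):
    """Apply the soft mask to the mixture magnitude spectrogram."""
    return wiener_soft_mask(erm_speech, erm_interferer) * mixture_mag
```

Applying the complementary mask (interferer over speech) gives the interferer-suppression counterpart mentioned in the abstract.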
International Symposium on Computational Intelligence and Design | 2015
Long Zhang; Guangzhao Bao; You Luo; Zhongfu Ye
In the real world, interferers are often nonstationary and can resemble speech, situations in which conventional speech enhancement (SE) approaches often fail. In recently proposed sparsity-based approaches, the clean speech is recovered from the degraded speech by sparse coding of the mixture over a composite dictionary consisting of the speech and interferer dictionaries. However, parts of the speech component are explained by interferer dictionary atoms and vice versa, which causes source confusion. Existing approaches learn the speech and interferer dictionaries separately, so the source confusion is relatively large. In this paper, we introduce a new joint dictionary learning (JDL) method for SE that learns the speech and interferer dictionaries jointly. The proposed method takes into account the information of speech, interferer, and their mixture, as well as the cross-coherence of the dictionaries; these two parts constitute the new cost function, and an algorithm is presented to solve this JDL optimization problem. Experimental results show that our approach outperforms the other tested approaches, with more pronounced advantages at low input signal-to-interferer ratios.
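The cross-coherence part of such a cost, and one descent step on it, can be sketched as below. This is a minimal illustration under the assumption that the penalty is the common Frobenius form ||Ds^T Dn||_F^2; the paper's full cost also contains the data-fidelity terms for speech, interferer, and mixture, which are omitted here.

```python
import numpy as np

def cross_coherence(Ds, Dn):
    """Frobenius cross-coherence ||Ds^T Dn||_F^2 between the speech
    and interferer dictionaries."""
    return float(np.linalg.norm(Ds.T @ Dn, "fro") ** 2)

def decoherence_step(Ds, Dn, eta=0.01):
    """One gradient step on Ds reducing the cross-coherence penalty
    (the gradient of ||Ds^T Dn||_F^2 w.r.t. Ds is 2 Dn Dn^T Ds),
    followed by the usual unit-norm column constraint."""
    Ds = Ds - eta * 2.0 * (Dn @ (Dn.T @ Ds))
    return Ds / np.linalg.norm(Ds, axis=0, keepdims=True)
```

Driving this term down makes speech atoms less able to explain interferer content and vice versa, which is exactly the source-confusion mechanism described above.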
IEEE Signal Processing Workshop on Statistical Signal Processing | 2014
Guangzhao Bao; Yangfei Xu; Xu Xu; Zhongfu Ye
This paper presents a novel algorithm for learning a hierarchical dictionary in the short-time Fourier transform (STFT) domain, which improves the performance of dictionary learning (DL) based single-channel speech separation (SCSS). The goal of SCSS is to separate the underlying clean speech signals from a mixture, often achieved by learning a pair of discriminative sub-dictionaries and sparsely coding the mixture over the dictionary pair; the case of two source speech signals is considered in this paper. Unfortunately, existing DL approaches cannot drastically reduce source confusion: when the mixture is sparsely represented over the dictionary pair, parts of the target speech component are explained by interferer speech dictionary atoms and vice versa. To suppress more source confusion, we divide the training sets into two layers of components and learn hierarchical sub-dictionaries from the different layers. Experiments verify the superior performance of the proposed approach compared with other existing approaches.
Signal Processing | 2015
Yangfei Xu; Guangzhao Bao; Xu Xu; Zhongfu Ye