Publication


Featured research published by Guibin Zheng.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Sparse power spectrum based robust voice activity detector

Datao You; Jiqing Han; Guibin Zheng; Tieran Zheng

This paper presents a robust approach to improve the performance of a voice activity detector (VAD) in noisy environments with low signal-to-noise ratio (SNR). To this end, we first generate sparse representations by Bregman iteration based sparse decomposition with a learned over-complete dictionary, and derive an audio feature, called the sparse power spectrum, from the sparse representations. We then propose a method to calculate the short-segment average spectrum and the long-segment average spectrum from the sparse power spectrum. Finally, we design a criterion to detect speech and non-speech regions based on these average spectra. Experiments show that the proposed approach further improves the performance of VAD in low-SNR noisy environments.


International Conference on Acoustics, Speech, and Signal Processing | 2012

A solution to residual noise in speech denoising with sparse representation

Yongjun He; Jiqing Han; Shiwen Deng; Tieran Zheng; Guibin Zheng

As a promising technique, sparse representation has been extensively investigated in the signal processing community. Recently, sparse representation has been widely used for speech processing in noisy environments; however, many problems remain to be solved because of the particular nature of speech. One assumption in speech denoising with sparse representation is that the representation of speech over the dictionary is sparse, while that of the noise is dense. Unfortunately, this assumption does not hold in the speech denoising scenario. We find that many noises, e.g., babble and white noise, are also sparse over a dictionary trained with clean speech, resulting in severe residual noise in sparse enhancement. To solve this problem, we propose a novel residual noise reduction (RNR) method, which first identifies the atoms that represent the noise sparsely and then ignores them in the reconstruction of speech. Experimental results show that the proposed method can reduce residual noise substantially.
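
The two RNR steps named in the abstract (identify noise-representing atoms, then leave them out of the reconstruction) can be illustrated with a minimal sketch. The abstract does not say how the atoms are identified, so the rule used here (flag atoms whose mean absolute coefficient over noise-only frames exceeds a threshold) is an assumption, and the names `find_noise_atoms`, `reconstruct`, and `threshold` are hypothetical.

```python
def find_noise_atoms(noise_codes, threshold):
    """Flag atoms that are strongly active on noise-only frames.

    noise_codes[t][k] is the sparse coefficient of atom k for noise frame t.
    The mean-absolute-activation rule is an assumption, not the paper's rule.
    """
    num_atoms = len(noise_codes[0])
    flagged = set()
    for k in range(num_atoms):
        mean_abs = sum(abs(code[k]) for code in noise_codes) / len(noise_codes)
        if mean_abs > threshold:
            flagged.add(k)
    return flagged


def reconstruct(dictionary, code, ignored=frozenset()):
    """Rebuild one frame as a weighted sum of atoms, skipping ignored atoms.

    dictionary[k] is atom k (a list of samples); code[k] is its coefficient.
    """
    dim = len(dictionary[0])
    frame = [0.0] * dim
    for k, coeff in enumerate(code):
        if k in ignored or coeff == 0.0:
            continue
        for i in range(dim):
            frame[i] += coeff * dictionary[k][i]
    return frame


# Toy example: atom 2 is the one that noise frames keep selecting.
dictionary = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
noise_codes = [[0.0, 0.0, 0.8], [0.0, 0.1, 0.9]]
noise_atoms = find_noise_atoms(noise_codes, threshold=0.5)
noisy_code = [2.0, 0.0, 0.5]
full = reconstruct(dictionary, noisy_code)
cleaned = reconstruct(dictionary, noisy_code, ignored=noise_atoms)
```

In this toy case the flagged atom contributes to `full` but not to `cleaned`, which is the effect the abstract describes as ignoring noise atoms during reconstruction.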


International Conference on Acoustics, Speech, and Signal Processing | 2014

Robust minimum statistics project coefficients feature for acoustic environment recognition

Shiwen Deng; Jiqing Han; Chaozhu Zhang; Tieran Zheng; Guibin Zheng

Acoustic environment recognition has been widely used in many applications, and it is a considerably difficult problem in real-life, complex environments. This paper proposes a novel feature, named minimum statistics project coefficients (MSPC), intended to solve this problem. The MSPC feature is extracted from the background sound, which is more robust than the foreground sound for the task of acoustic environment recognition. Experimental results show the outstanding performance of the MSPC feature compared with conventional acoustic features, especially in very complex acoustic environments.


International Conference on Acoustics, Speech, and Signal Processing | 2011

A modified MAP criterion based on hidden Markov model for voice activity detection

Shiwen Deng; Jiqing Han; Tieran Zheng; Guibin Zheng

The maximum a posteriori (MAP) criterion is broadly used in statistical model-based voice activity detection (VAD) approaches. In the conventional MAP criterion, however, the inter-frame correlation of the voice activity is not taken into consideration. In this paper, we propose a novel modified MAP criterion based on a two-state hidden Markov model (HMM), which models the inter-frame correlation of the voice activity, to improve the performance of the VAD. With the proposed MAP criterion, the decision rule is derived by explicitly incorporating the a priori, a posteriori, and inter-frame correlation information into the likelihood ratio test (LRT). In the LRT, a compensation factor for the hypothesis of speech presence is used to regulate the trade-off between the probability of detection and the false alarm probability. Experimental results show the superiority of the VAD algorithm based on the proposed MAP criterion over that based on the recent conditional MAP (CMAP) criterion under various noise conditions.
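
To make the role of inter-frame correlation concrete, the following is a minimal sketch of a two-state (noise/speech) HMM recursion wrapped around a per-frame likelihood ratio. It is not the decision rule derived in the paper: the recursion, the transition probabilities `p_stay_speech` and `p_stay_noise`, and the way `compensation` scales the test are illustrative assumptions.

```python
def hmm_lrt_vad(likelihood_ratios, p_stay_speech=0.9, p_stay_noise=0.9,
                compensation=1.0, threshold=1.0):
    """Frame-wise speech/non-speech decisions with a two-state HMM prior.

    likelihood_ratios[t] = p(x_t | speech) / p(x_t | noise), assumed finite.
    All parameter values are hypothetical defaults for illustration.
    """
    decisions = []
    post_speech = 0.5  # assumed initial probability of speech
    for lr in likelihood_ratios:
        # A priori speech probability, propagated from the previous frame
        # through the HMM transition probabilities.
        prior_speech = (post_speech * p_stay_speech
                        + (1.0 - post_speech) * (1.0 - p_stay_noise))
        prior_odds = prior_speech / (1.0 - prior_speech)
        # Posterior odds combine the prior odds with the current frame.
        post_odds = prior_odds * lr
        # The compensation factor biases the test toward speech presence.
        decisions.append(compensation * post_odds > threshold)
        post_speech = post_odds / (1.0 + post_odds)
    return decisions


ratios = [0.5, 3.0, 0.9, 0.2]
with_memory = hmm_lrt_vad(ratios)
memoryless = [lr > 1.0 for lr in ratios]
```

With these toy ratios, the third frame (ratio 0.9) is rejected by the memoryless test but accepted by the HMM-based one, because the strong speech evidence in the preceding frame raises its prior odds.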


International Journal of Pattern Recognition and Artificial Intelligence | 2016

Speaker Verification via Modeling Kurtosis Using Sparse Coding

Wei Wang; Jiqing Han; Tieran Zheng; Guibin Zheng; Xingyu Zhou

This paper proposes a new model for speaker verification that applies kurtosis statistics to a sparse coding model of the human auditory system. Since only a small number of neurons in the primary auditory cortex are activated when encoding acoustic stimuli, sparse independent events are used to represent the characteristics of the neurons. An individual dictionary is learned from each speaker's samples, with dictionary atoms corresponding to cortical neurons. The neuron responses reflect the statistical properties of acoustic signals in the auditory cortex, so the activation distribution of an individual speaker’s neurons is taken as an approximation of the speaker's characteristics. Kurtosis is an efficient measure of the sparsity of a neuron based on its activation distribution, and the vector composed of the kurtosis of every neuron serves as the model that characterizes the speaker’s voice. The experimental results demonstrate that the kurtosis model outperforms the baseline systems and achieves effective identity verification.
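
The core quantity in this abstract, a vector holding the kurtosis of each atom's activation distribution, can be sketched as follows. The sketch uses the plain fourth standardized moment; whether the paper uses this or excess kurtosis is not stated, and the function names and the zero-variance convention are assumptions.

```python
def kurtosis(values):
    """Fourth standardized moment of a sequence (not excess kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        return 0.0  # assumption: a constant activation is given kurtosis 0
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (variance ** 2)


def kurtosis_model(activations):
    """Per-atom kurtosis vector for one speaker.

    activations[t][k] is the sparse coefficient of atom k for frame t.
    """
    num_atoms = len(activations[0])
    return [kurtosis([frame[k] for frame in activations])
            for k in range(num_atoms)]


# Atom 0 fires rarely (sparse); atom 1 is active on every frame (dense).
frames = [[0.0, -1.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]]
model = kurtosis_model(frames)
```

The rarely active atom gets the larger kurtosis, which is why kurtosis can serve as a sparsity measure for an activation distribution.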


Circuits Systems and Signal Processing | 2014

Sparse Representation with Optimized Learned Dictionary for Robust Voice Activity Detection

Datao You; Jiqing Han; Guibin Zheng; Tieran Zheng; Jie Li

Traditionally, most voice activity detection (VAD) methods are based on speech features such as spectrum, temporal energy, and periodicity. The robustness of these features plays a critical role in the performance of VAD. However, since these features are always generated directly from the observed signal, their robustness is significantly degraded in non-stationary noise environments, especially under low signal-to-noise ratio (SNR) conditions. This paper proposes a robust feature for VAD based on sparse representation with an optimized learned dictionary. To do so, a speech dictionary and a noise dictionary are first learned from a speech corpus and a noise corpus, respectively. Then an optimization algorithm is designed to reduce the mutual coherence between the two learned dictionaries. After that, the proposed feature is generated from the optimized dictionary-based sparse representation, and a VAD method is derived from the proposed feature. The proposed method is evaluated over seven types of noise and four SNR levels. Experimental results show that the optimized dictionary is important for enhancing the robustness of the proposed method, and that the proposed method performs well under non-stationary noise, especially under low-SNR conditions.
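
The mutual coherence that the optimization step reduces can be computed as the largest absolute inner product between a unit-normalized speech atom and a unit-normalized noise atom. The sketch below shows only this measurement under that standard definition; the paper's optimization algorithm is not described in the abstract and is not reproduced, and the function names are hypothetical.

```python
import math


def normalize(atom):
    """Scale an atom to unit Euclidean norm (atoms are assumed non-zero)."""
    norm = math.sqrt(sum(x * x for x in atom))
    return [x / norm for x in atom]


def mutual_coherence(speech_atoms, noise_atoms):
    """Largest absolute correlation between any speech and any noise atom."""
    speech = [normalize(a) for a in speech_atoms]
    noise = [normalize(a) for a in noise_atoms]
    return max(abs(sum(x * y for x, y in zip(s, n)))
               for s in speech for n in noise)


speech_dictionary = [[1.0, 0.0], [0.0, 1.0]]
overlapping_noise = [[1.0, 1.0]]
orthogonal_noise = [[0.0, 2.0]]
high = mutual_coherence(speech_dictionary, overlapping_noise)
low = mutual_coherence([[1.0, 0.0]], orthogonal_noise)
```

A value near 1 means some noise atom is almost interchangeable with a speech atom, so a noisy frame may be coded with either dictionary; a value near 0 means the two dictionaries span clearly different directions.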


International Journal of Pattern Recognition and Artificial Intelligence | 2012

Sparse-Based Auditory Model for Robust Speaker Recognition

Datao You; Jiqing Han; Tieran Zheng; Guibin Zheng

The mismatch between the training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, speaker recognition under mismatched conditions is still a challenge. To solve this problem, we propose a sparse-based auditory model as the front-end of speaker recognition by simulating the auditory processing of the speech signal. To this end, we introduce a narrow-band filter-bank instead of the widely used wide-band filter-bank to simulate the basilar membrane filter-bank, use sparse representation as an approximation of the basilar membrane coding strategy, and incorporate the frequency selectivity enhancement mechanism between the tectorial membrane and the basilar membrane through a practical engineering approximation. Compared with the standard Mel-frequency cepstral coefficient approach, our preliminary experimental results indicate that the sparse-based auditory model consistently improves the robustness of speaker recognition under mismatched conditions.


International Conference on Intelligent Control and Information Processing | 2011

Mandarin keyword spotting using syllable based confidence features and SVM

Haiyang Li; Jiqing Han; Tieran Zheng; Guibin Zheng

A confidence measure (CM) method using syllable-based confidence features is proposed to improve false-alarm rejection in Mandarin keyword spotting (KWS). The features take advantage of the Mandarin syllable structure and describe the confidence at every sub-syllable level. The evaluation is performed with a support vector machine (SVM) on a telephone speech database. Compared with the typical method, the experimental results show that the proposed CM features and SVM-based method yield a significant improvement, with a best-case reduction of 12.13% in equal error rate (EER).
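
The equal error rate reported here is the operating point at which the false-rejection rate and the false-acceptance rate coincide. A minimal sketch of estimating it from two lists of confidence scores is given below; the threshold sweep, the function name, and the toy scores are illustrative assumptions and are unrelated to the paper's data or evaluation code.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target_scores: confidence scores of true keyword hits.
    impostor_scores: confidence scores of false alarms.
    A hypothesis is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap = None
    best_eer = None
    for th in thresholds:
        # False rejection: a true hit scored below the threshold.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        # False acceptance: a false alarm scored at or above the threshold.
        far = sum(s >= th for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
    return best_eer


hits = [0.9, 0.8, 0.7, 0.4]
false_alarms = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(hits, false_alarms)
```

With these toy scores, one hit falls below and one false alarm falls above the best threshold, so both error rates equal one in four.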


International Conference on Acoustics, Speech, and Signal Processing | 2011

Compensation of partly reliable components for band-limited speech recognition with missing data techniques

Yongjun He; Jiqing Han; Tieran Zheng; Guibin Zheng

Mismatch in speech bandwidth between training and real operation greatly degrades the performance of automatic speech recognition (ASR) systems. The missing feature technique (MFT) is effective in handling bandwidth mismatch. However, current MFT-based methods ignore the mismatch in the filterbank channels that cover the upper and lower cutoff frequencies. To solve this problem, we propose to partition the feature into reliable, unreliable, and partly reliable parts, and then modify the probability density functions (PDFs) of the partly reliable part to match band-limited features. Experiments showed that such compensation further improved the performance of MFT-based methods under band-limited conditions.
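
The three-way partition described above can be sketched by comparing each filterbank channel's frequency span with the cutoff frequencies of the band-limited signal. The channel edges, the cutoff values, and the function name below are hypothetical, and the PDF modification applied to the partly reliable part is not shown because the abstract does not specify it.

```python
def classify_channels(channel_edges, low_cutoff, high_cutoff):
    """Label each filterbank channel by its overlap with the passband.

    channel_edges: list of (lower_hz, upper_hz) spans, one per channel.
    low_cutoff, high_cutoff: passband limits of the band-limited speech in Hz.
    """
    labels = []
    for lower, upper in channel_edges:
        if lower >= low_cutoff and upper <= high_cutoff:
            labels.append("reliable")         # entirely inside the passband
        elif upper <= low_cutoff or lower >= high_cutoff:
            labels.append("unreliable")       # entirely outside the passband
        else:
            labels.append("partly reliable")  # straddles a cutoff frequency
    return labels


# Hypothetical channel spans against an assumed 300-3400 Hz telephone band.
channels = [(100, 250), (250, 400), (400, 1000), (3000, 3600), (3600, 4000)]
labels = classify_channels(channels, low_cutoff=300, high_cutoff=3400)
```

The channels that straddle 300 Hz and 3400 Hz are the ones the abstract says earlier MFT-based methods ignored.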


International Conference on Neural Information Processing | 2017

Learning Deep Neural Network Based Kernel Functions for Small Sample Size Classification

Tieran Zheng; Jiqing Han; Guibin Zheng

Kernel learning aims to learn a kernel function from the set of all sample pairs in the training data. Even for small sample size classification tasks, this set is usually large enough for a complex kernel with many parameters to be well optimized. Hence, a complex kernel can help improve classification performance by providing a more meaningful feature representation in the kernel-induced feature space. In this paper, we propose to embed a deep neural network (DNN) into kernel functions, taking its output as a kernel parameter to adjust the feature representations adaptively. Two kinds of DNN-based kernels are defined, and both are proved to satisfy Mercer's theorem. Considering the connection between kernel and classifier, we optimize the proposed DNN-based kernels using the GMKL alternating optimization framework. A stochastic gradient descent (SGD) based algorithm is also proposed, which still implements alternating optimization in each iteration. Furthermore, an incremental batch size method is given to gradually reduce gradient noise during optimization. Experimental results show that our method performs better than typical methods.
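
As a simple illustration of why a network-based kernel can satisfy the Mercer condition, the sketch below builds a kernel as the inner product of a small network's outputs. This is a deliberately simpler construction than the two kernels defined in the paper, where the DNN output acts as a kernel parameter; the single tanh layer, its weights, and the function names are all hypothetical.

```python
import math


def embed(x, weights, biases):
    """One tanh layer standing in for a DNN feature map (toy parameters)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def network_kernel(x, y, weights, biases):
    """k(x, y) = <f(x), f(y)>, an inner product in the network's output space."""
    fx = embed(x, weights, biases)
    fy = embed(y, weights, biases)
    return sum(a * b for a, b in zip(fx, fy))


def quadratic_form(samples, coeffs, weights, biases):
    """c^T K c for the Gram matrix K; non-negative for a Mercer kernel."""
    return sum(ci * cj * network_kernel(xi, xj, weights, biases)
               for ci, xi in zip(coeffs, samples)
               for cj, xj in zip(coeffs, samples))


weights = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.5]]
biases = [0.1, -0.2, 0.0]
samples = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]]
value = quadratic_form(samples, [1.0, -2.0, 0.5], weights, biases)
```

Because the kernel is an inner product of feature vectors, the quadratic form equals a squared norm and cannot be negative, whatever the network parameters are.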

Collaboration


Guibin Zheng's top co-authors:

Tieran Zheng (Harbin Institute of Technology)
Jiqing Han (Harbin Institute of Technology)
Haiyang Li (Harbin Institute of Technology)
Shiwen Deng (Harbin Institute of Technology)
Datao You (Harbin Institute of Technology)
Tao Jiang (Harbin Institute of Technology)
Yongjun He (Harbin Institute of Technology)
Chaozhu Zhang (Harbin Engineering University)
Li Ding (Washington University in St. Louis)