Tieran Zheng
Harbin Institute of Technology
Publication
Featured research published by Tieran Zheng.
international conference on acoustics, speech, and signal processing | 2012
Datao You; Jiqing Han; Guibin Zheng; Tieran Zheng
This paper presents a robust approach to improving the performance of voice activity detection (VAD) in low signal-to-noise ratio (SNR) noisy environments. To this end, we first generate sparse representations by Bregman-iteration-based sparse decomposition with a learned over-complete dictionary, and derive an audio feature called the sparse power spectrum from the sparse representations. We then propose a method to calculate the short-segment average spectrum and the long-segment average spectrum from the sparse power spectrum. Finally, we design a criterion to detect speech and non-speech regions based on these average spectra. Experiments show that the proposed approach further improves the performance of VAD in low-SNR noisy environments.
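The decision criterion above can be sketched as follows. This is a minimal illustration only: the window lengths, the sum-ratio test, and the threshold are assumptions for demonstration, not the paper's actual design, and the input is taken to be an already-computed sparse power spectrum.

```python
import numpy as np

def vad_decision(sparse_power, short_len=5, long_len=40, threshold=1.5):
    """Toy VAD criterion: compare a short-segment average spectrum against
    a long-segment average spectrum, frame by frame.
    `sparse_power` is an (n_frames, n_bins) sparse power spectrum."""
    n_frames = sparse_power.shape[0]
    decisions = np.zeros(n_frames, dtype=bool)
    for t in range(n_frames):
        s0 = max(0, t - short_len + 1)
        l0 = max(0, t - long_len + 1)
        short_avg = sparse_power[s0:t + 1].mean(axis=0)
        long_avg = sparse_power[l0:t + 1].mean(axis=0)
        # declare speech when short-term energy dominates the long-term background
        ratio = short_avg.sum() / (long_avg.sum() + 1e-10)
        decisions[t] = ratio > threshold
    return decisions
```

In steady noise the ratio stays near 1, so the threshold directly trades detection against false alarms.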
international conference on acoustics, speech, and signal processing | 2012
Yongjun He; Jiqing Han; Shiwen Deng; Tieran Zheng; Guibin Zheng
As a promising technique, sparse representation has been extensively investigated in the signal processing community. Recently, sparse representation has been widely used for speech processing in noisy environments; however, many problems remain to be solved because of the particularity of speech. One assumption for speech denoising with sparse representation is that the representation of speech over the dictionary is sparse, while that of the noise is dense. Unfortunately, this assumption does not hold in the speech denoising scenario. We find that many noises, e.g., babble and white noise, are also sparse over a dictionary trained with clean speech, resulting in severe residual noise in sparse enhancement. To solve this problem, we propose a novel residual noise reduction (RNR) method which first finds the atoms that represent the noise sparsely, and then ignores them in the reconstruction of speech. Experimental results show that the proposed method can reduce residual noise substantially.
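The core RNR idea — drop the atoms that respond strongly to noise before reconstructing — can be sketched like this. How noise-representing atoms are identified here (ranking by noise-only code magnitude, keeping `top_k` of them) is an assumption for illustration, not the paper's actual selection rule.

```python
import numpy as np

def residual_noise_reduction(coeffs, dictionary, noise_coeffs, top_k=2):
    """Toy RNR sketch. Atoms that carry the most energy when noise alone
    is decomposed over the speech dictionary are treated as
    noise-representing atoms and dropped from the reconstruction.
    `coeffs`: (n_atoms,) sparse codes of the noisy frame;
    `noise_coeffs`: (n_atoms,) sparse codes of a noise-only frame."""
    # rank atoms by how strongly they respond to noise
    noise_atoms = np.argsort(np.abs(noise_coeffs))[-top_k:]
    cleaned = coeffs.copy()
    cleaned[noise_atoms] = 0.0          # ignore the noise-representing atoms
    return dictionary @ cleaned          # reconstruct speech without them
```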
international conference on acoustics, speech, and signal processing | 2011
Shiwen Deng; Jiqing Han; Tieran Zheng; Guibin Zheng
The maximum a posteriori (MAP) criterion is broadly used in statistical model-based voice activity detection (VAD) approaches. In the conventional MAP criterion, however, the inter-frame correlation of voice activity is not taken into consideration. In this paper, we propose a novel modified MAP criterion based on a two-state hidden Markov model (HMM) to improve the performance of VAD, in which the inter-frame correlation of voice activity is modeled. With the proposed MAP criterion, the decision rule is derived by explicitly incorporating the a priori, a posteriori, and inter-frame correlation information into the likelihood ratio test (LRT). In the LRT, a compensation factor for the hypothesis of speech presence is used to regulate the trade-off between the probability of detection and the false alarm probability. Experimental results show the superiority of the VAD algorithm based on the proposed MAP criterion over one based on the recent conditional MAP (CMAP) criterion under various noise conditions.
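A minimal sketch of how a two-state HMM can inject inter-frame correlation into an LRT-based decision: the prior term carried into each frame's test depends on the previous decision through the transition probabilities. The specific prior-odds form, the additive compensation term `eps`, and all parameter values are illustrative assumptions, not the paper's derived rule.

```python
import numpy as np

def hmm_map_vad(llr, a01=0.1, a10=0.1, eps=0.2, threshold=0.0):
    """Toy LRT-based VAD with inter-frame correlation from a two-state HMM.
    `llr[t]`: per-frame log-likelihood ratio of speech vs. non-speech;
    `a01`/`a10`: transition probabilities (non-speech->speech and back);
    `eps`: compensation factor favouring the speech-presence hypothesis."""
    state = 0                      # start in non-speech
    out = []
    for l in llr:
        if state == 0:
            # prior odds of entering speech from non-speech
            prior = np.log(a01 / (1.0 - a01))
        else:
            # prior odds of staying in speech
            prior = np.log((1.0 - a10) / a10)
        score = l + prior + eps    # LRT with prior and compensation term
        state = 1 if score > threshold else 0
        out.append(state)
    return np.array(out)
```

Because the prior flips sign with the previous state, isolated borderline frames are pulled toward the preceding decision, which is the hangover-like effect inter-frame modeling buys.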
Digital Signal Processing | 2014
Yongjun He; Jiqing Han; Tieran Zheng; Guanglu Sun
Channel distortion is one of the major factors that degrade the performance of automatic speech recognition (ASR) systems. Current compensation methods are generally based on the assumption that the channel distortion is a constant or slowly varying bias within an utterance or globally. However, this assumption does not hold in more complex circumstances, when the speech recordings being recognized come from many different unknown channels and have parts of the spectrum completely removed (e.g., band-limited speech). On the one hand, different channels may cause different distortions; on the other hand, the distortion caused by a given channel varies over the speech frames when parts of the speech spectrum are removed completely. As a result, the performance of current methods is limited in complex environments. To solve this problem, we propose a unified framework in which the channel distortion is first divided into two subproblems, namely, spectrum missing and magnitude changing. Next, the two types of distortion are compensated with different techniques in two steps. In the first step, the speech bandwidth is detected for each utterance and the acoustic models are synthesized from clean models to compensate for spectrum missing. In the second step, the constant term of the distortion is estimated via the expectation-maximization (EM) algorithm and subtracted from the means of the synthesized model to further compensate for magnitude changing. Several databases are chosen to evaluate the proposed framework. The speech in these databases is recorded over different channels, including various microphones and band-limited channels. Moreover, to simulate more types of spectrum missing, various low-pass and band-pass filters are used to process the speech from the chosen databases. Although these databases and their filtered versions make the channel conditions more challenging for recognition, experimental results show that the proposed framework can substantially improve the performance of ASR systems in complex channel environments.
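The two compensation steps can be sketched in a heavily simplified form. The energy-floor bandwidth test and the global-mean bias estimate below are crude stand-ins (the paper detects bandwidth per utterance and estimates the constant term via EM); all parameter values are assumptions.

```python
import numpy as np

def detect_bandwidth(power_spec, floor_ratio=0.01):
    """Step 1 (sketch): find the highest frequency bin whose average energy
    exceeds a fraction of the peak; bins above it are treated as removed.
    `power_spec` is an (n_frames, n_bins) power spectrogram."""
    avg = power_spec.mean(axis=0)
    active = avg > floor_ratio * avg.max()
    return int(np.max(np.nonzero(active)))   # index of last active bin

def compensate_bias(model_means, features):
    """Step 2 (sketch): estimate the constant channel term as the gap
    between the global feature mean and the global model mean, and shift
    the model means by it -- a crude stand-in for the EM estimate."""
    bias = features.mean(axis=0) - model_means.mean(axis=0)
    return model_means + bias    # move the models toward the observed channel
```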
international conference on intelligent control and information processing | 2011
Haiyang Li; Jiqing Han; Tieran Zheng; Guibin Zheng
A confidence measure (CM) method using syllable-based confidence features is proposed to improve false-alarm rejection in Mandarin keyword spotting (KWS). The features take advantage of the structure of Mandarin syllables and describe confidence at each sub-syllable level. The evaluation is performed with a support vector machine (SVM) on a telephone speech database. Compared with the typical method, the experimental results show that the proposed CM features and SVM-based method yield a significant improvement, achieving at best a 12.13% reduction in equal error rate (EER).
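The classification stage — an SVM deciding accept/reject from confidence features — can be sketched with a tiny linear SVM trained by sub-gradient descent on the hinge loss. This is a from-scratch stand-in for illustration; the feature design and kernel choice in the paper are not reproduced here.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Tiny linear SVM (hinge loss, sub-gradient descent), standing in for
    the SVM applied to sub-syllable confidence features.
    `X`: (n_samples, n_features); `y`: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:        # margin violated: push toward sample
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                            # margin satisfied: only regularize
                w -= lr * lam * w
    return w, b
```

A keyword hypothesis would then be accepted when `X_new @ w + b` exceeds zero (or a tuned operating threshold to trade misses against false alarms).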
international conference on acoustics, speech, and signal processing | 2011
Yongjun He; Jiqing Han; Tieran Zheng; Guibin Zheng
Mismatch in speech bandwidth between training and real operation greatly degrades the performance of automatic speech recognition (ASR) systems. The missing feature technique (MFT) is effective in handling bandwidth mismatch. However, current MFT-based methods ignore the mismatch in the filterbank channels which cover the upper and lower cutoff frequencies. To solve this problem, we propose to partition the feature into reliable, unreliable, and partly reliable parts, and then modify the probability density functions (PDFs) of the partly reliable part to match band-limited features. Experiments showed that such compensation further improved the performance of MFT-based methods under band-limited conditions.
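The standard MFT device that the paper builds on — marginalizing unreliable feature dimensions out of a Gaussian score — can be sketched as below. The three-way reliable/partly-reliable/unreliable split and the PDF modification for the partly reliable part are not reproduced here; this shows only the basic binary marginalization, with a diagonal-covariance Gaussian assumed.

```python
import numpy as np

def marginalized_log_gaussian(x, mean, var, reliable_mask):
    """MFT-style sketch: score a diagonal Gaussian using only the reliable
    feature dimensions; unreliable (band-limited) dimensions are
    marginalized out by dropping their terms from the log-likelihood."""
    m = reliable_mask.astype(bool)
    d = x[m] - mean[m]
    return -0.5 * np.sum(np.log(2 * np.pi * var[m]) + d * d / var[m])
```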
international conference on acoustics, speech, and signal processing | 2013
Haiyang Li; Jiqing Han; Tieran Zheng; Guibin Zheng
The Kullback-Leibler (KL) divergence is often used for similarity comparison between two hidden Markov models (HMMs). However, there is no closed-form expression for the KL divergence between HMMs, and it can only be approximated. In this paper, we propose two novel methods for approximating the KL divergence between left-to-right transient HMMs. The first method is a product approximation which can be calculated recursively without introducing extra parameters. The second method is based on upper and lower bounds of the KL divergence, and the mean of these bounds provides a usable approximation of the divergence. We demonstrate the effectiveness of the proposed methods through experiments measuring the deviation from a numerical approximation and the task of predicting the confusability of phone pairs. Experimental results show that the proposed product approximation is comparable with the current variational approximation, and the proposed approximation based on bounds performs better than current methods in the experiments.
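The building block of any such approximation is the closed-form KL divergence between the state emission densities. The sketch below combines per-state Gaussian KLs under an assumed matched state alignment and occupancy weighting; this is a crude illustration of the ingredients, not the paper's product approximation or its bounds.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """Closed-form KL divergence between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def kl_hmm_approx(means1, vars1, means2, vars2, occ):
    """Crude HMM-level approximation: for two left-to-right HMMs with the
    same number of states, sum per-state emission KLs weighted by an
    assumed expected state occupancy `occ` (transitions ignored)."""
    return sum(o * kl_gauss(m1, v1, m2, v2)
               for o, m1, v1, m2, v2 in zip(occ, means1, vars1, means2, vars2))
```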
international conference on acoustics, speech, and signal processing | 2013
Tieran Zheng; Jiqing Han; Guibin Zheng; Shiwen Deng
In some practical keyword spotting applications, users or service providers are willing to provide spotting-result feedback to help improve system performance. To exploit it, they require a keyword spotting technique with a sustained learning ability. This paper presents a new Chinese keyword spotting method based on a case-based reasoning framework. A two-level keyword case representation is adopted, based on a set of symbols that are discriminative both in the acoustic feature vector space and in the semantic space. The case bases are then indexed with a tree structure and searched against test speech using an elastic matching strategy. Finally, the feedback is used to adjust the statistics attached to the cases or to append new cases. Two experiments were conducted to compare our approach with a syllable-lattice-based method and to test the sustained learning ability.
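The elastic matching step can be sketched with a standard dynamic-time-warping cost between a query sequence and a stored case. Using Euclidean frame distance and plain DTW recursion is an assumption for illustration; the paper's symbol-level matching and tree-indexed search are not reproduced.

```python
import numpy as np

def elastic_match_cost(query, case):
    """DTW-style elastic matching cost between a query feature sequence and
    a stored keyword case, both shaped (length, n_dims)."""
    n, m = len(query), len(case)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(query[i - 1] - case[j - 1])
            # insertion, deletion, or match step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)     # length-normalized match cost
```

A case base lookup would return the case with the lowest normalized cost, with feedback then adjusting that case's attached statistics.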
international conference on acoustics, speech, and signal processing | 2011
Datao You; Tao Jiang; Jiqing Han; Tieran Zheng
In this paper, a robust feature for text-independent speaker recognition is proposed which simulates the response of cochlear neurons to acoustic signals. The feature is derived from sparse coding coefficients computed over a learned over-complete dictionary, where the dictionary is considered analogous to a population of speech-sensitive cochlear neurons. Furthermore, the feature is generated without dimensionality reduction or de-correlation. The robust feature is applied to address the mismatch between training and testing conditions. Experiments show that the proposed feature outperforms the Mel-frequency cepstral coefficient (MFCC) feature, especially in noisy environments: the equal error rate (EER) of MFCC degrades from 10.3% (25 dB) to 21.6% (10 dB), while the EER of the proposed feature remains 6.6% (10 dB) with no degradation.
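Computing sparse coding coefficients over a learned dictionary is commonly done with orthogonal matching pursuit (OMP); a minimal sketch is below. Whether the paper uses OMP, and the choice of taking absolute coefficients as the feature, are assumptions for illustration.

```python
import numpy as np

def omp_sparse_code(x, D, n_nonzero=3):
    """Orthogonal matching pursuit sketch: compute a sparse code for frame
    `x` over an (n_dims, n_atoms) dictionary `D` with unit-norm columns;
    the absolute codes serve as the feature (no dimension reduction)."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if k not in support:
            support.append(k)
        sub = D[:, support]
        # least-squares refit on the selected atoms
        coef_s, *_ = np.linalg.lstsq(sub, x, rcond=None)
        coef[:] = 0.0
        coef[support] = coef_s
        residual = x - sub @ coef_s
    return np.abs(coef)    # non-negative sparse feature
```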
Archive | 2010
Xunxun Chen; Jiqing Han; Tao Jiang; Feng Liu; Zhen Wu; Bing Zhang; Guibin Zheng; Tieran Zheng; Yuan Zhou