Mao-shen Jia
Beijing University of Technology
Publications
Featured research published by Mao-shen Jia.
International Conference on Acoustics, Speech, and Signal Processing | 2010
Yong-tao Sha; Changchun Bao; Mao-shen Jia; Xin Liu
The quality of audio signals encoded with low-bit-rate audio coding standards is degraded because high-frequency information is removed. That quality can, however, be improved by reconstructing the lost high-frequency content. In this paper, the principles of audio signal production and the characteristics of the human auditory system are used to develop a blind high-frequency reconstruction method based on chaotic prediction theory. Objective and subjective evaluations show that this method is, in most cases, more effective than other blind high-frequency reconstruction methods.
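The abstract does not describe the predictor itself; as a generic illustration of chaotic prediction, the sketch below forecasts the next sample of a series by delay-embedding it (Takens-style) and reusing the successor of the nearest neighbor in phase space. The embedding dimension, delay, and logistic-map demo are illustrative assumptions, not details from the paper.

```python
import numpy as np

def chaotic_predict(series, emb_dim=2, delay=1, n_predict=1):
    """Local (nearest-neighbor) prediction in a delay-embedded phase space."""
    s = np.asarray(series, dtype=float)
    preds = []
    for _ in range(n_predict):
        start = (emb_dim - 1) * delay
        # Delay vectors: x_t = [s[t], s[t-delay], ..., s[t-(emb_dim-1)*delay]]
        vecs = np.array([s[t - np.arange(emb_dim) * delay]
                         for t in range(start, len(s) - 1)])
        query = s[len(s) - 1 - np.arange(emb_dim) * delay]
        # The nearest historical neighbor's successor is the prediction
        nn = np.argmin(np.linalg.norm(vecs - query, axis=1))
        nxt = s[start + nn + 1]
        preds.append(nxt)
        s = np.append(s, nxt)
    return np.array(preds)

# Demo on the logistic map (chaotic for r = 3.9)
r, x = 3.9, 0.3
orbit = []
for _ in range(500):
    orbit.append(x)
    x = r * x * (1 - x)
pred = chaotic_predict(orbit[:-1])[0]   # predict the withheld last sample
```

With enough history, the local predictor tracks the deterministic orbit closely even though the map is chaotic.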
International Conference on Signal Processing | 2010
Xin Liu; Changchun Bao; Mao-shen Jia; Yong-tao Sha
In conventional bandwidth extension, spectral patching methods such as spectral folding, spectral translation, and non-linear processing are employed to reconstruct the high-frequency signal; however, they introduce spectral shifts between the reconstructed and original signals and do not preserve the original harmonic relations. In this paper, a blind harmonic bandwidth extension method from wideband to super-wideband is proposed, in which the energy of the high-frequency spectral envelope is estimated with a Gaussian mixture model (GMM). Both objective and subjective test results show that the proposed algorithm outperforms conventional blind bandwidth extension algorithms.
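As a hedged illustration of the GMM estimation step, the sketch below computes the MMSE estimate E[y|x] of a high-band envelope energy y from a low-band feature x under a two-component joint Gaussian model. The toy data, cluster parameters, and one-dimensional features are assumptions for the demo; the paper's actual feature set is not given in the abstract.

```python
import numpy as np
rng = np.random.default_rng(0)

# Toy joint data: x = low-band feature, y = high-band envelope energy.
# Two clusters with different linear x->y mappings stand in for the
# speech-dependent relation a trained GMM would capture.
n = 2000
z = rng.integers(0, 2, n)
x = np.where(z == 0, rng.normal(-2, 0.5, n), rng.normal(2, 0.5, n))
y = np.where(z == 0, 0.5 * x + 1, -0.8 * x + 3) + rng.normal(0, 0.05, n)

# "Trained" joint GMM parameters, estimated per cluster for the sketch
# (a real system would run EM on joint [x, y] feature vectors).
params = []
for k in (0, 1):
    xk, yk = x[z == k], y[z == k]
    mu = np.array([xk.mean(), yk.mean()])
    cov = np.cov(np.vstack([xk, yk]))
    params.append((0.5, mu, cov))

def mmse_highband_energy(x_obs):
    """MMSE estimate E[y|x]: responsibility-weighted conditional means."""
    num, den = 0.0, 0.0
    for w, mu, cov in params:
        var_x = cov[0, 0]
        lik = w * np.exp(-0.5 * (x_obs - mu[0]) ** 2 / var_x) / np.sqrt(var_x)
        cond_mean = mu[1] + cov[0, 1] / var_x * (x_obs - mu[0])
        num += lik * cond_mean
        den += lik
    return num / den

est = mmse_highband_energy(2.0)   # cluster 1 dominates here
```

The estimator blends each component's conditional regression line, weighted by how responsible that component is for the observed low-band feature.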
International Conference on Signal and Information Processing | 2014
Feng Bao; Hui-jing Dou; Mao-shen Jia; Changchun Bao
In this paper, we propose a speech enhancement method based on a small set of speech spectral shapes. First, we utilize the Minima Controlled Recursive Averaging (MCRA) algorithm to estimate the noise instead of training the noise codebooks used in the conventional method. Then, the spectral shapes and spectral gains of speech and noise are optimized by minimizing the spectral distortion between the noisy speech and the combination of speech and noise. Next, the normalized cross-correlation coefficients between the spectra of the noisy speech and the noise are used to modify the spectral gains of speech and noise. Finally, the noisy speech is passed through the reconstructed Wiener filter to obtain the enhanced speech. Objective and subjective tests show that this method removes the annoying background noise occurring in unvoiced or silent segments much better than the conventional codebook-based method.
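A minimal sketch of the final filtering stage, assuming a frame-wise FFT front end: the noise PSD is estimated by recursively averaging an assumed speech-free lead-in (a simplified stand-in for MCRA), and each frame is shaped by a Wiener gain. Frame size, smoothing constant, and the toy test signal are illustrative choices, not values from the paper.

```python
import numpy as np

def wiener_enhance(noisy, frame=256, noise_frames=10):
    """Frame-wise Wiener filtering with weighted overlap-add. The noise PSD
    is recursively averaged over the first few (assumed speech-free) frames,
    a simplified stand-in for the MCRA tracker used in the paper."""
    win = np.hanning(frame)
    hop = frame // 2
    n_psd = np.zeros(frame // 2 + 1)
    out = np.zeros(len(noisy))
    wsum = np.zeros(len(noisy))
    for i, start in enumerate(range(0, len(noisy) - frame + 1, hop)):
        spec = np.fft.rfft(win * noisy[start:start + frame])
        psd = np.abs(spec) ** 2
        if i < noise_frames:                       # noise-only lead-in
            n_psd = 0.8 * n_psd + 0.2 * psd
        snr = np.maximum(psd / np.maximum(n_psd, 1e-12) - 1.0, 1e-3)
        gain = snr / (1.0 + snr)                   # Wiener gain: xi / (1 + xi)
        out[start:start + frame] += win * np.fft.irfft(gain * spec)
        wsum[start:start + frame] += win ** 2
    return out / np.maximum(wsum, 1e-8)            # window normalization

# Toy signal: white noise throughout, a 440 Hz tone in the second half
rng = np.random.default_rng(1)
fs = 8000
noise = 0.1 * rng.standard_normal(fs)
clean = np.zeros(fs)
clean[fs // 2:] = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs // 2) / fs)
enhanced = wiener_enhance(clean + noise)
```

On this toy input the filter strongly attenuates noise-dominated bins while leaving the tone largely intact.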
International Conference on Signal Processing | 2014
Zhen-zhen Gao; Changchun Bao; Feng Bao; Mao-shen Jia
Speech enhancement based on hidden Markov models (HMM) and the minimum mean square error (MMSE) criterion in the Mel-frequency domain is generally formulated as a weighted-sum filtering of the noisy speech. The filter weights are often estimated from the HMM of the noisy speech, and estimating the filters usually requires an inverse transformation from the Mel-frequency domain to the spectral domain, which often causes spectral distortion. In order to obtain a more accurate HMM of the noisy speech, the vector Taylor series (VTS) is used to estimate the mean vectors and covariance matrices of the noisy-speech HMM. To reduce the distortion introduced by the inverse transformation, a parallel Mel-frequency and log-magnitude (PMLM) modeling approach is proposed. In PMLM, the HMMs of the clean speech and the noise are trained simultaneously in both the Mel-frequency domain and the log-magnitude (LOG-MAG) domain. Experimental results show that, in comparison with the reference methods, the proposed method performs better across different noise environments and input SNRs.
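The weighted-sum filtering idea can be illustrated with a toy example in the linear spectral domain (the paper works in the Mel-frequency and log-magnitude domains): each state contributes its own Wiener filter, weighted by the state posterior given the noisy observation. The state PSDs and the exponential (periodogram-style) observation model below are assumptions made for the sketch.

```python
import numpy as np

# Two toy "clean speech" states with different spectral shapes (PSDs over
# 4 bins) and one noise PSD; these stand in for HMM output distributions.
speech_psd = np.array([[8.0, 4.0, 1.0, 0.5],
                       [0.5, 1.0, 4.0, 8.0]])
noise_psd = np.array([1.0, 1.0, 1.0, 1.0])
prior = np.array([0.5, 0.5])

def weighted_sum_filter(noisy_psd):
    """MMSE filter = sum over states of (state posterior) * (state Wiener)."""
    noisy_model = speech_psd + noise_psd           # state-conditional noisy PSD
    # Exponential/periodogram likelihood: p(y|k) ~ prod_b exp(-y_b/s_kb)/s_kb
    loglik = -(noisy_psd / noisy_model).sum(1) - np.log(noisy_model).sum(1)
    w = prior * np.exp(loglik - loglik.max())
    w /= w.sum()                                   # state posteriors
    wiener = speech_psd / noisy_model              # per-state Wiener gains
    return w @ wiener

# An observation matching state 0 (low-frequency-dominant spectrum)
gain = weighted_sum_filter(np.array([9.0, 5.0, 2.0, 1.5]))
```

Because the observation fits state 0 far better, the combined filter is close to state 0's own Wiener gains: strong in the low bins, suppressive in the high bins.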
International Symposium on Chinese Spoken Language Processing | 2016
Feng Deng; Changchun Bao; Mao-shen Jia
In this paper, a hidden Markov model (HMM)-based cue-parameter estimation method for single-channel speech enhancement is proposed, in which the cue parameters of binaural cue coding (BCC) are successfully applied to a single-channel speech enhancement system. First, the clean speech and noise signals are treated as the left and right channels of a stereo signal, respectively, and the noisy speech is treated as the down-mixed mono signal of the BCC method. From the clean speech and noise data sets and the corresponding noisy speech data set, the clean cue parameters and the pre-enhanced cue parameters are extracted, respectively. A cue HMM is then trained offline, which exploits the a priori information about the clean and pre-enhanced cue parameters for speech enhancement. Next, using the trained cue HMM, the clean cue parameters are estimated from the noisy speech online. Finally, following the synthesis principle of the BCC cue parameters, a speech estimator is constructed for enhancing the noisy speech. The test results demonstrate that, in terms of segmental signal-to-noise ratio (SNR), log spectral distortion, and PESQ, the proposed method performs better than the reference methods.
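A small sketch of the core cue computation, assuming the inter-channel level difference (ICLD) as the BCC cue: with clean speech as the left channel and noise as the right, the per-bin ICLD directly implies a Wiener-like mask. The frame length and the tone-plus-noise demo are illustrative choices, not the paper's configuration.

```python
import numpy as np

def icld_and_mask(left, right, frame=256):
    """Per-bin inter-channel level difference (ICLD), the central BCC level
    cue, and the mask it implies when 'left' is speech and 'right' is noise."""
    win = np.hanning(frame)
    L = np.abs(np.fft.rfft(win * left[:frame])) ** 2
    R = np.abs(np.fft.rfft(win * right[:frame])) ** 2
    icld = 10 * np.log10(np.maximum(L, 1e-12) / np.maximum(R, 1e-12))
    mask = 1.0 / (1.0 + 10 ** (-icld / 10))        # equals L / (L + R)
    return icld, mask

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 500 * np.arange(256) / 8000)   # tone at bin 16
noise = 0.05 * rng.standard_normal(256)
icld, mask = icld_and_mask(speech, noise)
```

Where the speech channel dominates (the tone bin), the ICLD is large and the mask approaches 1; in noise-dominated bins it falls toward 0.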
International Symposium on Signal Processing and Information Technology | 2011
Mao-shen Jia; Changchun Bao; Xin Liu; Xiao-ming Li; Ru-wei Li
In this paper, a compressive sampling method for the MLT coefficients used to extract stereo information is adopted, based on principal component analysis (PCA) and the Modulated Lapped Transform (MLT). With this method, an embedded variable-bit-rate stereo speech and audio coding algorithm is proposed. In this codec, stereo signals sampled at 32 kHz and 16 kHz can be coded at scalable bit rates; the bit-stream structure is embedded and can be divided into several layers. The core codec is ITU-T G.729.1, which processes mono signals with 7 kHz bandwidth. In addition, four extra bit rates are added: 40, 48, 56, and 64 kb/s. The maximum bit rates for wideband and super-wideband stereo signals are 48 kb/s and 64 kb/s, respectively. Objective and subjective test results show that the quality of the proposed codec is no worse than that of the reference codec specified by ITU-T.
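The PCA step can be illustrated on a toy stereo pair: rotating the two channels onto the eigenvectors of their covariance concentrates almost all energy in one "principal" channel, so the side channel can be coded with few bits. The synthetic channels below are assumptions for the demo; the paper applies PCA to MLT coefficients rather than raw samples.

```python
import numpy as np
rng = np.random.default_rng(0)

# Correlated toy stereo: the right channel is mostly a scaled left channel.
left = rng.standard_normal(1024)
right = 0.9 * left + 0.1 * rng.standard_normal(1024)
X = np.vstack([left, right])                       # 2 x N

# PCA rotation: eigenvectors of the 2x2 inter-channel covariance
cov = X @ X.T / X.shape[1]
evals, evecs = np.linalg.eigh(cov)                 # ascending eigenvalues
Y = evecs.T @ X                                    # Y[1]: principal, Y[0]: side

energy_ratio = np.var(Y[0]) / np.var(Y[1])         # side vs. principal energy
```

The rotation is orthogonal, so the decoder recovers the stereo pair exactly from the two rotated channels plus the 2x2 rotation.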
International Conference on Signal Processing | 2008
Mao-shen Jia; Changchun Bao; Rui Li
This paper describes an embedded speech and audio codec based on ITU-T Recommendation G.722.1; it can process 7 kHz bandwidth speech and audio signals at scalable bit rates. Building on G.722.1, the algorithm adds two modules, sub-band energy ordering and bit-stream truncation, and modifies the categorization and noise-fill modules. These changes ensure that the codec produces an embedded bit-stream, making it more robust in transmission. Test results obtained with ITU-T PESQ show that the codec performs as well as G.722.1 at the same bit rates.
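A schematic sketch of the two added modules, energy ordering and bit-stream truncation: sub-bands are transmitted in descending-energy order, so a decoder that receives a truncated stream still recovers the perceptually dominant sub-bands first. The data and zero-fill reconstruction are illustrative simplifications (the actual codec uses noise-fill).

```python
import numpy as np

def embedded_order(subband_energies):
    """Transmit sub-bands in descending-energy order so that truncating
    the bit-stream drops the least important sub-bands first."""
    return np.argsort(subband_energies)[::-1]

def decode_truncated(coeffs, order, n_kept):
    """Reconstruct from the first n_kept sub-bands of the embedded stream;
    missing sub-bands are zero-filled (noise-fill omitted for brevity)."""
    out = np.zeros_like(coeffs)
    kept = order[:n_kept]
    out[kept] = coeffs[kept]
    return out

energies = np.array([0.1, 9.0, 4.0, 0.01])
order = embedded_order(energies)            # -> [1, 2, 0, 3]
coeffs = np.array([1.0, 3.0, 2.0, 0.1])
partial = decode_truncated(coeffs, order, 2)
```

Truncating after two sub-bands keeps the two highest-energy bands and zeroes the rest, which is what makes the layered bit-stream degrade gracefully.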
International Conference on Signal Processing | 2012
Ruwei Li; Changchun Bao; Bing-yin Xia; Mao-shen Jia
Archive | 2009
Changchun Bao; Mao-shen Jia; Rui Li