Maosheng Zhang
Wuhan University
Publication
Featured research published by Maosheng Zhang.
International Conference on Multimedia and Expo | 2014
Cheng Yang; Ruimin Hu; Liuyue Su; Weiping Tu; Xiaochen Wang; Yuhong Yang; Ge Gao; Shi Dong; Song Wang; Maosheng Zhang; Furong Lei; Shiqing Li
This paper presents a compression technique to improve the quality of three-dimensional (3D) audio reproduced over multiple loudspeaker channels or headphones. The approach extracts side information about the spatial sound sources in three-dimensional space while the sources are being captured. Unlike other compression techniques, it includes the distances of the sound sources in the side information. The separated signals of the different sound sources are downmixed into one mono or stereo audio signal accompanied by the side information. The downmixed signal is then compressed with a traditional audio coder. Adding the distance parameter to the side information yields better perceptual quality of 3D audio while maintaining a low bit rate comparable to that of directional audio coding (DirAC).
Pacific Rim Conference on Multimedia | 2015
Lin Jiang; Ruimin Hu; Xiaochen Wang; Maosheng Zhang
Modern audio coding technologies apply bandwidth extension (BWE) methods to represent audio data efficiently at low bitrates. An established method is spectral band replication (SBR), which provides very high sound quality with imperceptible artifacts. However, its bitrate and complexity are very high. Another widely used method is LPC-based BWE, which is part of the 3GPP AMR-WB+ codec. Although its bitrate and complexity are distinctly lower, the sound quality it provides is unsatisfactory for music. In this paper, a novel bandwidth extension method is proposed that provides sound quality close to that of eSBR at a bitrate of only 0.8 kbps. The proposed method predicts the fine structure of the high-frequency band from the low-frequency band with a deep auto-encoder and extracts only the envelope of the high-frequency band as side information. The performance evaluation demonstrates the advantage of the proposed method over the state of the art. Compared with eSBR, the bitrate drops by about 63% while the subjective listening quality remains close. Compared with LPC-based BWE, the subjective listening quality is better at the same bitrate.
Computer Music Modeling and Retrieval | 2012
Ruimin Hu; Shi Dong; Heng Wang; Maosheng Zhang; Song Wang; Dengshi Li
3D audio coding has become a competitive research area because of standardization efforts by both international bodies, i.e., MPEG, and local ones, i.e., the Audio and Video Coding Standard (AVS) workgroup of China. Perception of 3D audio is a key issue for standardization and remains a challenging problem. Besides current solutions adopted from traditional audio engineering, we are working toward an original 3D audio compression solution. This paper presents our initial results on 3D audio perception, including directional measurements of the Just Noticeable Difference (JND) and Perceptual Entropy (PE). We also present possible applications of these results in our future research.
Wuhan University Journal of Natural Sciences | 2015
Maosheng Zhang; Ruimin Hu; Shihong Chen; Xiaochen Wang; Lin Jiang; Heng Wang
A new method for estimating gain factors in an amplitude panning system is proposed. The method is based on particle velocity and a balanced sound energy formulation. A scale factor is employed in the amplitude panning system, and thus an overdetermined system of equations is derived from the particle velocity equation. To obtain an analytic solution of the overdetermined system, the sound energy identity is considered, and the unique gain factors are then estimated. The proposed method is able to reproduce the sound source direction and control the distance perception in a flexible two- or three-dimensional loudspeaker setup. Subjective evaluations show that the proposed technique, in an aspheric loudspeaker setup, maintains the sound direction and controls the distance perception at the listening point.
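As a rough illustration of how direction matching and an energy constraint together pin down panning gains, the sketch below uses conventional pairwise vector-base amplitude panning in 2D. It is a swapped-in textbook technique, not the paper's particle-velocity formulation, and it omits the scale factor used for distance control. The function name `pan_gains_2d` and the angle convention (degrees, counter-clockwise from the front) are assumptions made for this example.

```python
import math


def pan_gains_2d(source_deg, spk1_deg, spk2_deg):
    """Pairwise 2D amplitude-panning gains with unit total energy.

    Hypothetical helper: standard vector-base panning, not the
    particle-velocity method described in the abstract.
    """
    # Unit vectors pointing at the virtual source and the two loudspeakers.
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    ax, ay = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    bx, by = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    # Direction constraint: g1 * a + g2 * b = p, solved with Cramer's rule.
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must be linearly independent")
    g1 = (px * by - py * bx) / det
    g2 = (ax * py - ay * px) / det

    # Energy constraint: rescale so that g1**2 + g2**2 == 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


if __name__ == "__main__":
    # Source straight ahead, loudspeakers at +30 and -30 degrees.
    print(pan_gains_2d(0.0, 30.0, -30.0))
```

The direction constraint alone fixes only the ratio of the gains; the energy constraint then selects a unique pair, which mirrors the role the abstract assigns to the sound energy identity.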
High Performance Computing and Communications | 2014
Yuhong Yang; Shaolong Dong; Ruimin Hu; Yanye Wang; Li Gao; Maosheng Zhang
This paper proposes an inter-frame correlation based error concealment approach for hybrid CELP (Code Excited Linear Prediction) and transform codecs that deliver good speech and audio quality at moderate bit rates. The proposed scheme is designed to overcome the main challenge posed by the diverse characteristics of input signals. The underlying idea is to exploit the inter-frame correlation of neighboring previous frames to avoid the pitfalls of referring to unrelated frames, and to enable effective prediction of the ISF (Immittance Spectral Frequencies) coefficients of missing frames from the immediately preceding history using a linear regression approach. Objective and subjective evaluation results for the proposed approach, in comparison with the existing technique of AVS-P10 (Audio Video coding of China Standard Part 10 -- Mobile Speech and Audio Codec), provide strong evidence of gains across a variety of speech and audio signals.
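The linear regression step can be pictured with a minimal sketch: each coefficient of the lost frame is extrapolated from a least-squares line fitted to the same coefficient in the preceding frames. The function names, the per-coefficient independence, and the fixed history window are assumptions for illustration. The abstract does not specify how the paper selects correlated frames or constrains the predicted ISF values.

```python
def predict_next(history):
    """Extrapolate the next value of a sequence with a least-squares line."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two past values")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Closed-form simple linear regression over frame indices 0..n-1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line one step past the last received frame.
    return intercept + slope * n


def conceal_frame(previous_frames):
    """Predict every coefficient of a lost frame from its own past track.

    previous_frames: list of equally long coefficient lists, oldest first.
    """
    return [predict_next(list(track)) for track in zip(*previous_frames)]


if __name__ == "__main__":
    past = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
    print(conceal_frame(past))
```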
Conference on Multimedia Modeling | 2017
Jiawang Xu; Xiaochen Wang; Maosheng Zhang; Cheng Yang; Ge Gao
In this paper, a method combining the distance variation function (DVF) and the image source method (ISM) is presented to generate binaural 3D audio with an accurate sense of distance. The DVF is introduced to model the change in intensity and interaural difference as the distance between listener and source changes. Artificial reverberation simulated by the ISM is then added. The reverberation introduces the direct-to-reverberant energy ratio, which provides an absolute cue for distance perception. Distance perception test results indicate improved distance perception when sound sources are located within 50 cm. In addition, the variance of the perceived distance was much smaller than with the DVF only. This reduction in variance indicates that the proposed method can generate 3D audio with a more accurate and more stable sense of distance.
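The direct-to-reverberant energy ratio mentioned above can be computed from a room impulse response in a few lines. The sketch below uses the common definition (energy of the direct-path portion divided by the energy of the remainder, in decibels). The function name and the choice of splitting the response at a fixed sample index are assumptions, and the abstract does not say how the paper measures or controls this ratio.

```python
import math


def direct_to_reverberant_db(impulse_response, direct_samples):
    """Direct-to-reverberant energy ratio in dB.

    impulse_response: sequence of samples.
    direct_samples: number of leading samples treated as the direct path
    (an assumed, caller-chosen split point).
    """
    direct = sum(s * s for s in impulse_response[:direct_samples])
    reverb = sum(s * s for s in impulse_response[direct_samples:])
    if direct <= 0.0 or reverb <= 0.0:
        raise ValueError("both parts must carry non-zero energy")
    return 10.0 * math.log10(direct / reverb)


if __name__ == "__main__":
    # Toy response: a unit direct impulse followed by two weaker reflections.
    print(direct_to_reverberant_db([1.0, 0.0, 0.5, 0.5], 2))
```

A nearby source yields a strong direct path relative to the room reflections, so the ratio falls as the source moves away, which is why it can serve as a distance cue.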
China Communications | 2017
Cheng Yang; Ruimin Hu; Xiaochen Wang; Yuhong Yang; Maosheng Zhang; Wei Chen
A new three-dimensional (3D) audio coding approach is presented to improve the spatial perceptual quality of 3D audio. Unlike other audio coding approaches, the distance side information is also quantized, and a non-uniform perceptual quantization based on the spatial perception features of the human auditory system is proposed, named the concentric spheres spatial quantization (CSSQ) method. Comparison results show that the distance perceptual quality of 3D audio is improved by 5.7% to 8.8% by extracting and coding the distance side information, compared with directional audio coding, and that the bit rate of our coding method is 8.07% lower than that of spatial squeeze surround audio coding.
Wuhan University Journal of Natural Sciences | 2016
Xiaoyan Sun; Maosheng Zhang; Shaowu Mao; Zhengwei Ren; Huanguo Zhang
Software watermarking is an efficient tool for verifying the copyright of software. Watermarking based on public key cryptosystems is widely researched. However, popular public key cryptosystems are not secure against quantum algorithms. This paper proposes a novel software watermarking scheme based on a multivariate public key cryptosystem (MPKC). The copyright information generated by the copyright holder is transformed into copyright numbers using multivariate quadratic polynomial equations inspired by MPKC. Every polynomial is embedded into the host program independently. Owing to the security of MPKC, the robustness and invisibility of the proposed scheme are significantly improved in comparison with the RSA-based watermarking method.
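To make the building block concrete, the sketch below evaluates a set of multivariate quadratic polynomials over a small prime field, which is the basic operation underlying MPKC-style schemes. The coefficient layout, the field size, and the function names are assumptions chosen for illustration. The abstract does not describe how the paper derives its polynomials or embeds them into a program, and this toy example offers no security.

```python
def eval_quadratic(quad, lin, const, x, p):
    """Evaluate one multivariate quadratic polynomial modulo a prime p.

    Computes sum_{i<=j} quad[i][j]*x[i]*x[j] + sum_i lin[i]*x[i] + const.
    Only the upper triangle of quad (j >= i) is read.
    """
    n = len(x)
    total = const
    for i in range(n):
        total += lin[i] * x[i]
        for j in range(i, n):
            total += quad[i][j] * x[i] * x[j]
    return total % p


def eval_system(polys, x, p):
    """Evaluate a list of (quad, lin, const) polynomials at the point x."""
    return [eval_quadratic(q, l, c, x, p) for q, l, c in polys]


if __name__ == "__main__":
    # Hypothetical two-variable system over GF(7).
    polys = [
        ([[1, 2], [0, 3]], [4, 5], 6),
        ([[0, 1], [0, 0]], [1, 1], 0),
    ]
    print(eval_system(polys, [1, 2], 7))
```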
China Communications | 2016
Shenming Qu; Ruimin Hu; Shihong Chen; Junjun Jiang; Zhongyuan Wang; Maosheng Zhang
Recently, neighbor embedding based face super-resolution (SR) methods have shown the ability to produce high-quality face images. These methods assume that the same neighborhoods are preserved in both the low-resolution (LR) training set and the high-resolution (HR) training set. However, due to the “one-to-many” mapping between an LR image and HR ones in practice, the neighborhood relationship of an LR patch in LR space is quite different from that of its HR counterpart; that is, the neighborhood relationship obtained is not the true one. In this paper, we explore a novel and effective re-identified K-nearest neighbor (RIKNN) method to search for the neighbors of an LR patch. Compared with other methods, our method uses the geometrical information of the LR manifold and the HR manifold simultaneously. In particular, it searches for the K-NN of an LR patch in the LR space and refines the search results by re-identifying them in the HR space, thus giving rise to accurate K-NN and improved performance. A statistical analysis of the influence of the training set size and the number of nearest neighbors is given. Experimental results on some public face databases show the superiority of our proposed scheme over state-of-the-art face hallucination approaches in terms of subjective and objective results as well as computational complexity.
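The two-stage search can be sketched as follows: gather a candidate pool by distance in LR space, then re-rank that pool by distance in HR space and keep the best K. This is only a schematic reading of the abstract. In particular, the `hr_estimate` argument (an initial HR guess for the query patch), the pool size `k_candidates`, and the use of squared Euclidean distance are assumptions introduced for this example, not details confirmed by the source.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reidentified_knn(lr_query, hr_estimate, lr_train, hr_train, k_candidates, k):
    """Return indices of k training patches chosen by a two-stage search.

    Stage 1: take the k_candidates nearest training patches in LR space.
    Stage 2: re-rank those candidates by HR-space distance to hr_estimate
             (an assumed initial HR guess for the query) and keep k.
    """
    by_lr = sorted(range(len(lr_train)),
                   key=lambda i: sq_dist(lr_query, lr_train[i]))
    candidates = by_lr[:k_candidates]
    candidates.sort(key=lambda i: sq_dist(hr_estimate, hr_train[i]))
    return candidates[:k]


if __name__ == "__main__":
    lr_train = [[0.0], [0.1], [0.2], [5.0]]
    hr_train = [[9.0], [1.0], [1.1], [1.0]]
    # Patch 0 is the closest in LR space but far away in HR space,
    # so the second stage pushes it out of the final neighbor set.
    print(reidentified_knn([0.0], [1.0], lr_train, hr_train, 3, 2))
```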
Pacific Rim Conference on Multimedia | 2015
Cheng Yang; Ruimin Hu; Liuyue Su; Xiaochen Wang; Maosheng Zhang; Shenming Qu
Improving the spatial precision of three-dimensional (3D) audio sharply increases the bit rate of the spatial parameters. This paper presents a spatial parameter compression approach to decrease the bit rate of spatial parameters for 3D audio. Based on spatial direction filtering and spatial side information clustering, a new multi-channel object-based spatial parameter compression approach (MOSPCA) is presented, through which the spatial parameters of different frequency bands within a frame that belong to the same sound source can be compressed into one spatial parameter. An experiment shows that the compression ratio of the spatial parameters can reach 7:1, compared with 1.4:1 for MPEG Surround and S3AC (spatial squeeze surround audio coding), while transparent spatial perception is maintained.