Publications


Featured research published by Mai Xu.


IEEE Transactions on Circuits and Systems for Video Technology | 2016

Subjective-Driven Complexity Control Approach for HEVC

Xin Deng; Mai Xu; Lai Jiang; Xiaoyan Sun; Zulin Wang

The latest High Efficiency Video Coding (HEVC) standard significantly increases encoding complexity to improve coding efficiency, compared with the preceding H.264/Advanced Video Coding (AVC) standard. In this paper, we present a novel subjective-driven complexity control (SCC) approach to reduce and control the encoding complexity of HEVC. By reasonably adjusting the maximum depth of each largest coding unit (LCU), the encoding complexity can be reduced to a target level with minimal visual distortion. Specifically, the maximum depths of different LCUs are varied by solving the proposed optimization formulation of complexity control, based on two explored relationships: 1) the relationship between maximum depth and encoding complexity and 2) the relationship between maximum depth and visual distortion. In addition, subjective visual quality is favored by a novel subjective-driven constraint imposed in the formulation, on the basis of a visual attention model. Finally, the experimental results show that our approach can achieve a wide range of encoding complexity control (as low as 20%) for HEVC, with the smallest complexity bias being 0.2%. Meanwhile, our SCC approach outperforms two other state-of-the-art complexity control approaches, in terms of both control accuracy and visual quality.
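As a rough illustration of the complexity-control idea, the sketch below greedily lowers per-LCU maximum depths until a target complexity ratio is met, always taking the reduction that costs the least distortion per unit of complexity saved. The per-depth complexity and distortion tables, and the greedy rule itself, are hypothetical stand-ins for the paper's optimization formulation.

```python
import numpy as np

def allocate_max_depths(n_lcus, target_ratio, complexity, distortion):
    """Greedy sketch: lower each LCU's maximum depth until the frame's
    encoding complexity falls to target_ratio of the full-depth cost.

    complexity[d], distortion[d]: assumed per-LCU cost/penalty at max
    depth d (d = 0..3, matching HEVC's four CTU splitting levels).
    """
    depths = np.full(n_lcus, 3)                      # start at full depth
    budget = target_ratio * n_lcus * complexity[3]   # complexity target
    # Repeatedly reduce the depth of the LCU whose reduction costs the
    # least extra distortion per unit of complexity saved.
    while complexity[depths].sum() > budget:
        reducible = np.where(depths > 0)[0]
        if len(reducible) == 0:
            break
        gain = complexity[depths[reducible]] - complexity[depths[reducible] - 1]
        cost = distortion[depths[reducible] - 1] - distortion[depths[reducible]]
        best = reducible[np.argmin(cost / np.maximum(gain, 1e-9))]
        depths[best] -= 1
    return depths
```

The real approach additionally weights the distortion term by a visual attention model, so salient LCUs keep larger maximum depths.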


international conference on computer vision | 2015

Learning to Predict Saliency on Face Images

Mai Xu; Yun Ren; Zulin Wang

This paper proposes a novel method which learns to detect the saliency of face images. More specifically, we obtain a database of eye tracking over extensive face images by conducting an eye tracking experiment. Through analysis of the eye tracking database, we verify that fixations tend to cluster around facial features when viewing images with large faces. To model attention on faces and facial features, the proposed method learns a Gaussian mixture model (GMM) distribution from the fixations of the eye tracking data as the top-down features for saliency detection of face images. Then, in our method, the top-down features (i.e., face and facial features) upon the learnt GMM are linearly combined with the conventional bottom-up features (i.e., color, intensity, and orientation) for saliency detection. In the linear combination, we argue that the weights corresponding to the top-down feature channels depend on the face size in images, and the relationship between the weights and face size is thus investigated via learning from the training eye tracking data. Finally, experimental results show that our learning-based method is able to advance state-of-the-art saliency prediction for face images. The corresponding database and code are available online: www.ee.buaa.edu.cn/xumfiles/saliency_detection.html.
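The linear combination described above can be sketched as follows. The face-size-to-weight mapping here is a hypothetical placeholder for the relation learnt from the eye tracking data, and the channel maps are plain arrays.

```python
import numpy as np

def combine_saliency(bottom_up, top_down, face_ratio):
    """Sketch of the linear feature combination: bottom-up channels
    (color, intensity, orientation) plus GMM-based top-down channels
    (face, facial features), with top-down weights that grow with the
    relative face size in the image.
    """
    w_td = min(1.0, 2.0 * face_ratio)      # hypothetical learnt mapping
    w_bu = 1.0 - w_td
    s = w_bu * np.mean(bottom_up, axis=0) + w_td * np.mean(top_down, axis=0)
    return s / (s.max() + 1e-12)           # normalise to [0, 1]
```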


IEEE Transactions on Image Processing | 2017

Learning to Detect Video Saliency With HEVC Features

Mai Xu; Lai Jiang; Xiaoyan Sun; Zhaoting Ye; Zulin Wang

Saliency detection has been widely studied to predict human fixations, with various applications in computer vision and image processing. For saliency detection, we argue in this paper that the state-of-the-art High Efficiency Video Coding (HEVC) standard can be used to generate useful features in the compressed domain. Therefore, this paper proposes to learn a video saliency model with regard to HEVC features. First, we establish an eye tracking database for video saliency detection, which can be downloaded from https://github.com/remega/video_database. Through statistical analysis on our eye tracking database, we find that human fixations tend to fall into regions with large-valued HEVC features on splitting depth, bit allocation, and motion vector (MV). In addition, three observations are obtained with further analysis on our eye tracking database. Accordingly, several features in the HEVC domain are proposed on the basis of splitting depth, bit allocation, and MV. Next, a support vector machine is learned to integrate those HEVC features for video saliency detection. Since almost all video data are stored in compressed form, our method avoids both the computational cost of decoding and the storage cost of raw data. More importantly, experimental results show that the proposed method is superior to other state-of-the-art saliency detection methods, in either the compressed or uncompressed domain.
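As a toy illustration of integrating the three HEVC-domain features with a support vector machine, the sketch below trains a minimal linear SVM by hinge-loss sub-gradient descent. The feature values, labels, and training scheme are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Minimal linear SVM as a stand-in for the support-vector learner:
    X rows are per-block HEVC feature vectors (splitting depth,
    allocated bits, MV magnitude), y in {-1, +1} marks salient blocks.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                    # margin-violating samples
        if mask.any():
            grad_w = reg * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w, grad_b = reg * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def saliency_score(w, b, depth, bits, mv_mag):
    """Fuse the three HEVC-domain features into one block score."""
    return np.array([depth, bits, mv_mag]) @ w + b
```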


Pattern Recognition | 2016

Bottom-up saliency detection with sparse representation of learnt texture atoms

Mai Xu; Lai Jiang; Zhaoting Ye; Zulin Wang

This paper proposes a saliency detection method by exploring a novel low-level feature based on the sparse representation of learnt texture atoms (SR-LTA). The learnt texture atoms are encoded in salient and non-salient dictionaries. For the salient dictionary, a formulation is proposed to learn salient texture atoms from image patches attracting extensive attention. Then, the online salient dictionary learning (OSDL) algorithm is presented to solve the proposed formulation. Similarly, the non-salient dictionary is learnt from image patches without any attention. The pixel-wise SR-LTA feature is then yielded based on the difference of sparse representation errors with regard to the learnt salient and non-salient dictionaries. Finally, image saliency can be predicted by linearly combining the proposed SR-LTA feature with conventional features, luminance and contrast. For the linear combination, the weights of different feature channels are determined by least square estimation on the training data. The experimental results show that our method outperforms 9 state-of-the-art methods for bottom-up saliency detection.

Highlights:
- We develop the OSDL algorithm for learning the salient and non-salient dictionaries.
- We propose the SR-LTA feature for bottom-up saliency detection, in light of the learnt salient and non-salient dictionaries.
- We validate that the proposed SR-LTA feature can advance state-of-the-art saliency detection on natural images.
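The SR-LTA feature, the difference of representation errors under the two dictionaries, can be sketched as below. Least-squares coding stands in for the paper's true sparse coding step, and the dictionaries are placeholder column matrices rather than OSDL-learnt atoms.

```python
import numpy as np

def sr_lta_feature(patch, D_sal, D_non):
    """Sketch of the SR-LTA feature: representation error of a patch
    under the non-salient dictionary minus the error under the salient
    one.  A patch that the salient dictionary reconstructs better gets
    a larger (more salient) feature value.
    """
    def recon_error(D):
        coef, *_ = np.linalg.lstsq(D, patch, rcond=None)
        return np.linalg.norm(patch - D @ coef)
    return recon_error(D_non) - recon_error(D_sal)
```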


international conference on multimedia and expo | 2015

A novel method on optimal bit allocation at LCU level for rate control in HEVC

Shengxi Li; Mai Xu; Zulin Wang

In this paper, we propose a new method, namely the recursive Taylor expansion (RTE) method, for optimally allocating bits to each LCU in the R-λ rate control scheme for HEVC. Specifically, we first set up an optimization formulation on optimal bit allocation. Unfortunately, it is intractable to achieve a closed-form solution for this formulation. We therefore propose an RTE solution to iteratively solve the formulation with a fast convergence speed. Then, an approximate closed-form solution can be obtained. This way, optimal bit allocation can be achieved at little encoding complexity cost. Finally, the experimental results validate the effectiveness of our method in three aspects: compression distortion, bit-rate control error, and bit fluctuation.
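Under the common hyperbolic R-D model D_i = c_i * R_i^(-k_i) (an assumption here, not taken from the paper), the optimal allocation equalizes the distortion slopes across LCUs, giving R_i = (c_i * k_i / lam)^(1/(k_i + 1)). The sketch below finds the Lagrange multiplier lam by bisection so the bits sum to the target; the paper's RTE iteration solves the same formulation with faster convergence.

```python
import numpy as np

def allocate_bits(c, k, R_total, iters=60):
    """Bit allocation sketch under the hyperbolic R-D model
    D_i = c_i * R_i^(-k_i): bits per LCU as a function of the Lagrange
    multiplier lam, with lam found so the bits sum to R_total.
    """
    def bits(lam):
        return np.power(c * k / lam, 1.0 / (k + 1.0))
    lo, hi = 1e-9, 1e9
    for _ in range(iters):               # bits() is monotone in lam
        mid = np.sqrt(lo * hi)           # geometric mid for a wide range
        if bits(mid).sum() > R_total:
            lo = mid                     # too many bits: raise lam
        else:
            hi = mid
    return bits(np.sqrt(lo * hi))
```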


IEEE Transactions on Multimedia | 2018

Closed-Form Optimization on Saliency-Guided Image Compression for HEVC-MSP

Shengxi Li; Mai Xu; Yun Ren; Zulin Wang

High Efficiency Video Coding (HEVC) is the latest video coding standard, and it has the best performance among all the existing standards. The HEVC main still picture profile (HEVC-MSP) also achieves top performance in image compression. In this paper, we propose a closed-form bit allocation approach to optimize the saliency-guided PSNR (viewed as perceptual distortion) such that the coding efficiency of HEVC-based image compression can be significantly improved from a subjective perspective. Specifically, a bit allocation formulation is established to minimize perceptual distortion with a constraint on bit-rates. Then, this formulation is solved using the proposed recursive Taylor expansion method with a closed-form solution. On the basis of our solution, a bit allocation and re-allocation process is developed in our approach to minimize perceptual distortion while accurately controlling bit-rates. In addition, we provide both theoretical and numerical analyses of the computational complexity, verifying the small extra time cost of our approach. The experimental results demonstrate the superior performance of our approach over the state-of-the-art HEVC-MSP, with BD-rate savings of approximately 40% and 24% for face and generic images, respectively.


international conference on multimedia and expo | 2017

Decoder-side HEVC quality enhancement with scalable convolutional neural network

Ren Yang; Mai Xu; Zulin Wang

The latest High Efficiency Video Coding (HEVC) standard has been increasingly used to generate video streams over the Internet. However, the decoded HEVC video streams may incur severe quality degradation, especially at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, we propose in this paper a Decoder-side Scalable Convolutional Neural Network (DS-CNN) approach to achieve quality enhancement for HEVC, which does not require any modification of the encoder. In particular, our DS-CNN approach learns a Convolutional Neural Network (CNN) model to reduce the distortion of both I and B/P frames in HEVC. It differs from the existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for B/P frames. Furthermore, a scalable structure is included in our DS-CNN, such that the computational complexity of our DS-CNN approach is adjustable to the changing computational resources. Finally, the experimental results show the effectiveness of our DS-CNN approach in enhancing quality for both I and B/P frames of HEVC.
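A minimal sketch of the scalable idea: a chain of convolution-like stages of which only a prefix runs when computational resources are limited, with the output added back to the decoded frame in residual fashion. The kernels, the ReLU, and the residual form are illustrative assumptions, not the DS-CNN architecture.

```python
import numpy as np

def conv2d_same(x, kern):
    """Naive 'same' 2-D convolution with zero padding, followed by a
    ReLU; stands in for one learnt convolutional layer."""
    kh, kw = kern.shape
    pad = np.pad(x, ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * kern).sum()
    return np.maximum(out, 0.0)

def scalable_enhance(frame, kernels, compute_level):
    """Only the first `compute_level` stages run when resources are
    tight; whatever prefix ran produces a (hypothetical) residual that
    is added back to the decoded frame.
    """
    x = frame.astype(float)
    for kern in kernels[:compute_level]:
        x = conv2d_same(x, kern)
    return frame + x
```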


international conference on multimedia and expo | 2017

A subjective visual quality assessment method of panoramic videos

Mai Xu; Chen Li; Yufan Liu; Xin Deng; Jiaxin Lu

Different from 2-dimensional (2D) videos, panoramic videos offer spherical viewing directions with the support of head-mounted displays, thus improving the immersive and interactive visual experience. Unfortunately, to the best of our knowledge, there are few subjective visual quality assessment (VQA) methods for panoramic videos. In this paper, we therefore propose a subjective VQA method for assessing the quality loss of impaired panoramic videos. Specifically, we first establish a database containing viewing direction data of several subjects watching panoramic videos. Then, we find that there exists high consistency of viewing directions on panoramic videos across different subjects. Upon this finding, we present a procedure of subjective test for measuring the quality of panoramic videos by different subjects, yielding the differential mean opinion score (DMOS). To cope with the inconsistency of viewing directions on panoramic videos, we further propose a vectorized DMOS metric. Finally, experimental results verify that our subjective VQA method, in the forms of both overall and vectorized DMOS metrics, is effective in measuring the subjective quality of panoramic videos.
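The overall and vectorized DMOS metrics might be sketched as below, assuming per-region opinion scores and viewing-direction frequencies as inputs; the exact regionalization and weighting in the paper may differ.

```python
import numpy as np

def vectorized_dmos(ref_scores, imp_scores, region_weights):
    """Sketch of overall vs vectorized DMOS.  ref_scores/imp_scores:
    (subjects x regions) opinion scores for the reference and impaired
    panoramic video; region_weights: how often each viewing-direction
    region was attended (hypothetical, from the viewing-direction data).
    """
    per_region = (ref_scores - imp_scores).mean(axis=0)   # DMOS vector
    overall = per_region @ (region_weights / region_weights.sum())
    return per_region, overall
```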


international conference on multimedia and expo | 2014

A novel weight-based URQ scheme for perceptual video coding of conversational video in HEVC

Shengxi Li; Mai Xu; Xin Deng; Zulin Wang

In this paper, we propose a novel weight-based unified rate-quantization (URQ) scheme for rate control in the state-of-the-art HEVC standard, to improve its perceived visual quality for conversational videos. In the conventional rate control of HEVC, a pixel-wise URQ scheme was proposed by introducing the concept of bits per pixel (bpp). This scheme is able to assign different amounts of bits to blocks of various sizes, and is thus well suited to the flexible picture partitioning of HEVC. However, bpp does not reflect the visual importance of each pixel. Therefore, we propose a novel weight-based URQ scheme that takes visual importance into account for rate control in HEVC. In combination with the weight map acquired from a novel hierarchical perceptual model of the face, such a scheme is capable of allocating more bits to the face and still more bits to the facial features, by using bits per weight (bpw) instead of bpp. As a result, the visual quality of the face, especially the facial features, can be improved such that perceptual video coding is achieved for HEVC. Finally, the experimental results validate such improvement.
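The bpw idea can be sketched as a simple proportional split: each block's bit budget follows its summed perceptual weight (facial features > face > background) rather than its pixel count, as bpp would. The map layout below is illustrative.

```python
import numpy as np

def bits_per_block(R_total, weight_map, block_ids):
    """Allocate the frame budget R_total across blocks in proportion to
    each block's summed perceptual weight (bpw), instead of its pixel
    count (bpp).  weight_map: per-pixel weights; block_ids: same-shape
    integer block labels.
    """
    ids = np.unique(block_ids)
    w = np.array([weight_map[block_ids == i].sum() for i in ids])
    return dict(zip(ids.tolist(), (R_total * w / w.sum()).tolist()))
```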


international conference on multimedia and expo | 2017

A novel rate control scheme for panoramic video coding

Yufan Liu; Mai Xu; Chen Li; Shengxi Li; Zulin Wang

The popularity of multi-view panoramic videos for producing Virtual Reality (VR) content has increased considerably, due to their immersive visual experience. We argue in this paper that PSNR is less effective in assessing the visual quality of compressed panoramic videos than sphere-based PSNR (S-PSNR), which takes the sphere-to-plane mapping of panoramic videos into account. Thus, the conventional rate control (RC) schemes of 2-dimensional (2D) video coding, which optimize PSNR, are not suitable for panoramic video coding. To optimize S-PSNR, we propose in this paper a novel RC scheme for panoramic video coding. Specifically, we develop an S-PSNR optimization formulation with a constraint on bit-rate. Then, a solution is provided to the developed formulation, such that bits can be allocated to each coding block to achieve the optimal S-PSNR in panoramic video coding. Finally, the experimental results validate the effectiveness of the proposed RC scheme in improving the S-PSNR of panoramic video coding.
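S-PSNR itself samples the error at uniformly distributed sphere points, so the sketch below uses the simpler latitude-weighted approximation (WS-PSNR style) for equirectangular frames: rows near the poles are over-represented on the plane and are down-weighted by cos(latitude). This is a stand-in for intuition, not the paper's metric.

```python
import numpy as np

def sphere_weighted_psnr(ref, dist, peak=255.0):
    """Latitude-weighted PSNR for a single-channel equirectangular
    frame: per-pixel squared error scaled by cos(latitude), so the
    planar oversampling near the poles does not inflate the score.
    """
    h, w = ref.shape
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2.0   # -pi/2..pi/2
    wts = np.cos(lat)[:, None] * np.ones((1, w))
    err = ref.astype(float) - dist.astype(float)
    mse = (wts * err ** 2).sum() / wts.sum()
    return 10.0 * np.log10(peak ** 2 / max(mse, 1e-12))
```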

Collaboration


Dive into Mai Xu's collaborations.

Top Co-Authors

Xiaoming Tao

Hong Kong Polytechnic University
