Falei Luo | Researchain

Publication

Featured research published by Falei Luo.

Journal of Visual Communication and Image Representation | 2016

Low complexity encoder optimization for HEVC

Shanshe Wang; Falei Luo; Siwei Ma; Xiang Zhang; Shiqi Wang; Debin Zhao; Wen Gao

Considering the content characteristics of different PUs, an optimization scheme is proposed. Based on the R-D cost distribution, an early termination scheme for coding unit (CU) splitting is proposed. Based on the reference frame distribution, an adaptive reference frame selection scheme is proposed.

High Efficiency Video Coding (HEVC) improves coding efficiency significantly. Compared to its predecessor H.264/AVC, it can provide equivalent subjective quality with more than 57% bit rate reduction. However, this improvement in coding efficiency comes at the expense of much higher computational complexity. In this paper, based on an overall analysis of computational complexity in the HEVC encoder, a low complexity encoder optimization scheme is proposed. It reduces the number of candidates to be evaluated in three places: intra prediction mode decision, early termination of coding unit (CU) splitting, and adaptive reference frame selection. With the proposed scheme, the rate distortion optimization (RDO) technique of HEVC can be implemented in a low-complexity way for complexity-constrained encoders. Experimental results demonstrate that, compared with the original HEVC reference encoder implementation, the proposed optimization scheme achieves more than 40% complexity reduction on average with a coding performance degradation of only 0.43%, which is negligible.
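
The early termination idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-size thresholds, the `rd_cost` callback and the `stats` counter are hypothetical names introduced here, and the abstract does not say how the thresholds are derived from the R-D cost distribution.

```python
from typing import Callable, Dict, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square CU


def best_cost(block: Block,
              rd_cost: Callable[[Block], float],
              thresholds: Dict[int, float],
              stats: Dict[str, int],
              min_size: int = 8) -> float:
    """Return the best R-D cost of a CU, skipping the split when the
    unsplit cost is already below a (hypothetical) per-size threshold."""
    x, y, size = block
    stats["evaluated"] = stats.get("evaluated", 0) + 1
    cost_here = rd_cost(block)

    # Early termination: a small unsplit cost suggests splitting will not help.
    if size <= min_size or cost_here < thresholds.get(size, 0.0):
        return cost_here

    # Otherwise evaluate the four sub-CUs recursively and keep the cheaper option.
    half = size // 2
    cost_split = sum(
        best_cost((x + dx, y + dy, half), rd_cost, thresholds, stats, min_size)
        for dy in (0, half)
        for dx in (0, half)
    )
    return min(cost_here, cost_split)
```

With a threshold that the 64×64 cost falls under, only one CU is evaluated; without thresholds, the full quad-tree down to 8×8 is visited.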

International Symposium on Circuits and Systems | 2015

Multiple layer parallel motion estimation on GPU for High Efficiency Video Coding (HEVC)

Falei Luo; Siwei Ma; Juncheng Ma; Honggang Qi; Li Su; Wen Gao

This paper provides a multiple-layer parallel motion estimation (ME) scheme implemented on GPU for High Efficiency Video Coding (HEVC). The scheme is hierarchically structured into four layers: coding tree unit (CTU), prediction unit (PU), motion vector (MV) selection and instruction optimization. In the PU layer, the costs of the various PU sizes are obtained through a SAD (sum of absolute differences) look-up table instead of progressive cost merging. During MV selection, the GPU's comparison instruction is used to avoid branches. In addition, concurrent CTU processing and SIMD (Single Instruction, Multiple Data) optimization improve the performance significantly. Experimental results show that the proposed scheme takes full advantage of the GPU and achieves a speedup of over 90 times compared with HM10.0 using fast ME.
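
One way to picture a SAD look-up table is sketched below in plain Python (CPU only, for readability). The abstract does not describe the table layout, so the summed-area table used here is an assumption, as are the function names and the 4×4 base block size. The point is that, once the table is built for a given motion vector, the SAD of any rectangular PU is read with four look-ups rather than merged step by step from smaller blocks.

```python
def base_sads(cur, ref, base=4):
    """SAD of every base x base block of two equally sized 2-D sample lists.
    Assumes the width and height are multiples of `base`."""
    h, w = len(cur), len(cur[0])
    grid = [[0] * (w // base) for _ in range(h // base)]
    for y in range(h):
        for x in range(w):
            grid[y // base][x // base] += abs(cur[y][x] - ref[y][x])
    return grid


def build_table(grid):
    """Summed-area table with one extra zero row and column."""
    rows, cols = len(grid), len(grid[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (grid[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table


def pu_sad(table, r, c, n_rows, n_cols):
    """SAD of the PU covering n_rows x n_cols base blocks starting at (r, c)."""
    return (table[r + n_rows][c + n_cols] - table[r][c + n_cols]
            - table[r + n_rows][c] + table[r][c])
```

For an 8×8 block with 4×4 base blocks, `pu_sad(table, 0, 0, 2, 2)` gives the 8×8 cost and `pu_sad(table, 0, 0, 2, 1)` gives the cost of the left half.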

Visual Communications and Image Processing | 2014

Flexible CTU-level parallel motion estimation by CPU and GPU pipeline for HEVC

Juncheng Ma; Falei Luo; Shanshe Wang; Siwei Ma

In the High Efficiency Video Coding (HEVC) encoder, motion estimation (ME) accounts for more than 50% of the encoding time. To reduce the complexity of the ME module in HEVC, this paper proposes a flexible coding tree unit (CTU)-level parallel ME method based on CPU and GPU pipeline collaboration. First, a highly scalable CTU-level parallel motion search scheme on the GPU is provided, in which the parallel CTU group can be configured to any size to adapt to the sequence resolution and hardware configuration. Then, the motion search range is adaptively adjusted based on the motion intensity. This avoids wasting GPU time on slow-moving scenes while keeping high performance for fast-moving scenes. Moreover, the ME information returned from the GPU can be used by the CPU for fast mode decision. Experimental results show that the proposed method achieves up to 73% complexity reduction compared with the CPU-only HM10.0 anchor, with acceptable coding performance loss, and outperforms the state-of-the-art scheme.
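
The adaptive search range can be sketched as follows. This is an illustration under assumptions, not the paper's rule: motion intensity is taken here as the mean absolute motion vector magnitude of the previous frame (in integer pixels), and `min_range`, `max_range` and `margin` are hypothetical tuning parameters.

```python
def adaptive_search_range(prev_mvs, min_range=8, max_range=64, margin=2.0):
    """Pick a motion search range (in pixels) from the previous frame's MVs.

    prev_mvs: list of (dx, dy) integer motion vectors from the previous frame.
    With no history, fall back to the full range.
    """
    if not prev_mvs:
        return max_range
    # Mean L1 magnitude as a simple stand-in for "motion intensity".
    intensity = sum(abs(dx) + abs(dy) for dx, dy in prev_mvs) / len(prev_mvs)
    wanted = int(margin * intensity) + min_range
    # Clamp so slow scenes search little and fast scenes never exceed the cap.
    return max(min_range, min(max_range, wanted))
```

A static scene then gets the minimum range, while a scene with large vectors is capped at `max_range`.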

International Symposium on Circuits and Systems | 2017

Fast intra coding unit size decision for HEVC with GPU based keypoint detection

Falei Luo; Shanshe Wang; Siwei Ma; Nan Zhang; Yun Zhou; Wen Gao

In this paper, a fast intra coding unit (CU) size decision framework based on keypoint detection on the graphics processing unit (GPU) is proposed. In this framework, the original frames are first sent to the GPU, where keypoint detection is carried out by numerous threads, which avoids adding computational complexity even in real-time systems. Then, based on the keypoint distribution, whether to split the CU to the next coding depth is efficiently predicted. Experiments show that the proposed algorithm achieves over 25% time saving under the all intra (AI) configuration with negligible performance loss.

International Symposium on Circuits and Systems | 2016

GPU based sample adaptive offset parameter decision and perceptual optimization for HEVC

Falei Luo; Shanshe Wang; Nan Zhang; Siwei Ma; Wen Gao

In this paper, a graphics processing unit (GPU) based sample adaptive offset (SAO) parameter decision scheme is proposed for High Efficiency Video Coding (HEVC). To further improve the performance of SAO, a perceptual optimization scheme that adjusts the Lagrange multiplier is provided, aiming to improve the subjective performance of SAO. Experimental results demonstrate that the proposed GPU based SAO parameter decision scheme achieves average BD-rate gains of 0.76% and 0.78% in terms of PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity), respectively. Combined with the perceptual optimization scheme, the maximum BD-rate gains in terms of PSNR and SSIM reach 1.77% and 3.3%, with averages of 1.23% and 1.37%. Moreover, much of the computational complexity of SAO can be offloaded to the GPU.

Visual Communications and Image Processing | 2015

Adaptive motion vector resolution prediction in block-based video coding

Zhao Wang; Juncheng Ma; Falei Luo; Siwei Ma

In classical block-based video coding, a motion vector is derived for each coding block to remove inter-frame redundancy. However, the motion vector resolution is usually fixed, typically at 1/4-pixel resolution, regardless of the video content. In this paper, we propose an algorithm that adaptively selects the optimal motion vector resolution at frame level according to the characteristics of the video content. We first derive a residual energy model that considers the major factors affecting the motion vector resolution, including texture complexity, motion scale, inter-frame noise and quantization parameter. Experimental results show that the proposed scheme achieves 1.8% BD-rate gain on average without increasing complexity.

Visual Communications and Image Processing | 2015

Parallel intra coding for HEVC on CPU plus GPU platform

Juncheng Ma; Falei Luo; Shanshe Wang; Nan Zhang; Siwei Ma

In High Efficiency Video Coding (HEVC), intra coding performance is significantly improved thanks to the recursive splitting structure and up to 35 intra prediction modes. However, the computational complexity of intra coding increases considerably as well. In this paper, a fast intra coding scheme based on CPU and GPU cooperation is proposed. First, the intra prediction of variable-size blocks is performed in parallel on a multi-core GPU. Second, the intra prediction mode with the minimum sum of absolute differences (SAD) cost is selected and transmitted to the host CPU. Instead of exhaustively searching all the intra modes in the Rough Mode Decision (RMD) process, the mode returned by the GPU is selected directly. Last, the texture gradient of each coding unit (CU) is assessed during parallel intra prediction and then used by the CPU for fast CU size decision. Experimental results show that the proposed parallel intra coding method achieves up to 62% complexity reduction with acceptable coding performance loss.
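
The minimum-SAD mode selection step can be sketched in Python as below. This is a simplified serial illustration, not the GPU kernel from the paper: only three of the 35 HEVC modes are modelled (DC, pure horizontal, pure vertical), boundary smoothing is omitted, and the mode names and function signatures are assumptions made for the example.

```python
def predict(mode, top, left):
    """Simplified intra prediction of an n x n block from its reconstructed
    top row and left column neighbours (no boundary filtering)."""
    n = len(top)
    if mode == "DC":
        dc = (sum(top) + sum(left) + n) // (2 * n)  # rounded mean of neighbours
        return [[dc] * n for _ in range(n)]
    if mode == "HOR":
        return [[left[y]] * n for y in range(n)]    # copy left sample along each row
    if mode == "VER":
        return [list(top) for _ in range(n)]        # copy top row downwards
    raise ValueError(f"unknown mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_mode(block, top, left, modes=("DC", "HOR", "VER")):
    """Return the candidate mode whose prediction has the smallest SAD."""
    return min(modes, key=lambda m: sad(block, predict(m, top, left)))
```

On a GPU, each candidate mode (or each block) would typically be handled by its own thread, with a reduction replacing the serial `min`.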

International Symposium on Circuits and Systems | 2017

An adaptive and low-complexity all-zero block detection for HEVC encoder

Jing Cui; Ruiqin Xiong; Falei Luo; Shanshe Wang; Siwei Ma

To improve the detection accuracy of SAD- or SATD-based thresholds and reduce the time cost of RDO, we propose an all-zero block (AZB) detection method that adaptively searches for the maximum transform coefficient amplitude in the low-frequency region of the TU after conventional SATD detection has failed. Experimental results show that our algorithm saves around 39% of the transform and quantization time with an average RD performance loss of only 0.1%. The detection accuracy for larger TU sizes, i.e. 16×16 and 32×32, reaches about 95% on average.
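
The two-stage check can be sketched as follows. The quantizer model is an assumption: a uniform dead-zone quantizer, level = floor(|c| / qstep + rounding), for which a coefficient quantizes to zero exactly when |c| < (1 - rounding) × qstep. The SATD threshold, the size of the low-frequency region and all parameter names are hypothetical, and the paper's adaptive search is not reproduced.

```python
def is_all_zero_block(coeffs, satd, qstep, satd_threshold,
                      low_freq=4, rounding=1 / 6):
    """Guess whether a TU quantizes to all zeros.

    coeffs: 2-D list of transform coefficients, DC at coeffs[0][0].
    Stage 1 is the conventional SATD threshold test. Stage 2 runs only if
    stage 1 fails and inspects the low_freq x low_freq corner, on the
    heuristic assumption that the largest amplitudes sit at low frequencies.
    """
    if satd < satd_threshold:
        return True
    # Largest amplitude that still quantizes to level 0 under the assumed model.
    limit = (1.0 - rounding) * qstep
    peak = max(abs(c) for row in coeffs[:low_freq] for c in row[:low_freq])
    return peak < limit
```

Because stage 2 ignores the high-frequency coefficients, it is a prediction rather than an exact test.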

IEEE International Conference on Multimedia Big Data | 2017

Content Based Fast Intra Coding for AVS2

Junru Li; Falei Luo; Yun Zhou; Shanshe Wang; Meng Wang; Siwei Ma

AVS2 is the new generation of video coding standard developed by the Audio Video Coding Standard Working Group of China. Analogous to HEVC/H.265, it adopts a more flexible coding structure based on the CU (coding unit), PU (prediction unit) and TU (transform unit) to represent and organize the coded data. For intra coding, AVS2 adopts a quad-tree based partition structure at the CU level, and one CU can alternatively be split into four PUs, which is known as short distance intra prediction (SDIP). SDIP can improve the coding performance but introduces considerable complexity. In this paper, we propose a content based fast intra coding optimization algorithm to reduce the encoding time for AVS2 all intra coding. First, the content flatness (CF) of the coding block is computed. Based on the computed CF, whether to split the current coding block to the next coding depth and whether to perform SDIP are adaptively determined. Second, the rough mode set (RMS) is adaptively selected according to the CF, so the number of modes for rate distortion optimization (RDO) can be significantly reduced. Experimental results show that the proposed fast algorithm achieves over 43% complexity reduction on average under the all intra testing configuration, while the average efficiency loss is negligible.
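
A rough Python sketch of flatness-driven decisions follows. The abstract does not define CF, so the mean absolute deviation used here is a stand-in, and the thresholds, the returned field names and the rough mode counts are all hypothetical values chosen for illustration.

```python
def flatness_measure(block):
    """Mean absolute deviation of the samples; lower means flatter.
    (An assumed stand-in for the paper's content flatness.)"""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


def intra_decisions(block, flat_thr=2.0, busy_thr=12.0):
    """Map the flatness measure to split, SDIP and rough-mode-set decisions."""
    cf = flatness_measure(block)
    if cf < flat_thr:
        # Flat content: keep the CU whole, skip SDIP, test few modes.
        return {"split": False, "try_sdip": False, "rough_modes": 3}
    if cf < busy_thr:
        # Moderate texture: allow splitting but skip SDIP.
        return {"split": True, "try_sdip": False, "rough_modes": 6}
    # Busy texture: evaluate everything.
    return {"split": True, "try_sdip": True, "rough_modes": 9}
```

The savings come from the first two branches, where whole sets of candidates never reach the RDO stage.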

Visual Communications and Image Processing | 2016

A novel mode decision for depth map coding in 3D-AVS

Jing Su; Falei Luo; Shanshe Wang; Shiqi Wang; Xiaoqiang Guo; Siwei Ma

In this paper, a new mode decision scheme is proposed for depth map coding in 3D-AVS. The paper makes two main contributions. First, an improved distortion estimation model for synthesized views is proposed. Second, for the mode decision of depth map coding, the distortion is expressed as the weighted sum of the depth distortion and the estimated distortion of the synthesized view, and a new scheme is proposed to derive the weighting factors adaptively based on the disparity. This distortion is then used to calculate the rate distortion cost for mode decision. Experimental results demonstrate that the proposed scheme achieves a remarkable performance improvement in 3D-AVS, with an average BD-rate gain of about 12%.
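
The weighted-sum cost can be written out as a short Python sketch. The weights are plain inputs here because the abstract does not give the disparity-based derivation; the function and parameter names are assumptions for illustration.

```python
def rd_cost(d_depth, d_synth, rate, lam, w_depth, w_synth):
    """J = (w_depth * D_depth + w_synth * D_synth) + lambda * R."""
    distortion = w_depth * d_depth + w_synth * d_synth
    return distortion + lam * rate


def choose_mode(candidates, lam, w_depth, w_synth):
    """candidates maps a mode name to (depth distortion,
    estimated synthesized-view distortion, rate in bits)."""
    return min(
        candidates,
        key=lambda m: rd_cost(*candidates[m], lam, w_depth, w_synth),
    )
```

Shifting weight from the depth term to the synthesized-view term can change which mode wins, which is the effect the adaptive weighting is meant to exploit.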