Dajiang Zhou | Researchain

Publication

Featured research published by Dajiang Zhou.

International Conference on Multimedia and Expo | 2012

A Low-Complexity HEVC Intra Prediction Algorithm Based on Level and Mode Filtering

Heming Sun; Dajiang Zhou; Satoshi Goto

HEVC achieves better coding efficiency than prior standards, but at the cost of increased complexity. For intra prediction, the complexity is especially high due to a highly flexible coding unit structure and a large number of prediction modes. This paper presents a low-complexity intra prediction algorithm for HEVC. A fast preprocessing stage based on a simplified cost model is proposed. Based on its results, a level filtering scheme reduces the number of prediction unit levels that require fine processing from 5 to 2. To supply the level filtering decision with appropriate thresholds, a fast training method is also designed. A mode filtering scheme further reduces the maximum number of angular modes to be evaluated from 34 to 9. Complexity reduction from HM 3.0 is over 50% and stable across various sequences, which makes the proposed algorithm suitable for real-time applications. The corresponding bit rate increase is lower than 2.5%.
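The mode filtering idea in the abstract above (rank all candidates with a cheap cost, then run the expensive evaluation on a short list only) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `filter_modes`, the two cost callables, and the toy cost functions are all assumptions made for the example; only the numbers 34 and 9 come from the abstract.

```python
from typing import Callable, List, Sequence


def filter_modes(
    modes: Sequence[int],
    rough_cost: Callable[[int], float],
    fine_cost: Callable[[int], float],
    keep: int = 9,
) -> int:
    """Rank every mode with the cheap cost, then run the expensive
    cost only on the `keep` best-ranked modes and return the winner."""
    shortlist = sorted(modes, key=rough_cost)[:keep]
    return min(shortlist, key=fine_cost)


# Toy usage with made-up cost functions (hypothetical, for illustration only).
fine_calls: List[int] = []


def toy_rough(mode: int) -> float:
    # Cheap estimate: pretends mode 20 looks best.
    return abs(mode - 20)


def toy_fine(mode: int) -> float:
    # Expensive evaluation: the true optimum is mode 18.
    fine_calls.append(mode)
    return (mode - 18) ** 2


best = filter_modes(range(34), toy_rough, toy_fine)
print(best, len(fine_calls))
```

In this toy run the expensive cost is evaluated 9 times instead of 34, and the true optimum is still found because it falls inside the short list. Whether that holds in practice depends on how well the cheap cost tracks the expensive one.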

IEEE Journal of Solid-state Circuits | 2011

A 530 Mpixels/s 4096x2160@60fps H.264/AVC High Profile Video Decoder Chip

Dajiang Zhou; Jinjia Zhou; Xun He; Jiayi Zhu; Ji Kong; Peilin Liu; Satoshi Goto

The increased resolution of Quad Full High Definition (QFHD) offers a significantly enhanced visual experience. However, the corresponding data throughput of up to 530 Mpixels/s greatly challenges the design of a real-time video decoder VLSI, with heavy demands on both DRAM bandwidth and computational power. In this work, a lossless frame recompression technique and a partial MB reordering scheme are proposed to reduce the DRAM access of a QFHD video decoder chip. In addition, pipelining and parallelization techniques such as NAL/slice-parallel entropy decoding are implemented to efficiently enhance its computational power. The chip, which supports H.264/AVC high profile, is fabricated in 90 nm CMOS and verified. It delivers a maximum throughput of 4096×2160@60fps, which is at least 4.3 times higher than the state-of-the-art. The DRAM bandwidth requirement is typically reduced by 51%, which fits the design into a 64-bit LPDDR SDRAM interface and results in 58% DRAM power saving. Meanwhile, pipelining and parallelization reduce the core energy by 54%.

International Conference on Acoustics, Speech, and Signal Processing | 2012

An optimized MC interpolation architecture for HEVC

Zhengyan Guo; Dajiang Zhou; Satoshi Goto

In the latest draft video compression standard, HEVC, a new 8-tap MC interpolation filter is adopted. For this component, we propose an efficient VLSI design composed of a reconfigurable filter, an optimized pipeline engine organization, and a filter reuse scheme. This results in a 30% area saving relative to a non-optimized design. The proposed implementation supports a maximum throughput of QFHD@60fps. Our results also demonstrate that the implementation cost of a well-optimized HEVC interpolation component can be comparable to that of H.264, despite the enhanced coding performance.

Symposium on VLSI Circuits | 2010

A 530Mpixels/s 4096×2160@60fps H.264/AVC high profile video decoder chip

Dajiang Zhou; Jinjia Zhou; Xun He; Ji Kong; Jiayi Zhu; Peilin Liu; Satoshi Goto

An H.264/AVC HP video decoder is implemented in 90nm CMOS. Its maximum throughput reaches 4096×2160@60fps, which is at least 4.3x higher than the state-of-the-art. By using partial MB reordering and lossless frame recompression, DRAM bandwidth is reduced by 51%, which results in 58% DRAM power saving. Meanwhile, various efficient parallelization techniques contribute to a core energy saving of 54%.

International Symposium on Circuits and Systems | 2010

A lossless frame recompression scheme for reducing DRAM power in video encoding

Xuena Bao; Dajiang Zhou; Satoshi Goto

Owing to the huge bandwidth requirement, DRAM power makes up a significant portion of the power consumed by a video coding system. In this paper, a lossless frame recompression scheme for reducing DRAM bandwidth is presented. The basic structure of the scheme, which includes the memory organization, the data fetching strategy, and the cache organization, is proposed. Furthermore, an adaptive DPCM (Differential Pulse Code Modulation) scanning order selection is used, and an efficient coding method suitable for compressing the DPCM samples in reference frames is also discussed. Experimental results show a 50% to 60% saving of bandwidth on 720p and 1080p sequences, which indicates that the proposed scheme can be useful in reducing system power by saving DRAM bandwidth.
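As a rough illustration of what lossless DPCM with an adaptive scanning order can look like, the sketch below computes residuals against the left neighbor ("h") or the upper neighbor ("v"), picks the order with the smaller sum of absolute residuals, and reconstructs the block exactly. The function names, the two-order choice, and the selection criterion are assumptions for this example; the abstract does not specify the paper's actual scanning orders or its entropy coding stage, which is omitted here.

```python
from typing import List

Block = List[List[int]]


def dpcm_residual(block: Block, order: str) -> Block:
    """Replace each sample by its difference from the left ('h') or
    upper ('v') neighbor; the first column or row is kept as-is."""
    res = [row[:] for row in block]
    for r in range(len(block)):
        for c in range(len(block[0])):
            if order == "h" and c > 0:
                res[r][c] = block[r][c] - block[r][c - 1]
            elif order == "v" and r > 0:
                res[r][c] = block[r][c] - block[r - 1][c]
    return res


def dpcm_reconstruct(res: Block, order: str) -> Block:
    """Invert dpcm_residual by accumulating along the same direction."""
    out = [row[:] for row in res]
    for r in range(len(res)):
        for c in range(len(res[0])):
            if order == "h" and c > 0:
                out[r][c] = res[r][c] + out[r][c - 1]
            elif order == "v" and r > 0:
                out[r][c] = res[r][c] + out[r - 1][c]
    return out


def choose_order(block: Block) -> str:
    """Pick the order whose residuals have the smaller absolute sum,
    used here as a simple proxy for the coded size."""
    def cost(order: str) -> int:
        return sum(abs(v) for row in dpcm_residual(block, order) for v in row)
    return min(("h", "v"), key=cost)


# A block of horizontal stripes: every row is constant.
block = [[10] * 4, [50] * 4, [90] * 4, [130] * 4]
order = choose_order(block)
restored = dpcm_reconstruct(dpcm_residual(block, order), order)
print(order, restored == block)
```

For the striped block, predicting from the left neighbor leaves almost all residuals at zero, so "h" is selected, and the round trip returns the original samples unchanged.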

IEEE Transactions on Very Large Scale Integration Systems | 2015

High-Throughput Power-Efficient VLSI Architecture of Fractional Motion Estimation for Ultra-HD HEVC Video Encoding

Gang He; Dajiang Zhou; Yunsong Li; Zhixiang Chen; Tianruo Zhang; Satoshi Goto

Fractional motion estimation (FME) significantly enhances video compression efficiency, but its high computational complexity also limits real-time processing capability. In this brief, we present a VLSI implementation of FME for High Efficiency Video Coding in ultrahigh definition video applications. We first propose a bilinear quarter pixel approximation, together with a search pattern based on it, to reduce the complexity of interpolation and the fractional search process. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of the transform. In addition, using the considered pixel parallelism and a dedicated memory access pattern, we fully pipeline the computation and achieve high hardware utilization. This design has been implemented as a 65-nm CMOS chip and verified. The measured throughput reaches 995 Mpixels/s for 7680 × 4320 at 30 frames/s, running at 188 MHz, at least 4.7 times faster than prior art. The corresponding power dissipation is 198.6 mW, with a power efficiency of 0.2 nJ/pixel. Due to the optimization, our work achieves more than 52% improvement in power efficiency relative to previous works in H.264.
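The throughput and power-efficiency figures quoted above follow from simple arithmetic, which the short check below reproduces. The helper names are hypothetical; the inputs (7680 × 4320, 30 frames/s, 198.6 mW, 188 MHz) are taken from the abstract, and the pixels-per-cycle value is derived here rather than stated in the source.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels processed per second for a given frame size and frame rate."""
    return width * height * fps


def energy_per_pixel_nj(power_mw: float, pixels_per_second: float) -> float:
    """Energy per pixel in nanojoules: watts divided by pixels/s, scaled to nJ."""
    return (power_mw * 1e-3) / pixels_per_second * 1e9


rate = pixel_rate(7680, 4320, 30)          # 995_328_000, i.e. about 995 Mpixels/s
energy = energy_per_pixel_nj(198.6, rate)  # about 0.2 nJ/pixel
per_cycle = rate / 188e6                   # average pixels per clock at 188 MHz

print(rate, round(energy, 3), round(per_cycle, 2))
```

The same `pixel_rate` helper also reproduces the 530 Mpixels/s figure for 4096 × 2160 at 60 fps quoted for the decoder chip earlier in this list.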

IEEE Transactions on Circuits and Systems for Video Technology | 2015

Ultra-High-Throughput VLSI Architecture of H.265/HEVC CABAC Encoder for UHDTV Applications

Dajiang Zhou; Jinjia Zhou; Wei Fei; Satoshi Goto

Ultra high definition television (UHDTV) imposes extremely high throughput requirements on video encoders based on the High Efficiency Video Coding (H.265/HEVC) and Advanced Video Coding (H.264/AVC) standards. Context-adaptive binary arithmetic coding (CABAC) is the entropy coding component of these standards. In very-large-scale integration implementation, CABAC is known to be difficult to pipeline and parallelize effectively, due to the critical bin-to-bin data dependencies in its algorithm. This paper addresses the throughput requirement of CABAC encoding for UHDTV applications. The proposed optimizations, including prenormalization, hybrid path coverage, and lookahead rLPS, reduce the critical path delay of binary arithmetic encoding (BAE) by exploiting the incompleteness of data dependencies in rLPS updating. Meanwhile, the number of bins BAE delivers per clock cycle is increased by the proposed bypass bin splitting technique. The context modeling and binarization components are also optimized. As a result, our CABAC encoder delivers an average of 4.37 bins per clock cycle. Its maximum clock frequency reaches 420 MHz when synthesized in 90 nm. The corresponding overall throughput is 1836 Mbin/s, which is 62.5% higher than the state-of-the-art architecture.
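The overall throughput figure is the product of bins per cycle and clock frequency. The check below uses the rounded values from the abstract, so it lands slightly under the quoted 1836 Mbin/s; the gap is presumably rounding in the 4.37 figure. The implied baseline is a derived number, not one stated in the source, and the variable names are assumptions.

```python
bins_per_cycle = 4.37   # average bins per clock cycle, from the abstract
clock_mhz = 420.0       # maximum clock frequency in MHz, from the abstract

# Throughput in Mbin/s from the rounded inputs.
throughput_mbin = bins_per_cycle * clock_mhz

# Baseline implied by "62.5% higher" applied to the quoted 1836 Mbin/s.
implied_baseline_mbin = 1836 / 1.625

print(round(throughput_mbin, 1), round(implied_baseline_mbin, 1))
```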

International Conference on Image Processing | 2013

A combined SAO and de-blocking filter architecture for HEVC video decoder

Jiayi Zhu; Dajiang Zhou; Gang He; Satoshi Goto

The upcoming video compression standard, High Efficiency Video Coding (HEVC), reduces bit rate by 50% compared to H.264/AVC when encoding video sequences at the same picture quality. In the in-loop filter (LF) part of HEVC, sample adaptive offset (SAO) is newly added and the de-blocking filter (DBF) has changed substantially. Constructing a high-speed, low-cost VLSI architecture for the HEVC SAO and de-blocking filter is therefore a challenge. In this article, we propose an HEVC LF architecture composed of a fully utilized de-blocking filter and SAO. Block-based SAO and DBF are employed in this architecture to achieve a seamless pipeline between them. The implementation results show that it can be synthesized at 240MHz in 65nm technology. Thus this solution can process 3.84G pixels/s and support 4320p(7680×4320)@120fps decoding.

IEEE Transactions on Multimedia | 2012

An Advanced Hierarchical Motion Estimation Scheme With Lossless Frame Recompression and Early-Level Termination for Beyond High-Definition Video Coding

Xuena Bao; Dajiang Zhou; Peilin Liu; Satoshi Goto

In this paper, we present a hardware-efficient fast algorithm with a lossless frame recompression scheme and an early-level termination strategy for large search range (SR) motion estimation (ME) in beyond-high-definition video encoders. To achieve high ME quality for hierarchical motion search, we propose an advanced hierarchical ME scheme which performs the multiresolution motion search with an efficient refining stage. This enables high data and hardware reuse for much lower bandwidth and memory cost, while achieving higher ME quality than previous works. In addition, a lossless frame recompression scheme based on this ME algorithm is presented to further reduce bandwidth. A hierarchical memory organization and a leveling two-step data fetching strategy are applied to meet the random-access constraint of the hierarchical motion search structure. A leveling compression strategy, which allows a lower level to refer to a higher one for compression, is also proposed to efficiently reduce the bandwidth. Furthermore, an early-level termination method suitable for the hierarchical ME structure is applied. This method terminates redundant high-level motion searches by establishing thresholds based on the current block mode and motion search level; it also applies early refinement termination to avoid unnecessary refinement for high levels. Experimental results show that the total scheme has a much lower bit rate increase than previous works, especially for high-motion sequences, while achieving a considerable saving of memory and bandwidth cost for a large SR of [-128,127].

IEEE Journal of Solid-state Circuits | 2014

A 1.59 Gpixel/s Motion Estimation Processor With

Dajiang Zhou; Jinjia Zhou; Gang He; Satoshi Goto

The 3840 × 2160 and 7680 × 4320 UHDTV formats deliver a remarkably enhanced visual experience relative to high definition, but at the same time involve huge complexity and memory bandwidth requirements in video encoding. In particular, the enlarged motion distances of UHDTV lead to additional difficulties in the implementation of motion estimation, which is already the most critical bottleneck of an encoder. This paper presents a motion estimation processor design for H.264/AVC. A test chip is implemented in 40 nm CMOS. With algorithm and architecture co-optimization, the processor delivers a maximum throughput of 1.59 Gpixel/s for 7680 × 4320 video at 48 fps, at least 7.5 times faster than previous designs. The corresponding core power dissipation is 622 mW at 210 MHz, with energy efficiency improved by at least 23%. The chip's DRAM bandwidth requirement is also 68% lower than that of previous chips. With a maximum search range of ±211 (horizontal) by ±106 (vertical) around a predictive search center, the proposed motion estimation processor accommodates the high motion of UHDTV well.
