Don Xie
Peking University
Publication
Featured research published by Don Xie.
IEEE Transactions on Very Large Scale Integration Systems | 2009
Peng Zhang; Don Xie; Wen Gao
This paper presents an efficient VLSI architecture for H.264/AVC context-adaptive binary arithmetic coding (CABAC) decoding. We introduce several new techniques to maximize the parallelism of the decoding process, including a variable-bin-rate strategy, multiple-bin arithmetic decoding, and an efficient probability propagation scheme. The CABAC engine can sustain real-time decoding for H.264/AVC Main Profile HD at Level 4.0. Synthesis results show that the multi-bin decoder can operate at up to 45 MHz, and the total logic area is only 42K gates when targeted at TSMC's 0.18-μm process.
International Conference on Consumer Electronics | 2007
Peng Zhang; Wen Gao; Don Xie; Di Wu
This paper presents an efficient VLSI architecture for H.264/AVC CABAC decoding. We introduce several new techniques to exploit the parallelism of the decoding process to the largest extent possible, including line-bit-rate decoding, multiple-bin arithmetic decoding, and an efficient probability propagation scheme. The CABAC engine can sustain real-time decoding for H.264/AVC Main Profile HD at Level 4.0. Synthesis results show that the multi-bin decoder can run at up to 45 MHz, and the total area is only 42K gates.
IEEE Transactions on Consumer Electronics | 2006
Bin Sheng; Wen Gao; Don Xie; Di Wu
In this paper, we present a VLSI design of a variable length code decoder for the AVS video standard. As a co-processor of a RISC CPU, the design can decode fixed length codes, unsigned or signed k-th order Exp-Golomb codes, and AVS 2-D variable length codes. Furthermore, it has a pre-processing submodule that performs start code detection and de-stuffing on the input bitstream. The proposed architecture has been described in Verilog HDL, simulated with the VCS digital simulator, and implemented using a 0.18-μm Artisan CMOS cell library with Synopsys Design Compiler. The circuit costs about 15k equivalent logic gates (not including 4 kb on-chip SRAM), and the critical path is less than 6 ns in the worst case. This design has been implemented in a single-chip AVS HDTV decoder, AVS101, which can support real-time decoding of NTSC, PAL, 720p 60 frames/s, or 1080i 60 fields/s programs. Although the architecture was originally designed for the AVS video standard, it can be easily adapted to other coding standards.
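As an illustration of the k-th order Exp-Golomb codes mentioned in the abstract above, the following is a minimal software sketch, not the paper's hardware design. It assumes the usual parsing rule (count leading zeros, skip the terminating 1, then read `leading_zeros + k` information bits) and the H.264-style signed mapping in which odd code numbers are positive. The names `BitReader`, `decode_ue_k`, and `decode_se_k` are hypothetical.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def decode_ue_k(reader: BitReader, k: int = 0) -> int:
    """Decode one unsigned k-th order Exp-Golomb code number."""
    leading_zeros = 0
    while reader.read_bit() == 0:  # the loop also consumes the terminating 1
        leading_zeros += 1
    info = reader.read_bits(leading_zeros + k)
    return (1 << (leading_zeros + k)) - (1 << k) + info


def decode_se_k(reader: BitReader, k: int = 0) -> int:
    """Decode one signed code: assumed mapping 1 -> +1, 2 -> -1, 3 -> +2, ..."""
    code_num = decode_ue_k(reader, k)
    magnitude = (code_num + 1) >> 1
    return magnitude if code_num & 1 else -magnitude
```

For example, the bit string `1 010 011 00100` packed into the bytes `0xA6 0x40` holds four order-0 codes in sequence. A hardware decoder would typically replace the bit-serial loop with a leading-zero detector and a barrel shifter, but the arithmetic is the same.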
International Conference on Multimedia and Expo | 2006
Peng Zhang; Wen Gao; Di Wu; Don Xie
This paper proposes an efficient reference frame storage scheme for HDTV VLSI decoders to reduce the external memory bandwidth requirement. The proposed scheme consists of a pixel duplication mechanism and an L-C (luma-chroma) correlated mapping method. Pixel duplication completely eliminates the possibility of an access crossing a word boundary and therefore substantially increases memory bandwidth efficiency. L-C correlated mapping exploits address relationships between the luma and chroma reference pixels and largely reduces the bank conflict overhead of memory accesses. Combined, the two mechanisms improve bandwidth usage: compared with previous schemes, up to 47% of the bandwidth is saved in the worst case and 25% in the average case.
IEEE Transactions on Consumer Electronics | 2006
Huizhu Jia; Peng Zhang; Don Xie; Wen Gao
In this paper, we propose an optimized real-time AVS (a Chinese next-generation audio/video coding standard) HDTV video decoder. The decoder has been implemented in a single SoC with HW/SW partitioning. The AVS algorithms and their complexity are first analyzed. Based on the analysis, a hardware implementation with an MB-level 7-stage pipeline is selected. The software tasks are realized with a 32-bit RISC processor. We further propose optimizations of the interface and the RISC processor based on the proposed architecture. The AVS decoder (RISC processor and hardware accelerators) is described in high-level Verilog/VHDL hardware description language and implemented in a single-chip AVS HDTV real-time decoder. At a 148.5 MHz working frequency, the decoder chip can support real-time decoding of NTSC, PAL, or HDTV (720p@60 frames/s or 1080i@60 fields/s) bit-streams. Finally, the decoder has been fully tested on a prototyping board.
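To put the 148.5 MHz figure in context, the short sketch below estimates the average clock-cycle budget per macroblock for the HDTV formats listed in the abstract above. It is back-of-the-envelope arithmetic only, assuming 16×16 macroblocks, 1080-line video coded as 68 macroblock rows, and 60 fields/s treated as 30 frames/s. The function names are hypothetical and the result says nothing about the actual pipeline timing of the chip.

```python
def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of macroblocks per frame, rounding partial blocks up."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols * rows


def cycle_budget_per_mb(clock_hz: float, width: int, height: int,
                        frames_per_second: float) -> float:
    """Average clock cycles available per macroblock for real-time decoding."""
    mb_rate = macroblocks_per_frame(width, height) * frames_per_second
    return clock_hz / mb_rate


if __name__ == "__main__":
    clock = 148.5e6
    print("720p@60 frames/s :", cycle_budget_per_mb(clock, 1280, 720, 60))
    print("1080i@60 fields/s:", cycle_budget_per_mb(clock, 1920, 1080, 30))
```

Under these assumptions the budget is 687.5 cycles per macroblock for 720p and roughly 607 for 1080i, so each stage of a macroblock-level pipeline would need to finish within a few hundred cycles.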
International Symposium on Circuits and Systems | 2010
Hai Bing Yin; Honggang Qi; Huizhu Jia; Don Xie; Wen Gao
In traditional four-stage pipeline structures for H.264 video encoder hardware implementations, rate distortion optimization (RDO) based mode decision is turned off, and dual-port or ping-pong on-chip search window SRAM is used to achieve data reuse between integer and fractional pixel motion estimation. To support RDO-based mode decision for an efficient high-definition AVS video coding implementation, we propose an improved four-stage MB pipeline structure. The on-chip buffer structure is also optimized to balance circuit cost against coding performance. The Jizhun profile AVS video encoder is successfully mapped into a hardware implementation using the proposed pipeline structure, with small performance degradation.
Advances in Multimedia | 2007
Junhao Zheng; David Wu; Don Xie; Wen Gao
H.264/AVC is the newest international video coding standard. This paper presents a novel hardware design for CABAC decoding in H.264/AVC. CABAC is a key innovative technology of the standard, but it poses a major challenge for high-throughput implementation. The decoding of the current bin depends on the previous bin, which results in long latency and limits system performance. In this paper, the data hazards are analyzed and resolved using the algorithmic features. We present a new pipeline-based architecture using the standard look-ahead technique, in which the arithmetic decoding engine works in parallel with the context maintainer. An efficient finite state machine is developed to meet the requirements of pipeline control, and the critical path is optimized for timing. The proposed implementation can generate one bin per clock cycle at a 160-MHz working frequency.
Advances in Multimedia | 2006
Junhao Zheng; Di Wu; Lei Deng; Don Xie; Wen Gao
In the advanced Audio Video coding Standard (AVS), many efficient coding tools are adopted in motion compensation, such as new motion vector prediction, direct mode matching, and variable block sizes. However, these features enormously increase the computational complexity and the memory bandwidth requirement and make the traditional MV predictor more complicated. This paper proposes an efficient MV predictor architecture for both AVS and MPEG-2 decoders. The proposed architecture exploits parallelism to accelerate operations and uses a dedicated design to optimize memory access. In addition, it can reuse the on-chip buffer to support MV error resilience for MPEG-2 decoding. The design has been described in Verilog HDL and synthesized using a 0.18-μm CMOS cell library with Design Compiler. The circuit costs about 62k logic gates when the working frequency is set to 148.5 MHz. This design can support real-time MV prediction for HDTV 1080i video decoding in both AVS and MPEG-2.
Visual Communications and Image Processing | 2014
Xiaofeng Huang; Huizhu Jia; Kaijin Wei; Jie Liu; Chuang Zhu; Zhengguang Lv; Don Xie
The emerging High Efficiency Video Coding (HEVC) standard achieves significantly better coding efficiency than all existing video coding standards. The quadtree-structured coding unit (CU) is adopted in HEVC to improve compression efficiency, but it causes very high computational complexity because the encoder exhaustively evaluates all combinations of prediction unit (PU) and transform unit (TU) for every candidate CU. To alleviate the computational burden of HEVC intra coding, a fast CU depth decision algorithm is proposed in this paper. The CU texture complexity and the correlation between the current CU and neighbouring CUs are adaptively taken into consideration when deciding the CU split and the CU depth search range. Experimental results show that the proposed scheme provides 39.3% encoder time savings on average compared to the default encoding scheme in HM-RExt-13.0, with only a 0.6% BDBR penalty in coding performance.
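The general idea of using texture complexity and neighbouring CU depths to prune the quadtree search can be sketched as follows. This is a generic illustration under stated assumptions, not the algorithm from the paper: pixel variance stands in for the texture measure, the threshold `flat_threshold` is an arbitrary placeholder, and the rule limiting the depth to one level beyond the deepest neighbour is hypothetical.

```python
def block_variance(block):
    """Pixel variance of a 2-D block given as a list of rows."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def should_try_split(block, neighbour_depths, current_depth,
                     flat_threshold=25.0):
    """Return False when evaluating deeper CU splits can be skipped.

    block            -- luma samples of the current CU (list of rows)
    neighbour_depths -- depths chosen for already-coded neighbouring CUs
    current_depth    -- quadtree depth of the current CU
    flat_threshold   -- hypothetical variance threshold for "flat" texture
    """
    # A nearly flat block is unlikely to benefit from smaller CUs.
    if block_variance(block) < flat_threshold:
        return False
    # Hypothetical rule: do not search more than one level deeper
    # than the deepest neighbouring CU.
    if neighbour_depths and current_depth >= max(neighbour_depths) + 1:
        return False
    return True
```

In an encoder, a check of this kind would sit before the recursive rate-distortion evaluation of the four sub-CUs, so each early exit skips an entire subtree of PU and TU combinations.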
International Conference on Multimedia and Expo | 2012
Kaijin Wei; Rongwei Zhou; Shanghang Zhang; Huizhu Jia; Don Xie; Wen Gao
In a hardware video encoder, Level C+ data reuse for motion estimation can reuse the two-dimensionally overlapped search window (SW) and is thus a good choice for trading off memory bandwidth against on-chip buffer size. However, the irregular zigzag coding order complicates the encoder implementation. This paper focuses on the special considerations for a Level C+ zigzag encoder. First, we present a guideline on how to select the Level C+ zigzag HFmVn scan for the adopted encoder pipeline. Second, following the guideline, the zigzag HF5V3 coding order is applied in our Level C+ encoder, in which a new function is added to convert the zigzag bit-stream into standard raster order, and the exact motion vector predictor (MVP) can be used for most macroblocks (MBs), except some corner MBs, to increase coding performance. Third, zigzag-aware scheduling for prefetching the SW is proposed so that the pipeline is never disturbed by the irregular coding order and runs smoothly MB by MB. In addition, balancing the bandwidth across each MB processing period improves bandwidth utilization. With these techniques, a real-time high-definition (HD) 1080p AVS encoder is successfully implemented on an FPGA verification board with search range [-128, 128]×[-96, 96] and two reference frames at an operating frequency of 160 MHz.