Haitong Ge | Researchain

Publication

Featured researches published by Haitong Ge.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

Performance Estimation Techniques With MPSoC Transaction-Accurate Models

De Ma; Rongjie Yan; Kai Huang; Min Yu; Siwen Xiu; Haitong Ge; Xiaolang Yan; Ahmed Amine Jerraya

Efficient design of multiprocessor systems-on-chip (MPSoCs) requires early, fast, and accurate performance estimation techniques. In this paper, we present new techniques based on fine-grained code analysis to accurately estimate performance during simulation of MPSoC transaction-accurate models. First, a GCC profiling tool is applied in the native simulation process. Based on the profiling result, an instruction analyzer for the target CPU architecture analyzes the cycle cost of the C code under estimation. In addition, a memory analyzer further estimates memory access latency, including both instruction/data cache time cost and global memory access cycles. Data and instruction cache models are proposed to estimate the cache miss penalty, and a segment-based strategy is adopted to update the cache models more efficiently. Furthermore, an equalized access model is presented to imitate the memory access behavior of processors for estimating global memory access latency caused by bus contention and memory bandwidth. We have applied these techniques to an H.264 decoder application with different hardware architectures. The experimental results show that these techniques bring the estimation accuracy of transaction-accurate models close to that of virtual prototype models, with a tolerable overhead on simulation speed.
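
As a rough illustration of how a cache model can add a miss penalty to an instruction cycle count, the sketch below uses a toy direct-mapped cache. It is not the segment-based model from the paper; the class name, the cache geometry, and the per-miss penalty are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses."""
    num_lines: int = 64      # assumed number of cache lines
    line_bytes: int = 32     # assumed line size in bytes
    miss_penalty: int = 20   # assumed cycles charged per miss
    tags: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def access(self, address: int) -> bool:
        """Record one access and return True on a hit."""
        block = address // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags.get(index) == tag:
            self.hits += 1
            return True
        self.tags[index] = tag   # fill (or replace) the line on a miss
        self.misses += 1
        return False

    def penalty_cycles(self) -> int:
        """Total cycles attributed to cache misses so far."""
        return self.misses * self.miss_penalty


def estimate_cycles(instruction_cycles, addresses, cache):
    """Instruction cycle cost plus the modelled cache miss penalty."""
    for address in addresses:
        cache.access(address)
    return instruction_cycles + cache.penalty_cycles()


if __name__ == "__main__":
    cache = DirectMappedCache(num_lines=4, line_bytes=16, miss_penalty=10)
    print(estimate_cycles(100, [0, 4, 16, 64, 0], cache))
```

In a fuller estimator the instruction cycle count would come from the profiling and instruction analysis steps, and the address trace from the simulated code.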

application specific systems architectures and processors | 2010

A high efficient memory architecture for H.264/AVC motion compensation

Chunshu Li; Kai Huang; Xiaolang Yan; Jiong Feng; De Ma; Haitong Ge

In an H.264/AVC decoding system, the motion compensation operation accounts for about 80% of the total memory access and becomes the system bottleneck. In this paper, a highly efficient memory architecture for H.264/AVC motion compensation is proposed to greatly reduce external memory access bandwidth. A four-level hierarchical memory organization scheme is used to exploit the reusability of neighboring blocks at an acceptable area cost. To improve the system processing throughput, five optimization techniques are adopted in the motion compensation operation, which enable the video decoder to achieve real-time decoding of an HD 1080p video stream when operating at 110 MHz. Compared with existing works, the proposed architecture reduces the memory bandwidth requirement of the motion compensation process by 83.7% and performs better in real-time applications.
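
To see why reusing neighboring reference pixels matters, the short calculation below counts the reference pixels fetched for one 16x16 macroblock. It relies on the 6-tap luma interpolation filter of H.264, which needs 5 extra pixels in each dimension. The simplifying assumption that all partitions share one motion vector and the function names are illustrative only, and the resulting figure is not the 83.7% reported above.

```python
TAPS = 6  # H.264 luma half-sample interpolation uses a 6-tap filter


def ref_window(width, height, taps=TAPS):
    """Reference pixels needed to interpolate a width x height block."""
    margin = taps - 1
    return (width + margin) * (height + margin)


def fetched_per_macroblock(block, mb=16):
    """Pixels fetched for one macroblock when every block x block
    partition loads its own window, with no reuse between partitions."""
    partitions = (mb // block) ** 2
    return partitions * ref_window(block, block)


# Assumption: all partitions share one motion vector, so their
# reference windows overlap and could be served by a single fetch.
independent = fetched_per_macroblock(4)   # sixteen separate 9x9 windows
shared = fetched_per_macroblock(16)       # one 21x21 window
saving = 1 - shared / independent

if __name__ == "__main__":
    print(independent, shared, round(saving, 3))
```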

Journal of Zhejiang University Science C | 2013

High throughput VLSI architecture for H.264/AVC context-based adaptive binary arithmetic coding (CABAC) decoding

Kai Huang; De Ma; Rong-jie Yan; Haitong Ge; Xiaolang Yan

Context-based adaptive binary arithmetic coding (CABAC) is the major entropy-coding algorithm employed in H.264/AVC. In this paper, we present a new VLSI architecture design for an H.264/AVC CABAC decoder, which optimizes both the decode decision and decode bypass engines for high throughput, and improves context model allocation for efficient external memory access. Based on the fact that the most probable symbol (MPS) branch is much simpler than the least probable symbol (LPS) branch, a newly organized decode decision engine consisting of two serially concatenated MPS branches and one LPS branch is proposed to achieve better parallelism at lower timing path cost. A look-ahead context index (ctxIdx) calculation mechanism is designed to provide the context model for the second MPS branch. A head-zero detector is proposed to improve the performance of the decode bypass engine according to UEGk encoding features. In addition, to lower the frequency of memory access, we reorganize the context models in external memory and use three circular buffers to cache the context models, neighboring information, and bit stream, respectively. A pre-fetching mechanism with a prediction scheme is adopted to load the corresponding content into a circular buffer to hide external memory latency. Experimental results show that our design can operate at 250 MHz with a 20.71k gate count in SMIC18 silicon technology, and that it achieves an average data decoding rate of 1.5 bins/cycle.
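
The bypass engine mainly handles the k-th order Exp-Golomb suffix of UEGk-binarized values. The sketch below shows that suffix binarization and its inverse at the bin level, following the structure of the binarization loop in the H.264 specification. It assumes the bins have already been produced by the arithmetic decoder, it omits the truncated unary prefix and the sign bin, and it does not model the head-zero detector; the function names are hypothetical.

```python
def egk_encode(value, k):
    """Binarize a non-negative suffix value as a k-th order Exp-Golomb bin string."""
    bins = []
    while value >= (1 << k):
        bins.append(1)          # unary part: one '1' per escalation of k
        value -= 1 << k
        k += 1
    bins.append(0)              # terminating '0' of the unary part
    for shift in range(k - 1, -1, -1):
        bins.append((value >> shift) & 1)   # k remaining bits, MSB first
    return bins


def egk_decode(bins, k):
    """Recover the suffix value from a k-th order Exp-Golomb bin string."""
    stream = iter(bins)
    value = 0
    while next(stream) == 1:    # each leading '1' adds 2**k, then k grows
        value += 1 << k
        k += 1
    suffix = 0
    for _ in range(k):
        suffix = (suffix << 1) | next(stream)
    return value + suffix


if __name__ == "__main__":
    bins = egk_encode(5, 0)
    print(bins, egk_decode(bins, 0))
```

Because the length of the unary part fixes how many bins follow, a hardware decoder that finds the terminating bin early can determine the size of the whole codeword in one step, which is the kind of property a detector in the bypass path can exploit.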

international conference on solid state and integrated circuits technology | 2006

GEM-SOC: A RISC/DSP dual-core platform for portable media applications

Ye Yang; Jian Yang; Xing Qin; Kai Huang; Peiyong Zhang; Haitong Ge; Xiaolang Yan

A RISC/DSP dual-core platform for portable media applications, named GEM-SOC, is presented in this paper. The RISC core is responsible for control-related tasks and runs the OS, while the DSP handles media applications and works as a slave to the RISC core. A mailbox-based shared-memory inter-core communication mechanism is used in this platform. The GEM-SOC is fabricated in a 0.18 μm 6LM SMIC standard cell technology, occupies about 25 mm², operates at 166 MHz, and consumes 0.6 W. An Ogg Vorbis audio application has been developed on this platform, and results show that GEM-SOC can achieve real-time decoding at 37.68 MHz with a communication overhead of 2.64-2.77%.
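
A mailbox-based shared-memory scheme can be pictured as follows: the master writes data into a shared region and posts a small descriptor to a mailbox, and the slave picks up the descriptor, works on the data in place, and posts a completion message back. The sketch below is a single-threaded software analogy only; the mailbox depth, the descriptor layout, and every name in it are assumptions, not details of GEM-SOC.

```python
from collections import deque


class Mailbox:
    """Bounded one-directional message queue standing in for a hardware mailbox."""

    def __init__(self, depth=4):
        self.depth = depth      # assumed number of mailbox slots
        self.slots = deque()

    def post(self, message):
        """Queue a message; return False if the mailbox is full."""
        if len(self.slots) >= self.depth:
            return False
        self.slots.append(message)
        return True

    def take(self):
        """Return the oldest message, or None if the mailbox is empty."""
        return self.slots.popleft() if self.slots else None


shared_memory = bytearray(256)  # region visible to both cores (assumed size)
to_dsp = Mailbox()              # RISC -> DSP commands
to_risc = Mailbox()             # DSP -> RISC completions


def risc_submit(payload, offset=0):
    """RISC side: place data in shared memory, then notify the DSP."""
    shared_memory[offset:offset + len(payload)] = payload
    return to_dsp.post((offset, len(payload)))


def dsp_service():
    """DSP side: handle one pending command; return False if none is waiting."""
    message = to_dsp.take()
    if message is None:
        return False
    offset, length = message
    data = shared_memory[offset:offset + length]
    # Placeholder for real media processing: reverse the bytes in place.
    shared_memory[offset:offset + length] = bytes(reversed(data))
    to_risc.post((offset, length))
    return True


if __name__ == "__main__":
    risc_submit(b"abc", offset=8)
    dsp_service()
    print(to_risc.take(), bytes(shared_memory[8:11]))
```

Only the short descriptor travels through the mailbox while the bulk data stays in shared memory, which is one way such a design keeps communication overhead small.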

Archive | 2009