Tung-Chien Chen | Researchain

Publication

Featured researches published by Tung-Chien Chen.

IEEE Transactions on Circuits and Systems for Video Technology | 2005

Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder

Yu-Wen Huang; Bing-Yu Hsieh; Tung-Chien Chen; Liang-Gee Chen

Intra prediction with rate-distortion constrained mode decision is the most important technology in the H.264/AVC intra frame coder, which is competitive with the latest image coding standard, JPEG2000, in terms of both coding performance and computational complexity. The predictor generation engine for intra prediction and the transform engine for mode decision are critical because their operations require a lot of memory access and occupy 80% of the computation time of the entire intra compression process. A low-cost general-purpose processor cannot process these operations in real time. In this paper, we propose two solutions for platform-based design of an H.264/AVC intra frame coder. One solution is a software implementation targeted at low-end applications. Context-based decimation of unlikely candidates, subsampling of matching operations, bit-width truncation to reduce the computations, and an interleaved full-search/partial-search strategy to stop error propagation and maintain image quality are proposed and combined as our fast algorithm. Experimental results show that our method can reduce the computation used for intra prediction and mode decision by 60% while keeping the peak signal-to-noise ratio degradation below 0.3 dB. The other solution is a hardware accelerator targeted at high-end applications. After comprehensive analysis of instructions and exploration of parallelism, we propose a system architecture with four-parallel intra prediction and mode decision to enhance the processing capability. The Hadamard-based mode decision is replaced with a discrete cosine transform-based version to reduce memory access by 40%. Two-stage macroblock pipelining is also proposed to double the processing speed and hardware utilization. The other features of our design are a reconfigurable predictor generator supporting all 13 intra prediction modes, a parallel multitransform and inverse transform engine, and a CAVLC bitstream engine. A prototype chip is fabricated with TSMC 0.25-μm CMOS 1P5M technology. Simulation results show that our implementation can process 16 megapixels (4096×4096) within 1 s, that is, 720×480 4:2:0 30-Hz video in real time, at an operating frequency of 54 MHz. The transistor count is 429 K, and the core size is only 1.855×1.885 mm².
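To make the Hadamard-based mode-decision cost concrete, here is a minimal Python sketch of a 4×4 sum of absolute transformed differences (SATD). It illustrates the baseline cost that the abstract says was replaced by a DCT-based version; it is not the authors' hardware design. The function names and the halving of the final sum are assumptions made for illustration.

```python
def hadamard4(block):
    """Apply the 4x4 Hadamard transform H * X * H^T to a 4x4 list of lists."""
    h = [[1, 1, 1, 1],
         [1, 1, -1, -1],
         [1, -1, -1, 1],
         [1, -1, 1, -1]]
    # tmp = H * X
    tmp = [[sum(h[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    # out = tmp * H^T
    return [[sum(tmp[i][k] * h[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig, pred):
    """SATD between an original 4x4 block and its intra predictor.

    The division by 2 is an assumed normalization, not taken from the abstract.
    """
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = hadamard4(diff)
    return sum(abs(c) for row in coeffs for c in row) // 2


# Hypothetical example: a flat block predicted by a block that is off by one.
flat_ones = [[1] * 4 for _ in range(4)]
flat_zeros = [[0] * 4 for _ in range(4)]
cost = satd4x4(flat_ones, flat_zeros)
```

A mode decision loop would evaluate this cost once per candidate predictor and keep the cheapest mode, which is why the transform engine dominates the computation.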

IEEE Transactions on Circuits and Systems for Video Technology | 2006

Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder

Tung-Chien Chen; Shao-Yi Chien; Yu-Wen Huang; Chen-Han Tsai; Ching-Yeh Chen; To-Wei Chen; Liang-Gee Chen

H.264/AVC significantly outperforms previous video coding standards with many new coding tools. However, the better performance comes at the price of extraordinarily high computational complexity and memory access requirements, which make it difficult to design a hardwired encoder for real-time applications. In addition, due to the complex, sequential, and highly data-dependent characteristics of the essential algorithms in H.264/AVC, the use of both pipelining and parallel processing techniques is constrained. The hardware utilization and throughput are also decreased because of the block/MB/frame-level reconstruction loops. In this paper, we describe our techniques for designing an H.264/AVC video encoder for HDTV applications. On the system design level, in consideration of the characteristics of the key components and the reconstruction loops, a four-stage macroblock pipelined system architecture is first proposed with efficient scheduling and memory hierarchy. On the module design level, the design considerations of the significant modules are addressed, followed by the hardware architectures, including low-bandwidth integer motion estimation, parallel fractional motion estimation, a reconfigurable intra predictor generator, a dual-buffer block-pipelined entropy coder, and a deblocking filter. With these techniques, the prototype chip of the efficient H.264/AVC encoder is implemented with 922.8 K logic gates and 34.72-KB SRAM at 108-MHz operation frequency.
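The 720p, 30 frames/s, 108-MHz figures imply a per-macroblock cycle budget, which the following back-of-the-envelope Python sketch works out. It assumes 720p means 1280×720 luma samples and 16×16 macroblocks; the function name is hypothetical, and the resulting budget is derived arithmetic rather than a number stated in the abstract.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock for real-time encoding."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second


# Assumed frame size for 720p: 1280x720, i.e. 80 x 45 = 3600 macroblocks.
budget = cycles_per_macroblock(1280, 720, 30, 108_000_000)
```

Under these assumptions the budget is about 1000 cycles per macroblock, which is the time slot each stage of a balanced macroblock pipeline would have to fit into.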

IEEE Transactions on Circuits and Systems | 2006

Analysis and architecture design of variable block-size motion estimation for H.264/AVC

Ching-Yeh Chen; Shao-Yi Chien; Yu-Wen Huang; Tung-Chien Chen; Tu-Chih Wang; Liang-Gee Chen

Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intra-level classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead than previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has fewer reference pixel registers and a shorter critical path. The second design utilizes a two-dimensional distortion array and one adder tree with a reference buffer that can maximize the data reuse between successive search candidates. The first design is suitable for low resolution or a small search range, and the second design has the advantages of supporting a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing ability is eight times that of a single SAD tree, but the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, its huge memory bandwidth, is overcome. We are able to save 99.9% of the off-chip memory bandwidth and 99.22% of the on-chip memory bandwidth. We demonstrate a 720p, 30-fps solution at 108 MHz with a 330.2k gate count and 208k bits of on-chip memory.
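The core idea behind supporting VBSME with one distortion computation can be sketched in software: compute the sixteen 4×4 SADs of a macroblock once, then sum them into the SADs of every larger partition. The Python sketch below is an illustration of that reuse under the standard H.264 partition sizes; the function names are hypothetical and the code does not model the adder-tree or buffer organization described in the abstract.

```python
def sad4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge_sads(grid):
    """Sum 4x4 SADs into SADs for all seven H.264 partition sizes.

    Keys are (width, height) in pixels; values list the SADs in raster order.
    """
    sizes = [(4, 4), (8, 4), (4, 8), (8, 8), (16, 8), (8, 16), (16, 16)]
    out = {}
    for w, h in sizes:
        bw, bh = w // 4, h // 4  # partition size in units of 4x4 blocks
        out[(w, h)] = [
            sum(grid[r + dr][c + dc] for dr in range(bh) for dc in range(bw))
            for r in range(0, 4, bh)
            for c in range(0, 4, bw)
        ]
    return out


# Hypothetical macroblocks: current block all ones, reference block all zeros.
cur_mb = [[1] * 16 for _ in range(16)]
ref_mb = [[0] * 16 for _ in range(16)]
sads = merge_sads(sad4x4_grid(cur_mb, ref_mb))
total_partitions = sum(len(v) for v in sads.values())
```

One pass over the pixel differences yields all 41 partition SADs for a search candidate, so the pixel-level work does not grow with the number of block sizes.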

International Solid-State Circuits Conference | 2005

A 1.3TOPS H.264/AVC single-chip encoder for HDTV applications

Yu-Wen Huang; Tung-Chien Chen; Chen-Han Tsai; Ching-Yeh Chen; To-Wei Chen; Chi-Shi Chen; Chun-Fu Shen; Shyh-Yih Ma; Tu-Chih Wang; Bing-Yu Hsieh; Hung-Chi Fang; Liang-Gee Chen

An H.264/AVC encoder is implemented on a 31.72-mm² die with 0.18-μm CMOS technology. A four-stage macroblock pipelined architecture encodes 720p 30-f/s HDTV videos in real time at 108 MHz. The encoded video quality is competitive with reference software requiring 3.6 TOPS on a general-purpose processor-based platform.

International Conference on Acoustics, Speech, and Signal Processing | 2004

Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC

Tung-Chien Chen; Yu-Wen Huang; Liang-Gee Chen

We contribute a new VLSI architecture for fractional motion estimation of the H.264/AVC video compression standard. Seven inter-related loops extracted from the complex procedure are analyzed, and two decomposing techniques are proposed to parallelize the algorithm for hardware with a regular schedule and full utilization. The proposed architecture, which is also reusable, can support different specifications, multiple standards, fast algorithms, and some cost considerations. H.264/AVC baseline profile level 3 with complete Lagrangian mode decision can be realized with 290K gates at an operating frequency of 100 MHz. It is a useful intellectual property (IP) design for platform-based multimedia systems.
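Fractional motion estimation operates on interpolated samples, and in H.264/AVC the luma half-sample positions come from a 6-tap filter with taps (1, -5, 20, 20, -5, 1). The Python sketch below applies that filter along one row as an illustration of the underlying arithmetic. The function name is hypothetical, border handling is simplified by skipping positions without full filter support, and the code says nothing about how the proposed architecture schedules the computation.

```python
def half_pel_row(samples):
    """Half-sample values between samples[i] and samples[i + 1].

    Only positions with full 6-tap support are produced (a simplification).
    """
    taps = (1, -5, 20, 20, -5, 1)
    out = []
    for i in range(2, len(samples) - 3):
        acc = sum(t * samples[i - 2 + k] for k, t in enumerate(taps))
        # Round, divide by 32, and clip to the 8-bit sample range.
        out.append(min(255, max(0, (acc + 16) >> 5)))
    return out


# Hypothetical inputs: a flat row and a sharp edge.
flat = half_pel_row([100] * 8)
edge = half_pel_row([0, 0, 0, 255, 255, 255])
```

A flat row stays flat, while the half-sample position in the middle of a sharp edge lands halfway between the two levels.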

IEEE Transactions on Circuits and Systems for Video Technology | 2007

Fast Algorithm and Architecture Design of Low-Power Integer Motion Estimation for H.264/AVC

Tung-Chien Chen; Yu-Han Chen; Sung-Fang Tsai; Shao-Yi Chien; Liang-Gee Chen

In an H.264/AVC video encoder, integer motion estimation (IME) accounts for 74.29% of the computational complexity and 77.49% of the memory access, making it the most critical component for low-power applications. According to our analysis, an optimal low-power IME engine should be a parallel hardware architecture supporting fast algorithms and efficient data reuse (DR). In this paper, a hardware-oriented fast algorithm is proposed with intra-/inter-candidate DR considerations. In addition, based on the systolic array and 2-D adder tree architecture, a ladder-shaped search window data arrangement and an advanced searching flow are proposed to efficiently support inter-candidate DR and reduce latency cycles. According to the implementation results, the proposed fast algorithm saves 97% of the computational complexity. In addition, the proposed DR techniques save a further 77.6% of the memory bandwidth at the architecture level. In the ultra-low-power mode, the power consumption is 2.13 mW for real-time encoding of CIF 30-fps videos at a 13.5-MHz operating frequency.

International Symposium on Circuits and Systems | 2004

Analysis and design of macroblock pipelining for H.264/AVC VLSI architecture

Tung-Chien Chen; Yu-Wen Huang; Liang-Gee Chen

This paper presents a new macroblock (MB) pipelining scheme for an H.264/AVC encoder. Conventional video encoders adopt two-stage MB pipelines, which are not suitable for H.264/AVC due to the long encoding path, sequential procedure, and large bandwidth requirement. According to our analysis of the encoding process, an H.264/AVC accelerator is divided into five major functional blocks with four-stage MB pipelines to greatly increase the processing capability and hardware utilization. By adopting shared memories between adjacent pipeline stages with sophisticated task scheduling, the bus bandwidth can be further reduced by 55%. In addition, hardware-oriented algorithms are proposed, without loss of video quality, to remove data dependencies that prevent parallel processing and MB pipelining. The H.264/AVC Baseline Profile Level Three encoder, which requires a computational complexity of 1.8 tera-instructions per second (TIPS), is successfully mapped into hardware with our MB pipeline scheme at 100 MHz.
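The throughput benefit of macroblock pipelining can be illustrated with a simple timing model in Python. It assumes a lock-step pipeline in which every stage advances once per time slot and the slot length equals the slowest stage. The stage cycle counts and function names are hypothetical values chosen for illustration, not figures from the paper.

```python
def sequential_cycles(num_mbs, stage_cycles):
    """Cycles needed when each macroblock passes through all stages in turn."""
    return num_mbs * sum(stage_cycles)


def pipelined_cycles(num_mbs, stage_cycles):
    """Cycles needed for a lock-step MB pipeline.

    The slot length is set by the slowest stage; the extra
    (stages - 1) slots fill and drain the pipeline.
    """
    slot = max(stage_cycles)
    return (num_mbs + len(stage_cycles) - 1) * slot


# Hypothetical balanced four-stage pipeline and a 3600-macroblock frame.
stages = [1000, 1000, 1000, 1000]
seq = sequential_cycles(3600, stages)
pipe = pipelined_cycles(3600, stages)
```

With balanced stages the pipelined schedule approaches a fourfold speedup, and an unbalanced stage stretches every slot, which is why the abstract stresses task scheduling alongside the pipeline partitioning.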

International Symposium on Circuits and Systems | 2005

Architecture design of H.264/AVC decoder with hybrid task pipelining for high definition videos

To-Wei Chen; Yu-Wen Huang; Tung-Chien Chen; Yu-Han Chen; Chuan-Yung Tsai; Liang-Gee Chen

The most critical issue of an H.264/AVC decoder is the system architecture design with balanced pipelining schedules and proper degrees of parallelism. In this paper, a hybrid task pipelining scheme is first presented to greatly reduce the internal memory size and bandwidth. Block-level, macroblock-level, and macroblock/frame-level pipelining schedules are arranged for CAVLD/IQ/IT/INTRA_PRED, INTER_PRED, and DEBLOCK, respectively. Appropriate degrees of parallelism for each pipeline task are also proposed. Moreover, efficient modules are contributed. The CAVLD unit smoothly decodes the bitstream into symbols without bubble cycles. The INTER_PRED unit highly exploits the data reuse between interpolation windows of neighboring blocks to save 60% of external memory bandwidth. The DEBLOCK unit doubles the processing capability of our previous work with only 35.3% of logic gate count overhead. The proposed baseline profile decoder architecture can support up to 2048×1024 30 fps videos with 217 K logic gates, 10 KB SRAMs, and 528.9 MB/s bus bandwidth when operating at 120 MHz.

Asia and South Pacific Design Automation Conference | 2006

Hardware architecture design of an H.264/AVC video codec

Tung-Chien Chen; Chung-Jr Lian; Liang-Gee Chen

H.264/AVC is the latest video coding standard. It significantly outperforms the previous video coding standards, but the extraordinarily high computational complexity and memory access requirements make a hardwired codec solution difficult. This paper describes the design methodology for an H.264/AVC video codec. The system architecture and scheduling are addressed. The design considerations and optimizations for its significant modules, including a bandwidth-optimized motion compensation engine, a reconfigurable intra predictor generator, and low-bandwidth parallel integer motion estimation, are discussed. Due to the complex, sequential, and highly data-dependent characteristics of all essential algorithms in H.264/AVC, not only a pipeline structure but also an efficient memory hierarchy is required. A design case with a hybrid task pipelining scheme and a balanced schedule with block-level, MB-level, and frame-level pipelining is presented. Combined with many bandwidth reduction techniques and data reuse schemes, a very efficient architecture and implementation for platform-based systems is demonstrated by the prototype chips.

International Solid-State Circuits Conference | 2009

A 212 MPixels/s 4096

Li-Fu Ding; Wei-Yin Chen; Pei-Kuei Tsung; Tzu-Der Chuang; Hsu-Kuang Chiu; Yu-Han Chen; Pai-Heng Hsiao; Shao-Yi Chien; Tung-Chien Chen; Ping-Chih Lin; Chia-Yu Chang; Liang-Gee Chen

To provide more vivid perception, TV resolution is increasing dramatically. In addition, 3D video is emerging because it can present immersive and complete scenes. Therefore, multiview video coding (MVC) is currently being developed as an extension of H.264/AVC [1]. Disparity estimation (DE), which effectively exploits the inter-view redundancy and reduces bit rates by 20% to 30%, is the most significant feature. However, DE and motion estimation (ME) require ultra-high computation and memory access. To encode a 3-view 1080p video, 82.4 TOPS of computing power and 54.6 TB/s of memory access are required with a full search algorithm. Moreover, view scalability is a critical functionality for dealing with various MVC structures.
