Chuan-Yung Tsai
National Taiwan University
Publication
Featured research published by Chuan-Yung Tsai.
international symposium on circuits and systems | 2005
To-Wei Chen; Yu-Wen Huang; Tung-Chien Chen; Yu-Han Chen; Chuan-Yung Tsai; Liang-Gee Chen
The most critical issue of an H.264/AVC decoder is the system architecture design with balanced pipelining schedules and proper degrees of parallelism. In this paper, a hybrid task pipelining scheme is first presented to greatly reduce the internal memory size and bandwidth. Block-level, macroblock-level, and macroblock/frame-level pipelining schedules are arranged for CAVLD/IQ/IT/INTRA_PRED, INTER_PRED, and DEBLOCK, respectively. Appropriate degrees of parallelism for each pipeline task are also proposed. Moreover, efficient modules are contributed. The CAVLD unit smoothly decodes the bitstream into symbols without bubble cycles. The INTER_PRED unit highly exploits the data reuse between interpolation windows of neighboring blocks to save 60% of external memory bandwidth. The DEBLOCK unit doubles the processing capability of our previous work with only 35.3% of logic gate count overhead. The proposed baseline profile decoder architecture can support up to 2048×1024 30 fps videos with 217 K logic gates, 10 KB SRAMs, and 528.9 MB/s bus bandwidth when operating at 120 MHz.
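To give a feel for the pipelining constraint in the abstract above, the following minimal sketch computes the average clock-cycle budget per macroblock implied by the quoted figures (2048×1024, 30 fps, 120 MHz). The function name and the 16×16 macroblock size parameter are illustrative assumptions, not taken from the paper; the calculation is back-of-the-envelope arithmetic, not the authors' scheduling analysis.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock (hypothetical helper).

    Assumes 16x16 macroblocks and frame dimensions that are multiples
    of the macroblock size.
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    mbs_per_second = mbs_per_frame * fps
    return clock_hz / mbs_per_second


# Figures quoted in the abstract: 2048x1024 at 30 fps, 120 MHz clock.
budget = cycles_per_macroblock(2048, 1024, 30, 120_000_000)
print(f"{budget:.1f} cycles per macroblock")  # prints "488.3 cycles per macroblock"
```

Every pipeline stage therefore has to finish a macroblock in fewer than about 488 cycles on average, which is why the degree of parallelism per task matters.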
midwest symposium on circuits and systems | 2005
Chuan-Yung Tsai; Tung-Chien Chen; To-Wei Chen; Liang-Gee Chen
Design of H.264/AVC motion compensation (MC) is very challenging because of the high memory bandwidth and low hardware utilization caused by the new functionalities of variable block size and the 6-tap interpolation filter. In this paper, the vertically integrated double Z (VIDZ) schedule, together with the interpolation window reuse (IWR) and interpolation window classification (IWC) bandwidth reduction schemes, is proposed to keep the MC highly utilized and save 60-80% of memory bandwidth. The proposed MC hardware is implemented at 120 MHz with 47 K logic gates and can support a 2048×1024 30 fps H.264/AVC HDTV decoder with less than 200 MB/s memory bandwidth.
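The 6-tap interpolation filter mentioned above can be illustrated with a minimal sketch of one luma half-sample computation. The tap weights (1, -5, 20, 20, -5, 1), the rounding offset, and the shift follow the H.264/AVC standard for a half-sample derived directly from integer samples; the function name and the 8-bit clipping range are assumptions for illustration, and this is not the paper's hardware datapath.

```python
# Tap weights of the H.264/AVC luma half-sample filter.
TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(samples):
    """Interpolate one half-sample from six neighboring integer samples.

    Hypothetical helper: assumes 8-bit video, so the result is clipped
    to the range 0..255.
    """
    if len(samples) != 6:
        raise ValueError("exactly six integer samples are required")
    acc = sum(tap * s for tap, s in zip(TAPS, samples))
    # Add 16 for rounding, then divide by 32 with an arithmetic shift.
    return min(255, max(0, (acc + 16) >> 5))


print(half_pel([0, 0, 0, 255, 255, 255]))  # prints 128
```

Because each output needs six input samples, a 4×4 block needs a 9×9 window of reference pixels, which is the source of the bandwidth pressure that window reuse addresses.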
IEEE Transactions on Circuits and Systems II: Express Briefs | 2006
Tung-Chien Chen; Yu-Wen Huang; Chuan-Yung Tsai; Bing-Yu Hsieh; Liang-Gee Chen
Context-based adaptive variable-length coding (CAVLC) is a new and important feature of the latest video coding standard, H.264/AVC. The direct VLSI implementation of CAVLC modified from the conventional run-length coding architecture will lead to low throughput and utilization. In this brief, an efficient CAVLC design is proposed. The main concept is the two-stage block pipelining scheme for parallel processing of two 4×4 blocks. When one block is processed by the scanning engine to collect the required symbols, its previous block is handled by the coding engine to translate symbols into bitstream. Our dual-block-pipelined architecture doubles the throughput and utilization of CAVLC at high bit rates. Moreover, a zero skipping technique is adopted to reduce up to 90% of cycles at low bit rates. Last but not least, Exp-Golomb coding for other general symbols and bitstream encapsulation for the network abstraction layer are integrated with CAVLC as a complete H.264/AVC baseline profile entropy coder. Simulation shows that our design is capable of real-time processing for 1920×1088 30 fps videos with 23.6 K logic gates at 100 MHz.
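The Exp-Golomb coding mentioned above for general symbols can be sketched in a few lines. The mapping follows the order-0 Exp-Golomb codes used by H.264/AVC for unsigned and signed syntax elements; the function names `ue` and `se` and the use of bit strings are illustrative assumptions and say nothing about how the authors' hardware is organized.

```python
def ue(code_num):
    """Unsigned order-0 Exp-Golomb code, returned as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]          # binary form of code_num + 1
    return "0" * (len(bits) - 1) + bits   # prefix of leading zeros, then the value


def se(value):
    """Signed Exp-Golomb code: positive v maps to 2v-1, non-positive v to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_num)


for n in range(4):
    print(n, ue(n))
# prints:
# 0 1
# 1 010
# 2 011
# 3 00100
```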
symposium on vlsi circuits | 2007
Tung-Chien Chen; Yu-Han Chen; Chuan-Yung Tsai; Sung-Fang Tsai; Shao-Yi Chien; Liang-Gee Chen
A 2.8 to 67.2 mW H.264 encoder is implemented on a 12.8 mm² die with 0.18 μm CMOS technology. The proposed parallel architectures along with fast algorithms and data reuse schemes enable 77.9% power savings. The power awareness is provided through a flexible system hierarchy that supports content-aware algorithms and module-wise gated clock.
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Tung-Chien Chen; Chuan-Yung Tsai; Yu-Wen Huang; Liang-Gee Chen
Due to multiple reference frame motion estimation (MRF-ME), an H.264/AVC encoder requires ultrahigh memory bandwidth. The conventional multiple reference frames single current macroblock (MRSC) scheme only considers the data reuse within one frame, and requires on-chip memory size and off-chip memory bandwidth proportional to the number of reference frames. In this paper, a single reference frame multiple current macroblocks (SRMC) scheme is presented to further exploit the data reuse at frame level. With frame-level rescheduling of the motion estimation (ME) procedures in different reference frames, one loaded search window can be utilized by multiple current MBs in different original frames. The demanded on-chip memory size and off-chip memory bandwidth for MRF-ME can thus be reduced to those supporting only one reference frame. Moreover, based on the SRMC scheme, an architecture prototype with a two-stage mode decision flow is proposed. For HDTV specifications, 62.21 KB (74.8%) of SRAM and 364.3 MB/s (62.6%) of system bandwidth are saved in comparison with the MRSC scheme.
IEEE Transactions on Circuits and Systems for Video Technology | 2009
Yu-Han Chen; Tung-Chien Chen; Chuan-Yung Tsai; Sung-Fang Tsai; Liang-Gee Chen
Because video services are becoming popular on portable devices, power becomes the primary design issue for video coders nowadays. H.264/AVC is an emerging video coding standard which can provide outstanding coding performance and thus is suitable for mobile applications. In this paper, we target a power-efficient H.264/AVC encoder. The main power consumption in an H.264/AVC encoding system is induced by data access of motion estimation (ME). At first, we propose hardware-oriented algorithms and corresponding parallel architectures of integer ME (IME) and fractional ME (FME) to achieve memory access power reduction. Then, a parameterized encoding system and flexible system architecture are proposed to provide power scalability and hardware efficiency, respectively. Finally, our design is implemented under TSMC 0.18 μm CMOS technology with 12.84 mm² core area. The required hardware resources are 452.8 K logic gates and 16.95 KB SRAMs. The power consumption ranges from 67.2 to 43.5 mW under D1 (720 × 480) 30 frames/s video encoding, and more than 128 operating configurations are provided.
international symposium on circuits and systems | 2006
Tung-Chien Chen; Yu-Han Chen; Chuan-Yung Tsai; Liang-Gee Chen
In this paper, low-power design techniques from the algorithm level to the architecture level are proposed for fractional motion estimation in H.264/AVC. The proposed AMPD algorithm can reduce power by 50.8% with up to 0.1 dB quality drop. The proposed parallel architecture with an efficient memory hierarchy can efficiently reuse data and save 61.6% of power. Furthermore, power-aware functionality is included. Our design can gracefully vary the quality degradation over 0.1-3.9 dB in response to a power consumption of 22.58-1.64 mW. This power-oriented design is very efficient for different mobile applications in various power situations.
international conference on multimedia and expo | 2009
Pei-Kuei Tsung; Wei-Yin Chen; Li-Fu Ding; Chuan-Yung Tsai; Tzu-Der Chuang; Liang-Gee Chen
Fractional motion estimation (FME) is widely used in video compression standards. In H.264/AVC, the precision of the motion vector is down to quarter pixels to improve the coding efficiency. However, FME occupies over 45% of the computation complexity in an H.264 encoder and this high complexity limits the processing capability. In this paper, a single-iteration full search FME is proposed. Through algorithm and architecture co-optimization, the bandwidth to the frame buffer is reduced by 31%. Furthermore, 82% of the circuit area for the Hadamard transformation and subtraction is saved compared with the direct implementation. Compared with prior art, the proposed design supports 3.39× higher throughput with only 0.02 dB PSNR drop. Thus, the specification of 4096 × 2160 quad full high definition H.264/AVC FME processing can be achieved.
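The subtraction and Hadamard transformation named above are the two steps of a transformed-difference matching cost, commonly called SATD in H.264/AVC encoders. The sketch below shows that cost for one 4×4 block, assuming an unnormalized 4×4 Hadamard matrix; the names `H4`, `matmul4`, and `satd4x4` are hypothetical, normalization conventions vary between encoders, and the paper's shared-circuit design is not reproduced here.

```python
# Unnormalized 4x4 Hadamard matrix (symmetric, entries +1 or -1).
H4 = (
    (1, 1, 1, 1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
    (1, -1, 1, -1),
)


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested sequences."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(cur, ref):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    # Step 1: subtraction of the candidate block from the current block.
    diff = [[cur[i][j] - ref[i][j] for j in range(4)] for i in range(4)]
    # Step 2: 2-D transform. H4 is symmetric, so H * D * H^T equals H * D * H.
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


ones = [[1] * 4 for _ in range(4)]
zeros = [[0] * 4 for _ in range(4)]
print(satd4x4(ones, zeros))  # prints 16
```

An FME search evaluates this cost for every candidate fractional position, so the transform circuitry is replicated many times in a direct implementation.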
international conference on multimedia and expo | 2006
Chuan-Yung Tsai; Tung-Chien Chen; Liang-Gee Chen
Low-power hardware design for the entropy coding of an H.264/AVC baseline profile encoder is urgently needed for the growing number of mobile applications. However, previous works perform poorly in terms of power. In this paper, the first low-power context-based adaptive variable length coding (CAVLC) scheme, named the side information aided (SIA) symbol look ahead (SLA) one-pass CAVLC, is proposed, with the non-zero and abs-one SIA flags. A reconfigurable architecture for the SLA module is also proposed to support the low-power CAVLC scheme efficiently. The resultant hardware power is reduced by 69% to only 3.7 mW at 27 MHz and 1.8 V for CIF-sized video coding. The total logic gate count is 27 K gates.
international conference on acoustics, speech, and signal processing | 2007
Chuan-Yung Tsai; Chen-Han Chung; Yu-Han Chen; Tung-Chien Chen; Liang-Gee Chen
Low-power motion estimation (ME) for H.264/AVC is an important research issue because of the growing number of mobile applications of the H.264/AVC encoder. In this paper, a low-power cache algorithm and architecture for fast ME of H.264/AVC are proposed to replace the conventional search range (SR) memory. With the block translation (BT) cache architecture, the search trajectory prediction (STP) prefetching algorithm, and the ultra-low-power cache miss hiding (CMH) strategy, SR memory writing power is reduced by 35% and SR memory static power by 67% for D1 videos. Combining fast ME with the proposed cache provides a total solution for low-power ME hardware.