Chao-Yang Kao
National Tsing Hua University
Publications
Featured research published by Chao-Yang Kao.
IEEE Transactions on Very Large Scale Integration Systems | 2010
Chao-Yang Kao; Cheng-Long Wu; Youn-Long Lin
Multiple-reference-frame, quarter-pixel-accuracy, variable-block-size motion estimation (VBSME) employed in H.264/AVC is one of the major contributors to its outstanding compression efficiency and video quality. However, due to its high computational complexity, VBSME needs acceleration for real-time applications. We propose a high-throughput hardware architecture for H.264/AVC fractional motion estimation (FME). The proposed architecture consists of three parallel processing engines. In addition, we propose a resource-sharing method that leads to a 50% hardware saving in the computation of the sum of absolute transformed differences (SATD). Synthesized into a TSMC 130-nm CMOS cell library, our design takes 311.7K gates at 154 MHz and can encode 1080p HD video at 30 frames per second (fps). Compared to previous works, the proposed design runs at a much lower frequency for the same resolution and frame rate.
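For readers unfamiliar with the SATD metric mentioned above, the following is a minimal C sketch of the generic 4×4 SATD computation: the residual between the current block and a candidate prediction is passed through a 4×4 Hadamard transform and the absolute coefficients are summed. Function and parameter names are illustrative only; this shows the metric itself, not the paper's shared hardware datapath, and normalization conventions (which vary between encoders) are omitted.

```c
/* Minimal sketch of the 4x4 SATD metric (sum of absolute transformed
 * differences). Names are illustrative. */
#include <stdlib.h>

static int satd_4x4(const unsigned char *cur, const unsigned char *ref,
                    int stride)
{
    int diff[16], m[16], d[16];
    int sum = 0;

    /* residual between current block and candidate prediction */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            diff[i * 4 + j] = cur[i * stride + j] - ref[i * stride + j];

    /* horizontal 4-point Hadamard butterfly (output order is a row
     * permutation of the canonical transform, which does not change the
     * absolute sum) */
    for (int i = 0; i < 4; i++) {
        int s01 = diff[i * 4 + 0] + diff[i * 4 + 1];
        int s23 = diff[i * 4 + 2] + diff[i * 4 + 3];
        int d01 = diff[i * 4 + 0] - diff[i * 4 + 1];
        int d23 = diff[i * 4 + 2] - diff[i * 4 + 3];
        m[i * 4 + 0] = s01 + s23;
        m[i * 4 + 1] = s01 - s23;
        m[i * 4 + 2] = d01 + d23;
        m[i * 4 + 3] = d01 - d23;
    }

    /* vertical 4-point Hadamard butterfly */
    for (int j = 0; j < 4; j++) {
        int s01 = m[0 * 4 + j] + m[1 * 4 + j];
        int s23 = m[2 * 4 + j] + m[3 * 4 + j];
        int d01 = m[0 * 4 + j] - m[1 * 4 + j];
        int d23 = m[2 * 4 + j] - m[3 * 4 + j];
        d[0 * 4 + j] = s01 + s23;
        d[1 * 4 + j] = s01 - s23;
        d[2 * 4 + j] = d01 + d23;
        d[3 * 4 + j] = d01 - d23;
    }

    /* sum of absolute transform coefficients */
    for (int k = 0; k < 16; k++)
        sum += abs(d[k]);

    return sum;
}
```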
IEEE Transactions on Very Large Scale Integration Systems | 2010
Chao-Yang Kao; Youn-Long Lin
Variable-block-size motion estimation (VBSME) is one of several contributors to H.264/AVC's excellent coding efficiency. However, its high computational complexity and huge memory traffic make its design difficult. In this paper, we propose a memory-efficient and highly parallel VLSI architecture for full-search VBSME (FSVBSME). Our architecture consists of sixteen 2-D arrays, each consisting of 16 × 16 processing elements (PEs). Four arrays form a group to match four reference blocks against one current block in parallel. Four groups perform block matching for four current blocks in a pipelined fashion. Taking advantage of the overlap among multiple reference blocks of a current block and between search windows of adjacent current blocks, we propose a novel data-reuse scheme to reduce memory access. Compared with the popular Level C data-reuse scheme, our approach can save 98% of on-chip memory accesses with only 25% local memory overhead. Synthesized into a TSMC 180-nm CMOS cell library, our design is capable of processing 1920 × 1088, 30 fps video when running at 130 MHz. The architecture is scalable for wider search ranges, multiple reference frames, pixel truncation, and downsampling. We suggest a criterion called design efficiency for comparing different works; it shows that the proposed design is 72% more efficient than the best design to date.
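To illustrate the reuse opportunity that such a PE array exploits, here is a small C sketch of the standard SAD-merging idea behind full-search VBSME: the sixteen 4×4 SADs of a macroblock are computed once per search position, and the SAD of every larger partition is obtained by adding 4×4 results instead of re-reading pixels. The struct layout and names are illustrative (the 8×4 and 4×8 merges are omitted for brevity), and this shows the generic principle rather than the paper's exact PE-array organization.

```c
/* Sketch of SAD merging for variable block sizes: compute 4x4 SADs once,
 * then derive larger-partition SADs by addition only. */
#include <stdlib.h>

typedef struct {
    int sad4x4[4][4];   /* [row][col] of 4x4 sub-blocks in the 16x16 MB */
    int sad8x8[2][2];
    int sad16x8[2];
    int sad8x16[2];
    int sad16x16;
} vbs_sad_t;            /* illustrative container, not the paper's datapath */

static void vbs_sad(const unsigned char *cur, const unsigned char *ref,
                    int stride, vbs_sad_t *out)
{
    /* 4x4 SADs: the only place where pixels are read */
    for (int by = 0; by < 4; by++)
        for (int bx = 0; bx < 4; bx++) {
            int s = 0;
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++) {
                    int p = (by * 4 + y) * stride + bx * 4 + x;
                    s += abs(cur[p] - ref[p]);
                }
            out->sad4x4[by][bx] = s;
        }

    /* larger partitions are sums of already-computed results */
    for (int by = 0; by < 2; by++)
        for (int bx = 0; bx < 2; bx++)
            out->sad8x8[by][bx] =
                out->sad4x4[2 * by][2 * bx]     + out->sad4x4[2 * by][2 * bx + 1] +
                out->sad4x4[2 * by + 1][2 * bx] + out->sad4x4[2 * by + 1][2 * bx + 1];

    out->sad16x8[0] = out->sad8x8[0][0] + out->sad8x8[0][1];   /* top half    */
    out->sad16x8[1] = out->sad8x8[1][0] + out->sad8x8[1][1];   /* bottom half */
    out->sad8x16[0] = out->sad8x8[0][0] + out->sad8x8[1][0];   /* left half   */
    out->sad8x16[1] = out->sad8x8[0][1] + out->sad8x8[1][1];   /* right half  */
    out->sad16x16   = out->sad16x8[0]   + out->sad16x8[1];
}
```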
Asia and South Pacific Design Automation Conference | 2006
Jian-Wen Chen; Chao-Yang Kao; Youn-Long Lin
We give a tutorial on video coding principles and standards with emphasis on the latest technology, called H.264 or MPEG-4 Part 10. We describe a basic method, called block-based hybrid coding, employed by most video coding standards. We use graphical illustrations to show the functionality. This paper is suitable for those who are interested in implementing a video codec in embedded software, in pure hardware, or in a combination of both.
International Conference on Multimedia and Expo | 2006
Chao-Yang Kao; Huang-Chih Kuo; Youn-Long Lin
We propose a high-performance architecture for fractional motion estimation and Lagrange mode decision in H.264/AVC. Instead of time-consuming fractional-pixel interpolation and secondary search, our fractional motion estimator employs a mathematical model to estimate SADs at quarter-pixel positions. Both computation time and memory access requirements are greatly reduced without significant quality degradation. We propose a novel cost function for mode decision that leads to much better performance than the traditional low-complexity method. Synthesized into a TSMC 0.13 μm CMOS technology, our design takes 56K gates at 100 MHz and is sufficient to process QUXGA (3200 × 2400) video sequences at 30 frames per second (fps). Compared with a state-of-the-art design operating at the same frequency, ours is 30% smaller and has 18 times more throughput at the expense of only a 0.05 dB difference in PSNR.
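As an illustration of model-based fractional refinement, the sketch below fits a one-dimensional parabola through the integer-pixel SADs along one axis and snaps the predicted minimum to the quarter-pixel grid. This is a commonly used error-surface approximation shown for intuition only; it is not claimed to be the exact mathematical model used in the paper.

```c
/* Illustrative parabolic error-surface model for sub-pixel refinement.
 * sad_m1, sad_0, sad_p1 are the SADs at offsets -1, 0, +1 integer pixels
 * along one axis around the best integer-pixel position. */
#include <math.h>

static double quarter_pel_offset(int sad_m1, int sad_0, int sad_p1)
{
    int denom = sad_m1 - 2 * sad_0 + sad_p1;   /* curvature of the parabola */
    double off;

    if (denom <= 0)                 /* flat or non-convex surface: keep the integer MV */
        return 0.0;

    /* vertex of the parabola through (-1, sad_m1), (0, sad_0), (+1, sad_p1) */
    off = 0.5 * (double)(sad_m1 - sad_p1) / (double)denom;

    if (off > 0.75)  off = 0.75;    /* stay within the quarter-pel refinement range */
    if (off < -0.75) off = -0.75;

    return round(off * 4.0) / 4.0;  /* snap to the quarter-pixel grid */
}
```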
International Conference on Multimedia and Expo | 2008
Cheng-Long Wu; Chao-Yang Kao; Youn-Long Lin
Variable-block-size motion estimation (VBSME) is one of the contributors to H.264/Advanced Video Coding (AVC)'s excellent coding efficiency. Due to its high computational complexity, however, VBSME needs acceleration for real-time high-resolution applications. We propose a high-performance hardware architecture for H.264/AVC fractional motion estimation. Our architecture consists of three parallel processing engines: one for 4 × 4 and 8 × 8 blocks, one for 8 × 4 and 4 × 8 blocks, and another for the remaining block types. In addition, we propose a resource-sharing scheme which saves 33% of the hardware cost for the computation of the sum of absolute transformed differences. Synthesized into a Taiwan Semiconductor Manufacturing Company (TSMC) 180-nm CMOS cell library, our 321K-gate design only needs to run at 154 MHz when encoding 1920 × 1088 video at 30 frames per second. Compared with the most comparable previous work, which consumes 311K gates and runs at 200 MHz, our proposed architecture is more efficient.
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen
Motion estimation in H.264/AVC supports quarter-pixel precision and is usually carried out in two phases: integer motion estimation (IME) and fractional motion estimation (FME). We have talked about IME in Chap. 3. After IME finds an integer motion vector (IMV) for each of the 41 subblocks, FME performs motion search around the refinement center pointed to by the IMV and further refines the 41 IMVs into fractional MVs (FMVs) of quarter-pixel precision. FME interpolates half-pixels using a six-tap filter and then quarter-pixels using a two-tap one. Nine positions are searched in both half-pixel refinement (the integer-pixel search center pointed to by the IMV and eight half-pixel positions) and quarter-pixel refinement (one half-pixel position and eight quarter-pixel positions). The position with the minimum residual error is chosen as the best match. FME can significantly improve video quality (+0.3 to +0.5 dB) and reduce bit rate (20–37%) according to our experimental results. However, our profiling report shows that FME consumes more than 40% of the total encoding time. Therefore, an efficient hardware accelerator for FME is indispensable.
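The interpolation step described above follows the H.264/AVC standard filters; a one-dimensional C sketch is given below. Half-pixel samples come from the six-tap filter (1, -5, 20, 20, -5, 1) with rounding and clipping, and quarter-pixel samples are the rounded two-tap average of the two nearest integer/half-pixel neighbours. The function names are illustrative.

```c
/* One-dimensional H.264 luma interpolation sketch. */

static unsigned char clip255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* p points at the integer sample two positions before the half-pel location;
 * six-tap filter (1, -5, 20, 20, -5, 1), normalized by 32 with rounding. */
static unsigned char half_pel(const unsigned char *p)
{
    int v = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    return clip255((v + 16) >> 5);
}

/* Quarter-pel sample: two-tap (bilinear) average of its two nearest
 * integer/half-pel neighbours, with upward rounding. */
static unsigned char quarter_pel(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b + 1) >> 1);
}
```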
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen
Interframe prediction in H.264/AVC is carried out in three phases: integer motion estimation (IME), fractional motion estimation, and motion compensation. We will discuss these functions in this chapter and in Chaps. 4 and 5, respectively. Because motion estimation in H.264/AVC supports variable block sizes and multiple reference frames, high computational complexity and huge data traffic become the main difficulties in VLSI implementation. Moreover, high-resolution video applications, such as HDTV, make these problems more critical. Therefore, current VLSI designs usually adopt parallel architectures to increase total throughput and cope with the high computational complexity. On the other hand, many data-reuse schemes try to increase the data-reuse ratio and, hence, reduce the required data traffic. In this chapter, we will introduce several key points of VLSI implementation for IME.
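To make the complexity concrete, the sketch below shows plain full-search IME for a single 16×16 block in C: every candidate position in a ±SR window is compared against the current block, which is exactly the workload that parallel PE arrays and data-reuse schemes aim to tame. Names such as SR and sad_16x16 are illustrative, and frame-boundary handling is left to the caller.

```c
/* Plain full-search integer motion estimation for one 16x16 block. */
#include <limits.h>
#include <stdlib.h>

#define SR 16   /* illustrative search range: +/-16 integer pixels */

static int sad_16x16(const unsigned char *cur, int cur_stride,
                     const unsigned char *ref, int ref_stride)
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += abs(cur[y * cur_stride + x] - ref[y * ref_stride + x]);
    return sad;
}

/* ref points at the reference-frame pixel co-located with the current block;
 * the caller must guarantee that the +/-SR window stays inside the frame. */
static void full_search_ime(const unsigned char *cur, int cur_stride,
                            const unsigned char *ref, int ref_stride,
                            int *best_mvx, int *best_mvy)
{
    int best = INT_MAX;
    for (int my = -SR; my <= SR; my++)
        for (int mx = -SR; mx <= SR; mx++) {
            int sad = sad_16x16(cur, cur_stride,
                                ref + my * ref_stride + mx, ref_stride);
            if (sad < best) {          /* (2*SR+1)^2 candidates per block */
                best = sad;
                *best_mvx = mx;
                *best_mvy = my;
            }
        }
}
```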
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen
A video signal is represented as a sequence of frames of pixels. There exists a vast amount of redundant information that can be eliminated with video compression technology so that transmission and storage become more efficient. To facilitate interoperability between compression at the video-producing source and decompression at the consumption end, several generations of video coding standards have been defined and adopted. For low-end applications, software solutions are adequate. For high-end applications, dedicated hardware solutions are needed. This chapter gives an overview of the principles behind video coding in general and the advanced features of the H.264/AVC standard in particular. It serves as an introduction to the remaining chapters, each of which covers an important coding tool of an H.264/AVC encoder and its VLSI architectural design.
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen