Huang-Chih Kuo
National Tsing Hua University
Publication
Featured research published by Huang-Chih Kuo.
IEEE Transactions on Very Large Scale Integration Systems | 2011
Huang-Chih Kuo; Li-Cian Wu; Hao-Ting Huang; Sheng-Tsung Hsu; Youn-Long Lin
H.264/AVC intra-frame encoding contains several computation-intensive coding tools that form a long data dependency loop that is difficult to speed up. In this paper, we present a low-power and high-performance H.264/AVC intra-frame encoder. We propose several novel approaches to alleviate the performance bottleneck caused by the long data dependency loop among 4 × 4 luma blocks, integrate an efficient CABAC entropy encoder, and apply a clock-gating technique to reduce power consumption. Synthesized with a TSMC 0.13 μm CMOS cell library, our design requires 265.3 K gates at 114 MHz and consumes 23.56 mW to encode 1080p HD (1920 × 1088) video sequences at 30 frames per second (fps). It also delivers the same video quality as the H.264/AVC reference software. Compared with all state-of-the-art designs, our design has a lower working frequency and achieves both better bit-rate saving and lower power consumption.
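The long dependency loop arises because, in H.264/AVC intra prediction, each 4 × 4 luma block is predicted from already reconstructed pixels of its left, top, top-left, and top-right neighbours, so the 16 blocks of a macroblock cannot all be encoded in parallel. The minimal Python sketch below only enumerates those standard neighbour dependencies for illustration; it does not reproduce the scheduling or pipelining techniques proposed in the paper.

```python
# Illustrative sketch: for each 4x4 luma block of a 16x16 macroblock, list the
# in-macroblock neighbour blocks whose reconstructed pixels its intra
# prediction may need (left, top, top-left, top-right). These dependencies are
# what serialize the intra-frame encoding loop the paper attacks.

def intra4x4_dependencies():
    deps = {}
    for by in range(4):              # block row inside the macroblock
        for bx in range(4):          # block column inside the macroblock
            needed = []
            if bx > 0:
                needed.append((by, bx - 1))        # left neighbour
            if by > 0:
                needed.append((by - 1, bx))        # top neighbour
            if by > 0 and bx > 0:
                needed.append((by - 1, bx - 1))    # top-left neighbour
            if by > 0 and bx < 3:
                needed.append((by - 1, bx + 1))    # top-right neighbour
            deps[(by, bx)] = needed
    return deps

if __name__ == "__main__":
    for block, needed in sorted(intra4x4_dependencies().items()):
        print(block, "depends on", needed)
```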
IEEE Transactions on Multimedia | 2012
Huang-Chih Kuo; Youn-Long Lin
We propose a simple and effective lossless compression algorithm for video display frames. It combines dictionary coding, Huffman coding, and three proposed innovative schemes to achieve a high compression ratio. We quantitatively analyze the characteristics of display frame data for designing the algorithm. We first propose a two-stage classification scheme to classify all pixels into three categories. Then we employ dictionary coding and propose an adaptive prefix bit truncation scheme to generate codewords for video pixels in each category. We subsequently employ the Huffman coding scheme to assign bit values to the codewords. Finally, we propose a head code compression scheme to further reduce the size of the codeword bits. Experimental results show that the proposed algorithm achieves 22% more data reduction than prior art.
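As a rough illustration of the Huffman stage only (the two-stage pixel classification, adaptive prefix bit truncation, and head code compression schemes proposed in the paper are not reproduced here), the sketch below builds a Huffman table over symbol frequencies; all helper names are ours, not the paper's.

```python
import heapq
from collections import Counter

# Generic Huffman-coding sketch: assign shorter bit strings to more frequent
# codewords, which is the final variable-length coding stage the abstract
# describes. Not the paper's full pipeline.

def huffman_code(symbols):
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-breaker, {symbol: bitstring-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate input: one symbol
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f0, _, t0 = heapq.heappop(heap)
        f1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]

if __name__ == "__main__":
    pixels = [0, 0, 0, 1, 1, 2, 255, 0, 1, 0]
    print(huffman_code(pixels))
```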
International Conference on Multimedia and Expo | 2006
Chao-Yang Kao; Huang-Chih Kuo; Youn-Long Lin
We propose a high-performance architecture for fractional motion estimation and Lagrange mode decision in H.264/AVC. Instead of time-consuming fractional-pixel interpolation and secondary search, our fractional motion estimator employs a mathematical model to estimate SADs at quarter-pixel positions. Both computation time and memory access requirements are greatly reduced without significant quality degradation. We also propose a novel cost function for mode decision that leads to much better performance than the traditional low-complexity method. Synthesized into a TSMC 0.13 μm CMOS technology, our design takes 56 K gates at 100 MHz and is sufficient to process QUXGA (3200 × 2400) video sequences at 30 frames per second (fps). Compared with a state-of-the-art design operating at the same frequency, ours is 30% smaller and has 18 times more throughput at the expense of only 0.05 dB of PSNR difference.
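The mode decision mentioned above is conventionally formulated as a Lagrangian rate-distortion cost J = D + λ·R; the abstract does not give the paper's exact cost function, so the sketch below only illustrates this standard formulation with SAD as the distortion term (the candidate modes and rates are made-up numbers).

```python
# Illustrative Lagrangian mode decision: choose the mode minimizing
# J = distortion + lambda * rate, using SAD as the distortion measure.
# This is the textbook formulation, not the paper's proposed cost function.

def sad(block, prediction):
    return sum(abs(a - b) for a, b in zip(block, prediction))

def best_mode(block, candidates, lam):
    """candidates: iterable of (mode_name, predicted_block, estimated_rate_bits)."""
    best = None
    for mode, pred, rate in candidates:
        cost = sad(block, pred) + lam * rate
        if best is None or cost < best[1]:
            best = (mode, cost)
    return best

if __name__ == "__main__":
    block = [10, 12, 11, 9]                      # toy 1-D "block"
    candidates = [
        ("SKIP",  [10, 10, 10, 10], 1),
        ("16x16", [10, 12, 11, 10], 8),
        ("8x8",   [10, 12, 11, 9], 20),
    ]
    print(best_mode(block, candidates, lam=2.0))
```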
Asia Pacific Conference on Circuits and Systems | 2006
Yu-Chien Kao; Huang-Chih Kuo; Yin-Tzu Lin; Chia-Wen Hou; Yi-Hsien Li; Hao-Tin Huang; Youn-Long Lin
The authors propose a high-performance hardware accelerator for intra prediction and mode decision in H.264/AVC video encoding. They use two intra prediction units to increase the performance. Taking advantage of function similarity and data reuse, the authors successfully reduce the hardware cost of the intra prediction units. Based on a modified mode decision algorithm, the design can deliver almost the same video quality as the reference software. The authors implemented the proposed architecture in Verilog and synthesized it targeting a TSMC 0.13 μm CMOS cell library. Running at 75 MHz, the 36 K-gate circuit is capable of real-time encoding of 720p HD (1280 × 720) video sequences at 30 frames per second (fps).
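For reference, the sketch below shows three of the standard H.264/AVC intra 4 × 4 prediction modes (vertical, horizontal, DC) and a simple SAD-based choice among them; the hardware sharing between similar prediction functions that the paper exploits is not modelled, and the sample pixel values are arbitrary.

```python
# Three standard H.264 intra 4x4 prediction modes and a SAD-based decision.
# 'top' and 'left' are reconstructed neighbouring pixels; blocks are stored as
# flat 16-element lists in raster order.

def predict_vertical(top):
    return [top[x] for _ in range(4) for x in range(4)]

def predict_horizontal(left):
    return [left[y] for y in range(4) for _ in range(4)]

def predict_dc(top, left):
    dc = (sum(top) + sum(left) + 4) // 8        # rounded mean of 8 neighbours
    return [dc] * 16

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def choose_mode(block, top, left):
    candidates = {
        "vertical":   predict_vertical(top),
        "horizontal": predict_horizontal(left),
        "dc":         predict_dc(top, left),
    }
    return min(candidates.items(), key=lambda kv: sad(block, kv[1]))[0]

if __name__ == "__main__":
    top = [100, 100, 100, 100]        # pixels above the current block
    left = [90, 92, 94, 96]           # pixels to the left of the current block
    block = [99, 100, 101, 100] * 4   # current 4x4 block, raster order
    print(choose_mode(block, top, left))
```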
Embedded Systems for Real-Time Multimedia | 2009
Hui-Ting Yang; Jian-Wen Chen; Huang-Chih Kuo; Youn-Long Lin
For all video applications, large amounts of data must be processed within a bounded time. These data are usually stored in a low-cost but slow external DRAM, which results in a high memory bandwidth requirement. Memory bandwidth can dominate system performance, especially for applications running on embedded systems. In this paper, we propose an effective dictionary-based compression and decompression algorithm for display frames in a video decoding system and present its hardware implementation. We have integrated the proposed design into an H.264/AVC video decoder. Simulation results show that the proposed algorithm achieves a 54% compression ratio and a 34% reduction in memory traffic when decoding 1080p HD video. It is much more effective than previous works.
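A quick back-of-the-envelope calculation shows why display-frame traffic matters at this resolution. The sketch below computes frame-store traffic for 4:2:0 frames and applies the quoted 34% traffic reduction; the assumption of one write plus one read per decoded frame is ours, purely for illustration.

```python
# Rough frame-store traffic estimate for 4:2:0 video (12 bits per pixel) and
# the effect of a given memory-traffic reduction. Illustrative numbers only;
# the access pattern assumed here is not taken from the paper.

def frame_bytes(width, height, bits_per_pixel=12):
    return width * height * bits_per_pixel // 8

def traffic_mb_per_s(width, height, fps, accesses_per_frame, reduction=0.0):
    bytes_per_s = frame_bytes(width, height) * accesses_per_frame * fps
    return bytes_per_s * (1.0 - reduction) / 1e6

if __name__ == "__main__":
    # Assumed: one write of the decoded frame plus one read for output/reference.
    base = traffic_mb_per_s(1920, 1088, 30, accesses_per_frame=2)
    cut = traffic_mb_per_s(1920, 1088, 30, accesses_per_frame=2, reduction=0.34)
    print(f"baseline {base:.1f} MB/s -> with 34% reduction {cut:.1f} MB/s")
```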
Embedded Systems for Real-Time Multimedia | 2009
Huang-Chih Kuo; Jian-Wen Chen; Youn-Long Lin
We present a high-performance and low-power pure-hardware accelerator for decoding H.264/AVC video. We propose novel VLSI architectures for every stage of the decoding pipeline. We wrap the decoder core with an AMBA bus interface, integrate it into a multimedia SoC platform, and verify it with FPGA prototyping. In order to reduce external memory traffic, we propose a memory fetch unit to increase the length of burst accesses. Running at 16 MHz, our FPGA decoder prototype can decode D1 video (720 × 480) at 30 fps in real time. We also propose several techniques to reduce both average and peak power consumption. Simulation results show that our design consumes only 21.2 mW of average power. The proposed H.264/AVC video decoder is suitable for embedded multimedia systems for mobile applications.
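The idea behind the memory fetch unit is to turn many short DRAM requests into fewer, longer bursts. The sketch below only illustrates that coalescing idea on a list of word addresses; it is not the paper's fetch-unit design, and the burst-length limit is an arbitrary choice.

```python
# Illustrative burst coalescing: merge consecutive word addresses into longer
# bursts so fewer DRAM transactions (each with its own command overhead) are
# issued.

def coalesce_into_bursts(addresses, max_burst_len=16):
    bursts = []                               # each burst: (start_address, length)
    for addr in sorted(set(addresses)):
        if bursts:
            start, length = bursts[-1]
            if addr == start + length and length < max_burst_len:
                bursts[-1] = (start, length + 1)
                continue
        bursts.append((addr, 1))
    return bursts

if __name__ == "__main__":
    requests = [0x100, 0x101, 0x102, 0x200, 0x103, 0x201, 0x300]
    print([(hex(s), n) for s, n in coalesce_into_bursts(requests)])
```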
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen
Motion estimation in H.264/AVC supports quarter-pixel precision and is usually carried out in two phases: integer motion estimation (IME) and fractional motion estimation (FME). We have discussed IME in Chap. 3. After IME finds an integer motion vector (IMV) for each of the 41 subblocks, FME performs a motion search around the refinement center pointed to by the IMV and further refines the 41 IMVs into fractional MVs (FMVs) of quarter-pixel precision. FME interpolates half-pixels using a six-tap filter and then quarter-pixels using a two-tap one. Nine positions are searched in both half-pixel refinement (one integer-pixel search center pointed to by the IMV and 8 half-pixel positions) and then quarter-pixel refinement (1 half-pixel position and 8 quarter-pixel positions). The position with the minimum residual error is chosen as the best match. FME can significantly improve video quality (+0.3 to +0.5 dB) and reduce bit-rate (20–37%) according to our experimental results. However, our profiling report shows that FME consumes more than 40% of the total encoding time. Therefore, an efficient hardware accelerator for FME is indispensable.
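For concreteness, the half-pixel and quarter-pixel interpolation referred to above follows the H.264/AVC standard: half-pixel samples come from the 6-tap filter (1, −5, 20, 20, −5, 1) with rounding and clipping, and quarter-pixel samples are bilinear averages of neighbouring integer and half-pixel samples. The one-dimensional sketch below shows just that arithmetic, not the chapter's accelerator architecture.

```python
# Standard H.264 luma sub-pixel interpolation arithmetic (1-D illustration).

def clip8(x):
    return max(0, min(255, x))

def half_pel(a, b, c, d, e, f):
    # 6-tap filter over six consecutive integer pixels; the half-pel sample
    # lies between c and d.
    return clip8((a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5)

def quarter_pel(p, q):
    # bilinear (2-tap) average of two neighbouring integer/half-pel samples
    return (p + q + 1) >> 1

if __name__ == "__main__":
    row = [10, 20, 40, 80, 120, 160]            # six consecutive integer pixels
    h = half_pel(*row)                          # half-pel between row[2] and row[3]
    print("half-pel:", h, "quarter-pel:", quarter_pel(row[2], h))
```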
IPSJ Transactions on System LSI Design Methodology | 2013
Huang-Chih Kuo; Youn-Long Lin
Intra-frame encoding is useful for many video applications such as security surveillance, digital cinema, and video conferencing because it supports random access to every video frame for easy editing and has low computational complexity that results in low hardware cost. H.264/AVC, which is the most popular video coding standard today, also defines novel intra-coding tools to achieve high compression performance at the expense of significantly increased computational complexity. We present a VLSI design for an H.264/AVC intra-frame encoder. The paper summarizes several novel approaches to alleviate the performance bottleneck caused by the long data dependency loop among 4 × 4 luma blocks, integrate a high-performance hardwired CABAC entropy encoder, and apply a clock-gating technique to reduce power consumption. Synthesized with a TSMC 130 nm CMOS cell library, our design requires 194.1 K gates at 108 MHz and consumes 19.8 mW to encode 1080p (1920 × 1088) video sequences at 30 frames per second (fps). It also delivers the same video quality as the H.264/AVC reference software. We suggest a figure of merit called Design Efficiency for fair comparison of different works. Experimental results show that the proposed design is more efficient than prior art.
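As simple arithmetic on the figures quoted above (this is not the paper's Design Efficiency metric, whose exact definition the abstract does not give), one can derive the energy spent per frame and the cycle budget per pixel at 108 MHz and 30 fps:

```python
# Arithmetic on the abstract's numbers only; "Design Efficiency" as defined in
# the paper is not reproduced here.

def energy_per_frame_mj(power_mw, fps):
    return power_mw / fps                        # mW / (frames/s) = mJ/frame

def cycles_per_pixel(clock_hz, fps, width, height):
    return (clock_hz / fps) / (width * height)

if __name__ == "__main__":
    print(f"energy per frame: {energy_per_frame_mj(19.8, 30):.2f} mJ")
    print(f"cycles per pixel: {cycles_per_pixel(108e6, 30, 1920, 1088):.2f}")
```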
International Conference on Multimedia and Expo | 2011
Huang-Chih Kuo; Youn-Long Lin
We propose a simple and effective lossless compression algorithm for video display frames. It combines a dictionary-based compression algorithm and the Huffman coding method to achieve a high compression ratio. We quantitatively analyze the characteristics of display frame data and propose the algorithm accordingly. We first use a dictionary-based algorithm and an adaptive quotient bit truncation method to generate codewords for all video pixels. Then, we employ the Huffman coding scheme to assign bit values to the codewords. Finally, we apply a simple algorithm to further reduce the size of the codeword bits. Compared with previous works, the proposed algorithm achieves at least 13% improvement in data reduction ratio.
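The abstract does not specify the adaptive quotient bit truncation method; by analogy only, the sketch below shows a Golomb-Rice style quotient/remainder split, which is one common way of truncating the high-order bits of small values into a short unary prefix. It should not be read as the paper's scheme.

```python
# Golomb-Rice style encoding of a non-negative integer: unary-coded quotient
# followed by a k-bit remainder. Shown purely as an analogy for quotient bit
# truncation; the paper's adaptive method is not reproduced.

def rice_encode(value, k):
    quotient, remainder = value >> k, value & ((1 << k) - 1)
    return "1" * quotient + "0" + format(remainder, f"0{k}b")

if __name__ == "__main__":
    for v in (0, 3, 9, 17):
        print(v, "->", rice_encode(v, k=2))
```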
Archive | 2010
Youn-Long Lin; Chao-Yang Kao; Huang-Chih Kuo; Jian-Wen Chen
Interframe prediction in H.264/AVC is carried out in three phases: integer motion estimation (IME), fractional motion estimation, and motion compensation. We will discuss these functions in this chapter and in Chaps. 4 and 5, respectively. Because motion estimation in H.264/AVC supports variable block sizes and multiple reference frames, high computational complexity and huge data traffic become the main difficulties in VLSI implementation. Moreover, high-resolution video applications, such as HDTV, make these problems more critical. Therefore, current VLSI designs usually adopt parallel architectures to increase the total throughput and cope with the high computational complexity. On the other hand, many data-reuse schemes try to increase the data-reuse ratio and, hence, reduce the required data traffic. In this chapter, we will introduce several key points of VLSI implementation for IME.
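For orientation, the sketch below shows the basic full-search integer motion estimation that such accelerators parallelize: exhaustively evaluate SAD over a search window and keep the best motion vector. The parallel PE arrays and data-reuse schemes the chapter covers are not modelled; the frame contents here are synthetic.

```python
# Minimal full-search IME sketch: find the (dx, dy) inside a square search
# range that minimizes the SAD between the current block and the reference.

def sad_block(cur, ref, cx, cy, rx, ry, size):
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total

def full_search(cur, ref, cx, cy, size, search_range):
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad_block(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best                                   # (min SAD, motion vector)

if __name__ == "__main__":
    def pattern(x, y):
        return (37 * x + 17 * y) % 256
    ref = [[pattern(x, y) for x in range(32)] for y in range(32)]
    # Synthetic current frame whose content is the reference shifted by (2, 1).
    cur = [[pattern(x + 2, y + 1) for x in range(32)] for y in range(32)]
    print(full_search(cur, ref, cx=8, cy=8, size=16, search_range=4))
```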