Publication


Featured research published by Chao-Tsung Huang.


IEEE Transactions on Signal Processing | 2004

Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform

Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen

In this paper, an efficient very large scale integration (VLSI) architecture, called the flipping structure, is proposed for the lifting-based discrete wavelet transform. By flipping conventional lifting structures, it provides a variety of hardware implementations that shorten, and can potentially minimize, both the critical path and the memory requirement of the lifting-based discrete wavelet transform. Precision issues are also analyzed. Case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter demonstrate the efficiency of the proposed flipping structure.
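The flipping idea can be checked numerically. The sketch below is our own floating-point reconstruction, not the paper's hardware: it runs the four lifting steps of the (9,7) wavelet once in conventional form and once with every step "flipped", i.e. the incoming node is scaled by the inverse of the lifting coefficients so that the coefficient leaves the chain of adders, and the accumulated constants are folded into the final scaling. The coefficient values are the commonly quoted CDF 9/7 lifting constants, periodic boundary extension is assumed for brevity, and the final K normalization is omitted because it is identical for both forms. The function names are hypothetical.

```python
import numpy as np

# Commonly quoted CDF 9/7 lifting constants (sign conventions vary between sources)
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971

def lifting_97_conventional(x):
    """Four conventional lifting steps, periodic extension, final K scaling omitted."""
    x = np.asarray(x, dtype=float)
    e, o = x[0::2], x[1::2]
    d1 = o + ALPHA * (e + np.roll(e, -1))      # uses e[n] + e[n+1]
    s1 = e + BETA * (np.roll(d1, 1) + d1)      # uses d1[n-1] + d1[n]
    d2 = d1 + GAMMA * (s1 + np.roll(s1, -1))
    s2 = s1 + DELTA * (np.roll(d2, 1) + d2)
    return s2, d2

def lifting_97_flipped(x):
    """Same transform with each step flipped: only the incoming node is scaled,
    so the additions form a chain without coefficient multipliers in it."""
    x = np.asarray(x, dtype=float)
    e, o = x[0::2], x[1::2]
    d1f = o / ALPHA + (e + np.roll(e, -1))                   # d1 = ALPHA * d1f
    s1f = e / (ALPHA * BETA) + (np.roll(d1f, 1) + d1f)       # s1 = ALPHA*BETA * s1f
    d2f = d1f / (BETA * GAMMA) + (s1f + np.roll(s1f, -1))    # d2 = ALPHA*BETA*GAMMA * d2f
    s2f = s1f / (GAMMA * DELTA) + (np.roll(d2f, 1) + d2f)    # s2 = ALPHA*BETA*GAMMA*DELTA * s2f
    # Fold the accumulated constants into the (already required) output scaling stage
    return ALPHA * BETA * GAMMA * DELTA * s2f, ALPHA * BETA * GAMMA * d2f
```

In hardware the divisions would be constant multipliers by precomputed reciprocals on the side branches, which is what takes them off the adder chain that dominates the critical path.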


IEEE Transactions on Circuits and Systems for Video Technology | 2006

Level C+ data reuse scheme for motion estimation with corresponding coding orders

Ching-Yeh Chen; Chao-Tsung Huang; Yi-Hau Chen; Liang-Gee Chen

Reducing memory bandwidth for motion estimation is important because of the power consumption and limited memory bandwidth of video coding systems. In this paper, we propose a Level C+ scheme that fully reuses the overlapped search region in the horizontal direction and partially reuses it in the vertical direction, saving more memory bandwidth than the Level C scheme. However, a direct implementation of the Level C+ scheme may conflict with some important coding tools and thereby lower the hardware efficiency of video coding systems. We therefore propose an n-stitched zigzag scan for the Level C+ scheme and discuss two types of 2-stitched zigzag scan, for MPEG-4 and H.264, as examples. These scans reduce memory bandwidth and resolve the conflicts. For HDTV 720p with a search range of [-128,128), the required memory bandwidth is only 54% of that of the traditional Level C data reuse scheme, while the on-chip memory size increases by only 12%.
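To see why stitching vertically adjacent blocks saves bandwidth, here is a deliberately simplified, hypothetical model: N × N blocks, a vertical search range [-P, P), and full reuse of the horizontal overlap, so each step loads one new reference strip N pixels wide and (nN + 2P - 1) pixels tall for n stitched blocks (n = 1 models Level C). The model and the function name are our assumptions, not the paper's accounting, so its numbers need not match the reported 54% and 12% exactly.

```python
from fractions import Fraction

def redundancy_factor(block, search_range_v, stitch=1):
    """Reference pixels loaded per current-frame pixel in a simplified model.

    Hypothetical model: the vertical search range is [-P, P) with P = search_range_v;
    each step loads a strip `block` wide and (stitch*block + 2P - 1) tall and
    serves `stitch` vertically adjacent blocks (stitch = 1 ~ Level C).
    """
    strip_height = stitch * block + 2 * search_range_v - 1
    return Fraction(strip_height, stitch * block)

level_c = redundancy_factor(16, 128, stitch=1)        # 271/16
level_c_plus = redundancy_factor(16, 128, stitch=2)   # 287/32
print(float(level_c_plus / level_c))                  # about 0.53
```

The vertical overlap term 2P - 1 is paid once per strip instead of once per block, so with a large search range the traffic drops roughly in proportion to 1/n.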


IEEE Transactions on Circuits and Systems for Video Technology | 2005

Generic RAM-based architectures for two-dimensional discrete wavelet transform with line-based method

Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen

In this paper, three generic RAM-based architectures are proposed that use the line-based method to efficiently construct the corresponding two-dimensional architectures for any given hardware architecture of one-dimensional (1-D) wavelet filters, including conventional convolution-based and lifting-based architectures. An exhaustive system-level analysis of two-dimensional discrete wavelet transform architectures is also given. The first proposed architecture targets 1-level decomposition and is presented by introducing the categories of internal line buffers, a strategy for optimizing the line buffer size, and a method for integrating any 1-D wavelet filter. The other two proposed architectures target multi-level decomposition: one applies the recursive pyramid algorithm directly to the proposed 1-level architecture, and the other combines the two previously proposed architectures to increase hardware utilization. According to the comparison results, the proposed architecture outperforms previous architectures in line buffer size, hardware cost, hardware utilization, and flexibility.
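The line-based idea (consume the image row by row and keep only a few lines on chip instead of a whole frame) can be sketched in software. In this sketch, which is our own and not the paper's optimized buffer design, the horizontal 1-D transform is a pluggable function, loosely mirroring the "any given 1-D filter" goal; the vertical pass uses the reversible integer 5/3 lifting filter known from JPEG 2000 with symmetric extension. All names are hypothetical, and an even number of rows is assumed. A full-frame reference is included so the streaming result can be compared against it.

```python
import numpy as np

def dwt53_1d(x):
    """Reversible integer 5/3 lifting on an even-length signal, symmetric extension."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])     # x[2n+2], mirrored at the right edge
    d = odd - (even + even_next) // 2             # predict step (high-pass)
    d_prev = np.insert(d[:-1], 0, d[0])           # d[n-1], mirrored at the left edge
    s = even + (d_prev + d + 2) // 4              # update step (low-pass)
    return s, d

def dwt53_2d_full_frame(img, row_transform=dwt53_1d):
    """Reference: transform all rows of the frame, then all columns."""
    rows = np.array([np.concatenate(row_transform(r)) for r in img])
    cols = [dwt53_1d(rows[:, c]) for c in range(rows.shape[1])]
    return (np.stack([s for s, _ in cols], axis=1),
            np.stack([d for _, d in cols], axis=1))

def dwt53_2d_line_based(img_rows, row_transform=dwt53_1d):
    """Line-based 1-level 2-D DWT: rows arrive one at a time and the line buffer
    holds three row-transformed lines (even row, odd row, previous high-pass row).
    Yields one (low_row, high_row) pair per two input rows."""
    even = odd = d_prev = None
    for i, r in enumerate(img_rows):
        line = np.concatenate(row_transform(r))   # horizontal pass: [L | H]
        if i == 0:
            even = line
        elif i % 2 == 1:
            odd = line
        else:
            d = odd - (even + line) // 2           # vertical predict over rows 2n..2n+2
            dp = d if d_prev is None else d_prev   # mirrored d[-1] = d[0]
            yield even + (dp + d + 2) // 4, d      # vertical update
            even, d_prev = line, d
    # Bottom edge: the last (odd) row sees the mirrored row x[H] = x[H-2]
    d = odd - (even + even) // 2
    dp = d if d_prev is None else d_prev
    yield even + (dp + d + 2) // 4, d
```

Passing a different `row_transform` changes only the horizontal filter; the streaming structure of the vertical pass stays the same, which is the property a generic line-based wrapper relies on.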


IEEE Transactions on Signal Processing | 2005

Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform

Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen

In this paper, a detailed analysis of very large scale integration (VLSI) architectures for the one-dimensional (1-D) and two-dimensional (2-D) discrete wavelet transform (DWT) is presented from several perspectives, and three related architectures are proposed as well. The 1-D DWT and inverse DWT (IDWT) architectures are classified into three categories (convolution-based, lifting-based, and B-spline-based) and are compared in terms of hardware complexity, critical path, and number of registers. For the 2-D DWT, the large volume of frame memory accesses and the die area occupied by the embedded internal buffer become the most critical issues. The 2-D DWT architectures are categorized and analyzed according to their external memory scan methods. Implementation issues of the internal buffer are also discussed, and real-life experiments show that the area and power of the internal buffer depend strongly on memory technology and operating frequency, not only on the required memory size. Besides the analysis, a B-spline-based IDWT architecture and an overlapped stripe-based scan method are proposed. Finally, we propose a flexible and efficient architecture for one-level 2-D DWT that exploits many of the advantages identified in the analysis.


Proceedings of the IEEE | 2005

Advances in Hardware Architectures for Image and Video Coding - A Survey

Po-Chih Tseng; Yung-Chi Chang; Yu-Wen Huang; Hung-Chi Fang; Chao-Tsung Huang; Liang-Gee Chen

This paper provides a survey of state-of-the-art hardware architectures for image and video coding. Fundamental design issues are discussed with particular emphasis on efficient dedicated implementation. Hardware architectures for MPEG-4 video coding and JPEG 2000 still image coding are reviewed as design examples, and special approaches exploited to improve efficiency are identified. Further perspectives are also presented to address the challenges of hardware architecture design for advanced image and video coding in the future.


International Symposium on Circuits and Systems | 2002

Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method

Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen

In this paper, an effective systematic design method is proposed to construct several efficient VLSI architectures for the 1-D and 2-D lifting-based discrete wavelet transform. The method first performs a specific lifting factorization of any finite discrete wavelet transform filter to obtain an optimal algorithm representation for hardware implementation. The optimized algorithm is then mapped onto 1-D systolic architectures through dependence graph formation and systolic array mapping. Based on the 1-D architectures, a general 2-D discrete wavelet transform framework is used to construct the corresponding 2-D architectures. According to the comparison results, the constructed VLSI architectures are more efficient than prior art in terms of arithmetic units and memory storage.
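The starting point of the design method is a lifting factorization of a given filter pair. The paper's own factorization procedure is not reproduced here; as a small illustration, the sketch below uses the well-known factorization of the LeGall 5/3 wavelet into one predict and one update step and checks numerically that it produces the same outputs as the direct 5-tap and 3-tap analysis filters. Periodic extension and the function names are our assumptions.

```python
import numpy as np

def lifting_53(x):
    """Linear LeGall 5/3 analysis via two lifting steps, periodic extension."""
    x = np.asarray(x, dtype=float)
    e, o = x[0::2], x[1::2]
    d = o - 0.5 * (e + np.roll(e, -1))   # predict: d[n] = x[2n+1] - (x[2n] + x[2n+2]) / 2
    s = e + 0.25 * (np.roll(d, 1) + d)   # update:  s[n] = x[2n] + (d[n-1] + d[n]) / 4
    return s, d

def filter_53(x):
    """Same outputs computed directly with the 5-tap low-pass and 3-tap high-pass filters."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lo = [-1/8, 1/4, 3/4, 1/4, -1/8]     # taps on x[2i-2 .. 2i+2]
    hi = [-1/2, 1, -1/2]                 # taps on x[2i .. 2i+2]
    s = np.array([sum(lo[k] * x[(2*i - 2 + k) % n] for k in range(5)) for i in range(n // 2)])
    d = np.array([sum(hi[k] * x[(2*i + k) % n] for k in range(3)) for i in range(n // 2)])
    return s, d
```

The lifting form needs only two coefficient multiplications per output pair (and both coefficients, 1/2 and 1/4, are shifts), which is the kind of saving that makes a lifting factorization attractive as the input to systolic mapping.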


Asia Pacific Conference on Circuits and Systems | 2002

Generic RAM-based architecture for two-dimensional discrete wavelet transform with line-based method

Po-Chih Tseng; Chao-Tsung Huang; Liang-Gee Chen

In this paper, a generic RAM-based architecture is proposed that uses line-based methods to efficiently construct the corresponding two-dimensional architectures for any given hardware architecture of one-dimensional wavelet filters, including conventional convolution-based and advanced lifting-based architectures. The categories of line buffers and the strategy for optimizing the internal memory size are also described. For multi-level two-dimensional discrete wavelet transforms, the recursive pyramid algorithm is adopted to turn the proposed architecture into another efficient architecture. According to the comparison results, the proposed architecture outperforms prior art in memory size, control complexity, and flexibility.
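The reason the recursive pyramid algorithm lets one filter unit serve every decomposition level is that level j produces only 1/2^j as many outputs as there are inputs, so all levels together need fewer than N time slots. The sketch below shows one RPA-style slot allocation in which slot t serves level 1 + (number of trailing 1-bits of t), so level j receives every 2^j-th slot. It only illustrates slot counting: filter latency and data dependencies are ignored, and it is not claimed to be the exact schedule used in the paper. The function name is hypothetical.

```python
def rpa_schedule(num_inputs, levels):
    """Map each time slot of a single 1-D filter unit to a decomposition level.

    Slot t serves level j = 1 + (trailing 1-bits of t); levels deeper than
    `levels` are not needed, so those slots are idle (None).
    """
    schedule = []
    for t in range(num_inputs):
        j = 1
        while (t >> (j - 1)) & 1:   # count trailing 1-bits of t
            j += 1
        schedule.append(j if j <= levels else None)
    return schedule

print(rpa_schedule(16, 3))
```

For N = 16 inputs and 3 levels, levels 1, 2, and 3 receive 8, 4, and 2 slots, leaving 2 idle, so the unit's utilization approaches 1 - 2^-J for J levels.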


International Symposium on Circuits and Systems | 2002

An efficient architecture for two-dimensional inverse discrete wavelet transform

Po-Cheng Wu; Chao-Tsung Huang; Liang-Gee Chen

This paper proposes an efficient architecture for the two-dimensional inverse discrete wavelet transform (2D IDWT). The proposed architecture includes an inverse transform module, a RAM module, and a multiplexer. In the inverse transform module, we apply the coefficient folding technique to the interpolation filters of stage 1 and the polyphase decomposition technique to those of stage 2. The RAM size is N/2 × N/2. The advantages of the proposed architecture are 100% hardware utilization, fast computing time, regular data flow, and low control complexity, making it suitable for next-generation image coding/decoding systems.
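Polyphase decomposition of an interpolation filter rests on a standard identity: upsampling by 2 and then filtering with h gives the same result as filtering the low-rate input separately with the even taps and the odd taps of h and interleaving the two outputs. The sketch below checks that identity for arbitrary taps; it illustrates the general technique only, not the paper's stage-2 datapath, and the function names are ours.

```python
import numpy as np

def interpolate_direct(x, h):
    """Upsample by 2 (zero insertion), then filter: the textbook synthesis form."""
    up = np.zeros(2 * len(x))
    up[0::2] = x
    return np.convolve(up, h)

def interpolate_polyphase(x, h):
    """Equivalent polyphase form: filter the low-rate input with the even and odd
    taps of h separately, then interleave. No multiplications hit inserted zeros."""
    h = np.asarray(h, dtype=float)
    y_even = np.convolve(x, h[0::2])     # outputs at even positions
    y_odd = np.convolve(x, h[1::2])      # outputs at odd positions
    out = np.zeros(2 * len(x) + len(h) - 1)
    out[0::2][:len(y_even)] = y_even     # slices of `out` are views, so this writes into it
    out[1::2][:len(y_odd)] = y_odd
    return out
```

Because each polyphase branch runs at the input rate, the multiplier count per output sample is roughly halved compared with filtering the zero-stuffed signal.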


IEEE Journal of Solid-State Circuits | 2014

A 249-Mpixel/s HEVC Video-Decoder Chip for 4K Ultra-HD Applications

Mehul Tikekar; Chao-Tsung Huang; Chiraag Juvekar; Vivienne Sze; Anantha P. Chandrakasan

High Efficiency Video Coding, the latest video standard, uses larger and variable-sized coding units and longer interpolation filters than H.264/AVC to better exploit redundancy in video signals. These algorithmic techniques enable a 50% decrease in bitrate at the cost of computational complexity, external memory bandwidth, and, for ASIC implementations, on-chip SRAM of the video codec. This paper describes architectural optimizations for an HEVC video decoder chip. The chip uses a two-stage subpipelining scheme to reduce on-chip SRAM by 56 kbytes, a 32% reduction. A high-throughput read-only cache combined with DRAM-latency-aware memory mapping reduces DRAM bandwidth by 67%. The chip is built for HEVC Working Draft 4 Low Complexity configuration and occupies 1.77 mm² in 40-nm CMOS. It performs 4K Ultra HD 30-fps video decoding at 200 MHz while consuming 1.19 nJ/pixel of normalized system power.
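The throughput in the title follows from the frame format. Assuming 4K Ultra HD means 3840 × 2160 and counting each pixel once regardless of chroma format (our assumption), 30 fps gives about 249 Mpixel/s, which at the 200 MHz clock means the pipeline must sustain on average a little more than one pixel per cycle.

```python
# Pixel-rate arithmetic for 4K UHD at 30 fps (3840 x 2160 assumed)
WIDTH, HEIGHT, FPS = 3840, 2160, 30
CLOCK_HZ = 200e6

pixel_rate = WIDTH * HEIGHT * FPS           # pixels per second
pixels_per_cycle = pixel_rate / CLOCK_HZ    # average rate the decoder pipeline must sustain

print(f"{pixel_rate / 1e6:.1f} Mpixel/s, {pixels_per_cycle:.2f} pixels/cycle")
# prints "248.8 Mpixel/s, 1.24 pixels/cycle"
```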


International Solid-State Circuits Conference | 2004

81MS/s JPEG2000 single-chip encoder with rate-distortion optimization

Hung-Chi Fang; Chao-Tsung Huang; Yu-Wei Chang; Tu-Chih Wang; Po-Chih Tseng; Chung-Jr Lian; Liang-Gee Chen

An 81 MS/s JPEG 2000 single-chip encoder is implemented on a 5.5 mm² die in 0.25 µm CMOS technology. The IC can encode HDTV 720p video at 30 frames/s in real time. The rate-distortion-optimized chip supports a tile size of 128 × 128, a code-block size of 64 × 64, and image sizes up to 32K × 32K.

Collaboration



Top Co-Authors

Liang-Gee Chen, National Taiwan University
Po-Chih Tseng, National Taiwan University
Ching-Yeh Chen, National Taiwan University
Chih-Chi Cheng, National Taiwan University
Anantha P. Chandrakasan, Massachusetts Institute of Technology
Mehul Tikekar, Massachusetts Institute of Technology
Chiraag Juvekar, Massachusetts Institute of Technology
Vivienne Sze, Massachusetts Institute of Technology
Yi-Hau Chen, National Taiwan University
Yu-Wei Chang, National Taiwan University