Po-Chih Tseng
National Taiwan University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Po-Chih Tseng.
IEEE Transactions on Signal Processing | 2004
Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen
In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.
IEEE Transactions on Circuits and Systems for Video Technology | 2005
Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen
In this paper, three generic RAM-based architectures are proposed to efficiently construct the corresponding two-dimensional architectures by use of the line-based method for any given hardware architecture of one-dimensional (1-D) wavelet filters, including conventional convolution-based and lifting-based architectures. An exhaustive analysis of two-dimensional architectures for discrete wavelet transform in the system view is also given. The first proposed architecture is for 1-level decomposition, which is presented by introducing the categories of internal line buffers, the strategy of optimizing the line buffer size, and the method of integrating any 1-D wavelet filter. The other two proposed architectures are for multi-level decomposition. One applies the recursive pyramid algorithm directly to the proposed 1-level architecture, and the other one combines the two previously proposed architectures to increase the hardware utilization. According to the comparison results, the proposed architecture outperforms previous architectures in the aspects of line buffer size, hardware cost, hardware utilization, and flexibility.
IEEE Transactions on Signal Processing | 2005
Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen
In this paper, a detailed analysis of very large scale integration (VLSI) architectures for the one-dimensional (1-D) and two-dimensional (2-D) discrete wavelet transform (DWT) is presented in many aspects, and three related architectures are proposed as well. The 1-D DWT and inverse DWT (IDWT) architectures are classified into three categories: convolution-based, lifting-based, and B-spline-based. They are discussed in terms of hardware complexity, critical path, and registers. As for the 2-D DWT, the large amount of the frame memory access and the die area occupied by the embedded internal buffer become the most critical issues. The 2-D DWT architectures are categorized and analyzed by different external memory scan methods. The implementation issues of the internal buffer are also discussed, and some real-life experiments are given to show that the area and power for the internal buffer are highly related to memory technology and working frequency, instead of the required memory size only. Besides the analysis, the B-spline-based IDWT architecture and the overlapped stripe-based scan method are also proposed. Last, we propose a flexible and efficient architecture for a one-level 2-D DWT that exploits many advantages of the presented analysis.
Proceedings of the IEEE | 2005
Po-Chih Tseng; Yung-Chi Chang; Yu-Wen Huang; Hung-Chi Fang; Chao-Tsung Huang; Liang-Gee Chen
This paper provides a survey of state-of-the-art hardware architectures for image and video coding. Fundamental design issues are discussed with particular emphasis on efficient dedicated implementation. Hardware architectures for MPEG-4 video coding and JPEG 2000 still image coding are reviewed as design examples, and special approaches exploited to improve efficiency are identified. Further perspectives are also presented to address the challenges of hardware architecture design for advanced image and video coding in the future.
IEEE Circuits and Systems Magazine | 2007
Chung-Jr Lian; Shao-Yi Chien; Chia-Ping Lin; Po-Chih Tseng; Liang-Gee Chen
© C O M S T O C K Feature
international symposium on circuits and systems | 2002
Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen
In this paper, an effective systematic design method is proposed to construct several efficient VLSI architectures of 1-D and 2-D lifting-based discrete wavelet transform. This design method first performs a specific lifting factorization for any finite discrete wavelet transform filter to obtain an optimal algorithm representation for hardware implementation. The optimized algorithm then turns into 1-D systolic architectures through dependence graph formation and systolic arrays mapping. Based on the 1-D architectures, a general 2-D discrete wavelet transform framework is used to construct the corresponding 2-D architectures. According to the comparison results, the constructed VLSI architectures are more efficient than previous arts in term of arithmetic units and memory storage.
asia pacific conference on circuits and systems | 2002
Po-Chih Tseng; Chao-Tsung Huang; Liang-Gee Chen
In this paper, by using line-based methods, a generic RAM-based architecture is proposed to construct the corresponding two-dimensional architectures efficiently for any given hardware architecture of one-dimensional wavelet filters, including conventional convolution-based and advanced lifting-based architectures. The categories of line buffer and the strategy to optimize the size of internal memory are also described. For multi-level two-dimensional discrete wavelet transforms, the recursive pyramid algorithm is adopted to turn our proposed architecture into another efficient architecture. According to the comparison results, the proposed architecture outperforms previous arts in the aspects of memory size, control complexity, and flexibility.
IEEE Transactions on Circuits and Systems for Video Technology | 2009
Chih-Chi Cheng; Po-Chih Tseng; Liang-Gee Chen
In a typical portable multimedia system, external access, which is usually dominated by block-based video content, induces more than half of total system power. Embedded compression (EC) effectively reduces external access caused by video content by reducing the data size. In this paper, an algorithm and a hardware architecture of a new type EC codec engine with multiple modes are presented. Lossless mode, and lossy modes with rate control modes and quality control modes are all supported by single algorithm. The proposed four-tree pipelining scheme can reduce 83% latency and 67% buffer size between transform and entropy coding. The proposed EC codec engine can save 62%, 66%, and 77% external access at lossless mode, half-size mode, and quarter-size mode and can be used in various system power conditions. With TSMC 0.18 mum 1P6M CMOS logic process, the proposed EC codec engine can encode or decode CIF 30 frame per second video data and achieve power saving of more than 109 mW. The EC codec engine itself consumes only 2 mW power.
international solid-state circuits conference | 2004
Hung-Chi Fang; Chao-Tsung Huang; Yu-Wei Chang; Tu-Chih Wang; Po-Chih Tseng; Chung-Jr Lian; Liang-Gee Chen
An 81MS/s JPEG 2000 single-chip encoder is implemented on a 5.5mm/sup 2/ die using 0.25/spl mu/m CMOS technology. This IC can encode HDTV 720p resolution at 30 frames/s in real time. The rate-distortion optimized chip encodes tile size of 128/spl times/128, code block size of 64/spl times/64, and image size up to 32K/spl times/32K.
signal processing systems | 2003
Chao-Tsung Huang; Po-Chih Tseng; Liang-Gee Chen
Based on B-spline factorization, a new category of architectures for Discrete Wavelet Transform (DWT) is proposed in this paper. The B-spline factorization mainly consists of the B-spline part and the distributed part. The former is proposed to be constructed by use of the direct implementation or Pascal implementation. And the latter is the part introducing multipliers and can be implemented with the Type-I or Type-II polyphase decomposition. Since the degree of the distributed part is usually designed as small as possible, the proposed architectures could use fewer multipliers than previous arts, but more adders would be required. However, many adders can be implemented with smaller area and lower speed because only few adders are on the critical path. Three case studies, including the JPEG2000 default (9, 7) filter, the (6, 10) filter, and the (10, 18) filter, are given to demonstrate the efficiency of the proposed architectures.