Chih-Chi Cheng
National Taiwan University
Publications
Featured research published by Chih-Chi Cheng.
IEEE Transactions on Circuits and Systems for Video Technology | 2009
Chih-Chi Cheng; Po-Chih Tseng; Liang-Gee Chen
In a typical portable multimedia system, external access, which is usually dominated by block-based video content, induces more than half of the total system power. Embedded compression (EC) effectively reduces external access caused by video content by reducing the data size. In this paper, an algorithm and a hardware architecture for a new type of EC codec engine with multiple modes are presented. A lossless mode and lossy modes with rate control and quality control are all supported by a single algorithm. The proposed four-tree pipelining scheme reduces latency by 83% and buffer size by 67% between transform and entropy coding. The proposed EC codec engine can save 62%, 66%, and 77% of external access in lossless, half-size, and quarter-size modes, respectively, and can be used under various system power conditions. In a TSMC 0.18 µm 1P6M CMOS logic process, the proposed EC codec engine can encode or decode CIF video at 30 frames per second and achieve a power saving of more than 109 mW, while the EC codec engine itself consumes only 2 mW.
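The multi-mode idea can be illustrated in software. Below is a purely illustrative Python sketch, not the paper's implementation: zlib stands in for the engine's transform-plus-entropy coder, and the three mode names mirror the abstract's lossless, quality-control, and rate-control modes.

```python
import zlib

def compress_block(pixels, mode, rate_budget=None, qstep=1):
    """Toy multi-mode compressor for one block of 8-bit pixels.

    mode = "lossless": exact reconstruction is possible.
    mode = "quality":  quality control via a fixed quantization step.
    mode = "rate":     rate control; quantization is coarsened until the
                       compressed size fits the budget (or a step limit).
    """
    if mode == "lossless":
        return zlib.compress(bytes(pixels))
    if mode == "quality":
        return zlib.compress(bytes(p // qstep for p in pixels))
    if mode == "rate":
        step = 1
        while True:
            out = zlib.compress(bytes(p // step for p in pixels))
            if rate_budget is None or len(out) <= rate_budget or step >= 128:
                return out
            step *= 2  # coarser quantization -> fewer bits
    raise ValueError("unknown mode: " + mode)
```

In the lossless mode the decoder recovers the pixels exactly; in the lossy modes only the quantized values survive, which is the size-for-quality trade the abstract's half-size and quarter-size modes make.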
International Solid-State Circuits Conference | 2008
Chih-Chi Cheng; Chia-Hua Lin; Chung-Te Li; Samuel C. Chang; Chia-Jung Hsu; Liang-Gee Chen
Visual sensors combined with video-analysis algorithms can enhance applications in surveillance, healthcare, intelligent vehicle control, human-machine interfaces, etc. Hardware solutions exist for video analysis. Analog on-sensor processing solutions feature image-sensor integration; however, the precision loss of analog signal processing prevents them from realizing complex algorithms, and they lack flexibility. Vision processors achieve high GOPS figures by combining a processor array for parallel operations with a decision processor for the remaining ones. Converting the parallel data in the processor array to scalar data in the decision processor creates a throughput bottleneck, and parallel memory accesses lead to high power consumption. Privacy is a critical issue in deploying visual sensors, because video data may be revealed from image sensors or processors. These issues affect all of the above solutions, since inputting or outputting video data is unavoidable.
International Solid-State Circuits Conference | 2006
Chia-Ping Lin; Po-Chih Tseng; Yao-Ting Chiu; Siou-Shen Lin; Chih-Chi Cheng; Hung-Chi Fang; Wei-Min Chao; Liang-Gee Chen
A 5 mW MPEG-4 SP encoder is implemented on a 7.7 mm² die in 0.18 µm CMOS technology. It encodes CIF at 30 frames/s in real time at 9.5 MHz using 5 mW at 1.3 V, and VGA at 30 frames/s at 28.5 MHz using 18 mW at 1.4 V. The chip employs a 2-D bandwidth-sharing ME design, content-aware DCT/IDCT, and clock-gating techniques to minimize power consumption.
Signal Processing Systems | 2005
Chih-Chi Cheng; Po-Chih Tseng; Chao-Tsung Huang; Liang-Gee Chen
In a typical multi-chip handheld system for multimedia applications, external access, which is usually dominated by block-based video content, induces more than half of the total system power. Embedded compression (EC) effectively reduces external access caused by video content by reducing the data size. In this paper, an algorithm and a hardware architecture for a new type of EC codec engine with multiple modes are presented. A lossless mode and lossy modes with rate control and quality control are all supported by a single algorithm. The proposed four-tree pipelining scheme reduces latency by 83% and buffer size by 67% between transform and entropy coding. The proposed EC codec engine can save 50%, 61%, and 77% of external access in lossless, half-size, and quarter-size modes, respectively, and can be used under various system power conditions. With the Artisan 0.18 µm cell library, the proposed EC codec engine can encode or decode VGA at 30 fps with an area of 293,605 µm² and power consumption of 3.36 mW.
Design Automation Conference | 2008
Chih-Chi Cheng; Chia-Hua Lin; Chung-Te Li; Samuel C. Chang; Liang-Gee Chen
iVisual, an intelligent visual sensor SoC integrating a 2790 fps CMOS image sensor and a 76.8 GOPS, 374 mW vision processor, is implemented on a 7.5 mm × 9.4 mm die in a 0.18 µm CIS process. The light-in, answer-out SoC architecture avoids the privacy problems of intelligent visual sensors. The feature processor and the inter-processor synchronization scheme together increase average throughput by 51%. A power efficiency of 205 GOPS/W and an area efficiency of 1.16 GOPS/mm² are achieved.
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Chih-Chi Cheng; Chao-Tsung Huang; Ching-Yeh Chen; Chung-Jr Lian; Liang-Gee Chen
The on-chip line buffer dominates the total area and power of line-based 2-D discrete wavelet transform (DWT). In this paper, a memory-efficient VLSI implementation scheme for line-based 2-D DWT is proposed, consisting of two parts: a wordlength analysis methodology and a multiple-lifting scheme. The required wordlength of the on-chip memory is first determined using the proposed wordlength analysis methodology, and the multiple-lifting scheme is then applied. The proposed wordlength analysis methodology guarantees that coefficient overflow is avoided, and the average difference between the predicted and experimental quality levels is only 0.1 dB in terms of PSNR. The proposed multiple-lifting scheme reduces on-chip memory bandwidth by at least 50% and line-buffer area by about 50% in the 2-D DWT module.
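The lifting structure that the line buffer serves can be sketched in a few lines. Here is a minimal Python sketch of one 1-D level of the integer 5/3 (LeGall) lifting DWT, assuming an even-length input with symmetric boundary extension; the function names are illustrative, and the paper's 2-D line-based organization and multiple-lifting scheme are not modeled.

```python
def dwt53_forward(x):
    """One 1-D level of the integer 5/3 lifting DWT (even-length input)."""
    n, half = len(x), len(x) // 2
    # predict step: high-pass = odd sample minus the mean of its even neighbors
    h = [x[2*i+1] - (x[2*i] + x[min(2*i+2, n-2)]) // 2 for i in range(half)]
    # update step: low-pass = even sample plus a smoothed detail term
    l = [x[2*i] + (h[max(i-1, 0)] + h[i] + 2) // 4 for i in range(half)]
    return l, h

def dwt53_inverse(l, h):
    """Invert dwt53_forward exactly (integer-to-integer transform)."""
    n = 2 * len(l)
    x = [0] * n
    for i in range(len(l)):                       # undo the update step first
        x[2*i] = l[i] - (h[max(i-1, 0)] + h[i] + 2) // 4
    for i in range(len(h)):                       # then undo the predict step
        x[2*i+1] = h[i] + (x[2*i] + x[min(2*i+2, n-2)]) // 2
    return x
```

Because both lifting steps are integer-to-integer and the inverse applies them in reverse, reconstruction is exact, which is the property the wordlength analysis must preserve when trimming the line buffer.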
IEEE Transactions on Image Processing | 2006
Yu-Wei Chang; Hung-Chi Fang; Chih-Chi Cheng; Chun-Chia Chen; Liang-Gee Chen
In this paper, a precompression quality-control algorithm is proposed. It can greatly reduce the computational power of the embedded block coding (EBC) and the memory required to buffer bit streams. By using the propagation property and the randomness property of the EBC algorithm, the rate and distortion of coding passes are approximately predicted, so the truncation points can be chosen before actual coding by the entropy coder. The computational power, measured as the number of contexts to be processed, is therefore greatly reduced, since most of the computations are skipped. The memory requirement, measured as the amount of buffering needed for bit streams, is also reduced, since the skipped contexts do not generate bit streams. Experimental results show that the proposed algorithm reduces the computational power of the EBC by 80% on average at 0.8 bpp compared with the conventional postcompression rate-distortion optimization algorithm, while the memory requirement is reduced by 90%. The average PSNR degrades by only about 0.1~0.3 dB.
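The precompression selection can be illustrated with a toy slope test. This is only a hedged sketch: it assumes the predicted per-pass rates and distortion reductions are already available (the paper derives them from the propagation and randomness properties), and the function name and threshold rule are illustrative rather than the paper's exact procedure.

```python
def select_truncation(pred_rate, pred_dist_drop, slope_threshold):
    """Pick a truncation point for one code-block before entropy coding.

    pred_rate[i]      : predicted bits added by coding pass i
    pred_dist_drop[i] : predicted distortion removed by coding pass i

    Passes are kept while their predicted rate-distortion slope
    (distortion removed per bit) stays above slope_threshold; the
    remaining passes are skipped entirely, which is where the
    computation and buffering savings come from.
    """
    keep = 0
    for r, d in zip(pred_rate, pred_dist_drop):
        if r == 0 or d / r < slope_threshold:
            break  # slope too shallow (or undefined): truncate here
        keep += 1
    return keep  # number of coding passes to actually run
```

For example, passes with predicted slopes 5.0, 2.0, and 0.5 against a threshold of 1.0 keep the first two passes and skip the third before any context is ever processed.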
International Solid-State Circuits Conference | 2011
Pei-Kuei Tsung; Ping-Chih Lin; Kuan-Yu Chen; Tzu-Der Chuang; Hsin-Jung Yang; Shao-Yi Chien; Li-Fu Ding; Wei-Yin Chen; Chih-Chi Cheng; Tung-Chien Chen; Liang-Gee Chen
3DTV promises to become the mainstream of next-generation TV systems. High-resolution 3DTV provides users with a vivid viewing experience. Moreover, free-viewpoint view synthesis (FVVS) extends the common two-view stereo 3D vision into virtual reality by generating unlimited views from any desired viewpoint. In next-generation 3DTV systems, the set-top box (STB) SoC requires both a high-definition (HD) multiview video coding (MVC) decoder to reconstruct the real camera-captured scenes and a free-viewpoint view synthesizer to generate the virtual scenes [1-2].
IEEE Transactions on Circuits and Systems for Video Technology | 2008
Yi-Hau Chen; Chih-Chi Cheng; Tzu-Der Chuang; Ching-Yeh Chen; Shao-Yi Chien; Liang-Gee Chen
Since motion-compensated temporal filtering (MCTF) has become an important temporal prediction scheme in video coding algorithms, this paper presents an efficient temporal prediction engine that is not only the first MCTF hardware implementation but also supports the traditional motion-compensated prediction (MCP) scheme to provide computational scalability. For the prediction stage of the MCTF and MCP schemes, a modified extended double current frames scheme is adopted to reduce system memory bandwidth, and a frame-interleaved macroblock pipelining scheme is proposed to eliminate the induced data-buffer overhead. In addition, the proposed update-stage architecture, with pipelined scheduling and motion estimation (ME)-like motion compensation (MC) using the level C+ scheme, can save about half of the external memory bandwidth and eliminate irregular memory access for MC. Moreover, 76.4% of the update stage's hardware area is saved by reusing the hardware resources of the prediction stage. This MCTF chip can process CIF at 30 fps in real time with a search range of [-32, 32) for 5/3 MCTF with four decomposition levels, and it also supports 1/3 MCTF, hierarchical B-frames, and MCP coding schemes in JSVM and H.264/AVC. The gate count is 352 K with 16.8 KB of internal memory, and the maximum operating frequency is 60 MHz.
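The temporal 5/3 lifting at the heart of MCTF mirrors the spatial wavelet case. Below is a simplified Python sketch of one 5/3 MCTF decomposition level, with motion compensation deliberately omitted for clarity (the chip's ME/MC would align pixels between frames before these prediction and update steps); the names are illustrative.

```python
def mctf53(frames):
    """One temporal level of 5/3 MCTF without motion compensation.

    frames: an even number of equal-length pixel lists (one per frame).
    Returns (low_band, high_band): half as many low-pass frames (the
    temporally filtered video) and high-pass frames (the residues).
    """
    n = len(frames)
    # prediction step: each odd frame minus the mean of its even neighbors
    H = [[o - (a + b) // 2 for o, a, b in
          zip(frames[2*t + 1], frames[2*t], frames[min(2*t + 2, n - 2)])]
         for t in range(n // 2)]
    # update step: each even frame plus a smoothed residue term
    L = [[e + (p + q + 2) // 4 for e, p, q in
          zip(frames[2*t], H[max(t - 1, 0)], H[t])]
         for t in range(n // 2)]
    return L, H
```

For a static scene the high-pass frames are all zero and the low-pass frames equal the input, which shows why the prediction stage carries most of the coding gain and why the update stage can share its hardware.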
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Yu-Wei Chang; Chih-Chi Cheng; Chun-Chia Chen; Hung-Chi Fang; Liang-Gee Chen
A 124 MSamples/s JPEG 2000 codec is implemented on a 20.1 mm² die in 0.18 µm CMOS technology, dissipating 385 mW at 1.8 V and 42 MHz. This chip is capable of processing 1920×1080 HD video at 30 fps. In previous works, tile-level pipeline scheduling was used between the discrete wavelet transform (DWT) and embedded block coding (EBC). For a 256×256 tile, this costs 175 kB of on-chip SRAM in architectures using on-chip tile memory, or 310 MB/s of SDRAM bandwidth in architectures using off-chip tile memory. In this design, a level-switched scheduling is developed to eliminate the tile memory, and the DWT and EBC are pipelined at the pixel level, eliminating the 175 kB of on-chip SRAM and the 310 MB/s of off-chip SDRAM bandwidth. The level-switched DWT (LS-DWT) and the code-block-switched EBC (CS-EBC) are developed to enable this scheduling. The codec functions are realized in unified hardware, and hardware sharing between the encoder and decoder reduces silicon area by 40%.