Chih-Hao Sun
National Taiwan University
Publication
Featured research published by Chih-Hao Sun.
International Solid-State Circuits Conference | 2003
Hsiang-Hui Chang; Chih-Hao Sun; Shen-Iuan Liu
The DLL, in 0.35-μm CMOS, uses the shifted averaging VCDL to reduce the mismatch-induced timing error among the delay stages without extra hardware. The DLL can generate precise multiphase outputs with improved duty cycle, reduced skew errors, and lowered jitter. Compared with a conventional DLL, this design improves the peak-to-peak jitter by a factor of 1.4 at 150 MHz.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2011
Tse-Wei Chen; Chih-Hao Sun; Hsiao-Hang Su; Shao-Yi Chien; Daisuke Deguchi; Ichiro Ide; Hiroshi Murase
A power-efficient K-Means hardware architecture that can automatically estimate the number of clusters during the clustering process is proposed. This work makes two main contributions. The first is the integration of hierarchical data sampling in the hardware to accelerate clustering. The second is the development of the “Bayesian-Information-Criterion (BIC) Processor” to estimate the number of clusters of K-Means. The architecture of the “BIC Processor” is designed based on a simplification of the BIC computations, and the precision of the logarithm function is also analyzed. The experiments show that the proposed architecture can be employed in different multimedia applications, such as motion segmentation and edge-adaptive noise reduction. In addition, the gate count of the hardware is 51 K in 90-nm complementary metal-oxide-semiconductor technology. It is also shown that this work achieves high efficiency compared with a GPU, and that the power consumption scales well with the number of clusters and the number of dimensions. The power consumption ranges between 10.72 and 12.95 mW in different modes when the operating frequency is 233 MHz.
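The idea of choosing the number of clusters with BIC can be illustrated in software. The following is a minimal sketch, not the paper's hardware design or its simplified BIC computation: it assumes a spherical-Gaussian BIC with a shared variance, a deterministic farthest-point initialization, and hypothetical function names (`kmeans`, `bic`, `estimate_k`).

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's K-means with farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def bic(points, centers, labels):
    """BIC under an assumed spherical Gaussian model with one shared variance.
    Requires more points than clusters."""
    r, m, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (m * (r - k)), 1e-12)  # guard against log(0)
    counts = [labels.count(j) for j in range(k)]
    log_lik = sum(n * math.log(n / r) for n in counts if n > 0)
    log_lik -= 0.5 * r * m * math.log(2 * math.pi * var)
    log_lik -= 0.5 * m * (r - k)
    n_params = (k - 1) + m * k + 1  # mixing weights, centers, variance
    return log_lik - 0.5 * n_params * math.log(r)

def estimate_k(points, k_max):
    """Run K-means for k = 1..k_max and return the k with the highest BIC."""
    scores = {}
    for k in range(1, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = bic(points, centers, labels)
    return max(scores, key=scores.get), scores

# Illustrative 1-D data: three tight groups around 0, 10, and 20.
points = [(c + d,) for c in (0.0, 10.0, 20.0) for d in (-0.2, -0.1, 0.0, 0.1, 0.2)]
best_k, scores = estimate_k(points, 5)
```

On well-separated data such as the example, the penalty term outweighs the small likelihood gain from splitting a tight group, so the score peaks at the true number of groups.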
International Symposium on Circuits and Systems | 2008
Tse-Wei Chen; Chih-Hao Sun; Jiun-Ying Bai; Han-Ru Chen; Shao-Yi Chien
K-means is a clustering algorithm that is widely applied in many fields, including pattern classification, multimedia analysis, and image retrieval. Because image segmentation in embedded systems has real-time requirements, it is necessary to accelerate the K-means algorithm in hardware. The contributions of this paper include a series of K-means hardware analyses and a newly proposed SIP for image segmentation in SoC environments. Experiments show that the proposed SIP reaches a maximum clock speed of 200 MHz in TSMC 0.18-μm technology, and that it can be successfully used for image segmentation on an FPGA board with AMBA AHB.
IEEE Transactions on Multimedia | 2009
Chih-Hao Sun; You-Ming Tsao; Shao-Yi Chien
Texture compression is an important technique in graphics processing units (GPUs) for saving memory bandwidth. This paper presents a high-quality mipmapping texture compression (MTC) system with alpha maps. Based upon the wavelet transform, a hierarchical approach is adopted for mipmapping textures in the YCbCr color space and alpha channel. By exploiting the similarity between the alpha and luminance channels, the two channels are efficiently encoded together with linear prediction in the differential mode. In addition, the split mode handles textures with no strong relationship between the alpha and luminance channels. A layer overlapping technique is also proposed to reduce the texture memory bandwidth. Simulation results show that MTC can reduce the texture access traffic by 80% to 90% while providing high image quality. Compared with DirectX texture compression (DXTC), the most well-known texture compression with alpha maps, MTC reduces the texture access bandwidth by 30% more. VLSI implementation results show that the hardware cost of MTC is similar to that of DXTC and that MTC is suitable for integration in GPUs to provide high-quality textures with low memory bandwidth requirements.
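To see why a wavelet transform pairs naturally with mipmapping, consider the following sketch. It assumes a plain 2x2 Haar transform on a single channel, which is an illustrative choice and not necessarily the transform used in MTC; the function names `haar_level` and `reconstruct` are hypothetical. The low-pass band of each level is the 2x2 box-filtered image, that is, the next coarser mip level, and the three detail coefficients per block are enough to recover the finer level exactly.

```python
def haar_level(texture):
    """One 2x2 Haar step on a single-channel texture with even dimensions.

    Returns (ll, details): ll is the half-resolution average image (the next
    mip level under box filtering), details holds (lh, hl, hh) per 2x2 block.
    """
    ll, details = [], []
    for y in range(0, len(texture), 2):
        ll_row, d_row = [], []
        for x in range(0, len(texture[0]), 2):
            a, b = texture[y][x], texture[y][x + 1]
            c, d = texture[y + 1][x], texture[y + 1][x + 1]
            ll_row.append((a + b + c + d) / 4)
            d_row.append(((a - b + c - d) / 4,   # horizontal detail
                          (a + b - c - d) / 4,   # vertical detail
                          (a - b - c + d) / 4))  # diagonal detail
        ll.append(ll_row)
        details.append(d_row)
    return ll, details

def reconstruct(ll, details):
    """Invert haar_level: rebuild the finer level from averages and details."""
    out = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for j, (ll_row, d_row) in enumerate(zip(ll, details)):
        for i, (s, (lh, hl, hh)) in enumerate(zip(ll_row, d_row)):
            out[2 * j][2 * i] = s + lh + hl + hh
            out[2 * j][2 * i + 1] = s - lh + hl - hh
            out[2 * j + 1][2 * i] = s + lh - hl - hh
            out[2 * j + 1][2 * i + 1] = s - lh - hl + hh
    return out

# Illustrative 4x4 single-channel texture.
texture = [[8, 4, 0, 12],
           [16, 20, 4, 8],
           [0, 0, 4, 4],
           [40, 40, 44, 44]]
mip1, detail1 = haar_level(texture)   # 2x2 mip level plus details
mip2, detail2 = haar_level(mip1)      # 1x1 mip level plus details
```

Storing only the coarsest level plus the details of each finer level covers the whole mip chain without duplicating the low-frequency content; a real codec would additionally quantize and entropy-code the details, which this sketch omits.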
International Conference on Multimedia and Expo | 2009
Chih-Hao Sun; You-Ming Tsao; Ka-Hang Lok; Shao-Yi Chien
The rasterization stage in a graphics processing unit (GPU), which consists of triangle setup, rasterization, and parameter interpolation with plane equations, requires a large number of operations and is usually the performance bottleneck. For real-time applications, a Universal Rasterizer (UR) with edge equations and a tile-scan triangle traversal algorithm are proposed for low-cost graphics rendering. In UR, the basic functions for parameter interpolation and rasterization are executed on universal shared hardware to reduce cost. The results show that it can minimize the processing time of triangle traversal and guarantee that no pixel is revisited during traversal. With hardware sharing and the architecture design techniques of pipelining and scheduling, it can meet the real-time requirements of graphics applications at a reasonable hardware cost.
International SoC Design Conference | 2008
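Edge-equation rasterization with tile-based traversal can be sketched as follows. This is a generic textbook formulation under stated assumptions, not the paper's UR hardware or its specific tile-scan order: vertices are integer pixel coordinates, samples are taken at integer positions, pixels on an edge count as covered, and the names `edge` and `rasterize` are hypothetical.

```python
def edge(a, b, p):
    """Edge function: positive when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, tile=4):
    """Return the integer pixels covered by a triangle, visiting tiles in scan order.

    Each pixel in the bounding box belongs to exactly one tile, so no pixel is
    tested twice. A tile is skipped when all four of its corners lie outside
    the same edge, which is safe because the edge function is linear.
    """
    if edge(v0, v1, v2) < 0:          # enforce counter-clockwise winding
        v1, v2 = v2, v1
    edges = [(v0, v1), (v1, v2), (v2, v0)]
    xs, ys = zip(v0, v1, v2)
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    covered = []
    for ty in range(y_min, y_max + 1, tile):
        for tx in range(x_min, x_max + 1, tile):
            x_hi = min(tx + tile - 1, x_max)
            y_hi = min(ty + tile - 1, y_max)
            corners = [(tx, ty), (x_hi, ty), (tx, y_hi), (x_hi, y_hi)]
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges):
                continue              # whole tile is outside one edge
            for y in range(ty, y_hi + 1):
                for x in range(tx, x_hi + 1):
                    if all(edge(a, b, (x, y)) >= 0 for a, b in edges):
                        covered.append((x, y))
    return covered

# Illustrative right triangle with legs of length 4.
pixels = rasterize((0, 0), (4, 0), (0, 4), tile=2)
```

The same three edge-function values, normalized by the triangle's signed area, give barycentric weights for parameter interpolation, which is one reason a shared datapath for rasterization and interpolation is plausible. Fill rules for shared edges and degenerate triangles are not handled in this sketch.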
Chia-Ming Chang; Shao-Yi Chien; You-Ming Tsao; Chih-Hao Sun; Ka-Hang Lok; Yu-Jung Cheng
This paper presents a graphics processing unit with energy-saving techniques. Several techniques and architectures are proposed to achieve high performance with low power consumption. First, a low-power core pipeline is designed with a 2-issue VLIW architecture to reduce power consumption while achieving a processing capability of 400 MFLOPS or 800 MOPS. In addition, an inter/intra adaptive multi-threading scheme increases performance by increasing hardware utilization, and the proposed configurable memory array architecture reduces the frequency of off-chip memory accesses by caching both input data and output results. Furthermore, for graphics applications, a geometry-content-aware technique called early-rejection-after-transformation is proposed to remove redundant operations for invisible triangles. For circuit-level power reduction, power-aware frequency scaling is proposed to further reduce the power consumption.
High Performance Graphics | 2009
Chih-Hao Sun; Ka-Hang Lok; You-Ming Tsao; Chia-Ming Chang; Shao-Yi Chien
To increase the capability of mobile GPUs in image/video processing, this paper proposes a multi-purpose configurable filtering unit (CFU), a new configurable unit for image filtering on a stream processing architecture. The CFU is located in the texture unit of a GPU and can efficiently execute many kinds of filtering operations by directly accessing the multi-bank texture cache through specially designed data paths. The proposed CFU supports the following kinds of programmability. First, different sampling point windows can be selected by programmers. Second, the arithmetic type of the filter can be chosen: in addition to the original texture filtering functions and finite impulse response (FIR) filters, morphological operations from computer vision are also embedded in the CFU. Furthermore, the weighting coefficients of FIR filters and morphological operations can be defined by programmers. Simulation results show that, on average, compared with a conventional texture unit, the CFU reduces processing time by 25.35% in H.264/AVC motion compensation and by 58.6% in video segmentation.
Symposium on VLSI Circuits | 2008
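The three kinds of programmability described above (sampling window, arithmetic type, and coefficients) can be modeled in a few lines of software. This is a functional sketch only, with assumed conventions that the abstract does not specify: a hypothetical `configurable_filter` function, clamp-to-edge addressing at the borders, samples taken at the pixel position plus each offset in every mode, and additive coefficients for the morphological modes.

```python
def configurable_filter(image, offsets, weights, mode):
    """Apply a programmable window filter to a 2-D list of numbers.

    offsets: list of (dx, dy) sampling points relative to the output pixel.
    weights: one coefficient per sampling point.
    mode:    "fir"    -> weighted sum of the samples
             "dilate" -> max of (sample + coefficient)
             "erode"  -> min of (sample - coefficient)
    """
    if mode not in ("fir", "dilate", "erode"):
        raise ValueError("unknown mode: " + mode)
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            samples = []
            for dx, dy in offsets:
                sx = min(max(x + dx, 0), w - 1)   # clamp-to-edge addressing
                sy = min(max(y + dy, 0), h - 1)
                samples.append(image[sy][sx])
            if mode == "fir":
                out[y][x] = sum(wt * s for wt, s in zip(weights, samples))
            elif mode == "dilate":
                out[y][x] = max(s + wt for wt, s in zip(weights, samples))
            else:
                out[y][x] = min(s - wt for wt, s in zip(weights, samples))
    return out

# Illustrative use: a cross-shaped window on an image with one bright pixel.
cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
impulse = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
dilated = configurable_filter(impulse, cross, [0] * 5, "dilate")
eroded = configurable_filter(impulse, cross, [0] * 5, "erode")
smoothed = configurable_filter(impulse, cross, [4, 1, 1, 1, 1], "fir")
```

All three modes share the same sample-gathering loop and differ only in the reduction step, which hints at why a single datapath with a selectable arithmetic type can serve both FIR filtering and morphology.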
You-Ming Tsao; Chih-Hao Sun; Yu-Cheng Lin; Ka-Hang Lok; Chia-Jung Hsu; Shao-Yi Chien; Liang-Gee Chen
A 26 mW, 6.4 GFLOPS multi-core stream processor for mobile applications is implemented in 90 nm CMOS technology. A unified stream processing architecture with power-aware frequency scaling and adaptive task scheduling techniques is proposed to reduce the power consumption and increase the performance, achieving 200 Mvertices/s and 400 Mpixels/s in 3D graphics applications.
Analog Integrated Circuits and Signal Processing | 2004
Chih-Hao Sun; Shen-Iuan Liu
A digital synchronous mirror delay combined with an analog delay-locked loop (DLL) is introduced. Under process, voltage, temperature, and load variations, the conventional digital synchronous mirror delay cannot compensate for the static phase error because it is digital and inherently open-loop. The proposed circuit can compensate for the delay mismatch between the output buffer and the inner stage, which is caused by the different loading conditions. It also improves the noise immunity to supply variations. Moreover, because of the tracking property of the DLL, the static phase error and jitter are also reduced. The proposed circuit has been fabricated in a 0.35-μm one-poly four-metal CMOS process, and the whole chip area is 1.47 × 1.07 mm² including I/O pad peripherals. The measured peak-to-peak jitter is 16.4 ps at a supply voltage of 3.3 V and a frequency of 300 MHz. The power consumption of the entire chip is 16.5 mW for the analog part and 84 mW for the digital part. Comparisons between the proposed circuit and the conventional digital synchronous mirror delay are also presented.
Symposium on VLSI Circuits | 2003
Hsiang-Hui Chang; Chih-Hao Sun; Shen-Iuan Liu
Low-jitter Butterworth delay-locked loops (DLLs) are presented in this paper. The proposed Butterworth DLLs can suppress both the jitter generated by the input noise and that generated by the voltage-controlled delay line (VCDL) noise without stability considerations. Theoretically, the proposed Butterworth 2nd-order and 3rd-order DLLs could reduce the rms jitter due to the VCDL by factors of √2 and 2, respectively. In addition, a technique called the dynamic bandwidth-adjusting scheme (DBAS) is adopted to shorten the lock time without compromising the jitter performance. The conventional DLL and the proposed ones are fabricated on the same die in a 0.35-μm one-poly four-metal CMOS process. Compared with the conventional DLL, the measured rms jitter of the proposed DLLs is improved by factors of 1.40 and 1.95, respectively, at an input frequency of 125 MHz. The maximum power consumption of the proposed DLLs is 32 mW.