Chia-Sheng Wen
National Sun Yat-sen University
Publication
Featured research published by Chia-Sheng Wen.
International Symposium on VLSI Design, Automation and Test | 2013
Shen-Fu Hsiao; Po-Han Wu; Chia-Sheng Wen; Li-Yao Chen
The recent OpenGL ES 2.0 API specification for embedded-system graphics requires programmable vertex shaders to process vertex data. To facilitate 3D coordinate transformation and lighting operations, vertex shaders usually contain a single-instruction, multiple-data (SIMD) datapath and a special function unit (SFU). In this paper, we present a new design of the vertex shader processor in which a recently proposed non-uniform segmentation is adopted in the design of the special function unit in order to reduce the sizes of lookup tables (LUTs). Both fixed-point and floating-point arithmetic are supported to satisfy the requirements of various precisions and ranges. Compared with recent similar implementations, the proposed design achieves satisfactory energy efficiency, measured as performance normalized by power consumption.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2012
Shen-Fu Hsiao; Hou-Jen Ko; Chia-Sheng Wen
A new function-evaluation algorithm is presented using a two-level approximation scheme. In the first level, a piecewise degree-one polynomial is used for the initial approximation to obtain the so-called normalized difference functions, which are similar in shape. Then, in the second level of refined approximation, a shared normalized difference function is computed to achieve the target precision. We also perform error analysis and bit-width optimization with two different design goals: area optimization and ROM optimization. Experimental results show that the proposed ROM-optimized architecture, when used in a multifunction evaluator that computes several elementary arithmetic functions on the same hardware, achieves significant area savings compared to previous approaches.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2010
Shen-Fu Hsiao; Ming-Yu Tsai; Chia-Sheng Wen
This brief presents a logic synthesis flow that depends on the popular Synopsys Design Compiler to perform logic translation and minimization based on the standard cell library with both pass transistor logic (PTL) and CMOS logic cells. The hybrid PTL/CMOS logic synthesis can generate appropriate circuits considering various design constraints. The proposed multilevel PTL logic cells are automatically constructed from only a few basic cells. Postlayout simulations with UMC 90-nm technology are presented based on the standard cell library with pure PTL, pure CMOS, or hybrid PTL/CMOS cells. Experimental results show that, in most cases, pure PTL circuits have smaller area and power, whereas CMOS circuits, in general, have smaller delay.
IEEE Transactions on Very Large Scale Integration Systems | 2013
Shen-Fu Hsiao; Hou-Jen Ko; Yu-Ling Tseng; Wen-Liang Huang; Shin-Hung Lin; Chia-Sheng Wen
In piecewise function evaluation with polynomial approximation, nonuniform segmentation can effectively reduce the size of lookup tables for some arithmetic functions compared to uniform segmentation approaches, at the cost of an extra segment address (index) encoder that adds area and delay overhead. It is also observed that nonuniform segmentation reflects a design tradeoff between the ROM size and the area cost of the subsequent arithmetic computation hardware. In this paper, we propose a new nonuniform segmentation method that searches for the optimal segmentation scheme with the goal of minimizing ROM size, total area, or delay. For some high-variation arithmetic functions, the proposed segmentation method achieves significant area reduction compared to the uniform segmentation method. We also demonstrate the design tradeoff among uniform and nonuniform segmentation, and degree-one and degree-two polynomial approximations, with respect to precision ranging from 12 to 32 bits for the elementary function of reciprocal.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2015
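The effect of nonuniform segmentation on table size can be illustrated with a small sketch. It is not the search method proposed in the paper: it assumes a simple recursive bisection, a straight-line (degree-one) fit through the segment endpoints, and a sampled error check. The function names and the sampling density are hypothetical.

```python
def max_chord_error(f, a, b, samples=64):
    """Largest sampled gap between f and the straight line through its endpoints."""
    fa, fb = f(a), f(b)
    worst = 0.0
    for k in range(samples + 1):
        x = a + (b - a) * k / samples
        line = fa + (fb - fa) * (x - a) / (b - a)
        worst = max(worst, abs(f(x) - line))
    return worst


def nonuniform_segments(f, a, b, tol):
    """Bisect [a, b] until every piece meets the tolerance (assumed strategy)."""
    if max_chord_error(f, a, b) <= tol:
        return [(a, b)]
    mid = (a + b) / 2
    return nonuniform_segments(f, a, mid, tol) + nonuniform_segments(f, mid, b, tol)


def uniform_segment_count(f, a, b, tol):
    """Smallest power-of-two number of equal segments that all meet the tolerance."""
    count = 1
    while True:
        width = (b - a) / count
        if all(max_chord_error(f, a + i * width, a + (i + 1) * width) <= tol
               for i in range(count)):
            return count
        count *= 2
```

For a function that varies quickly in only part of its domain, such as 1/x on [1/16, 1], `nonuniform_segments` returns far fewer pieces than `uniform_segment_count` requires, because wide segments are kept wherever the function is nearly linear. A hardware design would additionally need the segment index encoder mentioned in the abstract.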
Shen-Fu Hsiao; Po-Han Wu; Chia-Sheng Wen; Pramod Kumar Meher
Table-lookup-and-addition methods provide multiplierless function evaluation using multiple lookup tables and a multioperand adder. Despite their high-speed operation, they are practical only in low-precision applications because table size grows rapidly with precision width. In this brief, we present two methods for table size reduction by decomposing the original table of initial values into two or three tables with fewer entries and/or smaller bit width. The proposed table decompositions do not incur any extra rounding errors, so the original table can be completely recovered. Experimental results demonstrate significant savings in table size compared with the best prior multipartite designs.
Asia Pacific Conference on Circuits and Systems | 2012
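As background, the basic table-lookup-and-addition idea can be sketched with the classic two-table (bipartite) scheme. This is not the table decomposition proposed in the brief. The sketch assumes an input in [1, 2) whose fraction is split into three bit fields (i0, i1, i2), floating-point table entries without rounding, and hypothetical function names.

```python
def build_bipartite_tables(f, df, n0=4, n1=4, n2=4):
    """Build two tables for x = 1 + (i0, i1, i2) / 2**(n0 + n1 + n2) in [1, 2)."""
    n = n0 + n1 + n2
    ulp = 2.0 ** -n
    mid_low = (2 ** n2 - 1) / 2 * ulp  # centre of the range covered by i2
    # Table of initial values: f at the centre of each (i0, i1) cell.
    tiv = [[f(1.0 + ((((i0 << n1) | i1) << n2) * ulp) + mid_low)
            for i1 in range(2 ** n1)]
           for i0 in range(2 ** n0)]
    # Table of offsets: one slope per i0 segment times the i2 displacement.
    to = []
    for i0 in range(2 ** n0):
        centre = 1.0 + (i0 + 0.5) * 2.0 ** -n0
        to.append([df(centre) * (i2 * ulp - mid_low) for i2 in range(2 ** n2)])
    return tiv, to


def evaluate(tiv, to, i0, i1, i2):
    """Two lookups and one addition; no multiplier is needed at run time."""
    return tiv[i0][i1] + to[i0][i2]
```

With the default 4/4/4 split, the two tables hold 256 entries each, compared with 4096 entries for a single direct table over the same 12-bit input. The price is an approximation error that grows as the fields get wider, which is why table size rises quickly with precision.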
Shen-Fu Hsiao; Chi-Guang Lin; Po-Han Wu; Chia-Sheng Wen
Several types of asynchronous bus interface units for the AMBA AHB bus are designed so that an OpenGL ES 2.0 vertex shader (VS) processor can communicate with other hardware units of a 3D graphics system over an AHB bus running at a different clock frequency. We consider the data-write and data-read operations separately for the VS functioning as a master or as a slave. The first type of AHB wrapper design is a direct implementation of the required AHB interface signals. The second and third types of wrapper designs are based on the implementation of Open Core Protocol (OCP) interface signals. We compare the different implementations for both single-mode and burst-mode bus transactions. The multi-clock-domain wrapper design has been used in the design of a 3D graphics SoC and has been verified on an FPGA board.
International Symposium on Next-Generation Electronics | 2010
Shen-Fu Hsiao; Chia-Sheng Wen; Ming-Yu Tsai; Ming-Chih Chen
The exclusive-OR (XOR) gate is one of the critical components in many applications such as cryptography. In this paper, we present an efficient multi-input XOR circuit design based on pass-transistor logic (PTL). A synthesis algorithm is developed to efficiently generate the PTL-based multi-input XOR circuits. Both pre-layout and post-layout simulation results show that our proposed multi-input XOR design outperforms the static CMOS design. The multi-input XOR circuits are also used to design the transformations in the Advanced Encryption Standard (AES).
International Symposium on Next-Generation Electronics | 2010
Shen-Fu Hsiao; Chia-Sheng Wen; Ming-Yu Tsai
A hybrid method for computing the reciprocal is presented that combines degree-two piecewise polynomial interpolation with a Newton-Raphson iteration. The degree-two piecewise method is used to obtain an initial approximation for the subsequent Newton-Raphson operations. The architecture for the proposed hybrid method is designed to share the hardware of the constituent multipliers at the sub-word level, leading to a significant improvement in area cost compared to conventional table-based designs and other hybrid approaches.
International Symposium on Electronic System Design | 2010
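The two stages of such a hybrid scheme can be sketched numerically. This is a floating-point illustration only, assuming an input in [1, 2), eight uniform segments, and Lagrange interpolation through the endpoints and midpoint of each segment. It says nothing about the multiplier sharing in the actual architecture, and the function names are hypothetical.

```python
def initial_reciprocal(x, segments=8):
    """Degree-two piecewise interpolation of 1/x for x in [1, 2)."""
    width = 1.0 / segments
    index = min(int((x - 1.0) / width), segments - 1)
    a = 1.0 + index * width      # left end of the segment
    m = a + width / 2            # midpoint
    b = a + width                # right end
    # Lagrange basis polynomials through (a, 1/a), (m, 1/m), (b, 1/b).
    la = (x - m) * (x - b) / ((a - m) * (a - b))
    lm = (x - a) * (x - b) / ((m - a) * (m - b))
    lb = (x - a) * (x - m) / ((b - a) * (b - m))
    return la / a + lm / m + lb / b


def newton_raphson_step(x, y):
    """One refinement y <- y * (2 - x * y), which squares the relative error."""
    return y * (2.0 - x * y)


def reciprocal(x, segments=8, iterations=1):
    """Initial piecewise estimate followed by Newton-Raphson refinement."""
    y = initial_reciprocal(x, segments)
    for _ in range(iterations):
        y = newton_raphson_step(x, y)
    return y
```

Because each Newton-Raphson step roughly doubles the number of correct bits, a better initial estimate allows fewer iterations, while a coarser one allows smaller tables. The hybrid method trades between the two.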
Pramod Kumar Meher; Shen-Fu Hsiao; Chia-Sheng Wen; Ming-Yu Tsai
We have designed pass-transistor logic (PTL)-based D and T flip-flops for use in finite field multiplication. Since CMOS and PTL each have advantages in area, speed, and power, we compare two designs (a conventional implementation and an improved implementation) of serial-parallel finite field multiplication using pure CMOS, pure PTL, and hybrid PTL/CMOS logic. Experimental results with UMC 90-nm technology show that the improved architecture of finite field multiplication composed of PTL-based T flip-flops can substantially reduce the total area, delay, and power. Furthermore, the proposed cell-based design flow with the hybrid PTL/CMOS cell library can be used to generate other combinational and sequential logic circuits.
International Symposium on VLSI Design, Automation and Test | 2008
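For readers unfamiliar with serial-parallel finite field multiplication, the following software sketch shows the underlying shift-and-add recurrence: one operand is consumed one bit per step while the other is applied in parallel. The abstract does not name a field, so GF(2^8) with the AES reduction polynomial 0x11B is assumed purely for illustration, and the function name is hypothetical.

```python
def gf_multiply(a, b, modulus=0x11B, m=8):
    """Multiply a and b in GF(2^m), scanning b one bit per step (MSB first)."""
    result = 0
    for bit in range(m - 1, -1, -1):
        result <<= 1                  # multiply the running result by x
        if (result >> m) & 1:         # degree reached m: reduce by the modulus
            result ^= modulus
        if (b >> bit) & 1:
            result ^= a               # add (XOR) the parallel operand
    return result
```

Each loop iteration corresponds to one clock cycle of a bit-serial datapath, and the running result plays the role of the register that the flip-flops would hold in hardware.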
Shen-Fu Hsiao; Ming-Yu Tsai; Chia-Sheng Wen
In the past two decades, pass-transistor logic has been shown to have lower power and area cost than traditional CMOS logic for some applications. This paper discusses some important issues in the design of a pass-transistor cell library. First, the transistor sizing for the special inverter circuit in the cell library is addressed, which is quite different from the sizing of a conventional CMOS inverter. Second, we create new cells that merge combinations of an inverter and some multiplexers in order to reduce the physical layout area. Experimental results show that the layout compaction method also reduces the delay and dynamic power. The proposed transistor sizing and layout compaction methods could be useful guidelines in designing the basic cells required in pass-transistor logic synthesis.