Kwen-Siong Chong
Nanyang Technological University
Publication
Featured research published by Kwen-Siong Chong.
IEEE Transactions on Very Large Scale Integration Systems | 2005
Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
We describe a micropower 16×16-bit multiplier (18.8 μW/MHz @ 1.1 V) for low-voltage power-critical low-speed (≤5 MHz) applications including hearing aids. We achieve the micropower operation by substantially reducing (by ~62% and ~79% compared to conventional 16×16-bit and 32×32-bit designs respectively) the spurious switching in the Adder Block in the multiplier. The approach taken is to use latches to synchronize the inputs to the adders in the Adder Block in a predetermined chronological sequence. The hardware penalty of the latches is small because the latches are integrated (as opposed to external latches) into the adder, termed the latch adder (LA). By means of the LAs and timing, the number of switchings (spurious and that for computation) is reduced from ~5.6 and ~10 per adder in the Adder Block in conventional 16×16-bit and 32×32-bit designs respectively to ~2 in our designs. Based on simulations and measurements on prototype ICs (0.35 μm three-metal dual-poly CMOS process), we show that our 16×16-bit design dissipates ~32% less power, is ~20% slower but has ~20% better energy-delay-product (EDP) than conventional 16×16-bit multipliers. Our 32×32-bit design is estimated to dissipate ~53% less power and to be ~29% slower, but to have ~39% better EDP than the conventional general multiplier.
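As a rough consistency check on the figures quoted above, the relative EDP can be recomputed from the stated ratios. This is only a sketch: it assumes that "less power" refers to energy per operation (the μW/MHz figure) and that EDP is energy multiplied by delay. The function name `relative_edp` is illustrative and does not come from the paper.

```python
def relative_edp(energy_saving: float, slowdown: float) -> float:
    """Return the EDP of a design relative to its baseline (1.0 = equal).

    energy_saving: fractional reduction in energy per operation (0.32 = 32% less).
    slowdown: fractional increase in delay (0.20 = 20% slower).
    Assumption: EDP = energy per operation * delay.
    """
    relative_energy = 1.0 - energy_saving
    relative_delay = 1.0 + slowdown
    return relative_energy * relative_delay


edp_16 = relative_edp(0.32, 0.20)  # 16x16-bit figures from the abstract
edp_32 = relative_edp(0.53, 0.29)  # 32x32-bit figures from the abstract

print(f"16x16-bit: EDP improvement ~{(1 - edp_16) * 100:.0f}%")
print(f"32x32-bit: EDP improvement ~{(1 - edp_32) * 100:.0f}%")
```

Under these assumptions the ratios give roughly 18% and 39%, which is in line with the quoted ~20% and ~39% given that the inputs are themselves rounded.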
IEEE Journal of Solid-state Circuits | 2007
Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
Two 128-point 16-bit radix-2 FFT/IFFT processors based on synchronous-logic (sync) and asynchronous-logic (async) for low voltage (1.1-1.4 V) energy-critical low-speed hearing aids are described. The two processors herein are designed with the same function and similar architecture, and the emphasis is energy efficacy. The async approach, on average, features ~37% lower energy per FFT/IFFT computation than the sync approach but with ~10% larger IC area penalty and an inconsequential 1.4 times worse delay; the async design can be designed to be 0.24 times faster and with largely the same energy dissipation if the matched delay elements and the latch controllers therein are better optimized. In this low-speed application, the lower energy feature of the async design is not attributed to the absence of the clock infrastructure but instead due to the adoption of established and proposed async circuit designs, resulting in reduced redundant operations and reduced spurious/glitch switching, and to the use of latches. The prototype async FFT/IFFT processor (in a 0.35-μm CMOS process) can be operated at 1.0 V and dissipates 93 nJ.
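For readers unfamiliar with the term, a radix-2 FFT splits an N-point transform into two N/2-point transforms over the even-indexed and odd-indexed samples. The floating-point sketch below illustrates that algorithm class only; it does not model the 16-bit fixed-point hardware or the sync/async architectures described in the paper, and the function name `fft_radix2` is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # N/2-point FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # N/2-point FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# 128-point transform of a unit impulse: every frequency bin equals 1.
spectrum = fft_radix2([1.0] + [0.0] * 127)
```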
IEEE Journal of Solid-state Circuits | 2012
Kwen-Siong Chong; Kok-Leong Chang; Bah-Hwee Gwee; Joseph Sylvester Chang
We design an Acoustic Digital Signal Processor (ADSP) SoC, the primary signal processing module of an acoustic signal detection system, based on two design approaches: fully-synchronous (Fully-Sync), and globally-asynchronous-locally-synchronous (GALS). The emphasis of the ADSP designs is low power operation where both designs embody modular-level and circuit-level clock gating techniques. For the sake of fair benchmarking, both ADSPs have identical functionality, are designed using the same 130 nm CMOS process, and largely embody the same library cells (save that for the signaling protocols in the GALS ADSP). The GALS ADSP is substantially more power-efficient (the Fully-Sync ADSP dissipates 1.9× more power @ nominal VDD = 1.2 V) and the only cost is the marginally higher (1.02×) IC area. Its higher power efficiency is largely attributed to the exploitation of asynchronous signaling between circuit modules by means of more finely-grained partitioning of the clock domains; intra-circuit signaling therein remains fully-sync. This provides for the ensuing simplification of the clocking infrastructure and subsequent reduction of the global clock rate. The prototype GALS ADSP is able to operate to specifications throughout the lifespan of the battery (VDD = 0.9 V-1.4 V, in part depicting Dynamic Voltage Scaling attributes) and at VDD = 1.2 V, it dissipates 186 μW.
International Symposium on Circuits and Systems | 2009
Tong Lin; Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
In this paper, a fine-grained power gating technique for an asynchronous-logic pipeline stage is proposed using locally controlled gating transistors. The proposed power gating technique is implemented with minimal control overheads (one additional inverter per pipeline stage for driving PMOS Gating) and delay overheads (within 15% more than the conventional asynchronous-logic pipeline stage). Different types of gating configurations using only PMOS transistors (PMOS Gating), only NMOS transistors (NMOS Gating), and both types of transistors (Dual Gating) are examined and compared. The effectiveness of the proposed power gating technique on the Combinational Block therein with different data input rates is investigated. Based on the computer simulation results, we have found that a >70% reduction in wasted power (including both short-circuit and leakage powers) as compared to the conventional asynchronous-logic pipeline stage can be achieved with all gating configurations. In particular, Dual Gating achieves the best wasted power reduction of 86% for short-circuit power and 99% for leakage power @ 10 Mbps input rate.
IEEE Journal of Solid-state Circuits | 2013
Tong Lin; Kwen-Siong Chong; Joseph Sylvester Chang; Bah-Hwee Gwee
We propose a Sub-threshold (Sub-Vt) Self-Adaptive VDD Scaling (SSAVS) system for a Wireless Sensor Network with the objective of lowest possible power dissipation for the prevailing throughput and circuit conditions, yet high robustness and with minimal overheads. The effort to achieve the lowest possible power operation is by means of adjusting VDD to the minimum voltage (within 50 mV) for said conditions. High robustness is achieved by adopting the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols where the circuits therein are self-timed, and by the embodiment of our proposed Pre-Charged-Static-Logic (PCSL) design approach; when compared against competing approaches, the PCSL is most competitive in terms of energy/operation, delay and IC area. By exploiting the already existing request and acknowledge signals of the QDI protocols, the ensuing overhead of the SSAVS is very modest. The filter bank embodied in the SSAVS is shown to be ultra-low power and highly robust. When benchmarked against the competing conventional Dynamic-Voltage-Frequency-Scaling (DVFS) synchronous-logic counterpart, no one system is particularly advantageous when the operating conditions are known. However, when the competing DVFS system is designed for the worst-case condition, the proposed SSAVS system is somewhat more competitive, including uninterrupted operation while its VDD self-adjusts to the varying conditions.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems | 2013
Kok-Leong Chang; Joseph Sylvester Chang; Bah-Hwee Gwee; Kwen-Siong Chong
Microcontrollers play a vital role in embodying intelligence into battery-powered everyday objects to realize the internet of things (IoT). The desirable attributes of such a microcontroller and the like include high energy and area efficiency, and robust error-free operation under dynamic voltage scaling (DVS), workload, process, voltage, and temperature (PVT) variation effects. In this work, a synchronous-logic (S 8051) and a quasi-delay-insensitive asynchronous-logic (A 8051) 8051 microcontroller core are designed and fabricated for full-range DVS from nominal VDD to deep sub-threshold. The performance of the S 8051 and the A 8051 is largely comparable at nominal conditions and over the entire DVS range, but differs when PVT and workload are varied. At nominal VDD, both microcontroller cores feature comparable energy and speed, with the electromagnetic interference of the A 8051 ~12 dB lower and its area ~2× larger than that of the S 8051. When DVS is applied, both microcontroller cores feature comparable energy and speed; the S 8051 requires simultaneous adjustment of clock frequency with VDD. At wide PVT variations, up to ~12× delay margins are required for the S 8051, whereas the A 8051 operates at actual speed. When the workload of both microcontrollers is varied, the A 8051 features lower energy dissipation per workload due to the exploitation of its asynchronous-logic protocols. For IoT applications that incur wide PVT and workload variations, the A 8051 is more suitable due to its self-timed nature, whereas when PVT and workload variations are less severe, the S 8051 is more suitable due to its smaller IC area.
IEEE Transactions on Circuits and Systems | 2009
Bah-Hwee Gwee; Joseph Sylvester Chang; Yiqiong Shi; Chien Chung Chua; Kwen-Siong Chong
The design of a low-voltage micropower asynchronous (async) signed truncated multiplier based on a shift-add structure for power-critical applications such as the low-clock-rate (<4 MHz) hearing aids is described. The emphases of the design are micropower operation and small IC area, and these attributes are achieved in several ways. First, a maximum of three signed power-of-two terms accompanied with sign magnitude data representation is used for the multiplier operand. Second, the least significant partial products are truncated to yield a 16-bit signed product. An error correction methodology is proposed to mitigate, where appropriate, the arising truncation errors. The errors arising from truncation and the effectiveness of the error correction are analytically derived. Third, a low-power shifter design and an internal latch adder are adopted. Finally, a power-efficient speculative delay line is proposed to time the async operation of the various circuit modules. A comparison with competing synchronous and async designs shows that the proposed design features the lowest power dissipation (5.86 μW at 1.1 V and 1 MHz) and a very competitive IC area (0.08 mm² using a 0.35-μm CMOS process). The application of the proposed multiplier for realizing a digital filter for a hearing aid is given.
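The first two techniques above can be illustrated in software. The sketch below assumes the multiplier operand is given as at most three signed power-of-two terms, each a (sign, shift) pair, forms the product using only shifts and adds, and then drops a chosen number of least significant bits. The term encoding, the function name `spt_multiply` and the `drop_bits` parameter are assumptions made for illustration. Truncating the final product is a simplification of truncating individual partial products, and the paper's error correction is not modelled.

```python
def spt_multiply(x: int, terms, drop_bits: int = 0) -> int:
    """Multiply x by sum(sign * 2**shift) using shifts and adds only.

    terms: up to three (sign, shift) pairs, sign in {+1, -1}, shift >= 0.
    drop_bits: number of least significant product bits to discard
               (arithmetic right shift, i.e. rounding toward minus infinity).
    """
    if len(terms) > 3:
        raise ValueError("at most three signed power-of-two terms")
    product = 0
    for sign, shift in terms:
        if sign not in (1, -1) or shift < 0:
            raise ValueError("invalid term")
        product += sign * (x << shift)  # one shifted partial product per term
    return product >> drop_bits


# Example coefficient: 2**4 - 2**1 + 2**0 = 15
coeff_15 = [(1, 4), (-1, 1), (1, 0)]
full = spt_multiply(100, coeff_15)          # 1600 - 200 + 100 = 1500
truncated = spt_multiply(100, coeff_15, 2)  # 1500 >> 2 = 375
```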
IET Circuits, Devices & Systems | 2007
Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
The paper presents a low-voltage (1–1.5 V) 16-bit Booth leapfrog array multiplier with emphasis on low energy dissipation, relatively high speed and small IC area. These attributes are achieved in two ways. First, low (hardware) complexity dynamic adders (DAs) are proposed and they are used to reduce spurious switching in the multiplier. Second, the specificities of the leapfrog architecture are exploited with the use of different output rates of the sum and carry outputs of the proposed DAs. When compared with other array multiplier designs, the proposed multiplier features the lowest energy dissipation and one of the shortest delays, resulting in the lowest energy–delay product. Furthermore, when compared with the reported dynamic array multiplier that features somewhat similar electrical characteristics, the proposed multiplier is advantageous in its substantially smaller (∼33%) IC area. Based on a 0.35 µm dual-poly four-metal CMOS process and at 1 V operation, the proposed multiplier dissipates ∼18 pJ, has a delay of ∼188 ns and occupies 0.11 mm² of IC area. The proposed design is appropriate for low-voltage energy-critical and IC area-critical applications including hearing aids.
International Symposium on Circuits and Systems | 2002
Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
We propose two 2-bit adders, Type-A and Type-B, as the basic cells for three novel low-voltage (1.1 V) micropower 16-bit asynchronous adders: Adder-A, Adder-B and Adder-AB. Adder-A employs Type-A adders and features high speed and low power dissipation. Adder-B employs Type-B adders and has lower speed and features lower power dissipation. By combining Type-A and Type-B, the hybrid Adder-AB design features the same power dissipation as Adder-B but with significantly higher speed. We analytically derive the speed and power dissipation of our designs. We compare our designs with reported synchronous and asynchronous designs and show that our designs have some improved parameters over these reported designs.
IET Circuits, Devices & Systems | 2007
Kwen-Siong Chong; Bah-Hwee Gwee; Joseph Sylvester Chang
Several asynchronous-logic macrocells for a cell library for low-voltage (1.1 V) power-critical applications are described. The intended application is the realisation of the datapath of an embedded asynchronous digital signal processor in low-voltage power-critical digital hearing instruments where the speed is relatively low, <5 MHz. The macrocells are two 2-bit and three 16-bit adders, a 16×16-bit truncated parallel multiplier and a 16-bit accumulator. Compared to reported 2-bit adders, one of the 2-bit adders features the lowest energy-delay product (EDP), whereas the other features the lowest energy (power/MHz). Among the three proposed 16-bit adders, two feature the lowest EDP compared to the reported designs, and their completion detection circuit is very simple (an OR gate). The truncated parallel 16×16-bit multiplier is the lowest-energy multiplier in the literature, and this is achieved by truncation and by means of a proposed integrated latch-cum-adder (latch adder) that virtually eliminates the spurious switching in the adder block. The accumulator is likewise the lowest-energy accumulator, also by means of the latch adder embodied therein. All macrocells are verified by computer simulations and on the basis of measurements on prototype ICs.