Ching Chuen Jong
Nanyang Technological University
Publication
Featured research published by Ching Chuen Jong.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007
Fei Xu; Chip-Hong Chang; Ching Chuen Jong
In this paper, a new efficient algorithm is proposed for the synthesis of low-complexity finite-impulse response (FIR) filters with resource sharing. The original problem statement based on the minimization of signed-power-of-two (SPT) terms has been reformulated to account for the sharable adders. The minimization of common SPT (CSPT) terms considered in our proposed algorithm addresses the optimization of the reusability of adders for two major types of common subexpressions, together with the minimization of adders that are needed for the spare SPT terms. The coefficient set is synthesized in two stages. In the first stage, CSPT terms in the vicinity of the scaled and rounded canonical signed digit (CSD) coefficients are allocated to obtain a CSD coefficient set, with the total number of CSPT terms not exceeding that of the initial coefficient set. The balanced normalized peak ripple magnitude due to the quantization error is fulfilled in the second stage by a local search method. The algorithm uses a common-subexpression-based Hamming weight pyramid to search for low-cost candidate coefficients with preferential consideration of shared common subexpressions. Experimental results demonstrate that our algorithm is capable of synthesizing FIR filters with the fewest CSPT terms compared with existing filter synthesis algorithms.
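As a rough illustration of the SPT-term counting that this abstract builds on, the Python sketch below converts an integer (already scaled and rounded) coefficient to canonical signed digit form and counts its nonzero digits. It is not the synthesis algorithm of the paper: the function names `to_csd` and `spt_terms` are hypothetical, and common-subexpression sharing is not modelled.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of integer n, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # +1 when n mod 4 == 1, -1 when n mod 4 == 3; after subtracting
            # the digit, n is divisible by 4, so the next digit is 0.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def spt_terms(n: int) -> int:
    """Count the nonzero signed-power-of-two terms in the CSD form of n."""
    return sum(1 for d in to_csd(n) if d != 0)


if __name__ == "__main__":
    # 7 = 8 - 1, so CSD needs two terms where plain binary (111) needs three.
    print(to_csd(7), spt_terms(7))
```

A single coefficient with t nonzero terms needs t - 1 adders or subtractors when nothing is shared, which is why reducing and sharing these terms lowers the filter's adder cost.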
IEEE Transactions on Circuits and Systems II: Express Briefs | 2005
Fei Xu; Chip-Hong Chang; Ching Chuen Jong
In this paper, a new algorithm, called contention resolution algorithm for weight-two subexpressions (CRA-2), based on an ingenious graph synthesis approach has been developed for the common subexpression elimination of the multiplication block of digital filter structures. CRA-2 provides a leeway to break away from the local minimum and the flexibility of varying optimization options through a new admissibility graph. It manages two-bit common subexpressions and aims at achieving the minimal logic depth as the primary goal. The performance of our proposed algorithm is analyzed and evaluated based on benchmark finite-impulse-response filters and randomly generated data. It is demonstrated that CRA-2 achieves the shortest logic depth with significant reduction in the number of logic operators compared with other reported algorithms.
IEEE Transactions on Circuits and Systems | 2008
Fei Xu; Chip-Hong Chang; Ching Chuen Jong
Multiple constant multiplications (MCM) have been a core operation in many digital signal processing applications. In this paper, an efficient generalized contention resolution algorithm (CRA) is proposed to eliminate three broad categories of reusable common subexpressions in MCM. The idea is to revert a precedential decision of suboptimal common subexpressions by a localized cost function evaluation when there is a conflict between two competitive subexpressions. The proposed derivatives of the basic CRA are versatile in that they are capable of satisfying search for both intra- and intercoefficient subexpressions, in any legitimate composition of horizontal, vertical and oblique subexpressions. As the algorithms expand the common subexpressions to higher-weight only when there is cost saving, the logic depth can be controlled by constraining the weights of the subexpressions. The variants of CRA follow an important tenet of good heuristic that significant improvement in the solution quality is attained with increased problem size but the computational time remains well bounded. Experimental results with both benchmark filters and randomly generated coefficient sets are analyzed and compared with a number of well known common subexpression elimination methods to demonstrate the effectiveness and efficiency of our proposed approach.
IEEE Transactions on Very Large Scale Integration Systems | 2011
Manas Ranjan Meher; Ching Chuen Jong; Chip-Hong Chang
A novel approach to designing a serial-serial hybrid multiplier is proposed for applications with high data sampling rates (≥ 4 GHz). The conventional way of partial product formation is revamped. Our proposed technique effectively forms the entire partial product matrix in just n sampling cycles for an n × n multiplication instead of at least 2n cycles in the conventional serial-serial multipliers. It achieves a high bit sampling rate by replacing conventional full adders and 5:3 counters with asynchronous 1's counters so that the critical path is limited to only an AND gate and a D flip-flop (DFF). The use of 1's counters to column compress the partial products preliminarily reduces the height of the partial product matrix from n to ⌊log2 n⌋ + 1, resulting in a significant complexity reduction of the resultant adder tree. The proposed hybrid column compressed multiplier consists of a serial-serial data accumulation unit and a parallel carry save adder (CSA) array that occupies approximately 35% and 58% less silicon area than the full CSA array multiplier with operands of wordlength 32 × 32 and 64 × 64, respectively. The post-layout simulation results based on a 90-nm seven-metal single-poly CMOS process technology show that our 64 × 64 multiplier dissipates 39% less average power at a sampling rate of 4 GHz, and has only 11% additional delay penalty to complete a multiplication compared to the conventional fully parallel CSA array multiplier.
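The column-compression idea can be checked arithmetically with a short Python sketch. It only counts the 1s in each column of the partial product matrix and recombines the counts; it does not model the serial-serial timing, the asynchronous counters, or the CSA array, and the function names are hypothetical.

```python
def column_counts(a: int, b: int, n: int) -> list[int]:
    """Count the 1s in every column of the n x n partial product matrix.

    Column k collects the bits a_i AND b_j with i + j == k, so there are
    2n - 1 columns and no column can hold more than n ones.
    """
    counts = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            counts[i + j] += ((a >> i) & 1) & ((b >> j) & 1)
    return counts


def multiply_by_column_counts(a: int, b: int, n: int) -> int:
    """Recombine the per-column counts, each weighted by 2**k."""
    return sum(c << k for k, c in enumerate(column_counts(a, b, n)))


if __name__ == "__main__":
    n = 8
    a, b = 0xFF, 0xFF
    counts = column_counts(a, b, n)
    # A count of at most n fits in floor(log2 n) + 1 bits, which in Python
    # is n.bit_length().
    print(max(counts), n.bit_length())
    print(multiply_by_column_counts(a, b, n) == a * b)
```

Because each column is replaced by a count of at most n, the rows left for the final adder tree are the bits of those counts, which is where the reduction from n to ⌊log2 n⌋ + 1 comes from.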
IEEE Transactions on Circuits and Systems II: Express Briefs | 2013
Yusong Hu; Ching Chuen Jong
In this brief, we propose a new parallel lifting-based 2-D DWT architecture with high memory efficiency and short critical path. The memory efficiency is achieved with a novel scanning method that enables tradeoff of external memory bandwidth and on-chip memory. Based on the data flow graph of the flipped lifting algorithm, processing units (PUs) are developed for maximally utilizing the inherent parallelism. With S number of PUs, the throughput can be scaled while keeping the latency constant. Compared with the best existing architecture, the proposed architecture requires less memory. For an N × N image, the proposed architecture consumes a total of only 3N + 24S words of transposition memory, temporal memory, and pipeline registers. The synthesized results in a 90-nm CMOS process show that it achieves better area-delay products than the best existing design by 32.3%, 31.5%, and 27.0% when S = 2, 4, and 8, respectively, and by 26%, 26%, and 22% when the overhead for buffering the required overlapped pixels is taken into account.
Asia Pacific Conference on Circuits and Systems | 2008
Manas Ranjan Meher; Ching Chuen Jong; Chip-Hong Chang
This paper presents a new approach to serial/parallel multiplier design by using parallel 1's counters to accumulate the binary partial product bits. The 1's in each column of the partial product matrix due to the serially input operands are accumulated using a serial T-flip flop (TFF) counter. Consequently, the column height is reduced from N to ⌊log2 N⌋ + 1. This logarithmic reduction results in a very small carry save adder (CSA) array or tree required before the two final summands are added up to obtain the final product. The counters can be clocked at very high frequency (around 1.5 GHz as dictated mainly by the TFF propagation delay) and the accumulation frequency is independent of the operand size. The proposed accumulation method achieves 33%, 38%, and 43% gain in speed for 31-, 63-, and 127-operand accumulators, respectively, and on average 42% reduction in power consumption over CSA-based accumulation implemented in 0.18 μm CMOS technology.
International Symposium on Circuits and Systems | 2012
Joshua Yung Lih Low; Ching Chuen Jong; Jeremy Yung Shern Low; Thian Fatt Tay; Chip-Hong Chang
A novel non-iterative circuit for computing integer square root based on logarithm is proposed in the paper. Mitchell's methods are used for the logarithmic and antilogarithmic conversions. The proposed method merges two conversion stages into a single one to achieve better accuracy with a compact architecture. Hence, the circuit size and latency are reduced. Compared to an existing design based on the modified Dijkstra algorithm used in a coherent receiver, the proposed design is either 8 times smaller or 9 times faster for 16-bit integer input.
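For orientation, the sketch below shows the plain two-stage form of a logarithm-based square root using Mitchell's approximation: take an approximate log2, halve it, then take an approximate antilog. It is a baseline under assumptions (a 16-bit fractional fixed-point format and hypothetical function names), not the merged single-stage circuit the paper proposes.

```python
FRAC_BITS = 16  # assumed fixed-point precision for the fractional part


def mitchell_log2(n: int) -> int:
    """Approximate log2(n) for n >= 1 as a fixed-point number.

    With n = 2**k * (1 + m) and 0 <= m < 1, Mitchell's method uses
    log2(n) ~= k + m.
    """
    k = n.bit_length() - 1
    m = ((n - (1 << k)) << FRAC_BITS) >> k
    return (k << FRAC_BITS) | m


def mitchell_antilog2(log_value: int) -> int:
    """Approximate 2**x for fixed-point x = k + m as 2**k * (1 + m)."""
    k = log_value >> FRAC_BITS
    m = log_value & ((1 << FRAC_BITS) - 1)
    return (((1 << FRAC_BITS) + m) << k) >> FRAC_BITS


def approx_isqrt(n: int) -> int:
    """Square root via halving the approximate logarithm (n >= 1)."""
    return mitchell_antilog2(mitchell_log2(n) >> 1)


if __name__ == "__main__":
    for n in (16, 100, 1000, 65535):
        print(n, approx_isqrt(n))
```

The result is exact for powers of four and only approximate elsewhere, since both conversions carry Mitchell's piecewise-linear error.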
International Symposium on Circuits and Systems | 2013
Joshua Yung Lih Low; Ching Chuen Jong
A novel non-iterative circuit for computing division based on logarithm is proposed in the paper. Mitchell-based methods are used for the logarithmic and antilogarithmic conversions. Merging the conversion stages in the implementation is not possible if the existing antilogarithmic conversion algorithms are used. Thus, the critical path has at least two carry propagate adders (CPAs). This work introduces a new antilogarithmic algorithm to merge the two conversion stages into a single one to remove one of the two CPAs. Compared to the best existing Mitchell-based logarithmic division computation method used in a 3-D graphic system, the proposed design achieves improvements by 45.4% and 34.8%, respectively, in computation speed and area-delay product with an area overhead of 19.5%.
Asia Pacific Conference on Circuits and Systems | 2012
Howard Tang; Joshua Yung Lih Low; Jeremy Yung Shern Low; Liter Siek; Ching Chuen Jong; Chip-Hong Chang
This paper presents a new 16-bit analog-to-residue converter (ARC) for a three-moduli set {2^6 − 1, 2^6, 2^6 + 1} RNS with a dynamic range of 18 bits. Based on the dual-slope integrating principle, direct conversion from analog to residue representation is achieved with only three modulo counters after the voltage sensing circuits. By eliminating the costly binary-to-residue converter, the proposed ARC saves 81.6% of area in comparison with the conventional two-stage architecture consisting of an integrating ADC and an RNS forward converter.
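The residue arithmetic behind this moduli set can be sketched in a few lines of Python. The example assumes the moduli are 63, 64, and 65, and uses hypothetical helper names; `modulo_count` only mimics a wrapping counter and is not a model of the dual-slope analog front end.

```python
MODULI = (2**6 - 1, 2**6, 2**6 + 1)  # 63, 64, 65: pairwise coprime
DYNAMIC_RANGE = MODULI[0] * MODULI[1] * MODULI[2]  # 262080, just under 2**18


def to_residues(x: int) -> tuple[int, ...]:
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in MODULI)


def from_residues(residues: tuple[int, ...]) -> int:
    """Reverse conversion with the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = DYNAMIC_RANGE // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % DYNAMIC_RANGE


def modulo_count(pulses: int, modulus: int) -> int:
    """A counter that wraps at `modulus`, stepped once per clock pulse."""
    value = 0
    for _ in range(pulses):
        value = (value + 1) % modulus
    return value


if __name__ == "__main__":
    x = 54321
    print(to_residues(x), from_residues(to_residues(x)) == x)
    print(DYNAMIC_RANGE.bit_length())
```

Counting the same pulse train with three wrapping counters yields the three residues directly, which is the reason a separate binary-to-residue stage becomes unnecessary.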
Asia Pacific Conference on Circuits and Systems | 2010
Joshua Yung Lih Low; Ching Chuen Jong
This paper presents a high-accuracy Mitchell-based logarithmic conversion algorithm suitable for integrated circuit implementation. A novel technique named range mapping is proposed to compress the range of approximation to one-quarter of that of Mitchell's fraction, m. A 3-region piecewise linear approximation is then applied on the compressed range. Simulation results show that the proposed algorithm achieves low absolute error and percentage error of 0.0037999 and 0.064002%, respectively.
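To put the reported error in context, the sketch below measures the error of the uncorrected Mitchell approximation log2(1 + m) ≈ m over the fraction m in [0, 1]. It covers the baseline only: the range mapping and 3-region correction from the paper are not implemented, and the function names and sample count are assumptions.

```python
import math


def mitchell_fraction_error(m: float) -> float:
    """Error of approximating log2(1 + m) by m, for 0 <= m <= 1."""
    return math.log2(1.0 + m) - m


def max_error(samples: int = 10000) -> float:
    """Largest error over an evenly spaced grid of fractions."""
    return max(mitchell_fraction_error(i / samples) for i in range(samples + 1))


if __name__ == "__main__":
    # The error is zero at both ends of the range and peaks near
    # m = 1/ln(2) - 1.
    print(max_error(), 1.0 / math.log(2.0) - 1.0)
```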