Yun Nan Chang
University of Minnesota
Publication
Featured research published by Yun Nan Chang.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 2003
Yun Nan Chang; Keshab K. Parhi
This paper presents an efficient VLSI architecture of the pipeline fast Fourier transform (FFT) processor based on radix-4 decimation-in-time algorithm with the use of digit-serial arithmetic units. By combining both the feedforward and feedback commutator schemes, the proposed architecture can not only achieve nearly 100% hardware utilization, but also require much less memory compared with the previous digit-serial FFT processors. Furthermore, in FFT processors, several modules of ROM are required for the storage of twiddle factors. By exploiting the redundancy of the factors, the overall ROM size can be effectively reduced by a factor of 2.
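The abstract does not say which redundancy in the twiddle factors is exploited. As a minimal sketch, assuming the well-known half-period symmetry W_N^(k+N/2) = -W_N^k, a table holding only the first N/2 factors is enough to recover all N of them. The names `build_half_rom` and `twiddle` are hypothetical and are not taken from the paper.

```python
import cmath


def build_half_rom(n):
    """Store only W_n^k = exp(-2*pi*j*k/n) for k in [0, n/2)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


def twiddle(rom, n, k):
    """Recover W_n^k for any k from the half-size table.

    Uses the symmetry W_n^(k + n/2) = -W_n^k, so the second half of the
    table is just the negated first half and need not be stored.
    """
    k %= n
    if k < n // 2:
        return rom[k]
    return -rom[k - n // 2]
```

In hardware the negation amounts to a sign change on the multiplier input, which is why this kind of symmetry trades a little logic for half the storage.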
IEEE Journal of Solid-State Circuits | 2000
Yun Nan Chang; Hiroshi Suzuki; Keshab K. Parhi
This paper presents a low-power bit-serial Viterbi decoder chip with the code rate r=1/3 and the constraint length K=9 (256 states) for next generation wireless communication applications. The architecture of the add-compare-select (ACS) module is based on bit-serial arithmetic and implemented with pass-transistor logic. A cluster-based ACS placement and state metric routing topology is described for the 256 bit-serial ACS units, which achieves very high area efficiency. In the trace-back operation, a power-efficient trace-back scheme, which uses time multiplexing to allow a higher memory read access rate than memory write rate, is implemented to reduce the number of iterations required to generate a decoded output. In addition, a low-power application-specific memory suitable for the function of survivor path memory has also been developed. The chip's core, implemented using 0.5-μm CMOS technology, contains approximately 200 K transistors and occupies an area of 2.46 mm by 4.17 mm. This chip can achieve a decode rate of 20 Mb/s under 3.3 V and 2 Mb/s under 1.8 V. The measured power dissipation at 2 Mb/s under 1.8 V is only about 9.8 mW. The Viterbi decoder presented here can be applied to next generation wide-band code division multiple access (W-CDMA) systems.
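To make the add-compare-select and trace-back steps concrete, here is a small software sketch of a hard-decision Viterbi decoder for a rate-1/3, K=9 code. It is word-level Python, not the chip's bit-serial datapath, and it does not model the paper's trace-back scheme or survivor memory. The generator polynomials (557, 663, 711 octal) are an assumption, since the abstract does not list them, and all function names are hypothetical.

```python
K = 9                          # constraint length from the abstract
N_STATES = 1 << (K - 1)        # 256 states
GENS = (0o557, 0o663, 0o711)   # ASSUMED rate-1/3 generator polynomials


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutional encoder: one 3-bit symbol per input bit."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.append(tuple(parity(reg & g) for g in GENS))
        state = reg >> 1
    return out


def acs_step(metrics, received):
    """One trellis stage: add branch metrics, compare, select the survivor."""
    new_metrics, decisions = [0] * N_STATES, [0] * N_STATES
    for ns in range(N_STATES):
        bit = ns >> (K - 2)                      # input bit leading into ns
        base = (ns & (N_STATES // 2 - 1)) << 1   # predecessors: base | 0, base | 1
        best = None
        for d in (0, 1):
            prev = base | d
            reg = (bit << (K - 1)) | prev
            expected = tuple(parity(reg & g) for g in GENS)
            branch = sum(e != r for e, r in zip(expected, received))  # Hamming
            cand = metrics[prev] + branch        # add
            if best is None or cand < best:      # compare
                best, decisions[ns] = cand, d    # select
        new_metrics[ns] = best
    return new_metrics, decisions


def decode(symbols):
    """Run ACS over all symbols, then trace back from the best final state."""
    metrics = [0] + [float("inf")] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for sym in symbols:
        metrics, dec = acs_step(metrics, sym)
        history.append(dec)
    state = min(range(N_STATES), key=metrics.__getitem__)
    bits = []
    for dec in reversed(history):
        bits.append(state >> (K - 2))
        state = ((state & (N_STATES // 2 - 1)) << 1) | dec[state]
    return bits[::-1]
```

Each call to `acs_step` does the work that the 256 ACS units perform in parallel on the chip; the stored `decisions` play the role of the survivor path memory read during trace-back.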
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 1998
Yun Nan Chang; Janardhan H. Satyanarayana; Keshab K. Parhi
Digit-serial implementation styles are best suited for implementation of digital signal processing systems which require moderate sampling rates. Digit-serial architectures obtained using traditional unfolding techniques cannot be pipelined beyond a certain level because of the presence of feedback loops. In this paper, an alternative approach for the design of digit-serial architectures is presented based on a novel design methodology. This methodology permits bit-level pipelining of the digit-serial architectures by moving all feedback loops to the last stage of the design, thereby achieving sample speeds close to those of the corresponding bit-parallel multipliers with lower area. This increased sample speed can be traded for a reduction in power supply voltage, resulting in a significant reduction in power consumption. The proposed approach is applied to the design of various multipliers which form the backbone of digital signal processing computations. The results show that for transformed multipliers with smaller digit sizes (≤4), the singly-redundant multiplier consumes the least power, and for larger digit sizes, the type-I multiplier consumes the least power. It is also found that the optimum digit size for least power consumption in type-I and type-III multipliers is ~√(2W), where W represents the word length. Among the bit-level pipelined digit-serial multipliers, it is found that the redundant multiplier offers the best choice in terms of both latency and power consumption.
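As a rough illustration of the terms used above, the sketch below processes a W-bit word a few bits (one digit) per cycle and shows the carry that feeds back from one cycle to the next, which is the kind of loop that limits pipelining. It uses an adder rather than one of the paper's multipliers, purely for brevity; the function names are hypothetical, and `approx_optimum_digit_size` only evaluates the √(2W) rule of thumb quoted in the abstract.

```python
import math


def digit_serial_add(a, b, word_length, digit_size):
    """Add two unsigned words digit by digit, least significant digit first.

    Returns the sum modulo 2**word_length and the number of cycles used.
    """
    mask = (1 << digit_size) - 1
    carry, result = 0, 0
    cycles = math.ceil(word_length / digit_size)
    for i in range(cycles):
        shift = i * digit_size
        total = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (total & mask) << shift
        carry = total >> digit_size   # feedback: carry into the next cycle
    return result & ((1 << word_length) - 1), cycles


def approx_optimum_digit_size(word_length):
    """Evaluate the sqrt(2W) estimate quoted for type-I and type-III multipliers."""
    return math.sqrt(2 * word_length)
```

For example, a 16-bit addition with a digit size of 4 takes four cycles, and for W = 8 the estimate evaluates to a digit size of 4.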
International Conference on Acoustics, Speech, and Signal Processing | 2013
Yun Nan Chang; Keshab K. Parhi
Stochastic computing has recently gained attention due to its fault-tolerance property. In stochastic computing, numbers are represented by probabilities of sequences. This paper addresses implementation of inner products and digital filters using stochastic logic. Straightforward implementations of stochastic inner products and digital filters lead to significantly large output error. To overcome this, this paper proposes a novel scaling method for efficient stochastic logic implementations of inner products and digital filters. By incorporating the filter coefficients into the probability of the selection signals of the multiplexors, the proposed weighted summation circuit can achieve better signal scaling with lower cost than the one derived from a traditional structure. This paper also presents how to vary the seeds in stochastic filters in order to reduce the correlation. Implementing IIR filters using stochastic logic limits possible pole locations. To overcome this, a new stochastic IIR filter structure is presented that includes a binary multiplier and stochastic-to-binary and binary-to-stochastic converters. Our experimental results show that the proposed architecture for the inner-product unit can lead to more than 12 times reduction in the error-to-power ratio. The stochastic FIR filters can perform the desired filtering function, but their accuracy degrades with the increase of filter order. The direct-form stochastic IIR filters may fail for large filter orders, but their performance can be improved by using cascade-form filter architecture.
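The multiplexer-based weighted summation can be simulated in a few lines. This is a sketch under stated assumptions, not the paper's circuit: it uses the unipolar format (a value x in [0, 1] is the probability of a 1 in the stream), non-negative weights only, and software random numbers in place of hardware generators. The names `to_stream` and `mux_weighted_sum` are hypothetical.

```python
import random


def to_stream(value, length, rng):
    """Encode value in [0, 1] as a bit stream with P(bit == 1) = value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def mux_weighted_sum(xs, weights, length=20000, seed=0):
    """Estimate sum(w_i * x_i) / sum(w_i) with a stochastic multiplexer.

    At each clock tick the select signal picks input i with probability
    w_i / sum(w), so the coefficients live in the select probabilities
    rather than in separate multipliers.
    """
    rng = random.Random(seed)
    streams = [to_stream(x, length, rng) for x in xs]
    selects = rng.choices(range(len(xs)), weights=weights, k=length)
    out = [streams[s][t] for t, s in enumerate(selects)]
    return sum(out) / length   # fraction of ones = scaled inner product
```

The result is the inner product scaled by the sum of the weights; multiplying by `sum(weights)` recovers the unscaled value, and the estimate gets noisier as the stream length shrinks.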
International Conference on Computer Design | 1998
Yun Nan Chang; Keshab K. Parhi
This paper presents a fast highly regular digit-serial complex-number multiplier-accumulator (CMAC) architecture which is well suited for VLSI implementations. This paper makes two contributions. First, several complex-number representation schemes are discussed. It is shown that the real-imaginary alternate (RIA) scheme is the best among all representation schemes and the prior designs of CMACs based on the radix-(2j) redundant complex number system (RCNS) are not efficient with respect to hardware complexity and processing speed. Second, digit-serial CMAC architectures which can be pipelined at fine-grain level to increase the throughput rate are designed based on carry-save configuration.
Signal Processing Systems | 1999
Yun Nan Chang; Keshab K. Parhi
This paper presents an efficient implementation of the pipeline FFT processor based on the radix-4 decimation-in-time algorithm with the use of digit-serial arithmetic units. By splitting the sequential input sample into parallel digit-serial data streams, the proposed architecture can not only achieve nearly 100% hardware utilization, but also require much less memory compared with the previous digit-serial FFT processors. Furthermore, in FFT processors, several modules of ROM are required for the storage of twiddle factors. By exploiting the redundancy of the factors, the overall ROM size can be effectively reduced by a factor of 2.
International Symposium on Circuits and Systems | 1997
Yun Nan Chang; Janardhan H. Satyanarayana; Keshab K. Parhi
Digit-serial implementation styles are best suited for implementation of digital signal processing systems which require moderate sampling rates. Digit-serial multipliers obtained using traditional unfolding techniques cannot be pipelined beyond a certain level because of the presence of feedback loops. In this paper, an alternative approach for the design of digit-serial multipliers is presented based on a novel cell replacement transformation. This transformation permits bit-level pipelining of the digit-serial multipliers, thereby achieving sample speeds close to those of the corresponding bit-parallel multipliers with significantly lower area. This increased sample speed can be traded for a reduction in power supply voltage, resulting in a significant reduction in power consumption. The results show that for smaller digit sizes (≤4), the type-II multiplier consumes the least power, and for larger digit sizes, the type-I multiplier consumes the least power. It is also found that the optimum digit size for least power consumption in type-I and type-III multipliers is ~√(2W), where W represents the word length. The proposed digit-serial multipliers consume on average 1.75 times lower power than the traditional digit-serial architectures for the non-pipelined case, and about 15 times lower power for the bit-level pipelined case.
Custom Integrated Circuits Conference | 1999
Hiroshi Suzuki; Yun Nan Chang; Keshab K. Parhi
This paper presents a low-power bit-serial Viterbi decoder chip with the coding rate r=1/3 and the constraint length K=9 (256 states). This chip is targeted for high-speed convolutional decoding for next generation wireless applications. The add-compare-select (ACS) units have been designed using bit-serial arithmetic, and a power-efficient trace-back scheme and an application-specific memory have been developed for the trace-back operation. The chip was implemented using 0.5-μm CMOS technology and operates at 20 Mbps under 3.3 V and at 2 Mbps under 1.8 V. The power dissipation is only 9.8 mW at 2 Mbps operation under 1.8 V.
International Conference on Computer Design | 1996
Janardhan H. Satyanarayana; Keshab K. Parhi; Leilei Song; Yun Nan Chang
The paper presents a systematic theoretical approach for the analysis of bounds on power consumption in Baugh-Wooley, binary tree and Wallace tree multipliers. This is achieved by first developing state transition diagrams (STDs) for the subcircuits making up the multipliers. An STD comprises states and edges, with each edge representing a transition (switching activity) from one state to another in the subcircuit. Then, maximum (minimum) energy values associated with the edges constituting the STDs are used to derive the upper (lower) bound in both non-pipelined and p-bit level pipelined multipliers. It is shown that as p is decreased, the upper bound approaches the lower bound. Moreover, based on the theoretical analysis we conclude that the upper bound in a Baugh-Wooley multiplier has a cubic dependence on the word length, while that in a binary tree multiplier has a quadratic dependence on the word length.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | 2000
Yun Nan Chang; Keshab K. Parhi
The authors present a fast highly regular digit-serial complex multiplier (CMUL) architecture which is well suited for VLSI implementations. They make two contributions. First, several complex-number representation schemes are discussed. It is shown that the proposed real-imaginary alternate scheme is the best among all representation schemes and the prior designs of CMULs based on the radix-(2j) Redundant Complex Number System (RCNS) are not efficient with respect to hardware complexity and processing speed. Second, digit-serial CMUL architectures which can be pipelined at fine-grain level to increase the throughput rate are designed based on the carry-save configuration. The proposed design methodology can also result in low-power dissipation due to the reduced wiring complexity and glitching activity.