Walter Huang
Georgia Institute of Technology
Publication
Featured research published by Walter Huang.
IEEE Transactions on Circuits and Systems | 2005
Daniel J. Allred; Heejong Yoo; Venkatesh Krishnan; Walter Huang; David V. Anderson
We present a new hardware adaptive filter architecture for very high throughput LMS adaptive filters using distributed arithmetic (DA). DA uses bit-serial operations and look-up tables (LUTs) to implement high throughput filters that use only about one cycle per bit of resolution regardless of filter length. However, building adaptive DA filters requires recalculating the LUTs for each adaptation which can negate any performance advantages of DA filtering. By using an auxiliary LUT with special addressing, the efficiency and throughput of DA adaptive filters can be of the same order as fixed DA filters. In this paper, we discuss a new hardware adaptive filter structure for very high throughput LMS adaptive filters. We describe the development of DA adaptive filters and show that practical implementations of DA adaptive filters have very high throughput relative to multiply and accumulate architectures. We also show that DA adaptive filters have a potential area and power consumption advantage over digital signal processing microprocessor architectures.
international conference on acoustics, speech, and signal processing | 2004
Daniel J. Allred; Heejong Yoo; Venkatesh Krishnan; Walter Huang; David V. Anderson
In this paper, an FIR adaptive filter implementation, using a multiplier-free architecture, is presented. The implementation is based on distributed arithmetic (DA) which substitutes multiply-and-accumulate operations with a series of look-up-table (LUT) accesses. This can be achieved at the cost of a moderate increase in memory usage. The proposed design performs an LMS-type adaptation on a sample-by-sample basis. This is accomplished by an innovative LUT update using a matched auxiliary LUT. The system is implemented on an FPGA that enables rapid prototyping of digital circuits. Implementation results are provided to demonstrate that a high-speed, low logic complexity LMS adaptive filter can be realized employing the proposed architecture.
IEEE Transactions on Circuits and Systems | 2008
Erhan Ozalevli; Walter Huang; Paul E. Hasler; David Anderson
A reconfigurable implementation of distributed arithmetic (DA) for post-processing applications is described. The input of DA is received in digital form and its analog coefficients are set by using the floating-gate voltage references. The effect of the offset and gain errors on DA computational accuracy is analyzed, and theoretical results for the limitations of this design strategy are presented. This architecture is fabricated in a 0.5-μm CMOS process, and configured as a 16-tap finite impulse response (FIR) filter to demonstrate the reconfigurability and computational efficiency. The measurement results for comb, low-pass, and bandpass filters at 32/50-kHz sampling frequencies are presented. This implementation occupies around 1.125 mm² of die area and consumes 16 mW of static power. The filter order can be increased at the cost of 0.011 mm² of die area and 0.02 mW of power per tap.
field-programmable custom computing machines | 2004
Daniel J. Allred; Walter Huang; Venkatesh Krishnan; Heejong Yoo; David V. Anderson
In this paper, an FIR adaptive filter implementation using a multiplier-free architecture is presented. The implementation is based on distributed arithmetic (DA) which substitutes multiply-and-accumulate operations with a series of look-up-table (LUT) accesses. This can be achieved at the cost of a moderate increase in memory usage. The proposed design performs an LMS-type adaptation on a sample-by-sample basis. This is accomplished by an innovative LUT update using a matched auxiliary LUT. The system is implemented on an FPGA that enables rapid prototyping of digital circuits. Implementation results are provided to demonstrate that a high-speed LMS adaptive filter can be realized employing the proposed architecture.
asilomar conference on signals, systems and computers | 2003
Daniel J. Allred; Venkatesh Krishnan; Walter Huang; David V. Anderson
In this paper, a multiplexed multiplier architecture (MMA) for a field programmable gate array (FPGA) implementation of the least mean square (LMS) adaptive filter is developed and presented. In the proposed architecture, hardware multipliers are reused, i.e. multiplexed in time, for both filtering and adaptation. The number of multipliers may be chosen to achieve certain design trade-offs. The design trade-offs considered in this paper include on-chip area, filter size, maximum filter throughput, and power consumption.
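The multiplier budget discussed above follows from the LMS recursion itself: each input sample needs one multiply per tap for filtering and one more per tap for the weight update. The sketch below is a plain floating-point software illustration of that recursion, not the paper's FPGA architecture; the function names `lms_step` and `identify`, the step size, and the test signal are assumptions made for the example.

```python
import random


def lms_step(weights, window, desired, mu):
    """One LMS iteration: filter, compute the error, adapt the weights."""
    # Filtering: one multiply per tap.
    output = sum(w * x for w, x in zip(weights, window))
    error = desired - output
    # Adaptation: one more multiply per tap, after scaling the error by mu.
    scaled_error = mu * error
    new_weights = [w + scaled_error * x for w, x in zip(weights, window)]
    return new_weights, error


def identify(unknown, n_samples=2000, mu=0.1, seed=0):
    """Adapt a filter so that it matches a hypothetical unknown FIR system."""
    rng = random.Random(seed)
    taps = len(unknown)
    window = [0.0] * taps   # most recent sample first
    weights = [0.0] * taps
    for _ in range(n_samples):
        window = [rng.uniform(-1.0, 1.0)] + window[:-1]
        desired = sum(h * x for h, x in zip(unknown, window))
        weights, _ = lms_step(weights, window, desired, mu)
    return weights
```

Calling `identify([0.5, -0.25])` should return weights close to the unknown coefficients. In hardware, the two multiply groups in `lms_step` are the operations that a shared pool of multipliers would have to serve in turn.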
international conference on acoustics, speech, and signal processing | 2009
Walter Huang; David V. Anderson
An efficient way of computing the response of an adaptive digital filter is to use sliding-block distributed arithmetic (SBDA). One disadvantage of distributed arithmetic is the amount of memory utilized. By encoding the memory tables in offset binary code (OBC), the size of the memory tables is reduced by half. However, the computational workload remains unchanged. By modifying the computational flow, the computational workload can be reduced by almost half at the expense of slightly more memory. This modified SBDA structure is called SBDA-OBC. It has memory requirements 25%–50% lower than SBDA depending on the size of the sub-filter. In terms of the computational workload, SBDA-OBC is most advantageous for large sub-filters and when the filter is split into few sub-filters. In this case, the computational workload is reduced by almost half.
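The halving of the memory table under offset binary coding can be illustrated with a small integer sketch. It covers only the standard OBC table for a fixed-coefficient DA inner product, not the sliding-block or SBDA-OBC structure of the paper; the function names, integer coefficients, and two's-complement input format are assumptions made for the example. Each input bit is treated as a digit in {-1, +1}, so complementing an address only flips the sign of the table entry and half of the entries can be dropped.

```python
def build_obc_lut(coeffs):
    """Table of sum(+c or -c) for addresses whose top bit is 0 (half of 2**K)."""
    k = len(coeffs)
    return [
        sum(c if (addr >> i) & 1 else -c for i, c in enumerate(coeffs))
        for addr in range(1 << (k - 1))
    ]


def obc_lookup(lut, addr, k):
    """Read the half-size table; complemented addresses negate the entry."""
    if (addr >> (k - 1)) & 1:
        return -lut[~addr & ((1 << k) - 1)]
    return lut[addr]


def obc_dot(coeffs, samples, bits):
    """Inner product of coeffs and `bits`-wide two's-complement samples."""
    k = len(coeffs)
    lut = build_obc_lut(coeffs)
    mask = (1 << bits) - 1
    acc = -sum(coeffs)  # constant offset term of the OBC expansion
    for b in range(bits):
        # Address = bit b of every sample, one address bit per tap.
        addr = 0
        for i, s in enumerate(samples):
            addr |= (((s & mask) >> b) & 1) << i
        term = obc_lookup(lut, addr, k) * (1 << b)
        # The sign bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc // 2  # the OBC expansion yields twice the inner product
```

For four taps the table holds 8 entries instead of 16, while the loop still performs one lookup per input bit, which matches the observation that OBC alone leaves the workload unchanged.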
asilomar conference on signals, systems and computers | 2003
Walter Huang; Venkatesh Krishnan; Daniel J. Allred; Heejong Yoo
Distributed arithmetic (DA) is an efficient architecture for implementing finite impulse response (FIR) digital filters. The DA FIR filter calculates the filter output using look-up tables (LUTs) instead of multipliers. Thus, a DA-based implementation of an FIR filter is highly parameterizable and area efficient. Furthermore, the fundamental building blocks in the DA architecture map well to the architecture of today's field programmable gate arrays (FPGAs). In this paper, we analyze the design of an adaptive FIR filter using the DA architecture on an FPGA. The design trade-offs discussed in detail include throughput, number of logic elements utilized, memory usage, and power consumption estimates.
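As a rough software illustration of how a LUT can replace the multipliers, the sketch below computes one FIR output sample bit-serially. It is a minimal model with fixed integer coefficients and two's-complement inputs, not the adaptive FPGA design analyzed in the paper; `build_lut` and `da_dot` are hypothetical names chosen for the example.

```python
def build_lut(coeffs):
    """Precompute every combination sum of the coefficients (2**K entries)."""
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_dot(lut, samples, bits):
    """Inner product of the coefficients and `bits`-wide two's-complement samples."""
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Address = bit b of every sample, one address bit per tap.
        addr = 0
        for i, s in enumerate(samples):
            addr |= (((s & mask) >> b) & 1) << i
        # Shift-and-add replaces the multiplies; the table supplies the sum.
        term = lut[addr] * (1 << b)
        # The sign bit carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc
```

With `lut = build_lut([3, -1, 4, 2])`, the call `da_dot(lut, [5, -3, 7, -8], 4)` gives the same value as the direct sum of products. The loop runs once per bit of input resolution regardless of the number of taps, while the table grows as 2**K, which is the memory cost that the trade-off analysis has to weigh.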
signal processing systems | 2011
Walter Huang; David V. Anderson
An efficient way of computing the response of an adaptive digital filter is to use sliding-block distributed arithmetic (SBDA). However, a disadvantage of distributed arithmetic is the amount of memory utilized. By encoding the memory tables in offset binary code (OBC), the size of the memory tables is reduced by half, while the computational workload remains unchanged. By modifying the computational flow, the computational workload can be reduced by almost half at the expense of slightly more memory. This modified SBDA structure is called SBDA-OBC. It has memory requirements 25–50% lower than SBDA depending on the size of the sub-filter. In terms of the computational workload, SBDA-OBC is most advantageous for large sub-filters and when the filter is split into few sub-filters. In this case, the computational workload is reduced by almost half.
international symposium on circuits and systems | 2007
Erhan Ozalevli; Walter Huang; Paul E. Hasler; David V. Anderson
We present an implementation of a reconfigurable 16-tap finite impulse response filter for post-processing applications. This filter exploits the distributed arithmetic technique for signal processing and floating-gate voltage references for setting tunable analog coefficients. The filter is fabricated in a 0.5-μm CMOS process, and its order can be increased at the cost of 0.011 mm² of die area and 0.02 mW of power per tap. Measurement results for low-pass and band-pass filters at a 50-kHz sampling frequency are presented.
international midwest symposium on circuits and systems | 2006
Walter Huang; Venkatesh Krishnan; David V. Anderson
Adaptive filtering constitutes an important class of DSP algorithms employed in several handheld mobile devices for applications such as echo cancellation, signal de-noising, and channel equalization. In this paper, a new hardware architecture using conjugate distributed arithmetic (CDA), which is suitable for high-throughput hardware implementations of LMS adaptive filters, is presented. Unlike a traditional distributed arithmetic (DA) implementation, where all possible combination sums of the filter coefficients are stored in a look-up table (LUT), the CDA architecture stores all possible combination sums of the input signal samples in the LUT and updates them at the arrival of every sample using an efficient update procedure. We describe the design of CDA adaptive filters and show that practical implementations of CDA adaptive filters have very high throughput relative to multiply-and-accumulate architectures. We also show that CDA adaptive filters have a potential area and power consumption advantage over DSP microprocessor architectures for a given throughput.