Walter Huang

Georgia Institute of Technology

Publication

Featured research published by Walter Huang.

IEEE Transactions on Circuits and Systems | 2005

LMS adaptive filters using distributed arithmetic for high throughput

Daniel J. Allred; Heejong Yoo; Venkatesh Krishnan; Walter Huang; David V. Anderson

We present a new hardware adaptive filter architecture for very high throughput LMS adaptive filters using distributed arithmetic (DA). DA uses bit-serial operations and look-up tables (LUTs) to implement high throughput filters that use only about one cycle per bit of resolution regardless of filter length. However, building adaptive DA filters requires recalculating the LUTs for each adaptation, which can negate any performance advantages of DA filtering. By using an auxiliary LUT with special addressing, the efficiency and throughput of DA adaptive filters can be of the same order as fixed DA filters. We describe the development of DA adaptive filters and show that practical implementations of DA adaptive filters have very high throughput relative to multiply-and-accumulate architectures. We also show that DA adaptive filters have a potential area and power consumption advantage over digital signal processing microprocessor architectures.
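The basic DA idea in the abstract above (bit-serial operation plus a LUT, one table access per bit of resolution) can be illustrated with a minimal software sketch. This covers only a fixed, non-adaptive DA inner product; it is not the authors' hardware design or their auxiliary-LUT update scheme. The function names `build_lut` and `da_inner_product` are hypothetical, and integer two's-complement samples are assumed.

```python
def build_lut(weights):
    """Precompute all 2**K partial sums of the K filter coefficients."""
    return [
        sum(w for i, w in enumerate(weights) if (addr >> i) & 1)
        for addr in range(1 << len(weights))
    ]


def da_inner_product(lut, samples, bits):
    """Bit-serial inner product: one LUT access per bit of sample resolution.

    Assumption: every sample is an integer that fits in `bits`-bit
    two's complement.
    """
    acc = 0
    for j in range(bits):
        # Gather bit j of every sample into a single LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i
        term = lut[addr] * (1 << j)
        # The most significant bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


weights = [3, -2, 5, 1]
samples = [7, -8, -1, 4]
print(da_inner_product(build_lut(weights), samples, bits=4))  # 36
```

The outer loop runs once per bit, independent of the number of taps; in hardware the address gathering done by the inner loop is wiring rather than computation. The table has 2**K entries for K taps, which is the memory cost DA trades for removing the multipliers.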

International Conference on Acoustics, Speech, and Signal Processing | 2004

A novel high performance distributed arithmetic adaptive filter implementation on an FPGA

Daniel J. Allred; Heejong Yoo; Venkatesh Krishnan; Walter Huang; David V. Anderson

In this paper, an FIR adaptive filter implementation using a multiplier-free architecture is presented. The implementation is based on distributed arithmetic (DA), which substitutes multiply-and-accumulate operations with a series of look-up-table (LUT) accesses. This can be achieved at the cost of a moderate increase in memory usage. The proposed design performs an LMS-type adaptation on a sample-by-sample basis. This is accomplished by an innovative LUT update using a matched auxiliary LUT. The system is implemented on an FPGA that enables rapid prototyping of digital circuits. Implementation results are provided to demonstrate that a high-speed, low logic complexity LMS adaptive filter can be realized employing the proposed architecture.

IEEE Transactions on Circuits and Systems | 2008

A Reconfigurable Mixed-Signal VLSI Implementation of Distributed Arithmetic Used for Finite-Impulse Response Filtering

Erhan Ozalevli; Walter Huang; Paul E. Hasler; David Anderson

A reconfigurable implementation of distributed arithmetic (DA) for post-processing applications is described. The input of DA is received in digital form and its analog coefficients are set using floating-gate voltage references. The effect of the offset and gain errors on DA computational accuracy is analyzed, and theoretical results for the limitations of this design strategy are presented. This architecture is fabricated in a 0.5-μm CMOS process and configured as a 16-tap finite impulse response (FIR) filter to demonstrate the reconfigurability and computational efficiency. The measurement results for comb, low-pass, and bandpass filters at 32/50-kHz sampling frequencies are presented. This implementation occupies around 1.125 mm² of die area and consumes 16 mW of static power. The filter order can be increased at the cost of 0.011 mm² of die area and 0.02 mW of power per tap.

Field-Programmable Custom Computing Machines | 2004

An FPGA implementation for a high throughput adaptive filter using distributed arithmetic

Daniel J. Allred; Walter Huang; Venkatesh Krishnan; Heejong Yoo; David V. Anderson

In this paper, an FIR adaptive filter implementation using a multiplier-free architecture is presented. The implementation is based on distributed arithmetic (DA), which substitutes multiply-and-accumulate operations with a series of look-up-table (LUT) accesses. This can be achieved at the cost of a moderate increase in memory usage. The proposed design performs an LMS-type adaptation on a sample-by-sample basis. This is accomplished by an innovative LUT update using a matched auxiliary LUT. The system is implemented on an FPGA that enables rapid prototyping of digital circuits. Implementation results are provided to demonstrate that a high-speed LMS adaptive filter can be realized employing the proposed architecture.

Asilomar Conference on Signals, Systems and Computers | 2003

Implementation of an LMS adaptive filter on an FPGA employing multiplexed multiplier architecture

Daniel J. Allred; Venkatesh Krishnan; Walter Huang; David V. Anderson

In this paper, a multiplexed multiplier architecture (MMA) for a field programmable gate array (FPGA) implementation of the least mean square (LMS) adaptive filter is developed and presented. In the proposed architecture, hardware multipliers are reused, i.e. multiplexed in time, for both filtering and adaptation. The number of multipliers may be chosen to achieve certain design trade-offs. The design trade-offs considered in this paper include on-chip area, filter size, maximum filter throughput, and power consumption.

International Conference on Acoustics, Speech, and Signal Processing | 2009

Adaptive filters using modified sliding-block distributed arithmetic with offset binary coding

Walter Huang; David V. Anderson

An efficient way to compute the response of an adaptive digital filter is to use sliding-block distributed arithmetic (SBDA). One disadvantage of distributed arithmetic is the amount of memory utilized. By encoding the memory tables in offset binary code (OBC), the size of the memory tables is reduced by half. However, the computational workload remains unchanged. By modifying the computational flow, the computational workload can be reduced by almost half at the expense of slightly more memory. This modified SBDA structure is called SBDA-OBC. It has memory requirements 25%–50% lower than SBDA depending on the size of the sub-filter. In terms of the computational workload, SBDA-OBC is most advantageous for large sub-filters and when the filter is split into few sub-filters. In this case, the computational workload is reduced by almost half.
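The claim that offset binary coding halves the table size can be checked with a small sketch of the textbook OBC trick for a plain (non-sliding, non-adaptive) DA inner product. It does not reproduce the SBDA-OBC structure or its modified computational flow. The names `build_obc_lut`, `obc_lookup`, and `obc_inner_product` are hypothetical, and integer weights and two's-complement integer samples are assumed.

```python
def build_obc_lut(weights):
    """Half-size table: signed sums +/-w_i for all taps, last tap fixed at -w."""
    return [
        sum(w if (addr >> i) & 1 else -w for i, w in enumerate(weights[:-1]))
        - weights[-1]
        for addr in range(1 << (len(weights) - 1))
    ]


def obc_lookup(half, addr, taps):
    """Recover the full-table entry using the antisymmetry D(~a) = -D(a)."""
    low_mask = (1 << (taps - 1)) - 1
    if (addr >> (taps - 1)) & 1:
        return -half[~addr & low_mask]
    return half[addr & low_mask]


def obc_inner_product(half, weight_sum, samples, bits):
    """Bit-serial inner product over the half-size OBC table.

    Assumption: integer weights, and samples that fit in `bits`-bit
    two's complement.
    """
    taps = len(samples)
    acc = 0
    for j in range(bits):
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i
        term = obc_lookup(half, addr, taps) * (1 << j)
        # The sign bit contributes with negative weight.
        acc += -term if j == bits - 1 else term
    # OBC leaves a constant offset (the coefficient sum) and a factor of 2.
    return (acc - weight_sum) // 2


weights = [3, -2, 5, 1]
samples = [7, -8, -1, 4]
half = build_obc_lut(weights)
print(len(half))                                                  # 8
print(obc_inner_product(half, sum(weights), samples, bits=4))     # 36
```

The table holds 8 entries instead of the 16 a plain DA table needs for four taps, while the loop still performs one lookup per bit, which matches the abstract's remark that OBC alone halves memory but leaves the workload unchanged.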

Asilomar Conference on Signals, Systems and Computers | 2003

Design analysis of a distributed arithmetic adaptive FIR filter on an FPGA

Walter Huang; Venkatesh Krishnan; Daniel J. Allred; Heejong Yoo

Distributed arithmetic (DA) is an efficient architecture for implementing finite impulse response (FIR) digital filters. The DA FIR filter calculates the filter output using look-up tables (LUTs) instead of multipliers. Thus, a DA-based implementation of an FIR filter is highly parameterizable and area efficient. Furthermore, the fundamental building blocks in the DA architecture map well to the architecture of today's field programmable gate arrays (FPGAs). In this paper, we analyze the design of an adaptive FIR filter using the DA architecture on an FPGA. The design trade-offs discussed in detail include throughput, number of logic elements utilized, memory usage, and power consumption estimates.

Signal Processing Systems | 2011

Modified Sliding-Block Distributed Arithmetic with Offset Binary Coding for Adaptive Filters

Walter Huang; David V. Anderson

An efficient way to compute the response of an adaptive digital filter is to use sliding-block distributed arithmetic (SBDA). However, a disadvantage of distributed arithmetic is the amount of memory utilized. By encoding the memory tables in offset binary code (OBC), the size of the memory tables is reduced by half, while the computational workload remains unchanged. By modifying the computational flow, the computational workload can be reduced by almost half at the expense of slightly more memory. This modified SBDA structure is called SBDA-OBC. It has memory requirements 25–50% lower than SBDA depending on the size of the sub-filter. In terms of the computational workload, SBDA-OBC is most advantageous for large sub-filters and when the filter is split into few sub-filters. In this case, the computational workload is reduced by almost half.

International Symposium on Circuits and Systems | 2007

VLSI Implementation of a Reconfigurable Mixed-Signal Finite Impulse Response Filter

Erhan Ozalevli; Walter Huang; Paul E. Hasler; David V. Anderson

We present an implementation of a reconfigurable 16-tap finite impulse response filter for post-processing applications. This filter exploits the distributed arithmetic technique for signal processing and floating-gate voltage references for setting tunable analog coefficients. The filter is fabricated in a 0.5-μm CMOS process, and its order can be increased at the cost of 0.011 mm² of die area and 0.02 mW of power per tap. Measurement results for low-pass and band-pass filters at a 50-kHz sampling frequency are presented.

International Midwest Symposium on Circuits and Systems | 2006

Conjugate Distributed Arithmetic Adaptive FIR Filters and their Hardware Implementation

Walter Huang; Venkatesh Krishnan; David V. Anderson

Adaptive filtering constitutes an important class of DSP algorithms employed in several handheld mobile devices for applications such as echo cancellation, signal de-noising, and channel equalization. In this paper, a new hardware architecture using conjugate distributed arithmetic (CDA), which is suitable for high throughput hardware implementations of LMS adaptive filters, is presented. Unlike a traditional distributed arithmetic (DA) implementation, where all possible combination sums of the filter coefficients are stored in a look-up table (LUT), the CDA architecture stores all possible combination sums of the input signal samples in the LUT and updates them at the arrival of every sample using an efficient update procedure. We describe the design of CDA adaptive filters and show that practical implementations of CDA adaptive filters have very high throughput relative to multiply-and-accumulate architectures. We also show that CDA adaptive filters have a potential area and power consumption advantage over DSP microprocessor architectures for a given throughput.
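The role swap described in this abstract (the LUT holds combination sums of input samples and is addressed by coefficient bits) can be sketched in software as follows. The sliding update shown here is one straightforward possibility, assumed for illustration; the abstract does not specify the paper's update procedure, and the LMS coefficient adaptation is left out. The names `cda_update` and `cda_output` are hypothetical, and integer coefficients in two's complement are assumed.

```python
def cda_update(lut, x_new):
    """Slide the sample window by one: the newest sample takes address bit 0.

    Each new entry is one old entry plus at most one addition. The oldest
    sample drops out because `a >> 1` never sets the old table's top bit.
    """
    return [(x_new if a & 1 else 0) + lut[a >> 1] for a in range(len(lut))]


def cda_output(lut, weights, bits):
    """Filter output using coefficient bits as LUT addresses.

    Assumption: weights are integers that fit in `bits`-bit two's complement;
    weights[i] multiplies the sample that arrived i steps ago.
    """
    acc = 0
    for j in range(bits):
        addr = 0
        for i, w in enumerate(weights):
            addr |= ((w >> j) & 1) << i
        term = lut[addr] * (1 << j)
        # The sign bit of the coefficients carries negative weight.
        acc += -term if j == bits - 1 else term
    return acc


weights = [2, -3, 1]
lut = [0] * (1 << len(weights))      # empty window: all sums are zero
for x in [5, -1, 4, 7]:
    lut = cda_update(lut, x)
print(cda_output(lut, weights, bits=3))  # 1, i.e. 2*7 - 3*4 + 1*(-1)
```

Because the table depends only on the input samples, changing the coefficients requires no table rebuild, which is why this arrangement suits adaptive filters whose coefficients change every sample.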
