Mathias Faust
Nanyang Technological University
Publications
Featured research published by Mathias Faust.
International Symposium on Circuits and Systems | 2010
Mathias Faust; Chip-Hong Chang
Research on the optimization of fixed-coefficient FIR filters modeled as Multiple Constant Multiplication (MCM) has been ongoing for two decades. An analysis of the Minimal Signed Digit (MSD) representation reveals that potentially good solutions are missed by Common Subexpression Elimination (CSE) algorithms because they are hidden in the MSD representations. Some CSE algorithms ensure that all coefficients are implemented at minimal Logic Depth (LD), which is advantageous from a power-saving perspective. Imposing this requirement on a graph-dependent (GD) algorithm reduces both the search space and the runtime. It also eliminates the long critical path of GD algorithms. This paper presents a minimal logic depth GD algorithm that requires no lookup table. Simulation results show that it uses fewer adders than CSE algorithms while retaining minimal logic depth. For all filters tested, it consumes less switching power than the latest LD-constrained GD methods based on the Glitch Path Count and Glitch Path Score metrics.
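The abstract relies on signed-digit representations and on minimal logic depth. As a minimal illustration (not the authors' algorithm), the Python sketch below computes the canonical signed digit (CSD) form of a constant, which is one particular MSD representation, and derives the lower bound on adder depth from its nonzero-digit count, using the known fact that a two-input adder can at most double the number of nonzero digits. The function names are hypothetical.

```python
import math


def csd_digits(n: int) -> list[int]:
    """Canonical signed digits of n, least significant digit first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def nonzero_digits(n: int) -> int:
    """Number of nonzero CSD digits (the minimal signed-digit weight of n)."""
    return sum(1 for d in csd_digits(n) if d != 0)


def min_logic_depth(n: int) -> int:
    """Lower bound on the adder depth of x*n: each adder at most doubles the nonzero digits."""
    nz = nonzero_digits(n)
    return math.ceil(math.log2(nz)) if nz > 1 else 0


for c in (7, 23, 45):
    # constant, binary nonzero bits, CSD nonzero digits, minimal adder depth
    print(c, bin(c).count("1"), nonzero_digits(c), min_logic_depth(c))
```

For 23, the CSD form 32 - 8 - 1 has three nonzero digits against four in binary (16 + 4 + 2 + 1), so two adders at depth 2 suffice for that constant alone. 23 also has a second minimal form, 16 + 8 - 1, which is the kind of alternative MSD representation the abstract refers to.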
International Symposium on Circuits and Systems | 2012
Martin Kumm; Peter Zipf; Mathias Faust; Chip-Hong Chang
This paper addresses the direct optimization of pipelined adder graphs (PAGs) for high-speed multiple constant multiplication (MCM). The optimization opportunities are described, and a definition of the pipelined multiple constant multiplication (PMCM) problem is given. It is shown that the PMCM problem is a generalization of the MCM problem with limited adder depth (AD). A novel heuristic algorithm for the PMCM problem, called RPAG, is presented. RPAG outperforms previous methods, which are based on pipelining the solutions of conventional MCM algorithms. A flexible cost evaluation enables optimization for FPGA or ASIC targets at high or low abstraction levels. Results for both technologies are given and compared with the most recent methods. Even for the special case of limited AD, RPAG is shown to often produce better results than the prominent Hcub algorithm with a minimal total AD constraint.
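To make the pipelining cost concrete, here is a minimal Python sketch of the baseline the abstract compares against: take a conventional (non-pipelined) MCM adder graph, register every adder output, and count the extra delay registers needed to keep all paths aligned. It does not implement RPAG. The graph encoding (`Adder` with left-shifted operands) and the function names are hypothetical choices for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adder:
    """One adder node: value = (a << sa) + sign * (b << sb), with a and b earlier node values."""
    a: int
    sa: int
    b: int
    sb: int
    sign: int  # +1 for an adder, -1 for a subtractor


def adder_depths(graph: dict[int, Adder]) -> dict[int, int]:
    """Check every node and return its adder depth; the input x is node 1 at depth 0."""
    depth = {1: 0}
    for value, node in graph.items():  # nodes must be listed in topological order
        computed = (node.a << node.sa) + node.sign * (node.b << node.sb)
        if computed != value:
            raise ValueError(f"node {value} actually computes {computed}")
        depth[value] = 1 + max(depth[node.a], depth[node.b])
    return depth


def extra_registers(graph: dict[int, Adder], outputs: set[int]) -> int:
    """Delay registers needed, besides one per adder, when every adder output is registered."""
    depth = adder_depths(graph)
    final_stage = max(depth[v] for v in outputs)
    delay = {v: 0 for v in depth}  # stages each node value must additionally be held for
    for value, node in graph.items():
        for src in (node.a, node.b):
            # an adder at depth d reads its operands from pipeline stage d - 1
            delay[src] = max(delay[src], depth[value] - 1 - depth[src])
    for v in outputs:
        delay[v] = max(delay[v], final_stage - depth[v])  # all outputs leave together
    return sum(delay.values())


graph = {
    7: Adder(1, 3, 1, 0, -1),   # 7x = 8x - x
    23: Adder(1, 4, 7, 0, +1),  # 23x = 16x + 7x
}
print(adder_depths(graph))              # {1: 0, 7: 1, 23: 2}
print(extra_registers(graph, {7, 23}))  # 2
```

Here x must be delayed one stage to feed the depth-2 adder for 23, and 7 must be delayed one stage to appear at the output together with 23, giving two balancing registers on top of the two adders. These balancing registers are the overhead that pipelining a conventional MCM solution incurs.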
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2010
Chip-Hong Chang; Mathias Faust
A thorough analysis of the paper above revealed several controversial arguments about the superiority of binary representation over canonical signed digits (CSD) for common subexpression elimination (CSE). It was improper to model the number of logic operators (LO) required after CSE as a linear sum of independently weighted numbers of nonzero bits, common subexpressions, and unpaired bits. The logic depth (LD) penalty of binary CSE had been de-emphasized by errors in the reported LD. This comment corrects the LD of the contention resolution algorithm and points out some contradictions with the latest experiments on binary, CSD, and minimal signed digit number representations for CSE. After correcting the error in the reported filter lengths for different stopband attenuations of the digital advanced mobile phone system specification, the LO and LD data of the CSE algorithms compared in the above paper are recalculated using the corrected filter coefficient sets.
International Symposium on Circuits and Systems | 2009
Mathias Faust; Chip-Hong Chang
Over the last two decades, fixed-coefficient FIR filters have generally been optimized by minimizing the number of adders required to implement the multiplier block in the transposed direct-form filter structure. In this paper, an optimization method for the structural adders in the transposed tapped delay line is proposed. Although additional registers are required, an optimal trade-off can be made such that the overall combinational logic is reduced. For the majority of taps, the delay through the structural adder is shortened; the exception is the last tap. The one-full-adder delay increase for the last optimized tap is tolerable, as in most cases it does not fall on the critical path. The criterion under which area reduction is possible is derived analytically, and an area reduction of up to 4.5% for the structural adder block of three benchmark filters is estimated theoretically. The saving becomes more prominent as the number of taps grows. Actual synthesis results obtained with Synopsys Design Compiler and 0.18 µm TSMC CMOS libraries show a total area reduction of up to 13.13% when combined with common subexpression elimination. In all examples, up to 11.96% of the total area saved was due to the reduction of structural adder costs by the proposed method.
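For readers new to the transposed direct form, the behavioral Python sketch below separates the multiplier block (all products h[k] * x[n] of one input sample, which MCM optimizes) from the structural adders that accumulate partial sums along the register chain. It models the filter structure only, not the proposed structural-adder optimization, and checks itself against a direct convolution. The function names are hypothetical.

```python
def transposed_fir(h: list[int], x: list[int]) -> list[int]:
    """Behavioral model of a transposed direct-form FIR filter."""
    regs = [0] * (len(h) - 1)  # registers between the structural adders
    y = []
    for sample in x:
        products = [c * sample for c in h]  # multiplier block: one input, many constants
        y.append(products[0] + (regs[0] if regs else 0))  # output structural adder
        # each structural adder adds its product to the partial sum from the next tap;
        # ascending order reads regs[i + 1] before it is overwritten
        for i in range(len(regs)):
            later = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = products[i + 1] + later
    return y


def direct_fir(h: list[int], x: list[int]) -> list[int]:
    """Reference convolution y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


h = [3, -1, 4, 1, 5]
x = [2, 7, 1, 8, 2, 8]
print(transposed_fir(h, x) == direct_fir(h, x))  # True
```

The structural adders carry the full accumulated word length along the chain, which is why their area matters once the multiplier block has been minimized.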
IEEE Transactions on Circuits and Systems II: Express Briefs | 2011
Ruimin Huang; Chip-Hong Chang; Mathias Faust; Niklas Lotze; Yiannos Manoli
This brief proposes a new approach to utilizing a positive-offset representation for sign-extension avoidance in the shift-and-add implementation of a finite-impulse response filter. Affine arithmetic is used to model the excess offsets in order to curtail the word-length (WL) expansion problem. Tighter, probabilistically justified WL bounds are determined so that further offset can be removed from each tap. The approach remains applicable even after the redundant adders in the multiplier block of the filter have been minimized. Our simulation results show an average power reduction of about 19% over and above the savings achieved by sharing adders in multiple constant multiplication.
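The core idea of a positive-offset representation can be shown with plain integers. In the minimal sketch below (an illustration only; the affine-arithmetic word-length bounds of the brief are not modeled), each signed W-bit operand is mapped to a non-negative value by adding 2^(W-1), which in two's complement amounts to inverting the sign bit. The shifted non-negative operands are then summed without any sign extension, and the accumulated offset, a design-time constant, is subtracted once at the end. `W = 8` and the function names are hypothetical.

```python
W = 8  # hypothetical operand word length


def to_offset(v: int, w: int = W) -> int:
    """Signed w-bit value -> non-negative value v + 2**(w-1) (sign-bit inversion)."""
    if not -(1 << (w - 1)) <= v < (1 << (w - 1)):
        raise ValueError("value does not fit in w bits")
    return v + (1 << (w - 1))


def shift_add_with_offset(values: list[int], shifts: list[int], w: int = W) -> int:
    """Compute sum(v << s) from non-negative operands only, then remove the known offset."""
    unsigned_sum = sum(to_offset(v, w) << s for v, s in zip(values, shifts))
    offset = sum((1 << (w - 1)) << s for s in shifts)  # fixed at design time
    return unsigned_sum - offset


print(shift_add_with_offset([-5, 100, -128], [0, 2, 3]))  # -629
```

Because every operand is non-negative, the adders never need to replicate sign bits; the price is the extra offset that must be tracked and removed, which is what the brief's tighter word-length bounds aim to keep small.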
International Symposium on Circuits and Systems | 2011
Mathias Faust; Chip-Hong Chang
Research on the optimization of Multiple Constant Multiplication (MCM) during the last two decades has focused mainly on common subexpression elimination and reduced adder graph algorithms when bit-parallel computation is required. The advancement of FPGA technology enables the implementation of complex MCM instances on FPGAs, but a shift-and-add network implementation does not make full use of fundamental FPGA resources such as Look-Up Tables (LUTs). Since bit-serial implementations optimized for FPGAs are slow, a bit-parallel LUT-based implementation for single constant multiplication has previously been attempted. This paper extends this LUT-based method to multiple constant multiplication. It presents the interesting and unexpected insight that the maximal number of LUTs required can be kept far below the theoretical number by mere enumeration, without considering the legitimacy of all possible output combinations. Simulation results show that the required logic slices are comparable to those of traditional adder-based MCM optimization methods, while the delay is reduced by approximately 33%. The advantages become more prominent as the number of constants and their bit width increase.
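A behavioral sketch of the general LUT-based idea: the input is split into k-bit slices, each slice addresses a precomputed table of products, and the aligned partial products are added. On an FPGA each output bit of such a table maps onto LUT primitives; this Python model ignores that mapping and does not reproduce the paper's enumeration-based reduction of the LUT count. The slice width `k`, the unsigned input, and the function names are assumptions made for illustration.

```python
def build_luts(constants: list[int], k: int) -> dict[int, list[int]]:
    """One table per constant: entry j holds constant * j for every k-bit slice value j."""
    return {c: [c * j for j in range(1 << k)] for c in constants}


def lut_mcm(x: int, luts: dict[int, list[int]], width: int, k: int) -> dict[int, int]:
    """Multiply an unsigned width-bit x by every constant using k-bit table lookups."""
    mask = (1 << k) - 1
    results = {}
    for c, table in luts.items():
        acc = 0
        for pos in range(0, width, k):
            chunk = (x >> pos) & mask   # slice of the input that addresses the table
            acc += table[chunk] << pos  # align and add the partial product
        results[c] = acc
    return results


luts = build_luts([7, 23, 45], k=4)
print(lut_mcm(183, luts, width=8, k=4))  # {7: 1281, 23: 4209, 45: 8235}
```

All constants share the same input slices, so only the table contents differ per constant, which is what makes extending the single-constant method to MCM natural.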
Asilomar Conference on Signals, Systems and Computers | 2010
Mathias Faust; Oscar Gustafsson; Chip-Hong Chang
The problem of reconfigurable multiple constant multiplication (ReMCM) is to find a cost-effective network of shifts, additions, subtractions, and multiplexers that implements the multiplication of a single input variable by one of several sets of coefficients. Most previous publications focus only on the single-output problem, whereas the algorithm proposed here solves the multiple-output ReMCM problem using an adder-graph-based minimal logic depth approach. Minimal logic depth restricts the length of the critical path, and previous work has shown that minimum-depth MCM is advantageous in terms of power consumption. The adder-graph heuristic offers more possibilities for adder formation, reducing the total number of adders and multiplexers. For polyphase decimation filters, the relation between filter length and decimation factor has been shown to influence the implementation cost. Experimental results show that a ReMCM can be implemented with up to 38% less area for a decimation factor of 8 than a parallel implementation of the polyphase subfilters, while single-output problems can also be solved with results comparable to known algorithms.
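The polyphase decomposition mentioned above can be checked numerically. In the minimal Python sketch below, an FIR decimator with factor m is split into m subfilters whose coefficient sets are every m-th tap of h, and the result is compared with filtering at the full rate and then downsampling. In a time-multiplexed implementation, each incoming sample is multiplied by the coefficient set of its phase, which is the situation a ReMCM block serves; the sketch itself only demonstrates the decomposition and uses plain multiplication. The function names are hypothetical.

```python
def fir(h: list[int], x: list[int]) -> list[int]:
    """Reference convolution y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def decimate_direct(h: list[int], x: list[int], m: int) -> list[int]:
    """Filter at the full rate, then keep every m-th output."""
    return fir(h, x)[::m]


def decimate_polyphase(h: list[int], x: list[int], m: int) -> list[int]:
    """Same result computed at the low rate with m subfilters e_p[j] = h[j*m + p]."""
    n_out = (len(x) + m - 1) // m
    y = [0] * n_out
    for p in range(m):
        e_p = h[p::m]  # p-th polyphase subfilter: its own coefficient set
        for n in range(n_out):
            for j, coef in enumerate(e_p):
                idx = (n - j) * m - p  # input sample x[n*m - (j*m + p)]
                if 0 <= idx < len(x):
                    y[n] += coef * x[idx]
    return y


h = [1, 3, -2, 5, 4, -1, 2]
x = [4, -2, 7, 0, 1, 3, -5, 6, 2, 8]
print(decimate_polyphase(h, x, 3) == decimate_direct(h, x, 3))  # True
```

Each subfilter has about len(h)/m taps, so the filter length relative to the decimation factor determines how many coefficients each set contains.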
European Conference on Circuit Theory and Design | 2011
Mathias Faust; Chip-Hong Chang
The optimization of fixed-coefficient FIR filter implementations has focused mainly on the multiplier block, where full-precision fixed-point arithmetic is normally used. Recently, an optimization method was proposed for the structural adders in FIR filters. This paper further proposes a method for gradually reducing the number of fractional bits within the structural adder block such that the output has the same number of fractional bits as the input signal. The resulting output signal is very close to the rounded signal obtained from a full-precision calculation. This is achieved by applying truncation and round-half-up operations to the inputs of the structural adders. The proposed method reduces the area of the FIR filter implementation, and the magnitude of the error is no larger than one LSB. Example filters were synthesized, and the simulation results show an error mean of less than 0.25% of the LSB and a variance of less than 15% of the LSB. Overall, the areas of the example filters were reduced by up to 12.42%.
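The two primitives named in the abstract act on fixed-point values stored as integers with f fractional bits. The Python sketch below shows them in isolation and does not reproduce the paper's schedule for where each is applied along the structural adder chain. Truncation is an arithmetic right shift: it never increases the value and removes less than one output LSB. Round-half-up adds half an LSB before shifting, so its error stays within half an output LSB. The value of `f` and the function names are hypothetical.

```python
def truncate(v: int, f: int) -> int:
    """Drop f fractional bits of a two's-complement integer (rounds toward -infinity)."""
    return v >> f


def round_half_up(v: int, f: int) -> int:
    """Drop f fractional bits, rounding to nearest with ties toward +infinity."""
    return (v + (1 << (f - 1))) >> f if f > 0 else v


f = 3  # hypothetical number of fractional bits to remove
for v in (-13, -12, -4, 5, 12, 20):
    # real value, truncated integer, rounded integer
    print(v / 2**f, truncate(v, f), round_half_up(v, f))
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, matching two's-complement hardware behavior.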
International Conference on Digital Signal Processing | 2015
Mathias Faust; Martin Kumm; Chip-Hong Chang; Peter Zipf
Pipelining is a common method for implementing high-speed FIR filters. While the efficient pipelining of multiplications is well understood, no attention has so far been paid to the pipelining of structural adders. The delay of structural adders becomes crucial in high-speed designs because they have the largest word size in non-truncated FIR filters and typically lie on the critical path. The common pipelining method results in excessive register overhead when applied to the structural adders, as many additional paths have to be delayed. This paper proposes an efficient method for pipelining structural adders using a partially redundant number representation. With an area overhead of only 5.4%, the throughput of the structural adders can be doubled, and a speedup factor of up to 7 can be achieved with an area overhead of 26.7%.
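The abstract does not spell out the partially redundant representation, so the following is only a generic sketch of the idea, assuming a carry-save-like split at fixed chunk boundaries. A width-bit accumulator is kept as k-bit chunk sums plus one stored carry bit per chunk boundary. Adding an operand then needs only independent (k+1)-bit additions, because carries are saved rather than rippled across the full word, and a single carry-propagating addition at the end converts back to binary. The chunk width `k` and all function names are hypothetical.

```python
def to_chunks(value: int, width: int, k: int) -> list[int]:
    """Split an unsigned width-bit value into k-bit chunks, least significant first."""
    return [(value >> pos) & ((1 << k) - 1) for pos in range(0, width, k)]


def redundant_add(sums: list[int], carries: list[int], operand: int,
                  width: int, k: int) -> tuple[list[int], list[int]]:
    """Add a plain binary operand to a partially redundant number (sums, carries).

    The value represented is sum((s_i + c_i) << (i * k)); chunk i adds its own
    digits only and saves its carry-out for chunk i + 1 instead of rippling it on.
    """
    new_sums, new_carries = [], [0]  # the lowest chunk never receives a stored carry
    for s, c, b in zip(sums, carries, to_chunks(operand, width, k)):
        t = s + c + b                   # at most 2**(k + 1) - 1, a (k+1)-bit result
        new_sums.append(t & ((1 << k) - 1))
        new_carries.append(t >> k)      # 0 or 1
    return new_sums, new_carries[:-1]   # drop the top carry: arithmetic mod 2**width


def resolve(sums: list[int], carries: list[int], width: int, k: int) -> int:
    """One final carry-propagating addition back to ordinary binary."""
    total = sum((s + c) << (i * k) for i, (s, c) in enumerate(zip(sums, carries)))
    return total % (1 << width)


width, k = 16, 4  # hypothetical word length and chunk width (width a multiple of k)
sums, carries = [0] * (width // k), [0] * (width // k)
operands = [0xBEEF, 0x1234, 0xFFFF, 7]
for op in operands:
    sums, carries = redundant_add(sums, carries, op, width, k)
print(resolve(sums, carries, width, k) == sum(operands) % (1 << width))  # True
```

The stored carry bits are the cost of this scheme: they add register bits in exchange for a carry chain only k + 1 bits long in each accumulation step.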
IEEE Transactions on Circuits and Systems I: Regular Papers | 2018
Jiajia Chen; Chip-Hong Chang; Jiatao Ding; Rui Qiao; Mathias Faust
Finite-impulse response (FIR) filters are widely used in digital signal processing applications. Prodigious research over the past two decades has substantially reduced the implementation cost of the multiple constant multiplication blocks. Further area and power savings are held back by the structural adders and registers in the tap delay-and-accumulate line, which unfortunately dominate the overall hardware cost of an FIR filter and are difficult to minimize with existing resource-sharing approaches. Retiming or relocating the structural adders and registers can only improve the throughput. To close the area-power efficiency gap, we reformulate the filter coefficient synthesis problem to explore the design space of the tap delay-and-accumulate line by bisecting it at some tap position. An efficient genetic algorithm is proposed to solve this integer programming problem with quadratic computational complexity by refining the search space to find an optimized solution that fulfills the frequency response specifications. Field-programmable gate array and application-specific integrated circuit logic synthesis results for twelve benchmark filter specifications show that the average area and power consumption of the solutions generated by the proposed algorithm are reduced by up to 26.8% and 27.5%, respectively, compared with solutions obtained by existing design methods.