Hadi Parandeh-Afshar

École Polytechnique Fédérale de Lausanne

Publication

Featured research published by Hadi Parandeh-Afshar.

Asia and South Pacific Design Automation Conference | 2008

Efficient synthesis of compressor trees on FPGAs

Hadi Parandeh-Afshar; Philip Brisk; Paolo Ienne

FPGA performance is currently lacking for arithmetic circuits. Summing k > 2 integer values is a computationally intensive operation in applications such as digital signal and video processing. In ASIC design, compressor trees, such as Wallace and Dadda trees, are used for parallel accumulation; however, the LUT structure and fast carry-chains employed by modern FPGAs favor trees of carry-propagate adders (CPAs), which are a poor choice for ASIC design. This paper presents the first method to successfully synthesize compressor trees on LUT-based FPGAs. In particular, we have found that generalized parallel counters (GPCs) map quite well to LUTs on FPGAs; a heuristic, presented within, constructs a compressor tree from a library of GPCs that can efficiently be implemented on the target FPGA. Compared to the ternary adder trees produced by commercial synthesis tools, our heuristic reduces the combinational delay by 27.5%, on average, within a tolerable average area increase of 5.7%.
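
As a rough illustration of the GPC idea mentioned in this abstract, the sketch below models a generalized parallel counter behaviorally: it takes input bits grouped by binary weight and returns their weighted sum as a binary word. The function name `gpc` and the list-of-columns input layout are assumptions made for this example; the abstract does not specify the paper's GPC library or its mapping heuristic, and neither is reproduced here.

```python
def gpc(columns):
    """Behavioral model of a generalized parallel counter (illustrative only).

    columns[i] is a list of input bits (0 or 1) that all carry weight 2**i.
    Returns the weighted sum of the inputs as a list of bits, LSB first,
    using just enough output bits to hold the largest possible sum.
    """
    # Weighted sum of the actual input bits.
    total = sum(bit << rank for rank, bits in enumerate(columns) for bit in bits)
    # Largest sum the counter could ever produce (all inputs set to 1).
    max_total = sum(len(bits) << rank for rank, bits in enumerate(columns))
    width = max(1, max_total.bit_length())
    return [(total >> i) & 1 for i in range(width)]


# A (3;2) counter is a full adder: three weight-1 bits in, two bits out.
full_adder_out = gpc([[1, 1, 1]])

# A (1,5;3) counter: five weight-1 bits and one weight-2 bit in, three bits out.
gpc_15_3_out = gpc([[1, 0, 1, 1, 1], [1]])
```

The (1,5;3) case has six inputs, which is why counters of this shape are a plausible fit for 6-input LUTs; which GPCs are actually efficient depends on the target device.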

Field-Programmable Logic and Applications | 2009

Exploiting fast carry-chains of FPGAs for designing compressor trees

Hadi Parandeh-Afshar; Philip Brisk; Paolo Ienne

Fast carry chains featuring dedicated adder circuitry are a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic blocks of FPGAs for fast addition. Conventional intuition is that such carry chains can be used only for implementing carry-propagate addition; state-of-the-art FPGA synthesizers can only exploit the carry chains for these specific circuits. This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and partial product reduction for parallel multipliers implemented in FPGA logic. The key to our technique is to program the lookup tables (LUTs) in the logic blocks to stop the propagation of carry bits along the carry chain at appropriate points. This approach improves the area of compressor trees significantly compared to previous methods that synthesized compressor trees solely on LUTs, without compromising the performance gain over trees built from ternary carry-propagate adders.

Design, Automation, and Test in Europe | 2008

Improving synthesis of compressor trees on FPGAs via integer linear programming

Hadi Parandeh-Afshar; Philip Brisk; Paolo Ienne

Multi-input addition is an important operation for many DSP and video processing applications. On FPGAs, multi-input addition has traditionally been implemented using trees of carry-propagate adders. This approach has been used because the traditional lookup table (LUT) structure of FPGAs is not amenable to compressor trees, which are used to implement multi-input addition and parallel multiplication in ASIC technology. In prior work, we developed a greedy heuristic method to map compressor trees onto the general logic of an FPGA using a component called a generalized parallel counter (GPC). Although this technique reduced the combinational delay of our circuits, when synthesized onto Altera Stratix-II FPGAs, by 27% on average, it increased the area by an average of 11%. To further reduce the delay and limit the increase in area, we have developed a new solution to the mapping problem based on integer linear programming. This new approach reduced the delay of the compressor tree by 32% on average and reduced the area by 3% compared to an adder tree.

ACM Transactions on Reconfigurable Technology and Systems | 2011

Compressor tree synthesis on commercial high-performance FPGAs

Hadi Parandeh-Afshar; Arkosnato Neogy; Philip Brisk; Paolo Ienne

Compressor trees are a class of circuits that generalizes multioperand addition and the partial product reduction trees of parallel multipliers using carry-save arithmetic. Compressor trees naturally occur in many DSP applications, such as FIR filters, and, in the more general case, their use can be maximized through the application of high-level transformations to arithmetically intensive data flow graphs. Due to the presence of carry-chains, it has long been thought that trees of 2- or 3-input carry-propagate adders are more efficient than compressor trees for FPGA synthesis; however, this is not the case. This article presents a heuristic for FPGA synthesis of compressor trees that outperforms adder trees and exploits carry-chains when possible. The experimental results show that, on average, the use of compressor trees can reduce critical path delay by 33% and 45% respectively, compared to adder trees synthesized on the Xilinx Virtex-5 and Altera Stratix III FPGAs.
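
To make the carry-save idea behind compressor trees concrete, here is a minimal word-level sketch, assuming non-negative Python integers as operands. The helper names `csa` and `compressor_tree_sum` are hypothetical, and the grouping strategy is a plain layer-by-layer 3:2 reduction, not the synthesis heuristic the article describes. Each 3:2 step turns three operands into a sum word and a carry word without propagating carries; only the final two words need a carry-propagate addition.

```python
def csa(a, b, c):
    """3:2 carry-save step: returns (sum_word, carry_word) with
    sum_word + carry_word == a + b + c, computed without carry propagation."""
    sum_word = a ^ b ^ c                             # per-bit sum, carries ignored
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted up
    return sum_word, carry_word


def compressor_tree_sum(operands):
    """Reduce any number of non-negative integers to two words with 3:2 steps,
    then finish with a single ordinary (carry-propagate) addition."""
    ops = list(operands)
    while len(ops) > 2:
        next_ops = []
        # Compress complete groups of three operands into two words each.
        for i in range(0, len(ops) - 2, 3):
            next_ops.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        # Pass through the one or two operands that did not fill a group.
        next_ops.extend(ops[len(ops) - len(ops) % 3:])
        ops = next_ops
    return sum(ops)  # stands in for the final carry-propagate adder


result = compressor_tree_sum([13, 7, 22, 5, 9, 1, 30])
```

In hardware the benefit is that each reduction layer has constant depth regardless of word width, whereas every adder in an adder tree incurs a full carry-propagation delay.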

Field Programmable Gate Arrays | 2012

Rethinking FPGAs: elude the flexibility excess of LUTs with and-inverter cones

Hadi Parandeh-Afshar; Hind Benbihi; David Novo; Paolo Ienne

Look-Up Tables (LUTs) are universally used in FPGAs as the elementary logic blocks. They can implement any logic function and thus covering a circuit is a relatively straightforward problem. Naturally, flexibility comes at a price, and increasing the number of LUT inputs to cover larger parts of a circuit has an exponential cost in the LUT complexity. Hence, LUTs with more than 4-6 inputs have rarely been used. In this paper we argue that other elementary logic blocks can provide a better compromise between hardware complexity, flexibility, delay, and input and output counts. Inspired by recent trends in synthesis and verification, we explore blocks based on And-Inverter Graphs (AIGs): they have a complexity which is only linear in the number of inputs, they offer the potential for multiple independent outputs, and the delay is only logarithmic in the number of inputs. Of course, these new blocks are far less flexible than LUTs; yet, we show (i) that effective mapping algorithms exist, (ii) that, due to their simplicity, poor utilization is less of an issue than with LUTs, and (iii) that a few LUTs can still be used in extremely unfortunate cases. We show first results indicating that this new logic block combined with some LUTs in hybrid FPGAs can reduce delay by up to 22-32% and area by some 16% on average. Yet, we explored only a few design points and we think that these results could still be improved by a more systematic exploration.
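
The scaling argument in this abstract (exponential LUT cost versus linear cone complexity and logarithmic cone delay) can be illustrated with idealized counts. The sketch below assumes a k-input LUT needs one configuration bit per truth-table row and models a cone as a balanced binary tree of two-input nodes with k leaves; the function names are hypothetical and these counts are not the paper's area or delay model.

```python
def lut_config_bits(k):
    """Truth-table size of a k-input LUT: one stored bit per input combination."""
    return 2 ** k


def cone_nodes(k):
    """Two-input nodes in a binary tree with k leaf inputs (k >= 1)."""
    return k - 1


def cone_depth(k):
    """Levels of two-input nodes in a balanced tree with k leaves,
    i.e. ceil(log2(k)) computed with integer arithmetic."""
    return (k - 1).bit_length()


# Idealized growth for a few input counts: (inputs, LUT bits, cone nodes, cone depth).
growth = [(k, lut_config_bits(k), cone_nodes(k), cone_depth(k)) for k in (4, 8, 16, 32)]
```

Going from 8 to 16 inputs multiplies the LUT truth table by 256 under this model, while the cone roughly doubles in node count and gains one level of depth.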

IEEE Transactions on Very Large Scale Integration Systems | 2010

Improving FPGA Performance for Carry-Save Arithmetic

Hadi Parandeh-Afshar; Ajay K. Verma; Philip Brisk; Paolo Ienne

The selective use of carry-save arithmetic, where appropriate, can accelerate a variety of arithmetic-dominated circuits. Carry-save arithmetic occurs naturally in a variety of DSP applications, and further opportunities to exploit it can be exposed through systematic data flow transformations that can be applied by a hardware compiler. Field-programmable gate arrays (FPGAs), however, are not particularly well suited to carry-save arithmetic. To address this concern, we introduce the "field programmable counter array" (FPCA), an accelerator for carry-save arithmetic intended for integration into an FPGA as an alternative to DSP blocks. In addition to multiplication and multiply accumulation, the FPCA can accelerate more general carry-save operations, such as multi-input addition (e.g., add k > 2 integers) and multipliers that have been fused with other adders. Our experiments show that the FPCA accelerates a wider variety of applications than DSP blocks and improves performance, area utilization, and energy consumption compared with soft FPGA logic.

Field Programmable Gate Arrays | 2008

A novel FPGA logic block for improved arithmetic performance

Hadi Parandeh-Afshar; Philip Brisk; Paolo Ienne

To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry-chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry-chains for binary and ternary addition, the carry chain used by the new cell only spans 2 logic blocks, which significantly improves the delay of multi-input addition operations mapped onto the FPGA. The delay and area overhead that arises from augmenting a traditional FPGA logic cell with the new compressor structure is minimal. Using this new cell, we observed an average speedup in combinational delay of 1.41x compared to adder trees synthesized using ternary adders.

Field Programmable Gate Arrays | 2014

Revisiting and-inverter cones

Grace Zgheib; Liqun Yang; Zhihong Huang; David Novo; Hadi Parandeh-Afshar; Haigang Yang; Paolo Ienne

And-Inverter Cones (AICs) have been suggested as an alternative to the ubiquitous Look-Up Tables (LUTs) used in commercial FPGAs. The original article suggesting the new architecture made some untested assumptions on the circuitry needed to implement AIC architectures and did not completely develop the toolset necessary to assess the idea comprehensively. In this paper, we pick up the architecture that some of us proposed in the original AIC paper and try to implement it as thoroughly as we can afford. We build all components for the logic cluster at transistor level in a 40 nm technology as well as a LUT-based architecture inspired by Altera's Stratix IV. We first determine that the characteristics of our LUT-based architecture are reasonably similar to those of the commercial counterpart. Then, we compare the AIC architecture to the baseline on a number of benchmarks, and we find a few difficulties that had been overlooked before. We thus explore other design possibilities around the original design point and show their detailed impact. Finally, we discuss how the very structure of current logic clusters seems not perfectly appropriate for getting the best out of AICs and conclude that, even though they are not confirmed as an immediate blessing today, AICs still offer rich research opportunities.

Design Automation Conference | 2007

Enhancing FPGA performance for arithmetic circuits

Philip Brisk; Ajay K. Verma; Paolo Ienne; Hadi Parandeh-Afshar

FPGAs offer flexibility and cost-effectiveness that ASICs cannot match; however, their performance is quite poor in comparison, especially for arithmetic dominated circuits. To address this issue, this paper introduces a novel reconfigurable lattice built from counters rather than look-up tables that can effectively accelerate the arithmetic portions of a circuit. We intend to integrate this novel lattice onto the same die as an FPGA.

Field Programmable Gate Arrays | 2008

Architectural improvements for field programmable counter arrays: enabling efficient synthesis of fast compressor trees on FPGAs

Alessandro Cevrero; Panagiotis Athanasopoulos; Hadi Parandeh-Afshar; Ajay K. Verma; Philip Brisk; Frank K. Gürkaynak; Yusuf Leblebici; Paolo Ienne

The Field Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which are synthesized as compressor trees on the FPCA; the remaining portion of the circuit is synthesized on the FPGA. This paper presents a series of architectural improvements to the FPCA that reduce routing delay, increase flexibility and component utilization, and simplify the integration process. Using an FPGA containing six FPCAs, we observed average and maximum speedups of 1.60x and 2.40x on a set of arithmetic benchmarks.