Karel Bruneel | Researchain

Publication

Featured research published by Karel Bruneel.

ACM Transactions on Design Automation of Electronic Systems | 2011

Dynamic data folding with parameterizable FPGA configurations

Karel Bruneel; Wim Heirman; Dirk Stroobandt

In many applications, subsequent data manipulations differ only in a small set of parameter values. Because of their reconfigurability, FPGAs (field programmable gate arrays) can be configured with a specialized circuit each time the parameter values change. This technique is called dynamic data folding. The specialized circuits are smaller and faster than their generic counterparts. However, the overhead involved in generating the configurations for the specialized circuits at runtime is very large when conventional tools are used, and this overhead will in many cases negate the benefit of using optimized configurations. This article introduces an automatic method for generating runtime parameterizable configurations from arbitrary Boolean circuits. These configurations, in which some of the configuration bits are expressed as a closed-form Boolean expression of a set of parameters, enable very fast run-time specialization, since specialization only involves evaluating these expressions. Our approach is validated on a ternary content-addressable memory (TCAM). We show that the specialized configurations produced by our method use 2.82 times fewer LUTs than the generic configuration, and even 1.41 times fewer LUTs than the implementation generated by Xilinx Coregen. Moreover, while Coregen needs hand-crafted generators for each type of circuit, our toolflow can be applied to any VHDL design. Using our automatic and generally applicable method, run-time hardware optimization suddenly becomes feasible for a large class of applications.

field-programmable logic and applications | 2008

Automatic generation of run-time parameterizable configurations

Karel Bruneel; Dirk Stroobandt

In many applications, subsequent data manipulations differ only in a small set of parameter values. Because of their reconfigurability, FPGAs (field programmable gate arrays) can be configured with an optimized configuration every time the parameter values change. These optimized configurations are smaller and faster than their generic counterparts. However, the overhead involved in generating the configurations at run-time with conventional tools is very large. This paper introduces an automatic method for generating runtime parameterizable configurations from arbitrary Boolean circuits. These configurations, in which some of the configuration bits are expressed as a function of a set of parameters, enable very fast run-time specialization, since specialization only involves evaluating these functions. Our approach is validated on adaptive filtering. We show that the specialized filter configurations produced by our method are 2.3 times smaller and 36% faster than a generic filter configuration and that these configurations can be generated in 166 µs on average. Because the method is generic, run-time hardware optimization suddenly becomes feasible for a large class of applications.

field programmable logic and applications | 2012

Mapping logic to reconfigurable FPGA routing

Karel Heyse; Karel Bruneel; Dirk Stroobandt

Parameterised configurations for FPGAs are configuration bitstreams of which part of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions using different parameter values, it is possible to quickly and efficiently derive specialised configuration bitstreams with different properties. An important application of parameterised configurations is the generation of specialised configuration bitstreams for Dynamic Circuit Specialisation. Generating and using parameterised configurations requires a new FPGA tool flow. In this paper we present an algorithm for technology mapping of parameterised designs that can exploit the reconfigurability of the logic blocks and routing of the FPGA. This algorithm, called TCONMAP, is based on “Cut enumeration, cut ranking, node selection”. As part of it, a new method to calculate the feasibility of cuts based on the Binary Decision Diagrams (BDD) of their local function is proposed.

field-programmable logic and applications | 2007

A Method for Fast Hardware Specialization at Run-Time

Karel Bruneel; Peter Bertels; Dirk Stroobandt

Dynamic hardware generation is a powerful technique that can substantially reduce both the required hardware resources and the time needed to perform a calculation, reflected in an improved functional density. This performance improvement is a result of additional run-time optimizations enabled by the knowledge of values at certain inputs at runtime. However, due to the large overhead conventional hardware generation tools incur, the usability of dynamic hardware generation is limited. We present a dual approach that combines compile-time generation of generic hardware and run-time specialization. This drastically decreases the dynamic generation overhead. Our approach is used for dynamic generation of FIR filters and compared to a static and a conventional dynamic implementation. The experiments clearly show that the dual approach improves the usability of dynamic hardware generation.

applied reconfigurable computing | 2010

TROUTE: a reconfigurability-aware FPGA router

Karel Bruneel; Dirk Stroobandt

Since FPGAs are inherently reconfigurable, making FPGA designs generic does not reduce chip cost, as is the case for ASICs. However, designing and mapping lots of specialized FPGA designs introduces an extra EDA cost. We describe a two-stage, fully automatic FPGA tool flow that efficiently maps a generic HDL design to multiple specialized FPGA configurations. The mapping is fast enough to be executed on-line in dynamically reconfigurable systems. In this paper we focus on troute, the routing algorithm used in our tool flow. We used troute to implement reconfigurable Multistage Interconnection Networks and show huge improvements in area, speed and mapping time compared to conventional non-reconfigurable implementations.

ACM Transactions on Design Automation of Electronic Systems | 2013

How to efficiently implement dynamic circuit specialization systems

Fatma Mostafa Mohamed Ahmed Abouelella; Tom Davidson; Wim Meeus; Karel Bruneel; Dirk Stroobandt

Dynamic circuit specialization (DCS) is a technique used to implement FPGA applications where some of the input data, called parameters, change slowly compared to other inputs. Each time the parameter values change, the FPGA is reconfigured by a configuration that is specialized for those new parameter values. This specialized configuration is much smaller and faster than a regular configuration. However, the overhead associated with the specialization process should be minimized to achieve the desired benefits of using the DCS technique. This overhead is represented by both the FPGA resources needed to specialize the FPGA at runtime and by the specialization time. The introduction of parameterized configurations [Bruneel and Stroobandt 2008] has improved the efficiency of DCS implementations. However, the specialization overhead still takes a considerable amount of resources and time. In this article, we explore how to efficiently build DCS systems by presenting a variety of possible solutions for the specialization process and the overhead associated with each of them. We split the specialization process into two main phases: the evaluation and the configuration phase. The PowerPC embedded processor, the MicroBlaze, and a customized processor (CP) are used as alternatives in the evaluation phase. In the configuration phase, the ICAP and a custom configuration interface (SRL configuration) are used as alternatives. Each solution is used to implement a DCS system for three applications: an adaptive finite impulse response (FIR) filter, a ternary content-addressable memory (TCAM), and a regular expression matcher (RegEx). The experiments show that the use of our CP along with the SRL configuration achieves minimum overhead in terms of resources and time. Our CP is 1.8 and 3.5 times smaller than the PowerPC and the area-optimized implementation of the MicroBlaze, respectively. Moreover, the use of the CP enables a more compact representation for the parameterized configuration in comparison to both the PowerPC and the MicroBlaze processors. For instance, in the FIR, the parameterized configuration compiled for our CP is 6 to 7 times smaller than that for the embedded processors.

reconfigurable computing and fpgas | 2012

Dynamic circuit specialisation for key-based encryption algorithms and DNA alignment

Tom Davidson; Fatma Mostafa Mohamed Ahmed Abouelella; Karel Bruneel; Dirk Stroobandt

Parameterised reconfiguration is a method for dynamic circuit specialization on FPGAs. The main advantage of this new concept is the high resource efficiency. Additionally, there is an automated tool flow, TMAP, that converts a hardware design into a more resource-efficient run-time reconfigurable design without a large design effort. We will start by explaining the core principles behind the dynamic circuit specialization technique. Next, we show the possible gains in encryption applications using an AES encoder. Our AES design shows a 20.6% area gain compared to an unoptimized hardware implementation and a 5.3% gain compared to a manually optimized third-party hardware implementation. We also used TMAP on a Triple-DES and an RC6 implementation, where we achieve a 27.8% and a 72.7% LUT-area gain. In addition, we discuss a run-time reconfigurable DNA aligner. We focus on the optimizations to the dynamic specialization overhead. Our final design is up to 2.80 times more efficient on cheaper FPGAs than the original DNA aligner when at least one DNA sequence is longer than 758 characters. Most sequences in DNA alignment are of the order 2^13.

design, automation, and test in europe | 2013

An automatic tool flow for the combined implementation of multi-mode circuits

Brahim Al Farisi; Karel Bruneel; João M. P. Cardoso; Dirk Stroobandt

A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration for every mode separately. To switch between modes the complete reconfigurable region is rewritten, which often leads to very long reconfiguration times. In this paper we present a novel, fully automated tool flow that exploits similarities between the modes and uses Dynamic Circuit Specialization to drastically reduce reconfiguration time. Experimental results show that the number of bits rewritten in the configuration memory is reduced by a factor of 4.6× to 5.1× without significant performance penalties.
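As a rough illustration of why similarity between modes matters, the sketch below compares rewriting a whole region with rewriting only the bits that differ between two mode configurations. The bit lists and the `bits_to_rewrite` helper are hypothetical; this is not the paper's tool flow, only the counting argument behind it.

```python
def bits_to_rewrite(current, target):
    """Return the indices whose value differs between two configurations.

    Both arguments are hypothetical flat lists of configuration bits for
    the same reconfigurable region.
    """
    if len(current) != len(target):
        raise ValueError("configurations must cover the same region")
    return [i for i, (a, b) in enumerate(zip(current, target)) if a != b]


# Two made-up mode configurations for an 8-bit region.
mode_a = [1, 0, 1, 1, 0, 0, 1, 0]
mode_b = [1, 0, 0, 1, 0, 1, 1, 0]

changed = bits_to_rewrite(mode_a, mode_b)
full_rewrite = len(mode_b)          # conventional approach: rewrite everything
partial_rewrite = len(changed)      # rewrite only the differing bits
print(changed, full_rewrite, partial_rewrite)
```

The more bits two modes share, the shorter the `changed` list and the fewer configuration-memory writes a mode switch needs.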

reconfigurable computing and fpgas | 2008

Reconfigurability-Aware Structural Mapping for LUT-Based FPGAs

Karel Bruneel; Dirk Stroobandt

In many applications, subsequent tasks differ only in a specific set of parameters. Because of their reconfigurability, FPGAs (field programmable gate arrays) can be configured with an optimized configuration every time these parameter values change. This results in configurations that are smaller and faster than their generic counterparts. Unfortunately, the overhead involved in generating these configurations at run-time with conventional tools is very large. However, if the incoming tasks only differ in a set of parameter values, the use of Tunable LUT (TLUT) circuits can drastically reduce this overhead. A TLUT circuit is a LUT circuit in which the truth tables of the LUTs are expressed as a function of a set of parameters. At run-time the truth tables for a specific set of parameter values can rapidly be calculated by evaluating these functions. Up to now TLUT circuits had to be designed manually, resulting in a huge design cost. This paper introduces TMAP2, a software tool based on conventional structural mapping that automatically generates a TLUT circuit starting from an arbitrary Boolean circuit. We have tested TMAP2 on a set of 12 micro-benchmarks and we show a substantial reduction in both the circuit's area and maximum depth compared to conventional implementations.
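The TLUT idea described in this abstract can be sketched in a few lines: each truth-table entry is stored as a function of the parameters, and specialization just evaluates those functions. The names `make_tlut`, `specialize`, and `masked_match` are hypothetical, and the example cell is a made-up TCAM-like bit slice, not a circuit taken from the paper.

```python
from itertools import product


def make_tlut(generic_fn, num_inputs):
    """Build a tunable truth table for a LUT with num_inputs regular inputs.

    Each entry is a function of the parameters; the regular inputs are
    fixed per entry, in the order produced by itertools.product.
    """
    return [
        (lambda params, inputs=inputs: generic_fn(params, inputs))
        for inputs in product((0, 1), repeat=num_inputs)
    ]


def specialize(tlut, params):
    """Evaluate every entry for concrete parameter values."""
    return [entry(params) for entry in tlut]


def masked_match(params, inputs):
    """Hypothetical cell: pass match_in along if the data bit matches the key
    bit, or unconditionally when the mask bit is 0 (don't care)."""
    key, mask = params
    data, match_in = inputs
    return int(match_in == 1 and (mask == 0 or data == key))


tlut = make_tlut(masked_match, num_inputs=2)
print(specialize(tlut, (1, 1)))  # key=1, mask=1
print(specialize(tlut, (0, 0)))  # mask=0: data bit is ignored
```

Here `key` and `mask` play the role of slowly changing parameters, while `data` and `match_in` are the regular LUT inputs. Changing the parameters only requires recomputing four truth-table bits, not re-running a mapping tool.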

field programmable logic and applications | 2012

Maximizing the reuse of routing resources in a reconfiguration-aware connection router

Elias Vansteenkiste; Karel Bruneel; Dirk Stroobandt

Parameterised configurations for FPGAs are configuration bitstreams of which some of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions using different parameter values, it is possible to quickly and efficiently derive specialised configuration bitstreams with different properties. Generating and using parameterized configurations requires a new tool flow. In this paper we propose a novel algorithm for the routing step of this tool flow. This new router, called the connection bundle router, is able to route a circuit with parameterized interconnections. It produces routing solutions in less time (up to a factor of 5.2) and with a better quality in terms of number of wires (up to 38%) and minimum track width (up to 25%) than its predecessors. The connection bundle router is fully automated and uses a scalable connection-based representation for the parameterized interconnections in a tunable circuit.
