Karel Heyse | Researchain



Publication

Featured research published by Karel Heyse.

field-programmable logic and applications | 2012

Mapping logic to reconfigurable FPGA routing

Karel Heyse; Karel Bruneel; Dirk Stroobandt

Parameterised configurations for FPGAs are configuration bitstreams in which some of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions using different parameter values, it is possible to quickly and efficiently derive specialised configuration bitstreams with different properties. An important application of parameterised configurations is the generation of specialised configuration bitstreams for Dynamic Circuit Specialisation. Generating and using parameterised configurations requires a new FPGA tool flow. In this paper we present an algorithm for technology mapping of parameterised designs that can exploit the reconfigurability of the logic blocks and routing of the FPGA. This algorithm, called TCONMAP, is based on “Cut enumeration, cut ranking, node selection”. As part of it, we propose a new method to calculate the feasibility of cuts based on the Binary Decision Diagrams (BDD) of their local function.
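The idea of a parameterised configuration can be illustrated with a minimal sketch. It is not the authors' tool flow: the `specialise` helper, the four-bit configuration and the parameter names `p0` and `p1` are hypothetical, chosen only to show how evaluating Boolean functions of the parameters yields a regular bitstream.

```python
from typing import Callable, Dict, List, Union

Params = Dict[str, bool]
# A configuration bit is either a fixed constant or a Boolean function
# of the parameters (hypothetical representation for illustration).
Bit = Union[bool, Callable[[Params], bool]]


def specialise(config: List[Bit], params: Params) -> List[int]:
    """Evaluate every parameterised bit to obtain a regular bitstream."""
    return [int(bit(params)) if callable(bit) else int(bit) for bit in config]


# Hypothetical 4-bit parameterised configuration with parameters p0 and p1.
parameterised_config: List[Bit] = [
    True,                           # constant bit
    lambda p: p["p0"] and p["p1"],  # bit defined as p0 AND p1
    lambda p: p["p0"] ^ p["p1"],    # bit defined as p0 XOR p1
    False,                          # constant bit
]

bitstream_a = specialise(parameterised_config, {"p0": True, "p1": False})
bitstream_b = specialise(parameterised_config, {"p0": True, "p1": True})
print(bitstream_a)  # [1, 0, 1, 0]
print(bitstream_b)  # [1, 1, 0, 0]
```

Each call to `specialise` with new parameter values plays the role of deriving a new specialised bitstream; the constant bits are shared between all specialisations.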

field-programmable logic and applications | 2013

Efficient implementation of Virtual Coarse Grained Reconfigurable Arrays on FPGAs

Karel Heyse; Tom Davidson; Elias Vansteenkiste; Karel Bruneel; Dirk Stroobandt

Fine grained Field Programmable Gate Arrays (FPGA) are complex to program and therefore suffer from high development costs. To solve this problem, Virtual Coarse Grained Reconfigurable Arrays (Virtual CGRA), or CGRAs implemented on FPGAs, have been proposed. Conventional implementations of VCGRAs use functional FPGA resources, such as LookUp Tables, to implement the virtual switch blocks, registers and other components that make the VCGRA configurable. We show that this is a large overhead that can often be avoided by mapping these components directly on lower level FPGA resources such as physical switch blocks and configuration memory. We show how this can be achieved using the tool flow for parameterised FPGA configurations and illustrate the advantages of this method by showing that an area reduction of 50% is attainable for a VCGRA aimed at regular expression matching.

Proceedings of the FPGA World Conference 2014 | 2014

Performance Evaluation of Dynamic Circuit Specialization on Xilinx FPGAs

Amit Kulkarni; Karel Heyse; Tom Davidson; Dirk Stroobandt

Dynamic Circuit Specialization (DCS) is a technique used to optimize FPGA applications when some of the inputs, called parameters, are infrequently changing compared to other inputs. For every change of parameter input values, a specialized FPGA configuration is generated during run time and the FPGA is reconfigured with a specialized bitstream. We examine how the performance of the DCS technique evolves with the advent of newer Xilinx FPGA architectures. The performance of the DCS technique is evaluated on three different Xilinx FPGA architectures: Virtex-II Pro, Virtex-5 and Zynq SoC. We have used a 16-tap, 8-bit FIR filter as a parameterized design, with the filter coefficients as the parameters of the FIR design.

reconfigurable computing and fpgas | 2014

Improving reconfiguration speed for dynamic circuit specialization using placement constraints

Amit Kulkarni; Tom Davidson; Karel Heyse; Dirk Stroobandt

Dynamic Circuit Specialization (DCS) is an optimization technique used for implementing a parameterized application on an FPGA. The application is said to be parameterized when some of its inputs, called parameters, change infrequently compared to the other inputs. Instead of implementing these parameter inputs as regular inputs, the DCS approach implements them as constants and optimizes the design for these constants. When the parameter values change, the design is re-optimized for the new constant values by reconfiguring the FPGA. Earlier investigation has shown that run-time reconfiguration speed is the limiting factor of DCS implementations on Xilinx FPGAs. We propose constraining the design's placement and using a custom Xilinx HWICAP driver to improve reconfiguration speed at the cost of a small reduction in design performance. We use Xilinx Virtex-5 and Zynq-SoC as experimental platforms, with an 8-bit FIR filter in different tap configurations as our parameterized design, whose filter coefficient values are the infrequently changing inputs. Reconfiguration speed improves by a factor of 14 with only a ≈ 6% decrease in performance.
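As a software analogy only, the difference between treating FIR coefficients as regular inputs and treating them as constants can be sketched as follows. The functions `generic_fir` and `specialise_fir` and the sample values are hypothetical; real DCS specialises the FPGA configuration, not a Python closure.

```python
from typing import Callable, List, Sequence


def generic_fir(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Generic FIR filter: coefficients are ordinary inputs on every call."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def specialise_fir(coeffs: Sequence[int]) -> Callable[[Sequence[int]], List[int]]:
    """Bake the coefficients in as constants; taps with a zero coefficient
    are removed, mimicking the optimisation for constant parameter values."""
    taps = [(k, c) for k, c in enumerate(coeffs) if c != 0]

    def fir(samples: Sequence[int]) -> List[int]:
        return [
            sum(c * samples[n - k] for k, c in taps if n - k >= 0)
            for n in range(len(samples))
        ]

    return fir


coeffs = [1, 0, 2, 0]   # hypothetical parameter values
samples = [1, 2, 3, 4]  # hypothetical input stream

fir = specialise_fir(coeffs)  # "reconfiguration" step, done once per change
print(fir(samples))           # [1, 2, 5, 8]
```

Calling `specialise_fir` again with new coefficients corresponds to the re-optimisation step described above: it is paid only when the parameters change, while every subsequent sample benefits from the smaller filter.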

ACM Transactions on Design Automation of Electronic Systems | 2015

TCONMAP: Technology Mapping for Parameterised FPGA Configurations

Karel Heyse; Brahim Al Farisi; Karel Bruneel; Dirk Stroobandt

Parameterised configurations are FPGA configuration bitstreams in which the bits are defined as functions of user-defined parameters. From a parameterised configuration, it is possible to quickly and efficiently derive specialised, regular configuration bitstreams by evaluating these functions. The specialised bitstreams have different properties and functionality depending on the chosen values of the parameters. The most important application of parameterised configurations is the generation of specialised configuration bitstreams for Dynamic Circuit Specialisation, a technique for optimising circuits at runtime using partial reconfiguration of the FPGA. Generating and using parameterised configurations requires a new FPGA tool flow. In this article, we present a new technology mapping algorithm for parameterised designs, called TCONMAP, that can be used to produce parameterised configurations in which the configuration of both the logic blocks and the routing is a function of the parameters. In our experiments, we demonstrate that when using TCONMAP, the depth and area of the mapped circuit are close to the minimal depth and area attainable. Both Dynamic Circuit Specialisation and fine-grained modular reconfiguration are extracted by TCONMAP from the HDL description of the design, requiring only simple parameter annotations.

field-programmable logic and applications | 2011

Memory-Efficient and Fast Run-Time Reconfiguration of Regularly Structured Designs

Brahim Al Farisi; Karel Heyse; Karel Bruneel; Dirk Stroobandt

Previous work has shown that run-time reconfiguration of FPGAs benefits greatly from the use of Tunable LUT (TLUT) circuits. These can be rapidly transformed into a specialized LUT circuit and are also very memory efficient when representing regularly structured designs, where the same hardware module is instantiated many times. However, the memory requirements and reconfiguration time of a run-time reconfigurable application are also dependent on the reconfiguration mechanism. In this paper, we will show that the memory requirements of conventional ICAP reconfiguration grow very fast with the number of modules, resulting in excessive memory usage. We propose to use Shift-Register-LUT (SRL) reconfiguration which is faster and results in a memory usage that is independent of the number of modules.

field-programmable logic and applications | 2015

Estimating circuit delays in FPGAs after technology mapping

Berg Severens; Elias Vansteenkiste; Karel Heyse; Dirk Stroobandt

An FPGA implementation requires a significant effort from the hardware designer, who optimizes FPGA designs by going through many time-consuming CAD flow iterations. These iterations provide two types of feedback: (1) the FPGA performance and (2) the identification of the parts having the highest impact on the FPGA performance. Both depend on the wirelength behavior. Studies have been dedicated to the estimation of local [5] and global [4] wirelengths, but to our knowledge neither performance estimation nor identification of the critical zone is present in the literature. This paper therefore first presents a comparison of three performance estimation techniques: logic depth, Monte Carlo simulation and fast placement (ordered from low to high accuracy and runtime). Second, four methods for identifying the critical zone are compared. Results show that Monte Carlo simulations provide a good identification of the parts having the highest impact on the performance. We conclude that Monte Carlo simulations provide useful feedback within a short runtime (about 30 times faster than placement), reducing the time-to-market of FPGA implementations.

applied reconfigurable computing | 2015

On the Impact of Replacing Low-Speed Configuration Buses on FPGAs with the Chip’s Internal Configuration Infrastructure

Karel Heyse; Jente Basteleus; Brahim Al Farisi; Dirk Stroobandt; Oliver Kadlcek; Oliver Pell

It is common for large hardware designs to have a number of registers or memories whose contents have to be changed very seldom (e.g., only at startup). The conventional way of accessing these memories is through a low-speed memory bus. This bus uses valuable hardware resources, introduces long global connections, and contributes to routing congestion. Hence, it has an impact on the overall design even though it is only rarely used. A Field-Programmable Gate Array (FPGA) already contains a global communication mechanism in the form of its configuration infrastructure. In this article, we evaluate the use of the configuration infrastructure as a replacement for a low-speed memory bus on the Maxeler HPC platform. We find that by removing the conventional low-speed memory bus, the maximum clock frequency of some applications can be improved by 8%. Improvements by 25% and more are also attainable, but constraints of the Xilinx reconfiguration infrastructure prevent fully exploiting these benefits at the moment. We present a number of possible changes to the Xilinx reconfiguration infrastructure and tools that would solve this and make these results more widely applicable.

design automation conference | 2015

Avoiding transitional effects in dynamic circuit specialisation on FPGAs

Karel Heyse; Dirk Stroobandt

Dynamic Circuit Specialisation (DCS) is a technique that uses the reconfigurability of an FPGA to optimise a circuit during run-time, thus achieving higher performance and lower resource cost. However, run-time reconfiguration causes transitional effects that form an important problem for DCS. Because of these, the DCS circuit cannot be used while it is being reconfigured. This limits the usability of DCS for streaming applications and other applications that cannot tolerate downtime. For other applications, this results in a loss of performance. In this paper, we present a technique to perform partial reconfiguration for DCS without transitional effects, thus allowing the circuit to remain fully functional at all times. The proposed method performs DCS by reconfiguring only LookUp Tables of the FPGA and does not require changes to the configuration architecture of the FPGA. The approach was tested and evaluated on current Xilinx FPGAs.

ACM Transactions on Reconfigurable Technology and Systems | 2015

Identification of Dynamic Circuit Specialization Opportunities in RTL Code

Tom Davidson; Elias Vansteenkiste; Karel Heyse; Karel Bruneel; Dirk Stroobandt

Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming a set of its input signals are constant for a reasonable amount of time, leading to a smaller and faster FPGA circuit. When the signals actually change, a new circuit is loaded into the FPGA through runtime reconfiguration. The signals the design is specialized for are called parameters. For certain designs, parameters can be selected so the DCS implementation is both smaller and faster than the original implementation. However, DCS also introduces an overhead that is difficult for the designer to take into account, making it hard to determine whether a design is improved by DCS or not. This article presents extensive results on a profiling methodology that analyses Register-Transfer Level (RTL) implementations of applications to check if DCS would be beneficial. It proposes to use the functional density as a measure for the area efficiency of an implementation, as this measure contains both the overhead and the gains of a DCS implementation. The first step of the methodology is to analyse the dynamic behaviour of signals in the design, to find good parameter candidates. The overhead of DCS is highly dependent on this dynamic behaviour. A second stage calculates the functional density for each candidate and compares it to the functional density of the original design. The profiling methodology resulted in three implementations of a profiling tool, the DCS-RTL profiler. The execution time, accuracy, and quality of each implementation are assessed based on data from 10 RTL designs. All designs, except for the two 16-bit adaptable Finite Impulse Response (FIR) filters, are analysed in 1 hour or less.
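The abstract does not state a formula for functional density. The sketch below assumes a definition commonly used in the run-time reconfiguration literature, density = 1 / (area × average time per operation), with the reconfiguration time amortised over the operations performed between two parameter changes. The function name and all numbers are hypothetical and are not results from the article.

```python
def functional_density(
    area: float,
    exec_time: float,
    reconfig_time: float = 0.0,
    ops_between_reconfig: int = 1,
) -> float:
    """Assumed definition: 1 / (area * average time per operation), where the
    reconfiguration time is spread over the operations between two changes."""
    avg_time = exec_time + reconfig_time / ops_between_reconfig
    return 1.0 / (area * avg_time)


# Hypothetical original design: larger area, no reconfiguration overhead.
static = functional_density(area=100, exec_time=10.0)

# Hypothetical DCS design: smaller and faster, but pays for reconfiguration.
dcs_rare = functional_density(
    area=50, exec_time=8.0, reconfig_time=2000.0, ops_between_reconfig=1000
)
dcs_frequent = functional_density(
    area=50, exec_time=8.0, reconfig_time=2000.0, ops_between_reconfig=10
)

print(dcs_rare > static)      # True: parameters change rarely, DCS pays off
print(dcs_frequent > static)  # False: overhead dominates
```

Under this assumed definition, the same specialised circuit can be better or worse than the original depending only on how often the parameters change, which is why the methodology first analyses the dynamic behaviour of candidate signals.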
