Tom Davidson
Ghent University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Tom Davidson.
field-programmable logic and applications | 2013
Karel Heyse; Tom Davidson; Elias Vansteenkiste; Karel Bruneel; Dirk Stroobandt
Fine grained Field Programmable Gate Arrays (FPGA) are complex to program and therefore suffer from high development costs. To solve this problem, Virtual Coarse Grained Reconfigurable Arrays (Virtual CGRA), or CGRAs implemented on FPGAs, have been proposed. Conventional implementations of VCGRAs use functional FPGA resources, such as LookUp Tables, to implement the virtual switch blocks, registers and other components that make the VCGRA configurable. We show that this is a large overhead that can often be avoided by mapping these components directly on lower level FPGA resources such as physical switch blocks and configuration memory. We show how this can be achieved using the tool flow for parameterised FPGA configurations and illustrate the advantages of this method by showing that an area reduction of 50% is attainable for a VCGRA aimed at regular expression matching.
Proceedings of the FPGA World Conference 2014 on | 2014
Amit Kulkarni; Karel Heyse; Tom Davidson; Dirk Stroobandt
Dynamic Circuit Specialization (DCS) is a technique used to optimize FPGA applications when some of the inputs, called parameters, are infrequently changing compared to other inputs. For every change of parameter input values, a specialized FPGA configuration is generated during run time and the FPGA is reconfigured with a specialized bitstream. We examine how the performance of the DCS technique evolves with the advent of newer Xilinx FPGA architectures. The performance of the DCS technique is evaluated on three different Xilinx FPGA architectures: Virtex-II Pro, Virtex-5 and Zynq SoC. We have used a 16-tap, 8-bit FIR filter as a parameterized design, with the filter coefficients as the parameters of the FIR design.
ACM Transactions on Design Automation of Electronic Systems | 2013
Fatma Mostafa Mohamed Ahmed Abouelella; Tom Davidson; Wim Meeus; Karel Bruneel; Dirk Stroobandt
Dynamic circuit specialization (DCS) is a technique used to implement FPGA applications where some of the input data, called parameters, change slowly compared to other inputs. Each time the parameter values change, the FPGA is reconfigured by a configuration that is specialized for those new parameter values. This specialized configuration is much smaller and faster than a regular configuration. However, the overhead associated with the specialization process should be minimized to achieve the desired benefits of using the DCS technique. This overhead is represented by both the FPGA resources needed to specialize the FPGA at runtime and by the specialization time. The introduction of parameterized configurations [Bruneel and Stroobandt 2008] has improved the efficiency of DCS implementations. However, the specialization overhead still takes a considerable amount of resources and time. In this article, we explore how to efficiently build DCS systems by presenting a variety of possible solutions for the specialization process and the overhead associated with each of them. We split the specialization process into two main phases: the evaluation and the configuration phase. The PowerPC embedded processor, the MicroBlaze, and a customized processor (CP) are used as alternatives in the evaluation phase. In the configuration phase, the ICAP and a custom configuration interface (SRL configuration) are used as alternatives. Each solution is used to implement a DCS system for three applications: an adaptive finite impulse response (FIR) filter, a ternary content-addressable memory (TCAM), and a regular expression matcher (RegEx). The experiments show that the use of our CP along with the SRL configuration achieves minimum overhead in terms of resources and time. Our CP is 1.8 and 3.5 times smaller than the PowerPC and the area-optimized implementation of the MicroBlaze, respectively. Moreover, the use of the CP enables a more compact representation for the parameterized configuration in comparison to both the PowerPC and the MicroBlaze processors. For instance, in the FIR, the parameterized configuration compiled for our CP is 6--7 times smaller than that for the embedded processors.
reconfigurable computing and fpgas | 2012
Tom Davidson; Fatma Mostafa Mohamed Ahmed Abouelella; Karel Bruneel; Dirk Stroobandt
Parameterised reconfiguration is a method for dynamic circuit specialization on FPGAs. The main advantage of this new concept is the high resource efficiency. Additionally, there is an automated tool flow, TMAP, that converts a hardware design into a more resource-efficient run-time reconfigurable design without a large design effort. We will start by explaining the core principles behind the dynamic circuit specialization technique. Next, we show the possible gains in encryption applications using an AES encoder. Our AES design shows a 20.6% area gain compared to an unoptimized hardware implementation and a 5.3% gain compared to a manually optimized third-party hardware implementation. We also used TMAP on a Triple-DES and an RC6 implementation, where we achieve a 27.8% and a 72.7% LUT-area gain. In addition, we discuss a run-time reconfigurable DNA aligner. We focus on the optimizations to the dynamic specialization overhead. Our final design is up to 2.80-times more efficient on cheaper FPGAs than the original DNA aligner when at least one DNA sequence is longer than 758 characters. Most sequences in DNA alignment are of the order 213.
reconfigurable computing and fpgas | 2014
Amit Kulkarni; Tom Davidson; Karel Heyse; Dirk Stroobandt
Dynamic Circuit Specialization (DCS) is an optimization technique used for implementing a parameterized application on an FPGA. The application is said to be parameterized when some of its inputs, called parameters, are infrequently changing compared to the other inputs. Instead of implementing these parameter inputs as regular inputs, in the DCS approach these inputs are implemented as constants and the design is optimized for these constants. When the parameter values change, the design is re-optimized for the new constant values by reconfiguring the FPGA. It has been investigated that run-time reconfiguration speed is the limiting factor of the DCS implementations on Xilinx FPGAs. We propose an idea to constrain the designs placement and use the custom Xilinx HWICAP driver to improve reconfiguration speed at the cost of a small reduction in design performance. We use Xilinx Virtex-5 and Zynq-SoC as experimental platforms and we have used an 8-bit FIR filter with different tap configurations as our parameterized design whose filter coefficient values are infrequently changing inputs. A drastic improvement in the reconfiguration speed with a factor of 14 is achieved with only a ≈ 6% decrease in performance.
reconfigurable communication centric systems on chip | 2012
Marco D. Santambrogio; Dionisios N. Pnevmatikatos; Kyprianos Papadimitriou; Christian Pilato; Georgi Gaydadjiev; Dirk Stroobandt; Tom Davidson; Tobias Becker; Tim Todman; Wayne Luk; Alessandra Bonetto; Andrea Cazzaniga; Gianluca Durelli; Donatella Sciuto
Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy the customers needs and market and technology trends. Many contemporary products along with the software part incorporate hardware accelerators for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform which includes a general purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems providing additional novel verification features that are not available in existing tool flows.
parallel computing | 2012
Tom Davidson; Mattias Merlier; Karel Bruneel; Dirk Stroobandt
In this article we describe how to expand a partially dynamic reconfig- urable pattern matcher for regular expressions presented in previous work by Di- vyasree and Rajashekar [2]. The resulting, extended, pattern matcher is fully dynamically reconfigurable. First, the design is adapted for use with parameterisable configurations, a method for Dynamic Circuit Specialization. Using parameteris- able configurations allows us to achieve the same area gains as the hand crafted reconfigurable design, with the benefit that parameterisable configurations can be applied automatically. This results in a design that is more easily adaptable to spe- cific applications and allows for an easier design exploration. Additionally, the pa- rameterisable configuration implementation is also generated automatically, which greatly reduces the design overhead of using dynamic reconfiguration. Secondly, we propose a number of expansions to the original design to overcome several limitations in the original design that constrain the dynamic reconfigurability of the pattern matcher. We propose two different solutions to dynamically change the character that is matched in a certain block. The resulting pattern matcher, after these changes, is fully dynamically reconfigurable, all aspects of the implemented regular expression can be changed at run-time.
computational science and engineering | 2012
Kyprianos Papadimitriou; Christian Pilato; Dionisios N. Pnevmatikatos; Marco D. Santambrogio; Catalin Bogdan Ciobanu; Tim Todman; Tobias Becker; Tom Davidson; Xinyu Niu; Georgi Gaydadjiev; Wayne Luk; Dirk Stroobandt
During the last few years, there is an increasing interest in mixing software and hardware to serve efficiently different applications. This is due to the heterogeneity characterizing the tasks of an application which require the presence of resources from both worlds, software and hardware. Controlling effectively these resources through an integrated tool flow is a challenging problem and towards this direction only a few efforts exist. In fact, a framework that seamlessly exploits both resources of a platform for executing efficiently an application has not yet come into existence. Moreover, reconfigurable computing often incorporated in such platforms due to its high flexibility and customization, has not yet taken off due to the lack of exploiting its full capabilities. Thus, the capability of reconfigurable devices such as Field Programmable Gate Arrays (FPGAs) to be dynamically reconfigured, i.e. reprogramming part of the chip while other parts of the same chip remain functional, has not yet taken off even in small-scale basis. The inherent difficulty in using the tools to control this technology has kept it back from being adopted by academia and industry alike. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a design methodology and a tool flow that will enable designers to implement effectively and easily a system specification on a platform combining software and reconfigurable resources. The FASTER framework accepts as input a high-level description of the application and the architectural details of the target platform, and through certain steps it can enable the full use of the capabilities of the platform, while at the same time it should be flexible enough so as to balance efficiently performance, power and area. One of the main novelties is the incorporation of partial reconfiguration as an explicit design concept at an early stage of the design flow. We target different applications from the embedded, desktop and high-performance computing domains. In all cases we will demonstrate the effectiveness of the proposed framework in exploiting the inherent parallelism of applications and enabling the runtime adaptation of the platforms to the changing needs of the applications.
reconfigurable computing and fpgas | 2010
Tom Davidson; Karel Bruneel; Dirk Stroobandt
Parameterisable configurations allow very fast run-time reconfiguration in FPGAs. The main advantage of this new concept is the automated tool flow that converts a hardware design into a more resource-efficient run-time reconfigurable design without a large design effort. In this paper, we show that the automated tool flow for run-time reconfiguration can be used to easily optimize a full hardware implementation for area by converting it automatically to a hardware/software implementation. This tool flow can partition the design in a very short time and, at the same time, result in significant area gains. The usage of run time reconfiguration allows us to extend the hardware/software boundary so more functionality can be moved to software. We will explain the core principles behind the run-time reconfiguration technique using the AES encoder as an example. For the AES encoder the manual hardware/software partitioning is clear. This manual partitioning will serve as a comparison to the automated partitioning that uses parameterisable configurations. Several possible AES encoder implementations are compared. Our automatically partitioned AES design shows a 20.6 % area gain compared to an unoptimized hardware implementation and a 5.3 % gain compared to a manually optimized 3rd party hardware implementation. In addition, we discuss the results of our technique on other applications, where the hardware/software partitioning is less clear. Among these, a TripleDES implementation shows a 29.3 % area gain using our technique. Based on our AES encoder results, we derive some guidelines for optimizing the impact of parameterisable configurations in general designs.
ACM Transactions on Reconfigurable Technology and Systems | 2015
Tom Davidson; Elias Vansteenkiste; Karel Heyse; Karel Bruneel; Dirk Stroobandt
Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming a set of its input signals are constant for a reasonable amount of time, leading to a smaller and faster FPGA circuit. When the signals actually change, a new circuit is loaded into the FPGA through runtime reconfiguration. The signals the design is specialized for are called parameters. For certain designs, parameters can be selected so the DCS implementation is both smaller and faster than the original implementation. However, DCS also introduces an overhead that is difficult for the designer to take into account, making it hard to determine whether a design is improved by DCS or not. This article presents extensive results on a profiling methodology that analyses Register-Transfer Level (RTL) implementations of applications to check if DCS would be beneficial. It proposes to use the functional density as a measure for the area efficiency of an implementation, as this measure contains both the overhead and the gains of a DCS implementation. The first step of the methodology is to analyse the dynamic behaviour of signals in the design, to find good parameter candidates. The overhead of DCS is highly dependent on this dynamic behaviour. A second stage calculates the functional density for each candidate and compares it to the functional density of the original design. The profiling methodology resulted in three implementations of a profiling tool, the DCS-RTL profiler. The execution time, accuracy, and the quality of each implementation is assessed based on data from 10 RTL designs. All designs, except for the two 16-bit adaptable Finite Impulse Response (FIR) filters, are analysed in 1 hour or less.