Koen Bertels | Researchain

Publication

Featured researches published by Koen Bertels.

IEEE Transactions on Computers | 2004

The MOLEN polymorphic processor

Stamatis Vassiliadis; Stephan Wong; Georgi Gaydadjiev; Koen Bertels; Georgi Kuzmanov; Elena Moscu Panainte

In this paper, we present a polymorphic processor paradigm incorporating both general-purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a one-time instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations: SAD, DCT, and IDCT. The overall attainable application speedup is between 2.64 and 3.18 for the MPEG-2 encoder and between 1.56 and 1.94 for the decoder, representing between 93 percent and 98 percent of the theoretically obtainable speedups.

ACM Transactions on Reconfigurable Technology and Systems | 2011

The Instruction-Set Extension Problem: A Survey

Carlo Galuzzi; Koen Bertels

The extension of a given instruction-set with specialized instructions has become a common technique used to speed up the execution of applications. By identifying computationally intensive portions of an application to be partitioned in segments of code to execute in software and segments of code to execute in hardware, the execution of an application can be considerably sped up. Each segment of code implemented in hardware can then be seen as a specialized application-specific instruction extending a given instruction-set. Although a number of approaches exist in the literature proposing different methodologies to customize an instruction-set, the description of the problem consists only of sporadic comparisons limited to isolated problems. This survey presents a unique detailed description of the problem and provides an exhaustive overview of the research in the past years in instruction-set extension. This article presents a thorough analysis of the issues involved during the customization of an instruction-set by means of a set of specialized application-specific instructions. The investigation of the problem covers both instruction generation and instruction selection, and different kinds of customizations are analyzed in great detail.

Field-Programmable Logic and Applications | 2007

DWARV: Delftworkbench Automated Reconfigurable VHDL Generator

Yana Yankova; Georgi Kuzmanov; Koen Bertels; Georgi Gaydadjiev; Yi Lu; Stamatis Vassiliadis

In this paper, we present the DWARV C-to-VHDL generation toolset. The toolset provides support for a broad range of application domains. It exploits the operation parallelism available in the algorithms. Our designs are generated with a view to actual hardware/software co-execution on a real hardware platform. The experiments carried out on the MOLEN polymorphic processor prototype suggest overall application speedups between 1.4x and 6.8x, corresponding to 13% to 94% of the theoretically achievable maximums given by Amdahl's law.

Field-Programmable Logic and Applications | 2007

MORPHEUS: Heterogeneous Reconfigurable Computing

Florian Thoma; Matthias Kühnle; Philippe Bonnot; Elena Moscu Panainte; Koen Bertels; Sebastian Goller; Axel Schneider; Stephane Guyetant; Eberhard Schüler; Klaus D. Müller-Glaser; Jürgen Becker

Reconfigurable architectures and NoC (network-on-chip) communication systems have introduced new research directions for technology and flexibility issues, which have been largely investigated in the last decades. Exploiting the flexibility of reconfigurable architectures, the run-time adaptivity through run-time reconfiguration, opens a new area of research by considering dynamic reconfiguration. Since software parts of an embedded system can also be included into reconfigurable hardware by integration of an IP-based microcontroller, the reconfigurable architecture provides a flexible, multi-adaptive heterogeneous platform for HW/SW co-design. In this paper, we present the European integrated project MORPHEUS (IST 027342). Its goal is to develop new heterogeneous reconfigurable SoCs with various sizes of reconfiguration granularity and to provide an integrated toolset of spatial and sequential design that can be used for mapping and execution of the target applications. Additionally a NoC approach is included in order to demonstrate the mentioned benefits and scalability for actual and future SoC design. The power of this approach will be demonstrated with four applications from the industrial environment.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2016

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Razvan Nane; Vlad Mihai Sima; Christian Pilato; Jongsok Choi; Blair Fort; Andrew Canis; Yu Ting Chen; Hsuan Hsiao; Stephen Dean Brown; Fabrizio Ferrandi; Jason Helge Anderson; Koen Bertels

High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today's system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as an overview of the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.

ACM Transactions on Embedded Computing Systems | 2007

The Molen compiler for reconfigurable processors

Elena Moscu Panainte; Koen Bertels; Stamatis Vassiliadis

In this paper, we describe the compiler developed to target the Molen reconfigurable processor and programming paradigm. The compiler automatically generates optimized binary code for C applications, based on pragma annotation of the code executed on the reconfigurable hardware. For the IBM PowerPC 405 processor included in the Virtex II Pro platform FPGA, we implemented code generation, register, and stack frame allocation following the PowerPC EABI (embedded application binary interface). The PowerPC backend has been extended to generate the appropriate instructions for the reconfigurable hardware and data transfer, taking into account the information of the specific hardware implementations and system. Starting with an annotated C application, a complete design flow has been integrated to generate the executable bitstream for the reconfigurable processor. The flexible design of the proposed infrastructure makes it possible to take into account the special features of the reconfigurable architectures. In order to hide the reconfiguration latencies, we implemented an instruction-scheduling algorithm for the dynamic hardware configuration instructions. The algorithm schedules, in advance, the hardware configuration instructions, taking into account the conflicts for the reconfigurable hardware resources (FPGA area) between the hardware operations. To verify the Molen compiler, we used the multimedia video frame M-JPEG encoder of which the extended discrete cosine transform (DCT*) function was mapped on the FPGA. We obtained an overall speedup of 2.5 (about 84% efficiency over the maximal theoretical speedup of 2.96). The performance efficiency is achieved using automatically generated nonoptimized DCT* hardware implementation. The instruction-scheduling algorithm has been tested for DCT, quantization, and VLC operations. Based on simulation results, we determine that, while a simple scheduling produces a significant performance decrease, our proposed scheduling contributes up to a 16x M-JPEG encoder speedup.
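The efficiency figure quoted in this abstract (a measured speedup of 2.5 against a theoretical maximum of 2.96) can be checked with a few lines of Python. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and reading the 2.96 bound as the Amdahl's-law limit for an infinitely fast kernel is an assumption, since the abstract does not say how the bound was derived.

```python
def amdahl_speedup(accel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `accel_fraction` of the run time is sped up by `kernel_speedup`."""
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / kernel_speedup)


def amdahl_limit(accel_fraction: float) -> float:
    """Upper bound on the overall speedup as the kernel speedup grows without bound."""
    if accel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accel_fraction)


def efficiency(measured: float, theoretical_max: float) -> float:
    """Fraction of the theoretical maximum speedup that was actually achieved."""
    return measured / theoretical_max


# Numbers taken from the abstract: measured 2.5, theoretical maximum 2.96.
eff = efficiency(2.5, 2.96)
print(f"{eff:.1%}")  # prints "84.5%"

# Assumption: if 2.96 is the Amdahl limit, this is the implied accelerated fraction.
implied_fraction = 1.0 - 1.0 / 2.96
print(round(amdahl_limit(implied_fraction), 2))  # prints 2.96
```

The computed ratio matches the "about 84%" stated in the abstract.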

International Conference on e-Science | 2006

Market-Based Resource Allocation in Grids

Behnaz Pourebrahimi; Koen Bertels; G. M. Kandru; Stamatis Vassiliadis

The core goal of resource management is to establish a mutual agreement between a resource producer and a resource consumer by which the producer agrees to supply a capability that can be used to perform some tasks on behalf of the consumer. Market-based approaches introduce money and pricing as the technique for coordination between consumers and producers of resources. In this paper, we propose a market-based mechanism to allocate computational resources (CPU time) with a single central Market in a local Grid. In such a network, any node can offer its idle CPU time to the Grid, and whenever a node has tasks waiting for free CPU, it may request the resource from the Grid. In our approach, consumers and producers are autonomous agents that make their own decisions according to their capabilities and their local knowledge. A Continuous Double Auction model is used as the technique by which these selfish agents can coordinate their work and make their decisions. The performance of this mechanism is evaluated and compared with the simple FCFS mechanism.
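To illustrate the Continuous Double Auction mentioned in this abstract, here is a minimal Python sketch of a central market that matches buy and sell orders as they arrive. It is an illustration under stated assumptions, not the mechanism from the paper: the class and method names are hypothetical, each order is for a single unit of CPU time, and trades clear at the midpoint of the matched bid and ask.

```python
import heapq
from itertools import count


class ContinuousDoubleAuction:
    """Central market matching consumer bids with producer asks on arrival."""

    def __init__(self) -> None:
        self._bids = []  # max-heap via negated price: (-price, seq, consumer)
        self._asks = []  # min-heap: (price, seq, producer)
        self._seq = count()  # arrival counter, breaks ties between equal prices
        self.trades = []  # list of (consumer, producer, price)

    def submit_bid(self, consumer: str, price: float) -> None:
        """A consumer offers to pay `price` for one unit of CPU time."""
        heapq.heappush(self._bids, (-price, next(self._seq), consumer))
        self._match()

    def submit_ask(self, producer: str, price: float) -> None:
        """A producer offers one unit of idle CPU time for at least `price`."""
        heapq.heappush(self._asks, (price, next(self._seq), producer))
        self._match()

    def _match(self) -> None:
        # Trade while the highest bid meets or exceeds the lowest ask.
        while self._bids and self._asks and -self._bids[0][0] >= self._asks[0][0]:
            neg_bid, _, consumer = heapq.heappop(self._bids)
            ask, _, producer = heapq.heappop(self._asks)
            # Assumption: clear at the midpoint of bid and ask.
            self.trades.append((consumer, producer, (-neg_bid + ask) / 2))


market = ContinuousDoubleAuction()
market.submit_ask("node-A", 5.0)
market.submit_ask("node-B", 3.0)
market.submit_bid("task-1", 4.0)  # matches node-B, the cheapest ask
market.submit_bid("task-2", 4.5)  # stays in the book: the best ask is 5.0
market.submit_ask("node-C", 4.0)  # matches the waiting task-2 bid
print(market.trades)
```

Because matching runs on every submission, an order trades immediately if a compatible counterpart is already waiting, which is what distinguishes a continuous auction from one that clears in periodic rounds.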

International Conference / Workshop on Embedded Computer Systems: Architectures, Modeling and Simulation | 2004

The Molen Programming Paradigm

Stamatis Vassiliadis; Georgi Gaydadjiev; Koen Bertels; Elena Moscu Panainte

In this paper we present the Molen programming paradigm, which is a sequential consistency paradigm for programming Custom Computing Machines (CCM). The programming paradigm allows for modularity and provides mechanisms for explicit parallel execution. Furthermore, it requires only a few instructions to be added to an architectural instruction set while allowing an almost arbitrary number of op-codes per user to be used in a CCM. A number of programming examples and a discussion are provided in order to clarify the operation, sequence control and parallelism of the proposed programming paradigm.

International Conference on Hardware/Software Codesign and System Synthesis | 2006

Automatic selection of application-specific instruction-set extensions

Carlo Galuzzi; Elena Moscu Panainte; Yana Yankova; Koen Bertels; Stamatis Vassiliadis

In this paper, we present a general and efficient algorithm for automatic selection of new application-specific instructions under hardware resource constraints. The instruction selection is formulated as an ILP problem and efficient solvers can be used for finding the optimal solution. An important feature of our algorithm is that it is not restricted to basic-block level nor does it impose any limitation on the number of the newly added instructions or on the number of the inputs/outputs of these instructions. The presented results show that a significant overall application speedup is achieved even for large kernels (for the ADPCM decoder the speedup ranges from 1.2x to 3.7x) and that our algorithm compares well with other state-of-the-art algorithms for automatic instruction set extensions.
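The selection problem described in this abstract can be sketched as a 0/1 choice over candidate instructions under an area budget. The Python below is a simplified illustration, not the paper's formulation: it replaces the ILP solver with exhaustive search (practical only for a handful of candidates), reduces the hardware constraints to a single area number, and all names and figures are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Candidate(NamedTuple):
    """A candidate custom instruction (all values are illustrative)."""
    name: str
    saved_cycles: int  # cycles saved if this candidate is moved to hardware
    area: int  # hardware area it would occupy


def select_instructions(candidates, area_budget):
    """Return (total saved cycles, chosen names) maximizing savings within the budget.

    Exhaustive search over all subsets; an ILP solver would replace this
    loop for realistic problem sizes.
    """
    best_gain, best_set = 0, ()
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            area = sum(c.area for c in subset)
            gain = sum(c.saved_cycles for c in subset)
            if area <= area_budget and gain > best_gain:
                best_gain, best_set = gain, subset
    return best_gain, [c.name for c in best_set]


candidates = [
    Candidate("A", saved_cycles=60, area=50),
    Candidate("B", saved_cycles=40, area=30),
    Candidate("C", saved_cycles=35, area=30),
]
print(select_instructions(candidates, area_budget=60))  # prints (75, ['B', 'C'])
```

The example shows why a greedy choice can miss the optimum: candidate A saves the most cycles on its own, but B and C together fit the same budget and save more.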

Design, Automation, and Test in Europe | 2015

Memristor based computation-in-memory architecture for data-intensive applications

Said Hamdioui; Lei Xie; Hoang Anh Du Nguyen; Mottaqiallah Taouil; Koen Bertels; Henk Corporaal; Hailong Jiao; Francky Catthoor; Dirk Wouters; Linn Eike; Jan van Lunteren

One of the most critical challenges for today's and future data-intensive and big-data problems is data storage and analysis. This paper first highlights some challenges of the newborn Big Data paradigm and shows that the increase of the data size has already surpassed the capabilities of today's computation architectures, which suffer from limited bandwidth, programmability overhead, energy inefficiency, and limited scalability. Thereafter, the paper introduces a new memristor-based architecture for data-intensive applications. The potential of such an architecture in solving data-intensive problems is illustrated by showing its capability to increase the computation efficiency, solve the communication bottleneck, reduce the leakage currents, etc. Finally, the paper discusses why memristor technology is very suitable for the realization of such an architecture; using memristors to implement dual functions (storage and logic) is illustrated.