Andreas Kanstein | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Andreas Kanstein is active.

Explore More

Publication

Featured researches published by Andreas Kanstein.

applied reconfigurable computing | 2007

Architectural exploration of the ADRES coarse-grained reconfigurable array

Frank Bouwens; Mladen Berekovic; Andreas Kanstein; Georgi Gaydadjiev

Reconfigurable computational architectures are envisioned to deliver power efficient, high performance, flexible platforms for embedded systems design. The coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) and its compiler offer a tool flow to design sparsely interconnected 2D array processors with an arbitrary number of functional units, register files and interconnection topologies. This article presents an architectural exploration methodology and its results for the first implementation of the ADRES architecture on a 90nm standard-cell technology. We analyze performance, energy and power trade-offs for two typical kernels from the multimedia and wireless domains: IDCT and FFT. Architecture instances of different sizes and interconnect structures are evaluated with respect to their power versus performance trade-offs. An optimized architecture is derived. A detailed power breakdown for the individual components of the selected architecture is presented.

design, automation, and test in europe | 2007

Mapping control-intensive video kernels onto a coarse-grain reconfigurable architecture: the H.264/AVC deblocking filter

C. Arbelo; Andreas Kanstein; Sebastián López; José Francisco López; Mladen Berekovic; Roberto Sarmiento; Jean-Yves Mignolet

Deblocking filtering represents one of the most compute intensive tasks in an H.264/AVC standard video decoder due to its demanding memory accesses and irregular data flow. For these reasons, an efficient implementation poses big challenges, especially for programmable platforms. In this sense, the mapping of this decoders functionality onto a C-programmable coarse-grained reconfigurable architecture named ADRES (architecture for dynamically reconfigurable embedded systems) is presented in this paper, including results from the evaluation of different topologies. The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the filtering as well as an increase in the degree of instruction parallelism (ILP) when compared with an implementation on a very long instruction word (VLIW) dedicated processor. This demonstrates that high ILP is achievable on the ADRES even for irregular, data-dependent kernels

applied reconfigurable computing | 2007

Mt-ADRES: multithreading on coarse-grained reconfigurable architecture

Kehuai Wu; Andreas Kanstein; Jan Madsen; Mladen Berekovic

The coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) and its compiler offer high instruction-level parallelism (ILP) to applications by means of a sparsely interconnected array of functional units and register files. As high-ILP architectures achieve only low parallelism when executing partially sequential code segments, which is also known as Amdahls law, this paper proposes to extend ADRES to MT-ADRES (Multi-Threaded ADRES) to also exploit thread-level parallelism. On MT-ADRES architectures, the array can be partitioned in multiple smaller arrays that can execute threads in parallel. Because the partition can be changed dynamically, this extension provides more flexibility than a multi-core approach. This article presents details of the enhanced architecture and results obtained from an MPEG-2 decoder implementation that exploits a mix of thread-level parallelism and instruction-level parallelism.

International Journal of Electronics | 2008

MT-ADRES : multi-threading on coarse-grained reconfigurable architecture

Kehuai Wu; Andreas Kanstein; Jan Madsen; Mladen Berekovic

The coarse-grained reconfigurable architecture ADRES (architecture for dynamically reconfigurable embedded systems) and its compiler offer high instruction-level parallelism (ILP) to applications by means of a sparsely interconnected array of functional units and register files. As high-ILP architectures achieve only low parallelism when executing partially sequential code segments, which is also known as Amdahls law, this article proposes to extend ADRES to MT-ADRES (multi-threaded ADRES) to also exploit thread-level parallelism. On MT-ADRES architectures, the array can be partitioned in multiple smaller arrays that can execute threads in parallel. Because the partition can be changed dynamically, this extension provides more flexibility than a multi-core approach. This article presents details of the enhanced architecture and results obtained from an MPEG-2 decoder implementation that exploits a mix of thread-level parallelism and instruction-level parallelism.

rapid system prototyping | 2009

High-Level System Modeling for Rapid HW/SW Architecture Exploration

Chafic Jaber; Andreas Kanstein; Ludovic Apvrille; Amer Baghdadi; Patricia Le Moenner; Renaud Pacalet

The increasing complexity of system-on-chip design – especially the software part of those systems – has stimulated much research work on design space exploration at the early stages of system development. In this paper we propose a new methodology for system modeling based on a specific UML profile. It defines a high design abstraction level for modeling and analyzing hardware resource sharing between system elements. Additionally, a SystemC-based simulator is developed in order to simulate modeled systems and evaluate their performance. Due to the high level of abstraction, the developed simulator enables fast exploration of design solutions.First promising results are presented and discussed over a mobile platform for the 3GPP LTE protocol stack.

Microprocessors and Microsystems | 2009

Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor

Mladen Berekovic; Andreas Kanstein; Bingfeng Mei; Bjorn De Sutter

This paper introduces the mapping of MPEG video decoders on ADRES, IMECs new coarse-grain reconfigurable and fully C-programmable array processor that targets nomadic devices. ADRES is a flexible template that allows the instantiation of many different processor versions. An XML-based architecture description language allows a designer to easily generate different processor instances with full compiler support by specifying different values for the communication topology, the number and size of local register files and functional units and supported instruction set. ADRES supports a VLIW-like programming model with a pure VLIW mode for legacy code, and a (coarse-grain reconfigurable) array mode with very high parallelism for the processing of compute intensive loops. We demonstrate the mapping of two video decoders MPEG-2 and AVC, and discuss the performance trade-offs for two critical kernels: IDCT and integer transform. As a result, an ADRES based system can perform AVC decoding in CIF resolution with less then 50MHz on a 4x4 array processor.

Proceedings of SPIE | 2007

Optimizing coarse-grain reconfigurable hardware utilization through multiprocessing: an H.264/AVC decoder example

Andreas Kanstein; Sebastian López Suárez; Bjorn De Sutter

Coarse-grained reconfigurable architectures offer high execution acceleration for code which has high instruction-level parallelism (ILP), typically for large kernels in DSP applications. However for applications with a larger part of control code and many smaller kernels, as present in modern video compression algorithms, the achievable acceleration through ILP is significantly reduced. We introduce a multi-processing extension to the coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) to deal with this kind of applications, by enabling it to exploit thread-level parallelism (TLP). This extension consists of a partitioning of an ADRES array into non-overlapping parts, where every partition can execute a processing thread independently, or a processing thread can be assigned to hierarchically combined partitions which provide a larger number of resources. Because the combining of partitions can be changed dynamically, this extension provides more flexibility than a multi-core approach. This paper discusses the architecture and an exploration into how to potentially partition a given array for executing an H.264/AVC baseline decoder.

applied reconfigurable computing | 2006

Hardware and a Tool Chain for ADRES

Bjorn De Sutter; Bingfeng Mei; Andrei Bartic; Tom Vander Aa; Mladen Berekovic; Jean-Yves Mignolet; Kris Croes; Paul Coene; Miro Cupac; Aı̈ssa Couvreur; Andy Folens; Steven Dupont; Bert Van Thielen; Andreas Kanstein; Hong-seok Kim; Suk Jin Kim

Until recently, only a compiler and a high-level simulator of the reconfigurable architecture ADRES existed. This paper focuses on the problems that needed to be solved when moving from a software-only view on the architecture to a real hardware implementation, as well as on the verification process of all involved tools.

2009 Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference | 2009

Shared resources high-level modeling in embedded systems using virtual nodes

Chafic Jaber; Andreas Kanstein; Ludovic Apvrille; Amer Baghdadi; Renaud Pacalet

The increasing complexity of system-on-chip design and shorter time to market constraints has stimulated systems designers to investigate performance characteristics of the final system implementation in the early design stages, by means of modeling the design at a high level of abstraction. This paper presents the virtual node concept for modeling the shared resources of a system-on-chip, therefore specifically dedicated to the study of the impact of shared resources contention on the overall systems performance, which is often defined by concurrent use cases. The overall approach is based on using a specific UML modeling profile and a SystemC-based simulator to execute models and analyze their performance.

Proceedings of SPIE | 2007

Toward the implementation of a baseline H.264/AVC decoder onto a reconfigurable architecture

Sebastián López; Andreas Kanstein; J.F. Lopez; Mladen Berekovic; Roberto Sarmiento; Jean-Yves Mignolet

The decoding of a H.264/AVC bitstream represents a complex and time-consuming task. Due to this reason, efficient implementations in terms of performance and flexibility are mandatory for real time applications. In this sense, the mapping of the motion compensation and deblocking filtering stages onto a coarse-grained reconfigurable architecture named ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) is presented in this paper. The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the motion compensation as well as an increase in the degree of parallelism when compared with an implementation on a Very Long Instruction Word (VLIW) dedicated processor.

Explore More