
Publication


Featured research published by Koen Danckaert.


ACM Transactions on Design Automation of Electronic Systems | 2001

Data and memory optimization techniques for embedded systems

Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; A. Vandecappelle; Per Gunnar Kjeldsberg

We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.


IEEE Design & Test of Computers | 2001

Code transformations for data transfer and storage exploration preprocessing in multimedia processors

Francky Catthoor; Koen Danckaert; Sven Wuytack; Nikil D. Dutt

Platform-independent source code transformations can greatly help alleviate the data-transfer and storage bottleneck. This article covers global data-flow, loop, and data-reuse-related transformations, and discusses their effect on data transfer and storage, processor partitioning, and parallelization.
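As a rough illustration of the kind of loop transformation this article describes, the sketch below fuses a producer loop and a consumer loop so that the intermediate array shrinks to a single scalar. The function names, the toy arithmetic, and the way storage is counted are assumptions made for this example; they are not taken from the article.

```python
def unfused(pixels):
    """Two separate loops: the first fills a full-size intermediate array."""
    tmp = [2 * p for p in pixels]      # N intermediate values stored
    out = [t + 1 for t in tmp]         # second loop reads them back
    return out, len(tmp)               # result, intermediate storage in words


def fused(pixels):
    """One fused loop: each intermediate value is consumed immediately."""
    out = []
    for p in pixels:
        t = 2 * p                      # a single scalar replaces the array
        out.append(t + 1)
    return out, 1                      # result, intermediate storage in words


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(unfused(data))
    print(fused(data))
```

Both versions compute the same result; only the amount of intermediate data that must be stored and transferred differs.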


IEEE Transactions on Very Large Scale Integration Systems | 1999

Strategy for power-efficient design of parallel systems

Koen Danckaert; Kostas Masselos; F. Catthoor; H.J. De Man; Costas E. Goutis

Application studies in the areas of image and video processing indicate that between 50% and 80% of the power cost in these systems is due to data storage and transfers. This is especially true for multiprocessor realizations because conventional parallelization methods ignore the power cost and focus only on performance. However, the power consumption also depends heavily on the way a system is parallelized. To reduce this dominant cost, we propose to address the system-level storage organization for the multidimensional signals as a first step in mapping these applications, before the parallelization or partitioning decisions (in particular, before the hardware/software (HW/SW) partitioning, which is traditionally done too early in the design trajectory). Our methodology is illustrated on a parallel quadtree-structured difference pulse-code modulation video codec.


Compilers, Architecture, and Synthesis for Embedded Systems | 2000

A preprocessing step for global loop transformations for data transfer optimization

Koen Danckaert; Francky Catthoor; Hugo De Man

We show a new approach for globally applied automatic loop transformations, to optimize data transfer and storage in embedded multimedia applications. The approach makes use of an extended polytope model, in which loop nests are modeled by polytopes, and all polytopes are considered at the same time to perform global loop transformations. Transformations are done in two separate steps: first all polytopes are placed in a common iteration space, and afterwards an ordering (for single-processor target architectures) or a space-time mapping (for parallel target architectures) is defined in that common space. The methodology is illustrated on a simple example, and preliminary results for some representative applications are given.
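The two-step idea (placement, then ordering) can be sketched on a heavily simplified one-dimensional case. This is a toy model under stated assumptions, not the paper's extended polytope model: the function `max_live_values`, the single `offset` placement parameter, and the "number of live values" cost measure are all hypothetical choices made for illustration.

```python
def max_live_values(n, offset):
    """Peak number of intermediate values alive at once.

    A producer loop writes a[i] for i in range(n); a consumer loop reads a[j]
    for j in range(n). `offset` must be >= 0 so that every value is produced
    before it is consumed.
    """
    # Step 1 (placement): put both loop nests in one common iteration space.
    # Producer iteration i lands on point i, consumer iteration j on j + offset.
    events = [(i, 0, i) for i in range(n)]            # kind 0: produce a[i]
    events += [(j + offset, 1, j) for j in range(n)]  # kind 1: consume a[j]

    # Step 2 (ordering): traverse the common space in increasing point order,
    # with the producer ahead of the consumer on the same point.
    events.sort()

    live, peak = set(), 0
    for _, kind, idx in events:
        if kind == 0:
            live.add(idx)
            peak = max(peak, len(live))
        else:
            live.remove(idx)
    return peak


if __name__ == "__main__":
    print(max_live_values(8, 8))  # loops run back to back
    print(max_live_values(8, 0))  # loops fully overlapped
```

In this toy setting, placing the consumer right on top of the producer (`offset = 0`) keeps only one value alive at a time, whereas running the loops back to back (`offset = n`) keeps all `n` values alive.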


International Symposium on Low Power Electronics and Design | 1999

A methodology for power efficient partitioning of data-dominated algorithm specifications within performance constraints

Kostas Masselos; Koen Danckaert; Francky Catthoor; Costas E. Goutis; H. De Man

A methodology for power-efficient partitioning of real-time data-dominated system specifications is presented. The proposed methodology aims at reducing the memory requirements in realizations of such applications by applying extensive code transformations to the initial system specification before partitioning over processors. This reorganization essentially aligns the data production and consumption between the different procedures of the initial specification, thus reducing the memory size requirements (and the resulting power) of the system realizations, especially those in the interfaces between different processors. The main novel contribution is that performance issues are explicitly taken into account during power-oriented system-level transformations. The proposed methodology can be applied both in a parallel (programmable) processor context and in heterogeneous hardware-software architectures.


Journal of Systems Architecture | 1999

Strategy for power efficient combined task and data parallelism exploration illustrated on a QSDPCM video codec

Koen Danckaert; Kostas Masselos; Francky Catthoor; Hugo De Man

Application studies indicate that between 50% and 80% of the power cost in image and video processing systems is due to data storage and transfers. This is especially true for multi-processor realizations, because conventional parallelization strategies ignore this cost and focus only on the performance, whereas the power consumption also depends heavily on the way a system is parallelized. We will demonstrate the impact of processor partitioning on the memory requirements by exploring a QSDPCM video codec realization. Furthermore, we show that a strategy for combined task and data parallelism exploration leads to a significant power reduction.


VLSI Design | 2001

A systematic approach to reduce the system bus load and power in multimedia algorithms

Koen Danckaert; Chidamber Kulkarni; Francky Catthoor; Hugo De Man; Vivek Tiwari

Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level. As a result, the related energy consumption becomes critical. Even for execution time, the bottleneck can shift from the CPU to the external bus load. This paper demonstrates a systematic software approach to reduce this system bus load. It consists of source-to-source code transformations that have to be applied before the conventional ILP compilation. To illustrate this, we use a cavity detection algorithm for medical imaging, which is mapped on an Intel Pentium® II processor.


Signal Processing Systems | 2000

A Specification Refinement Methodology for Power Efficient Partitioning of Data-Dominated Algorithms Within Performance Constraints

Kostas Masselos; Koen Danckaert; Francky Catthoor; N. Zervas; Costas E. Goutis; H. De Man

A specification refinement methodology for the power-efficient partitioning of real-time data-dominated algorithms is presented. The main idea of the proposed methodology is to reorganize the initial description of the target algorithm with respect to data transfer and storage before conventional partitioning. This is achieved by applying high-level code transformations that optimize data transfer and storage to the initial description of the target algorithm. These transformations essentially align the data production and consumption between the different procedures of the initial specification, thus reducing the memory size requirements of the system realizations, especially those in the interfaces between different processors. In this way, the power consumption related to data transfer and storage, which forms an important part of the total power budget of a data-dominated system, is significantly reduced. Performance issues are explicitly taken into account during the application of the high-level data transfer and storage transformations. The proposed methodology can be applied both in a parallel (programmable) processor context and in heterogeneous hardware-software architectures. It can also be used for the power-efficient implementation of data-dominated algorithms on architectures based on programmable cores and application-specific memory hierarchies. Experimental results from real-life applications demonstrate the impact of the proposed methodology.


The Electrical Engineering Handbook | 2005

2 – Custom Memory Organization and Data Transfer: Architectural Issues and Exploration Methods

Francky Catthoor; Erik Brockmeyer; Koen Danckaert; Chidamber Kulkarni; Lode Nachtergaele; Arnout Vandecappelle

This chapter investigates several building blocks for memory storage, with the emphasis on their internal architectural organization. It presents a general classification of the main memory components for customized organizations, including register files and on-chip SRAMs and DRAMs, and explains off-chip and global hierarchical memory organization issues. Apart from the storage architecture itself, the way data are mapped to these architecture components is important for a good overall memory management solution. In current practice, designers usually go for the highest-speed implementation for most submodules of a complex system, even when real-time constraints apply for the global design. Moreover, the design tools for exploration support (e.g., compilers and system synthesis tools) focus mainly on the performance aspect. The system cost, however, can often be significantly reduced by system-level code transformations or by trading off cycles spent in different submodules. Therefore, this chapter also discusses different aspects of data transfer and storage exploration. The main emphasis lies on custom processor contexts. Realistic multimedia and telecom applications are used to demonstrate the impressive effects of such techniques.


International Conference on VLSI Design | 2001

A systematic approach for system bus load reduction applied to medical imaging

Koen Danckaert; Chidamber Kulkarni; Francky Catthoor; H. De Man; Vivek Tiwari

Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level. As a result, the related energy consumption becomes critical. Even for execution time, the bottleneck can shift from the CPU to the external bus load. This paper demonstrates a systematic software approach to reduce this system bus load. It consists of source-to-source code transformations that have to be applied before the conventional ILP compilation. To illustrate this, we use a cavity detection algorithm for medical imaging, which is mapped on a general-purpose programmable processor.

Collaboration


Top Co-Authors

Erik Brockmeyer
Katholieke Universiteit Leuven

Hugo De Man
Katholieke Universiteit Leuven

Kostas Masselos
University of Peloponnese

Nikil D. Dutt
University of California

Arnout Vandecappelle
Katholieke Universiteit Leuven

H. De Man
Katholieke Universiteit Leuven