Koen Danckaert
Katholieke Universiteit Leuven
Publications
Featured research published by Koen Danckaert.
ACM Transactions on Design Automation of Electronic Systems | 2001
Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; Arnout Vandecappelle; Per Gunnar Kjeldsberg
We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with issues related to memory addressing.
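A minimal sketch of one of the architecture-independent code transformations this survey covers; the array size N, the function names, and the arithmetic below are illustrative assumptions, not taken from the paper. Loop fusion removes a full-size intermediate array in favour of a single scalar:

```c
#define N 1024   /* illustrative array size, not from the paper */

/* Before: the intermediate array tmp[] of N elements must be stored,
 * and for large N typically ends up in off-chip memory. */
void pipeline_unfused(const int in[N], int out[N])
{
    int tmp[N];
    for (int i = 0; i < N; i++)
        tmp[i] = in[i] * 3;          /* producer loop */
    for (int i = 0; i < N; i++)
        out[i] = tmp[i] + in[i];     /* consumer loop */
}

/* After loop fusion: each element is produced and consumed in the same
 * iteration, so the N-element buffer shrinks to one scalar. */
void pipeline_fused(const int in[N], int out[N])
{
    for (int i = 0; i < N; i++) {
        int t = in[i] * 3;
        out[i] = t + in[i];
    }
}
```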
IEEE Design & Test of Computers | 2001
Francky Catthoor; Koen Danckaert; Sven Wuytack; Nikil D. Dutt
Platform-independent source code transformations can greatly help alleviate the data-transfer and storage bottleneck. This article covers global data-flow, loop, and data-reuse-related transformations, and discusses their effect on data transfer and storage, processor partitioning, and parallelization.
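As a hedged illustration of a data-reuse transformation in this spirit (the filter, the dimensions W and H, and the window K are invented for the example), an explicit copy of each reused image row into a small line buffer makes the reuse visible, so that later mapping steps can place that copy close to the processor:

```c
#include <string.h>

#define W 640   /* assumed image width  */
#define H 480   /* assumed image height */
#define K 3     /* assumed filter window */

/* Horizontal averaging filter with an explicit reuse copy: each image row is
 * transferred once into a small line buffer, and the K accesses per output
 * pixel then hit that copy instead of the large frame array.  Later mapping
 * steps would assign the copy to on-chip memory or registers. */
void filter_rows(const unsigned char img[H][W], unsigned char out[H][W])
{
    unsigned char line[W];                    /* small, heavily reused copy */
    for (int y = 0; y < H; y++) {
        memcpy(line, img[y], W);              /* one transfer per reused row */
        for (int x = 0; x <= W - K; x++) {
            int acc = 0;
            for (int k = 0; k < K; k++)
                acc += line[x + k];           /* K reads from the local copy */
            out[y][x] = (unsigned char)(acc / K);
        }
    }
}
```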
IEEE Transactions on Very Large Scale Integration Systems | 1999
Koen Danckaert; Kostas Masselos; F. Catthoor; H. J. De Man; Costas E. Goutis
Application studies in the areas of image and video processing indicate that between 50% and 80% of the power cost in these systems is due to data storage and transfers. This is especially true for multiprocessor realizations because conventional parallelization methods ignore the power cost and focus only on performance. However, the power consumption also heavily depends on the way a system is parallelized. To reduce this dominant cost, we propose to address the system-level storage organization for the multidimensional signals as a first step in mapping these applications, before the parallelization or partitioning decisions (in particular, before the hardware/software (HW/SW) partitioning, which is traditionally done too early in the design trajectory). Our methodology is illustrated on a parallel quadtree-structured difference pulse-code modulation video codec.
compilers, architecture, and synthesis for embedded systems | 2000
Koen Danckaert; Francky Catthoor; Hugo De Man
We show a new approach for globally applied automatic loop transformations to optimize data transfer and storage in embedded multimedia applications. The approach makes use of an extended polytope model, in which loop nests are modeled by polytopes, and all polytopes are considered at the same time to perform global loop transformations. Transformations are done in two separate steps: first all polytopes are placed in a common iteration space, and afterwards an ordering (for single-processor target architectures) or a space-time mapping (for parallel target architectures) is defined in that common space. The methodology is illustrated on a simple example, and preliminary results for some representative applications are given.
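A minimal sketch of the idea, using invented statement bodies and a made-up dependence distance D: each loop nest is viewed as a one-dimensional polytope, both are placed in a common iteration space, and a single global ordering is then chosen in that space:

```c
#define N 1000  /* assumed trip count */
#define D 2     /* assumed dependence distance: out[i] needs a[i] and a[i-D] */

/* Original form: two separate loop nests, i.e. two 1-D polytopes
 * { 0 <= i < N }; all N elements of a[] are alive between them. */
void separate_nests(const int in[N], int a[N], int out[N])
{
    for (int i = 0; i < N; i++)          /* statement S1 */
        a[i] = 2 * in[i];
    for (int i = D; i < N; i++)          /* statement S2 */
        out[i] = a[i] + a[i - D];
}

/* After placing both statement polytopes in one common iteration space and
 * choosing a single global ordering that interleaves them, only D+1 values
 * of a[] are simultaneously alive. */
void common_iteration_space(const int in[N], int a[N], int out[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = 2 * in[i];                 /* S1 placed at point i */
        if (i >= D)
            out[i] = a[i] + a[i - D];     /* S2 placed at point i as well */
    }
}
```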
international symposium on low power electronics and design | 1999
Kostas Masselos; Koen Danckaert; Francky Catthoor; Costas E. Goutis; H. De Man
A methodology for power-efficient partitioning of real-time data-dominated system specifications is presented. The proposed methodology aims at reducing the memory requirements in realizations of such applications by applying extensive code transformations to the initial system specification before partitioning over processors. This reorganization aligns the data production and consumption between the different procedures of the initial specification, thus reducing the memory size requirements (and the resulting power) of the system realizations, especially at the interfaces between different processors. The main novel contribution is that performance issues are explicitly taken into account during power-oriented system-level transformations. The proposed methodology can be applied both in a parallel (programmable) processor context and in heterogeneous hardware-software architectures.
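A hedged sketch of the production/consumption alignment described here, with invented per-line stage functions and assumed CIF-like dimensions; interleaving the two stages per line shrinks the interface buffer between them from a full frame to a single line:

```c
#define W 352   /* assumed line width      */
#define H 288   /* assumed number of lines */

/* Invented per-line processing stages standing in for two procedures of a
 * partitioned specification (e.g. mapped to two different processors). */
static void stage_a_line(const unsigned char in[W], unsigned char out[W])
{
    for (int x = 0; x < W; x++) out[x] = (unsigned char)(in[x] >> 1);
}

static void stage_b_line(const unsigned char in[W], unsigned char out[W])
{
    for (int x = 0; x < W; x++) out[x] = (unsigned char)(255 - in[x]);
}

/* In the unaligned specification, stage A writes a full W*H frame that
 * stage B reads later, so the inter-processor interface needs a frame-sized
 * buffer.  After aligning production and consumption per line, the interface
 * buffer shrinks from W*H bytes to W bytes, cutting its size and power. */
void pipeline_aligned(const unsigned char in[H][W], unsigned char out[H][W])
{
    unsigned char line[W];                 /* interface buffer: one line */
    for (int y = 0; y < H; y++) {
        stage_a_line(in[y], line);         /* producer side */
        stage_b_line(line, out[y]);        /* consumer side */
    }
}
```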
Journal of Systems Architecture | 1999
Koen Danckaert; Kostas Masselos; Francky Catthoor; Hugo De Man
Application studies indicate that between 50% and 80% of the power cost in image and video processing systems is due to data storage and transfers. This is especially true for multi-processor realizations, because conventional parallelization strategies ignore this cost and focus only on the performance, whereas the power consumption also depends heavily on the way a system is parallelized. We will demonstrate the impact of processor partitioning on the memory requirements by exploring a QSDPCM video codec realization. Furthermore, we show that a strategy for combined task and data parallelism exploration leads to a significant power reduction.
Vlsi Design | 2001
Koen Danckaert; Chidamber Kulkarni; Francky Catthoor; Hugo De Man; Vivek Tiwari
Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level. As a result, the related energy consumption becomes critical. Even for execution time, the bottleneck can shift from the CPU to the external bus load. This paper demonstrates a systematic software approach to reduce this system bus load. It consists of source-to-source code transformations that have to be applied before the conventional ILP compilation. To illustrate this, we use a cavity detection algorithm for medical imaging that is mapped onto an Intel Pentium® II processor.
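A minimal sketch, assuming a made-up two-filter chain rather than the actual cavity-detection code, of the kind of source-to-source rewrite meant here; fusing the filters around a small rotating line buffer keeps the intermediate image off the external bus:

```c
#define W 640   /* assumed image width  */
#define H 480   /* assumed image height */

/* Invented two-filter chain (horizontal blur, then vertical gradient)
 * standing in for the real pipeline.  Unfused, the blurred intermediate
 * image would be written to and read back from off-chip memory, crossing
 * the external bus twice.  Fused with a 3-line rotating buffer, the live
 * intermediate data stays on-chip. */
void fused_filters(const unsigned char in[H][W], unsigned char out[H][W])
{
    unsigned char lines[3][W];             /* rotating buffer of blurred lines */

    for (int y = 0; y < H; y++) {
        /* Produce one blurred line into the rotating buffer. */
        unsigned char *cur = lines[y % 3];
        cur[0] = in[y][0];
        cur[W - 1] = in[y][W - 1];
        for (int x = 1; x < W - 1; x++)
            cur[x] = (unsigned char)((in[y][x - 1] + in[y][x] + in[y][x + 1]) / 3);

        /* Consume it as soon as three blurred lines are available. */
        if (y >= 2) {
            const unsigned char *top = lines[(y - 2) % 3];
            const unsigned char *bot = lines[y % 3];
            for (int x = 0; x < W; x++) {
                int d = bot[x] - top[x];
                out[y - 1][x] = (unsigned char)(d < 0 ? -d : d);
            }
        }
    }
    /* Top and bottom output rows are left unprocessed in this sketch. */
}
```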
signal processing systems | 2000
Kostas Masselos; Koen Danckaert; Francky Catthoor; N. Zervas; Costas E. Goutis; H. De Man
A specification refinement methodology for the power-efficient partitioning of real-time data-dominated algorithms is presented. The main idea of the proposed methodology is the reorganization, with respect to data transfer and storage, of the initial description of the target algorithm before conventional partitioning. This is achieved through the application of data transfer and storage optimizing high-level code transformations to the initial description of the target algorithm. These transformations align the data production and consumption between the different procedures of the initial specification, thus reducing the memory size requirements of the system realizations, especially at the interfaces between different processors. In this way, the data transfer and storage related power consumption, which forms an important part of the total power budget of a data-dominated system, is significantly reduced. Performance issues are explicitly taken into account during the application of the data transfer and storage high-level transformations. The proposed methodology can be applied both in a parallel (programmable) processor context and in heterogeneous hardware-software architectures. It can also be used for the power-efficient implementation of data-dominated algorithms on architectures based on programmable cores and application-specific memory hierarchies. Experimental results from real-life applications prove the impact of the proposed methodology.
The Electrical Engineering Handbook | 2005
Francky Catthoor; Erik Brockmeyer; Koen Danckaert; Chidamber Kulkarni; Lode Nachtergaele; Arnout Vandecappelle
This chapter investigates several building blocks for memory storage, with the emphasis on their internal architectural organization. It presents a general classification of the main memory components for customized organizations, including register files, on-chip SRAMs, and DRAMs, and explains off-chip and global hierarchical memory organization issues. Apart from the storage architecture itself, the way data are mapped to these architecture components is important for a good overall memory management solution. In current practice, designers usually go for the highest-speed implementation for most submodules of a complex system, even when real-time constraints apply for the global design. Moreover, the design tools for exploration support (e.g., compilers and system synthesis tools) focus mainly on the performance aspect. The system cost, however, can often be significantly reduced by system-level code transformations or by trading off cycles spent in different submodules. Therefore, this chapter also discusses different aspects of data transfer and storage exploration. The main emphasis lies on custom processor contexts. Realistic multimedia and telecom applications are used to demonstrate the impressive effects of such techniques.
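As one hedged example of the data-to-memory mapping decisions touched on in the chapter (the phases, sizes, and arithmetic below are invented), in-place mapping lets two signals with disjoint lifetimes share a single physical buffer:

```c
#define N 4096   /* assumed signal size */

/* Single physical on-chip buffer shared by two signals whose lifetimes do
 * not overlap; without in-place mapping, the memory organization would
 * have to provide 2*N words instead of N. */
static int shared_buf[N];

/* Signal A lives only inside this phase. */
int phase_a(const int in[N])
{
    int *A = shared_buf;
    int sum = 0;
    for (int i = 0; i < N; i++)
        A[i] = in[i] * in[i];
    for (int i = 0; i < N; i++)
        sum += A[i];                       /* last use of A */
    return sum;
}

/* Signal B is produced only after A is dead, so it reuses the same storage. */
void phase_b(int seed, int out[N])
{
    int *B = shared_buf;
    for (int i = 0; i < N; i++)
        B[i] = seed + i;
    for (int i = 0; i < N; i++)
        out[i] = B[i] ^ (B[i] >> 3);
}
```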
international conference on vlsi design | 2001
Koen Danckaert; Chidamber Kulkarni; Francky Catthoor; H. De Man; Vivek Tiwari
Multimedia algorithms deal with enormous amounts of data transfers and storage, resulting in huge bandwidth requirements at the off-chip memory and system bus level. As a result, the related energy consumption becomes critical. Even for execution time, the bottleneck can shift from the CPU to the external bus load. This paper demonstrates a systematic software approach to reduce this system bus load. It consists of source-to-source code transformations that have to be applied before the conventional ILP compilation. To illustrate this, we use a cavity detection algorithm for medical imaging that is mapped onto a general-purpose programmable processor.