Arnout Vandecappelle
Katholieke Universiteit Leuven
Publications
Featured research published by Arnout Vandecappelle.
ACM Transactions on Design Automation of Electronic Systems | 2009
Stefan Valentin Gheorghita; Martin Palkovic; Juan Hamers; Arnout Vandecappelle; Stelios Mamagkakis; Twan Basten; Lieven Eeckhout; Henk Corporaal; Francky Catthoor; Frederik Vandeputte; Koen De Bosschere
In the past decade, real-time embedded systems have become much more complex due to the introduction of a lot of new functionality in one application, and due to running multiple applications concurrently. This increases the dynamic nature of today's applications and systems, and tightens their constraints in terms of deadlines and energy consumption. State-of-the-art design methodologies try to cope with these novel issues by identifying several of the most used cases and dealing with them separately, reducing the newly introduced complexity. This article presents a generic and systematic design-time/run-time methodology for handling the dynamic nature of modern embedded systems, which can be utilized by existing design methodologies to increase their efficiency. It is based on the concept of system scenarios, which group system behaviors that are similar from a multidimensional cost perspective (such as resource requirements, delay, and energy consumption) in such a way that the system can be configured to exploit this cost similarity. At design time, these scenarios are individually optimized. Mechanisms for predicting the current scenario at run time, and for switching between scenarios, are also derived. This design trajectory is augmented with a run-time calibration mechanism, which allows the system to learn on the fly during its execution and to adapt itself to the current input stimuli by extending the scenario set and by changing the scenario definitions and both the prediction and switching mechanisms. To show the generality of our methodology, we show how it has been applied to four very different real-life design problems. In all presented case studies, substantial energy reductions were obtained by exploiting scenarios.
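The scenario idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical video decoder whose cycle demand depends on the frame type; the profile numbers, the frequency table, the 25% grouping tolerance, and all function names are illustrative assumptions, not taken from the article. Behaviors with similar cost are grouped at design time, each group gets one configuration, and a run-time predictor selects the group.

```python
from dataclasses import dataclass

# Hypothetical per-frame cycle demands observed at design time (illustrative numbers).
PROFILED_CYCLES = {"I": 9.0e6, "P": 5.5e6, "B": 5.0e6, "skip": 0.4e6}

# Hypothetical processor frequencies in Hz, lowest first.
FREQUENCIES = [50e6, 100e6, 200e6, 400e6]
DEADLINE_S = 0.040  # one frame every 40 ms


@dataclass
class Scenario:
    members: list        # frame types grouped into this scenario
    worst_cycles: float  # worst-case cycle demand over the members
    frequency: float     # lowest frequency that still meets the deadline


def lowest_safe_frequency(cycles):
    """Return the lowest frequency at which `cycles` finish before the deadline."""
    for f in FREQUENCIES:
        if cycles / f <= DEADLINE_S:
            return f
    raise ValueError("no frequency meets the deadline")


def build_scenarios(profile, tolerance=0.25):
    """Design time: group behaviors whose cost is within `tolerance` of a scenario's worst case."""
    scenarios = []
    # Visit the most expensive behaviors first, so the first member of each
    # scenario is also its worst case.
    for kind, cycles in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        for s in scenarios:
            if cycles >= s.worst_cycles * (1 - tolerance):
                s.members.append(kind)
                break
        else:
            scenarios.append(Scenario([kind], cycles, lowest_safe_frequency(cycles)))
    return scenarios


def predict(scenarios, frame_type):
    """Run time: map the observed frame type to its scenario."""
    for s in scenarios:
        if frame_type in s.members:
            return s
    return scenarios[0]  # unknown behavior: fall back to the most demanding scenario


scenarios = build_scenarios(PROFILED_CYCLES)
```

With these assumed numbers, "P" and "B" frames end up sharing one scenario and one frequency, while "I" and "skip" frames each get their own. The run-time calibration described in the abstract would correspond to updating the profile and rebuilding the groups during execution, which this sketch does not attempt.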
IEEE Design & Test of Computers | 2001
P. Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau; Francky Catthoor; Arnout Vandecappelle; Erik Brockmeyer; Chidamber Kulkarni; E. De Greef
In application-specific designs, customized memory organization expands the search space for cost-optimized solutions. Several optimization strategies can be applied to embedded systems with several different memory architectures: data cache, scratch-pad memory, custom memory architectures, and dynamic random-access memory (DRAM).
Design Automation Conference | 1999
Arnout Vandecappelle; Miguel Miranda; Erik Brockmeyer; Francky Catthoor; Diederik Verkest
Successful exploration of system-level design decisions is impossible without fast and accurate estimation of the impact on the system cost. In most multimedia applications, the dominant cost factor is related to the organization of the memory architecture. This paper presents a systematic approach which allows effective system-level exploration of memory organization design alternatives, based on accurate feedback from our previously developed tools. The effectiveness of this approach is illustrated on an industrial application. Applying our approach, a substantial part of the design search space has been explored in a very short time, resulting in a cost-efficient solution which meets all design constraints.
International Symposium on Low Power Electronics and Design | 2004
Edgar G. Daylight; David Atienza; Arnout Vandecappelle; Francky Catthoor; José M. Mendías
Embedded systems are evolving from traditional, stand-alone devices to devices that participate in Internet activity. The days of simple, manifest embedded software [e.g., a simple finite-impulse response (FIR) algorithm on a digital signal processor (DSP)] are over. Complex, nonmanifest code, executed on a variety of embedded platforms in a distributed manner, characterizes next-generation embedded software. One dominant niche, which we concentrate on, is embedded multimedia software. There is a need to map large-scale, dynamic multimedia software onto an embedded system in a systematic and highly optimized manner. The objective of this paper is to introduce high-level, systematically applicable data structure transformations and to show in detail the practical feasibility of our optimizations on three real-life multimedia case studies. We derive Pareto tradeoff points in terms of accesses versus memory footprint and obtain significant gains in execution time and power consumption with respect to the initial implementation choices. Our approach is a first step toward systematically applying high-level data structure transformations in the context of memory-efficient and low-power multimedia systems.
International Symposium on Low Power Electronics and Design | 2000
Erik Brockmeyer; Arnout Vandecappelle; Francky Catthoor
In contrast to current design practice for (programmable) processor mapping, which mainly targets performance, we focus on a systematic trade-off between cycle budget and energy consumed in the background memory organization. The latter is a crucial component in many of today's designs, including multimedia, network protocols and telecom signal processing. We have a systematic method and a tool to explore both degrees of freedom and to arrive at Pareto charts, in which for a given application the lowest-cost implementation of the memory organization is plotted against the available cycle budget per submodule. This is achieved by making optimal use of a parallelized memory architecture. We indicate, with results on a digital audio broadcasting receiver and an image compression demonstrator, how to use the Pareto plot effectively to gain significantly in overall system energy consumption within the global real-time constraints.
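As a rough illustration of how such a Pareto chart can be built and used, the sketch below filters a set of candidate memory organizations down to the non-dominated (cycle budget, energy) points and then picks the cheapest point that fits a given budget. The candidate values and function names are hypothetical; the tool described in the abstract derives its points from actual memory organization exploration, which is not modeled here.

```python
def pareto_front(points):
    """Return the points not dominated in (cycles, energy); lower is better for both."""
    front = []
    best_energy = float("inf")
    # Sorted by cycles, then energy: a point is kept only if it improves on the
    # lowest energy seen so far among points needing no more cycles.
    for cycles, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((cycles, energy))
            best_energy = energy
    return front


def cheapest_within_budget(front, cycle_budget):
    """Pick the lowest-energy Pareto point that fits the available cycle budget."""
    feasible = [p for p in front if p[0] <= cycle_budget]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical (cycles, energy in mJ) estimates for alternative memory organizations.
candidates = [(100, 9.0), (120, 7.5), (120, 8.0), (150, 7.5), (180, 5.0), (200, 5.2)]
front = pareto_front(candidates)
choice = cheapest_within_budget(front, 160)
```

Relaxing the cycle budget moves the choice along the front toward lower energy, which is the trade-off the abstract describes.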
Signal Processing Systems | 2000
Cédric Ghez; Miguel Miranda; Arnout Vandecappelle; Francky Catthoor; Diederik Verkest
Exploring data transfer and storage issues is crucial to efficiently map data-intensive applications (e.g., multimedia) onto programmable processors. Code transformations are used to minimise main memory bus load, and hence also to reduce power consumption and improve system performance. However, this typically incurs a considerable arithmetic overhead in the addressing and local control. For instance, memory-optimising in-place and data-layout transformations add costly modulo and integer division operations to the initial addressing code. In this paper, we show how the cycle overhead can be almost completely removed. This is done according to a systematic methodology which combines an algebraic transformation exploration approach for the (non)linear arithmetic with an efficient transformation technique for reducing the piece-wise linear indexing to linear pointer arithmetic. The approach is illustrated on a real-life medical application, using a variety of programmable processor architectures. Total gains in cycle count ranging between a factor of 5 and 25 are obtained compared to conventional compilers.
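The general idea of removing modulo operations from addressing code can be sketched as follows. This is only a textbook-style strength reduction for a circular-buffer index, written in Python for readability; it is not the methodology of the paper, and the function names and the restriction on `stride` are assumptions of this sketch.

```python
def addresses_with_modulo(n_iterations, stride, buffer_size):
    """Reference addressing: one multiplication and one modulo per iteration."""
    return [(i * stride) % buffer_size for i in range(n_iterations)]


def addresses_incremental(n_iterations, stride, buffer_size):
    """Same address sequence using only an addition and a compare-and-subtract.

    Assumes 0 <= stride < buffer_size, so a single subtraction is enough
    to bring the index back into range.
    """
    if not 0 <= stride < buffer_size:
        raise ValueError("stride must satisfy 0 <= stride < buffer_size")
    addresses = []
    index = 0
    for _ in range(n_iterations):
        addresses.append(index)
        index += stride
        if index >= buffer_size:  # wrap around without a modulo operation
            index -= buffer_size
    return addresses
```

Both functions produce the same piece-wise linear index sequence; the second keeps a running index instead of recomputing it, which on many processors maps to cheaper pointer-style updates.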
Asia and South Pacific Design Automation Conference | 2006
Qubo Hu; Arnout Vandecappelle; Martin Palkovic; Per Gunnar Kjeldsberg; Erik Brockmeyer; Francky Catthoor
Loop fusion and loop shifting are important transformations for improving data locality to reduce the number of costly accesses to off-chip memories. Since exploring the exact platform mapping for all the loop transformation alternatives is a time-consuming process, heuristics steered by improved data locality are generally used. However, pure locality estimates do not sufficiently take into account the hierarchy of the memory platform. This paper presents a fast, incremental technique for hierarchical memory size requirement estimation for loop fusion and loop shifting at the early loop transformation design stage. As the exact memory platform is often not yet defined at this stage, we propose a platform-independent approach which reports the Pareto-optimal trade-off points for scratch-pad memory size and off-chip memory accesses. Experiments on realistic test vehicles confirm that the estimation comes very close to the actual platform mapping. It helps the designer or a tool to find the interesting loop transformations that should then be investigated in more depth afterward.
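To make the effect of loop fusion on storage concrete, the sketch below counts how many elements of an intermediate array are alive at the same time, before and after fusing a producer loop with its consumer. It assumes a hypothetical consumer that reads `tmp[i]` and `tmp[i - d]` for a fixed dependence distance `d`; the trace format and function names are illustrative. It is a plain lifetime count, not the hierarchical, incremental estimation technique the paper presents.

```python
def peak_live_elements(trace):
    """Peak number of simultaneously live elements of one intermediate array.

    `trace` is a list of ("w", idx) and ("r", idx) accesses in execution order.
    An element is alive from its write until its last read.
    """
    last_read = {idx: t for t, (op, idx) in enumerate(trace) if op == "r"}
    live, peak = set(), 0
    for t, (op, idx) in enumerate(trace):
        if op == "w" and idx in last_read:
            live.add(idx)
            peak = max(peak, len(live))
        elif op == "r" and last_read[idx] == t:
            live.discard(idx)
    return peak


def unfused_trace(n, d):
    """Loop 1 writes tmp[0..n-1]; loop 2 then reads tmp[i] and tmp[i - d]."""
    trace = [("w", i) for i in range(n)]
    for i in range(d, n):
        trace += [("r", i), ("r", i - d)]
    return trace


def fused_trace(n, d):
    """Fused loop: iteration i writes tmp[i], then (once i >= d) reads tmp[i] and tmp[i - d]."""
    trace = []
    for i in range(n):
        trace.append(("w", i))
        if i >= d:
            trace += [("r", i), ("r", i - d)]
    return trace


unfused_peak = peak_live_elements(unfused_trace(16, 2))
fused_peak = peak_live_elements(fused_trace(16, 2))
```

In this toy case the unfused version keeps the whole intermediate array alive, while the fused version only needs a small window of `d + 1` elements, which could fit in a scratch-pad memory instead of off-chip memory.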
International Symposium on Systems Synthesis | 2001
Tycho van Meeuwen; Arnout Vandecappelle; Allert van Zelst; Francky Catthoor; Diederik Verkest
For data-dominated applications, power consumption and memory bandwidth bottlenecks can be significantly alleviated with a custom memory organization. However, this potentially entails complex memory interconnections and a large routing overhead. This is undesirable for area cost, power consumption, and layout design complexity. By exploiting time-multiplexing opportunities over the long memory buses, this overhead can be significantly reduced. This paper proposes a system-level methodology for automated exploration of the interconnect architecture, which finds the optimal trade-off points for memory bus time-multiplexing. Experiments performed on real-life applications using our prototype tool show that even for very distributed memory organizations, the interconnect complexity can be significantly reduced to a cost-efficient, manageable level.
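The time-multiplexing opportunity can be illustrated with a small sketch: memories that are never accessed in the same cycle may share one bus. The code below assumes a hypothetical per-cycle access schedule and uses a simple greedy assignment; the schedule, the memory names, and the heuristic are assumptions of this sketch, and the greedy result is not guaranteed to be optimal, unlike the trade-off exploration described in the abstract.

```python
from itertools import combinations


def conflict_pairs(schedule):
    """Memories accessed in the same cycle cannot share a time-multiplexed bus."""
    conflicts = set()
    for memories in schedule:
        for a, b in combinations(sorted(set(memories)), 2):
            conflicts.add((a, b))
    return conflicts


def assign_buses(memories, conflicts):
    """Greedy assignment: put each memory on the first bus with no conflicting memory."""
    buses = []
    for m in memories:
        for bus in buses:
            if all((min(m, o), max(m, o)) not in conflicts for o in bus):
                bus.append(m)
                break
        else:
            buses.append([m])
    return buses


# Hypothetical access schedule: the set of memories accessed in each cycle.
schedule = [{"m0", "m1"}, {"m2"}, {"m1", "m3"}, {"m0", "m3"}, {"m2", "m4"}]
memories = ["m0", "m1", "m2", "m3", "m4"]
buses = assign_buses(memories, conflict_pairs(schedule))
```

For this schedule, five memories end up on three shared buses instead of five dedicated ones.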
ACM Transactions on Design Automation of Electronic Systems | 2007
Qubo Hu; Per Gunnar Kjeldsberg; Arnout Vandecappelle; Martin Palkovic; Francky Catthoor
Modern embedded multimedia and telecommunications systems need to store and access huge amounts of data. This becomes a critical factor for the overall energy consumption, area, and performance of the systems. Loop transformations are essential to improve the data access locality and regularity in order to optimally design or utilize a memory hierarchy. However, due to abstract high-level cost functions, current loop transformation steering techniques do not take the memory platform sufficiently into account. They usually also result in only one final transformation solution. On the other hand, the loop transformation search space for real-life applications is huge, especially if the memory platform is still not fully fixed. Use of existing loop transformation techniques will therefore typically lead to suboptimal end-products. It is critical to find all interesting loop transformation instances. This can only be achieved by performing an evaluation of the effect of later design stages at the early loop transformation stage. This article presents a fast incremental hierarchical memory-size requirement estimation technique. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high-level design stage, a platform-independent estimation is introduced with a Pareto curve output for each loop transformation instance. Comparison among the Pareto curves helps the designer, or a steering tool, to find all interesting loop transformation instances that might later lead to low-power data mapping for any of the many possible memory hierarchy instances. Initially, the source code is used as input for estimation. However, performing the estimation repeatedly from the source code is too slow for large search space exploration. An incremental approach, based on local updating of the previous result, is therefore used to handle sequences of different loop transformations. Experiments show that the initial approach takes a few seconds, which is two orders of magnitude faster than state-of-the-art solutions but still too costly to be performed interactively many times. The incremental approach typically takes just a few milliseconds, which is another two orders of magnitude faster than the initial approach. This huge speedup allows us for the first time to handle real-life industrial-size applications and get realistic feedback during loop transformation exploration.
Signal Processing Systems | 2008
Florin Balasa; Per Gunnar Kjeldsberg; Arnout Vandecappelle; Martin Palkovic; Qubo Hu; Hongwei Zhu; Francky Catthoor
The storage requirements in data-dominated signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. This paper gives a tutorial overview of the existing techniques for the evaluation of the data memory size, which is an important step during the early stage of system-level exploration. The paper focuses on the most advanced developments in the field, presenting in more detail (1) an estimation approach for non-procedural specifications, where the reordering of the loop execution within loop nests can yield significant memory savings, and (2) an exact computation approach for procedural specifications, with relevant memory management applications, such as measuring the impact of loop transformations on the data storage or analyzing the performance of different signal-to-memory mapping models. Moreover, the paper discusses typical memory management trade-offs, for instance between storage requirement and number of memory accesses, that are taken into account during the exploration of the design space by loop transformations in the system specification.