Publication


Featured research published by Per Gunnar Kjeldsberg.


ACM Transactions on Design Automation of Electronic Systems | 2001

Data and memory optimization techniques for embedded systems

Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; Arnout Vandecappelle; Per Gunnar Kjeldsberg

We present a survey of the state-of-the-art techniques used in performing data and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem, and impact one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We end with memory addressing related issues.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2003

Data dependency size estimation for use in memory optimization

Per Gunnar Kjeldsberg; Francky Catthoor; Einar J. Aas

A novel storage requirement estimation methodology is presented for use in the early system design phases when the data transfer ordering is only partly fixed. At that stage, none of the existing estimation tools are adequate, as they either assume a fully specified execution order or ignore it completely. This paper presents an algorithm for automated estimation of strict upper and lower bounds on the individual data dependency sizes in high-level application code given a partially fixed execution ordering. In the overall estimation technique, this is followed by a detection of the maximally combined size of simultaneously alive dependencies, resulting in the overall storage requirement of the application. Using representative application demonstrators, we show how our techniques can effectively guide the designer to achieve a transformed specification with low storage requirement.


Asia and South Pacific Design Automation Conference | 2006

Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications

Qubo Hu; Arnout Vandecappelle; Martin Palkovic; Per Gunnar Kjeldsberg; Erik Brockmeyer; Francky Catthoor

Loop fusion and loop shifting are important transformations for improving data locality to reduce the number of costly accesses to off-chip memories. Since exploring the exact platform mapping for all the loop transformation alternatives is a time-consuming process, heuristics steered by improved data locality are generally used. However, pure locality estimates do not sufficiently take into account the hierarchy of the memory platform. This paper presents a fast, incremental technique for hierarchical memory size requirement estimation for loop fusion and loop shifting at the early loop transformations design stage. As the exact memory platform is often not yet defined at this stage, we propose a platform-independent approach which reports the Pareto-optimal trade-off points for scratch-pad memory size and off-chip memory accesses. Experiments on realistic test vehicles confirm that the estimation comes very close to the actual platform mapping. The technique helps the designer or a tool to find the interesting loop transformations that should then be investigated in more depth afterward.


ACM Transactions on Design Automation of Electronic Systems | 2007

Incremental hierarchical memory size estimation for steering of loop transformations

Qubo Hu; Per Gunnar Kjeldsberg; Arnout Vandecappelle; Martin Palkovic; Francky Catthoor

Modern embedded multimedia and telecommunications systems need to store and access huge amounts of data. This becomes a critical factor for the overall energy consumption, area, and performance of the systems. Loop transformations are essential to improve the data access locality and regularity in order to optimally design or utilize a memory hierarchy. However, due to abstract high-level cost functions, current loop transformation steering techniques do not take the memory platform sufficiently into account. They usually also result in only one final transformation solution. On the other hand, the loop transformation search space for real-life applications is huge, especially if the memory platform is still not fully fixed. Use of existing loop transformation techniques will therefore typically lead to suboptimal end-products. It is critical to find all interesting loop transformation instances. This can only be achieved by performing an evaluation of the effect of later design stages at the early loop transformation stage. This article presents a fast incremental hierarchical memory-size requirement estimation technique. It estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto a hierarchical memory platform. As the exact memory platform instantiation is often not yet defined at this high-level design stage, a platform-independent estimation is introduced with a Pareto curve output for each loop transformation instance. Comparison among the Pareto curves helps the designer, or a steering tool, to find all interesting loop transformation instances that might later lead to low-power data mapping for any of the many possible memory hierarchy instances. Initially, the source code is used as input for estimation. However, performing the estimation repeatedly from the source code is too slow for large search space exploration. 
An incremental approach, based on local updating of the previous result, is therefore used to handle sequences of different loop transformations. Experiments show that the initial approach takes a few seconds, which is two orders of magnitude faster than state-of-the-art solutions but still too costly to be performed interactively many times. The incremental approach typically takes just a few milliseconds, which is another two orders of magnitude faster than the initial approach. This huge speedup allows us for the first time to handle real-life industrial-size applications and get realistic feedback during loop transformation exploration.


Proceedings of the Eighth International Workshop on Hardware/Software Codesign (CODES 2000) | 2000

Storage requirement estimation for data intensive applications with partially fixed execution ordering

Per Gunnar Kjeldsberg; Francky Catthoor; Einar J. Aas

In this paper, we propose a novel storage requirement estimation methodology for use in the early system design phases when the data transfer ordering is only partly fixed. At that stage, none of the existing estimation tools are adequate, as they either assume a fully specified execution order or ignore it completely. Using a representative application demonstrator, we show how our technique can effectively guide the designer to achieve a transformed specification with low storage requirement.


Signal Processing Systems | 2008

Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications

Florin Balasa; Per Gunnar Kjeldsberg; Arnout Vandecappelle; Martin Palkovic; Qubo Hu; Hongwei Zhu; Francky Catthoor

The storage requirements in data-dominated signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. This paper gives a tutorial overview of the existing techniques for the evaluation of the data memory size, which is an important step during the early stage of system-level exploration. The paper focuses on the most advanced developments in the field, presenting in more detail (1) an estimation approach for non-procedural specifications, where the reordering of the loop execution within loop nests can yield significant memory savings, and (2) an exact computation approach for procedural specifications, with relevant memory management applications such as measuring the impact of loop transformations on the data storage or analyzing the performance of different signal-to-memory mapping models. Moreover, the paper discusses typical memory management trade-offs, for instance between storage requirement and number of memory accesses, taken into account during the exploration of the design space by loop transformations in the system specification.


International Conference / Workshop on Embedded Computer Systems: Architectures, Modeling and Simulation | 2009

Scenario Based Mapping of Dynamic Applications on MPSoC: A 3D Graphics Case Study

Narasinga Rao Miniskar; Elena Hammari; Satyakiran Munaga; Per Gunnar Kjeldsberg; Francky Catthoor

Modern multimedia applications are becoming increasingly dynamic. The state-of-the-art scalable 3D graphics algorithms are able to adapt their hardware resource allocation requests at run-time according to input, resource availability and a number of quality metrics. Additionally, the resource management mechanisms are becoming more dynamic themselves and are able to cope efficiently at run-time with these varying resource requests, available hardware resources and competing requests from other applications. In this paper, we study the dynamic resource requests of the Wavelet Subdivision Surfaces (WSS) based scalable 3D graphics application. We also show how to schedule its computational resources at run-time with the use of the Task Concurrency Management (TCM) methodology and the System Scenario based approach on an MPSoC platform with very heterogeneous Processing Elements (including RISC, VLIW and FPGA accelerator resources).


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007

Bit-Width Constrained Memory Hierarchy Optimization for Real-Time Video Systems

Benny Thörnberg; Martin Palkovic; Qubo Hu; Leif Olsson; Per Gunnar Kjeldsberg; Mattias O'Nils; Francky Catthoor

The great variety of pixel dynamics of real-time video-processing systems (RTVPS), ranging from color to grayscale to binary pixels, means that a careful design and specification of bit widths is required. It is obvious that the bit-width specification will affect the total memory storage requirement. However, what is not so obvious is that the bit-width specification will also affect the design of the memory hierarchy, an impact similar for both hardware and software implementations. We have developed an integer-nonlinear-program formulation for the optimization of the memory hierarchy of RTVPS. An active surveillance video camera is introduced as a test case. We demonstrate how the optimization model can reduce the on-chip memory storage by 61% compared to a nonoptimal memory hierarchy.


Application-Specific Systems, Architectures, and Processors | 2006

Loop Transformation Methodologies for Array-Oriented Memory Management

Florin Balasa; Per Gunnar Kjeldsberg; Martin Palkovic; Arnout Vandecappelle; Francky Catthoor

The storage requirements in data-dominant signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. Applying different loop transformations on the specification code can significantly enhance the memory management of such VLSI systems, improving all the major parameters of the design space: power, area, and performance. This paper gives a global view of existing and recently proposed memory size evaluation approaches for procedural and non-procedural specifications. Moreover, it discusses typical memory management trade-offs, taken into account during the exploration of system specifications by loop transformations, that can exploit these early size evaluations.


International Conference on Computer Aided Design | 2000

Automated data dependency size estimation with a partially fixed execution ordering

Per Gunnar Kjeldsberg; Francky Catthoor; Einar J. Aas

For data dominated applications, the system level design trajectory should first focus on finding a good data transfer and storage solution. Since no realization details are available at this level, estimates are needed to guide the designer. This paper presents an algorithm for automated estimation of strict upper and lower bounds on the individual data dependency sizes in high level application code given a partially fixed execution ordering. Previous work has either not taken execution ordering into account at all, resulting in large overestimates, or required a fully specified ordering which is usually not available at this high level. The usefulness of the methodology is illustrated on representative application demonstrators.

Collaboration


Dive into Per Gunnar Kjeldsberg's collaborations.

Top Co-Authors


Francky Catthoor

Katholieke Universiteit Leuven


Qubo Hu

Norwegian University of Science and Technology


Martin Palkovic

Katholieke Universiteit Leuven


Einar J. Aas

Norwegian University of Science and Technology


Arnout Vandecappelle

Katholieke Universiteit Leuven


Elena Hammari

Norwegian University of Science and Technology


Iason Filippopoulos

Norwegian University of Science and Technology


Yahya H. Yassin

Norwegian University of Science and Technology


Asghar Havashki

Norwegian University of Science and Technology