Peter Bertels | Researchain

Publication

Featured research published by Peter Bertels.

ACM SIGBED Review | 2009

Teaching skills and concepts for embedded systems design

Peter Bertels; Michiel D'Haene; Tom Degryse; Dirk Stroobandt

Smart devices are omnipresent today and the design of these embedded systems requires a multidisciplinary approach. It is important that students in electrical engineering and computer science learn these different aspects of embedded systems design. Our course on Complex Systems Design Methodology presents an overview of embedded systems design with a strong focus on the main concepts, preparing the students for more detailed follow-up courses on specific topics. Imparting the theoretical concepts to the students is not sufficient, however. Hands-on sessions are indispensable for the students to acquire the necessary skills. In this article we present our approach for these hands-on sessions, which is to pose relatively small problems in separate sessions, each focusing on a single design aspect. Five years after the introduction of this new course at Ghent University, we can conclude that students not only like this course, but that their design skills have also improved through our new, aspect-focused approach.

Field-Programmable Logic and Applications | 2007

A Method for Fast Hardware Specialization at Run-Time

Karel Bruneel; Peter Bertels; Dirk Stroobandt

Dynamic hardware generation is a powerful technique that can substantially reduce both the required hardware resources and the time needed to perform a calculation, reflected in an improved functional density. This performance improvement is a result of additional run-time optimizations enabled by the knowledge of values at certain inputs at runtime. However, due to the large overhead conventional hardware generation tools incur, the usability of dynamic hardware generation is limited. We present a dual approach that combines compile-time generation of generic hardware and run-time specialization. This drastically decreases the dynamic generation overhead. Our approach is used for dynamic generation of FIR filters and compared to a static and a conventional dynamic implementation. The experiments clearly show that the dual approach improves the usability of dynamic hardware generation.

Complex, Intelligent and Software Intensive Systems | 2008

Java and the Power of Multi-Core Processing

Peter Bertels; Dirk Stroobandt

The new era of multi-core processing challenges software designers to efficiently exploit the parallelism that is now massively available. Programmers have to exchange the conventional sequential programming paradigm for parallel programming: single-threaded designs must be decomposed into dependent, interacting tasks. The Java programming language has built-in thread support and is therefore suitable for the development of parallel software, but programming multi-threaded applications is a tedious task. Therefore we are working on a framework and tool support to alleviate the burden of threads, synchronisation and locking, based on process networks. This paper describes our initial ideas for this new programming model.
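As an illustration of the process-network idea mentioned in this abstract, the following is a minimal sketch in which each process runs in its own thread and communicates only through blocking FIFO channels. It is not the authors' framework: the class name `ProcessNetworkDemo`, the three-stage pipeline, and the `END` sentinel are assumptions made for this example, and `java.util.concurrent.ArrayBlockingQueue` stands in for whatever channel abstraction the framework provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical three-stage process network: source -> square -> sink. */
final class ProcessNetworkDemo {
    // Sentinel value marking the end of a stream (an assumption of this sketch).
    private static final int END = Integer.MIN_VALUE;

    /** Sends 1..count through the network and returns what the sink collected. */
    static List<Integer> run(int count) {
        BlockingQueue<Integer> a = new ArrayBlockingQueue<>(4); // source -> square
        BlockingQueue<Integer> b = new ArrayBlockingQueue<>(4); // square -> sink
        List<Integer> out = new ArrayList<>();

        Thread source = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) a.put(i); // blocks when the channel is full
                a.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread square = new Thread(() -> {
            try {
                // take() blocks until a value is available, so no explicit locking is needed.
                for (int v = a.take(); v != END; v = a.take()) b.put(v * v);
                b.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread sink = new Thread(() -> {
            try {
                for (int v = b.take(); v != END; v = b.take()) out.add(v);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        square.start();
        sink.start();
        try {
            // join() makes the sink's writes to 'out' visible to the caller.
            source.join();
            square.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the network", e);
        }
        return out;
    }
}
```

Because every process only reads from and writes to its channels, the programmer never touches a lock directly; the blocking `put` and `take` calls provide both synchronisation and back-pressure.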

Design Automation for Embedded Systems | 2009

Using method interception for hardware/software co-development

Philippe Faes; Peter Bertels; Jan Van Campenhout; Dirk Stroobandt

In many embedded systems, the computational power of an instruction set processor is combined with hardware accelerators. Building such combined systems implies co-design of the software that runs on the processor and the hardware that accelerates the embedded application. During the co-design process, the application is partitioned into a software part (running on the processor) and a hardware part (running on the accelerator). In order to ease the iterative process of partitioning, we introduce a novel design methodology. In our methodology, the interface between hardware and software is transparent to the software designer, and is based on dynamic method interception. Because the interface is transparent and generated automatically, the initial all-software prototype of the system can more easily be refined and partitioned. We show that method interception is inexpensive, and we demonstrate method interception in a real-life application. Using our methodology, embedded systems can be designed fast, reducing time-to-market, while still achieving a high run-time performance.
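To make the idea of a transparent, interception-based interface more concrete, here is a minimal sketch that uses `java.lang.reflect.Proxy` from the Java standard library. This is an assumption chosen for illustration; the abstract does not say which interception mechanism the authors use, and theirs may work at a different level. The `Filter` interface, `SoftwareFilter` class, and the simulated accelerator path are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical application interface; not taken from the paper. */
interface Filter {
    int apply(int sample);
}

/** The all-software prototype of the functionality. */
final class SoftwareFilter implements Filter {
    @Override
    public int apply(int sample) {
        return sample * 2;
    }
}

final class InterceptionDemo {
    /** Counts intercepted calls so that callers can check the handler ran. */
    static final AtomicInteger intercepted = new AtomicInteger();

    /**
     * Wraps the software implementation in a dynamic proxy. When useAccelerator
     * is true, calls to apply() are redirected to a stand-in for the hardware
     * path; otherwise they are forwarded to the software implementation.
     */
    static Filter intercept(Filter software, boolean useAccelerator) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.incrementAndGet();
            if (useAccelerator && method.getName().equals("apply")) {
                // Stand-in for a call into an accelerator: same result, different path.
                return ((Integer) args[0]) << 1;
            }
            return method.invoke(software, args);
        };
        return (Filter) Proxy.newProxyInstance(
                Filter.class.getClassLoader(),
                new Class<?>[] {Filter.class},
                handler);
    }
}
```

Client code only ever sees a `Filter`, so moving `apply` between the software and accelerator partitions is a change to the handler, not to the callers.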

ACM Transactions on Design Automation of Electronic Systems | 2009

Efficient memory management for hardware accelerated Java Virtual Machines

Peter Bertels; Wim Heirman; Erik H. D'Hollander; Dirk Stroobandt

Application-specific hardware accelerators can significantly improve a system's performance. In a Java-based system, we then have to consider a hybrid architecture that consists of a Java Virtual Machine running on a general-purpose processor connected to the hardware accelerator. In such a hybrid architecture, data communication between the accelerator and the general-purpose processor can incur a significant cost, which may even annihilate the original performance improvement of adding the accelerator. A careful layout of the data in the memory structure is therefore of major importance to maintain the acceleration performance benefits. This article addresses the reduction of the communication cost in a distributed shared memory consisting of the main memory of the processor and the accelerator's local memory, which are unified in the Java heap. Since memory access times are highly nonuniform, a suitable allocation of objects in either main memory or the accelerator's local memory can significantly reduce the communication cost. We propose several techniques for finding the optimal location for each Java object's data, either statically through profiling or dynamically at runtime. We show how we can reduce communication cost by up to 86% for the SPECjvm and DaCapo benchmarks. We also show that the best strategy is application dependent and also depends on the relative cost of remote versus local accesses. For a relative cost higher than 10, a self-learning dynamic approach often results in the best performance.
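The trade-off between local and remote accesses can be illustrated with a deliberately simple cost model. The sketch below is an assumption for illustration only and is not one of the techniques proposed in the article: it charges one unit per local access and `remoteCost` units per remote access, and places each object in the memory of whichever side accesses it most. The names `PlacementSketch`, `place`, and `cost` are hypothetical.

```java
/** Hypothetical per-object placement under a simple local/remote cost model. */
final class PlacementSketch {
    enum Location { MAIN_MEMORY, ACCELERATOR_MEMORY }

    /** Places the object next to whichever side accesses it more often. */
    static Location place(long cpuAccesses, long acceleratorAccesses) {
        return acceleratorAccesses > cpuAccesses
                ? Location.ACCELERATOR_MEMORY
                : Location.MAIN_MEMORY;
    }

    /**
     * Total access cost for one object: each local access costs 1 unit and
     * each remote access costs remoteCost units (the relative cost).
     */
    static long cost(Location location, long cpuAccesses,
                     long acceleratorAccesses, long remoteCost) {
        boolean inMain = location == Location.MAIN_MEMORY;
        long local = inMain ? cpuAccesses : acceleratorAccesses;
        long remote = inMain ? acceleratorAccesses : cpuAccesses;
        return local + remote * remoteCost;
    }
}
```

For an object with 10 processor accesses and 90 accelerator accesses at a relative cost of 10, this model gives a cost of 190 units in the accelerator's memory versus 910 units in main memory, which shows why the gap between placements widens as the relative cost grows.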

Proceedings of the 1st international forum on Next-generation multicore/manycore technologies | 2008

Efficient measurement of data flow enabling communication-aware parallelisation

Peter Bertels; Wim Heirman; Dirk Stroobandt

As multicore chips scale to higher processor counts, communication between cores becomes more and more important. Indeed, when a single application is split up among multiple cores, which are connected through a relatively slow network, the amount of communication that is required will have an essential effect on performance. Therefore, if the application can be partitioned in such a way that communication between threads is minimised, or that placement on non-uniform networks can be performed with regard to communication, a significant performance boost can be obtained. But to do this effectively, communication streams inside the application must be known. In this paper, we introduce a profiling tool for Java that can measure data flows between methods. It constructs a communication graph, which combines a traditional call graph with data flow information. The overhead of profiling is brought down by a factor of 15 through the use of reservoir sampling. We prove that this can be done with a limited decrease in accuracy. This way, we can quickly estimate communication flows, which forms the critical information that allows an efficient communication-aware parallelisation to be made.

Computing Frontiers | 2009
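Reservoir sampling, which this abstract credits for the reduced profiling overhead, keeps a fixed-size uniform random sample of a stream whose length is not known in advance. The following is a minimal sketch of the classic algorithm (often called Algorithm R); how the authors integrate it into their profiler is not described here, so the class name `ReservoirSampler` and the idea of offering one observed event at a time are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Keeps a uniform random sample of at most 'capacity' items from a stream. */
final class ReservoirSampler<T> {
    private final int capacity;
    private final List<T> reservoir;
    private final Random random;
    private long seen = 0;

    ReservoirSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
        this.random = new Random(seed);
    }

    /** Offers one item (for example, one observed event) to the sampler. */
    void offer(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            // The first 'capacity' items always enter the reservoir.
            reservoir.add(item);
        } else {
            // Pick a slot index in [0, seen); the item is kept only if the
            // index falls inside the reservoir, i.e. with probability capacity / seen.
            long j = (long) (random.nextDouble() * seen);
            if (j < capacity) {
                reservoir.set((int) j, item);
            }
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    /** Returns how many items have been offered so far. */
    long seen() {
        return seen;
    }
}
```

Only the sampled items need to be stored and processed in detail, so the cost of the expensive bookkeeping scales with the reservoir size rather than with the length of the stream.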

Strategies for dynamic memory allocation in hybrid architectures

Peter Bertels; Wim Heirman; Dirk Stroobandt

Hybrid architectures combining the strengths of general-purpose processors with application-specific hardware accelerators can lead to a significant performance improvement. Our hybrid architecture uses a Java Virtual Machine as an abstraction layer to hide the complexity of the hardware/software interface between processor and accelerator from the programmer. The data communication between the accelerator and the processor often incurs a significant cost, which sometimes annihilates the original speedup obtained by the accelerator. This article shows how we minimise this communication cost by dynamically choosing an optimal data layout in the Java heap memory, which is distributed over both the accelerator and the processor memory. The proposed self-learning memory allocation strategy finds the optimal location for each Java object's data by means of runtime profiling. The communication cost is effectively reduced by up to 86% for the benchmarks in the DaCapo suite (51% on average).

robotics education | 2011