Publication


Featured research published by Erik Brockmeyer.


ACM Transactions on Design Automation of Electronic Systems | 2001

Data and memory optimization techniques for embedded systems

Preeti Ranjan Panda; Francky Catthoor; Nikil D. Dutt; Koen Danckaert; Erik Brockmeyer; Chidamber Kulkarni; Arnout Vandecappelle; Per Gunnar Kjeldsberg

We present a survey of the state-of-the-art techniques used in performing data- and memory-related optimizations in embedded systems. The optimizations are targeted directly or indirectly at the memory subsystem and affect one or more of three important cost metrics: area, performance, and power dissipation of the resulting implementation. We first examine architecture-independent optimizations in the form of code transformations. We next cover a broad spectrum of optimization techniques that address memory architectures at varying levels of granularity, ranging from register files to on-chip memory, data caches, and dynamic memory (DRAM). We conclude with issues related to memory addressing.
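
As a hedged illustration of the kind of architecture-independent code transformation this survey covers (not an example taken from the article itself), the loop-interchange sketch below improves spatial locality for a row-major C array; the array name and sizes are arbitrary.

    /* Illustrative only: loop interchange to improve spatial locality
     * for a row-major array. Names and sizes are arbitrary. */
    #define N 1024
    static int a[N][N];

    /* Before: column-wise traversal touches a new cache line almost
     * every iteration. */
    void sum_cols_slow(long *sum) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                *sum += a[i][j];
    }

    /* After: row-wise traversal, consecutive accesses hit the same line. */
    void sum_rows_fast(long *sum) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                *sum += a[i][j];
    }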


Design, Automation and Test in Europe | 2003

Layer assignment techniques for low energy in multi-layered memory organisations

Erik Brockmeyer; Miguel Miranda; Henk Corporaal; Francky Catthoor

Nearly all platforms use a multi-layer memory hierarchy to bridge the enormous latency gap between large off-chip memories and local register files. However, most of the previous work on HW- or SW-controlled techniques for layer assignment has focused mainly on performance. As a result, the intermediate layers have been assigned overly large sizes, leading to energy inefficiency. In this paper, we present a technique that takes advantage of both the temporal locality and the limited lifetime of the application's arrays to minimize energy consumption under layer size constraints. A prototype tool has been developed and tested using two real-life applications of industrial relevance. Following this approach, we have been able to halve the energy consumed by the memory hierarchy for each of our drivers.
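
As a rough, hedged sketch of the layer-assignment idea (a deliberate simplification of the paper's technique, which additionally exploits limited array lifetimes), one could greedily place the arrays with the highest access density into a small, energy-cheap on-chip layer until its size constraint is reached. All names and numbers below are hypothetical.

    #include <stdio.h>

    /* Hypothetical per-array profile data; not taken from the paper. */
    typedef struct { const char *name; int size_bytes; int accesses; int placed; } Array;

    /* Greedy sketch: arrays with the most accesses per byte go to the small
     * low-energy on-chip layer until its capacity is exhausted; the rest
     * stay in the large off-chip layer. */
    void assign_layers(Array *arrs, int n, int onchip_capacity) {
        int used = 0;
        for (;;) {
            int best = -1;
            double best_density = -1.0;
            for (int i = 0; i < n; i++) {
                if (arrs[i].placed || arrs[i].size_bytes > onchip_capacity - used)
                    continue;
                double d = (double)arrs[i].accesses / arrs[i].size_bytes;
                if (d > best_density) { best = i; best_density = d; }
            }
            if (best < 0) break;                  /* nothing else fits */
            printf("%s -> on-chip layer\n", arrs[best].name);
            used += arrs[best].size_bytes;
            arrs[best].placed = 1;
        }
        for (int i = 0; i < n; i++)
            if (!arrs[i].placed)
                printf("%s -> off-chip layer\n", arrs[i].name);
    }

    int main(void) {
        Array arrs[] = { {"frame_buf", 65536, 200000, 0},
                         {"coeffs",    1024,  500000, 0},
                         {"history",   8192,  300000, 0} };
        assign_layers(arrs, 3, 16384);   /* 16 KB on-chip layer, invented */
        return 0;
    }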


IEEE Design & Test of Computers | 2001

Data memory organization and optimizations in application-specific systems

P. Ranjan Panda; Nikil D. Dutt; Alexandru Nicolau; Francky Catthoor; Arnout Vandecappelle; Erik Brockmeyer; Chidamber Kulkarni; E. De Greef

In application-specific designs, customized memory organization expands the search space for cost-optimized solutions. Several optimization strategies can be applied to embedded systems built around different memory architectures: data caches, scratch-pad memory, custom memory architectures, and dynamic random-access memory (DRAM).


ACM Transactions on Design Automation of Electronic Systems | 2007

DRDU: A data reuse analysis technique for efficient scratch-pad memory management

Ilya Issenin; Erik Brockmeyer; Miguel Miranda; Nikil D. Dutt

In multimedia and other streaming applications, a significant portion of energy is spent on data transfers. By exploiting data reuse opportunities in the application, we can reduce this energy by making copies of frequently used data in a small local memory and replacing speed- and power-inefficient transfers from main off-chip memory with more efficient local data transfers. In this article we present an automated approach for analyzing these opportunities in a program; it allows the program to be modified to use custom scratch-pad memory configurations comprising a hierarchical set of buffers for local storage of frequently reused data. Using our approach, we are able both to reduce the energy consumption of the memory subsystem when using a scratch-pad memory by about a factor of two on average and to improve memory system performance compared to a cache of the same size.
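
A minimal sketch of the underlying data-reuse idea (not the DRDU analysis itself): a small, heavily reused filter kernel is copied once into a local scratch-pad buffer so that the inner loop reads the cheap local copy instead of off-chip memory. The names, sizes, and the SCRATCHPAD placeholder attribute below are assumptions.

    #include <string.h>

    #define N    4096   /* input length, arbitrary  */
    #define TAPS 16     /* filter length, arbitrary */

    /* Hypothetical placement attribute; a real toolchain would use a
     * linker section or compiler-specific pragma instead. */
    #define SCRATCHPAD

    /* Each filter tap is reused N times, so one bulk copy into a local
     * buffer replaces roughly N*TAPS expensive off-chip reads. */
    void fir(const int *in, const int *taps_offchip, int *out) {
        SCRATCHPAD static int taps_local[TAPS];
        memcpy(taps_local, taps_offchip, sizeof taps_local);  /* one-time copy */

        for (int i = 0; i + TAPS <= N; i++) {
            int acc = 0;
            for (int k = 0; k < TAPS; k++)
                acc += in[i + k] * taps_local[k];             /* local reads */
            out[i] = acc;
        }
    }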


Design Automation Conference | 2006

Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies

Ilya Issenin; Erik Brockmeyer; Bart Durinck; Nikil D. Dutt

The increasing use of multiprocessor systems-on-chip (MPSoCs) to meet the high performance demands of embedded applications results in high power dissipation. The memory subsystem is a large and critical contributor to both energy and performance, requiring system designers to explore low-power memory organizations. In this paper we present a novel multiprocessor data reuse analysis technique that allows the system designer to explore a wide range of customized memory hierarchy organizations with different size and energy profiles. Our technique enables the system designer to explore feasible memory subsystem solutions that meet power and area constraints while maintaining the necessary performance level. Our experiments on the complex QSDPCM benchmark illustrate the exploration of a wide range of customized memory hierarchies for an MPSoC implementation.


Design Automation Conference | 1999

Global multimedia system design exploration using accurate memory organization feedback

Arnout Vandecappelle; Miguel Miranda; Erik Brockmeyer; Francky Catthoor; Diederik Verkest

Successful exploration of system-level design decisions is impossible without fast and accurate estimation of the impact on the system cost. In most multimedia applications, the dominant cost factor is related to the organization of the memory architecture. This paper presents a systematic approach which allows effective system-level exploration of memory organization design alternatives, based on accurate feedback from our earlier developed tools. The effectiveness of this approach is illustrated on an industrial application. Applying our approach, a substantial part of the design search space has been explored in a very short time, resulting in a cost-efficient solution which meets all design constraints.


International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2006

Pareto-Based Application Specification for MP-SoC Customized Run-Time Management

Chantal Ykman-Couvreur; Vincent Nollet; Théodore Marescaux; Erik Brockmeyer; Francky Catthoor; Henk Corporaal

In an MP-SoC environment, a customized run-time management layer should be incorporated on top of the basic OS services to globally optimize costs (e.g. energy consumption) across all active applications, according to constraints (e.g. performance, user requirements) and available platform resources. To that end, we have proposed a Pareto-based approach combining design-time application mapping and platform exploration with a low-complexity run-time manager. This relieves the OS of part of its run-time decision making and avoids conservative worst-case assumptions. In this paper, we focus on the characterization of the Pareto-based application specification resulting from our design-time exploration. This specification is essential as input for our run-time manager. A representative video codec multimedia application, simulated on our MP-SoC platform simulator, is used as a case study. For the resulting Pareto-based specification, both the binary size and the performance overhead are negligible.
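
As a hedged illustration of how a run-time manager might consume such a Pareto-based specification (a sketch under assumed data structures, not the paper's implementation): for each application, pick the lowest-energy operating point that still meets its performance constraint.

    #include <stdio.h>

    /* Hypothetical design-time Pareto point: execution time vs. energy. */
    typedef struct { int exec_time_ms; int energy_mj; } ParetoPoint;

    /* Return the index of the lowest-energy point meeting the deadline,
     * or -1 if no point is feasible. Points need not be sorted. */
    int select_point(const ParetoPoint *pts, int n, int deadline_ms) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (pts[i].exec_time_ms > deadline_ms) continue;
            if (best < 0 || pts[i].energy_mj < pts[best].energy_mj) best = i;
        }
        return best;
    }

    int main(void) {
        /* Invented numbers for a single application's Pareto curve. */
        ParetoPoint video[] = { {40, 90}, {33, 120}, {25, 180} };
        int idx = select_point(video, 3, 33);
        if (idx >= 0)
            printf("chosen point: %d ms, %d mJ\n",
                   video[idx].exec_time_ms, video[idx].energy_mj);
        return 0;
    }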


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2006

A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck

Minas Dasygenis; Erik Brockmeyer; Bart Durinck; Francky Catthoor; Dimitrios Soudris; A. Thanailakis

Memory latency has always been a major issue in embedded systems that execute memory-intensive applications. This is even more true as the gap between processor and memory speed continues to grow. Hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in large off-chip memories; however, both types of prefetching have their shortcomings. Hardware schemes are more complex and require extra circuitry to compute data access strides, while software schemes generate prefetch instructions which, if not placed carefully, may hamper performance. On the other hand, some application domains (such as multimedia) have a uniform memory access pattern that is known a priori and that, if exploited, can yield significant application performance improvement. With this characteristic in mind, we present our findings on hiding memory latency using the direct memory access (DMA) mode, which is present in all modern systems, combined with a software prefetch mechanism and a customized on-chip memory hierarchy mapping. Compared to previous approaches, we are able to estimate the performance and power metrics without actually implementing the embedded system. Experimental results on nine well-known multimedia and imaging applications prove the efficiency of our technique. Finally, we verify the performance estimates by implementing and simulating the algorithms on the TI C6201 processor.
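
The block-level idea can be sketched as a double-buffered DMA prefetch loop: while the processor works on the current block in on-chip memory, the DMA engine fetches the next one. The dma_start_copy/dma_wait hooks below are hypothetical placeholders for a platform's DMA driver (modelled with memcpy so the sketch compiles), not a real API, and the block sizes are arbitrary.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK 1024          /* elements per DMA block, arbitrary */

    /* Hypothetical DMA hooks. On a real platform these would program the
     * DMA controller; here they are modelled with a plain memcpy so the
     * sketch is self-contained. */
    static void dma_start_copy(void *dst, const void *src, size_t bytes) {
        memcpy(dst, src, bytes);
    }
    static void dma_wait(void) { /* would poll/await the DMA interrupt */ }

    static int64_t checksum;     /* stand-in for real per-block processing */
    static void process_block(const int32_t *blk, size_t n) {
        for (size_t i = 0; i < n; i++) checksum += blk[i];
    }

    /* Double buffering: the block in buf[cur] is processed while the DMA
     * engine fills buf[cur ^ 1] with the next block, hiding the off-chip
     * transfer latency behind useful computation. */
    void stream_process(const int32_t *offchip, size_t total_blocks) {
        static int32_t buf[2][BLOCK];   /* assumed to live in on-chip memory */
        int cur = 0;

        dma_start_copy(buf[cur], offchip, sizeof buf[cur]);
        for (size_t b = 0; b < total_blocks; b++) {
            dma_wait();                             /* block b has arrived  */
            int nxt = cur ^ 1;
            if (b + 1 < total_blocks)               /* prefetch block b + 1 */
                dma_start_copy(buf[nxt], offchip + (b + 1) * BLOCK,
                               sizeof buf[nxt]);
            process_block(buf[cur], BLOCK);         /* overlaps with the copy */
            cur = nxt;
        }
    }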


International Symposium on Low Power Electronics and Design | 2000

Systematic cycle budget versus system power trade-off: a new perspective on system exploration of real-time data-dominated applications

Erik Brockmeyer; Arnout Vandecappelle; Francky Catthoor

In contrast to current design practice for (programmable) processor mapping, which mainly targets performance, we focus on a systematic trade-off between cycle budget and the energy consumed in the background memory organization. The latter is a crucial component in many of today's designs, including multimedia, network protocols and telecom signal processing. We have a systematic way and tool to explore both freedoms and to arrive at Pareto charts in which, for a given application, the lowest-cost implementation of the memory organization is plotted against the available cycle budget per submodule. This is achieved by making optimal use of a parallelized memory architecture. We indicate, with results on a digital audio broadcasting receiver and an image compression demonstrator, how to effectively use the Pareto plot to gain significantly in overall system energy consumption within the global real-time constraints.
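
A toy, hedged example of reading such per-submodule Pareto charts (invented numbers, not from the paper): choose one (cycle budget, energy) point per submodule so that the summed cycles stay within the global real-time budget while the total memory energy is minimized.

    #include <stdio.h>

    typedef struct { int cycles; int energy; } Point;   /* one Pareto point */

    int main(void) {
        /* Invented Pareto curves for two submodules of an application. */
        Point audio[] = { {100, 50}, {80, 70}, {60, 110} };
        Point video[] = { {300, 200}, {250, 260}, {200, 380} };
        int budget = 360;             /* global cycle budget, arbitrary */

        int best_energy = -1, best_i = -1, best_j = -1;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) {
                if (audio[i].cycles + video[j].cycles > budget) continue;
                int e = audio[i].energy + video[j].energy;
                if (best_energy < 0 || e < best_energy) {
                    best_energy = e; best_i = i; best_j = j;
                }
            }
        if (best_energy >= 0)
            printf("audio point %d + video point %d -> %d energy units\n",
                   best_i, best_j, best_energy);
        return 0;
    }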


Asia and South Pacific Design Automation Conference | 2006

Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications

Qubo Hu; Arnout Vandecappelle; Martin Palkovic; Per Gunnar Kjeldsberg; Erik Brockmeyer; Francky Catthoor

Loop fusion and loop shifting are important transformations for improving data locality to reduce the number of costly accesses to off-chip memories. Since exploring the exact platform mapping for all the loop transformation alternatives is a time-consuming process, heuristics steered by improved data locality are generally used. However, pure locality estimates do not sufficiently take into account the hierarchy of the memory platform. This paper presents a fast, incremental technique for hierarchical memory size requirement estimation for loop fusion and loop shifting at the early loop transformations design stage. As the exact memory platform is often not yet defined at this stage, we propose a platform-independent approach which reports the Pareto-optimal trade-off points for scratch-pad memory size and off-chip memory accesses. Experiments on realistic test vehicles confirm that the estimation comes very close to the actual platform mapping. It helps the designer or a tool to find the interesting loop transformations that should then be investigated in more depth afterwards.
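
To make the transformation concrete, here is a hedged textbook-style illustration of loop fusion (not an example drawn from the paper): fusing a producer and a consumer loop shrinks the intermediate array to a single scalar, removing an N-element buffer and its memory accesses.

    #define N 4096

    /* Before fusion: the full intermediate array tmp[] must be stored
     * (and, for large N, spills to off-chip memory). */
    void unfused(const int *a, int *out) {
        static int tmp[N];
        for (int i = 0; i < N; i++) tmp[i] = a[i] * a[i];   /* producer */
        for (int i = 0; i < N; i++) out[i] = tmp[i] + 1;    /* consumer */
    }

    /* After fusion: the intermediate value lives in a register; the
     * N-element buffer and its accesses disappear. */
    void fused(const int *a, int *out) {
        for (int i = 0; i < N; i++) {
            int t = a[i] * a[i];
            out[i] = t + 1;
        }
    }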

Collaboration


Dive into Erik Brockmeyer's collaborations.

Top Co-Authors

Francky Catthoor (Katholieke Universiteit Leuven)
Arnout Vandecappelle (Katholieke Universiteit Leuven)
Koen Danckaert (Katholieke Universiteit Leuven)
Tanja Van Achteren (Katholieke Universiteit Leuven)
Nikil D. Dutt (University of California)