Stefan Metzlaff | Researchain

Publication

Featured research published by Stefan Metzlaff.

international symposium on microarchitecture | 2010

Merasa: Multicore Execution of Hard Real-Time Applications Supporting Analyzability

Theo Ungerer; Francisco J. Cazorla; Pascal Sainrat; Guillem Bernat; Zlatko Petrov; Christine Rochange; Eduardo Quiñones; Mike Gerdes; Marco Paolieri; Julian Wolf; Hugues Cassé; Sascha Uhrig; Irakli Guliashvili; Michael Houston; Florian Kluge; Stefan Metzlaff; Jörg Mische

The Merasa project aims to achieve a breakthrough in hardware design, hard real-time support in system software, and worst-case execution time analysis tools for embedded multicore processors. The project focuses on developing multicore processor designs for hard real-time embedded systems and techniques to guarantee the analyzability and timing predictability of every feature provided by the processor.

international symposium on object/component/service-oriented real-time distributed computing | 2010

RTOS Support for Parallel Execution of Hard Real-Time Applications on the MERASA Multi-core Processor

Julian Wolf; Mike Gerdes; Florian Kluge; Sascha Uhrig; Jörg Mische; Stefan Metzlaff; Christine Rochange; Hugues Cassé; Pascal Sainrat; Theo Ungerer

Multi-cores are the contemporary solution to satisfy high performance and low energy demands in general and embedded computing domains. However, currently available multi-cores are not suitable for use in safety-critical environments with hard real-time constraints. Hard real-time tasks running on different cores must be executed in isolation, or their interferences must be time-bounded. Thus, new requirements also arise for a real-time operating system (RTOS), in particular if the parallel execution of hard real-time applications is to be supported. In this paper we focus on the MERASA system software as an RTOS developed on top of the MERASA multi-core processor. The MERASA system software fulfils the requirements for time-bounded execution of parallel hard real-time tasks. In particular we focus on thread control with synchronisation mechanisms, memory management and resource management requirements. Our evaluations show that all system software functions are time-bounded by a worst-case execution time (WCET) analysis.

memory performance dealing with applications systems and architecture | 2008

Predictable dynamic instruction scratchpad for simultaneous multithreaded processors

Stefan Metzlaff; Sascha Uhrig; Jörg Mische; Theo Ungerer

For precise timing analysis of hard real-time applications a predictable memory system is of particular importance. Caches have a great impact on performance, but at the cost of reduced timing predictability. Conventional scratchpads, i.e. statically managed on-chip memories, provide predictable memory accesses, but they are usually poorly utilized. Dynamically managed scratchpads that are designed for predictability allow better memory utilization. In this paper we propose a function scratchpad that is dynamically managed in hardware and provides a predictable timing behavior. The function scratchpad exploits a simultaneous multithreaded architecture to increase the pipeline and memory bandwidth utilization while preserving predictability.

automation, robotics and control systems | 2011

A dynamic instruction scratchpad memory for embedded processors managed by hardware

Stefan Metzlaff; Irakli Guliashvili; Sascha Uhrig; Theo Ungerer

This paper proposes a hardware-managed instruction scratchpad at the granularity of functions, designed for real-time systems. It guarantees that every instruction will be fetched from the local, fast and timing-predictable scratchpad memory. Thus, a predictable behavior is reached that eases a precise timing analysis of the system. We estimate the hardware resources required to implement the dynamic instruction scratchpad on an FPGA. An evaluation quantifies the impact of our scratchpad on average-case performance. It shows that the dynamic instruction scratchpad achieves reasonable performance compared to standard instruction memories, while providing predictable behavior and easing timing analysis.
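The idea of managing a scratchpad at function granularity can be illustrated with a toy software model. This is only a sketch under stated assumptions and not the hardware design from the paper: the class name `FunctionScratchpad`, the `activate` method, the byte sizes, and the FIFO eviction order are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class FunctionScratchpad:
    """Toy model: whole functions are loaded on demand and evicted in FIFO order.

    Assumption: FIFO eviction is chosen only for simplicity; it is not
    claimed to be the policy used by the authors.
    """

    def __init__(self, capacity):
        self.capacity = capacity        # bytes available in the scratchpad
        self.resident = OrderedDict()   # function name -> size, in load order
        self.loads = 0                  # number of function loads performed

    def activate(self, name, size):
        """Model a call to or return into `name`; return True if it had to be loaded."""
        if size > self.capacity:
            raise ValueError(f"{name} does not fit into the scratchpad")
        if name in self.resident:
            return False                # function already on chip
        # Evict the oldest functions until the new one fits completely.
        while sum(self.resident.values()) + size > self.capacity:
            self.resident.popitem(last=False)
        self.resident[name] = size
        self.loads += 1
        return True


# Hypothetical call/return trace: (function name, size in bytes).
sp = FunctionScratchpad(capacity=100)
for name, size in [("main", 40), ("f", 30), ("main", 40), ("g", 50), ("main", 40)]:
    sp.activate(name, size)
print(sp.loads, list(sp.resident))
```

Because a function is always fully resident before it executes, every fetch inside the function hits the scratchpad in this model; the only variable timing cost is the load at a call or return, which is the property that makes such a memory easier to analyze.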

ACM Transactions on Embedded Computing Systems | 2013

A hard real-time capable multi-core SMT processor

Marco Paolieri; Joerg Mische; Stefan Metzlaff; Mike Gerdes; Eduardo Quiñones; Sascha Uhrig; Theo Ungerer; Francisco J. Cazorla

Hard real-time applications in safety-critical domains require high performance and time analyzability. Multi-core processors are an answer to these demands; however, task interferences make multi-cores more difficult to analyze from a worst-case execution time point of view than single-core processors. We propose a multi-core SMT processor that ensures a bounded maximum delay a task can suffer due to inter-task interferences. Multiple hard real-time tasks can be executed on different cores together with additional non-real-time tasks. Our evaluation shows that the proposed MERASA multi-core provides predictability for hard real-time tasks and also high performance for non-hard-real-time tasks.

Journal of Systems Architecture | 2014

A comparison of instruction memories from the WCET perspective

Stefan Metzlaff; Theo Ungerer

Hard real-time systems demand high performance in combination with a timing-predictable program execution. The performance of a system in the worst case, represented by its worst-case execution time (WCET), highly depends on the design of the memory subsystem. In this paper we focus on the instruction memory hierarchy and quantify the impact of different on-chip instruction memories on the worst-case timing of the system. A function-based dynamic instruction scratchpad (D-ISP), an instruction cache, and static instruction scratchpads using basic-block-based and function-based assignment algorithms are compared. To this end, we provide WCET bounds for systems with different on-chip instruction memories and different off-chip memory timings. We show that for small memory sizes a static instruction scratchpad usually outperforms the other memories in terms of the WCET estimate. However, with increasing memory sizes the D-ISP is able to reach lower WCET bounds. An instruction cache can only provide lower WCET bounds than the other memories if no suitable assignment for the static instruction scratchpads is found or if the D-ISP suffers from thrashing or frequently loads unused code.

defect and fault tolerance in vlsi and nanotechnology systems | 2014

Exploiting Intel TSX for fault-tolerant execution in safety-critical systems

Florian Haas; Sebastian Weis; Stefan Metzlaff; Theo Ungerer

Safety-critical systems demand increasing computational power, which calls for high-performance embedded systems. While commercial off-the-shelf (COTS) processors offer high computational performance for a low price, they do not provide hardware support for fault-tolerant execution. However, pure software-based fault-tolerance methods entail high design complexity and runtime overhead. In this paper, we present an efficient software/hardware-based redundant execution scheme for a COTS x86 processor, which exploits the Transactional Synchronization Extensions (TSX) introduced with the Intel Haswell microarchitecture. Our approach extends a static binary instrumentation tool to insert fault-tolerant transactions and fault-detection instructions at function granularity. TSX hardware support is used for error containment and recovery. The average runtime overhead for selected SPEC2006 benchmarks was only 49% compared to a non-fault-tolerant execution.

ieee international conference on high performance computing data and analytics | 2012

Impact of Instruction Cache and Different Instruction Scratchpads on the WCET Estimate

Stefan Metzlaff; Theo Ungerer

Hard real-time systems demand high performance, but also tight WCET estimates. The tightness of the WCET estimates strictly depends on the WCET analysis of the memory system. In this paper we quantify the impact of different instruction memories on the WCET estimates. A function-based dynamic scratchpad, a cache, and static scratchpads are compared. Furthermore, we inspect the pessimism introduced by memory access interferences at the shared off-chip memory level. It is shown that the function-based dynamic instruction scratchpad provides lower WCET estimates, because it eliminates these interferences by design. Thus the function-based dynamic scratchpad eases the analysis while also providing tight WCET estimates.

euromicro conference on real-time systems | 2012

Replacement Policies for a Function-Based Instruction Memory: A Quantification of the Impact on Hardware Complexity and WCET Estimates

Stefan Metzlaff; Theo Ungerer

Instruction memories have a large influence on the timing behavior of hard real-time systems. Thus, to obtain safe and tight WCET estimates the instruction memory has to be predictable. Instruction memories in embedded real-time systems range from scratchpads with fixed content to dynamically managed fine-grained caches. In this paper we focus on a function-based dynamic instruction memory (D-ISP) and examine different replacement policies. We show their influence on the timing behavior of a hard real-time system and the complexity of a hardware implementation. A timing analysis reveals that a stack-based replacement policy reaches WCET estimates similar to those of LRU, especially for small scratchpad sizes. Unlike the stack-based replacement policy, however, LRU cannot be implemented with a reasonable amount of resources, whereas an experimental implementation of the proposed stack-based replacement policy needs only up to 23% more resources than a FIFO implementation.

real-time networks and systems | 2013

Leveraging transactional memory for a predictable execution of applications composed of hard real-time and best-effort tasks

Stefan Metzlaff; Sebastian Weis; Theo Ungerer

In this paper, we utilise transactional memory (TM) to limit interferences of concurrent hard real-time (HRT) and best-effort (BE) tasks in a shared memory multi-core. We first propose a way to calculate the worst-case execution time (WCET) bound of HRT transactions when the set of concurrent transactions is known. In the next step we enhance our TM contention manager to prioritise transactions depending on their real-time requirements. With our approach it is possible to bound the interferences of any BE transaction and thus ensure a predictable execution of concurrently running HRT transactions. Our evaluation shows that the impact of BE tasks on the WCET bound of HRT tasks is minimal, while allowing them to share data.
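The prioritisation of transactions by real-time requirement can be sketched as a small conflict-resolution rule. This is a minimal illustration under stated assumptions, not the contention manager from the paper: the `Transaction` fields, the `resolve_conflict` function, and the tie-break by start time are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    tx_id: int
    is_hrt: bool      # True for hard real-time, False for best-effort
    start_time: int   # hypothetical timestamp, used only for tie-breaking


def resolve_conflict(a, b):
    """Return (winner, loser); the loser would be aborted and retried.

    Assumption: an HRT transaction always wins against a BE transaction,
    and within the same class the older transaction wins.
    """
    if a.is_hrt != b.is_hrt:
        return (a, b) if a.is_hrt else (b, a)
    if (a.start_time, a.tx_id) <= (b.start_time, b.tx_id):
        return (a, b)
    return (b, a)


hrt = Transaction(tx_id=1, is_hrt=True, start_time=10)
be = Transaction(tx_id=2, is_hrt=False, start_time=5)
winner, loser = resolve_conflict(be, hrt)
print(winner.tx_id, loser.tx_id)
```

Under this rule a BE transaction can never force an HRT transaction to abort, so the only aborts an HRT transaction can suffer come from other HRT transactions. That is consistent with the abstract's setting, in which the WCET bound is calculated for a known set of concurrent transactions.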
