Jörg Mische
University of Augsburg
Publication
Featured research published by Jörg Mische.
international symposium on microarchitecture | 2010
Theo Ungerer; Francisco J. Cazorla; Pascal Sainrat; Guillem Bernat; Zlatko Petrov; Christine Rochange; Eduardo Quiñones; Mike Gerdes; Marco Paolieri; Julian Wolf; Hugues Cassé; Sascha Uhrig; Irakli Guliashvili; Michael Houston; Florian Kluge; Stefan Metzlaff; Jörg Mische
The MERASA project aims to achieve a breakthrough in hardware design, hard real-time support in system software, and worst-case execution time analysis tools for embedded multicore processors. The project focuses on developing multicore processor designs for hard real-time embedded systems and techniques to guarantee the analyzability and timing predictability of every feature provided by the processor.
digital systems design | 2013
Theo Ungerer; Christian Bradatsch; Mike Gerdes; Florian Kluge; Ralf Jahr; Jörg Mische; Joao Fernandes; Pavel G. Zaykov; Zlatko Petrov; Bert Böddeker; Sebastian Kehr; Hans Regler; Andreas Hugl; Christine Rochange; Haluk Ozaktas; Hugues Cassé; Armelle Bonenfant; Pascal Sainrat; Ian Broster; Nick Lay; David George; Eduardo Quiñones; Miloš Panić; Jaume Abella; Francisco J. Cazorla; Sascha Uhrig; Mathias Rohde; Arthur Pyka
Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as the major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which combines high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for parallelization of industrial hard real-time programs, and to provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores.
automation, robotics and control systems | 2010
Jörg Mische; Irakli Guliashvili; Sascha Uhrig; Theo Ungerer
This paper describes how a superscalar in-order processor must be modified to support Simultaneous Multithreading (SMT) such that time-predictability is preserved for hard real-time applications. For superscalar in-order architectures, the calculation of the Worst Case Execution Time (WCET) is much easier and tighter than for out-of-order architectures. By a careful enhancement that completely isolates the threads, this capability can be carried over to an in-order SMT architecture. Our design goal is to minimise the WCET of the highest-priority thread, while releasing as many resources as possible for the execution of concurrent non-critical threads. The resultant processor executes hard real-time threads at the same speed as its single-threaded ancestor, but idle issue slots are dynamically used by non-critical threads. The modifications to enable SMT are demonstrated by CarCore, a multithreaded embedded processor that implements the Infineon TriCore instruction set.
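The slot-sharing idea in this abstract can be illustrated with a minimal sketch. It assumes a fixed issue width and a strict priority order with the hard real-time thread first; the function name `allocate_issue_slots` and the list-based model are hypothetical and are not taken from the CarCore design.

```python
def allocate_issue_slots(issue_width, ready):
    """Distribute the issue slots of one cycle by strict priority.

    ready[i] is the number of instructions thread i could issue this cycle;
    index 0 is the highest-priority (hard real-time) thread.
    This is an assumed, simplified model, not the CarCore issue logic.
    """
    granted = []
    free = issue_width
    for wanted in ready:
        # A thread only receives slots left over by higher-priority threads.
        take = min(wanted, free)
        granted.append(take)
        free -= take
    return granted


if __name__ == "__main__":
    # Two-way issue: the real-time thread needs one slot, so one idle slot
    # goes to the next thread in priority order.
    print(allocate_issue_slots(2, [1, 3, 2]))
```

In this model the first thread always receives exactly what it would get if it ran alone, which is the property the abstract relies on for single-threaded WCET analysis.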
memory performance dealing with applications systems and architecture | 2008
Stefan Metzlaff; Sascha Uhrig; Jörg Mische; Theo Ungerer
For precise timing analysis of hard real-time applications, a predictable memory system is of particular importance. Caches have a great impact on performance, but at the cost of reduced timing predictability. Conventional scratchpads, i.e. statically managed on-chip memories, provide predictable memory accesses, but they are usually poorly utilized. Dynamically managed scratchpads that are designed for predictability allow better memory utilization. In this paper we propose a function scratchpad that is dynamically managed in hardware and provides predictable timing behavior. The function scratchpad exploits a simultaneous multithreaded architecture to increase the pipeline and memory bandwidth utilization while preserving predictability.
international conference on computer design | 2008
Jörg Mische; Sascha Uhrig; Florian Kluge; Theo Ungerer
We developed an SMT processor that allows a static WCET analysis of several hard real-time threads and uses the remaining resources for soft or non-real-time threads. The analysis is possible because one Dominant Meta Thread (DMT) is executed as if it were the only thread on the processor, and thus single-threaded WCET techniques can be applied. To provide more than one hard real-time thread, the execution time of the Dominant Meta Thread is distributed by time sharing, whereby the length of the time slices and periods can be adjusted at runtime. Our technique, called Dominant Time Sharing (DTS), can be used to minimize the number of control units in embedded hard real-time systems and hence reduces the overall energy consumption and material demand. In contrast to many other studies, we are able to handle multicycle memory latencies while preserving analyzability. The proposed technique can easily be extended to access other external resources like coprocessors or reconfigurable arrays.
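The time-sharing idea behind DTS can be sketched as follows. This is a simplified model under stated assumptions: slices are laid out back to back from the start of each period, switching between threads costs no cycles, and the completion bound is a generic time-division bound rather than the analysis from the paper. The names `dominant_thread` and `response_bound` are hypothetical.

```python
def dominant_thread(cycle, period, slices):
    """Return the index of the hard real-time thread that is dominant in `cycle`.

    slices[i] is the slice length of thread i within each period.
    Returns None when the cycle belongs to no slice (assumed layout).
    """
    if sum(slices) > period:
        raise ValueError("slices do not fit into the period")
    offset = cycle % period
    for index, length in enumerate(slices):
        if offset < length:
            return index
        offset -= length
    return None


def response_bound(wcet, slice_len, period):
    """Upper bound on the completion time of `wcet` cycles of dominant execution.

    Assumes the worst-case release: just after the thread's own slice ended.
    """
    if not 0 < slice_len <= period:
        raise ValueError("slice length must be in 1..period")
    if wcet <= 0:
        return 0
    full_slices = -(-wcet // slice_len) - 1   # ceil(wcet / slice_len) - 1
    rest = wcet - full_slices * slice_len     # cycles used in the last slice
    return (period - slice_len) + full_slices * period + rest


if __name__ == "__main__":
    print(dominant_thread(4, 10, [4, 3]))
    print(response_bound(10, 4, 10))
```

Under these assumptions, a thread with a single-threaded WCET of 10 cycles and a 4-cycle slice in a 10-cycle period finishes within 28 cycles of its release.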
software technologies for embedded and ubiquitous systems | 2008
Florian Kluge; Sascha Uhrig; Jörg Mische; Theo Ungerer
The concepts of Autonomic and Organic Computing (AC/OC) promise to make modern computer systems more secure and easier to manage. In this paper, we extend the observer/controller architecture typically used in AC/OC systems towards a new target area: embedded real-time systems. As a result we present a two-layered management architecture. We discuss aspects of internal communication and design a communication model. Finally, we present a generic classification system for the upper layer of the management architecture.
software and compilers for embedded systems | 2009
Florian Kluge; Chenglong Yu; Jörg Mische; Sascha Uhrig; Theo Ungerer
The AUTOSAR specification provides a common standard for software development in the automotive domain. Its functional definition is based on the concept of single-threaded processors. Recent trends in embedded processors provide new possibilities for more powerful processors using parallel execution techniques like multithreading and multi-cores. We discuss the implementation of the AUTOSAR operating system interface on a modern simultaneous multithreaded (SMT) processor. Several problems in resource management arise when AUTOSAR tasks are executed concurrently on a multithreaded processor. In particular, deadlocks, which should be averted through the priority ceiling protocol, can reoccur. We solve these problems by extending AUTOSAR OS with the Task Filtering Method to avoid deadlocks in multithreaded processors. Other synchronisation problems arising through the parallel execution of tasks are solved through the use of lock-free data structures. Finally, we propose some extensions to the AUTOSAR specification so it can be used in software development for SMT processors. We develop some additional requirements on such SMT processors to enable the use of the Task Filtering Method. Our work also offers perspectives for software development on upcoming multi-core processors in the automotive domain.
autonomic and trusted computing | 2008
Florian Kluge; Jörg Mische; Sascha Uhrig; Theo Ungerer
To overcome the rising complexity of computing systems, the paradigms of Autonomic Computing and Organic Computing have been introduced. By using an observer/controller architecture, Organic Computing aims to make embedded systems more life-like by providing them with so-called Self-X properties. Embedded real-time systems can also gain great benefit from these techniques. In this paper, we show what new requirements arise when introducing Autonomic/Organic Computing into the area of real-time applications. These requirements flow into the architecture of the real-time operating system CAROS. CAROS combines several concepts to provide a solid base for the implementation of Self-X techniques in embedded real-time systems. We show the practicability of our concepts with a prototypical implementation on the multithreaded CarCore microcontroller.
real-time networks and systems | 2014
Jörg Mische; Theo Ungerer
Hard real-time systems based on many cores connected by a Network-on-Chip (NoC) need Guaranteed Service (GS) for bounded communication latencies and bandwidths. Typically, GS is implemented by a Custom Schedule, a static periodic communication schedule that minimises network conflicts. It offers minimal latencies and maximum utilisation of the network, but requires the definition of all node-to-node connections at design time of the software. Therefore it is specific to a certain traffic pattern and placement of tasks to nodes. If the unused connections are not known or the schedule must be independent of the task placement, all connections must be considered as equally possible, resulting in an All-To-All Schedule. The flexibility in communication and placement of the latter comes at the cost of rather long network latencies. This paper presents two alternatives that lie between these two extremes. In a One-To-One Schedule, the latencies are longer than in a Custom Schedule, but the task mapping has no influence and a real-time system can be composed of independent multi-node software components whose timings were analysed individually. The One-To-All Schedule is an alternative to the All-To-All Schedule. It provides shorter latencies under most circumstances, especially from a timing analysis perspective. Furthermore, the paper describes how all four schedules can be implemented efficiently using decoupled semi-bufferless x-y-routing in a unidirectional torus.
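To give an intuition for what a static periodic schedule covering all connections looks like, here is a minimal sketch. It uses a simple rotation in which every ordered node pair gets one slot per period. This is an assumed textbook-style construction, not the All-To-All Schedule defined in the paper, and it ignores link conflicts inside the torus. The function name `all_to_all_schedule` is hypothetical.

```python
def all_to_all_schedule(num_nodes):
    """Build a periodic schedule in which every ordered node pair gets one slot.

    Returns a list of slots; each slot maps a sender to its destination.
    In slot t, node i may send to node (i + t + 1) mod num_nodes.
    Assumed rotation scheme for illustration only.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    return [
        {src: (src + shift) % num_nodes for src in range(num_nodes)}
        for shift in range(1, num_nodes)
    ]


if __name__ == "__main__":
    schedule = all_to_all_schedule(4)
    for slot, mapping in enumerate(schedule):
        print(slot, mapping)
    # The period is num_nodes - 1 slots in this simplified model.
    print(len(schedule))
```

In this simplified model the period grows with the number of nodes, which hints at why a schedule that treats all connections as equally possible leads to longer latencies than one tailored to the connections actually used.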
network on chip architectures | 2012
Jörg Mische; Theo Ungerer
State-of-the-art Networks-on-Chip (NoCs) provide high throughput and low latency by sending packets of data through a mesh topology, using virtual channels and wormhole flow control. The downside of this technology is a high area and energy consumption due to many buffers, large crossbars and a complex arbitration logic within the routers. In our approach, we avoid flow control and complex analysis of the head flit by sending single standalone flits instead of large packets of flits. As the order of flits is preserved between sending and receiving node, large data blocks can still be sent. The complexity of the router is further reduced by using a unidirectional 2D torus instead of a mesh, which reduces the number of router ports from 5 to 3. The flits are X-Y-routed and transported without buffering as long as they stay within one dimension. Consequently there is only one FIFO per router, which buffers flits when they turn from X to Y direction. In terms of throughput and latency, the so-called paternoster router is comparable with a conventional router with two virtual channels, but it consumes 50% less energy and 60% less area.
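The routing scheme described in this abstract can be sketched as a path computation. The sketch assumes that links only run in the positive X and positive Y direction with wrap-around and that nodes are addressed by `(x, y)` coordinates; the function names `xy_route` and `hop_count` are hypothetical, and router timing and the turn FIFO are not modelled.

```python
def xy_route(src, dst, width, height):
    """List the routers a flit visits in a unidirectional 2D torus.

    The flit travels in X direction first, then turns once into Y.
    Links are assumed to go only in +X and +Y direction with wrap-around.
    """
    for x, y in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError("coordinate outside the torus")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + 1) % width   # wrap around at the right edge
        path.append((x, y))
    while y != dst[1]:
        y = (y + 1) % height  # wrap around at the top edge
        path.append((x, y))
    return path


def hop_count(src, dst, width, height):
    """Number of links traversed on the X-Y route."""
    return (dst[0] - src[0]) % width + (dst[1] - src[1]) % height


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1), 4, 4))
    print(hop_count((3, 3), (0, 0), 4, 4))
```

Each router in this model needs only a local port, one X port and one Y port, which matches the three ports mentioned in the abstract. The single turn from X to Y is the point at which a flit would be buffered.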