Ralf Jahr
University of Augsburg
Publication
Featured research published by Ralf Jahr.
digital systems design | 2013
Theo Ungerer; Christian Bradatsch; Mike Gerdes; Florian Kluge; Ralf Jahr; Jörg Mische; Joao Fernandes; Pavel G. Zaykov; Zlatko Petrov; Bert Böddeker; Sebastian Kehr; Hans Regler; Andreas Hugl; Christine Rochange; Haluk Ozaktas; Hugues Cassé; Armelle Bonenfant; Pascal Sainrat; Ian Broster; Nick Lay; David George; Eduardo Quiñones; Miloš Panić; Jaume Abella; Francisco J. Cazorla; Sascha Uhrig; Mathias Rohde; Arthur Pyka
Engineers who design hard real-time embedded systems express a need for several times the performance available today while keeping safety as the major criterion. A breakthrough in performance is expected from parallelizing hard real-time applications and running them on an embedded multi-core processor, which makes it possible to combine the requirement for high performance with timing-predictable execution. parMERASA will provide a timing-analyzable system of parallel hard real-time applications running on a scalable multi-core processor. parMERASA goes one step beyond mixed-criticality demands: it targets future complex control algorithms by parallelizing hard real-time programs to run on predictable multi-/many-core processors. We aim to achieve a breakthrough in techniques for the parallelization of industrial hard real-time programs, to provide hard real-time support in system software, WCET analysis and verification tools for multi-cores, and techniques for predictable multi-core designs with up to 64 cores.
MMB'12/DFT'12 Proceedings of the 16th international GI/ITG conference on Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance | 2012
Ralf Jahr; Horia Calborean; Lucian N. Vintan; Theo Ungerer
During development, processor architectures can be tuned and configured through many different parameters. Automatic design space explorations (DSEs) with heuristic algorithms are a helpful approach to find the best settings for these parameters according to multiple objectives, e.g. performance, energy consumption, or real-time constraints. But if the setup is changed only slightly and a new DSE has to be performed, it starts from scratch, resulting in very long evaluation times. To reduce these evaluation times, we extend the NSGA-II algorithm in this article such that automatic DSEs can be supported by a set of transformation rules defined in a highly readable format, the Fuzzy Control Language (FCL). Rules can be specified by an engineer, thereby representing existing knowledge. Beyond this, a decision tree classifying high-quality configurations can be constructed automatically and translated into transformation rules. These rules can also be seen as a very valuable result of a DSE in their own right, because they allow drawing conclusions about the influence of parameters and describe regions of the design space with a high density of good configurations. Our evaluations show that automatically generated decision trees can classify near-optimal configurations for the hardware parameters of the Grid ALU Processor (GAP) and M-Sim 2. Further evaluations show that automatically constructed transformation rules can reduce the number of evaluations required to reach the same quality of results as without rules by 43%, leading to a significant time saving of about 25%. In the demonstrated example, using rules also leads to better results.
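As a rough illustration of the rule idea (this is not code from the paper or from FCL; the parameter names, the threshold, and the rule itself are invented), a transformation rule can be thought of as an if-then adjustment applied to a candidate configuration before it is evaluated:

```cpp
// Illustrative sketch only: how a transformation rule derived from expert
// knowledge or a learned decision tree might adjust a candidate configuration
// inside a heuristic DSE loop. All names and thresholds are hypothetical.
#include <iostream>

struct Config {
    int issue_width;   // e.g. instructions issued per cycle
    int cache_kb;      // e.g. L1 data cache size in KB
};

// A single "if-then" transformation rule: if the candidate lies in a region
// that is known to be poor, move it toward a better region before evaluation.
Config apply_rule(Config c) {
    // Hypothetical rule: a wide issue width only pays off with a large enough cache.
    if (c.issue_width >= 4 && c.cache_kb < 16) {
        c.cache_kb = 16;
    }
    return c;
}

int main() {
    Config candidate{4, 8};
    Config repaired = apply_rule(candidate);
    std::cout << "cache after rule: " << repaired.cache_kb << " KB\n";
}
```

A decision tree learned from already evaluated configurations can be flattened into a set of such rules, so hand-written expert knowledge and automatically extracted knowledge enter the search in the same form.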
international conference on high performance computing and simulation | 2011
Ralf Jahr; Theo Ungerer; Horia Calborean; Lucian N. Vintan
Recent computer architectures can be configured in many different ways. To explore this huge design space, system simulators are typically used. As performance is no longer the only decisive factor, but also, e.g., power consumption or the resource usage of the system, it has become very hard for designers to select optimal configurations. In this article we use a multi-objective design space exploration tool called FADSE to explore the vast design space of the Grid ALU Processor (GAP) and its post-link optimizer GAPtimize. We improved FADSE with techniques to make it more robust against failures and to speed up evaluations through parallel processing. For the GAP, we present an approximation of the hardware complexity as a second objective besides execution time. Function inlining, applied as a whole-program optimization with GAPtimize, is used as an example of a code optimization. We show that FADSE is able to thoroughly explore the design space of both GAP and GAPtimize and can find an approximation of the Pareto frontier consisting of near-optimal individuals in moderate time.
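The core of such a multi-objective exploration is the dominance relation between evaluated design points. The sketch below (not FADSE code; the objective names and numbers are made up) filters a set of configurations evaluated for execution time and hardware complexity down to the non-dominated ones, i.e. an approximation of the Pareto frontier:

```cpp
// Illustrative sketch only: select the non-dominated (Pareto-optimal) points
// from a set of evaluated design points with two objectives to minimize.
#include <iostream>
#include <vector>

struct Point {
    double exec_time;   // e.g. simulated cycles (lower is better)
    double complexity;  // e.g. approximated hardware cost (lower is better)
};

// a dominates b if it is at least as good in both objectives and better in one.
bool dominates(const Point& a, const Point& b) {
    return a.exec_time <= b.exec_time && a.complexity <= b.complexity &&
           (a.exec_time < b.exec_time || a.complexity < b.complexity);
}

std::vector<Point> pareto_front(const std::vector<Point>& pts) {
    std::vector<Point> front;
    for (const auto& p : pts) {
        bool dominated = false;
        for (const auto& q : pts)
            if (dominates(q, p)) { dominated = true; break; }
        if (!dominated) front.push_back(p);
    }
    return front;
}

int main() {
    std::vector<Point> evaluated = {{1.0, 9.0}, {2.0, 4.0}, {3.0, 5.0}, {4.0, 2.0}};
    for (const auto& p : pareto_front(evaluated))
        std::cout << p.exec_time << " / " << p.complexity << "\n";  // {3.0, 5.0} is dominated
}
```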
programming models and applications for multicores and manycores | 2013
Ralf Jahr; Mike Gerdes; Theo Ungerer
In the embedded systems domain a trend towards multi- and many-core processors is evident. To exploit these additional processing elements, parallel software is inevitable. The pattern-supported parallelization approach introduced here eases the transition from sequential to parallel software. It is a novel model-based approach with a clear methodology that uses parallel design patterns as known building blocks. First, the Activity and Pattern Diagram is created, revealing the maximum degree of parallelism expressed by parallel design patterns. Second, the degree of parallelism is reduced to the optimal level providing the best performance by agglomerating activities and patterns. In this step, trade-offs caused by the target platform, e.g. the computation-to-communication ratio, are respected. As an implementation of the parallel design patterns, a library of algorithmic skeletons can be used. This reduces development effort and effectively simplifies the transition from sequential to parallel code.
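As a minimal sketch of what an algorithmic skeleton is (plain C++, not the skeleton library referenced in the paper), the following "map" skeleton applies a worker function to every element of a data set and hides the thread management behind a reusable interface:

```cpp
// Illustrative sketch only: a minimal data-parallel "map" skeleton, one of the
// reusable building blocks that could implement a parallel design pattern.
#include <future>
#include <iostream>
#include <vector>

// Apply a worker function to every element, distributing the work across tasks.
template <typename T, typename F>
std::vector<T> map_skeleton(const std::vector<T>& input, F worker) {
    std::vector<std::future<T>> futures;
    futures.reserve(input.size());
    for (const T& x : input)
        futures.push_back(std::async(std::launch::async, worker, x));
    std::vector<T> output;
    output.reserve(input.size());
    for (auto& f : futures)
        output.push_back(f.get());
    return output;
}

int main() {
    std::vector<int> data = {1, 2, 3, 4};
    auto squared = map_skeleton(data, [](int x) { return x * x; });
    for (int v : squared) std::cout << v << ' ';  // prints: 1 4 9 16
    std::cout << '\n';
}
```

The application code only states which pattern is used and what the per-element work is; how the work is spread over cores stays inside the skeleton.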
Concurrency and Computation: Practice and Experience | 2015
Ralf Jahr; Horia Calborean; Lucian N. Vintan; Theo Ungerer
In the design process of computer systems or processor architectures, typically many different parameters are exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in the vast design space of a processor architecture together with a tool for code optimizations, and hence evaluate both automatically. As an example, we use the Grid ALU Processor (GAP) and its post-link optimizer GAPtimize, which can apply feedback-directed and platform-specific code optimizations. Our results show that FADSE is able to cope with both design spaces. Less than 25% of the maximal reasonable hardware effort for the scalable elements of the GAP is enough to reach the processor's performance maximum. With a performance reduction tolerance of 10%, the necessary hardware complexity can be reduced further by about two-thirds. The high-quality configurations found are analyzed, exhibiting strong relationships between the parameters of the GAP, the distribution of complexity, and the total performance. These performance numbers can be improved by applying code optimizations concurrently with optimizing the hardware parameters. FADSE can find near-optimal configurations by effectively combining and selecting parameters for hardware and code optimizations in a short time. The maximum observed speedup is 15%. With the use of code optimizations, the maximum possible reduction of the hardware resources, while sustaining the same performance level, is 50%.
digital systems design | 2010
Basher Shehan; Ralf Jahr; Sascha Uhrig; Theo Ungerer
Currently, few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid ALU Processor (GAP) is a runtime-reconfigurable processor designed for the acceleration of a conventional sequential instruction stream without the need for recompilation. It comprises a superscalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data-dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates and optimizes the GAP architecture, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order superscalar simulator (SimpleScalar). The GAP simulator outperforms SimpleScalar on average by about 50% with the basic architecture and by about 100% with an extended version including configuration layers.
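As a rough illustration of the mapping idea (the placement heuristic and bookkeeping below are assumptions for illustration, not the actual GAP configuration algorithm), instructions can be placed into rows of the array so that each one sits below the instructions producing its operands, and a new configuration layer is started when the rows are exhausted:

```cpp
// Illustrative sketch only: place a sequential instruction stream into rows of
// a functional-unit array so that data dependences flow top-down; open a new
// configuration layer when the rows are used up. Hypothetical, simplified model.
#include <algorithm>
#include <array>
#include <iostream>
#include <vector>

struct Instr { int dest; int src1; int src2; };  // register numbers

int main() {
    constexpr int kRows = 4;      // rows of functional units in the array
    constexpr int kNumRegs = 8;
    std::vector<Instr> stream = {{2, 0, 1}, {3, 2, 1}, {4, 0, 1},
                                 {5, 3, 4}, {6, 5, 2}, {7, 6, 5}};

    std::array<int, kNumRegs> reg_row{};  // row in which each register value is produced
    reg_row.fill(-1);                     // -1: value comes from the register file (top)
    int layers = 1;

    for (const Instr& in : stream) {
        int row = std::max(reg_row[in.src1], reg_row[in.src2]) + 1;
        if (row >= kRows) {               // array full: start a new configuration layer
            ++layers;
            reg_row.fill(-1);
            row = 0;
        }
        reg_row[in.dest] = row;
        std::cout << "r" << in.dest << " placed in row " << row << "\n";
    }
    std::cout << "configuration layers used: " << layers << "\n";
}
```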
Archive | 2013
Horia Calborean; Ralf Jahr; Theo Ungerer; Lucian N. Vintan
In today's computer architectures the design spaces are huge, making it very difficult to find optimal configurations. One way to cope with this problem is to use Automatic Design Space Exploration (ADSE) techniques. We developed the Framework for Automatic Design Space Exploration (FADSE), which is focused on microarchitectural optimizations. This framework includes several state-of-the-art heuristic algorithms.
ubiquitous intelligence and computing | 2013
Rolf Kiefhaber; Ralf Jahr; Nizar Msadek; Theo Ungerer
Trust is an important aspect of human societies. It enables cooperation and provides means to estimate potential cooperation partners. Several works have addressed how the concept of trust can be transferred to computer systems. In this paper, we present an approach to calculate trust, including direct trust, confidence, and reputation, in a network consisting of agents with changing behavior. Our metrics are highly configurable for adaptation to a wide variety of systems and situations; especially Organic Computing systems can benefit from trust by integrating it into the algorithms that implement their self-organizing behavior. We evaluate the effect of direct trust and confidence together with reputation (DTCR) in comparison with using only direct trust (DT) or direct trust with confidence (DTC). Because these metrics can be configured with many parameters, leading to an immense number of possible configurations, we apply a heuristic optimization algorithm to find very good setups showing the highest benefits. For this evaluation, an abstract scenario is developed and applied; it consists of unreliable components from different classes with defined mean behavior. This general scenario could model many possible industrial settings, a few of which are introduced as well. Our evaluations show that reputation and direct trust are best used together, with a fluent transition between them defined by the confidence. In all cases, reputation works as a corrective when direct trust information is not optimal and potentially misleading. This leads to very good results with very limited variance; in particular, we show that a small number of interactions is sufficient to obtain the best results.
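One simple way to picture the fluent transition between direct trust and reputation is a confidence-weighted blend, sketched below (the linear formula and the value ranges are assumptions for illustration, not the exact metric from the paper):

```cpp
// Illustrative sketch only: blend direct trust and reputation, weighted by the
// confidence in the agent's own observations. All values assumed in [0, 1].
#include <algorithm>
#include <iostream>

// confidence close to 1: enough own interactions, rely mainly on direct trust;
// confidence close to 0: rely mostly on reputation reported by other agents.
double combined_trust(double direct_trust, double confidence, double reputation) {
    double c = std::clamp(confidence, 0.0, 1.0);
    return c * direct_trust + (1.0 - c) * reputation;
}

int main() {
    std::cout << combined_trust(0.9, 0.2, 0.5) << "\n";  // few own interactions: near reputation
    std::cout << combined_trust(0.9, 0.8, 0.5) << "\n";  // many own interactions: near direct trust
}
```

In such a blend, reputation acts as the corrective described above whenever the agent's own (direct) observations are still sparse or misleading.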
ACM Transactions on Embedded Computing Systems | 2016
Theo Ungerer; Christian Bradatsch; Martin Frieb; Florian Kluge; Jörg Mische; Alexander Stegmeier; Ralf Jahr; Mike Gerdes; Pavel G. Zaykov; Lucie Matusova; Zai Jian Jia Li; Zlatko Petrov; Bert Böddeker; Sebastian Kehr; Hans Regler; Andreas Hugl; Christine Rochange; Haluk Ozaktas; Hugues Cassé; Armelle Bonenfant; Pascal Sainrat; Nick Lay; David George; Ian Broster; Eduardo Quiñones; Miloš Panić; Jaume Abella; Carles Hernandez; Francisco J. Cazorla; Sascha Uhrig
The EC project parMERASA (Multicore Execution of Parallelized Hard Real-Time Applications Supporting Analyzability) investigated timing-analyzable parallel hard real-time applications running on a predictable multicore processor. A pattern-supported parallelization approach was developed to ease sequential to parallel program transformation based on parallel design patterns that are timing analyzable. The parallelization approach was applied to parallelize the following industrial hard real-time programs: 3D path planning and stereo navigation algorithms (Honeywell International s.r.o.), control algorithm for a dynamic compaction machine (BAUER Maschinen GmbH), and a diesel engine management system (DENSO AUTOMOTIVE Deutschland GmbH). This article focuses on the parallelization approach, experiences during parallelization with the applications, and quantitative results reached by simulation, by static WCET analysis with the OTAWA tool, and by measurement-based WCET analysis with the RapiTime tool.
embedded and real-time computing systems and applications | 2014
Ralf Jahr; Mike Gerdes; Theo Ungerer; Haluk Ozaktas; Christine Rochange; Pavel G. Zaykov
Parallel multi-threaded applications are needed to gain an advantage from multi- and many-core processors. Such processors are also increasingly considered for embedded hard real-time systems with defined timing guarantees. Static timing analysis, which is one way to calculate the worst-case execution time (WCET) of parallel applications, is complex and time-consuming due to the difficulty of analyzing the interferences between threads and the high annotation effort required to resolve them.
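To make the interference problem concrete, the sketch below shows one simplified way to compose a bound for a barrier-synchronized parallel section from per-thread WCETs plus a per-access delay on shared resources; the additive interference model and all names are assumptions for illustration, not the analysis used in the paper:

```cpp
// Illustrative sketch only: a simplified WCET composition for a parallel
// section, assuming each thread's isolated WCET and its number of accesses to
// shared resources are already known from single-thread analysis.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

struct ThreadBound {
    uint64_t isolated_wcet;    // WCET in cycles when running alone
    uint64_t shared_accesses;  // number of accesses to shared resources
};

// For a barrier-synchronized section every thread must finish, so the section
// bound is the maximum over all interference-inflated thread bounds.
uint64_t section_wcet(const std::vector<ThreadBound>& threads,
                      uint64_t worst_case_delay_per_access) {
    uint64_t bound = 0;
    for (const auto& t : threads) {
        uint64_t inflated =
            t.isolated_wcet + t.shared_accesses * worst_case_delay_per_access;
        bound = std::max(bound, inflated);
    }
    return bound;
}

int main() {
    std::vector<ThreadBound> threads = {{10000, 120}, {12000, 80}, {9000, 200}};
    std::cout << "section WCET bound: " << section_wcet(threads, 15) << " cycles\n";
}
```

Even in this toy model the bound depends on per-access annotations for every thread, which hints at why the annotation effort grows quickly for real parallel programs.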