Mirko Rahn
Fraunhofer Society
Publications
Featured research published by Mirko Rahn.
Facing the Multicore-Challenge | 2013
Thomas Alrutz; Jan Backhaus; Thomas Brandes; Vanessa End; Thomas Gerhold; Alfred Geiger; Daniel Grünewald; Vincent Heuveline; Jens Jägersküpper; Andreas Knüpfer; Olaf Krzikalla; Edmund Kügeler; Carsten Lojewski; Guy Lonsdale; Ralph Müller-Pfefferkorn; Wolfgang E. Nagel; Lena Oden; Franz-Josef Pfreundt; Mirko Rahn; Michael Sattler; Mareike Schmidtobreick; Annika Schiller; Christian Simmendinger; Thomas Soddemann; Godehard Sutmann; Henning Weber; Jan-Philipp Weiss
At the threshold to exascale computing, the limitations of the MPI programming model become more and more pronounced. HPC programmers have to design codes that can run and scale on systems with hundreds of thousands of cores. Setting up correspondingly many communication buffers and point-to-point communication links, and relying on bulk-synchronous communication phases, contradicts scalability at these dimensions. Moreover, the reliability of upcoming systems is expected to worsen.
International Conference on Cluster Computing | 2015
I. B. Ivanov; Jing Gong; Dana Akhmetova; Ivy Bo Peng; Stefano Markidis; Erwin Laure; Rui Machado; Mirko Rahn; Valeria Bartsch; Alistair Hart; Paul Fischer
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel has been developed to solve Nekbone problems on a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation, and its communication kernel consists of approximately 500 lines of code against the original 7,000, allowing experimentation with new approaches to Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was replaced by MPI non-blocking communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel based on the GPI-2 library. This approach reduces the synchronization among neighbor processes; in our tests on 8,192 processes, the GPI-2 communication kernel is on average 3% faster than the new MPI non-blocking kernel. In addition, we used OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.
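To make the kind of kernel compared here concrete, the following is a minimal sketch (not the actual Nekbone code) of a non-blocking nearest-neighbour exchange on a three-dimensional Cartesian process topology; the face size and payload are placeholders.

```c
/* Minimal sketch: non-blocking nearest-neighbour exchange on a 3D
 * Cartesian process topology. Illustrative only; not the Nekbone kernel. */
#include <mpi.h>
#include <stdlib.h>

#define FACE 1024  /* illustrative number of doubles per face */

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0}, nprocs;
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Dims_create(nprocs, 3, dims);            /* factor ranks into a 3D grid */

  MPI_Comm cart;
  MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);

  double *send[6], *recv[6];
  for (int f = 0; f < 6; ++f) {
    send[f] = calloc(FACE, sizeof(double));
    recv[f] = calloc(FACE, sizeof(double));
  }

  MPI_Request req[12];
  int nreq = 0;
  for (int dim = 0; dim < 3; ++dim) {
    int lo, hi;
    MPI_Cart_shift(cart, dim, 1, &lo, &hi);    /* neighbours in this dimension */
    int f = 2 * dim;
    MPI_Irecv(recv[f],     FACE, MPI_DOUBLE, lo, 0, cart, &req[nreq++]);
    MPI_Irecv(recv[f + 1], FACE, MPI_DOUBLE, hi, 1, cart, &req[nreq++]);
    MPI_Isend(send[f],     FACE, MPI_DOUBLE, lo, 1, cart, &req[nreq++]);
    MPI_Isend(send[f + 1], FACE, MPI_DOUBLE, hi, 0, cart, &req[nreq++]);
  }

  /* ... overlap independent computation here ... */

  MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE); /* complete the exchange */

  for (int f = 0; f < 6; ++f) { free(send[f]); free(recv[f]); }
  MPI_Comm_free(&cart);
  MPI_Finalize();
  return 0;
}
```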
Archive | 2015
Christian Simmendinger; Mirko Rahn; Daniel Gruenewald
The Global Address Space Programming Interface (GASPI) is a Partitioned Global Address Space (PGAS) API specification. The GASPI API specification is focused on three key objectives: scalability, flexibility and fault tolerance. It offers a small yet powerful API composed of synchronization primitives, synchronous and asynchronous collectives, fine-grained control over one-sided read and write communication primitives, global atomics, passive receives, communication groups and communication queues. GASPI has been designed for one-sided RDMA-driven communication in a PGAS environment. As such, GASPI aims to initiate a paradigm shift from bulk-synchronous two-sided communication patterns towards an asynchronous communication and execution model. To achieve its much improved scaling behaviour, GASPI leverages request-based asynchronous dataflow with remote completion: a remote completion indicates that an operation has completed at the target window, so the target can establish, on a per-request basis, whether a one-sided operation has completed. A fine-grained asynchronous dataflow model implemented on this basis can achieve substantially better scaling behaviour than MPI.
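As an illustration of the one-sided, notified communication model described above, here is a minimal sketch using the GPI-2 implementation of GASPI; segment size, offsets and payload are placeholders, and error handling is omitted.

```c
/* Minimal GASPI sketch (GPI-2): rank 0 writes a block into rank 1's
 * segment with a piggybacked notification; rank 1 waits on the
 * notification, i.e. on remote completion, instead of on a receive. */
#include <GASPI.h>

#define SEG   0
#define NOTIF 0

int main(int argc, char **argv)
{
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_rank_t rank, nprocs;
  gaspi_proc_rank(&rank);
  gaspi_proc_num(&nprocs);

  /* one globally visible segment per rank */
  gaspi_segment_create(SEG, 1 << 20, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  if (rank == 0 && nprocs > 1) {
    /* one-sided write of 4 KiB to rank 1, notification attached */
    gaspi_write_notify(SEG, 0,        /* local segment and offset  */
                       1,             /* target rank               */
                       SEG, 0,        /* remote segment and offset */
                       4096,          /* size in bytes             */
                       NOTIF, 1,      /* notification id and value */
                       0, GASPI_BLOCK /* queue and timeout         */);
    gaspi_wait(0, GASPI_BLOCK);       /* local completion on queue 0 */
  }
  else if (rank == 1) {
    gaspi_notification_id_t got;
    gaspi_notification_t val;
    /* remote completion: the data is in the segment once this returns */
    gaspi_notify_waitsome(SEG, NOTIF, 1, &got, GASPI_BLOCK);
    gaspi_notify_reset(SEG, got, &val);
  }

  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}
```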
IEEE International Conference on High Performance Computing, Data, and Analytics | 2016
Stefano Markidis; Ivy Bo Peng; Jesper Larsson Träff; Antoine Rougier; Valeria Bartsch; Rui Machado; Mirko Rahn; Alistair Hart; Daniel J. Holmes; Mark Bull; Erwin Laure
EPiGRAM is a European Commission funded project to improve existing parallel programming models so that large-scale applications can run efficiently on exascale supercomputers. The EPiGRAM project focuses on the two currently dominant petascale programming models, message-passing and PGAS, and on the improvement of two of their associated programming systems, MPI and GASPI. In EPiGRAM, we work on two major aspects of programming systems. First, we improve the performance of communication operations by decreasing the memory consumption, improving collective operations and introducing emerging computing models. Second, we enhance the interoperability of message-passing and PGAS by integrating them into one PGAS-based MPI implementation, called EMPI4Re, implementing MPI endpoints and improving GASPI interoperability with MPI. The new EPiGRAM concepts are tested in two large-scale applications: iPIC3D, a Particle-in-Cell code for space physics simulations, and Nek5000, a Computational Fluid Dynamics code.
European Conference on Parallel Processing | 2013
Tiberiu Rotaru; Mirko Rahn; Franz-Josef Pfreundt
The computing power of modern high-performance systems cannot be fully exploited using traditional parallel programming models. At the same time, the growing demand for processing big data volumes requires better control of workflows, efficient storage management, and a fault-tolerant runtime system. To address these problems, we designed and developed GPI-Space, a complex but flexible software development and execution platform in which the data coordination of an application is decoupled from the programming of the algorithms. This allows domain users to focus solely on the implementation of their problem, while the fault-tolerant runtime framework automatically runs the application in parallel in complex environments. We discuss the advantages and disadvantages of our approach in comparison with the most popular MapReduce implementation, Hadoop. Tests performed on a multicore cluster with the wordcount use case showed that GPI-Space is almost three times faster than Hadoop when only the execution times are considered, and more than six times faster when the data loading time is also included.
International Conference on Parallel Processing | 2017
Dana Akhmetova; Luis Cebamanos; Roman Iakymchuk; Tiberiu Rotaru; Mirko Rahn; Stefano Markidis; Erwin Laure; Valeria Bartsch; Christian Simmendinger
One of the main hurdles for a broad adoption of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without major changes to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by presenting our study and preliminary performance results on interoperating GASPI and MPI in the performance-critical parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of the two APIs.
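The following sketch illustrates the general kind of coupling studied here, under the assumption of a GPI-2 build with MPI interoperability enabled: MPI is initialized first, GASPI joins the same job, and the two APIs are used side by side. It is not the code of Ludwig or iPIC3D, and the segment size is a placeholder.

```c
/* Interoperability sketch, assuming a GPI-2 build with MPI support:
 * the existing MPI application keeps its setup, while GASPI provides
 * one-sided communication for the performance-critical kernel. */
#include <mpi.h>
#include <GASPI.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);        /* existing MPI code path            */
  gaspi_proc_init(GASPI_BLOCK);  /* GASPI joins the same parallel job */

  int mpi_rank;
  gaspi_rank_t gaspi_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
  gaspi_proc_rank(&gaspi_rank);  /* ranks typically coincide in mixed mode */

  /* performance-critical part: one-sided GASPI communication on a
     segment, while the rest of the application keeps calling MPI */
  gaspi_segment_create(0, 1 << 20, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  /* ... GASPI writes/notifications here, MPI collectives elsewhere ... */
  MPI_Barrier(MPI_COMM_WORLD);

  gaspi_proc_term(GASPI_BLOCK);
  MPI_Finalize();
  return 0;
}
```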
European Conference on Parallel Processing | 2017
Valeria Bartsch; Rui Machado; Dirk Merten; Mirko Rahn; Franz-Josef Pfreundt
Fault tolerance becomes an important feature on large computer systems, where the mean time between failures decreases. Checkpointing is a method often used to provide resilience. We present an in-memory checkpointing library based on a PGAS API implemented with GASPI/GPI. It offers a substantial benefit when recovering from failures and leverages the existing fault tolerance features of GASPI/GPI. The overhead of the library is negligible when tested with a simple stencil code and a real-life seismic imaging method.
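The following is an illustrative sketch of the underlying "buddy" in-memory checkpointing idea on top of GASPI/GPI-2, not the API of the library presented in the paper; the checkpoint() helper, sizes and ids are hypothetical placeholders.

```c
/* Illustrative buddy-checkpointing sketch on GPI-2: each rank mirrors
 * its state into the segment of a partner rank with a one-sided write
 * plus notification, so a recovering process can fetch the last
 * snapshot from its buddy. Not the API of the library in the paper. */
#include <GASPI.h>
#include <string.h>

#define SEG        0
#define STATE_SIZE 4096          /* bytes of application state (placeholder) */
#define CKPT_OFF   STATE_SIZE    /* buddy copy lives behind the local state  */

static void checkpoint(gaspi_rank_t rank, gaspi_rank_t nprocs,
                       gaspi_notification_t version)
{
  gaspi_rank_t buddy = (rank + 1) % nprocs;   /* ring of partners */

  /* push the local state into the buddy's segment; the notification
     value carries the checkpoint version for validation on restart */
  gaspi_write_notify(SEG, 0, buddy, SEG, CKPT_OFF, STATE_SIZE,
                     rank /* notification id */, version,
                     0, GASPI_BLOCK);
  gaspi_wait(0, GASPI_BLOCK);                 /* local completion */
}

int main(int argc, char **argv)
{
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_rank_t rank, nprocs;
  gaspi_proc_rank(&rank);
  gaspi_proc_num(&nprocs);

  gaspi_segment_create(SEG, 2 * STATE_SIZE, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  gaspi_pointer_t seg_ptr;
  gaspi_segment_ptr(SEG, &seg_ptr);
  memset(seg_ptr, 0, STATE_SIZE);             /* stand-in for real state */

  for (gaspi_notification_t v = 1; v <= 3; ++v)
    checkpoint(rank, nprocs, v);              /* periodic snapshots */

  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}
```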
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
D. Gruenewald; N. Ettrich; Mirko Rahn; F.J. Pfreundt
We have identified challenges that upcoming hardware developments will impose on Reverse Time Migration (RTM) implementations. The increasing heterogeneity and complexity of target machines need to be transparently mapped into the software layer, an efficient fault tolerance mechanism needs to be provided, and I/O latencies need to be efficiently hidden. We have introduced a framework for RTM that is able to solve these problems. The framework is data-dependency driven on two granularity levels. On the coarse level, concurrent computation of shots is powered by GPI-Space, a parallel development and execution framework. GPI-Space boosts our RTM framework by introducing a fault-tolerant execution layer, efficient topology mapping and on-the-fly resource management. On the fine level, the computation of one shot is handled by domain decomposition in a task-based model. The tight coupling between neighbouring domains is efficiently relaxed by the one-sided asynchronous communication API GPI-2. Weak synchronization primitives allow for a fine-grained, application-specific breakup of data synchronization points with optimal overlap of communication and computation. Our framework has an inherent separation of parallelization and computation: domain experts concentrate on the implementation of domain knowledge, while computer scientists can simultaneously work on parallelization and optimization.
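The overlap of communication and computation described here can be sketched as follows, assuming an already initialized GASPI process and segment (as in the earlier GASPI sketch); compute_interior() and compute_halo() are hypothetical stand-ins for the stencil kernels, and the segment layout, sizes and notification ids are placeholders.

```c
/* Sketch of communication/computation overlap with GPI-2 notified
 * writes, for a 1D chain of neighbouring domains. Assumes gaspi_proc_init
 * and gaspi_segment_create(SEG, 4 * HALO_BYTES, ...) were already called. */
#include <GASPI.h>

#define SEG        0
#define HALO_BYTES 8192  /* placeholder halo size in bytes */
/* segment layout: [0,H) send left, [H,2H) send right,
                   [2H,3H) recv from left, [3H,4H) recv from right */

static void compute_interior(void) { /* work that needs no remote data     */ }
static void compute_halo(void)     { /* work that needs the received halos */ }

void timestep(gaspi_rank_t left, gaspi_rank_t right)
{
  /* 1. push halos to both neighbours, each write carries a notification */
  gaspi_write_notify(SEG, 0, left, SEG, 3 * HALO_BYTES, HALO_BYTES,
                     0 /* id */, 1 /* value */, 0 /* queue */, GASPI_BLOCK);
  gaspi_write_notify(SEG, HALO_BYTES, right, SEG, 2 * HALO_BYTES, HALO_BYTES,
                     1, 1, 0, GASPI_BLOCK);

  /* 2. overlap: interior work proceeds while the writes are in flight */
  compute_interior();

  /* 3. weak synchronization: wait only for the two incoming halos */
  for (int n = 0; n < 2; ++n) {
    gaspi_notification_id_t id;
    gaspi_notification_t    val;
    gaspi_notify_waitsome(SEG, 0, 2, &id, GASPI_BLOCK);
    gaspi_notify_reset(SEG, id, &val);
  }
  compute_halo();

  gaspi_wait(0, GASPI_BLOCK);  /* local completion before send buffers are reused */
}
```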
arXiv: Algebraic Geometry | 2018
Janko Boehm; Wolfram Decker; Anne Frühbis-Krüger; Franz-Josef Pfreundt; Mirko Rahn; Lukas Ristau
5th International Conference on Exascale Applications and Software | 2018
Tiberiu Rotaru; Bernd Lörwald; Nicholas Brown; Mirko Rahn; Olivier Aumage; Vicenç Beltran; Xavier Teruel; Jan Ciesko; Jakub Šístek