
Publication


Featured research published by Michael Schliephake.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

OpenACC acceleration of the Nek5000 spectral element code

Stefano Markidis; Jing Gong; Michael Schliephake; Erwin Laure; Alistair Hart; David Henty; Katherine Heisey; Paul Fischer

We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method and used for the simulation of incompressible flow. The original NekBone Fortran source code was used as the base and enhanced with OpenACC directives. Profiling NekBone provided an assessment of the suitability of the code for GPU systems and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and only a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve OpenACC implementation leads to little performance improvement: on a single node, we reached 20 Gflops with the naïve OpenACC implementation, compared with 16 Gflops for the version without OpenACC. An optimized NekBone version reaches 43 Gflops on a single node. In addition, we ported NekBone to parallel GPU systems and optimized it there, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.
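The single-node figures quoted in this abstract can be turned into speed-up factors with simple arithmetic. The sketch below is only an illustration: the function and constant names are hypothetical, and the parallel-efficiency helper uses one common definition (aggregate rate divided by the ideal aggregate rate), which the abstract itself does not spell out.

```python
def speedup(baseline_rate, improved_rate):
    """Ratio of an improved throughput to a baseline throughput."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return improved_rate / baseline_rate


def parallel_efficiency(aggregate_rate, single_unit_rate, units):
    """Achieved aggregate rate divided by the ideal rate (units * single_unit_rate).

    This is an assumed, commonly used definition; the abstract does not
    state how its 79.9% figure was computed.
    """
    if single_unit_rate <= 0 or units < 1:
        raise ValueError("single_unit_rate must be positive and units >= 1")
    return aggregate_rate / (units * single_unit_rate)


# Single-node throughput figures quoted in the abstract (Gflops).
CPU_ONLY_GFLOPS = 16.0
NAIVE_OPENACC_GFLOPS = 20.0
OPTIMIZED_OPENACC_GFLOPS = 43.0

naive_speedup = speedup(CPU_ONLY_GFLOPS, NAIVE_OPENACC_GFLOPS)
optimized_speedup = speedup(CPU_ONLY_GFLOPS, OPTIMIZED_OPENACC_GFLOPS)
optimized_vs_naive = speedup(NAIVE_OPENACC_GFLOPS, OPTIMIZED_OPENACC_GFLOPS)

if __name__ == "__main__":
    print(f"naive OpenACC vs. CPU-only:     {naive_speedup:.2f}x")
    print(f"optimized OpenACC vs. CPU-only: {optimized_speedup:.2f}x")
    print(f"optimized vs. naive OpenACC:    {optimized_vs_naive:.2f}x")
```

Under these definitions the naïve port is a 1.25x improvement over the CPU-only version, while the optimized version is roughly 2.7x.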


International Conference on Conceptual Structures | 2011

Design and Implementation of a Runtime System for Parallel Numerical Simulations on Large-Scale Clusters

Michael Schliephake; Xavier Aguilar; Erwin Laure

The execution of scientific codes will introduce a number of new challenges and intensify some old ones on new high-performance computing infrastructures. Petascale computers are large systems with complex designs using heterogeneous technologies that make the programming and porting of applications difficult, particularly if one wants to approach the peak performance of the system. In this paper we present the design and first prototype of a runtime system for parallel numerical simulations on large-scale systems. The proposed runtime system addresses the challenges of performance, scalability, and programmability of large-scale HPC systems. We also present initial results of our prototype implementation using a molecular dynamics application kernel.


Future Generation Computer Systems | 2013

Scalability analysis of Dalton, a molecular structure program

Xavier Aguilar; Michael Schliephake; Olav Vahtras; Judit Gimenez; Erwin Laure

Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck for larger process numbers. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and making it possible to run it on a much larger number of cores.
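Why a master/worker design can limit scaling is easiest to see with a toy throughput model. The sketch below is not taken from the paper: it assumes a single master that hands out tasks strictly one at a time, and the function name and all numbers are hypothetical.

```python
def master_worker_speedup(workers, task_time, dispatch_time):
    """Upper bound on speed-up over a single worker in a toy master/worker model.

    Assumptions (illustrative only, not Dalton's actual behaviour):
    - every task needs `dispatch_time` of serial work on the one master,
    - followed by `task_time` of work on whichever worker receives it.
    With one worker a task takes dispatch_time + task_time. With many workers
    the master can start at most 1 / dispatch_time tasks per unit time, so the
    speed-up saturates at (task_time + dispatch_time) / dispatch_time.
    """
    if workers < 1 or task_time <= 0 or dispatch_time <= 0:
        raise ValueError("workers must be >= 1 and both times must be positive")
    saturation = (task_time + dispatch_time) / dispatch_time
    return min(float(workers), saturation)


if __name__ == "__main__":
    # Hypothetical numbers: a task costs 63 time units, a dispatch costs 1.
    for workers in (16, 64, 1024):
        bound = master_worker_speedup(workers, task_time=63.0, dispatch_time=1.0)
        print(f"{workers:5d} workers -> speed-up bound {bound:.0f}")
```

In this model, adding workers beyond the saturation point (64 in the hypothetical example) yields no further speed-up, which is the kind of bottleneck the abstract refers to for larger process numbers.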


IEEE International Conference on eScience | 2011

Scaling Dalton, A Molecular Electronic Structure Program

Xavier Aguilar; Michael Schliephake; Olav Vahtras; Judit Gimenez; Erwin Laure

Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck for larger process numbers.


International Conference on Exascale Applications and Software, EASC 2014 | 2015

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Jing Gong; Stefano Markidis; Michael Schliephake; Erwin Laure; Dan S. Henningson; Philipp Schlatter; Adam Peplinski; Alistair Hart; Jens Doleschal; David Henty; Paul Fischer

Nek5000 is a computational fluid dynamics code based on the spectral element method and used for the simulation of incompressible flows. We follow up on an earlier study, which ported the simplified version of Nek5000 to a GPU-accelerated system, by presenting a hybrid CPU/GPU implementation of the full Nek5000 code using OpenACC. The matrix-matrix multiplication, the Nek5000 gather-scatter operator, and a preconditioned Conjugate Gradient solver have been implemented using OpenACC for multi-GPU systems. We report a speed-up of 1.3 on a single node of a Cray XK6 when using OpenACC directives in Nek5000. On 512 nodes of the Titan supercomputer, the speed-up approaches 1.4. A performance analysis of the Nek5000 code using the Score-P and Vampir performance monitoring tools shows that overlapping GPU kernels with host-accelerator memory transfers would considerably increase the performance of the OpenACC version of the Nek5000 code.


arXiv: Distributed, Parallel, and Cluster Computing | 2014

Performance Analysis of Irregular Collective Communication with the Crystal Router Algorithm

Michael Schliephake; Erwin Laure

In order to achieve exascale performance it is important to detect potential bottlenecks and identify strategies to overcome them. For this, both applications and system software must be analysed and potentially improved. The EU FP7 project Collaborative Research into Exascale Systemware, Tools & Applications (CRESTA) chose the approach of co-designing advanced simulation applications and system software as well as development tools. In this paper, we present the results of a co-design activity focused on the simulation code NEK5000 that aims at performance improvements of collective communication operations. We have analysed the algorithms that form the core of NEK5000’s communication module in order to assess their viability on recent computer architectures before starting to improve their performance. Our results show that the crystal router algorithm performs well in sparse, irregular collective operations for medium and large processor counts, but improvements will be needed for the even larger system sizes of the future. We sketch the needed improvements, which will also make the communication algorithms beneficial for other applications that need to implement latency-dominated communication schemes with short messages. The latency-optimised communication operations will also be used in a runtime system providing dynamic load balancing, which is under development within CRESTA.
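The routing idea behind a crystal router can be illustrated with a small single-process simulation. This is a minimal sketch of the classic hypercube formulation, assuming a power-of-two number of ranks; it is not the NEK5000 implementation, and the function name and data layout are hypothetical. In each of the log2(P) stages, every rank exchanges with one partner exactly those messages whose destination lies in the partner's half of the current sub-cube.

```python
def crystal_router(outboxes):
    """Simulate hypercube-style crystal routing in a single process.

    outboxes[rank] is a list of (dest, payload) pairs initially held by `rank`.
    Returns (inboxes, stages), where inboxes[rank] holds every message whose
    destination is `rank`, and stages is the number of pairwise exchange steps.
    Simplifying assumption: the number of ranks is a power of two.
    """
    n = len(outboxes)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    for msgs in outboxes:
        for dest, _ in msgs:
            if not 0 <= dest < n:
                raise ValueError(f"destination {dest} out of range")

    held = [list(msgs) for msgs in outboxes]
    stages = 0
    bit = 1
    while bit < n:
        next_held = [[] for _ in range(n)]
        for rank in range(n):
            partner = rank ^ bit  # neighbour across this hypercube dimension
            for dest, payload in held[rank]:
                if (dest ^ rank) & bit:
                    # Destination differs from this rank in the current bit:
                    # hand the message to the partner.
                    next_held[partner].append((dest, payload))
                else:
                    next_held[rank].append((dest, payload))
        held = next_held
        bit <<= 1
        stages += 1
    return held, stages


if __name__ == "__main__":
    # Sparse, irregular pattern on 8 ranks: most ranks send nothing.
    outboxes = [[(5, "a"), (3, "b")], [], [(0, "c")], [], [], [(5, "d")], [], [(1, "e")]]
    inboxes, stages = crystal_router(outboxes)
    print(stages, inboxes)
```

Each rank talks to only log2(P) partners regardless of how irregular the communication pattern is, which is why schemes of this kind are attractive for latency-dominated exchanges of short messages.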


International Conference on Exascale Applications and Software | 2014

Nek5000 with OpenACC

Jing Gong; Stefano Markidis; Michael Schliephake; Erwin Laure; Dan S. Henningson; Philipp Schlatter; Adam Peplinski; Alistair Hart; Jens Doleschal; David Henty; Paul Fischer



The Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) | 2015

NekBone with Optimized OpenACC directives

Jing Gong; Stefano Markidis; Michael Schliephake; Erwin Laure; Luis Cebamanos; Alistair Hart; Misun Min; Paul F. Fischer


Exascale Software and Applications Conference | 2013

Paving the path to exascale computing with CRESTA development environment

Stefano Markidis; Michael Schliephake; Xavier Aguilar; David Henty; Harvey Richardson; Alistair Hart; Alan Gray; David Lecomber; Tobias Hilbrich; Jens Doleschal; Erwin Laure


Exascale Applications and Software Conference; Edinburgh, Scotland, UK, 9-11 April 2013 | 2013

OpenACC Acceleration of Nek5000: a Spectral Element Code

Jing Gong; Alistair Hart; David Henty; Stefano Markidis; Michael Schliephake; Paul F. Fischer; Katherine Heisey

Collaboration

Top Co-Authors

Erwin Laure, Royal Institute of Technology
Stefano Markidis, Royal Institute of Technology
Jing Gong, Royal Institute of Technology
David Henty, University of Edinburgh
Xavier Aguilar, Royal Institute of Technology
Katherine Heisey, Washington University in St. Louis
Paul F. Fischer, Argonne National Laboratory
Jens Doleschal, Dresden University of Technology
Adam Peplinski, Royal Institute of Technology