Christoph Niethammer | Researchain

Publication

Featured research published by Christoph Niethammer.

Journal of Chemical Theory and Computation | 2014

ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems

Christoph Niethammer; Stefan Becker; Martin Bernreuther; Martin Buchholz; Wolfgang Eckhardt; Alexander Heinecke; Stephan Werth; Hans-Joachim Bungartz; Colin W. Glass; Hans Hasse; Jadran Vrabec; Martin Horsch

The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.

International Supercomputing Conference | 2013

591 TFLOPS Multi-trillion Particles Simulation on SuperMUC

Wolfgang Eckhardt; Alexander Heinecke; Reinhold Bader; Matthias Brehm; Nicolay Hammer; Herbert Huber; Hans-Georg Kleinhenz; Jadran Vrabec; Hans Hasse; Martin Horsch; Martin Bernreuther; Colin W. Glass; Christoph Niethammer; Arndt Bode; Hans-Joachim Bungartz

Anticipating large-scale molecular dynamics simulations (MD) in nano-fluidics, we conduct performance and scalability studies of an optimized version of the code ls1 mardyn. We present our implementation requiring only 32 Bytes per molecule, which allows us to run the, to our knowledge, largest MD simulation to date. Our optimizations tailored to the Intel Sandy Bridge processor are explained, including vectorization as well as shared-memory parallelization to make use of Hyperthreading. Finally we present results for weak and strong scaling experiments on up to 146016 Cores of SuperMUC at the Leibniz Supercomputing Centre, achieving a speed-up of 133k times which corresponds to an absolute performance of 591.2 TFLOPS.
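As a rough reading of the figures quoted in this abstract, the short calculation below derives the parallel efficiency and the sustained rate per core. It uses only the numbers stated above; the assumption that the speed-up is measured against a single core is ours and is not confirmed by the abstract, and "133k" is a rounded value.

```python
# Figures quoted in the abstract above.
cores = 146_016
speedup = 133_000   # "133k" in the abstract, so only approximate
tflops = 591.2

# Assumption (not stated in the abstract): the speed-up is
# measured relative to a run on a single core.
efficiency = speedup / cores

# Sustained rate per core, converted from TFLOPS to GFLOPS.
gflops_per_core = tflops * 1000 / cores

print(f"parallel efficiency ~ {efficiency:.1%}")
print(f"sustained rate ~ {gflops_per_core:.2f} GFLOPS per core")
```

Under that assumption the run sustains roughly 91% parallel efficiency, or about 4 GFLOPS per core.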

Journal of Computational Science | 2013

Programmability and portability for exascale: Top down programming methodology and tools with StarSs

Vladimir Subotic; Steffen Brinkmann; Vladimir Marjanovic; Rosa M. Badia; José Gracia; Christoph Niethammer; Eduard Ayguadé; Jesús Labarta; Mateo Valero

StarSs is a task-based programming model that allows sequential applications to be parallelized by annotating the code with compiler directives. The model further supports transparent execution of designated tasks on heterogeneous platforms, including clusters of GPUs. This paper focuses on the methodology and tools that complement the programming model, forming a consistent development environment with the objective of simplifying the life of application developers. The programming environment includes the tools TAREADOR and TEMANEJO, which have been designed specifically for StarSs. TAREADOR, a Valgrind-based tool, allows a top-down development approach by assisting the programmer in identifying tasks and their data dependencies across all concurrency levels of an application. TEMANEJO is a graphical debugger that supports the programmer by visualizing the task dependency tree and by allowing manipulation of task scheduling or dependencies. These tools are complemented by a set of performance analysis tools (Scalasca, Cube and Paraver) that enable fine-tuning of StarSs applications.

International Symposium on Parallel and Distributed Processing and Applications | 2012

Avoiding Serialization Effects in Data / Dependency Aware Task Parallel Algorithms for Spatial Decomposition

Christoph Niethammer; Colin W. Glass; José Gracia

Spatial decomposition is a popular basis for parallelising code. Cast in the frame of task parallelism, calculations on a spatial domain can be treated as a task. If neighbouring domains interact and share results, access to the specific data needs to be synchronized to avoid race conditions. This is the case for a variety of applications, like most molecular dynamics and many computational fluid dynamics codes. Here we present an unexpected problem which can occur in dependency-driven task parallelization models like StarSs: the tasks accessing a specific spatial domain are treated as interdependent, as dependencies are detected automatically via memory addresses. Thus, the order in which tasks are generated will have a severe impact on the dependency tree. In the worst case, a complete serialization is reached and no two tasks can be calculated in parallel. We present the problem in detail based on an example from molecular dynamics, and introduce a theoretical framework to calculate the degree of serialization. Furthermore, we present strategies to avoid this unnecessary problem. We recommend treating these strategies as best practice when using dependency-driven task parallel programming models like StarSs on such scenarios.
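The effect of task generation order described in this abstract can be illustrated with a toy model. The sketch below is not StarSs code and does not reproduce the paper's theoretical framework: the function names, the one-dimensional chain of cells, and the rule that every task reads and writes both cells it names are assumptions made purely for illustration. A task is made dependent on the most recent earlier task that touched any of its cells, which mimics automatic dependency detection via memory addresses.

```python
def critical_path_length(tasks):
    """Longest dependency chain, assuming each task reads and writes
    every cell it names (hypothetical model, not the StarSs runtime)."""
    last_level = {}  # cell -> level of the most recent task that touched it
    depth = 0
    for cells in tasks:
        # A task must run after the latest task on each of its cells.
        level = 1 + max((last_level.get(c, 0) for c in cells), default=0)
        for c in cells:
            last_level[c] = level
        depth = max(depth, level)
    return depth


def pair_tasks(n_cells):
    """One task per pair of neighbouring cells in a 1D decomposition."""
    return [(i, i + 1) for i in range(n_cells - 1)]


n_cells = 8
natural = pair_tasks(n_cells)              # (0,1), (1,2), (2,3), ...
staggered = natural[0::2] + natural[1::2]  # (0,1), (2,3), ... then (1,2), (3,4), ...

print(critical_path_length(natural))    # 7: every task waits for the previous one
print(critical_path_length(staggered))  # 2: two waves of independent tasks
```

In this model, generating the seven tasks in natural order produces a single chain, so no two tasks can run in parallel, while generating the same tasks in a staggered order leaves only two dependent waves.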

International Conference on Parallel Processing | 2015

A Bandwidth-Saving Optimization for MPI Broadcast Collective Operation

Huan Zhou; Vladimir Marjanović; Christoph Niethammer; José Gracia

The efficiency and scalability of MPI collective operations, in particular the broadcast operation, play an integral part in high performance computing applications. MPICH, one of the widely used contemporary MPI software stacks, implements the broadcast operation on top of point-to-point operations. Depending on parameters such as message size and process count, the library chooses between different algorithms, for instance binomial dissemination, recursive-doubling exchange or ring all-to-all broadcast (all-gather). However, the existing broadcast design in the latest release of MPICH does not provide good performance for large messages (lmsg) or medium messages with non-power-of-two process counts (mmsg-npof2) because of the suboptimal inner ring allgather algorithm. In this paper, based on the native broadcast design in MPICH, we propose a tuned broadcast approach with bandwidth saving in mind, catering to the cases of lmsg and mmsg-npof2. Several comparisons of the native and tuned broadcast designs are made for different data sizes and program sizes on a Cray XC40 cluster. The results show that the tuned broadcast design improves performance by 2% to 54% for lmsg and mmsg-npof2 in user-level testing.
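To make the ring all-gather step mentioned in this abstract concrete, the following pure-Python sketch simulates a broadcast built from a scatter followed by a ring all-gather. It is an illustration under simplifying assumptions, not MPICH code and not the tuned design proposed in the paper: the function names are hypothetical, ranks are plain list indices, and the scatter is modelled as a direct split of the message rather than a tree.

```python
def split(message, p):
    """Cut the message into p chunks (later chunks may be short or empty)."""
    size = -(-len(message) // p)  # ceiling division
    return [message[i * size:(i + 1) * size] for i in range(p)]


def ring_allgather(chunks):
    """Rank r starts with chunk r; after p - 1 steps every rank holds all chunks."""
    p = len(chunks)
    buffers = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = chunks[r]
    for step in range(p - 1):
        # Each rank forwards the chunk it obtained in the previous step
        # (its own chunk in step 0) to its right-hand neighbour.
        sends = [((r - step) % p, buffers[r][(r - step) % p]) for r in range(p)]
        for r, (idx, data) in enumerate(sends):
            buffers[(r + 1) % p][idx] = data
    return buffers


def broadcast(message, p):
    """Simulated scatter + ring all-gather; returns the message seen by each rank."""
    return [b"".join(buf) for buf in ring_allgather(split(message, p))]


message = bytes(range(10))
# Includes non-power-of-two process counts such as 3 and 5.
results = {p: broadcast(message, p) for p in (1, 3, 5, 8)}
print(all(out == message for outs in results.values() for out in outs))
```

In this simulation the ring phase takes p - 1 steps, with every rank sending one chunk per step.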

Parallel Tools Workshop | 2013

Task Debugging with TEMANEJO

Steffen Brinkmann; José Gracia; Christoph Niethammer

In recent years memory layouts have become more and more complex, and bandwidth has turned out to be the crucial performance parameter. This is reflected in new programming paradigms which focus on data flow rather than instruction sequence. A very successful approach is StarSs, where the parallel programme consists of small computing units called tasks and dependencies between these tasks which are defined by the programmer. At runtime a dependency graph is created which determines the parallel or sequential execution of the tasks. When it comes to debugging StarSs applications, traditional debuggers such as gdb don’t provide enough information and control to uncover shortcomings of the program. We present a new type of debugger which acts on the task level, giving the user access to the dependency graph. Information is extracted from the running application with the lightweight library Ayudame and passed to the remote client Temanejo, which visualises the dependency graph and passes user requests, such as blocking or prioritising a task, to the application.

Information Technology | 2013

Computational Molecular Engineering as an Emerging Technology in Process Engineering

Martin Horsch; Christoph Niethammer; Jadran Vrabec; Hans Hasse

The present level of development of molecular force field methods is assessed from the point of view of simulation-based engineering, outlining the immediate perspective for further development and highlighting the newly emerging discipline of “Computational Molecular Engineering (CME)” which makes basic research in soft matter physics fruitful for industrial applications. Within the coming decade, major breakthroughs can be reached if a research focus is placed on processes at interfaces, combining aspects where an increase in the accessible length and time scales due to massively parallel high-performance computing will lead to particularly significant improvements.

International Symposium on Parallel and Distributed Processing and Applications | 2012

Hybrid MPI/StarSs -- A Case Study

José Gracia; Christoph Niethammer; Manuel Hasert; Steffen Brinkmann; Rainer Keller; Colin W. Glass

Hybrid parallel programming models combining distributed and shared memory paradigms are well established in high-performance computing. The classical prototype of hybrid programming in HPC is MPI/OpenMP, but many other combinations are being investigated. Recently, the data-dependency driven, task parallel model for shared memory parallelisation named StarSs has been suggested for usage in combination with MPI. In this paper we apply hybrid MPI/StarSs to a Lattice-Boltzmann code. In particular, we present the hybrid programming model, the benefits we expect, the challenges in porting, and finally a comparison of the performance of MPI/StarSs hybrid, MPI/OpenMP hybrid and the original MPI-only versions of the same code.

Parallel Tools Workshop | 2012

Temanejo: Debugging of Thread-Based Task-Parallel Programs in StarSS

Rainer Keller; Steffen Brinkmann; José Gracia; Christoph Niethammer

To make use of manycore processors and even accelerators, several parallel programming paradigms exist, such as OpenMP, CAPS HMPP and the StarSs programming model. All of these programming models provide the means for programmers to express parallelism in the source code by identifying tasks and, for all but OpenMP, the dependencies between them, allowing the compiler and the runtime to schedule tasks onto multiple concurrently executing entities, such as threads in a many-core system. While the programmer may have a good overview of which parts of the code may be run independently as separate tasks at a fine-grained level, the overall execution behavior may not be obvious at first. This paper describes the usability features of the newly developed Temanejo debugger.

arXiv: Computational Physics | 2011

Static and Dynamic Properties of Curved Vapour-Liquid Interfaces by Massively Parallel Molecular Dynamics Simulation

Martin Thomas Horsch; Svetlana Miroshnichenko; Jadran Vrabec; Colin W. Glass; Christoph Niethammer; Martin Bernreuther; Erich A. Müller; George Jackson

Curved fluid interfaces are investigated on the nanometre length scale by molecular dynamics simulation. Thereby, droplets surrounded by a metastable vapour phase are stabilized in the canonical ensemble. Analogous simulations are conducted for cylindrical menisci separating vapour and liquid phases under confinement in planar nanopores. Regarding the emergence of nanodroplets during nucleation, a non-equilibrium phenomenon, both the non-steady dynamics of condensation processes and stationary quantities related to supersaturated vapours are considered. Results for the truncated and shifted Lennard-Jones fluid and for mixtures of quadrupolar fluids confirm the applicability of the capillarity approximation and the classical nucleation theory.
