Publication


Featured research published by Matthias Brehm.


International Supercomputing Conference | 2013

591 TFLOPS Multi-trillion Particles Simulation on SuperMUC

Wolfgang Eckhardt; Alexander Heinecke; Reinhold Bader; Matthias Brehm; Nicolay Hammer; Herbert Huber; Hans-Georg Kleinhenz; Jadran Vrabec; Hans Hasse; Martin Horsch; Martin Bernreuther; Colin W. Glass; Christoph Niethammer; Arndt Bode; Hans-Joachim Bungartz

Anticipating large-scale molecular dynamics (MD) simulations in nano-fluidics, we conduct performance and scalability studies of an optimized version of the code ls1 mardyn. We present our implementation requiring only 32 bytes per molecule, which allows us to run what is, to our knowledge, the largest MD simulation to date. Our optimizations tailored to the Intel Sandy Bridge processor are explained, including vectorization as well as shared-memory parallelization to make use of Hyper-Threading. Finally, we present results for weak and strong scaling experiments on up to 146,016 cores of SuperMUC at the Leibniz Supercomputing Centre, achieving a speed-up of 133k, which corresponds to an absolute performance of 591.2 TFLOPS.
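The abstract only states the 32-byte total per molecule, not the field split; a minimal sketch of one layout that fits that budget, assuming single-precision positions and velocities plus a 64-bit identifier (an illustrative assumption, not the actual ls1 mardyn record), looks like this:

    import numpy as np

    # Hypothetical 32-byte molecule record. The paper states only the 32-byte
    # total; this particular field split is an illustrative assumption.
    molecule_t = np.dtype([
        ("pos", np.float32, 3),   # 12 bytes: x, y, z position
        ("vel", np.float32, 3),   # 12 bytes: x, y, z velocity
        ("id",  np.uint64),       #  8 bytes: global molecule identifier
    ])

    assert molecule_t.itemsize == 32  # matches the 32 bytes per molecule

    # Memory for one trillion molecules at this layout: 32 TB in total,
    # spread across the aggregate memory of the SuperMUC nodes.
    print(molecule_t.itemsize * 10**12 / 2**40, "TiB per trillion molecules")

At this budget a multi-trillion-particle run only fits because the per-molecule state is kept this compact; doubling the record size would double the aggregate memory demand.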


Information and Communication on Technology for the Fight against Global Warming | 2011

Principles of energy efficiency in high performance computing

Axel Auweter; Arndt Bode; Matthias Brehm; Herbert Huber; Dieter Kranzlmüller

High Performance Computing (HPC) is a key technology for modern researchers, enabling scientific advances through simulation where experiments are either technically impossible or financially infeasible to conduct and theory is not applicable. However, the high degree of computational power available from today's supercomputers comes at the cost of large quantities of electrical energy being consumed. This paper aims to give an overview of the current state of the art and of future techniques to reduce the overall power consumption of HPC systems and sites. We believe that a holistic approach to monitoring and operation at all levels of a supercomputing site is necessary. Thus, we concentrate not only on improving the energy efficiency of the compute hardware itself, but also on site infrastructure components for power distribution and cooling. Since most of the energy consumed by supercomputers is converted into heat, we also outline possible technologies to re-use waste heat in order to improve the Power Usage Effectiveness (PUE) of the entire supercomputing site.
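PUE is the ratio of total facility energy to the energy delivered to the IT equipment, so 1.0 is the ideal and lower is better; a worked example with invented numbers:

    # PUE = total facility energy / IT equipment energy (illustrative numbers).
    it_energy_kwh = 10_000.0          # energy consumed by the compute hardware
    facility_energy_kwh = 14_000.0    # IT energy plus cooling, power distribution, ...

    pue = facility_energy_kwh / it_energy_kwh
    print(f"PUE = {pue:.2f}")  # 1.40: 40% overhead on top of every compute kWh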


European Conference on Parallel Processing | 2000

Pseudovectorization, SMP, and Message Passing on the Hitachi SR8000-F1

Matthias Brehm; Reinhold Bader; Helmut Heller; Ralf Ebner

In the second quarter of 2000, the Leibniz-Rechenzentrum in Munich started operating a 112-node Hitachi SR8000-F1 with a peak performance of 1.3 Teraflops, at the time the fastest computer in Europe. In order to make use of the full memory bandwidth, and hence to obtain a significant fraction of the peak performance for memory-intensive applications, the compilers offer preload and prefetch optimization strategies to pipeline load/store operations, as well as automatic parallelization across the 8 processors contained in every node. The nodes are connected by a conflict-free crossbar, enabling efficient communication via standard message-passing interfaces. An overview of the innovative architectural concepts is given. We demonstrate to what extent the compiler's ability to automatically pseudovectorize and parallelize typical application code suffices to produce well-performing code.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

DVFS automatic tuning plugin for energy related tuning objectives

Carla Guillén; Carmen B. Navarrete; David Brayford; Wolfram Hesse; Matthias Brehm

Energy consumption will become one of the dominant cost factors governing the next generation of large HPC centers. In this paper we present the Dynamic Voltage Frequency Scaling (DVFS) plugin, which automatically tunes several energy-related objectives at the region level of HPC applications. The plugin works with the Periscope Tuning Framework, which provides an automatic tuning framework including analysis, experiment creation, and evaluation. The tuning actions are changes in CPU frequency via DVFS. The tuning objectives include energy consumption, total cost of ownership, energy delay product, and power capping. The tuning is based on a model that relies on performance data and predicts energy consumption, time, and power consumption at different CPU frequencies.
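Conceptually, each tuning objective reduces to picking, per region, the frequency whose predicted metrics are best. A minimal sketch of that selection step for the energy delay product, assuming the plugin's model already supplies predicted time and power per candidate frequency (the numbers below are invented):

    # Pick the CPU frequency minimising the energy delay product (EDP) for a
    # code region. Per-frequency predictions are assumed to come from the
    # plugin's model; the values here are illustrative only.
    predictions = {
        # freq_GHz: (predicted_time_s, predicted_power_W)
        1.2: (12.0, 180.0),
        1.8: ( 9.0, 230.0),
        2.4: ( 7.5, 310.0),
    }

    def edp(time_s, power_w):
        energy_j = power_w * time_s          # E = P * t
        return energy_j * time_s             # EDP = E * t

    best_freq = min(predictions, key=lambda f: edp(*predictions[f]))
    print(f"best frequency for EDP: {best_freq} GHz")

Swapping the objective function (pure energy, TCO, or a power cap constraint) changes which frequency wins, which is why the plugin treats the objective as configurable.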


Archive | 2011

A New Scalable Monitoring Tool Using Performance Properties of HPC Systems

Carla Guillén; Wolfram Hesse; Matthias Brehm

We present a monitoring and analysis tool prototype for system-wide monitoring of high performance computers. The tool uses formal specifications of properties which are based on hardware counters. These properties evaluate performance at different granularities, namely at the core, application, and partition level. The information obtained is aimed at detecting single-node performance as well as parallel execution performance. The goal is to identify performance bottlenecks in running applications as well as general system behaviour. The scalability of our prototype on highly parallel machines is achieved through a distributed software architecture: we use an analysis agent at each partition, and these agents communicate with a high-level agent using a communication protocol based on TCP/IP. The main task of the high-level agent is the synchronisation of the remaining agents. Moreover, the analysis agents can use OpenMP within each partition to parallelise their monitoring tasks. We tackle the storage of large amounts of information through data reduction: only the properties that detect a bottleneck are stored, so we do not compromise the quality of the needed monitoring information.
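A property in this sense is a predicate over hardware-counter values together with a severity; a minimal sketch of one such property (counter names and the threshold are invented for illustration, not taken from the tool):

    # Sketch of a formally specified performance property: a predicate over
    # hardware counter readings. Names and threshold are illustrative.
    def low_flops_property(counters, peak_flops_per_cycle=8.0, threshold=0.1):
        """Fires when the achieved FLOP rate falls below 10% of peak."""
        flops_per_cycle = counters["FP_OPS"] / counters["CYCLES"]
        severity = 1.0 - flops_per_cycle / peak_flops_per_cycle
        fires = flops_per_cycle < threshold * peak_flops_per_cycle
        return fires, severity

    # Only properties that fire are stored, which is how the tool keeps the
    # data volume small without losing the interesting information.
    fires, severity = low_flops_property({"FP_OPS": 4.0e9, "CYCLES": 20.0e9})
    print(fires, round(severity, 3))  # True 0.975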


Parallel Computing | 2016

Extreme Scale-out SuperMUC Phase 2 - Lessons Learned

Nicolay Hammer; Ferdinand Jamitzky; Helmut Satzger; Momme Allalen; Alexander Block; Anupam Karmakar; Matthias Brehm; Reinhold Bader; Luigi Iapichino; Antonio Ragagnin; Vasilios Karakasis; Dieter Kranzlmüller; Arndt Bode; Herbert Huber; Martin Kühn; Rui Machado; Daniel Grünewald; P. V. F. Edelmann; F. K. Röpke; Markus Wittmann; Thomas Zeiser; Gerhard Wellein; Gerald Mathias; Magnus Schwörer; Konstantin Lorenzen; Christoph Federrath; Ralf S. Klessen; Karl-Ulrich Bamberg; H. Ruhl; Florian Schornbaum

In spring 2015, the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, LRZ) installed its new petascale system, SuperMUC Phase 2. Selected users were invited to a 28-day extreme scale-out block operation during which they were allowed to use the full system for their applications. The following projects participated in the extreme scale-out workshop: BQCD (Quantum Physics), SeisSol (Geophysics, Seismics), GPI-2/GASPI (Toolkit for HPC), Seven-League Hydro (Astrophysics), ILBDC (Lattice Boltzmann CFD), Iphigenie (Molecular Dynamics), FLASH (Astrophysics), GADGET (Cosmological Dynamics), PSC (Plasma Physics), waLBerla (Lattice Boltzmann CFD), Musubi (Lattice Boltzmann CFD), Vertex3D (Stellar Astrophysics), CIAO (Combustion CFD), and LS1-Mardyn (Material Science). The projects were allowed to use the machine exclusively during the 28-day period, which corresponds to a total of 63.4 million core-hours, of which 43.8 million core-hours were used by the applications, resulting in a utilization of 69%. The top three users consumed 15.2, 6.4, and 4.7 million core-hours, respectively.
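The utilization figure follows directly from the quoted core-hours:

    # Utilization of the 28-day block operation, from the figures in the text.
    total_core_hours = 63.4e6
    used_core_hours = 43.8e6
    print(f"utilization = {used_core_hours / total_core_hours:.0%}")  # 69%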


European Conference on Parallel Processing | 2014

The PerSyst Monitoring Tool

Carla Guillén; Wolfram Hesse; Matthias Brehm

This paper presents a system-wide monitoring and analysis tool for high performance computers with several features aimed at minimizing the transport of performance data along a network of agents. The aim of the tool is to perform a preliminary detection of performance bottlenecks in user applications running on HPC systems with a negligible impact on production runs. Continuous system-wide monitoring can produce large volumes of data if the data must be stored permanently to be available for queries; at the system monitoring level, the monitoring data has to be stored synchronously. We retain the descriptive quality of the data by using quantiles: an aggregation with respect to the number of cores used by the application at every measuring interval. The optimization of the transport route for the performance data enables us to calculate quantiles exactly, as opposed to merely estimating them.
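A minimal sketch of the per-interval aggregation, assuming the raw per-core values have already been gathered along the agent tree (numpy's percentile stands in here for whatever the tool itself implements, and the data is synthetic):

    import numpy as np

    # Aggregate one measuring interval of per-core values into quantiles.
    rng = np.random.default_rng(0)
    per_core_bandwidth = rng.normal(loc=40.0, scale=8.0, size=512)  # GB/s per core

    deciles = np.percentile(per_core_bandwidth, np.arange(10, 100, 10))
    # Nine deciles now summarise 512 cores for this interval: the quantiles
    # keep the shape of the distribution while cutting the stored volume
    # by more than 50x.
    print(np.round(deciles, 1))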


Computing | 2017

Energy model derivation for the DVFS automatic tuning plugin: tuning energy and power related tuning objectives

Carla Guillén; Carmen B. Navarrete; David Brayford; Wolfram Hesse; Matthias Brehm

Energy consumption will become one of the dominant cost factors governing the next generation of large HPC centers. In this paper we present the Dynamic Voltage Frequency Scaling (DVFS) plugin, which automatically tunes several energy-related objectives at the region level of HPC applications. The plugin works with the Periscope Tuning Framework, which provides an automatic tuning framework including analysis, experiment creation, and evaluation. The tuning actions are changes in CPU frequency via DVFS. The tuning objectives include energy consumption, total cost of ownership, energy delay product, and power capping. The tuning is based on a model that relies on performance data and predicts energy consumption, time, and power consumption at different CPU frequencies. The derivation of the models for the DVFS plugin using principal component analysis is included.
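A minimal sketch of the PCA step on synthetic performance data, assuming the measured counters per region form the feature matrix (plain numpy; this is not the plugin's actual pipeline):

    import numpy as np

    # PCA over per-region performance measurements (synthetic stand-in data):
    # rows = observed regions, columns = performance counters / metrics.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 12))

    Xc = X - X.mean(axis=0)                 # center each metric
    cov = np.cov(Xc, rowvar=False)          # 12x12 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

    # Keep the components explaining ~95% of the variance; the energy/power
    # model is then fit on these few components instead of all raw counters.
    order = np.argsort(eigvals)[::-1]
    explained = np.cumsum(eigvals[order]) / eigvals.sum()
    k = int(np.searchsorted(explained, 0.95) + 1)
    components = eigvecs[:, order[:k]]
    X_reduced = Xc @ components
    print(f"{k} components retained, reduced shape {X_reduced.shape}")

Fitting the frequency/energy model on the reduced components rather than the raw counters keeps the model small and avoids fitting to strongly correlated inputs.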


Archive | 2003

TeraFlops Computing with the Hitachi SR8000-F1: From Vision to Reality

Reinhold Bader; Matthias Brehm; Ralf Ebner; Helmut Heller; Ludger Palm; Frank Wagner

These proceedings present results from scientific investigations obtained through calculations performed on the Hitachi SR8000-F1 supercomputer at the Leibniz Computing Center. The following contribution gives an overview of the innovative Hitachi architecture and the configuration of the machine at LRZ.


International Conference on Supercomputing | 2014

A Case Study of Energy Aware Scheduling on SuperMUC

Axel Auweter; Arndt Bode; Matthias Brehm; Luigi Brochard; Nicolay Hammer; Herbert Huber; Raj Panda; Francois Thomas; Torsten Wilde

Collaboration


Dive into Matthias Brehm's collaborations.

Top Co-Authors

Herbert Huber
Bavarian Academy of Sciences and Humanities

Nicolay Hammer
Bavarian Academy of Sciences and Humanities

Carmen B. Navarrete
Autonomous University of Madrid

Axel Auweter
Bavarian Academy of Sciences and Humanities

Florian Schornbaum
University of Erlangen-Nuremberg

Gerhard Wellein
University of Erlangen-Nuremberg