Markus Stürmer | Researchain

Publication

Featured research published by Markus Stürmer.

Concurrency and Computation: Practice and Experience | 2014

Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters

Björn Gmeiner; Harald Köstler; Markus Stürmer; Ulrich Rüde

This article studies the performance and scalability of a geometric multigrid solver implemented within the hierarchical hybrid grids (HHG) software package on current high performance computing clusters with up to nearly 300,000 cores. HHG is based on unstructured tetrahedral finite elements that are regularly refined to obtain a block-structured computational grid. One challenge is the parallel mesh generation from an unstructured input grid that roughly approximates a human head within a 3D magnetic resonance imaging data set. This grid is then regularly refined to create the HHG grid hierarchy. The test platforms are a BlueGene/P cluster at the Jülich Supercomputing Centre and an Intel Xeon 5650 cluster at the local computing center in Erlangen. To assess the quality of our implementation and to predict the runtime of the multigrid solver, a detailed performance and communication model is developed and used to evaluate the measured single-node performance as well as weak and strong scaling experiments on both clusters. Thus, for a given problem size, one can predict the number of compute nodes that minimizes the overall runtime of the multigrid solver. Overall, HHG scales up to the full size of both machines; the largest linear system solved on Jugene had more than one trillion unknowns. Copyright © 2012 John Wiley & Sons, Ltd.
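To illustrate how a runtime model can be used to choose a node count, here is a toy model. It is not the model from the paper: it simply assumes that one V-cycle costs a compute term proportional to the local unknowns, a halo-exchange term proportional to the local subdomain surface, and a latency term growing with log2 of the node count. All constants and function names are hypothetical.

```python
import math

def cycle_time(n_unknowns: float, nodes: int,
               t_comp: float = 1e-8,   # hypothetical seconds per local unknown
               t_halo: float = 1e-7,   # hypothetical seconds per halo unknown
               t_lat: float = 1e-3) -> float:  # hypothetical latency per tree level
    """Toy runtime model for one multigrid V-cycle on `nodes` compute nodes."""
    local = n_unknowns / nodes
    compute = t_comp * local                    # work on the local subdomain
    halo = t_halo * local ** (2.0 / 3.0)        # surface-to-volume ghost-layer exchange
    latency = t_lat * math.log2(nodes) if nodes > 1 else 0.0  # coarse levels, reductions
    return compute + halo + latency

def best_node_count(n_unknowns: float, max_nodes: int) -> int:
    """Return the power-of-two node count that minimizes the modelled runtime."""
    candidates = [2 ** k for k in range(int(math.log2(max_nodes)) + 1)]
    return min(candidates, key=lambda p: cycle_time(n_unknowns, p))
```

Under these assumed constants, a smaller problem reaches its runtime minimum at fewer nodes than a larger one, because the latency term dominates sooner; this is the strong-scaling limit such a model is meant to expose.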

IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Direct Numerical Simulation of Particulate Flows on 294912 Processor Cores

Jan Götz; Klaus Iglberger; Markus Stürmer; Ulrich Rüde

This paper describes computational models for particle-laden flows based on a fully resolved fluid-structure interaction. The flow simulation uses the lattice Boltzmann method, while the particles are handled by a rigid body dynamics algorithm. The particles can have individual non-spherical shapes, which requires non-trivial collision detection and special contact models. An explicit coupling algorithm transfers momentum from the fluid to the particles in each time step, while the particles impose moving boundaries on the flow solver. All algorithms and their interaction are fully parallelized. Scaling experiments and a careful performance analysis are presented for up to 294,912 processor cores of the Blue Gene at the Jülich Supercomputing Centre. The largest simulations involve 264 million particles coupled to a fluid that is simultaneously resolved by 150 billion lattice Boltzmann cells. The paper concludes with a computational experiment on the segregation of suspensions of particles of different densities, as an example of the many industrial applications enabled by this new methodology.

Numerical Linear Algebra With Applications | 2008

A fast full multigrid solver for applications in image processing

Markus Stürmer; Harald Köstler; Ulrich Rüde

We present a fast, cell-centered multigrid solver and apply it to image denoising and non-rigid diffusion-based image registration. Both applications require real-time performance in 3D, and the multigrid method has to be compared with solvers based on the fast Fourier transform (FFT). For image denoising, optimizing the underlying variational approach leads directly to one time step of a parabolic linear heat equation; for image registration, it yields a non-linear second-order system of partial differential equations. This system is solved by a fixed-point iteration using a semi-implicit time discretization, where each time step again results in an elliptic linear heat equation. The multigrid implementation comes close to real-time performance for medium-sized medical images in 3D in both applications and is compared with an FFT-based solver built on available libraries.
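The core of any such solver is the V-cycle: smooth, restrict the residual, solve the coarse error equation recursively, interpolate the correction, and smooth again. The paper's solver is cell-centered and 3D; the minimal sketch below instead assumes the simpler vertex-centered 1D model problem -u'' = f with zero Dirichlet boundaries, weighted Jacobi smoothing, full-weighting restriction and linear interpolation. All function names are illustrative.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u for the 3-point stencil of -u'' (interior points only)."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; boundary values are left untouched."""
    for _ in range(sweeps):
        u_new = u.copy()
        u_new[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = u_new
    return u

def restrict(r):
    """Full weighting: fine grid with 2m+1 points -> coarse grid with m+1 points."""
    rc = np.zeros((len(r) - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolong(ec, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = np.zeros(n_fine)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h, pre=2, post=2):
    """One recursive V(pre, post) cycle."""
    n = len(u)
    if n <= 3:  # a single interior unknown: solve it exactly
        u = u.copy()
        u[1] = 0.5 * (u[0] + u[2] + h**2 * f[1])
        return u
    u = smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h, pre, post)  # coarse error equation
    u = u + prolong(ec, n)
    return smooth(u, f, h, post)
```

For example, with `n = 2**7 + 1` grid points and `f = np.pi**2 * np.sin(np.pi * x)`, ten calls to `v_cycle` reduce the residual by many orders of magnitude, and the result approximates `sin(pi x)` to discretization accuracy.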

Computers & Mathematics With Applications | 2009

Fluid flow simulation on the Cell Broadband Engine using the lattice Boltzmann method

Markus Stürmer; Jan Götz; Gregor Richter; Arnd Dörfler; Ulrich Rüde

In this paper we present a fast lattice Boltzmann fluid solver that has been performance-optimized and tailored for the Cell Broadband Engine Architecture. Many design decisions were motivated by the long-range objective of simulating blood flow in human blood vessels, especially in aneurysms, but have proven to be much more generally applicable. After explaining the implementation details and how they were influenced by the target platform, we evaluate the performance and memory requirements of this prototype solver.
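For readers unfamiliar with the method, a lattice Boltzmann time step consists of a purely local collision followed by streaming of the particle distributions to neighboring cells. The sketch below assumes the common 2D D2Q9 lattice with the BGK collision operator on a fully periodic domain, written in NumPy; the solver in the paper is 3D and hand-optimized for the Cell processor, which this sketch does not attempt to reproduce.

```python
import numpy as np

# D2Q9 lattice: discrete velocities (cx, cy) and their weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions, shape (9, ny, nx)."""
    cu = 3.0 * (C[:, 0, None, None] * ux + C[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return W[:, None, None] * rho * (1.0 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, tau):
    """One BGK collide-and-stream step with periodic boundaries."""
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    f = f - (f - equilibrium(rho, ux, uy)) / tau        # local collision
    for i, (cx, cy) in enumerate(C):                     # streaming to neighbors
        f[i] = np.roll(np.roll(f[i], cx, axis=1), cy, axis=0)
    return f
```

Because the collision touches only one cell and streaming is a fixed shift, both steps are bandwidth-bound and trivially parallel, which is why memory layout dominates performance tuning for this method.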

Archive | 2009

Challenges and Potentials of Emerging Multicore Architectures

Markus Stürmer; Gerhard Wellein; Georg Hager; Harald Köstler; Ulrich Rüde

We present performance results on two current multicore architectures: an STI (Sony, Toshiba, and IBM) Cell processor, as found in the new PlayStation™ 3, and a Sun UltraSPARC T2 (“Niagara 2”) machine. On the Niagara 2, we use the standard STREAM benchmark and a shared-memory parallel lattice Boltzmann code to analyze typical performance patterns that emerge from the peculiar way the memory controllers are activated on this chip. On the Cell processor, we measure the memory bandwidth and run performance tests for lattice Boltzmann method (LBM) simulations. Additionally, we show results on the Cell processor for an image processing application that requires solving nonlinear anisotropic PDEs.
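The STREAM "triad" kernel, a = b + s·c, is the usual way to measure sustainable memory bandwidth. The following is only a rough NumPy approximation of it; the function name and defaults are hypothetical, and the real benchmark is a compiled C/Fortran loop.

```python
import time
import numpy as np

def triad_bandwidth(n: int = 10_000_000, repeats: int = 5, scalar: float = 3.0) -> float:
    """Best-of-`repeats` STREAM-style triad a = b + scalar * c, reported in GB/s.

    Counts 3 * 8 bytes per element (read b, read c, write a), as STREAM does.
    NumPy evaluates the triad in two passes, so the true memory traffic is
    higher than counted and the reported figure underestimates the bandwidth.
    """
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty(n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)  # a = scalar * c
        np.add(a, b, out=a)            # a = b + scalar * c
        best = min(best, time.perf_counter() - t0)
    return 3 * 8 * n / best / 1e9
```

Using arrays much larger than the last-level cache is essential; otherwise the measurement reports cache rather than memory bandwidth.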

Computational Science and Engineering | 2008

Optimising a 3D multigrid algorithm for the IA-64 architecture

Markus Stürmer; Jan Treibig; Ulrich Rüde

Multigrid methods are amongst the most efficient algorithms for numerically solving partial differential equations. However, standard implementations usually cannot exploit the potential of modern processors. The IA-64 architecture shifts most of the complexity to the software side to provide a highly superscalar design with large caches, which gives unique control over the actual execution. Using a simple 3D multigrid solver and the Itanium 2 processor as an example, we show how known performance optimisation techniques can be successfully combined. While the implementation details are specific, the optimisation concept should be applicable to a wide range of numerical algorithms and CPUs.
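One ingredient that commonly makes multigrid smoothers amenable to optimisation is red-black ordering: all points of one color depend only on points of the other color, so each half-sweep is a data-parallel update. The sketch below assumes the 7-point Laplacian -Δu = f on a cube with Dirichlet boundary values stored in `u`; it only illustrates the ordering, not the IA-64-specific techniques described in the paper, and the function names are hypothetical.

```python
import numpy as np

INTERIOR = (slice(1, -1),) * 3

def neighbor_sum(u):
    """Sum of the six face neighbors at every interior point."""
    s = np.zeros_like(u)
    s[INTERIOR] = (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]
                   + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]
                   + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:])
    return s

def residual(u, f, h):
    """r = f - A u for the 7-point stencil (interior points only)."""
    r = np.zeros_like(u)
    r[INTERIOR] = f[INTERIOR] - (6.0 * u[INTERIOR] - neighbor_sum(u)[INTERIOR]) / h**2
    return r

def red_black_gs(u, f, h, sweeps=1):
    """In-place red-black Gauss-Seidel sweeps on a float array `u`."""
    i, j, k = np.indices(u.shape)
    interior = np.zeros(u.shape, dtype=bool)
    interior[INTERIOR] = True
    colors = [interior & ((i + j + k) % 2 == c) for c in (0, 1)]
    for _ in range(sweeps):
        for mask in colors:
            # every point of this color sees only already-final values of the other color
            nb = neighbor_sum(u)
            u[mask] = (nb[mask] + h**2 * f[mask]) / 6.0
    return u
```

A compiled implementation would update each color in place with strided loops instead of recomputing the full neighbor sum, but the data dependencies are the same.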

Journal of Parallel and Distributed Computing | 2014

Towards a performance-portable description of geometric multigrid algorithms using a domain-specific language

Richard Membarth; Oliver Reiche; Christian Schmitt; Frank Hannig; Jürgen Teich; Markus Stürmer; Harald Köstler

High Performance Computing (HPC) systems are increasingly heterogeneous. A single node may contain different processor types, including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach that automatically generates code tailored to different processor types. Instead of writing hand-tuned code for GPU accelerators, low-level CUDA and OpenCL code is generated from a high-level description of an algorithm specified in a Domain-Specific Language (DSL). The DSL is part of the Heterogeneous Image Processing Acceleration (HIPAcc) framework and was extended in this work to handle grid hierarchies in order to model different cycle types. Language constructs are introduced to represent and process data at different resolutions. This makes it possible to describe image processing algorithms that work on image pyramids as well as multigrid methods in the stencil domain. By decoupling the algorithm from its schedule, the proposed approach can generate efficient stencil code implementations. Our results show that performance similar to that of hand-tuned codes can be achieved.
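The idea of decoupling an algorithm from its schedule can be shown without any DSL: the stencil says *what* to compute per pixel, while interchangeable schedules decide *in which order* pixels are visited. This plain-Python sketch is not HIPAcc syntax; the kernel and schedule functions are hypothetical stand-ins for what a code generator would emit.

```python
import numpy as np

def laplace(img, y, x):
    """Algorithm: the 5-point Laplacian at one pixel (what to compute)."""
    return img[y - 1, x] + img[y + 1, x] + img[y, x - 1] + img[y, x + 1] - 4.0 * img[y, x]

def schedule_rowwise(kernel, img):
    """Schedule 1: plain row-major traversal of the interior."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = kernel(img, y, x)
    return out

def schedule_tiled(kernel, img, tile=8):
    """Schedule 2: the same interior visited tile by tile (e.g. for cache or GPU blocks)."""
    out = np.zeros_like(img)
    h, w = img.shape
    for ty in range(1, h - 1, tile):
        for tx in range(1, w - 1, tile):
            for y in range(ty, min(ty + tile, h - 1)):
                for x in range(tx, min(tx + tile, w - 1)):
                    out[y, x] = kernel(img, y, x)
    return out
```

Both schedules produce identical output for any kernel, so a tool can pick the fastest traversal per target without touching the algorithm description.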

Journal of Real-time Image Processing | 2016

Performance engineering to achieve real-time high dynamic range imaging

Harald Köstler; Markus Stürmer; Thomas Pohl

Image-processing applications such as high dynamic range imaging can be performed efficiently in the gradient space. To do so, the image has to be transformed to gradient space and back. While the forward transformation to gradient space is fast using simple finite differences, the backward transformation requires the solution of a partial differential equation. Although an efficient multigrid solver can be used for the backward transformation, it turns out that a straightforward implementation of the standard algorithm does not achieve satisfactory runtimes for real-time high dynamic range compression of larger 2D X-ray images, even on GPUs. Therefore, we perform a rigorous performance analysis and derive a performance model for our multigrid algorithm that guides us to an improved implementation, achieving an overall performance of more than 25 frames per second for 16.8-megapixel images with full high dynamic range compression, including data transfers between CPU and GPU. Together with a simple OpenGL visualization, this makes real-time parameter studies on medical data sets possible.
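The forward and backward transformations can be sketched on a small image: forward differences give the gradient field, and reconstruction solves the Neumann Poisson problem GᵀG u = Gᵀg. For brevity this sketch uses conjugate gradients instead of the paper's multigrid solver, and it omits the gradient attenuation that HDR compression would apply to `gx`, `gy` before reconstruction; all names are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward transformation: forward differences, zero flux across the border."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def grad_T(gx, gy):
    """Adjoint of `grad`, i.e. minus the discrete divergence."""
    r = np.zeros_like(gx)
    r[:, 1:] += gx[:, :-1]
    r[:, :-1] -= gx[:, :-1]
    r[1:, :] += gy[:-1, :]
    r[:-1, :] -= gy[:-1, :]
    return r

def reconstruct(gx, gy, iters=1000, tol=1e-10):
    """Backward transformation: solve grad_T(grad(u)) = grad_T(g) by conjugate gradients.

    The Neumann problem determines u only up to an additive constant.
    """
    b = grad_T(gx, gy)
    u = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = np.sum(r * r)
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        ap = grad_T(*grad(p))
        alpha = rs / np.sum(p * ap)
        u += alpha * p
        r -= alpha * ap
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return u
```

Without attenuation, `reconstruct(*grad(img))` recovers `img` up to its mean; the backward solve is the expensive step that the paper's performance engineering targets.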

Journal of Computational Science | 2014

Real-time simulation of temperature in hot rolling rolls

Markus Stürmer; Johannes Dagner; Paul Manstetten; Harald Köstler

Model-based process control is widely used in metal working. Often, simplified real-time-capable numerical models replace or support measurement. In this work, we investigate the simulation of temperature in hot rolling rolls based on a 3-D model of a roll that is fast enough to act as a soft sensor during operation. The heat equation in cylindrical coordinates is discretized with simple finite volumes and propagated in time with a forward Euler method. This allows a parallel and efficient implementation on GPGPUs. We validate our code against analytic solutions and measurements, provide a detailed performance analysis, and show simulation results for realistic rolls.
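A reduced version of such a discretization fits in a few lines. The sketch below assumes only the radial direction of the cylinder, a constant diffusivity `alpha`, and an insulated outer surface; the paper's model is 3D with realistic boundary conditions. Each cell is updated independently from its neighbors' old values, which is what makes the explicit scheme map well onto GPUs.

```python
import numpy as np

def radial_heat_step(T, dr, dt, alpha):
    """One forward-Euler step of the radial heat equation with finite volumes.

    Cell i spans [i*dr, (i+1)*dr]; the outer surface is assumed insulated.
    Stable (and maximum-principle preserving) for dt <= dr**2 / (2 * alpha).
    """
    n = len(T)
    r_face = dr * np.arange(n + 1)                 # face radii, r_face[0] = 0 at the axis
    vol = 0.5 * (r_face[1:] ** 2 - r_face[:-1] ** 2)  # cell volume per radian and unit length
    flux = np.zeros(n + 1)                         # radial heat flux times face radius
    flux[1:-1] = -alpha * r_face[1:-1] * (T[1:] - T[:-1]) / dr
    return T - dt * (flux[1:] - flux[:-1]) / vol   # net inflow changes cell temperature
```

Because fluxes cancel pairwise between neighboring cells, the volume-weighted total heat is conserved exactly up to rounding, a simple check in the spirit of validating against analytic behavior.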

Numerical Linear Algebra With Applications | 2010

A fast‐adaptive composite grid algorithm for solving the free‐space Poisson problem on the cell broadband engine

Daniel Ritter; Markus Stürmer; Ulrich Rüde

Fast solvers for Poisson's equation with boundary conditions at infinity are an important building block for molecular dynamics. One issue that arises when this equation is solved numerically is the infinite size of the domain, which prevents a direct solution, so other concepts have to be considered. This paper discusses a method that employs hierarchically coarsened grids to overcome this problem. Special attention has to be paid to the discretization at the grid interfaces, for which a finite volume approach is used. The resulting set of linear equations is solved using a fast-adaptive composite grid algorithm. Emphasis is put on the implementation of the method on the STI Cell Broadband Engine, a modern multicore processor with high floating-point performance and memory bandwidth. Code optimization techniques as well as parallelization are applied to achieve maximum performance on this processor. To validate the performance, test runs are executed and the runtime is analyzed in detail.
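Why hierarchical coarsening makes a large domain affordable can be seen by counting cells. The sketch assumes a simple nesting layout, not necessarily the one used in the paper: every level is a cube of n³ cells, each level doubles both the mesh width and the extent, and the central region already covered by the next finer level is not stored twice. Function names are hypothetical.

```python
def nested_grid_cells(n: int, levels: int) -> int:
    """Cells in a hierarchy of nested cubic grids with n**3 cells per level.

    Each coarser level covers twice the extent of the previous one; its central
    (n/2)**3 block overlaps the finer level and is therefore not counted.
    """
    assert n % 2 == 0 and levels >= 1
    return n**3 + (levels - 1) * (n**3 - (n // 2) ** 3)

def uniform_grid_cells(n: int, levels: int) -> int:
    """Cells of a uniform grid with the finest mesh width covering the same extent."""
    return (n * 2 ** (levels - 1)) ** 3
```

With n = 32 and five levels, the nested hierarchy needs 147,456 cells to reach 16 times the extent of the finest grid, whereas a uniform grid of the finest resolution would need 512³ = 134,217,728 cells.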
