
Publications


Featured research published by Markus Geveler.


Journal of Computational Physics | 2013

Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster

Dominik Göddeke; Dimitri Komatitsch; Markus Geveler; Dirk Ribbrock; Nikola Rajovic; Nikola Puzovic; Alex Ramirez

Power consumption and energy efficiency are becoming critical aspects in the design and operation of large-scale HPC facilities, and it is unanimously recognised that future exascale supercomputers will be strongly constrained by their power requirements. At current electricity costs, operating an HPC system over its lifetime can already be on par with the initial deployment cost. These power consumption constraints, and the benefits a more energy-efficient HPC platform may have on other societal areas, have motivated the HPC research community to investigate the use of energy-efficient technologies originally developed for the embedded and especially mobile markets. However, lower power does not always mean lower energy consumption, since execution time often also increases. In order to achieve competitive performance, applications then need to efficiently exploit a larger number of processors. In this article, we discuss how applications can efficiently exploit this new class of low-power architectures to achieve competitive performance. We evaluate whether they can benefit from the increased energy efficiency that the architecture is supposed to achieve. The applications that we consider cover three different classes of numerical solution methods for partial differential equations, namely a low-order finite element multigrid solver for huge sparse linear systems of equations, a Lattice-Boltzmann code for fluid simulation, and a high-order spectral element method for acoustic or seismic wave propagation modelling. We evaluate weak and strong scalability on a cluster of 96 ARM Cortex-A9 dual-core processors and demonstrate that the ARM-based cluster can be more efficient in terms of energy to solution when executing the three applications compared to an x86-based reference machine.
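The abstract's key caveat, that lower power does not imply lower energy because execution time can grow, is simple arithmetic: energy to solution is average power multiplied by time to solution. The following self-contained C++ sketch illustrates the trade-off; the wattage and runtime figures are invented for illustration and are not measurements from the paper.

```cpp
// Energy to solution = average power draw * time to solution.
// All numbers below are hypothetical, chosen only to show that a
// lower-power machine can still win on energy despite a longer runtime.
#include <cstdio>

struct Machine {
    const char* name;
    double watts;    // average power draw during the run
    double seconds;  // time to solution
};

int main() {
    const Machine machines[] = {
        { "x86 node (hypothetical)",    250.0, 100.0 },  // fast, power-hungry
        { "ARM cluster (hypothetical)",  80.0, 280.0 },  // slower, frugal
    };
    for (const Machine& m : machines)
        std::printf("%-27s %6.0f W * %5.0f s = %7.0f J\n",
                    m.name, m.watts, m.seconds, m.watts * m.seconds);
    // Output: the ARM run takes 2.8x longer but needs 22.4 kJ vs. 25 kJ,
    // i.e. it is better in energy to solution despite worse wall clock time.
    return 0;
}
```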


Journal of Computational Science | 2011

A Simulation Suite for Lattice-Boltzmann Based Real-Time CFD Applications Exploiting Multi-Level Parallelism on Modern Multi- and Many-Core Architectures

Markus Geveler; Dirk Ribbrock; Sven Mallach; Dominik Göddeke

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published set of open-source libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared memory thread-level parallelism between cores, and parallelism between heterogeneous distributed memory resources in clusters. To evaluate and validate our approach, we implement a collection of modular building blocks for the easy and fast assembly and development of CFD applications based on the shallow water equations: We combine the Lattice-Boltzmann method with fluid-structure interaction techniques in order to achieve real-time simulations targeting interactive virtual environments. Our results demonstrate that recent multi-core CPUs outperform the Cell BE, while GPUs are significantly faster than conventional multi-threaded SSE code. In addition, we verify good scalability properties of our application on small clusters.
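To make the kernel structure concrete, here is a minimal single-node C++ sketch of one D2Q9 lattice-Boltzmann collide-and-stream step for the shallow water equations, using Zhou-style equilibria. It is an illustration of the building block such a suite parallelises across SSE, GPUs and clusters, not code from the published libraries; grid size, relaxation time and the periodic boundary are arbitrary choices.

```cpp
// Minimal D2Q9 lattice-Boltzmann step for the shallow water equations
// (equilibria following Zhou's formulation), single-threaded and periodic.
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

constexpr int NX = 64, NY = 64, Q = 9;
constexpr double g = 9.81;   // gravitational acceleration
constexpr double e = 1.0;    // lattice speed dx/dt
constexpr double tau = 0.9;  // BGK relaxation time

const int cx[Q] = { 0, 1, 0, -1, 0, 1, -1, -1, 1 };
const int cy[Q] = { 0, 0, 1, 0, -1, 1, 1, -1, -1 };

inline int idx(int x, int y, int q) { return (y * NX + x) * Q + q; }

// Equilibrium distribution: depends on water depth h and velocity (ux, uy).
double feq(int q, double h, double ux, double uy) {
    const double eu = e * (cx[q] * ux + cy[q] * uy);
    const double uu = ux * ux + uy * uy;
    if (q == 0)
        return h - 5.0 * g * h * h / (6.0 * e * e) - 2.0 * h * uu / (3.0 * e * e);
    const double w = (q < 5) ? 1.0 : 0.25;  // axis vs. diagonal directions
    return w * (g * h * h / (6.0 * e * e) + h * eu / (3.0 * e * e)
                + h * eu * eu / (2.0 * e * e * e * e) - h * uu / (6.0 * e * e));
}

void collide_and_stream(const std::vector<double>& f, std::vector<double>& f_new) {
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            // Macroscopic moments: depth h and depth-averaged momentum.
            double h = 0.0, hux = 0.0, huy = 0.0;
            for (int q = 0; q < Q; ++q) {
                h   += f[idx(x, y, q)];
                hux += e * cx[q] * f[idx(x, y, q)];
                huy += e * cy[q] * f[idx(x, y, q)];
            }
            const double ux = hux / h, uy = huy / h;
            // BGK collision, then streaming to the neighbour (periodic wrap).
            for (int q = 0; q < Q; ++q) {
                const double fq = f[idx(x, y, q)]
                                  - (f[idx(x, y, q)] - feq(q, h, ux, uy)) / tau;
                f_new[idx((x + cx[q] + NX) % NX, (y + cy[q] + NY) % NY, q)] = fq;
            }
        }
}

int main() {
    std::vector<double> f(NX * NY * Q), f_new(f.size());
    // Start at rest with a small depth perturbation in the centre.
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            const double h0 =
                (std::abs(x - NX / 2) < 4 && std::abs(y - NY / 2) < 4) ? 1.1 : 1.0;
            for (int q = 0; q < Q; ++q)
                f[idx(x, y, q)] = feq(q, h0, 0.0, 0.0);
        }
    for (int t = 0; t < 100; ++t) {
        collide_and_stream(f, f_new);
        f.swap(f_new);
    }
    std::printf("100 time steps completed\n");
    return 0;
}
```

The loop over lattice sites is embarrassingly parallel within one time step, which is exactly why such kernels map well onto SIMD units, multicore CPUs and GPUs.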


Computer Physics Communications | 2009

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

Danny van Dyk; Markus Geveler; Sven Mallach; Dirk Ribbrock; Dominik Göddeke; Carsten Gutwenger

We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3–4 and 4–16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for the development and evaluation of such kernels, significantly simplifying their development.

Program summary
Program title: HONEI
Catalogue identifier: AEDW_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPLv2
No. of lines in distributed program, including test data, etc.: 216 180
No. of bytes in distributed program, including test data, etc.: 1 270 140
Distribution format: tar.gz
Programming language: C++
Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
Operating system: Linux
RAM: at least 500 MB free
Classification: 4.8, 4.3, 6.1
External routines: SSE: none; [1] for GPU, [2] for Cell backend
Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the underlying hardware towards heterogeneity and parallelism. This is particularly relevant for data-intensive problems stemming from discretisations with local support, such as finite differences, volumes and elements.
Solution method: To address these issues, we present a hardware-aware collection of libraries combining the advantages of modern software techniques and hardware-oriented programming. Applications built on top of these libraries can be configured trivially to execute on CPUs, GPUs or the Cell processor. In order to evaluate the performance and accuracy of our approach, we provide two domain-specific applications: a multigrid solver for the Poisson problem and a fully explicit solver for the 2D shallow water equations.
Restrictions: HONEI is actively being developed, and its feature list is continuously expanded. Not all combinations of operations and architectures might be supported in earlier versions of the code. Obtaining snapshots from http://www.honei.org is recommended.
Unusual features: The considered applications as well as all library operations can be run on NVIDIA GPUs and the Cell BE.
Running time: Depends on the application and the input sizes. The Poisson solver executes in a few seconds, while the SWE solver requires up to 5 minutes for large spatial discretisations or small timesteps.
References: [1] http://www.nvidia.com/cuda [2] http://www.ibm.com/developerworks/power/cell
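A central design point is that application code stays unchanged while the execution backend is selected. The following C++ sketch shows a general tag-dispatch pattern in the spirit of HONEI's architecture abstraction; the tag and operation names are hypothetical stand-ins, not HONEI's actual headers or API.

```cpp
// Tag-based backend dispatch: operations are class templates parameterised
// by an architecture tag, and each backend provides a specialisation.
// Names here are illustrative only.
#include <cstddef>
#include <cstdio>
#include <vector>

namespace tags {
    struct CPU {};      // generic scalar backend
    struct CPU_SSE {};  // stand-in for a vectorised backend
}

template <typename Tag_> struct ScaledSum;  // y <- y + a * x

template <> struct ScaledSum<tags::CPU> {
    static void value(std::vector<float>& y, const std::vector<float>& x, float a) {
        for (std::size_t i = 0; i < y.size(); ++i)
            y[i] += a * x[i];
    }
};

template <> struct ScaledSum<tags::CPU_SSE> {
    static void value(std::vector<float>& y, const std::vector<float>& x, float a) {
        // A real backend would contain SSE intrinsics, CUDA kernels or Cell
        // SPE code; this stand-in simply delegates to the scalar version.
        ScaledSum<tags::CPU>::value(y, x, a);
    }
};

int main() {
    std::vector<float> y(8, 1.0f), x(8, 2.0f);
    // The application chooses hardware by choosing a tag; nothing else changes.
    ScaledSum<tags::CPU_SSE>::value(y, x, 0.5f);
    std::printf("y[0] = %.1f\n", y[0]);  // prints 2.0
    return 0;
}
```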


Facing the Multicore-Challenge | 2010

Lattice-Boltzmann simulation of the shallow-water equations with fluid-structure interaction on multi- and manycore processors

Markus Geveler; Dirk Ribbrock; Dominik Göddeke; Stefan Turek

We present an efficient method for the simulation of laminar fluid flows with free surfaces including their interaction with moving rigid bodies, based on the two-dimensional shallow water equations and the Lattice-Boltzmann method. Our implementation targets multiple fundamentally different architectures such as commodity multicore CPUs with SSE, GPUs, the Cell BE and clusters. We show that our code scales well on an MPI-based cluster; that an eightfold speedup can be achieved using modern GPUs compared with multithreaded CPU code; and, finally, that it is possible to solve fluid-structure interaction scenarios with high resolution at interactive rates.


Software for Exascale Computing | 2016

Hardware-Based Efficiency Advances in the EXA-DUNE Project

Peter Bastian; Christian Engwer; Jorrit Fahlke; Markus Geveler; Dominik Göddeke; Oleg Iliev; Olaf Ippisch; René Milk; Jan Mohring; Steffen Müthing; Mario Ohlberger; Dirk Ribbrock; Stefan Turek

We present advances concerning efficient finite element assembly and linear solvers on current and upcoming HPC architectures, obtained within the framework of the Exa-Dune project, part of the DFG priority program 1648 Software for Exascale Computing (SPPEXA). In this project, we aim at the development of both flexible and efficient hardware-aware software components for the solution of PDEs based on the DUNE platform and the FEAST library. In this contribution, we focus on node-level performance and accelerator integration, which will complement the proven MPI-level scalability of the framework. The higher-level aspects of the Exa-Dune project, in particular multiscale methods and uncertainty quantification, are detailed in the companion paper (Bastian et al., Advances concerning multiscale methods and uncertainty quantification in Exa-Dune. In: Proceedings of the SPPEXA Symposium, 2016).


International Conference on Conceptual Structures | 2010

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures

Dirk Ribbrock; Markus Geveler; Dominik Göddeke; Stefan Turek

We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional shallow water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open-channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry states, and even interaction of the fluid with floating objects. The kernels are integrated into a hardware-oriented collection of libraries targeting multiple fundamentally different parallel hardware architectures like commodity multicore CPUs, the Cell BE, NVIDIA GPUs and clusters. We provide an algorithmic study which compares the different solvers in terms of performance and numerical accuracy in view of their capabilities and their specific implementation and optimisation on the different architectures. We show that an eightfold speedup over optimised multithreaded CPU code can be obtained with the GPU using basic methods and that even very complex flow phenomena can be simulated with significant speedups without loss of accuracy.


Software for Exascale Computing | 2016

Advances Concerning Multiscale Methods and Uncertainty Quantification in EXA-DUNE

Peter Bastian; Christian Engwer; Jorrit Fahlke; Markus Geveler; Dominik Göddeke; Oleg Iliev; Olaf Ippisch; René Milk; Jan Mohring; Steffen Müthing; Mario Ohlberger; Dirk Ribbrock; Stefan Turek

In this contribution we present advances concerning efficient parallel multiscale methods and uncertainty quantification that have been obtained within the framework of the DFG priority program 1648 Software for Exascale Computing (SPPEXA), within the funded project Exa-Dune. This project aims at the development of flexible but nevertheless hardware-specific software components and scalable high-level algorithms for the solution of partial differential equations based on the DUNE platform. While the development of hardware-based concepts and software components is detailed in the companion paper (Bastian et al., Hardware-based efficiency advances in the Exa-Dune project. In: Proceedings of the SPPEXA Symposium 2016, Munich, 25–27 Jan 2016), we focus here on the development of scalable multiscale methods in the context of uncertainty quantification. Such problems add additional layers of coarse-grained parallelism, as the underlying problems require the solution of many local or global partial differential equations in parallel that are only weakly coupled.
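The "additional layers of coarse-grained parallelism" can be pictured as many weakly coupled PDE solves, for instance Monte Carlo samples over an uncertain coefficient, that run concurrently and only meet in a final reduction. A minimal C++ sketch under that assumption follows; solve_pde() is a hypothetical stand-in for a full DUNE-based solver.

```cpp
// Coarse-grained parallelism in uncertainty quantification: independent
// samples of an uncertain coefficient are solved concurrently and combined
// only in a final reduction (here, a plain Monte Carlo mean).
#include <cstdio>
#include <future>
#include <random>
#include <vector>

// Hypothetical stand-in: a real sample would assemble and solve a PDE
// whose data depends on 'coefficient' and return a quantity of interest.
double solve_pde(double coefficient) {
    return 1.0 / (1.0 + coefficient);
}

int main() {
    std::mt19937 rng(42);
    std::lognormal_distribution<double> kappa(0.0, 0.5);  // uncertain coefficient
    std::vector<std::future<double>> samples;
    for (int i = 0; i < 64; ++i)  // each sample is an independent solve
        samples.push_back(std::async(std::launch::async, solve_pde, kappa(rng)));
    double mean = 0.0;
    for (auto& s : samples)
        mean += s.get();  // the only synchronisation point
    mean /= static_cast<double>(samples.size());
    std::printf("estimated mean quantity of interest: %f\n", mean);
    return 0;
}
```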


Archive | 2017

Fundamentals of a numerical cloud computing for applied sciences

Markus Geveler; Stefan Turek

This paper is intended as a contribution to the 'Consultation on Cloud Computing Research Innovation Challenges for WP 2018-2020' called for by the European Commission (DG CONNECT, unit 'Cloud and software'). We propose to encourage and support fundamental interdisciplinary research for making the benefits generated by cloud computing accessible to the applied science community.

Introduction: Why cloud computing and high performance computing contradict each other

The basic idea of cloud computing (CC) is to abstract from an IT infrastructure, including compute, memory, networking and software resources, by virtualization. These resources are made accessible to the user in a dynamic and adaptive way. The major advantages over a specially tailored 'in-house solution' are transparent and simple usage, enhanced flexibility due to scalability and adaptivity to a specific need, and increased efficiency due to savings in energy and money. The latter stems from scaling effects, operational efficiency, consolidation of resources and reduction of risks. The application is literally independent of any (local) data and compute resources, as these can be concentrated effectively. Altogether, these advantages may someday supersede the traditional local/regional data center approach found at modern universities and research centers. From the point of view of data center management and operations, CC leads to higher occupancy and therefore efficiency: the inevitable granularity effects that occur with medium or large workloads can be tackled by backfilling with many small jobs. In addition, since a specific application run's need for resources may vary over time, left-over capacities can be provided in a profitable 'pay per use' style.

In High Performance Computing (HPC), on the other hand, virtualization and abstraction concepts contradict the usual approaches, especially in the simulation of technical processes: here, the focus is on enhancing the performance of an application by explicitly optimizing for a certain type of hardware. This requires a priori knowledge of the hardware, which is usually available because universities and regional research facilities run their own local or regional compute centers with comparatively static hardware components. This approach can in some cases yield several orders of magnitude in performance gains, and we call this concept hardware-oriented numerics. The paradigm comprises the simultaneous optimization for hardware, numerical and energy efficiency on all levels of application development [1,2,3,4]. One effort in hardware-oriented numerics is to optimize code and develop or choose numerical methods with respect to a heterogeneous hardware ecosystem: multicore CPUs are as much a target as hardware accelerators like GPUs, FPGAs, Xeon Phi processors and system-on-a-chip designs such as ARM-based CPUs with integrated GPUs. In addition, there are non-uniform memory architectures on the device level as well as heterogeneous communication infrastructures on the cluster level. The usual design pattern, however, is to optimize code for a (single) given hardware configuration; due to this proximity to hardware details, the optimization is comparatively expensive. This development process is therefore the complete opposite of relying on a virtualization approach.

Today's scientific cloud computing is not feasible for numerical simulation

To date, all efforts to use CC techniques in the science community can be characterized as what we call scientific cloud computing (SCC), which has been very successful for a specific type of application: in the scope of Big Data, a problem can often be projected directly onto a bag-of-tasks programming model. Other problems composed of smaller independent tasks, where coupling and therefore communication is minimal or zero, can also be handled easily in a cloud environment. In numerical simulation, on the other hand, strong coupling of the very computationally intense subproblems is the standard case. This induces a comparatively high synchronization need, requiring low communication latencies. The execution models of CC are literally blind to this type of strong coupling, because virtualization defeats any attempt to optimize inter-process communication. We believe that the development of numerical simulation software should be characterized by the synthesis of hardware, numerical and energy efficiency. Hence, for this type of application, a CC concept that takes the heterogeneity of compute hardware into account would be most feasible. In our vision of future scenarios, the user of such codes might want to optimize a run for different metrics: flexibility is required in selecting how a specific run is allocated to certain types of compute nodes. This flexibility has not yet been accounted for in the development of numerical code frameworks. A direct result of service providers internalizing the concept of hardware-oriented numerics would be that the user of the service can make an a priori choice of the core requirements for the run. For instance, it could be decided whether hardware should be allocated to minimize wall clock time or to minimize energy to solution. Other hardware specifics could be made allocatable, such as the type and properties of the communication links between nodes. The service would then return a number of allocations based upon available hardware. After selection, a complex optimization problem has to be solved: the simulation software has to select numerical and algorithmic components that fit this allocation, and finally a load balancing has to be performed for the individual problem to be solved.

Towards a numerical cloud computing

In order to realize this vision, there are two fundamental problems to solve: (1) specially tailored numerics and load balancing strategies, and (2) mapping, scheduling and operation strategies for numerical simulation have to be developed. In (1), numerical components in a code framework have to be revisited or developed from scratch with respect to (2), by adjusting them to the respective strategies. Such numerical alternatives range from preconditioners in linear solvers to whole discretization approaches on the model level. Different hardware-specific implementations have to be provided and tuned in order to enable the optimizer in (2) to succeed, which is closely related to performance engineering. This has to be done with respect to all levels of parallelism in modern hardware architectures and on all levels of an application. On the other hand, the systems and strategies developed in (2) have to be sensitive to the effects of specific numerics on specific hardware. This problem is closely related to numerical scaling, convergence and complexity theory, which, together with the related skills, is usually not an integral part of the training of computer scientists or service providers and operators. Here, an automatic tuning system has to be developed that is capable of deciding what type of numerics to use for a given hardware allocation, and which parts of the data are distributed to which part of the hardware by static or even dynamic load balancing. The latter is an even more complex problem, keeping in mind the heterogeneity even within one specific allocation, where for instance CPUs are to be saturated alongside GPUs. This optimization problem is very similar to how compilers schedule instructions on the processor level. It is also multi-dimensional, as not only raw performance but also energy to solution has to be optimized for, as stated in the previous section. Hence we emphasize that these two components, (1) and (2), cannot be developed independently: specialists from applied mathematics, performance engineers and application specialists are required for the former, whereas the latter must be addressed by computer scientists and service providers.
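The proposed a priori choice between minimizing wall clock time and minimizing energy to solution can be pictured as a selection over candidate allocations scored by a performance model. The C++ sketch below illustrates that selection step only; all node types, runtimes and power figures are invented for illustration.

```cpp
// Selecting a hardware allocation under two different metrics: predicted
// time to solution vs. predicted energy to solution. All data is made up.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Allocation {
    const char* nodes;
    double predicted_seconds;  // from a hypothetical performance model
    double predicted_watts;    // predicted average power draw
    double energy() const { return predicted_seconds * predicted_watts; }
};

int main() {
    const std::vector<Allocation> candidates = {
        { "16 x86 nodes",        120.0, 4000.0 },
        { "64 ARM SoC nodes",    300.0,  900.0 },
        { "8 x86 + 8 GPU nodes",  70.0, 6500.0 },
    };
    const Allocation by_time = *std::min_element(
        candidates.begin(), candidates.end(),
        [](const Allocation& a, const Allocation& b) {
            return a.predicted_seconds < b.predicted_seconds;
        });
    const Allocation by_energy = *std::min_element(
        candidates.begin(), candidates.end(),
        [](const Allocation& a, const Allocation& b) {
            return a.energy() < b.energy();
        });
    std::printf("min wall clock: %s (%.0f s)\n",
                by_time.nodes, by_time.predicted_seconds);
    std::printf("min energy:     %s (%.0f kJ)\n",
                by_energy.nodes, by_energy.energy() / 1000.0);
    return 0;
}
```

Under these invented numbers the GPU-accelerated allocation wins on wall clock time while the ARM allocation wins on energy, mirroring the trade-off the paper argues a numerical cloud service should expose.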


Archive | 2017

How applied sciences can accelerate the energy revolution

Markus Geveler; Stefan Turek

In this paper we propose a course of action towards a better understanding of energy consumption-related aspects in the development of scientific software, as well as in the development and usage of 'unconventional' compute hardware in applied sciences. We demonstrate how the applied sciences community can make a significant contribution to reducing the energy footprint of their computations.


Computers & Fluids | 2013

Towards a complete FEM-based simulation toolkit on GPUs: Unstructured grid finite element geometric multigrid solvers with strong smoothers based on sparse approximate inverses

Markus Geveler; Dirk Ribbrock; Dominik Göddeke; Peter Zajac; Stefan Turek

Collaboration


Dive into Markus Geveler's collaboration.

Top Co-Authors

Dirk Ribbrock
Technical University of Dortmund

Dominik Göddeke
Technical University of Dortmund

Stefan Turek
Technical University of Dortmund

Peter Zajac
Technical University of Dortmund

René Milk
University of Münster