Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Georg Hager is active.

Publication


Featured research published by Georg Hager.


Parallel, Distributed and Network-Based Processing | 2009

Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes

Rolf Rabenseifner; Georg Hager; Gabriele Jost

Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: Shared memory nodes with several multi-core CPUs are connected via a network infrastructure. Parallel programming must combine distributed memory parallelization on the node interconnect with shared memory parallelization inside each node. We describe potentials and challenges of the dominant programming models on hierarchically structured hardware: Pure MPI (Message Passing Interface), pure OpenMP (with distributed shared memory extensions) and hybrid MPI+OpenMP in several flavors. We pinpoint cases where a hybrid programming model can indeed be the superior solution because of reduced communication needs and memory consumption, or improved load balance. Furthermore we show that machine topology has a significant impact on performance for all parallelization strategies and that topology awareness should be built into all applications in the future. Finally we give an outlook on possible standardization goals and extensions that could make hybrid programming easier to do with performance in mind.
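As a minimal illustration of the hybrid model the abstract discusses, the following C sketch (not taken from the paper) combines MPI between processes with an OpenMP parallel region inside each process; MPI_THREAD_FUNNELED is assumed because only the master thread communicates.

/* Minimal hybrid MPI+OpenMP sketch (illustrative, not code from the paper).
 * Build e.g. with: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request a threading level that permits OpenMP threads inside each
     * MPI process; MPI_THREAD_FUNNELED suffices if only the master
     * thread performs MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Shared memory parallelization inside the node */
    #pragma omp parallel
    {
        printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    /* Distributed memory communication between processes (master thread only) */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Such a program would typically be launched with one MPI process per node or per socket and OMP_NUM_THREADS set to the number of cores available to each process.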


International Conference on Parallel Processing | 2010

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments

Jan Treibig; Georg Hager; Gerhard Wellein

Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. Moreover, it includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use hardware counter tools to study the performance of a stencil code, and finally show how to detect bandwidth problems on ccNUMA-based compute nodes.
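To illustrate the affinity experiments mentioned in the abstract, here is a plain OpenMP STREAM triad in C as a minimal sketch (not taken from the paper); the LIKWID invocations in the comments are typical usage patterns, and exact option names may differ between LIKWID versions.

/* OpenMP STREAM triad a[i] = b[i] + s*c[i]; illustrative sketch, not code
 * from the paper. Typical LIKWID usage (option names may vary by version):
 *   likwid-topology                        probe thread and cache topology
 *   likwid-pin -c 0-3 ./triad              enforce thread-core affinity
 *   likwid-perfctr -C 0-3 -g MEM ./triad   pin threads and read memory counters
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 20000000L

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double s = 3.0;

    /* First-touch initialization so memory pages end up in the ccNUMA
     * domain of the thread that later accesses them. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    double t1 = omp_get_wtime();

    /* Three arrays of 8-byte doubles; the write-allocate transfer for a[]
     * is ignored here. */
    printf("Triad bandwidth: %.2f GB/s\n", 3.0 * 8.0 * N / (t1 - t0) / 1e9);

    free(a); free(b); free(c);
    return 0;
}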


Computer Software and Applications Conference | 2009

Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization

Gerhard Wellein; Georg Hager; Thomas Zeiser; Markus Wittmann; H. Fehske

We present a pipelined wavefront parallelization approach for stencil-based computations. Within a fixed spatial domain successive wavefronts are executed by threads scheduled to a multicore processor chip with a shared outer level cache. By re-using data from cache in the successive wavefronts this multicore-aware parallelization strategy employs temporal blocking in a simple and efficient way. We use the Jacobi algorithm in three dimensions as a prototype for stencil-based computations and prove the efficiency of our approach on the latest generations of Intel's x86 quad- and hexa-core processors.
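For reference, the three-dimensional Jacobi prototype is a 7-point averaging stencil; a naive OpenMP version in C, the kind of baseline the wavefront scheme improves on, might look like the sketch below (grid dimensions and data layout are assumptions).

/* Naive 3D Jacobi sweep (7-point stencil): the memory-bound baseline that
 * multicore-aware wavefront/temporal blocking accelerates. Illustrative
 * sketch; grid dimensions and layout are assumptions. */
#include <stddef.h>

#define NX 200
#define NY 200
#define NZ 200
#define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (size_t)(k))

void jacobi_sweep(const double *restrict in, double *restrict out)
{
    #pragma omp parallel for collapse(2)
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            for (int k = 1; k < NZ - 1; k++)
                out[IDX(i, j, k)] = (1.0 / 6.0) *
                    (in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)] +
                     in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)] +
                     in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)]);
}

void jacobi_iterate(double *grid_a, double *grid_b, int timesteps)
{
    /* Each plain sweep streams both grids through main memory; the paper's
     * wavefront approach instead keeps several successive sweeps of a block
     * resident in a shared outer-level cache. */
    for (int t = 0; t < timesteps; t++) {
        jacobi_sweep(grid_a, grid_b);
        double *tmp = grid_a; grid_a = grid_b; grid_b = tmp;
    }
}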


SIAM Journal on Scientific Computing | 2014

A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units

Moritz Kreutzer; Georg Hager; Gerhard Wellein; H. Fehske; A. R. Bishop

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-σ, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-σ …
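A hedged C sketch of what an spMVM kernel over a SELL-C-σ-style structure can look like (struct and field names are assumptions, not the paper's reference implementation): rows are sorted by length within windows of σ rows to reduce padding, grouped into chunks of C rows, and each chunk is stored column-major and padded to its longest row, so the inner loop over the C rows of a chunk maps naturally onto SIMD lanes.

/* Sketch of spMVM with a SELL-C-sigma-like layout; struct and field names
 * are illustrative assumptions, not the paper's reference implementation.
 * nrows is assumed padded to a multiple of C, y is assumed zero-initialized,
 * and padded entries carry val 0.0 with a valid column index. */
typedef struct {
    int     nrows;      /* matrix rows, padded to a multiple of C        */
    int     nchunks;    /* nrows / C                                     */
    int     C;          /* chunk height, e.g. the SIMD width             */
    int    *chunk_ptr;  /* start offset of each chunk in val/col         */
    int    *chunk_len;  /* chunk width = longest row within the chunk    */
    int    *col;        /* column indices, chunk-wise column-major       */
    double *val;        /* values, same layout as col                    */
} sell_matrix;

void spmvm_sell(const sell_matrix *A, const double *restrict x,
                double *restrict y)
{
    for (int c = 0; c < A->nchunks; c++) {
        int base = A->chunk_ptr[c];
        /* Walk the chunk column by column ... */
        for (int j = 0; j < A->chunk_len[c]; j++) {
            /* ... and let the compiler vectorize over the C rows of the
             * chunk, which are stored contiguously. */
            #pragma omp simd
            for (int r = 0; r < A->C; r++) {
                int idx = base + j * A->C + r;
                y[c * A->C + r] += A->val[idx] * x[A->col[idx]];
            }
        }
    }
}

The chunk height C would typically be chosen as a multiple of the hardware SIMD width, while the sorting scope σ trades padding overhead against how far the original row order is disturbed.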


Concurrency and Computation: Practice and Experience | 2016

Exploring performance and power properties of modern multi-core chips via simple machine models

Georg Hager; Jan Treibig; Johannes Habich; Gerhard Wellein



Journal of Computational Science | 2011

Efficient multicore-aware parallelization strategies for iterative stencil computations

Jan Treibig; Gerhard Wellein; Georg Hager



Parallel Processing and Applied Mathematics | 2009

Introducing a performance model for bandwidth-limited loop kernels

Jan Treibig; Georg Hager



Parallel Computing | 2011

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Christian Feichtinger; Johannes Habich; Harald Köstler; Georg Hager; Ulrich Rüde; Gerhard Wellein



SIAM Journal on Scientific Computing | 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates

Tareq M. Malas; Georg Hager; Hatem Ltaief; Holger Stengel; Gerhard Wellein; David E. Keyes



IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2010

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

Markus Wittmann; Georg Hager; Gerhard Wellein


Collaboration


Dive into Georg Hager's collaborations.

Top Co-Authors

Gerhard Wellein (University of Erlangen-Nuremberg)
H. Fehske (University of Greifswald)
Thomas Zeiser (University of Erlangen-Nuremberg)
Moritz Kreutzer (University of Erlangen-Nuremberg)
Jan Treibig (University of Erlangen-Nuremberg)
Andreas Pieper (University of Greifswald)
Markus Wittmann (University of Erlangen-Nuremberg)
Faisal Shahzad (University of Erlangen-Nuremberg)