Gerhard Wellein
University of Erlangen-Nuremberg
Publications
Featured research published by Gerhard Wellein.
International Conference on Parallel Processing | 2010
Jan Treibig; Georg Hager; Gerhard Wellein
Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. It also supports toggling hardware prefetchers, includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications, and provides an API for using the performance counting features from user code; we clearly state the differences to the widely used PAPI interface. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips, and show how to detect bandwidth problems on ccNUMA-based compute nodes.
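To make the benchmark referenced above concrete, the following is a minimal C/OpenMP sketch of the STREAM triad kernel whose performance depends strongly on thread pinning and ccNUMA-aware data placement. The array size, the likwid-pin example in the comments, and the bandwidth formula are illustrative assumptions, not taken from the paper; exact LIKWID options depend on the installed version.

/* Minimal OpenMP STREAM triad sketch (illustrative; not the original STREAM code).
   Run pinned to physical cores, e.g. with LIKWID:  likwid-pin -c 0-3 ./triad
   (exact likwid-pin options depend on the LIKWID version). */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 20000000L   /* illustrative array size, ~160 MB per array */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double s = 3.0;

    /* First-touch initialization in parallel so memory pages are placed
       in the ccNUMA domain of the thread that will later use them. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    /* The triad itself: a = b + s*c, purely memory-bandwidth-bound. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    double t1 = omp_get_wtime();

    /* Minimum traffic: 3 arrays x 8 bytes per element (write-allocate ignored). */
    printf("Triad bandwidth: %.2f GB/s\n", 3.0 * 8.0 * N / (t1 - t0) / 1e9);

    free(a); free(b); free(c);
    return 0;
}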
Reviews of Modern Physics | 2006
Alexander Weisse; Andreas Alvermann; H. Fehske; Gerhard Wellein
Efficient and stable algorithms for the calculation of spectral quantities and correlation functions are among the key tools of computational condensed matter physics. In this article we review basic properties and recent developments of Chebyshev-expansion-based algorithms and the Kernel Polynomial Method. Characterized by a resource consumption that scales linearly with the problem dimension, these methods have enjoyed growing popularity over the last decade and have found broad application, not only in physics. We discuss in detail representative examples from the fields of disordered systems, strongly correlated electrons, electron-phonon interaction, and quantum spin systems. In addition, we illustrate how the Kernel Polynomial Method can be successfully embedded into other numerical techniques, such as Cluster Perturbation Theory or Monte Carlo simulation.
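For orientation, the core KPM relations can be sketched as follows; the notation and normalization below are the standard textbook form and are not quoted from the article itself. A function f defined on [-1, 1] is reconstructed from its Chebyshev moments as

\[
f(x) \;\approx\; \frac{1}{\pi\sqrt{1-x^{2}}}\left[\, g_{0}\mu_{0} + 2\sum_{n=1}^{N-1} g_{n}\,\mu_{n}\,T_{n}(x) \right],
\qquad
\mu_{n} = \int_{-1}^{1} f(x)\,T_{n}(x)\,\mathrm{d}x ,
\]

where the Chebyshev polynomials obey \(T_{n+1}(x) = 2x\,T_{n}(x) - T_{n-1}(x)\) and the kernel coefficients \(g_{n}\) (e.g., the Jackson kernel) damp the Gibbs oscillations of the truncated series. For the spectral density of a Hamiltonian \(\tilde H\) rescaled to \([-1,1]\), the moments become traces, \(\mu_{n} = \tfrac{1}{D}\,\mathrm{Tr}\,T_{n}(\tilde H)\), which are estimated stochastically with a few random vectors using the same three-term recurrence; each additional moment then costs essentially one sparse matrix-vector multiplication, which is the origin of the linear scaling with problem dimension mentioned above.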
Computer Software and Applications Conference | 2009
Gerhard Wellein; Georg Hager; Thomas Zeiser; Markus Wittmann; H. Fehske
We present a pipelined wavefront parallelization approach for stencil-based computations. Within a fixed spatial domain, successive wavefronts are executed by threads scheduled to a multicore processor chip with a shared outer-level cache. By re-using data from cache in the successive wavefronts, this multicore-aware parallelization strategy employs temporal blocking in a simple and efficient way. We use the Jacobi algorithm in three dimensions as a prototype for stencil-based computations and demonstrate the efficiency of our approach on the latest generations of Intel's x86 quad- and hexa-core processors.
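For orientation, a plain (non-wavefront) C/OpenMP version of the 3D Jacobi sweep used as the prototype is sketched below. The grid size and array names are illustrative assumptions; the paper's temporal-blocking wavefront scheme, which pipelines several such sweeps through a shared cache, is not shown.

#include <stdlib.h>

#define NX 200   /* illustrative grid dimensions */
#define NY 200
#define NZ 200

/* One plain Jacobi sweep: every inner point is replaced by the average of
   its six neighbors, reading from 'a' and writing to 'b'. The wavefront
   scheme of the paper runs several such sweeps concurrently on shifted
   wavefronts so that data is re-used from the shared outer-level cache. */
static void jacobi_sweep(const double *restrict a, double *restrict b)
{
    #pragma omp parallel for schedule(static)
    for (long k = 1; k < NZ - 1; k++)
        for (long j = 1; j < NY - 1; j++)
            for (long i = 1; i < NX - 1; i++) {
                long c = (k * NY + j) * NX + i;
                b[c] = (1.0 / 6.0) * (a[c - 1] + a[c + 1]
                                    + a[c - NX] + a[c + NX]
                                    + a[c - (long)NX * NY] + a[c + (long)NX * NY]);
            }
}

int main(void)
{
    size_t n = (size_t)NX * NY * NZ;
    double *a = calloc(n, sizeof(double));
    double *b = calloc(n, sizeof(double));
    for (int t = 0; t < 10; t++) {   /* time steps: swap grids after each sweep */
        jacobi_sweep(a, b);
        double *tmp = a; a = b; b = tmp;
    }
    free(a); free(b);
    return 0;
}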
SIAM Journal on Scientific Computing | 2014
Moritz Kreutzer; Georg Hager; Gerhard Wellein; H. Fehske; A. R. Bishop
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-σ, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-σ … (A storage-layout sketch of this format follows the list of entries below.)
Concurrency and Computation: Practice and Experience | 2016
Georg Hager; Jan Treibig; Johannes Habich; Gerhard Wellein
IEEE International Conference on High Performance Computing, Data, and Analytics | 2003
Rolf Rabenseifner; Gerhard Wellein
Journal of Computational Science | 2011
Jan Treibig; Gerhard Wellein; Georg Hager
Parallel Computing | 2011
Christian Feichtinger; Johannes Habich; Harald Köstler; Georg Hager; Ulrich Rüde; Gerhard Wellein
SIAM Journal on Scientific Computing | 2015
Tareq M. Malas; Georg Hager; Hatem Ltaief; Holger Stengel; Gerhard Wellein; David E. Keyes
IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010
Markus Wittmann; Georg Hager; Gerhard Wellein
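As announced above, here is a hedged C sketch of a SELL-C-σ style storage layout and the corresponding spMVM loop. The structure fields, the fixed chunk height C = 4, and the function name are illustrative assumptions, not the paper's reference implementation; the σ-scope row sorting happens at matrix construction time and is only described in the comments.

/* Illustrative SELL-C-σ style spMVM (names and C = 4 are assumptions).
   Rows are grouped into chunks of C consecutive rows (after sorting rows by
   length within windows of σ rows); each chunk is padded to its longest row
   and stored column-major, so the innermost loop over the C rows of a chunk
   maps directly onto SIMD lanes. Padding entries carry val = 0 and a valid
   column index; y is assumed padded to a multiple of C. */
#define CHUNK 4   /* the "C" parameter; typically matched to the SIMD width */

typedef struct {
    long    n_chunks;   /* number of row chunks                          */
    long   *chunk_ptr;  /* start of each chunk in val[] / col[]          */
    long   *chunk_len;  /* padded row length of each chunk               */
    double *val;        /* nonzero values, chunk by chunk, column-major  */
    long   *col;        /* column indices, same layout as val[]          */
} sell_matrix;

/* y = A*x for a matrix stored as sketched above. */
void spmvm_sell(const sell_matrix *A, const double *x, double *y)
{
    for (long c = 0; c < A->n_chunks; c++) {
        double tmp[CHUNK] = {0.0};
        long base = A->chunk_ptr[c];
        for (long j = 0; j < A->chunk_len[c]; j++)
            for (long i = 0; i < CHUNK; i++)   /* SIMD-friendly inner loop */
                tmp[i] += A->val[base + j * CHUNK + i] * x[A->col[base + j * CHUNK + i]];
        for (long i = 0; i < CHUNK; i++)
            y[c * CHUNK + i] = tmp[i];
    }
}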