Gerhard Wellein
University of Erlangen-Nuremberg
Publications
Featured research published by Gerhard Wellein.
International Conference on Parallel Processing | 2010
Jan Treibig; Georg Hager; Gerhard Wellein
Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command-line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. It also supports toggling hardware prefetchers, includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications, and provides an API for using the performance counting features from user code; we clearly state the differences to the widely used PAPI interface. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use the affinity and hardware counter tools to study the performance of a stencil code specifically optimized to utilize shared caches on multicore chips, and show how to detect bandwidth problems on ccNUMA-based compute nodes.
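To make the benchmark referenced above concrete, the following is a minimal C/OpenMP sketch of the STREAM triad kernel whose performance depends strongly on thread pinning and ccNUMA-aware data placement. The array size, the likwid-pin example in the comments, and the bandwidth formula are illustrative assumptions, not taken from the paper; exact LIKWID options depend on the installed version.

/* Minimal OpenMP STREAM triad sketch (illustrative; not the original STREAM code).
   Run pinned to physical cores, e.g. with LIKWID:  likwid-pin -c 0-3 ./triad
   (exact likwid-pin options depend on the LIKWID version). */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 20000000L   /* illustrative array size, ~160 MB per array */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double s = 3.0;

    /* First-touch initialization in parallel so memory pages are placed
       in the ccNUMA domain of the thread that will later use them. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    /* The triad itself: a = b + s*c, purely memory-bandwidth-bound. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    double t1 = omp_get_wtime();

    /* Minimum traffic: 3 arrays x 8 bytes per element (write-allocate ignored). */
    printf("Triad bandwidth: %.2f GB/s\n", 3.0 * 8.0 * N / (t1 - t0) / 1e9);

    free(a); free(b); free(c);
    return 0;
}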
Reviews of Modern Physics | 2006
Alexander Weisse; Andreas Alvermann; H. Fehske; Gerhard Wellein
Efficient and stable algorithms for the calculation of spectral quantities and correlation functions are among the key tools of computational condensed matter physics. In this article we review basic properties and recent developments of Chebyshev-expansion-based algorithms and the Kernel Polynomial Method. Characterized by a resource consumption that scales linearly with the problem dimension, these methods have enjoyed growing popularity over the last decade and have found broad application, not only in physics. We discuss in detail representative examples from the fields of disordered systems, strongly correlated electrons, electron-phonon interaction, and quantum spin systems. In addition, we illustrate how the Kernel Polynomial Method can be successfully embedded into other numerical techniques, such as Cluster Perturbation Theory or Monte Carlo simulation.
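For orientation, the core KPM relations can be sketched as follows; the notation and normalization below are the standard textbook form and are not quoted from the article itself. A function f defined on [-1, 1] is reconstructed from its Chebyshev moments as

\[
f(x) \;\approx\; \frac{1}{\pi\sqrt{1-x^{2}}}\left[\, g_{0}\mu_{0} + 2\sum_{n=1}^{N-1} g_{n}\,\mu_{n}\,T_{n}(x) \right],
\qquad
\mu_{n} = \int_{-1}^{1} f(x)\,T_{n}(x)\,\mathrm{d}x ,
\]

where the Chebyshev polynomials obey \(T_{n+1}(x) = 2x\,T_{n}(x) - T_{n-1}(x)\) and the kernel coefficients \(g_{n}\) (e.g., the Jackson kernel) damp the Gibbs oscillations of the truncated series. For the spectral density of a Hamiltonian \(\tilde H\) rescaled to \([-1,1]\), the moments become traces, \(\mu_{n} = \tfrac{1}{D}\,\mathrm{Tr}\,T_{n}(\tilde H)\), which are estimated stochastically with a few random vectors using the same three-term recurrence; each additional moment then costs essentially one sparse matrix-vector multiplication, which is the origin of the linear scaling with problem dimension mentioned above.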
Computer Software and Applications Conference | 2009
Gerhard Wellein; Georg Hager; Thomas Zeiser; Markus Wittmann; H. Fehske
We present a pipelined wavefront parallelization approach for stencil-based computations. Within a fixed spatial domain, successive wavefronts are executed by threads scheduled to a multicore processor chip with a shared outer-level cache. By re-using data from cache in the successive wavefronts, this multicore-aware parallelization strategy employs temporal blocking in a simple and efficient way. We use the Jacobi algorithm in three dimensions as a prototype for stencil-based computations and demonstrate the efficiency of our approach on the latest generations of Intel's x86 quad- and hexa-core processors.
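For orientation, a plain (non-wavefront) C/OpenMP version of the 3D Jacobi sweep used as the prototype is sketched below. The grid size and array names are illustrative assumptions; the paper's temporal-blocking wavefront scheme, which pipelines several such sweeps through a shared cache, is not shown.

#include <stdlib.h>

#define NX 200   /* illustrative grid dimensions */
#define NY 200
#define NZ 200

/* One plain Jacobi sweep: every inner point is replaced by the average of
   its six neighbors, reading from 'a' and writing to 'b'. The wavefront
   scheme of the paper runs several such sweeps concurrently on shifted
   wavefronts so that data is re-used from the shared outer-level cache. */
static void jacobi_sweep(const double *restrict a, double *restrict b)
{
    #pragma omp parallel for schedule(static)
    for (long k = 1; k < NZ - 1; k++)
        for (long j = 1; j < NY - 1; j++)
            for (long i = 1; i < NX - 1; i++) {
                long c = (k * NY + j) * NX + i;
                b[c] = (1.0 / 6.0) * (a[c - 1] + a[c + 1]
                                    + a[c - NX] + a[c + NX]
                                    + a[c - (long)NX * NY] + a[c + (long)NX * NY]);
            }
}

int main(void)
{
    size_t n = (size_t)NX * NY * NZ;
    double *a = calloc(n, sizeof(double));
    double *b = calloc(n, sizeof(double));
    for (int t = 0; t < 10; t++) {   /* time steps: swap grids after each sweep */
        jacobi_sweep(a, b);
        double *tmp = a; a = b; b = tmp;
    }
    free(a); free(b);
    return 0;
}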
SIAM Journal on Scientific Computing | 2014
Moritz Kreutzer; Georg Hager; Gerhard Wellein; H. Fehske; A. R. Bishop
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-σ, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-σ … (A storage-layout sketch of this format follows the list of entries below.)
Concurrency and Computation: Practice and Experience | 2016
Georg Hager; Jan Treibig; Johannes Habich; Gerhard Wellein
IEEE International Conference on High Performance Computing, Data, and Analytics | 2003
Rolf Rabenseifner; Gerhard Wellein
Journal of Computational Science | 2011
Jan Treibig; Gerhard Wellein; Georg Hager
Parallel Computing | 2011
Christian Feichtinger; Johannes Habich; Harald Köstler; Georg Hager; Ulrich Rüde; Gerhard Wellein
SIAM Journal on Scientific Computing | 2015
Tareq M. Malas; Georg Hager; Hatem Ltaief; Holger Stengel; Gerhard Wellein; David E. Keyes
IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010
Markus Wittmann; Georg Hager; Gerhard Wellein
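As announced above, here is a hedged C sketch of a SELL-C-σ style storage layout and the corresponding spMVM loop. The structure fields, the fixed chunk height C = 4, and the function name are illustrative assumptions, not the paper's reference implementation; the σ-scope row sorting happens at matrix construction time and is only described in the comments.

/* Illustrative SELL-C-σ style spMVM (names and C = 4 are assumptions).
   Rows are grouped into chunks of C consecutive rows (after sorting rows by
   length within windows of σ rows); each chunk is padded to its longest row
   and stored column-major, so the innermost loop over the C rows of a chunk
   maps directly onto SIMD lanes. Padding entries carry val = 0 and a valid
   column index; y is assumed padded to a multiple of C. */
#define CHUNK 4   /* the "C" parameter; typically matched to the SIMD width */

typedef struct {
    long    n_chunks;   /* number of row chunks                          */
    long   *chunk_ptr;  /* start of each chunk in val[] / col[]          */
    long   *chunk_len;  /* padded row length of each chunk               */
    double *val;        /* nonzero values, chunk by chunk, column-major  */
    long   *col;        /* column indices, same layout as val[]          */
} sell_matrix;

/* y = A*x for a matrix stored as sketched above. */
void spmvm_sell(const sell_matrix *A, const double *x, double *y)
{
    for (long c = 0; c < A->n_chunks; c++) {
        double tmp[CHUNK] = {0.0};
        long base = A->chunk_ptr[c];
        for (long j = 0; j < A->chunk_len[c]; j++)
            for (long i = 0; i < CHUNK; i++)   /* SIMD-friendly inner loop */
                tmp[i] += A->val[base + j * CHUNK + i] * x[A->col[base + j * CHUNK + i]];
        for (long i = 0; i < CHUNK; i++)
            y[c * CHUNK + i] = tmp[i];
    }
}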