Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Moritz Kreutzer is active.

Publication


Featured research published by Moritz Kreutzer.


SIAM Journal on Scientific Computing | 2014

A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units

Moritz Kreutzer; Georg Hager; Gerhard Wellein; H. Fehske; A. R. Bishop

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-\sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-\sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wi...
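The layout this abstract describes can be sketched in a few lines of Python. This is an illustrative reconstruction from the paper's description, not the authors' code: rows are sorted by length only within a scope of sigma rows, grouped into chunks of C rows, and each chunk is zero-padded to its longest row so that C row results can be computed in lockstep along the SIMD-friendly axis.

```python
# Illustrative sketch of the SELL-C-sigma layout (not the authors'
# implementation). Rows are sorted by descending length within windows of
# sigma rows to keep the permutation local, then grouped into chunks of C
# rows; each chunk is zero-padded to the width of its longest row and stored
# column-major, so one inner step touches C rows at once.
import numpy as np

def sell_c_sigma(rows, C=4, sigma=8):
    """rows: list of (col_indices, values) per matrix row."""
    n = len(rows)
    order = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r][0]))
        order.extend(window)
    chunks = []
    for start in range(0, n, C):
        chunk_rows = order[start:start + C]
        width = max(len(rows[r][0]) for r in chunk_rows)
        cols = np.zeros((width, len(chunk_rows)), dtype=np.int64)
        vals = np.zeros((width, len(chunk_rows)))
        for j, r in enumerate(chunk_rows):
            ci, v = rows[r]
            cols[:len(ci), j] = ci
            vals[:len(v), j] = v
        chunks.append((chunk_rows, cols, vals))
    return chunks

def spmv(chunks, x, n):
    """y = A*x computed from the SELL-C-sigma chunks."""
    y = np.zeros(n)
    for chunk_rows, cols, vals in chunks:
        # Each inner iteration updates all rows of the chunk in lockstep;
        # zero padding contributes nothing to the sums.
        acc = np.zeros(len(chunk_rows))
        for k in range(vals.shape[0]):
            acc += vals[k] * x[cols[k]]
        y[chunk_rows] = acc
    return y
```

Sorting within a sigma-scope (rather than globally) is what bounds the padding overhead for chunks of C rows while keeping the row permutation close to the original ordering.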


SIAM Journal on Scientific Computing | 2015

Increasing the Performance of the Jacobi--Davidson Method by Blocking

Melven Röhrig-Zöllner; Jonas Thies; Moritz Kreutzer; Andreas Alvermann; Andreas Pieper; Achim Basermann; Georg Hager; Gerhard Wellein; H. Fehske

Block variants of the Jacobi--Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm when it comes to computing multiple or clustered eigenvalues. In practice, however, they are typically avoided because the total number of matrix-vector operations increases. In this paper we present the implementation of a block Jacobi--Davidson solver. By detailed performance engineering and numerical experiments we demonstrate that the increase in operations is typically more than compensated by performance gains through better cache usage on modern CPUs, resulting in a method that is both more efficient and robust than its single vector counterpart. The steps to be taken to achieve a block speedup involve both kernel optimizations for sparse matrix and block vector operations, and algorithmic choices to allow using blocked operations in most parts of the computation. We discuss the aspect of avoiding synchronization in the algorithm and sho...
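The performance argument for blocking rests on the sparse matrix times block-of-vectors kernel (SpMMV). The CSR sketch below is illustrative, not the authors' implementation: each matrix entry is loaded from memory once and used to update all k vectors, so the matrix traffic is paid once per block instead of once per vector, which is where the cache and bandwidth gain comes from.

```python
# Sketch of the SpMMV kernel a block Jacobi-Davidson method relies on.
# Illustrative CSR code, not the authors' implementation: streaming the
# matrix A once for a block of k vectors replaces k separate SpMVs.
import numpy as np

def csr_spmmv(indptr, indices, data, X):
    """Y = A @ X for a CSR matrix A and a block of column vectors X (n x k)."""
    n = len(indptr) - 1
    Y = np.zeros((n, X.shape[1]))
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            # One load of A's entry updates all k vectors (block reuse).
            Y[i] += data[p] * X[indices[p]]
    return Y
```

Running k single-vector SpMVs instead would reload indptr, indices, and data k times; for memory-bound kernels that extra traffic, not the flop count, dominates the runtime.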


International Journal of Parallel Programming | 2017

GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems

Moritz Kreutzer; Jonas Thies; Melven Röhrig-Zöllner; Andreas Pieper; Faisal Shahzad; Martin Galgon; Achim Basermann; H. Fehske; Georg Hager; Gerhard Wellein



Journal of Computational Physics | 2016

High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

Andreas Pieper; Moritz Kreutzer; Andreas Alvermann; Martin Galgon; H. Fehske; Georg Hager; Bruno Lang; Gerhard Wellein



international parallel and distributed processing symposium | 2015

Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems

Moritz Kreutzer; Andreas Pieper; Georg Hager; Gerhard Wellein; Andreas Alvermann; H. Fehske



Parallel Processing Letters | 2013

A Survey of Checkpoint/Restart Techniques on Distributed Memory Systems

Faisal Shahzad; Markus Wittmann; Moritz Kreutzer; Thomas Zeiser; Georg Hager; Gerhard Wellein



International Journal of High Performance Computing Applications | 2018

Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs

Hartwig Anzt; Moritz Kreutzer; Eduardo Ponce; Gregory D. Peterson; Gerhard Wellein; Jack J. Dongarra



international conference on cluster computing | 2015

Building a Fault Tolerant Application Using the GASPI Communication Layer

Faisal Shahzad; Moritz Kreutzer; Thomas Zeiser; Rui Machado; Andreas Pieper; Georg Hager; Gerhard Wellein



european conference on parallel processing | 2014

ESSEX - Equipping Sparse Solvers for Exascale

Andreas Alvermann; Achim Basermann; H. Fehske; Martin Galgon; Georg Hager; Moritz Kreutzer; Lukas Krämer; Bruno Lang; Andreas Pieper; Melven Röhrig-Zöllner; Faisal Shahzad; Jonas Thies; Gerhard Wellein



parallel computing | 2017

Preconditioned Krylov Solvers on GPUs

Hartwig Anzt; Mark Gates; Jack J. Dongarra; Moritz Kreutzer; Gerhard Wellein; Martin Köhler


Collaboration


Dive into Moritz Kreutzer's collaborations.

Top Co-Authors

Gerhard Wellein
University of Erlangen-Nuremberg

Georg Hager
University of Erlangen-Nuremberg

H. Fehske
University of Greifswald

Andreas Pieper
University of Greifswald

Faisal Shahzad
University of Erlangen-Nuremberg

Jonas Thies
German Aerospace Center