Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Achim Basermann is active.

Publication


Featured research published by Achim Basermann.


European Conference on Parallel Processing | 1999

DRAMA: A Library for Parallel Dynamic Load Balancing of Finite Element Applications

Bart Maerten; Dirk Roose; Achim Basermann; Jochen Fingberg; Guy Lonsdale

We describe a software library for dynamic load balancing of finite element codes. The application code has to provide the current distributed mesh and information on the calculation and communication requirements, and receives from the library all necessary information to re-allocate the application data. The library computes a new partitioning, either via direct mesh migration or via parallel graph re-partitioning, by interfacing to the ParMetis or Jostle package. We describe the functionality of the DRAMA library and we present some results.
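
The calling pattern described above, where the application supplies its current distribution plus cost information and receives back the information it needs to re-allocate its data, can be made concrete with a small sketch. The function name, the data layout, and the greedy heuristic below are illustrative assumptions and not the DRAMA interface or its partitioning algorithms; mesh connectivity and communication costs are ignored here.

```python
# Hypothetical sketch of the calling pattern described above: the application
# hands over its current element-to-partition assignment plus per-element costs
# and gets back a migration list telling it where to re-allocate its data.
# This is NOT the DRAMA API; names and the greedy heuristic are illustrative.

def rebalance(partition, cost, n_parts):
    """Return (element, old_part, new_part) migrations that even out the
    summed cost per partition (simple greedy heuristic)."""
    load = [0.0] * n_parts
    for elem, part in enumerate(partition):
        load[part] += cost[elem]
    target = sum(load) / n_parts
    migrations = []
    for elem, part in enumerate(partition):
        if load[part] - cost[elem] >= target:            # source stays loaded enough
            dest = min(range(n_parts), key=lambda p: load[p])
            if dest != part and load[dest] + cost[elem] <= target * 1.05:
                migrations.append((elem, part, dest))
                load[part] -= cost[elem]
                load[dest] += cost[elem]
    return migrations

# Example: 8 elements, 2 partitions, element 3 became expensive after refinement.
partition = [0, 0, 0, 0, 1, 1, 1, 1]
cost      = [1, 1, 1, 5, 1, 1, 1, 1]
for elem, src, dst in rebalance(partition, cost, n_parts=2):
    print(f"migrate element {elem}: partition {src} -> {dst}")
```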


Applied Mathematical Modelling | 2000

Dynamic load-balancing of finite element applications with the DRAMA library

Achim Basermann; Jean Clinckemaillie; Thierry Coupez; Jochen Fingberg; Hugues Digonnet; Richard Ducloux; Jean-Marc Gratien; Ulrich Hartmann; Guy Lonsdale; Bart Maerten; Dirk Roose; Chris Walshaw

The DRAMA library, developed within the European Commission-funded (ESPRIT) project DRAMA, supports dynamic load-balancing for parallel (message-passing) mesh-based applications. The target applications are those with dynamic and solution-adaptive features. The focus within the DRAMA project was on finite element simulation codes for structural mechanics. An introduction to the DRAMA library will illustrate that the very general cost model and the interface designed specifically for application requirements provide simplified and effective access to a range of parallel partitioners. The main body of the paper demonstrates the ability to provide dynamic load-balancing for parallel FEM problems that involve adaptive meshing, re-meshing, and the need for multi-phase partitioning.


Parallel Computing | 1997

Preconditioned CG methods for sparse matrices on massively parallel machines

Achim Basermann; Björn Reichel; Christof Schelthoff

Conjugate gradient (CG) methods for solving sparse systems of linear equations play an important role in numerical methods for discretized partial differential equations. The large size and the conditioning of the systems arising in many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques for the CG method, in particular on massively parallel machines. Here, the data distribution and the communication scheme for the sparse matrix operations of the preconditioned CG are based on the analysis of the indices of the non-zero elements. Polynomial preconditioning is shown to reduce global synchronizations considerably, and a fully local incomplete Cholesky preconditioner is presented. On a PARAGON XP/S 10 with 138 processors, the developed parallel methods markedly outperform diagonally scaled CG with respect to both scaling behavior and execution time for many matrices from real finite element applications.
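
The point about polynomial preconditioning and synchronization can be illustrated with a minimal sketch: applying a truncated Neumann-series preconditioner needs only matrix-vector products and diagonal scalings, which parallelize like the CG iteration itself, in contrast to the triangular solves of a standard incomplete Cholesky factorization. This is a generic illustration, not the paper's implementation; the preconditioner degree, the test matrix, and all function names are assumptions.

```python
# Minimal sketch (not the authors' implementation) of CG with a polynomial
# preconditioner: a truncated Neumann series built from the Jacobi splitting,
# so applying M^{-1} needs only matrix-vector products and diagonal scalings.
import numpy as np
import scipy.sparse as sp

def neumann_apply(A, r, degree=3):
    """Approximate A^{-1} r by sum_{k=0..degree} (I - D^{-1}A)^k D^{-1} r."""
    dinv = 1.0 / A.diagonal()
    c = dinv * r                      # D^{-1} r
    z = c.copy()
    for _ in range(degree):
        z = c + z - dinv * (A @ z)    # z <- D^{-1} r + (I - D^{-1} A) z
    return z

def pcg(A, b, degree=3, tol=1e-8, maxit=500):
    """Preconditioned CG using the polynomial preconditioner above."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = neumann_apply(A, r, degree)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = neumann_apply(A, r, degree)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test system (1-D Laplacian).
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = pcg(A, b)
print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```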


Parallel Computing | 2001

Dynamic multi-partitioning for parallel finite element applications

Achim Basermann; Jochen Fingberg; Guy Lonsdale; Bart Maerten; Chris Walshaw

The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, and corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling and imbalance minimisation. Results from crash simulations are presented which show the necessity for multi-phase/multi-constraint partitioning components.
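
As a hedged illustration of why costs may need to be balanced per computation phase rather than in total (the multi-phase/multi-constraint issue mentioned above), the toy numbers below show that balancing the summed cost of two phases can hide a severe imbalance within one of them. The imbalance measure and the example costs are assumptions for illustration, not the DRAMA cost model.

```python
# Illustrative only (not the DRAMA cost model): a simple imbalance measure and
# why, in a multi-phase computation, it has to be evaluated per phase rather
# than on the summed costs, since the phases are separated by synchronization.
def imbalance(per_part_cost):
    """max load / mean load; 1.0 means perfectly balanced."""
    return max(per_part_cost) / (sum(per_part_cost) / len(per_part_cost))

# Hypothetical costs of 4 subdomains in two phases of one time step.
phase_a = [10.0, 10.0, 10.0, 10.0]   # phase A: balanced across subdomains
phase_b = [ 8.0,  0.0,  0.0,  0.0]   # phase B: concentrated on one subdomain

print("imbalance of summed costs:", imbalance([a + b for a, b in zip(phase_a, phase_b)]))
print("imbalance of phase A     :", imbalance(phase_a))
print("imbalance of phase B     :", imbalance(phase_b))
```

Balancing the summed costs reports a moderate imbalance of 1.5 while phase B alone is imbalanced by a factor of 4, which is why a multi-phase (multi-constraint) partitioning is needed.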


SIAM Journal on Scientific Computing | 2015

Increasing the Performance of the Jacobi–Davidson Method by Blocking

Melven Röhrig-Zöllner; Jonas Thies; Moritz Kreutzer; Andreas Alvermann; Andreas Pieper; Achim Basermann; Georg Hager; Gerhard Wellein; H. Fehske

Block variants of the Jacobi–Davidson method for computing a few eigenpairs of a large sparse matrix are known to improve the robustness of the standard algorithm when it comes to computing multiple or clustered eigenvalues. In practice, however, they are typically avoided because the total number of matrix-vector operations increases. In this paper we present the implementation of a block Jacobi–Davidson solver. By detailed performance engineering and numerical experiments we demonstrate that the increase in operations is typically more than compensated by performance gains through better cache usage on modern CPUs, resulting in a method that is both more efficient and robust than its single-vector counterpart. The steps to be taken to achieve a block speedup involve both kernel optimizations for sparse matrix and block vector operations, and algorithmic choices to allow using blocked operations in most parts of the computation. We discuss the aspect of avoiding synchronization in the algorithm and show …
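
The cache argument for blocking can be demonstrated generically: multiplying a sparse matrix by a block of k vectors at once streams the matrix through memory a single time, whereas k separate matrix-vector products read it k times. The snippet below is such a generic comparison (matrix size, density, and block size are arbitrary assumptions), not the kernels developed in the paper.

```python
# Generic comparison (not the paper's optimized kernels): one sparse matrix
# times a block of k vectors versus k separate sparse matrix-vector products.
# The blocked version passes over the matrix only once.
import time
import numpy as np
import scipy.sparse as sp

n, k = 100_000, 8
A = sp.random(n, n, density=1e-4, format="csr", random_state=0)
V = np.random.rand(n, k)                       # block of k vectors

t0 = time.perf_counter()
W_block = A @ V                                # one sparse matrix-multiple-vector product
t1 = time.perf_counter()
W_single = np.column_stack([A @ V[:, j] for j in range(k)])   # k separate SpMVs
t2 = time.perf_counter()

print(f"blocked product : {t1 - t0:.3f} s")
print(f"{k} single SpMVs : {t2 - t1:.3f} s")
print("results match   :", np.allclose(W_block, W_single))
```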


International Journal of Parallel Programming | 2017

GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems

Moritz Kreutzer; Jonas Thies; Melven Röhrig-Zöllner; Andreas Pieper; Faisal Shahzad; Martin Galgon; Achim Basermann; H. Fehske; Georg Hager; Gerhard Wellein

While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring “standard” as well as “accelerated” resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The “General, Hybrid, and Optimized Sparse Toolkit” (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the “MPI+X” paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack. The library code and several applications are available as open source.
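
As a purely conceptual sketch of the "MPI" level of an MPI+X sparse matrix-vector product (this is not GHOST's API, data structures, or kernels), a row-wise distribution lets each process multiply the columns whose vector entries it owns and only requires the "halo" entries owned by other processes; here the communication is merely simulated by slicing a global vector.

```python
# Conceptual sketch only (not GHOST): row-wise distributed SpMV. Each "rank"
# multiplies its locally owned columns and the halo columns whose x-entries it
# would receive from other ranks; communication is simulated by slicing x.
import numpy as np
import scipy.sparse as sp

n, n_ranks = 12, 3
A = sp.random(n, n, density=0.3, format="csr", random_state=0)
x = np.random.rand(n)

rows = np.array_split(np.arange(n), n_ranks)     # contiguous row block per rank
y = np.empty(n)
for r, owned in enumerate(rows):
    A_local = A[owned, :]                        # rows owned by rank r
    halo = np.setdiff1d(A_local.indices, owned)  # x entries to "receive" from others
    y[owned] = A_local[:, owned] @ x[owned] + A_local[:, halo] @ x[halo]

print("distributed result matches global SpMV:", np.allclose(y, A @ x))
```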


Parallel Computing | 1992

A parallel algorithm for determining all eigenvalues of large real symmetric tridiagonal matrices

Achim Basermann; P. Weidner

A method for determining all eigenvalues of large real symmetric tridiagonal matrices on multiprocessor systems with vector facilities is presented. The Sturm sequence technique is the standard method for finding the eigenvalues of a tridiagonal matrix. The method first uses bisection to isolate all eigenvalues and then extracts the eigenvalues to a predefined accuracy. For extracting the eigenvalues, bisection is accelerated by a superlinearly convergent zero finder, the Pegasus method. The evaluation of the Sturm sequence is the central component for both isolation and extraction. Some new ideas are presented, such as a method for weighting the values of the characteristic polynomial to avoid under- or overflow, a method for combining the Pegasus method with preceding bisection steps, and a vectorization and parallelization strategy over intervals. The method was implemented and the results were measured on a SUPRENUM multiprocessor system with 16 processors and on a CRAY Y-MP8/832 with 8 processors. On the latter machine, both the sequential and parallel execution times of our algorithm ALLEV (ALL Eigen Values) presented in this paper are considerably shorter than the execution times of the vectorized EISPACK routine TQL1, which uses the QL method.
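
The Sturm-sequence counting that underlies isolation and extraction by bisection can be sketched as follows. The sketch deliberately omits the paper's contributions, i.e. the weighting against under-/overflow, the combination with the Pegasus zero finder, and the vectorization over intervals; the function names and the test matrix are illustrative assumptions.

```python
# Generic sketch of Sturm-sequence counting and bisection for a symmetric
# tridiagonal matrix (diagonal alpha, off-diagonal beta). It does not reproduce
# the paper's weighting, Pegasus acceleration, or vectorization over intervals.
import numpy as np

def sturm_count(alpha, beta, x):
    """Number of eigenvalues smaller than x (count of negative pivots in the
    LDL^T factorization of T - x*I)."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        d = alpha[i] - x - b2 / d
        if d == 0.0:                 # avoid division by zero in the next step
            d = -1e-300
        if d < 0.0:
            count += 1
    return count

def bisect_eigenvalue(alpha, beta, k, lo, hi, tol=1e-12):
    """Extract the k-th smallest eigenvalue inside (lo, hi) by plain bisection."""
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(alpha, beta, mid) >= k + 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Example: 1-D Laplacian stencil; eigenvalues 2 - 2*cos(j*pi/(n+1)) are known.
n = 50
alpha, beta = np.full(n, 2.0), np.full(n - 1, -1.0)
approx = bisect_eigenvalue(alpha, beta, k=0, lo=0.0, hi=4.0)
exact = 2.0 - 2.0 * np.cos(np.pi / (n + 1))
print(f"smallest eigenvalue: {approx:.12f} (exact {exact:.12f})")
```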


Parallel Computing | 2015

On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues

Martin Galgon; Lukas Krämer; Jonas Thies; Achim Basermann; Bruno Lang

Highlights: parallel iterative solution of linear systems from the FEAST algorithm; hybrid parallel implementation; CG variant with a multi-coloring approach for better performance on hybrid systems.

Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only a few words are said about the solution of the linear systems, but they turn out to be very hard to solve by iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm's potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.
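
A row projection method, in its simplest Kaczmarz-type form, updates the iterate by projecting it onto the hyperplane of one equation at a time, so only row-wise access to the shifted matrix is needed and no factorization is performed. The sketch below shows this basic sweep on a small, deliberately well-conditioned shifted system; it is not the specific row projection variant or the hybrid parallel implementation of the paper, and the shift and matrix are toy choices (the FEAST inner systems are typically far harder, as the abstract notes).

```python
# Generic Kaczmarz-type row projection sweep (illustrative; not the variant
# used in the paper). Each step projects the iterate onto the hyperplane of
# one row of (A - sigma*I) x = b.
import numpy as np
import scipy.sparse as sp

def kaczmarz(A, b, sweeps=500):
    """Cyclic Kaczmarz sweeps for A x = b; A is dense here purely for brevity
    (a real implementation works on the sparse rows)."""
    A = np.asarray(A)
    x = np.zeros(A.shape[1])
    row_norm2 = (A * A).sum(axis=1)
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            x += (b[i] - A[i] @ x) / row_norm2[i] * A[i]
    return x

# Toy shifted system; sigma lies outside the spectrum so the sweep converges
# quickly, unlike the interior shifts arising in FEAST.
n, sigma = 40, -1.0
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
A_shift = (A - sigma * sp.identity(n, format="csr")).toarray()
b = np.ones(n)
x = kaczmarz(A_shift, b)
print("relative residual:", np.linalg.norm(b - A_shift @ x) / np.linalg.norm(b))
```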


Archive | 2011

HICFD: Highly Efficient Implementation of CFD Codes for HPC Many-Core Architectures

Achim Basermann; Hans-Peter Kersken; Andreas Schreiber; Thomas Gerhold; Jens Jägersküpper; Norbert Kroll; Jan Backhaus; Edmund Kügeler; Thomas Alrutz; Christian Simmendinger; Kim Feldhoff; Olaf Krzikalla; Ralph Müller-Pfefferkorn; Mathias Puetz; Petra Aumann; Olaf Knobloch; Jörg Hunger; Carsten Zscherp

The objective of the German BMBF research project Highly Efficient Implementation of CFD Codes for HPC Many-Core Architectures (HICFD) is to develop new methods and tools for analyzing and optimizing the performance of parallel computational fluid dynamics (CFD) codes on high performance computer systems with many-core processors. The project's work packages investigate how the performance of parallel CFD codes written in C can be increased by optimal use of all levels of parallelism. On the highest level, the Message Passing Interface (MPI) is utilized. Furthermore, on the level of the many-core architecture, highly scaling hybrid OpenMP/MPI methods are implemented. On the level of the processor cores, the parallel Single Instruction Multiple Data (SIMD) units provided by modern CPUs are exploited.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2004

Parallel generalized finite element method for magnetic multiparticle problems

Achim Basermann; Igor Tsukerman

A parallel version of the Generalized Finite Element Method is applied to multiparticle problems. The main advantage of the method is that only a regular hexahedral grid is needed; the particles do not have to be meshed and are represented by special basis functions approximating the field behavior near the particles. A general-purpose parallel Schur complement solver with incomplete LU preconditioning (A. Basermann) showed excellent performance for varying problem sizes, numbers of processors, and numbers of particles. In fact, the scaling of the computational time with respect to the number of processors was slightly superlinear due to cache effects. Future research plans include a parallel implementation of the new Flexible Local Approximation MEthod (FLAME), which incorporates desirable local approximating functions (e.g. dipole harmonics near particles) into the difference scheme.
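
The Schur complement reduction that such a solver is built on can be written down for a 2x2 block system: eliminating the first block leaves a smaller system in the remaining unknowns. The dense, serial sketch below only illustrates this algebra; the actual solver is parallel and uses incomplete LU preconditioning, and the block sizes and matrix here are arbitrary test data.

```python
# Minimal serial illustration of the Schur complement reduction (the blocks are
# dense and solved directly just to show the algebra; not the actual solver).
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 6, 4                                # sizes of the two unknown blocks (toy)
K = rng.random((n1 + n2, n1 + n2))
K = K + K.T + (n1 + n2) * np.eye(n1 + n2)    # symmetric, safely nonsingular
f = rng.random(n1 + n2)

A, B = K[:n1, :n1], K[:n1, n1:]
C, D = K[n1:, :n1], K[n1:, n1:]
f1, f2 = f[:n1], f[n1:]

# Eliminate the first block: S = D - C A^{-1} B, g = f2 - C A^{-1} f1.
A_inv_B = np.linalg.solve(A, B)
A_inv_f1 = np.linalg.solve(A, f1)
S = D - C @ A_inv_B
x2 = np.linalg.solve(S, f2 - C @ A_inv_f1)   # solve the reduced (Schur) system
x1 = A_inv_f1 - A_inv_B @ x2                 # back-substitute for the first block

print("matches direct solve:",
      np.allclose(np.concatenate([x1, x2]), np.linalg.solve(K, f)))
```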

Collaboration


Dive into Achim Basermann's collaborations.

Top Co-Authors

Jonas Thies, German Aerospace Center
Georg Hager, University of Erlangen-Nuremberg
H. Fehske, University of Greifswald
Gerhard Wellein, University of Erlangen-Nuremberg
Moritz Kreutzer, University of Erlangen-Nuremberg
Andreas Pieper, University of Greifswald
Bruno Lang, University of Wuppertal