Gundolf Haase
University of Graz
Publications
Featured research published by Gundolf Haase.
IEEE Transactions on Biomedical Engineering | 2007
Gernot Plank; Manfred Liebmann; R.W. dos Santos; Edward J. Vigmond; Gundolf Haase
The bidomain equations are considered to be one of the most complete descriptions of the electrical activity in cardiac tissue, but large-scale simulations, such as those resulting from discretization of an entire heart, remain a computational challenge due to the elliptic portion of the problem, the part associated with solving the extracellular potential. In such cases, the use of iterative solvers and parallel computing environments is mandatory to make parameter studies feasible. The preconditioned conjugate gradient (PCG) method is a standard choice for this problem. Although robust, its efficiency greatly depends on the choice of preconditioner. On structured grids, it has been demonstrated that a geometric multigrid preconditioner performs significantly better than an incomplete LU (ILU) preconditioner. However, unstructured grids are often preferred to better represent organ boundaries and allow for coarser discretization in the bath far from cardiac surfaces. Under these circumstances, algebraic multigrid (AMG) methods are advantageous since they compute coarser levels directly from the system matrix itself, thus avoiding the complexity of explicitly generating coarser, geometric grids. In this paper, the performance of an AMG preconditioner (BoomerAMG) is compared with that of the standard ILU preconditioner and a direct solver. BoomerAMG is used in two different ways, as a preconditioner and as a standalone solver. Two 3-D simulation examples modeling the induction of arrhythmias in rabbit ventricles were used to measure performance in both sequential and parallel simulations. It is shown that the AMG preconditioner is very well suited for the solution of the bidomain equations, being clearly superior to ILU preconditioning in all regards, with speedups by factors in the range 5.9-7.7.
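The PCG method named in the abstract follows the standard textbook pattern; the sketch below is a minimal serial C++ version, not the paper's code, with the preconditioner left abstract so that an AMG V-cycle (such as BoomerAMG) or an ILU factorization could be plugged in as `apply_M`. All names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned conjugate gradients for a symmetric positive definite
// operator A. `apply_A` computes y = A*x; `apply_M` applies the
// preconditioner, z = M^{-1}*r (e.g. one AMG V-cycle).
Vec pcg(const std::function<Vec(const Vec&)>& apply_A,
        const std::function<Vec(const Vec&)>& apply_M,
        const Vec& b, double tol = 1e-8, int max_iter = 1000) {
    Vec x(b.size(), 0.0);
    Vec r = b;                         // r = b - A*0
    Vec z = apply_M(r);
    Vec p = z;
    double rz = dot(r, z);
    for (int k = 0; k < max_iter && std::sqrt(dot(r, r)) > tol; ++k) {
        Vec Ap = apply_A(p);
        double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];      // update iterate
            r[i] -= alpha * Ap[i];     // update residual
        }
        z = apply_M(r);                // preconditioning step
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
    }
    return x;
}
```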
Computing | 1991
Gundolf Haase; Ulrich Langer; Arnd Meyer
We present a new approach to the construction of Domain Decomposition (DD) preconditioners for the conjugate gradient method applied to the solution of symmetric and positive definite finite element equations. The DD technique is based on a non-overlapping decomposition of the domain Ω into p subdomains, which are later assigned to the p processors of a MIMD computer. The DD preconditioner derived contains three block matrices which must be specified for the specific problem considered. One of the matrices is used for the transformation of the nodal finite element basis into the approximate discrete harmonic basis. The other two matrices are block preconditioners for the Dirichlet problems arising on the subdomains and for a modified Schur complement defined over all nodes on the coupling boundaries between the subdomains. The relative spectral condition number is estimated. Relations to the additive Schwarz method are discussed. In the second part of this paper, we will apply these results to two-dimensional, symmetric, second-order, elliptic boundary value problems and present numerical results obtained on a transputer network.
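For orientation, the three block matrices of the abstract can be written out in the common textbook DD notation (not necessarily the paper's own symbols). Ordering the unknowns into coupling-boundary nodes (subscript C) and subdomain-interior nodes (subscript I),

\[
K = \begin{pmatrix} K_C & K_{CI} \\ K_{IC} & K_I \end{pmatrix},
\qquad
S_C = K_C - K_{CI}\,K_I^{-1}K_{IC},
\]

and a preconditioner of the common form

\[
C^{-1} =
\begin{pmatrix} I_C & 0 \\ E & I_I \end{pmatrix}
\begin{pmatrix} C_C^{-1} & 0 \\ 0 & C_I^{-1} \end{pmatrix}
\begin{pmatrix} I_C & E^{T} \\ 0 & I_I \end{pmatrix},
\]

where \( E \approx -K_I^{-1}K_{IC} \) realizes the basis transformation (an approximate discrete harmonic extension), \( C_I \approx K_I \) preconditions the interior Dirichlet problems, and \( C_C \approx S_C \) the modified Schur complement. If all three are chosen exactly, \( C^{-1} = K^{-1} \), which is why good approximations yield a small relative spectral condition number.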
IEEE International Conference on High Performance Computing Data and Analytics | 2009
Gundolf Haase; Manfred Liebmann; Craig C. Douglas; Gernot Plank
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node InfiniBand cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core.
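The paper's GPU multiplication scheme itself is more elaborate (data is laid out for coalesced memory access across many cores); shown below, as a C++ sketch with illustrative names, is only the baseline compressed-row-storage product that such kernels parallelize. On a GPU, the outer loop over rows typically becomes the thread index.

```cpp
#include <vector>

// Baseline compressed-row-storage (CRS) matrix-vector product y = A*x.
// A GPU kernel assigns one thread (or warp) per row i; the multi-GPU
// scheme in the paper additionally reorders data for coalesced access.
void spmv_crs(const std::vector<int>& row_ptr,   // size n+1
              const std::vector<int>& col_idx,   // size nnz
              const std::vector<double>& val,    // size nnz
              const std::vector<double>& x,
              std::vector<double>& y) {
    const int n = static_cast<int>(row_ptr.size()) - 1;
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```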
Archive | 2003
Craig C. Douglas; Gundolf Haase; Ulrich Langer
From the Publisher: This compact yet thorough tutorial is the perfect introduction to the basic concepts of solving partial differential equations (PDEs) using parallel numerical methods. In just eight short chapters, the authors provide readers with enough basic knowledge of PDEs, discretization methods, solution techniques, parallel computers, parallel programming, and the run-time behavior of parallel algorithms to allow them to understand, develop, and implement parallel PDE solvers. Examples throughout the book are intentionally kept simple so that the parallelization strategies are not dominated by technical details. A Tutorial on Elliptic PDE Solvers and Their Parallelization is a valuable aid for learning about the possible errors and bottlenecks in parallel computing. One of the highlights of the tutorial is that the course material can run on a laptop, not just on a parallel computer or cluster of PCs, thus allowing readers to experience their first successes in parallel computing in a relatively short amount of time. This tutorial is intended for advanced undergraduate and graduate students in computational sciences and engineering; however, it may also be helpful to professionals who use PDE-based parallel computer simulations in the field.
International Conference on High Performance Computing and Simulation | 2009
Ronan M. Amorim; Gundolf Haase; Manfred Liebmann; Rodrigo Weber dos Santos
The use of the GPU as a general-purpose processor is becoming more popular, and there are different approaches to this kind of programming. In this paper we present a comparison between different implementations of the OpenGL and CUDA approaches for solving our test case, a weighted Jacobi iteration with a structured matrix originating from a finite element discretization of the elliptic PDE part of the cardiac bidomain equations. The CUDA approach using textures proved to be the fastest, with a speedup of 31 over a CPU implementation using one core and SSE. CUDA proved to be an efficient and easy way of programming GPUs for general-purpose problems, though it also makes it easy to write inefficient code.
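As background, one sweep of a weighted (damped) Jacobi iteration is sketched below in C++ for a general CRS matrix; the paper exploits the structured matrix pattern, so this generic version and its names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// One sweep of weighted Jacobi for A*x = b in CRS format:
//   x_new[i] = x[i] + omega * (b[i] - (A*x)[i]) / A[i][i]
// Each row is updated independently, which is what makes the method
// attractive on GPUs; omega in the range 0.6-0.8 is a typical damping choice.
void weighted_jacobi(const std::vector<int>& row_ptr,
                     const std::vector<int>& col_idx,
                     const std::vector<double>& val,
                     const std::vector<double>& b,
                     std::vector<double>& x, double omega) {
    const int n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<double> x_new(n);
    for (int i = 0; i < n; ++i) {
        double Ax_i = 0.0, diag = 1.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            Ax_i += val[k] * x[col_idx[k]];
            if (col_idx[k] == i) diag = val[k];  // pick out A[i][i]
        }
        x_new[i] = x[i] + omega * (b[i] - Ax_i) / diag;
    }
    x.swap(x_new);
}
```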
IEEE Transactions on Biomedical Engineering | 2012
Aurel Neic; Manfred Liebmann; Elena Hoetzl; Lawrence Mitchell; Edward J. Vigmond; Gundolf Haase; Gernot Plank
Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally very demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are required to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
SIAM Journal on Scientific Computing | 2002
Gundolf Haase; Michael Kuhn; Stefan Reitzinger
Algebraic multigrid methods are well suited as preconditioners for iterative solvers. We consider linear systems of equations which are sparse and symmetric positive definite and which stem from a finite element discretization of a second-order self-adjoint elliptic partial differential equation or a system of such equations. Since preconditioners based on algebraic multigrid are very efficient, additional speedup can only be achieved by parallelization. In this paper, we propose a general parallel algebraic multigrid algorithm for finite element discretizations based on domain decomposition ideas which is well suited for distributed memory computers. This paper pays special attention to the coarsening strategy, which has to be adapted in the parallel case. Moreover, a general framework of data distribution gives rise to a construction scheme for the prolongation operators. Results of numerical studies on parallel computers with distributed memory are presented, showing the high efficiency of the approach.
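For context, a generic multigrid V-cycle, the operation an AMG setup ultimately feeds, can be sketched as below; how the prolongation P, restriction R, and Galerkin coarse operators are constructed and distributed across processors is precisely the paper's subject and is abstracted away here. The sketch is illustrative, not the paper's implementation.

```cpp
#include <functional>
#include <vector>

// Matrix-free description of one AMG level. Building P, R = P^T and the
// coarse operator A_c = R*A*P from the system matrix is the AMG-specific
// (and, in the paper, parallelized) part.
struct Level {
    std::function<void(std::vector<double>&, const std::vector<double>&)> smooth;  // x <- smooth(x; b)
    std::function<std::vector<double>(const std::vector<double>&)> apply_A;        // y = A*x
    std::function<std::vector<double>(const std::vector<double>&)> restrict_r;     // r_c = R*r
    std::function<std::vector<double>(const std::vector<double>&)> prolongate;     // e = P*e_c
};

// One V-cycle on levels[l..coarsest]; the coarsest level is solved by smoothing.
void v_cycle(const std::vector<Level>& levels, std::size_t l,
             std::vector<double>& x, const std::vector<double>& b) {
    if (l + 1 == levels.size()) { levels[l].smooth(x, b); return; }
    levels[l].smooth(x, b);                              // pre-smoothing
    std::vector<double> Ax = levels[l].apply_A(x);
    std::vector<double> r(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) r[i] = b[i] - Ax[i];
    std::vector<double> r_c = levels[l].restrict_r(r);   // restrict residual
    std::vector<double> e_c(r_c.size(), 0.0);
    v_cycle(levels, l + 1, e_c, r_c);                    // coarse-grid correction
    std::vector<double> e = levels[l].prolongate(e_c);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += e[i];
    levels[l].smooth(x, b);                              // post-smoothing
}
```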
International Journal of Parallel, Emergent and Distributed Systems | 2007
Gundolf Haase; Manfred Liebmann; Gernot Plank
We investigate a new storage format for unstructured sparse matrices based on the space-filling Hilbert curve. Numerical tests with matrix-vector multiplication show the potential of the fractal storage (FS) format in comparison to the traditional compressed row storage (CRS) format. The FS format outperforms the CRS format by up to 50% for matrix-vector multiplications with multiple right-hand sides.
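The key ingredient of such a format is mapping each nonzero's (row, column) position to its position along the Hilbert curve; the classic bit-manipulation conversion (a standard algorithm, not taken from the paper) is sketched below in C++. Sorting the nonzeros by this key groups entries that are close in both row and column index, which is the cache locality the FS format exploits.

```cpp
#include <cstdint>

// Map a coordinate (x, y) on an n-by-n grid (n a power of two) to its
// index along the Hilbert curve. Using x = row and y = column of a
// nonzero gives a sort key that clusters nearby matrix entries.
uint64_t hilbert_index(uint32_t n, uint32_t x, uint32_t y) {
    uint64_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2) {
        uint32_t rx = (x & s) > 0;
        uint32_t ry = (y & s) > 0;
        d += static_cast<uint64_t>(s) * s * ((3 * rx) ^ ry);
        if (ry == 0) {                 // rotate/reflect the quadrant so
            if (rx == 1) {             // the curve stays continuous
                x = n - 1 - x;
                y = n - 1 - y;
            }
            uint32_t t = x; x = y; y = t;  // swap x and y
        }
    }
    return d;
}
```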
Parallel Computing | 1998
Gundolf Haase
The paper analyses various parallel incomplete factorizations based on the non-overlapping domain decomposition. The general framework is applied to the investigation of the preconditioning step in CG-like methods. Under certain conditions imposed on the finite element mesh, all matrix and vector types given by the special data distribution can be used in the matrix-vector multiplications. Not only do the well-known domain decomposition preconditioners fit into the concept, but parallelized global incomplete factorizations are also feasible. Additionally, those global incomplete factorizations can be used as smoothers in parallel multigrid methods. Numerical results on a parallel machine with distributed memory are presented.
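As a reference point for the factorizations being parallelized: ILU(0), the simplest incomplete factorization, allows fill-in only where the matrix already has a nonzero. The C++ sketch below uses dense storage purely for clarity (a real solver works directly on the CRS pattern) and is illustrative, not the paper's method.

```cpp
#include <vector>

// ILU(0): LU factorization restricted to the nonzero pattern of A.
// Here the pattern is taken as the explicitly nonzero entries of a;
// a production code stores the sparsity pattern separately. The unit
// lower factor L and upper factor U are written in-place into a.
void ilu0(std::vector<std::vector<double>>& a) {
    const std::size_t n = a.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = k + 1; i < n; ++i) {
            if (a[i][k] == 0.0) continue;      // respect the sparsity pattern
            a[i][k] /= a[k][k];                // entry of L
            for (std::size_t j = k + 1; j < n; ++j)
                if (a[i][j] != 0.0)            // update existing nonzeros only
                    a[i][j] -= a[i][k] * a[k][j];
        }
}
```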
Computing | 1991
Gundolf Haase; Ulrich Langer; Arnd Meyer
In the first part of this article series, we derived Domain Decomposition (DD) preconditioners containing three block matrices which must be specified for specific applications. In the present paper, we consider finite element equations arising from the DD discretization of plane, symmetric, second-order, elliptic boundary value problems and specify the matrices involved in the preconditioner via multigrid and hierarchical techniques. The resulting DD-PCCG methods are asymptotically almost optimal with respect to the operation count and well suited for parallel computations on MIMD computers with local memory and message passing. The numerical experiments performed on a transputer hypercube confirm the efficiency of the proposed DD preconditioners.