Gundolf Haase
University of Graz
Publications
Featured research published by Gundolf Haase.
IEEE Transactions on Biomedical Engineering | 2007
Gernot Plank; Manfred Liebmann; R.W. dos Santos; Edward J. Vigmond; Gundolf Haase
The bidomain equations are considered to be one of the most complete descriptions of the electrical activity in cardiac tissue, but large-scale simulations, such as those resulting from discretization of an entire heart, remain a computational challenge due to the elliptic portion of the problem, the part associated with solving the extracellular potential. In such cases, the use of iterative solvers and parallel computing environments is mandatory to make parameter studies feasible. The preconditioned conjugate gradient (PCG) method is a standard choice for this problem. Although robust, its efficiency greatly depends on the choice of preconditioner. On structured grids, it has been demonstrated that a geometric multigrid preconditioner performs significantly better than an incomplete LU (ILU) preconditioner. However, unstructured grids are often preferred to better represent organ boundaries and allow for coarser discretization in the bath far from cardiac surfaces. Under these circumstances, algebraic multigrid (AMG) methods are advantageous since they compute coarser levels directly from the system matrix itself, thus avoiding the complexity of explicitly generating coarser, geometric grids. In this paper, the performance of an AMG preconditioner (BoomerAMG) is compared with that of the standard ILU preconditioner and a direct solver. BoomerAMG is used in two different ways, as a preconditioner and as a standalone solver. Two 3-D simulation examples modeling the induction of arrhythmias in rabbit ventricles were used to measure performance in both sequential and parallel simulations. It is shown that the AMG preconditioner is very well suited for the solution of the bidomain equations, being clearly superior to ILU preconditioning in all regards, with speedups by factors in the range 5.9-7.7.
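The PCG method named in the abstract follows the standard textbook pattern; the sketch below is a minimal serial C++ version, not the paper's code, with the preconditioner left abstract so that an AMG V-cycle (such as BoomerAMG) or an ILU factorization could be plugged in as `apply_M`. All names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned conjugate gradients for a symmetric positive definite
// operator A. `apply_A` computes y = A*x; `apply_M` applies the
// preconditioner, z = M^{-1}*r (e.g. one AMG V-cycle).
Vec pcg(const std::function<Vec(const Vec&)>& apply_A,
        const std::function<Vec(const Vec&)>& apply_M,
        const Vec& b, double tol = 1e-8, int max_iter = 1000) {
    Vec x(b.size(), 0.0);
    Vec r = b;                         // r = b - A*0
    Vec z = apply_M(r);
    Vec p = z;
    double rz = dot(r, z);
    for (int k = 0; k < max_iter && std::sqrt(dot(r, r)) > tol; ++k) {
        Vec Ap = apply_A(p);
        double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];      // update iterate
            r[i] -= alpha * Ap[i];     // update residual
        }
        z = apply_M(r);                // preconditioning step
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
    }
    return x;
}
```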
Computing | 1991
Gundolf Haase; Ulrich Langer; Arnd Meyer
We present a new approach to the construction of Domain Decomposition (DD) preconditioners for the conjugate gradient method applied to the solution of symmetric and positive definite finite element equations. The DD technique is based on a non-overlapping decomposition of the domain Ω into p subdomains, which are later assigned to the p processors of a MIMD computer. The DD preconditioner derived contains three block matrices which must be specified for the specific problem considered. One of the matrices is used for the transformation of the nodal finite element basis into the approximate discrete harmonic basis. The other two matrices are block preconditioners for the Dirichlet problems arising on the subdomains and for a modified Schur complement defined over all nodes on the coupling boundaries between the subdomains. The relative spectral condition number is estimated. Relations to the additive Schwarz method are discussed. In the second part of this paper, we will apply these results to two-dimensional, symmetric, second-order, elliptic boundary value problems and present numerical results obtained on a transputer network.
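For orientation, the three block matrices of the abstract can be written out in the common textbook DD notation (not necessarily the paper's own symbols). Ordering the unknowns into coupling-boundary nodes (subscript C) and subdomain-interior nodes (subscript I),

\[
K = \begin{pmatrix} K_C & K_{CI} \\ K_{IC} & K_I \end{pmatrix},
\qquad
S_C = K_C - K_{CI}\,K_I^{-1}K_{IC},
\]

and a preconditioner of the common form

\[
C^{-1} =
\begin{pmatrix} I_C & 0 \\ E & I_I \end{pmatrix}
\begin{pmatrix} C_C^{-1} & 0 \\ 0 & C_I^{-1} \end{pmatrix}
\begin{pmatrix} I_C & E^{T} \\ 0 & I_I \end{pmatrix},
\]

where \( E \approx -K_I^{-1}K_{IC} \) realizes the basis transformation (an approximate discrete harmonic extension), \( C_I \approx K_I \) preconditions the interior Dirichlet problems, and \( C_C \approx S_C \) the modified Schur complement. If all three are chosen exactly, \( C^{-1} = K^{-1} \), which is why good approximations yield a small relative spectral condition number.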
IEEE International Conference on High Performance Computing Data and Analytics | 2009
Gundolf Haase; Manfred Liebmann; Craig C. Douglas; Gernot Plank
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node InfiniBand cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core.
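The paper's GPU multiplication scheme itself is more elaborate (data is laid out for coalesced memory access across many cores); shown below, as a C++ sketch with illustrative names, is only the baseline compressed-row-storage product that such kernels parallelize. On a GPU, the outer loop over rows typically becomes the thread index.

```cpp
#include <vector>

// Baseline compressed-row-storage (CRS) matrix-vector product y = A*x.
// A GPU kernel assigns one thread (or warp) per row i; the multi-GPU
// scheme in the paper additionally reorders data for coalesced access.
void spmv_crs(const std::vector<int>& row_ptr,   // size n+1
              const std::vector<int>& col_idx,   // size nnz
              const std::vector<double>& val,    // size nnz
              const std::vector<double>& x,
              std::vector<double>& y) {
    const int n = static_cast<int>(row_ptr.size()) - 1;
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```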
Archive | 2003
Craig C. Douglas; Gundolf Haase; Ulrich Langer
From the Publisher: This compact yet thorough tutorial is the perfect introduction to the basic concepts of solving partial differential equations (PDEs) using parallel numerical methods. In just eight short chapters, the authors provide readers with enough basic knowledge of PDEs, discretization methods, solution techniques, parallel computers, parallel programming, and the run-time behavior of parallel algorithms to allow them to understand, develop, and implement parallel PDE solvers. Examples throughout the book are intentionally kept simple so that the parallelization strategies are not dominated by technical details. A Tutorial on Elliptic PDE Solvers and Their Parallelization is a valuable aid for learning about the possible errors and bottlenecks in parallel computing. One of the highlights of the tutorial is that the course material can run on a laptop, not just on a parallel computer or cluster of PCs, thus allowing readers to experience their first successes in parallel computing in a relatively short amount of time. This tutorial is intended for advanced undergraduate and graduate students in computational sciences and engineering; however, it may also be helpful to professionals who use PDE-based parallel computer simulations in the field.
International Conference on High Performance Computing and Simulation | 2009
Ronan M. Amorim; Gundolf Haase; Manfred Liebmann; Rodrigo Weber dos Santos
The use of the GPU as a general-purpose processor is becoming more popular, and there are different approaches to this kind of programming. In this paper we present a comparison between different implementations of the OpenGL and CUDA approaches for solving our test case, a weighted Jacobi iteration with a structured matrix originating from a finite element discretization of the elliptic PDE part of the cardiac bidomain equations. The CUDA approach using textures proved to be the fastest, with a speedup of 31 over a CPU implementation using one core and SSE. CUDA proved to be an efficient and easy way of programming GPUs for general-purpose problems, though it also makes it easy to write inefficient code.
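As background, one sweep of a weighted (damped) Jacobi iteration is sketched below in C++ for a general CRS matrix; the paper exploits the structured matrix pattern, so this generic version and its names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// One sweep of weighted Jacobi for A*x = b in CRS format:
//   x_new[i] = x[i] + omega * (b[i] - (A*x)[i]) / A[i][i]
// Each row is updated independently, which is what makes the method
// attractive on GPUs; omega in the range 0.6-0.8 is a typical damping choice.
void weighted_jacobi(const std::vector<int>& row_ptr,
                     const std::vector<int>& col_idx,
                     const std::vector<double>& val,
                     const std::vector<double>& b,
                     std::vector<double>& x, double omega) {
    const int n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<double> x_new(n);
    for (int i = 0; i < n; ++i) {
        double Ax_i = 0.0, diag = 1.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            Ax_i += val[k] * x[col_idx[k]];
            if (col_idx[k] == i) diag = val[k];  // pick out A[i][i]
        }
        x_new[i] = x[i] + omega * (b[i] - Ax_i) / diag;
    }
    x.swap(x_new);
}
```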
IEEE Transactions on Biomedical Engineering | 2012
Aurel Neic; Manfred Liebmann; Elena Hoetzl; Lawrence Mitchell; Edward J. Vigmond; Gundolf Haase; Gernot Plank
Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally very demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are required to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
SIAM Journal on Scientific Computing | 2002
Gundolf Haase; Michael Kuhn; Stefan Reitzinger
Algebraic multigrid methods are well suited as preconditioners for iterative solvers. We consider linear systems of equations which are sparse and symmetric positive definite and which stem from a finite element discretization of a second-order self-adjoint elliptic partial differential equation or a system of such equations. Since preconditioners based on algebraic multigrid are very efficient, additional speedup can only be achieved by parallelization. In this paper, we propose a general parallel algebraic multigrid algorithm for finite element discretizations based on domain decomposition ideas which is well suited for distributed memory computers. This paper pays special attention to the coarsening strategy, which has to be adapted in the parallel case. Moreover, a general framework of data distribution gives rise to a construction scheme for the prolongation operators. Results of numerical studies on parallel computers with distributed memory are presented, showing the high efficiency of the approach.
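For context, a generic multigrid V-cycle, the operation an AMG setup ultimately feeds, can be sketched as below; how the prolongation P, restriction R, and Galerkin coarse operators are constructed and distributed across processors is precisely the paper's subject and is abstracted away here. The sketch is illustrative, not the paper's implementation.

```cpp
#include <functional>
#include <vector>

// Matrix-free description of one AMG level. Building P, R = P^T and the
// coarse operator A_c = R*A*P from the system matrix is the AMG-specific
// (and, in the paper, parallelized) part.
struct Level {
    std::function<void(std::vector<double>&, const std::vector<double>&)> smooth;  // x <- smooth(x; b)
    std::function<std::vector<double>(const std::vector<double>&)> apply_A;        // y = A*x
    std::function<std::vector<double>(const std::vector<double>&)> restrict_r;     // r_c = R*r
    std::function<std::vector<double>(const std::vector<double>&)> prolongate;     // e = P*e_c
};

// One V-cycle on levels[l..coarsest]; the coarsest level is solved by smoothing.
void v_cycle(const std::vector<Level>& levels, std::size_t l,
             std::vector<double>& x, const std::vector<double>& b) {
    if (l + 1 == levels.size()) { levels[l].smooth(x, b); return; }
    levels[l].smooth(x, b);                              // pre-smoothing
    std::vector<double> Ax = levels[l].apply_A(x);
    std::vector<double> r(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) r[i] = b[i] - Ax[i];
    std::vector<double> r_c = levels[l].restrict_r(r);   // restrict residual
    std::vector<double> e_c(r_c.size(), 0.0);
    v_cycle(levels, l + 1, e_c, r_c);                    // coarse-grid correction
    std::vector<double> e = levels[l].prolongate(e_c);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += e[i];
    levels[l].smooth(x, b);                              // post-smoothing
}
```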
International Journal of Parallel, Emergent and Distributed Systems | 2007
Gundolf Haase; Manfred Liebmann; Gernot Plank
We investigate a new storage format for unstructured sparse matrices based on the space-filling Hilbert curve. Numerical tests with matrix-vector multiplication show the potential of the fractal storage (FS) format in comparison to the traditional compressed row storage (CRS) format. The FS format outperforms the CRS format by up to 50% for matrix-vector multiplications with multiple right-hand sides.
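The key ingredient of such a format is mapping each nonzero's (row, column) position to its position along the Hilbert curve; the classic bit-manipulation conversion (a standard algorithm, not taken from the paper) is sketched below in C++. Sorting the nonzeros by this key groups entries that are close in both row and column index, which is the cache locality the FS format exploits.

```cpp
#include <cstdint>

// Map a coordinate (x, y) on an n-by-n grid (n a power of two) to its
// index along the Hilbert curve. Using x = row and y = column of a
// nonzero gives a sort key that clusters nearby matrix entries.
uint64_t hilbert_index(uint32_t n, uint32_t x, uint32_t y) {
    uint64_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2) {
        uint32_t rx = (x & s) > 0;
        uint32_t ry = (y & s) > 0;
        d += static_cast<uint64_t>(s) * s * ((3 * rx) ^ ry);
        if (ry == 0) {                 // rotate/reflect the quadrant so
            if (rx == 1) {             // the curve stays continuous
                x = n - 1 - x;
                y = n - 1 - y;
            }
            uint32_t t = x; x = y; y = t;  // swap x and y
        }
    }
    return d;
}
```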
Parallel Computing | 1998
Gundolf Haase
The paper analyses various parallel incomplete factorizations based on the non-overlapping domain decomposition. The general framework is applied to the investigation of the preconditioning step in CG-like methods. Under certain conditions imposed on the finite element mesh, all matrix and vector types given by the special data distribution can be used in the matrix-vector multiplications. Not only do the well-known domain decomposition preconditioners fit into the concept, but parallelized global incomplete factorizations are also feasible. Additionally, those global incomplete factorizations can be used as smoothers in parallel multigrid methods. Numerical results on a parallel machine with distributed memory are presented.
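As a reference point for the factorizations being parallelized: ILU(0), the simplest incomplete factorization, allows fill-in only where the matrix already has a nonzero. The C++ sketch below uses dense storage purely for clarity (a real solver works directly on the CRS pattern) and is illustrative, not the paper's method.

```cpp
#include <vector>

// ILU(0): LU factorization restricted to the nonzero pattern of A.
// Here the pattern is taken as the explicitly nonzero entries of a;
// a production code stores the sparsity pattern separately. The unit
// lower factor L and upper factor U are written in-place into a.
void ilu0(std::vector<std::vector<double>>& a) {
    const std::size_t n = a.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = k + 1; i < n; ++i) {
            if (a[i][k] == 0.0) continue;      // respect the sparsity pattern
            a[i][k] /= a[k][k];                // entry of L
            for (std::size_t j = k + 1; j < n; ++j)
                if (a[i][j] != 0.0)            // update existing nonzeros only
                    a[i][j] -= a[i][k] * a[k][j];
        }
}
```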
Computing | 1991
Gundolf Haase; Ulrich Langer; Arnd Meyer
In the first part of this article series, we derived Domain Decomposition (DD) preconditioners containing three block matrices which must be specified for specific applications. In the present paper, we consider finite element equations arising from the DD discretization of plane, symmetric, second-order, elliptic boundary value problems and specify the matrices involved in the preconditioner via multigrid and hierarchical techniques. The resulting DD-PCCG methods are asymptotically almost optimal with respect to the operation count and well suited for parallel computations on MIMD computers with local memory and message passing. The numerical experiments performed on a transputer hypercube confirm the efficiency of the proposed DD preconditioners.