Pieter Ghysels | Researchain

Publication

Featured research published by Pieter Ghysels.

SIAM Journal on Scientific Computing | 2013

Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines

Pieter Ghysels; Thomas J. Ashby; Karl Meerbergen; Wim Vanroose

In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebysh...

Parallel Computing | 2014

Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

Pieter Ghysels; Wim Vanroose

Scalability of Krylov subspace methods suffers from costly global synchronization steps that arise in dot-products and norm calculations on parallel machines. In this work, a modified preconditioned Conjugate Gradient (CG) method is presented that removes the costly global synchronization steps from the standard CG algorithm by only performing a single non-blocking reduction per iteration. This global communication phase can be overlapped by the matrix-vector product, which typically only requires local communication. The resulting algorithm will be referred to as pipelined CG. An alternative pipelined method, mathematically equivalent to the Conjugate Residual (CR) method, which makes different trade-offs with regard to scalability and serial runtime, is also considered. These methods are compared to a recently proposed asynchronous CG algorithm by Gropp. Extensive numerical experiments demonstrate the numerical stability of the methods. Moreover, it is shown that hiding the global synchronization step improves scalability on distributed memory machines using the message passing paradigm and leads to significant speedups compared to standard preconditioned CG.
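The recurrences behind this idea can be sketched in a few lines. The following is a minimal serial illustration of the unpreconditioned pipelined CG recurrences, assuming a dense symmetric positive definite matrix stored as a list of rows; the helper names (`matvec`, `dot`, `axpy`, `pipelined_cg`) are hypothetical and not taken from the paper's code. In a distributed setting, the two dot products would be fused into one non-blocking reduction that runs while the matrix-vector product is computed; here they are simply evaluated in sequence.

```python
def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return y + alpha * x as a new list."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def pipelined_cg(A, b, tol=1e-10, max_iter=200):
    """Serial sketch of unpreconditioned pipelined CG (illustrative only)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # r = b - A x with x = 0
    w = matvec(A, r)       # w = A r
    p = s = z = [0.0] * n  # p: search direction, s = A p, z = A s
    gamma_old = alpha = 1.0
    b_norm = dot(b, b) ** 0.5 or 1.0

    for i in range(max_iter):
        # On a parallel machine these two dot products would form ONE
        # non-blocking global reduction ...
        gamma = dot(r, r)
        delta = dot(w, r)
        # ... overlapped with this matrix-vector product.
        q = matvec(A, w)

        if gamma ** 0.5 <= tol * b_norm:
            break

        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)

        # Recurrences replace the extra matrix-vector products.
        z = axpy(beta, z, q)    # z = q + beta z
        s = axpy(beta, s, w)    # s = w + beta s
        p = axpy(beta, p, r)    # p = r + beta p
        x = axpy(alpha, p, x)   # x = x + alpha p
        r = axpy(-alpha, s, r)  # r = r - alpha s
        w = axpy(-alpha, z, w)  # w = w - alpha z
        gamma_old = gamma
    return x
```

The extra vectors `s`, `z`, and `w` are the price of the reordering: they add vector updates per iteration in exchange for a single reduction whose latency can be hidden.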

Physical Biology | 2010

A particle-based model to simulate the micromechanics of single-plant parenchyma cells and aggregates

P. Van Liedekerke; Pieter Ghysels; Engelbert Tijskens; Giovanni Samaey; B Smeedts; Dirk Roose; Herman Ramon

This paper is concerned with addressing how plant tissue mechanics is related to the micromechanics of cells. To this end, we propose a mesh-free particle method to simulate the mechanics of both individual plant cells (parenchyma) and cell aggregates in response to external stresses. The model considers two important features in the plant cell: (1) the cell protoplasm, the interior liquid phase inducing hydrodynamic phenomena, and (2) the cell wall material, a viscoelastic solid material that contains the protoplasm. In this particle framework, the cell fluid is modeled by smoothed particle hydrodynamics (SPH), a mesh-free method typically used to address problems with gas and fluid dynamics. In the solid phase (cell wall), on the other hand, the particles are connected by pairwise interactions holding them together and preventing the fluid from penetrating the cell wall. The cell wall hydraulic conductivity (permeability) is built in as well through the SPH formulation. Although this model is also meant to be able to deal with dynamic and even violent situations (leading to cell wall rupture or cell-cell debonding), we have concentrated on quasi-static conditions. The results of single-cell compression simulations show that the conclusions found by analytical models and experiments can be reproduced at least qualitatively. Relaxation tests revealed that plant cells have short relaxation times (1 µs to 10 µs) compared to mammalian cells. Simulations performed on cell aggregates indicated an influence of the cellular organization on the tissue response, as was also observed in experiments done on tissues with a similar structure.

Physical Biology | 2009

Multi-scale simulation of plant tissue deformation using a model for individual cell mechanics

Pieter Ghysels; Giovanni Samaey; B. Tijskens; P. Van Liedekerke; Herman Ramon; Dirk Roose

We present a micro-macro method for the simulation of large elastic deformations of plant tissue. At the microscopic level, we use a mass-spring model to describe the geometrical structure and basic properties of individual plant cells. The macroscopic domain is discretized using standard finite elements, in which the macroscopic material properties (the stress-strain relation) are not given in analytical form, but are computed using the microscopic model in small subdomains, called representative volume elements (RVEs), centered around the macroscopic quadrature points. The boundary conditions for these RVEs are derived from the macroscopic deformation gradient. The computation of the macroscopic stress tensor is based on the definition of virial stress, as defined in molecular dynamics. The anisotropic Eulerian elasticity tensor is estimated using a forward finite difference approximation for the Truesdell rate of the Cauchy stress tensor. We investigate the influence of the size of the RVE and the boundary conditions. This multi-scale method converges to the solution of the full microscopic simulation, for both globally and adaptively refined finite element meshes, and achieves a significant speedup compared to the full microscopic simulation.

ACM Transactions on Mathematical Software | 2016

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

François-Henry Rouet; Xiaoye S. Li; Pieter Ghysels; Artem Napov

We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
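To make the notion of randomized sampling concrete, here is a minimal sketch of the basic building block: approximating a single low-rank block from a few random matrix-vector products. It is not the HSS compression or the adaptive mechanism of the paper, and it does not reflect the STRUMPACK API; all function names are hypothetical, and a fixed number of samples with modified Gram-Schmidt orthogonalization is assumed purely for illustration.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def orthonormal_columns(Y, rel_tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that are numerically dependent on earlier ones are dropped,
    so the number of returned columns estimates the numerical rank.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        orig_norm = sum(c * c for c in v) ** 0.5
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, v))
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = sum(c * c for c in v) ** 0.5
        if orig_norm > 0.0 and norm > rel_tol * orig_norm:
            basis.append([c / norm for c in v])
    return transpose(basis)


def randomized_low_rank(A, num_samples, seed=0):
    """Return (Q, B) with A approximately equal to Q @ B.

    Q has orthonormal columns spanning the sampled range of A;
    num_samples should exceed the expected rank (an assumption here,
    since no adaptive stopping criterion is implemented).
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian test matrix: each column gives one random sample A @ omega_j.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)] for _ in range(n)]
    Y = matmul(A, omega)            # sample the range of A
    Q = orthonormal_columns(Y)      # orthonormal basis for the samples
    B = matmul(transpose(Q), A)     # project A onto that basis
    return Q, B
```

Only products of the block with random vectors are needed to find `Q`, which is why sampling-based compression pairs well with matrices that are cheap to apply but expensive to form or factor densely.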

Soft Matter | 2011

Mechanisms of soft cellular tissue bruising. A particle based simulation approach

Paul Van Liedekerke; Pieter Ghysels; Engelbert Tijskens; Giovanni Samaey; Dirk Roose; Herman Ramon

This paper is concerned with modeling the mechanical behavior of cellular tissue in response to dynamic stimuli. The objective is to investigate the formation of bruises and other damage in tissue under excessive loading. We propose a particle based model to numerically study cells and aggregates of cells described down to subcellular detail. The model focuses on a parenchyma cell type in which two important features are present: the cell's interior liquid-like phase inducing hydrodynamic phenomena; and the cell wall, a viscoelastic-plastic solid membrane that encloses the protoplast. The cell fluid is modeled by a Smoothed Particle Hydrodynamics (SPH) technique, while for the cell wall and cell adhesion a nonlinear discrete element model is proposed. Failure in the system is attributed either to cell wall rupture or to debonding of the middle lamella. We show that the model is able to reproduce experimental data of quasistatic compression, and investigate the role of the protoplasm viscosity and the cellular structure on the dynamics of the aggregate system. This indicates that a high viscosity causes better guidance of mechanical stresses through the tissue and can result in a higher penetration of damage, whereas low values will cause more local bruising effects.

SIAM Journal on Scientific Computing | 2016

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

Pieter Ghysels; Xiaoye S. Li; François-Henry Rouet; Samuel Williams; Artem Napov

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to sevenfold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including ...

Numerical Linear Algebra With Applications | 2012

Improving the arithmetic intensity of multigrid with the help of polynomial smoothers

Pieter Ghysels; Przemyslaw Klosiewicz; Wim Vanroose

The basic building blocks of a classic multigrid algorithm, which are essentially stencil computations, all have a low ratio of executed floating point operations per byte fetched from memory. This important ratio can be identified as the arithmetic intensity. Applications with a low arithmetic intensity are typically bounded by memory traffic and achieve only a small percentage of the theoretical peak performance of the underlying hardware. We propose a polynomial Chebyshev smoother, which we implement using cache-aware tiling, to increase the arithmetic intensity of a multigrid V-cycle. This tiling approach involves a trade-off between redundant computations and cache misses. Contrary to the common conception, we observe optimal performance for higher degrees of the smoother. The higher-degree polynomial Chebyshev smoother can be used to smooth more than just the upper half of the error frequencies, leading to better V-cycle convergence rates. Smoothing more than the upper half of the error spectrum allows a more aggressive coarsening approach where some levels in the multigrid hierarchy are skipped.
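As a rough worked example of the flops-per-byte ratio described above, the sketch below applies the standard roofline bound (attainable rate is the smaller of the peak rate and intensity times bandwidth). The stencil operation count, the ideal-caching byte count, and the machine numbers are all illustrative assumptions, not figures from the paper.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating point operations executed per byte fetched from memory."""
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gb_per_s):
    """Roofline bound: limited by compute peak or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gb_per_s)


# Assumed kernel: one sweep of a constant-coefficient 7-point stencil in
# double precision, c0 * u_center + c1 * (sum of 6 neighbours):
# 5 adds for the neighbour sum, 2 multiplies, 1 final add.
flops_per_point = 8
# Assumed ideal caching: each value is read once and written once.
bytes_per_point = 2 * 8

# Hypothetical machine, chosen only to make the arithmetic easy.
peak_gflops = 500.0
bandwidth_gb_per_s = 50.0

ai_single = arithmetic_intensity(flops_per_point, bytes_per_point)
bound_single = attainable_gflops(ai_single, peak_gflops, bandwidth_gb_per_s)

# Idealised tiling: a degree-4 polynomial smoother applied per pass over
# memory does 4x the flops for the same traffic (redundant work ignored).
degree = 4
ai_tiled = arithmetic_intensity(degree * flops_per_point, bytes_per_point)
bound_tiled = attainable_gflops(ai_tiled, peak_gflops, bandwidth_gb_per_s)

print(ai_single, bound_single)  # 0.5 flop/byte, 25.0 GFLOP/s
print(ai_tiled, bound_tiled)    # 2.0 flop/byte, 100.0 GFLOP/s
```

Under these assumptions a single sweep is limited to 5% of peak, which is the kind of gap that motivates fusing several smoother applications into one cache-resident pass.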

SIAM Journal on Scientific Computing | 2015

Modeling the Performance of Geometric Multigrid Stencils on Multicore Computer Architectures

Pieter Ghysels; Wim Vanroose

The basic building blocks of the classic geometric multigrid algorithm all have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bound by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the underlying hardware. We suggest the use of state-of-the-art (stencil) compiler techniques to improve the flop per byte ratio, also called the arithmetic intensity, of the steps in the algorithm. Our focus will be on the smoother which is a repeated stencil application. With a tiling approach based on the polyhedral loop optimization framework, data reuse in the smoother can be improved, leading to a higher effective arithmetic intensity. For an academic constant coefficient Poisson problem, we present a performance model for the multigrid

International Conference on Algorithms and Architectures for Parallel Processing | 2012