Wayne Joubert | Researchain

Publication

Featured research published by Wayne Joubert.

Numerical Linear Algebra With Applications | 1994

On the convergence behavior of the restarted GMRES algorithm for solving nonsymmetric linear systems

Wayne Joubert

The solution of nonsymmetric systems of linear equations continues to be a difficult problem. One of the main algorithms for solving nonsymmetric problems is restarted GMRES. The algorithm is based on restarting full GMRES every s iterations, for some integer s>0. This paper considers the impact of the restart frequency s on the convergence and work requirements of the method. It is shown that a good choice of this parameter can lead to reduced solution time, while an improper choice may hinder or preclude convergence. An adaptive procedure is also presented for determining automatically when to restart. The results of numerical experiments are presented.
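
The restart mechanism described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Joubert's code: each cycle runs at most s Arnoldi steps with modified Gram-Schmidt, solves the small least-squares problem for the GMRES update, and restarts from the new iterate. The names `gmres_cycle` and `restarted_gmres`, the breakdown threshold, the tolerance and the cycle cap are illustrative choices, and the paper's adaptive restart procedure is not reproduced.

```python
import numpy as np


def gmres_cycle(A, b, x0, s):
    """One cycle of full GMRES (at most s Arnoldi steps) started from x0."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0
    n = b.size
    V = np.zeros((n, s + 1))   # orthonormal Krylov basis
    H = np.zeros((s + 1, s))   # upper Hessenberg matrix built by Arnoldi
    V[:, 0] = r0 / beta
    k = s
    for j in range(s):
        w = A @ V[:, j]
        wnorm = np.linalg.norm(w)
        for i in range(j + 1):  # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= 1e-12 * wnorm:  # (near) breakdown: stop this cycle early
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # Small least-squares problem: minimize ||beta*e1 - H y|| over y
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)
    return x0 + V[:, :k] @ y


def restarted_gmres(A, b, s, tol=1e-8, max_cycles=500):
    """GMRES(s): restart full GMRES every s iterations until ||b - Ax|| <= tol*||b||."""
    x = np.zeros(b.size)
    bnorm = np.linalg.norm(b)
    for cycle in range(max_cycles):
        if np.linalg.norm(b - A @ x) <= tol * bnorm:
            return x, cycle  # number of completed cycles
        x = gmres_cycle(A, b, x, s)
    return x, max_cycles
```

Each cycle stores s+1 basis vectors and spends O(s²n) work on orthogonalization, so s trades memory and per-cycle cost against the number of cycles needed, which is the tradeoff the paper studies.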

SIAM Journal on Scientific Computing | 1994

A robust GMRES-based adaptive polynomial preconditioning algorithm for nonsymmetric linear systems

Wayne Joubert

In this study a hybrid generalized minimal residual (GMRES)/polynomial preconditioning algorithm for solving nonsymmetric systems of linear equations is defined. The algorithm uses the results from cycles of restarted GMRES to form an effective polynomial preconditioner, typically resulting in decreased work requirements. The algorithm has the advantage over other hybrid algorithms in that its convergence behavior is well understood: the new algorithm converges for all starting vectors if and only if restarted GMRES converges. The results of numerical experiments with the algorithm are presented.
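
A simplified sketch of the hybrid idea, assuming NumPy: fit a polynomial p from a GMRES-style least-squares step on the initial residual, then run restarted GMRES on the right-preconditioned operator A p(A). The power-basis fit in the hypothetical helper `gmres_poly_coeffs` is numerically unstable for large degree d, so d is kept small. This sketch does not carry the paper's guarantee that the hybrid converges exactly when restarted GMRES does; it only illustrates how a GMRES residual polynomial can be reused as a preconditioner.

```python
import numpy as np


def gmres_poly_coeffs(A, r0, d):
    """Coefficients c of p(z) = c[0] + c[1] z + ... + c[d-1] z^(d-1) minimizing
    ||r0 - A p(A) r0||, i.e. the update polynomial of a d-step GMRES cycle from
    residual r0, written in the (ill-conditioned) power basis."""
    K = np.empty((r0.size, d))
    K[:, 0] = r0
    for j in range(1, d):
        K[:, j] = A @ K[:, j - 1]
    c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
    return c


def apply_poly(A, c, v):
    """Evaluate p(A) v by Horner's rule."""
    y = c[-1] * v
    for cj in reversed(c[:-1]):
        y = cj * v + A @ y
    return y


def restarted_gmres(matvec, b, s, tol=1e-8, max_cycles=500):
    """Restarted GMRES(s) for matvec(x) = b, starting from x = 0."""
    n = b.size
    x = np.zeros(n)
    bnorm = np.linalg.norm(b)
    for cycle in range(max_cycles):
        r = b - matvec(x)
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:
            return x, cycle
        V = np.zeros((n, s + 1))
        H = np.zeros((s + 1, s))
        V[:, 0] = r / beta
        k = s
        for j in range(s):
            w = matvec(V[:, j])
            wnorm = np.linalg.norm(w)
            for i in range(j + 1):  # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] <= 1e-12 * wnorm:
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        rhs = np.zeros(k + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)
        x = x + V[:, :k] @ y
    return x, max_cycles


def hybrid_poly_gmres(A, b, s, d, tol=1e-8):
    """Fit p from one GMRES-style step on b, solve A p(A) y = b, return x = p(A) y."""
    c = gmres_poly_coeffs(A, b, d)
    y, cycles = restarted_gmres(lambda v: A @ apply_poly(A, c, v), b, s, tol)
    return apply_poly(A, c, y), cycles
```

Right preconditioning keeps the GMRES residual equal to the true residual b - A x, so the stopping test inside `restarted_gmres` applies directly to the returned x.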

SIAM Journal on Scientific Computing | 1996

ODE recursions and iterative solvers for linear equations

Alfred A. Lorber; Graham F. Carey; Wayne Joubert

Timestepping to a steady-state solution is increasingly applied in engineering and scientific applications as a means for solving equilibrium problems. In the present work we examine the relation between the recursion in timestepping algorithms for semidiscrete systems of ODEs and certain types of iterative methods for solving discretized systems of equilibrium PDEs. We consider, in particular, the possibility of accelerating the ODE approach using recursions that are not time accurate together with parameter selection based on the theory of iterative methods. As one example, we take the parameters arising from the Chebyshev-type iterative methods and use them in a two-stage Runge–Kutta scheme. A comparison study for a representative steady-state diffusion problem indicates a dramatic improvement in convergence and efficiency. We remark that this approach can be trivially incorporated into existing time-integration codes to significant advantage. This yields a hybrid adaptive approach in a single code.
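
To make the parameter-selection idea concrete, here is a hypothetical Python/NumPy sketch for the steady state of u' = b - A u, with A the standard 1D finite-difference Laplacian. A two-stage scheme k1 = f(u), k2 = f(u + alpha*dt*k1), u <- u + dt*k2 multiplies the residual component for each eigenvalue z of A by 1 - dt*z + alpha*dt²*z². Choosing dt and alpha so that this quadratic has the two degree-2 Chebyshev nodes on [lam_min, lam_max] as its roots is one way to borrow parameters from Chebyshev iteration. The baseline (explicit midpoint, alpha = 1/2, at dt = 1.9/lam_max), the grid size n = 50 and the 200 steps are assumptions for illustration, not the paper's setup.

```python
import numpy as np


def laplacian_1d(n):
    """3-point Laplacian on (0,1) with Dirichlet BCs and its exact extreme eigenvalues."""
    h = 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    lam_min = 4.0 / h**2 * np.sin(np.pi * h / 2) ** 2
    lam_max = 4.0 / h**2 * np.sin(n * np.pi * h / 2) ** 2
    return A, lam_min, lam_max


def two_stage_step(A, b, u, dt, alpha):
    """One explicit two-stage Runge-Kutta step for u' = f(u) = b - A u."""
    k1 = b - A @ u
    k2 = b - A @ (u + alpha * dt * k1)
    return u + dt * k2


def chebyshev_params(lam_min, lam_max):
    """Pick (dt, alpha) so that 1 - dt*z + alpha*dt^2*z^2 = (1 - z/theta1)(1 - z/theta2),
    where theta1, theta2 are the degree-2 Chebyshev nodes on [lam_min, lam_max]."""
    center, half = (lam_max + lam_min) / 2, (lam_max - lam_min) / 2
    theta1 = center + half * np.cos(np.pi / 4)
    theta2 = center + half * np.cos(3 * np.pi / 4)
    dt = 1.0 / theta1 + 1.0 / theta2
    alpha = 1.0 / (theta1 * theta2 * dt**2)
    return dt, alpha


def relative_residual_after(A, b, dt, alpha, steps):
    """March from u = 0 for a fixed number of steps; return ||b - A u|| / ||b||."""
    u = np.zeros_like(b)
    for _ in range(steps):
        u = two_stage_step(A, b, u, dt, alpha)
    return np.linalg.norm(b - A @ u) / np.linalg.norm(b)


A, lam_min, lam_max = laplacian_1d(50)
b = np.ones(50)
# Baseline: explicit midpoint, time step just inside its stability limit 2/lam_max
res_midpoint = relative_residual_after(A, b, 1.9 / lam_max, 0.5, steps=200)
dt_c, alpha_c = chebyshev_params(lam_min, lam_max)
res_chebyshev = relative_residual_after(A, b, dt_c, alpha_c, steps=200)
# res_chebyshev ends up well below res_midpoint after the same number of steps
```

With alpha different from 1/2 the step is no longer second-order accurate in time, which matches the abstract's point that the recursion need not be time accurate when only the steady state is wanted.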

IEEE International Conference on High Performance Computing, Data, and Analytics | 1994

PCG: a software package for the iterative solution of linear systems on scalar, vector and parallel computers

Wayne Joubert; Graham F. Carey

The PCG package is a software system for solving systems of linear equations by means of preconditioned conjugate gradient (PCG)-type iterative methods on a variety of computer architectures. The software is designed to give high performance with a nearly identical user interface across different scalar, vector and parallel platforms, as well as across different programming models, such as shared-memory, data-parallel and message-passing programming interfaces. The basic features of the package are discussed, as well as techniques used to attain portability and high performance. Representative results for several computers are given.
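
As a point of reference for the method the package is named after, a textbook preconditioned conjugate gradient loop can be sketched in Python/NumPy. This does not reflect the PCG package's actual interface. The preconditioner is passed as a callable (`apply_minv`, a hypothetical name) so that the same solver loop can be reused with different preconditioners, and the Jacobi (diagonal) preconditioner is an assumption chosen for brevity.

```python
import numpy as np


def pcg(A, b, apply_minv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for SPD A; apply_minv(r) returns M^{-1} r."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b.copy()           # residual for the zero initial guess
    z = apply_minv(r)      # preconditioned residual
    p = z.copy()           # first search direction
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * bnorm:
            return x, k
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        z = apply_minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # M-conjugate update of the search direction
        rz = rz_new
    return x, max_iter


def jacobi(A):
    """Diagonal (Jacobi) preconditioner M = diag(A), returned as a callable."""
    d = np.diag(A).copy()
    return lambda r: r / d
```

Swapping in another preconditioner only requires a different callable; the CG recurrences themselves do not change.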

IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Towards achieving performance portability using directives for accelerators

M. Graham Lopez; Verónica G. Vergara Larrea; Wayne Joubert; Oscar R. Hernandez; Azzam Haidar; Stanimire Tomov; Jack J. Dongarra

In this paper we explore the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, how much tuning might be required, and what lessons we can learn from this experience. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. We implement these kernels using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms, including x86_64 systems with attached NVIDIA GPUs, self-hosted Intel Xeon Phi KNL, and an x86_64 host system with Intel Xeon Phi coprocessors. In this paper, we explain which factors affected performance portability, such as how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimize for and target multiple platforms.
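
The reference to computational intensity can be made concrete with a roofline-style estimate. The kernels below (DAXPY, DGEMV, DGEMM) are hypothetical representative examples, not necessarily the ones used in the paper; the byte counts assume 8-byte doubles with each operand crossing memory exactly once, and the peak and bandwidth figures in the usage lines are made-up round numbers.

```python
DOUBLE_BYTES = 8


def intensity_daxpy(n):
    # y = a*x + y: 2n flops; read x and y, write y
    return (2 * n) / (3 * n * DOUBLE_BYTES)


def intensity_dgemv(n):
    # y = A x with A n-by-n: 2n^2 flops; read A and x, write y
    return (2 * n * n) / ((n * n + 2 * n) * DOUBLE_BYTES)


def intensity_dgemm(n):
    # C = A B, all n-by-n: 2n^3 flops; ideal traffic reads A, B and writes C once
    return (2 * n**3) / (3 * n * n * DOUBLE_BYTES)


def roofline(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical accelerator: 7000 GFLOP/s peak, 900 GB/s memory bandwidth
daxpy_bound = roofline(intensity_daxpy(10**6), 7000, 900)  # bandwidth bound, about 75
dgemm_bound = roofline(intensity_dgemm(4096), 7000, 900)   # compute bound, 7000
```

Low-intensity kernels such as DAXPY are limited by data movement on any of these platforms, so their portability depends mostly on how well directives manage data placement, while high-intensity kernels expose differences in how compilers map loops onto the compute units.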

Numerical Linear Algebra With Applications | 1994

Improved SSOR and incomplete Cholesky solution of linear equations on shared memory and distributed memory parallel computers

Wayne Joubert; Thomas C. Oppe

In this paper two new implementations of SSOR and incomplete factorization preconditioners are given, for shared memory and distributed memory parallel computers respectively. These new implementations give increased solution speeds for matrix problems such as those arising from discretized partial differential equations with natural ordering of the grid points, for which it is well known that the standard implementation of these preconditioners is difficult to parallelize effectively. For shared memory machines, a new technique is presented here which decreases the number of synchronization points in each preconditioning step and thus allows better parallel speedups. For distributed memory machines, an implementation based on block cyclic reduction is given which circumvents the problem of idle processors during the preconditioning phase. Descriptions of the implementations are given, along with numerical comparisons for a model diffusion problem on the Cray Y-MP and the CM-2 Connection Machine.
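
For reference, the standard sequential application of the SSOR preconditioner z = M^{-1} r in natural ordering can be sketched in Python/NumPy. The row-by-row recurrences in the forward and backward sweeps are the data dependence that makes the standard implementation hard to parallelize; the paper's reduced-synchronization and block cyclic reduction variants are not shown. The function name `ssor_apply`, the dense storage and the value of `omega` are illustrative assumptions.

```python
import numpy as np


def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for M = (D + omega*L) D^{-1} (D + omega*U) / (omega*(2 - omega)),
    where A = L + D + U (strictly lower, diagonal, strictly upper parts)."""
    n = r.size
    d = np.diag(A)
    # Forward sweep: solve (D + omega*L) y = r; y[i] needs y[0..i-1]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (r[i] - omega * (A[i, :i] @ y[:i])) / d[i]
    # The middle factor D^{-1} of M becomes a multiplication by D in M^{-1}
    y *= d
    # Backward sweep: solve (D + omega*U) z = D y; z[i] needs z[i+1..n-1]
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - omega * (A[i, i + 1:] @ z[i + 1:])) / d[i]
    return omega * (2.0 - omega) * z
```

Each sweep is a chain of n dependent updates, which is why a naive parallelization of this step stalls on synchronization.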

IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

Evaluation of Directive-based Performance Portable Programming Models

M. Graham Lopez; Verónica G. Vergara Larrea; Wayne Joubert; Oscar R. Hernandez; Azzam Haidar; Stanimire Tomov; Jack J. Dongarra

We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. We implement the kernels of interest using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms including both x86_64 and Power8 with attached NVIDIA GPUs, x86_64 multicores, self-hosted Intel Xeon Phi KNL, as well as an x86_64 host system with Intel Xeon Phi coprocessors. Furthermore, we present in detail what factors affected the performance portability, including how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimise and target multiple platforms.

Annual Simulation Symposium | 1997