Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Patrick R. Amestoy is active.

Publication


Featured research published by Patrick R. Amestoy.


SIAM Journal on Matrix Analysis and Applications | 2001

A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling

Patrick R. Amestoy; Iain S. Duff; Jean-Yves L'Excellent; Jacko Koster

In this paper, we analyze the main features and discuss the tuning of the algorithms for the direct solution of sparse linear systems on distributed memory computers developed in the context of a long-term European research project. The algorithms use a multifrontal approach and are especially designed to cover a large class of problems. The problems can be symmetric positive definite, general symmetric, or unsymmetric matrices, all possibly rank deficient, and they can be provided by the user in several formats. The algorithms achieve high performance by exploiting parallelism coming from the sparsity in the problem and that available for dense matrices. The algorithms use a dynamic distributed task scheduling technique to accommodate numerical pivoting and to allow the migration of computational tasks to lightly loaded processors. Large computational tasks are divided into subtasks to enhance parallelism. Asynchronous communication is used throughout the solution process to efficiently overlap communication with computation. We illustrate our design choices by experimental results obtained on an SGI Origin 2000 and an IBM SP2 for test matrices provided by industrial partners in the PARASOL project.
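
The workflow the abstract describes (analysis and ordering, numerical factorization with pivoting, then triangular solves that can be repeated cheaply) can be sketched with a generic sparse direct solver. The snippet below uses SciPy's SuperLU interface purely as a stand-in for such a solver, not the solver from the paper, and the matrix and right-hand side are arbitrary illustrative data.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Illustrative unsymmetric sparse system (stand-in data, not from the paper).
n = 1000
A = sp.random(n, n, density=5.0 / n, format="csc", random_state=0) + sp.identity(n, format="csc")
b = np.ones(n)

# "Analysis + factorization": fill-reducing ordering and LU factors with partial pivoting.
lu = splu(A)

# "Solve": forward/backward substitutions; the factors can be reused for
# further right-hand sides at little extra cost.
x = lu.solve(b)
print(np.linalg.norm(A @ x - b))
```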


Parallel Computing | 2006

Hybrid scheduling for the parallel solution of linear systems

Patrick R. Amestoy; Abdou Guermouche; Jean-Yves L'Excellent; Stéphane Pralet

We consider the problem of designing a dynamic scheduling strategy that takes into account both workload and memory information in the context of the parallel multifrontal factorization. The originality of our approach is that we base our estimations (work and memory) on a static optimistic scenario during the analysis phase. This scenario is then used during the factorization phase to constrain the dynamic decisions that compute fully irregular partitions in order to better balance the workload. We show that our new scheduling algorithm significantly improves both the memory behaviour and the factorization time. We give experimental results for large challenging real-life 3D problems on 64 and 128 processors.
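
As a rough illustration of the idea of combining workload and memory information, the toy scheduler below sends each task to the least-loaded worker whose projected memory stays under a cap; the cap plays the role of the static, analysis-phase estimate mentioned in the abstract. All names and values are hypothetical, and this is not the algorithm implemented in the paper.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    load: float = 0.0   # accumulated work assigned so far
    mem: float = 0.0    # memory currently attributed to this worker

def assign(tasks, workers, mem_cap):
    """Greedy dynamic assignment: prefer the least-loaded worker among those
    that stay below mem_cap (a bound standing in for the static estimate)."""
    placement = []
    for work, mem in tasks:
        fitting = [i for i, w in enumerate(workers) if w.mem + mem <= mem_cap]
        pool = fitting or range(len(workers))
        i = min(pool, key=lambda j: workers[j].load)
        workers[i].load += work
        workers[i].mem += mem
        placement.append(i)
    return placement

# Hypothetical (work, memory) task estimates and four workers.
tasks = [(8.0, 2.0), (5.0, 1.0), (9.0, 3.0), (2.0, 0.5), (6.0, 2.5)]
workers = [Worker() for _ in range(4)]
print(assign(tasks, workers, mem_cap=3.0))
```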


SIAM Journal on Matrix Analysis and Applications | 1996

An Approximate Minimum Degree Ordering Algorithm

Patrick R. Amestoy; Timothy A. Davis; Iain S. Duff

An approximate minimum degree (AMD) ordering algorithm for preordering a symmetric sparse matrix prior to numerical factorization is presented. We use techniques based on the quotient graph for matrix factorization that allow us to obtain computationally cheap bounds for the minimum degree. We show that these bounds are often equal to the actual degree. The resulting algorithm is typically much faster than previous minimum degree ordering algorithms and produces results that are comparable in quality with the best orderings from other minimum degree algorithms.
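
The greedy principle behind minimum degree ordering can be sketched in a few lines: repeatedly eliminate a vertex of smallest degree in the graph of the matrix, adding the fill-in clique among its neighbours. The sketch below implements only this basic exact-degree version; the AMD algorithm of the paper instead works on a quotient graph and uses cheap upper bounds on the degrees, which is what makes it fast.

```python
def minimum_degree_order(adj):
    """Exact greedy minimum degree ordering on a symmetric sparsity pattern.
    adj: dict mapping vertex -> set of neighbouring vertices (no self loops)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))      # vertex of minimum degree
        nbrs = adj.pop(v)
        order.append(v)
        for u in nbrs:                               # eliminating v makes its
            adj[u].discard(v)                        # neighbours a clique (fill-in)
            adj[u] |= nbrs - {u}
    return order

# Pattern of a small arrow matrix: vertex 0 is coupled to everyone.
pattern = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(minimum_degree_order(pattern))   # low-degree vertices are eliminated first
```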


ACM Transactions on Mathematical Software | 2004

Algorithm 837: AMD, an approximate minimum degree ordering algorithm

Patrick R. Amestoy; Timothy A. Davis; Iain S. Duff

AMD is a set of routines that implements the approximate minimum degree ordering algorithm to permute sparse matrices prior to numerical factorization. There are versions written in both C and Fortran 77. A MATLAB interface is included.
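
SciPy does not expose the symmetric AMD routines directly, but a related approximate-minimum-degree column ordering (COLAMD) is available through its SuperLU interface, which gives a rough feel for how such a preordering is used before factorization. This is only a neighbouring option, not the AMD package described in the paper.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Small illustrative sparse matrix (arbitrary data).
A = sp.random(500, 500, density=0.01, format="csc", random_state=1) + sp.identity(500, format="csc")

# Ask SuperLU for a COLAMD column preordering before factorizing.
lu = splu(A, permc_spec="COLAMD")
print(lu.nnz)   # nonzeros in the factors, reflecting fill-in under this ordering
```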


Geophysics | 2007

3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study

Stéphane Operto; Jean Virieux; Patrick R. Amestoy; Jean-Yves L'Excellent; Luc Giraud; Hafedh Ben Hadj Ali

We present a finite-difference frequency-domain method for 3D visco-acoustic wave propagation modeling. In the frequency domain, the underlying numerical problem is the resolution of a large sparse system of linear equations whose right-hand side term is the source. This system is solved with a massively parallel direct solver. We first present an optimal 3D finite-difference stencil for frequency-domain modeling. The method is based on a parsimonious staggered-grid method. Differential operators are discretized with second-order accurate staggered-grid stencils on different rotated coordinate systems to mitigate numerical anisotropy. An antilumped mass strategy is implemented to minimize numerical dispersion. The stencil incorporates 27 grid points and spans two grid intervals. Dispersion analysis shows that four grid points per wavelength provide accurate simulations in the 3D domain. To assess the feasibility of the method for frequency-domain full-waveform inversion, we computed simulations in the 3D SEG/EAGE overthrust model for frequencies of 5, 7, and 10 Hz. Results confirm the huge memory requirement of the factorization (several hundred gigabytes) but also the CPU efficiency of the resolution phase (a few seconds per shot). Heuristic scalability analysis suggests that the memory complexity of the factorization is O(35N^4) for an N^3 grid. Our method may provide a suitable tool to perform frequency-domain full-waveform inversion using a large distributed-memory platform. Further investigation is still necessary to assess more quantitatively the respective merits and drawbacks of time- and frequency-domain modeling of wave propagation to perform 3D full-waveform inversion.
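
A much-reduced 2D analogue of that workflow (assemble a frequency-domain Helmholtz operator, factorize it once with a sparse direct solver, then reuse the factors for many sources) is sketched below. The 5-point stencil, grid size, frequency, and attenuation model are all illustrative stand-ins for the 27-point mixed-grid 3D scheme and the massively parallel solver used in the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy 2D visco-acoustic frequency-domain modelling (illustrative parameters).
n, h, freq, Q = 101, 10.0, 5.0, 50.0          # grid points per side, spacing (m), Hz, quality factor
c = 2000.0 * (1.0 + 1j / (2.0 * Q))           # complex velocity mimics attenuation
k2 = (2.0 * np.pi * freq / c) ** 2

D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
I = sp.identity(n)
A = (sp.kron(I, D2) + sp.kron(D2, I) + k2 * sp.identity(n * n)).tocsc()

lu = splu(A)                                  # one expensive factorization
for iz, ix in [(50, 50), (25, 75)]:           # several shots reuse the factors
    b = np.zeros(n * n, dtype=complex)
    b[iz * n + ix] = 1.0                      # point source
    u = lu.solve(b)                           # cheap substitutions per shot
```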


Parallel Computing | 2000

MUMPS: A General Purpose Distributed Memory Sparse Solver

Patrick R. Amestoy; Iain S. Duff; Jean-Yves L'Excellent; Jacko Koster

MUMPS is a public domain software package for the multifrontal solution of large sparse linear systems on distributed memory computers. The matrices can be symmetric positive definite, general symmetric, or unsymmetric, and possibly rank deficient. MUMPS exploits parallelism coming from the sparsity in the matrix and parallelism available for dense matrices. Additionally, large computational tasks are divided into smaller subtasks to enhance parallelism. MUMPS uses a distributed dynamic scheduling technique that allows numerical pivoting and the migration of computational tasks to lightly loaded processors. Asynchronous communication is used to overlap communication with computation. In this paper, we report on recently integrated features and illustrate the present performance of the solver on an SGI Origin 2000 and a CRAY T3E.


IEEE International Conference on High Performance Computing, Data, and Analytics | 1989

Vectorization of a Multiprocessor Multifrontal Code

Patrick R. Amestoy; Iain S. Duff

We describe design changes that enhance the vectorization of a multiprocessor version of a multifrontal code for the direct solution of large sparse sets of linear equations. These changes employ techniques used with success in full Gaussian elimination and are based on the use of matrix-vector and matrix-matrix kernels as implemented in the Level 2 and Level 3 BLAS. We illustrate the performance of the improved code by runs on the IBM 3090/VF, the ETA-10P, and the CRAY-2. Although our experiments are principally on a single processor of these machines, we briefly consider the influence of multiprocessing. Speedup factors of more than 11 are obtained, and the modified code performs at over 200 MFLOPS on standard structures problems on one processor of the CRAY-2.
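
The core of the change described above is to express the partial factorization of each dense frontal matrix in terms of matrix-matrix kernels. A minimal dense sketch of that pattern (panel factorization, a triangular solve for the block row of U, then one matrix-matrix product for the contribution block) is shown below using NumPy/SciPy, whose matrix products call Level 3 BLAS; pivoting and the multifrontal assembly are omitted.

```python
import numpy as np
from scipy.linalg import solve_triangular

def partial_factor_front(F, nelim):
    """Eliminate the first nelim variables of a dense frontal matrix F
    (no pivoting, illustration only); returns a copy holding L, U and the
    updated Schur-complement contribution block."""
    F = F.copy()
    # Panel factorization of the first nelim columns (rank-1 updates).
    for k in range(nelim):
        F[k + 1:, k] /= F[k, k]
        F[k + 1:, k + 1:nelim] -= np.outer(F[k + 1:, k], F[k, k + 1:nelim])
    # Block row of U: solve L11 * U12 = A12 with one triangular solve.
    F[:nelim, nelim:] = solve_triangular(F[:nelim, :nelim], F[:nelim, nelim:],
                                         lower=True, unit_diagonal=True)
    # Contribution block: one matrix-matrix product (a Level 3 BLAS GEMM).
    F[nelim:, nelim:] -= F[nelim:, :nelim] @ F[:nelim, nelim:]
    return F

F = np.random.default_rng(0).standard_normal((8, 8)) + 8 * np.eye(8)
print(partial_factor_front(F, nelim=3)[3:, 3:])   # updated contribution block
```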


SIAM Journal on Scientific Computing | 2015

Improving multifrontal methods by means of block low-rank representations

Patrick R. Amestoy; Cleve Ashcraft; Olivier Boiteau; Alfredo Buttari; Jean-Yves L'Excellent; Clement Weisbecker

Matrices coming from elliptic partial differential equations (PDEs) have been shown to have a low-rank property: well-defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix that gives the blocks a geometrical meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation offers a substantial reduction of the memory requirement and gives efficient ways to perform many of the basic dense algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results that show how the BLR format delivers gains that are comparable to those obtained with hierarchical formats such as hierarchical (H) matrices and hierarchically semi-separable (HSS) matrices, but provides much greater flexibility and ease of use, which are essential in the context of a general-purpose, algebraic solver.
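
The elementary operation behind the BLR format is to replace a dense off-diagonal block by a low-rank product obtained, for instance, from a truncated SVD. A minimal sketch of that compression step is given below; the block selection, ordering, and factorization kernels of the paper are not shown, and the tolerance and test block are arbitrary.

```python
import numpy as np

def compress_block(B, tol):
    """Approximate a dense off-diagonal block B by X @ Y.T, keeping only
    singular values above tol times the largest one."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    X = U[:, :r] * s[:r]          # m x r
    Y = Vt[:r, :].T               # n x r
    return X, Y, r

# A smooth (hence numerically low-rank) illustrative block.
x = np.linspace(1.0, 2.0, 200)
B = 1.0 / (x[:, None] + x[None, :])
X, Y, r = compress_block(B, tol=1e-10)
print(r, np.linalg.norm(B - X @ Y.T) / np.linalg.norm(B))   # rank kept and relative error
```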


SIAM Journal on Matrix Analysis and Applications | 2002

An Unsymmetrized Multifrontal LU Factorization

Patrick R. Amestoy; Chiara Puglisi

A well-known approach to computing the LU factorization of a general unsymmetric matrix A is to build the elimination tree associated with the pattern of the symmetric matrix A + A^T and use it as a computational graph to drive the numerical factorization. This approach, although very efficient on a large range of unsymmetric matrices, does not capture the unsymmetric structure of the matrices. We introduce a new algorithm which detects and exploits the structural unsymmetry of the submatrices involved during the processing of the elimination tree. We show that the new algorithm can yield significant gains in both the memory and the time required to perform the factorization.
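
For reference, the elimination tree mentioned above can be computed from the pattern of A + A^T with Liu's classical algorithm (path compression over partially built subtrees). The sketch below is a compact textbook version of that standard algorithm, not the unsymmetrized approach introduced in the paper; the test matrix is arbitrary.

```python
import numpy as np
import scipy.sparse as sp

def elimination_tree(A):
    """Elimination tree of the pattern of A + A^T; parent[j] == -1 marks a root."""
    S = abs(sp.csc_matrix(A))
    S = (S + S.T).tocsc()
    n = S.shape[0]
    parent = np.full(n, -1)
    ancestor = np.full(n, -1)      # path-compression shortcuts
    for j in range(n):
        for i in S.indices[S.indptr[j]:S.indptr[j + 1]]:
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    parent[i] = j
                i = nxt
    return parent

A = sp.csc_matrix(np.array([[4.0, 1.0, 0.0, 0.0],
                            [0.0, 3.0, 1.0, 0.0],
                            [1.0, 0.0, 5.0, 1.0],
                            [0.0, 1.0, 0.0, 6.0]]))
print(elimination_tree(A))
```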


Computers & Geosciences | 2009

FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data, Part 1

Florent Sourbier; Stéphane Operto; Jean Virieux; Patrick R. Amestoy; Jean-Yves L'Excellent

This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations, which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, the latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion, ranging from successive mono-frequency inversions to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
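
The inversion loop the abstract outlines (compute the data residuals at a given frequency, form the gradient of the least-squares misfit, scale it by the diagonal Hessian, update the model, and move on to the next frequency) can be caricatured with a linear toy forward operator. Everything below (the operators, data, and step length) is fabricated for illustration; the real code builds the wavefields with the MUMPS-based forward modelling described above.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true = rng.standard_normal(50)                     # "true" model (toy)
m = np.zeros(50)                                     # starting model

# One hypothetical linear forward operator per frequency, low to high.
operators = [rng.standard_normal((200, 50)) for _ in range(3)]

for F in operators:                                  # successive mono-frequency inversions
    d_obs = F @ m_true                               # synthetic "observed" data
    diag_hess = np.sum(F * F, axis=0)                # diagonal of F^T F used as a scaling
    for _ in range(25):                              # gradient iterations at this frequency
        residual = F @ m - d_obs
        grad = F.T @ residual                        # gradient of 0.5 * ||F m - d_obs||^2
        m -= 0.7 * grad / diag_hess                  # scaled steepest-descent step

print(np.linalg.norm(m - m_true) / np.linalg.norm(m_true))   # relative model error
```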

Collaboration


Dive into Patrick R. Amestoy's collaboration.

Top Co-Authors

Jean-Yves L'Excellent
École normale supérieure de Lyon

Iain S. Duff
Rutherford Appleton Laboratory

Alfredo Buttari
Centre national de la recherche scientifique

Stéphane Operto
Centre national de la recherche scientifique

François-Henry Rouet
Lawrence Berkeley National Laboratory

Bora Uçar
École normale supérieure de Lyon