Publications


Featured research published by Pierre Ramet.


Parallel Computing | 2002

PASTIX: a high-performance parallel direct solver for sparse symmetric positive definite systems

Pascal Hénon; Pierre Ramet; Jean Roman

Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. The block partitioning and scheduling problem for sparse parallel factorization without pivoting is considered. There are two major aims to this study: the scalability of the parallel solver, and the compromise between memory overhead and efficiency. Parallel experiments on a large collection of irregular industrial problems validate our approach.
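Abstractly, such a solver follows an analyse / factorize / solve pipeline. The fragment below is a minimal sketch of that pipeline in Python, using SciPy's generic sparse LU as a stand-in for a high-performance solver such as PaStiX; the test matrix, ordering choice and parameters are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the analyse / factorize / solve pipeline of a sparse
# direct solver. SciPy's splu stands in for a parallel solver such as PaStiX;
# the block partitioning and static scheduling discussed in the paper are
# hidden inside the library call.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small symmetric positive definite test matrix: a 1-D Laplacian.
n = 1000
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")
b = np.ones(n)

# "Analyse + factorize": compute a fill-reducing ordering and the factors.
lu = spla.splu(A, permc_spec="COLAMD")

# "Solve": the triangular solves reuse the factorization, so additional
# right-hand sides are cheap once the factorization is done.
x = lu.solve(b)
print("residual norm:", np.linalg.norm(A @ x - b))
```

Factoring once and reusing the factors for many right-hand sides is what makes the factorization step the dominant cost that the paper's block partitioning and scheduling target.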


International Parallel and Distributed Processing Symposium | 2000

PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions

Pascal Hénon; Pierre Ramet; Jean Roman

We present and analyze a general algorithm which computes an efficient static scheduling of block computations for a parallel LDLᵗ factorization of sparse symmetric positive definite systems based on a combination of 1D and 2D block distributions. Our solver uses a supernodal fan-in approach and is fully driven by this scheduling. We give an overview of the algorithm and present performance results and comparisons with PSPASES on an IBM-SP2 with 120 MHz Power2SC nodes for a collection of irregular problems.
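To make the notion of a mixed 1D/2D block distribution concrete, here is a toy sketch (not the scheduling algorithm of the paper): narrow column blocks are mapped to a single process, while wide ones are spread over a column of a small process grid, using a crude cubic cost model. The threshold, grid shape and cost model are assumptions made only for this example.

```python
# Toy illustration of mixing 1D and 2D block distributions: small column
# blocks go to one process (1D); column blocks wider than a threshold are
# shared by a column of a 2-D process grid (2D) so their updates can be
# parallelized further. Not the PaStiX algorithm.
from dataclasses import dataclass

@dataclass
class BlockMapping:
    block_id: int
    width: int
    owners: list          # process ranks that share this column block

def map_blocks(block_widths, nprocs, grid_rows=2, threshold=64):
    """Assign each column block either to one process (1D) or to a column of
    a 2-D process grid (2D), balancing an estimated workload."""
    load = [0.0] * nprocs
    mapping = []
    for bid, w in enumerate(block_widths):
        cost = float(w) ** 3                # crude cubic cost model per block
        if w < threshold:
            p = min(range(nprocs), key=lambda r: load[r])   # least-loaded process
            owners = [p]
        else:
            # pick the least-loaded "grid column" of grid_rows processes
            ncols = nprocs // grid_rows
            col = min(range(ncols),
                      key=lambda c: sum(load[c * grid_rows:(c + 1) * grid_rows]))
            owners = list(range(col * grid_rows, (col + 1) * grid_rows))
        for p in owners:
            load[p] += cost / len(owners)
        mapping.append(BlockMapping(bid, w, owners))
    return mapping, load

mapping, load = map_blocks([8, 16, 128, 32, 256, 64], nprocs=4)
for m in mapping:
    print(m)
print("estimated load per process:", load)
```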


International Parallel and Distributed Processing Symposium | 2014

Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes

Xavier Lacoste; Mathieu Faverge; George Bosilca; Pierre Ramet; Samuel Thibault

The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The task graph of the factorization step is made available to the two runtimes, giving them the opportunity to process and optimize its traversal in order to maximize the efficiency of the algorithm for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler and of the PaRSEC and StarPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve results comparable to those of the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.
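The sketch below illustrates what such a task graph looks like for a tiled Cholesky factorization: each kernel becomes a task, and data dependencies form the DAG that a runtime such as PaRSEC or StarPU is free to traverse in any legal order. It is a simplified illustration only, not the PaStiX task graph nor a real runtime API.

```python
# Build the DAG of a tiled (dense) Cholesky factorization: tasks plus the
# data dependencies a task-based runtime must respect. Illustration only.

def cholesky_task_graph(nt):
    """Return the tasks and dependencies of a tiled Cholesky factorization
    with nt x nt tiles. Each task is a tuple (kind, indices)."""
    tasks, deps = [], {}          # deps[t] = set of tasks that must finish before t

    def add(task, *preds):
        tasks.append(task)
        deps[task] = set(p for p in preds if p is not None)

    last_update = {}              # last task that wrote tile (i, j)
    for k in range(nt):
        potrf = ("POTRF", k)
        add(potrf, last_update.get((k, k)))
        last_update[(k, k)] = potrf
        for i in range(k + 1, nt):
            trsm = ("TRSM", i, k)
            add(trsm, potrf, last_update.get((i, k)))
            last_update[(i, k)] = trsm
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                upd = ("UPDATE", i, j, k)          # SYRK when j == i, GEMM otherwise
                add(upd,
                    last_update.get((i, k)),
                    last_update.get((j, k)),
                    last_update.get((i, j)))
                last_update[(i, j)] = upd
    return tasks, deps

tasks, deps = cholesky_task_graph(3)
print(len(tasks), "tasks")
for t in tasks:
    print(t, "<-", sorted(map(str, deps[t])))
```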


European Conference on Parallel Processing | 1996

Optimal Grain Size Computation for Pipelined Algorithms

Frédéric Desprez; Pierre Ramet; Jean Roman

In this paper, we present a method for overlapping communications on parallel computers for pipelined algorithms. We first introduce a general theoretical model which leads to a generic computation scheme for the optimal packet size. Then we apply the OPIUM library, which provides an easy-to-use and efficient way to compute this optimal packet size in the general case, to the column LU factorization; the implementation and performance measurements are carried out on an Intel Paragon.
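The paper determines the optimal packet size analytically; the fragment below only illustrates the underlying trade-off numerically, under an assumed linear cost model (startup plus per-word communication and computation terms) that is not the model of the paper.

```python
# Illustrative only: the trade-off behind the optimal packet (grain) size in a
# pipelined algorithm, under an ASSUMED linear cost model (not the paper's).
#   - sending one packet of b items costs   beta + b * tau   (startup + per-word)
#   - processing one packet costs           b * gamma
#   - with P pipeline stages, the total time is roughly the pipeline fill time
#     plus the steady-state time:
#       T(b) ~ (P - 1) * (beta + b * tau + b * gamma)
#              + (N / b) * max(beta + b * tau, b * gamma)

def pipeline_time(b, N, P, beta, tau, gamma):
    fill = (P - 1) * (beta + b * tau + b * gamma)
    steady = (N / b) * max(beta + b * tau, b * gamma)
    return fill + steady

def optimal_packet_size(N, P, beta, tau, gamma):
    """Brute-force the packet size that minimizes the assumed cost model."""
    return min(range(1, N + 1),
               key=lambda b: pipeline_time(b, N, P, beta, tau, gamma))

N, P = 4096, 16
beta, tau, gamma = 50e-6, 0.1e-6, 0.4e-6   # startup, per-word comm, per-word compute
b_star = optimal_packet_size(N, P, beta, tau, gamma)
print("optimal packet size:", b_star,
      "-> time:", pipeline_time(b_star, N, P, beta, tau, gamma))
```

Small packets shorten the pipeline fill but pay the startup cost more often; large packets do the opposite, which is exactly the compromise a grain-size computation resolves.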


Parallel Computing | 2008

On finding approximate supernodes for an efficient block-ILU(k) factorization

Pascal Hénon; Pierre Ramet; Jean Roman

Among existing preconditioners, the level-of-fill ILU has been quite popular as a general-purpose technique. Experimental observations have shown that, when coupled with block techniques, these methods can be quite effective in solving realistic problems arising from various applications. In this work, we consider an extension of this kind of method which is suitable for parallel environments. Our method is developed from the framework of high performance sparse direct solvers. The main idea we propose is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. These requirements lead to a robust class of parallel preconditioners based on generalized versions of block ILU techniques.
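As a generic illustration of how an incomplete factorization is used to precondition a Krylov solver, the sketch below wraps SciPy's threshold-based incomplete LU around a conjugate gradient iteration. Note that this is scalar ILUT, not the blockwise ILU(k) preconditioner proposed in the paper; the drop tolerance and test matrix are arbitrary.

```python
# Illustration only: SciPy's spilu is a threshold-based incomplete LU, not the
# blockwise ILU(k) of the paper, but the pattern is the same: build an
# incomplete factorization and wrap it as a preconditioner for a Krylov solver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Symmetric positive definite test problem: a 1-D Laplacian.
n = 2000
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csc")
b = np.ones(n)

# Incomplete factorization used as a preconditioner (M acts like an
# approximate inverse of A, since its matvec is a sparse triangular solve).
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

iterations = {"count": 0}
def count_iterations(xk):
    iterations["count"] += 1

x, info = spla.cg(A, b, M=M, callback=count_iterations)
print("info:", info, "iterations:", iterations["count"],
      "residual norm:", np.linalg.norm(A @ x - b))
```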


European Conference on Parallel Processing | 1999

A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization

Pascal Hénon; Pierre Ramet; Jean Roman

We present and analyze a general algorithm which computes efficient static schedulings of block computations for parallel sparse linear factorization. Our solver, based on a supernodal fan-in approach, is fully driven by this scheduling. We give an overview of the algorithms and present performance results on a 16-node IBM-SP2 with 66 MHz Power2 thin nodes for a collection of grid and irregular problems.
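A common building block of such static schedulers is proportional mapping: subtrees of the elimination tree are assigned to groups of processes whose sizes are proportional to the estimated subtree workload. The sketch below is a toy version of that general idea, not the mapping and scheduling algorithm of the paper.

```python
# Toy proportional mapping: split the process list among the children of each
# elimination-tree node proportionally to their estimated subtree costs.

def proportional_mapping(children, cost, root, procs):
    mapping = {}

    def subtree_cost(v):
        return cost[v] + sum(subtree_cost(c) for c in children.get(v, []))

    def assign(v, plist):
        mapping[v] = plist
        kids = children.get(v, [])
        if not kids:
            return
        if len(plist) < len(kids):
            for c in kids:                   # too few processes: children share them all
                assign(c, plist)
            return
        weights = [subtree_cost(c) for c in kids]
        total = sum(weights)
        start = 0
        for idx, (c, w) in enumerate(zip(kids, weights)):
            if idx == len(kids) - 1:
                share = len(plist) - start                   # last child takes the rest
            else:
                share = max(1, int(len(plist) * w / total))
                share = min(share, len(plist) - start - (len(kids) - 1 - idx))
            assign(c, plist[start:start + share])
            start += share

    assign(root, list(procs))
    return mapping

# Small elimination tree: node 6 is the root, costs are per-node flop estimates.
children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
cost = {0: 10, 1: 30, 2: 20, 3: 20, 4: 5, 5: 5, 6: 1}
print(proportional_mapping(children, cost, root=6, procs=range(4)))
```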


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Poster: Matrices over Runtime Systems at Exascale

Emmanuel Agullo; George Bosilca; Bérenger Bramas; Cedric Castagnede; Olivier Coulaud; Eric Darve; Jack J. Dongarra; Mathieu Faverge; Nathalie Furmento; Luc Giraud; Xavier Lacoste; Julien Langou; Hatem Ltaief; Matthias Messner; Raymond Namyst; Pierre Ramet; Toru Takahashi; Samuel Thibault; Stanimire Tomov; Ichitaro Yamazaki

The goal of the Matrices Over Runtime Systems at Exascale (MORSE) project is to design dense and sparse linear algebra methods that achieve the fastest possible time to an accurate solution on large-scale multicore systems with GPU accelerators, using all the processing power that future high-end systems can make available. In this poster, we propose a framework for describing linear algebra algorithms at a high level of abstraction and delegating the actual execution to a runtime system in order to design software whose performance is portable across architectures. We illustrate our methodology on three classes of problems: dense linear algebra, sparse direct methods and fast multipole methods. The resulting codes have been incorporated into the MAGMA, PaStiX and ScalFMM solvers, respectively.


Numerical Algorithms | 2000

Parallel Sparse Linear Algebra and Application to Structural Mechanics

David Goudin; Pascal Hénon; François Pellegrini; Pierre Ramet; Jean Roman; Jean-Jacques Pesqué

The framework of this paper is the parallelization of a plasticity algorithm that uses an implicit method and an incremental approach. More precisely, we focus on some specific parallel sparse linear algebra algorithms, which are the most time-consuming steps in solving such an engineering application efficiently. First, we present a general algorithm which computes an efficient static scheduling of block computations for parallel sparse linear factorization. The associated solver, based on a supernodal fan-in approach, is fully driven by this scheduling. Second, we describe a scalable parallel assembly algorithm based on a distribution of elements induced by the previous distribution of the blocks of the sparse matrix. We give an overview of these algorithms and present performance results on an IBM SP2 for a collection of grid and irregular problems.
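The assembly idea can be illustrated with a toy rule: assign each finite element to the process owning the largest share of the degrees of freedom it touches, and count the remaining contributions as communication. This is a simplified stand-in for the distribution described in the paper, with a made-up two-process example.

```python
# Toy element distribution induced by a block (dof) distribution: each element
# goes to the process owning most of its dofs; contributions to dofs owned
# elsewhere have to be communicated to the owner of the corresponding block.
from collections import Counter, defaultdict

def distribute_elements(elements, dof_owner):
    """elements: dict element_id -> list of dof indices it contributes to.
    dof_owner: dict dof index -> process owning the corresponding matrix block."""
    elem_owner = {}
    comm = defaultdict(int)                 # (src, dst) -> number of remote contributions
    for e, dofs in elements.items():
        counts = Counter(dof_owner[d] for d in dofs)
        p = counts.most_common(1)[0][0]     # process owning most of the element's dofs
        elem_owner[e] = p
        for d in dofs:
            if dof_owner[d] != p:
                comm[(p, dof_owner[d])] += 1
    return elem_owner, dict(comm)

dof_owner = {d: 0 if d < 4 else 1 for d in range(8)}     # two processes
elements = {0: [0, 1, 2], 1: [2, 3, 4], 2: [4, 5, 6], 3: [6, 7, 0]}
owners, comm = distribute_elements(elements, dof_owner)
print("element owners:", owners)
print("remote contributions per (src, dst):", comm)
```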


IEEE International Conference on High Performance Computing, Data and Analytics | 2008

Toward an international sparse linear algebra expert system by interconnecting the ITBL computational Grid with the Grid-TLSE platform

Noriyuki Kushida; Yoshio Suzuki; Naoya Teshima; Norihiro Nakajima; Yves Caniou; Michel J. Daydé; Pierre Ramet

In the present paper, the methodology of interoperability between the ITBL computational Grid and the Grid-TLSE platform is described. Grid-TLSE is an expert web site that provides user assistance in choosing the right solver for a given problem and appropriate values for the control parameters of the selected solver. The time to solution of a linear equation solver strongly depends on the type of problem, the selected algorithm, its implementation and the target computer architecture. Grid-TLSE uses the Diet middleware to distribute computing tasks over the Grid. Therefore, extending the variety of available computer architectures through Grid middleware interoperability between Diet and ITBL has a beneficial impact on the expert system. To show the feasibility of the methodology, a job transfer program was developed as a special service of Diet.


Computational Science and Engineering | 2008

A Domain Decomposition Method Applied to the Simplified Transport Equations

Maxime Barrault; Bruno Lathuilière; Pierre Ramet; Jean Roman

The simulation of the neutron transport inside a nuclear reactor leads to the computation of the lowest eigenpair of a simplified transport operator. Whereas the sequential solution at our disposal today is very efficient, we are not able to run some industrial cases due to the memory consumption and the computational time. This problem leads us to study parallel strategies. In order to reuse an important part of the solver and to bypass some limitations of conforming Cartesian meshes, we propose a non-overlapping domain decomposition based on the introduction of Lagrange multipliers. The method performs well on up to 100 processors for an industrial test case.
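The outer ingredient of such a computation, an inverse power iteration that reuses a single sparse factorization to converge to the lowest eigenpair, can be sketched as follows. The domain decomposition and Lagrange multipliers themselves are not shown, and the 1-D Laplacian is only a stand-in for the simplified transport operator.

```python
# Illustrative inverse power iteration for the lowest eigenpair of a sparse
# symmetric positive definite matrix; one factorization is reused at every
# iteration. Not the paper's domain-decomposed solver.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lowest_eigenpair(A, tol=1e-10, maxiter=500):
    """Inverse power iteration: repeatedly solve A y = x and normalize."""
    lu = spla.splu(A.tocsc())             # factor once, reuse for every iteration
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(maxiter):
        y = lu.solve(x)
        y /= np.linalg.norm(y)
        lam_new = y @ (A @ y)             # Rayleigh quotient
        if abs(lam_new - lam) < tol * abs(lam_new):
            return lam_new, y
        lam, x = lam_new, y
    return lam, x

# 1-D Laplacian test matrix: its smallest eigenvalue is known analytically.
n = 500
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1])
lam, v = lowest_eigenpair(A)
print("computed:", lam, "expected:", 4 * np.sin(np.pi / (2 * (n + 1))) ** 2)
```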

Collaboration


Dive into Pierre Ramet's collaborations.

Top Co-Authors

Mathieu Faverge

French Institute for Research in Computer Science and Automation

Luc Giraud

French Institute for Research in Computer Science and Automation
