
Publication


Featured research published by Ernesto Dufrechou.


Parallel Computing | 2016

Exploiting task and data parallelism in ILUPACK's preconditioned CG solver on NUMA architectures and many-core accelerators

José Ignacio Aliaga; Rosa M. Badia; Maria Barreda; Matthias Bollhöfer; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

Highlights: specialized implementations of ILUPACK's iterative solver for NUMA platforms and for many-core accelerators; exploitation of task parallelism via the OmpSs runtime (dynamic schedule) and via MPI (static schedule); exploitation of data parallelism on GPUs.

We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and for many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For the conventional x86 architectures, our approach exploits task parallelism via the OmpSs runtime as well as a message-passing implementation based on MPI, respectively yielding a dynamic and a static schedule of the work to the cores, with numeric semantics that differ from those of the sequential ILUPACK. For the graphics processor we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while keeping the numeric semantics of the sequential case.
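
As a rough illustration of where the data-parallel off-loading applies, the sketch below shows a preconditioned CG iteration in Python/NumPy. This is not the ILUPACK code: the preconditioner here is a plain Jacobi (diagonal) stand-in for ILUPACK's inverse-based multilevel ILU. The sparse matrix-vector product and the preconditioner application dominate the run time and are the kernels that a GPU variant would off-load to the accelerator.

    import numpy as np
    import scipy.sparse as sp

    def pcg(A, b, apply_prec, tol=1e-8, maxiter=1000):
        """Preconditioned CG for SPD A; apply_prec(r) applies the preconditioner."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = apply_prec(r)      # preconditioner application: off-loaded in the GPU variant
        p = z.copy()
        rz = r @ z
        for _ in range(maxiter):
            Ap = A @ p         # sparse matrix-vector product: off-loaded in the GPU variant
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break
            z = apply_prec(r)  # preconditioner application
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    # 1-D Poisson test matrix with a Jacobi (diagonal) preconditioner.
    n = 1000
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    d = A.diagonal()
    x = pcg(A, b, lambda r: r / d)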


International Conference on Computational Science and Its Applications | 2014

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction

Peter Benner; Ernesto Dufrechou; Pablo Ezzatti; Pablo Igounet; Enrique S. Quintana-Ortí; Alfredo Remón

In this paper we present new hybrid CPU-GPU routines to accelerate the solution of linear systems with a band coefficient matrix, by off-loading the major part of the computations to the GPU and leveraging highly tuned implementations of the BLAS for the graphics processor. Our experiments with an NVIDIA S2070 GPU report speed-ups of up to 6× for the hybrid band solver based on the LU factorization over analogous CPU-only routines in Intel's MKL. As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse matrix Lyapunov equation solver, showing a 3× acceleration on the solution of a large-scale benchmark arising in model reduction.
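
For reference, the CPU side of such a band solver corresponds to a LAPACK-backed banded LU solve. A minimal SciPy sketch (not the paper's hybrid CPU-GPU code) that solves a system with a band coefficient matrix stored in the usual compact band layout:

    import numpy as np
    from scipy.linalg import solve_banded

    # Tridiagonal example (kl = ku = 1) in LAPACK-style band storage:
    # row 0 holds the super-diagonal, row 1 the main diagonal, row 2 the sub-diagonal.
    n = 6
    ab = np.zeros((3, n))
    ab[0, 1:] = -1.0    # super-diagonal
    ab[1, :] = 2.0      # main diagonal
    ab[2, :-1] = -1.0   # sub-diagonal

    b = np.ones(n)
    x = solve_banded((1, 1), ab, b)   # LAPACK-backed banded LU solve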


European Conference on Parallel Processing | 2016

A Data-Parallel ILUPACK for Sparse General and Symmetric Indefinite Linear Systems

José Ignacio Aliaga; Matthias Bollhöfer; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

The solution of sparse linear systems of large dimension is a critical step in problems that span a diverse range of applications. For this reason, a number of iterative solvers have been developed, among which ILUPACK integrates an inverse-based multilevel ILU preconditioner with appealing numerical properties. In this paper, we enhance the computational performance of ILUPACK by off-loading the execution of several key computational kernels to a Graphics Processing Unit (GPU). In particular, we target the preconditioned GMRES and BiCG methods for sparse general systems and the preconditioned SQMR method for sparse symmetric indefinite problems in ILUPACK. The evaluation on an NVIDIA Kepler GPU shows an appreciable reduction of the execution time, while maintaining the convergence rate and numerical properties of the original ILUPACK solver.
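
To give a flavour of the preconditioned-GMRES structure described above, a minimal SciPy sketch follows. It uses SciPy's generic incomplete-LU factorization as the preconditioner, which is only a stand-in for, and not equivalent to, ILUPACK's inverse-based multilevel ILU.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Nonsymmetric sparse test matrix (CSC format is required by spilu).
    n = 2000
    A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    # Incomplete LU factorization applied as a preconditioner.
    ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
    M = spla.LinearOperator((n, n), matvec=ilu.solve)

    # Preconditioned GMRES; in the GPU version the sparse matrix-vector product
    # and the preconditioner application are the kernels off-loaded to the device.
    x, info = spla.gmres(A, b, M=M, restart=50)
    assert info == 0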


IEEE International Conference on High Performance Computing Data and Analytics | 2016

Design of a Task-Parallel Version of ILUPACK for Graphics Processors

José Ignacio Aliaga; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

In many scientific and engineering applications, the solution of large sparse systems of equations is one of the most important stages. For this reason, many libraries have been developed, among which ILUPACK stands out due to its efficient inverse-based multilevel preconditioner. Several parallel versions of ILUPACK have been proposed in the past. In particular, two task-parallel versions, for shared and distributed memory platforms, and a GPU-accelerated data-parallel variant have been developed to solve symmetric positive definite linear systems. In this work we evaluate the combination of these two approaches. Specifically, we leverage the computational power of one GPU (associated with the data-level parallelism) to accelerate each computation of the multicore (task-parallel) variant of ILUPACK. The experimental evaluation shows that our proposal accelerates the multicore variant when the leaf tasks of the parallel solver are of sufficient dimension.


IEEE International Conference on High Performance Computing Data and Analytics | 2015

Solving Linear Systems on the Intel Xeon-Phi Accelerator via the Gauss-Huard Algorithm

Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón

The solution of linear systems is a key operation in many scientific and engineering applications. Traditional solvers are based on the LU factorization of the coefficient matrix, and optimized implementations of this method are available in well-known dense linear algebra libraries for most hardware architectures. The Gauss-Huard algorithm (GHA) is a reliable alternative that presents a computational effort close to that of the LU-based approach. In this work we present several implementations of GHA on the Intel Xeon Phi coprocessor. The experimental results show that our solvers based on GHA represent a competitive alternative to LU-based solvers, being an appealing method for the solution of small to medium linear systems, with remarkable reductions in the time-to-solution for systems of dimension n ≤ 4,000.
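
A compact NumPy sketch of the Gauss-Huard elimination is given below; it omits the column pivoting that a robust implementation adds for stability, as well as the blocked, Xeon Phi-oriented optimizations of the paper.

    import numpy as np

    def gauss_huard(A, b):
        """Solve A x = b by Gauss-Huard elimination on the augmented matrix [A | b].

        No pivoting is performed; the cost is ~2n^3/3 flops, comparable to an LU solve.
        """
        n = A.shape[0]
        M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
        for k in range(n):
            # Eliminate the entries of row k to the left of the diagonal.
            M[k, k:] -= M[k, :k] @ M[:k, k:]
            # Scale row k so that the diagonal entry becomes 1.
            M[k, k:] /= M[k, k]
            # Annihilate the entries of column k above the diagonal.
            M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
        return M[:, n]

    # Quick check against NumPy's LU-based solver on a well-conditioned matrix.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 200)) + 200.0 * np.eye(200)
    b = rng.standard_normal(200)
    assert np.allclose(gauss_huard(A, b), np.linalg.solve(A, b))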


The Journal of Supercomputing | 2015

Extending lyapack for the solution of band Lyapunov equations on hybrid CPU-GPU platforms

Peter Benner; Alfredo Remón; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

The solution of large-scale Lyapunov equations is an important tool for the solution of several engineering problems arising in optimal control and model order reduction. In this work, we investigate the case when the coefficient matrix of the equations presents a band structure. Exploiting the structure of this matrix, we can achieve relevant reductions in the memory requirements and the number of floating-point operations. Additionally, the new solver efficiently leverages the parallelism of CPU-GPU platforms. Furthermore, it is integrated into the lyapack library to facilitate its use. The new codes are evaluated on the solution of several benchmarks, achieving significant runtime reductions with respect to the original CPU version in lyapack.
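
For context, the continuous-time Lyapunov equation targeted by such solvers has the form A X + X A^T + Q = 0. The toy SciPy computation below obtains a dense reference solution for a tridiagonal (band) coefficient matrix; it does not exploit the band structure, which is precisely what the paper's solver adds.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    # Stable tridiagonal coefficient matrix A and a low-rank symmetric term Q.
    n = 500
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    Q = np.ones((n, 1)) @ np.ones((1, n))

    # Dense reference solution of A X + X A^T = -Q.
    X = solve_continuous_lyapunov(A, -Q)
    print(np.linalg.norm(A @ X + X @ A.T + Q))   # small residual confirms the solution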


IEEE International Conference on High Performance Computing Data and Analytics | 2014

Efficient symmetric band matrix-matrix multiplication on GPUs

Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí; Alfredo Remón

Matrix-matrix multiplication is an important linear algebra operation with a myriad of applications in scientific and engineering computing. Due to the relevance and inner parallelism of this operation, there exist many high-performance implementations for a variety of hardware platforms. Exploiting the structure of the matrices involved in the operation generally provides relevant time and memory savings. This is the case, e.g., when one of the matrices is a symmetric band matrix. This work presents two efficient specialized implementations of the operation when a symmetric band matrix is involved and the target architecture contains a graphics processor (GPU). In particular, both implementations exploit the structure of the matrices to leverage the vast parallelism of the underlying hardware. The experimental results show remarkable reductions in the computation time over the tuned implementations of the same operation provided by MKL and CUBLAS.
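
As a simple illustration of why exploiting the band structure pays off, the SciPy sketch below stores a symmetric band matrix through its few non-zero diagonals, so the product costs O(k·n·m) operations and O(k·n) memory for the band matrix instead of the O(n²·m) and O(n²) of the dense case. The paper's GPU kernels exploit the same structure, but with device-specific implementations.

    import numpy as np
    import scipy.sparse as sp

    n, m, k = 4000, 64, 3   # matrix order, columns of B, semi-bandwidth of A

    # Symmetric band matrix defined by its 2k+1 non-zero diagonals only.
    rng = np.random.default_rng(1)
    diags = [rng.standard_normal(n - abs(d)) for d in range(-k, k + 1)]
    for d in range(1, k + 1):
        diags[k + d] = diags[k - d]         # mirror sub-diagonals to enforce symmetry
    A = sp.diags(diags, list(range(-k, k + 1)), format="csr")

    B = rng.standard_normal((n, m))
    C = A @ B                               # O(k*n*m) work, O(k*n) storage for A
    assert np.allclose(C, A.toarray() @ B)  # matches the dense product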


Third Workshop on Applications for Multi-Core Architecture | 2012

A Study on Mixed Precision Techniques for a GPU-based SIP Solver

Pablo Igounet; Ernesto Dufrechou; Martín Pedemonte; Pablo Ezzatti

This article presents the study and application of mixed precision techniques to accelerate a GPU-based implementation of the Strongly Implicit Procedure (SIP) to solve hepta-diagonal linear systems. In particular, two different options to incorporate mixed precision in the GPU implementation are discussed and one of them is implemented. The experimental evaluation of our proposal demonstrates that a runtime similar to that of a single precision implementation on GPU can be attained, while achieving numerical accuracy comparable to double precision arithmetic.
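
The core idea, a cheap low-precision solve corrected with residuals computed in higher precision, can be sketched as classic iterative refinement. The dense CPU toy below only conveys that idea; the paper applies it inside the GPU implementation of the SIP sweeps, which is considerably more involved.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def mixed_precision_refine(A, b, sweeps=3):
        """Factorize once in float32, then refine with float64 residuals."""
        lu32 = lu_factor(A.astype(np.float32))                 # single-precision factorization
        x = lu_solve(lu32, b.astype(np.float32)).astype(np.float64)
        for _ in range(sweeps):
            r = b - A @ x                                       # residual in double precision
            x += lu_solve(lu32, r.astype(np.float32)).astype(np.float64)
        return x

    rng = np.random.default_rng(2)
    A = rng.standard_normal((500, 500)) + 500.0 * np.eye(500)
    b = rng.standard_normal(500)
    x = mixed_precision_refine(A, b)
    print(np.linalg.norm(A @ x - b))   # close to double-precision accuracy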


Programming Models and Applications for Multicores and Manycores | 2018

Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers

José Ignacio Aliaga; Matthias Bollhöfer; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

We target the solution of sparse linear systems via iterative Krylov subspace-based methods enhanced with the ILUPACK preconditioner on graphics processing units (GPUs). Concretely, in this work we extend ILUPACK with an implementation of the BiCG solver capable of exploiting dual-GPU systems. We leverage the structure of BiCG to execute the main stages of the solver in a concurrent manner, and take advantage of the extended memory space to improve the data access patterns. The experimental results on a server with two NVIDIA K40 GPUs show important acceleration factors with respect to a previous single-GPU variant.
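
The concurrency exploited across the two GPUs comes from the structure of the BiCG iteration itself: every step performs one product with A and one with its transpose, and the two are independent of each other. A minimal, unpreconditioned, single-device NumPy sketch with those two kernels marked follows; the actual ILUPACK extension additionally applies the multilevel preconditioner and runs the marked kernels on separate devices.

    import numpy as np
    import scipy.sparse as sp

    def bicg(A, b, tol=1e-8, maxiter=500):
        x = np.zeros_like(b)
        r = b - A @ x
        rt = r.copy()                 # shadow residual
        p, pt = r.copy(), rt.copy()
        rho = rt @ r
        for _ in range(maxiter):
            q = A @ p                 # product with A    -> could run on GPU 0
            qt = A.T @ pt             # product with A^T  -> could run on GPU 1, independently
            alpha = rho / (pt @ q)
            x += alpha * p
            r -= alpha * q
            rt -= alpha * qt
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break
            rho_new = rt @ r
            beta = rho_new / rho
            p = r + beta * p
            pt = rt + beta * pt
            rho = rho_new
        return x

    n = 1000
    A = sp.diags([-1.0, 2.2, -0.8], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    x = bicg(A, b)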


The Journal of Supercomputing | 2018

An efficient GPU version of the preconditioned GMRES method

José Ignacio Aliaga; Ernesto Dufrechou; Pablo Ezzatti; Enrique S. Quintana-Ortí

In a large number of scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This situation has motivated the study and development of several iterative solvers, among which preconditioned Krylov subspace methods occupy a privileged position. In a previous effort, we developed a GPU-aware version of the GMRES method included in ILUPACK, a package of solvers distinguished by its inverse-based multilevel ILU preconditioner. In this work, we study the performance of our previous proposal and integrate several enhancements in order to mitigate its principal bottlenecks. The numerical evaluation shows that our new proposal achieves significant run-time reductions.

Collaboration


Dive into Ernesto Dufrechou's collaborations.

Top Co-Authors

Pablo Ezzatti (University of the Republic)
Matthias Bollhöfer (Braunschweig University of Technology)
Mónica Fossati (University of the Republic)
Martín Pedemonte (University of the Republic)
Michelle Jackson (University of the Republic)
Pablo Santoro (University of the Republic)
Rosa M. Badia (Barcelona Supercomputing Center)