Marc Baboulin | Researchain

Publication

Featured research published by Marc Baboulin.

Parallel Computing | 2010

Towards dense linear algebra for hybrid GPU accelerated manycore systems

Stanimire Tomov; Jack J. Dongarra; Marc Baboulin

If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accelerators! This is happening today as accelerators, in particular Graphics Processing Units (GPUs), are steadily making their way into the high performance computing (HPC) world. We highlight the trends leading to the idea of hybrid manycore/GPU systems, and we present a set of techniques that can be used to efficiently program them. The presentation is in the context of Dense Linear Algebra (DLA), a major building block for many scientific computing applications. We motivate the need for new algorithms that would split the computation in a way that would fully exploit the power that each of the hybrid components offers. As the area of hybrid multicore/GPU computing is still in its infancy, we also argue for its importance in view of what future architectures may look like. We therefore envision the need for a DLA library similar to LAPACK but for hybrid manycore/GPU systems. We illustrate the main ideas with an LU-factorization algorithm where particular techniques are used to reduce the amount of pivoting, resulting in an algorithm achieving up to 388 GFlop/s for single and up to 99.4 GFlop/s for double precision factorization on a hybrid Intel Xeon (2x4 cores @ 2.33 GHz) - NVIDIA GeForce GTX 280 (240 cores @ 1.30 GHz) system.

Computer Physics Communications | 2009

Accelerating scientific computations with mixed precision algorithms

Marc Baboulin; Alfredo Buttari; Jack J. Dongarra; Jakub Kurzak; Julie Langou; Julien Langou; Piotr Luszczek; Stanimire Tomov

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
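The idea in this abstract can be illustrated with a minimal sketch of mixed precision iterative refinement. This is not the authors' code: the function name `mixed_precision_solve`, the tolerance, and the test matrix are assumptions chosen for illustration, and NumPy is assumed to be available. The expensive solves run in 32-bit arithmetic, while the residual and the solution update are kept in 64-bit arithmetic.

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=20):
    """Solve Ax = b with float32 solves and float64 refinement (illustrative sketch)."""
    A32 = A.astype(np.float32)
    # Initial solve entirely in 32-bit arithmetic.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        # Residual computed in 64-bit arithmetic.
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction computed in 32-bit arithmetic. A production code would
        # reuse the LU factors of A32 here instead of refactorizing each time.
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 200
# Hypothetical well-conditioned test matrix (diagonally dominated).
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x = mixed_precision_solve(A, b)
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

The refinement loop only converges when the matrix is not too ill-conditioned relative to 32-bit precision, which is why the example uses a well-conditioned matrix.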

ACM Transactions on Mathematical Software | 2013

Accelerating Linear System Solutions Using Randomization Techniques

Marc Baboulin; Jack J. Dongarra; Julien Herrmann; Stanimire Tomov

We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and then to reduce the amount of communication. Numerical experiments show that this randomization can be performed at a very affordable computational price while providing us with a satisfying accuracy when compared to partial pivoting. This random transformation called Partial Random Butterfly Transformation (PRBT) is optimized in terms of data storage and flops count. We propose a solver where PRBT and the LU factorization with no pivoting take advantage of the current hybrid multicore/GPU machines and we compare its Gflop/s performance with a solver implemented in a current parallel library.
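As a rough illustration of the approach described above, the sketch below builds recursive butterfly matrices densely, applies them on both sides of A, and then runs Gaussian elimination with no pivoting. It is a simplified stand-in, not the paper's implementation: the helper names, the recursion depth of 2, and the choice of random diagonal entries of the form exp(r/10) with r uniform in [-1/2, 1/2] are assumptions made here, and the packed storage that the abstract mentions is not reproduced.

```python
import numpy as np

def butterfly(n, rng):
    """Dense n x n butterfly matrix built from two random diagonal blocks."""
    h = n // 2
    # Assumed choice of random diagonal values (close to 1, hence well conditioned).
    r0 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    r1 = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2)

def recursive_butterfly(n, depth, rng):
    """Product of block-diagonal butterfly levels; n must be divisible by 2**depth."""
    W = np.eye(n)
    for k in range(depth):
        blocks, size = 2 ** k, n // 2 ** k
        level = np.zeros((n, n))
        for i in range(blocks):
            s = i * size
            level[s:s + size, s:s + size] = butterfly(size, rng)
        W = level @ W
    return W

def lu_nopiv_solve(A, b):
    """Solve Ax = b by Gaussian elimination with no pivoting."""
    A, y, n = A.copy(), b.copy(), len(b)
    for k in range(n - 1):
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    for k in range(n):                      # forward substitution with unit L
        y[k] -= A[k, :k] @ y[:k]
    for k in range(n - 1, -1, -1):          # back substitution with U
        y[k] = (y[k] - A[k, k + 1:] @ y[k + 1:]) / A[k, k]
    return y

rng = np.random.default_rng(1)
n = 64
A = rng.standard_normal((n, n))
A[0, 0] = 0.0                               # unpivoted LU on A itself would divide by zero
x_true = rng.standard_normal(n)
b = A @ x_true

U = recursive_butterfly(n, 2, rng)
V = recursive_butterfly(n, 2, rng)
A_r = U.T @ A @ V                           # randomized matrix
y = lu_nopiv_solve(A_r, U.T @ b)            # solve A_r y = U^T b with no pivoting
x = V @ y                                   # recover the solution of Ax = b
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

Forming the butterflies as dense matrices costs far more than necessary and is done here only for readability. Because no pivoting is performed, the accuracy depends on the random draw, so a few steps of iterative refinement are a natural addition.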

SIAM Journal on Matrix Analysis and Applications | 2007

A Partial Condition Number for Linear Least Squares Problems

Mario Arioli; Marc Baboulin; Serge Gratton

We consider here the linear least squares problem $\min_{y \in \mathbb{R}^n}\|Ay-b\|_2$, where $b \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m\times n}$ is a matrix of full column rank

Numerical Linear Algebra With Applications | 2009

Computing the conditioning of the components of a linear least‐squares solution

Marc Baboulin; Jack J. Dongarra; Serge Gratton; Julien Langou

International Parallel and Distributed Processing Symposium | 2012

A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures

Marc Baboulin; Dulceneia Becker; Jack J. Dongarra

International Conference on Conceptual Structures | 2012

A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines

Marc Baboulin; Simplice Donfack; Jack J. Dongarra; Laura Grigori; Adrien Rémy; Stanimire Tomov

Parallel Processing and Applied Mathematics | 2011

Reducing the amount of pivoting in symmetric indefinite systems

Dulceneia Becker; Marc Baboulin; Jack J. Dongarra

International Conference on Conceptual Structures | 2013

A parallel solver for incompressible fluid flows

Yushan Wang; Marc Baboulin; Jack J. Dongarra; Joel Falcou; Yann Fraigneau; Olivier P. Le Maître

International Conference on Conceptual Structures | 2016

High-performance Tensor Contractions for GPUs

Ahmad Abdelfattah; Marc Baboulin; Veselin Dobrev; Jack J. Dongarra; Christopher Earl; Joel Falcou; Azzam Haidar; Ian Karlin; Tzanio V. Kolev; Ian Masliah; Stanimire Tomov
