Jean-Louis Roch
University of Grenoble
Publication
Featured research published by Jean-Louis Roch.
european conference on parallel processing | 2008
Daouda Traoré; Jean-Louis Roch; Nicolas Maillard; Thierry Gautier; Julien Bernard
This paper presents provably work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique. In previous approaches, each processor typically uses a deque to locally store its ready tasks, and a processor that runs out of work steals a ready task from the deque of a randomly selected processor. The present paper instead describes an original implementation of work stealing that replaces the deques with a distributed list, in order to bound the overhead of task creation. The paper contains both theoretical and experimental results bounding the work and the running time.
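The classical deque scheme that the paper departs from can be illustrated with a small sequential simulation. This is a sketch for intuition only, not the paper's deque-free implementation; all names and parameters are illustrative:

```python
import random
from collections import deque

def work_stealing_sim(num_tasks, num_workers, seed=0):
    """Simulate classical deque-based work stealing.

    Each worker pops from the private end of its own deque; an idle
    worker steals from the public end of a random victim's deque.
    Returns the number of synchronous steps until all tasks complete.
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(range(num_tasks))  # all work starts on worker 0
    steps = 0
    while any(deques):
        steps += 1
        for w in range(num_workers):
            if deques[w]:
                deques[w].pop()  # execute one local task (LIFO end)
            else:
                victim = rng.randrange(num_workers)
                if victim != w and deques[victim]:
                    # steal from the opposite (FIFO) end of the victim
                    deques[w].appendleft(deques[victim].popleft())
    return steps

print(work_stealing_sim(1000, 8))
```

With 8 workers and 1000 unit tasks the step count stays close to the ideal 1000/8, the load-balancing effect the paper's analysis quantifies.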
international symposium on algorithms and computation | 2010
Marc Tchiboukdjian; Nicolas Gast; Denis Trystram; Jean-Louis Roch; Julien Bernard
Classical list scheduling is a popular and efficient technique for scheduling jobs on parallel platforms. However, as the number of processors grows, the cost of managing a single centralized list becomes prohibitive. The objective of this work is to study the extra cost that must be paid when the list is distributed among the processors. We present a general methodology for computing the expected makespan, based on the analysis of an adequate potential function that represents the load imbalance between the local lists. A bound on the deviation from the mean is also derived. We then apply this technique to show that the expected makespan for scheduling W unit independent tasks on m processors is W/m plus an additional term of 3.65 log2(W). Moreover, simulations show that our bound is within approximately 50% of the exact value. This new analysis also makes it possible to study the influence of the initial distribution of tasks, and the reduction in the number of steals when several thieves can simultaneously steal work from the same processor's list.
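The centralized baseline is easy to state concretely. The following sketch (illustrative only, not the paper's distributed-list analysis) computes the makespan of greedy list scheduling for independent tasks:

```python
import heapq

def list_schedule(tasks, m):
    """Centralized greedy list scheduling: each task in list order goes
    to the currently least-loaded of m processors; returns the makespan."""
    loads = [0.0] * m          # completion time of each processor
    heapq.heapify(loads)
    for t in tasks:
        # assign task t to the least-loaded processor
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

print(list_schedule([1] * 100, 8))  # 100 unit tasks on 8 processors
```

For W unit tasks this yields ceil(W/m); the paper's result says the distributed variant pays only an extra additive term of about 3.65 log2(W) in expectation.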
Parallel Algorithms and Applications | 1994
Bogdan Dumitrescu; Jean-Louis Roch; Denis Trystram
We present in this paper the parallelization of the fast matrix multiplication algorithms of Strassen and Winograd on distributed-memory MIMD architectures with ring and torus interconnection networks. Complexity and efficiency are analyzed, and good asymptotic behaviour is proved. These new parallel algorithms are compared with standard algorithms on a 128-processor parallel computer; experiments confirm the theoretical results.
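Strassen's scheme replaces the 8 recursive half-size products of classical block multiplication with 7, which is what makes the sub-cubic complexity possible. A minimal sequential sketch for n-by-n matrices with n a power of two (the ring/torus distribution studied in the paper is beyond this snippet):

```python
def strassen(A, B):
    """Strassen multiplication of two n x n matrices (n a power of two),
    using 7 recursive products instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def split(M):  # quadrants M11, M12, M21, M22
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    def add(X, Y): return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y): return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

The 7 independent products M1..M7 are exactly what the parallel versions distribute across processors.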
Numerical Algorithms | 1997
Bogdan Dumitrescu; Mathias Doreille; Jean-Louis Roch; Denis Trystram
This paper presents a discussion of 2D block mappings for sparse Cholesky factorization on parallel MIMD architectures with distributed memory. It introduces the fan-in algorithm in a general manner and proposes several mapping strategies. The grid mapping with row balancing, inspired by Rothberg's work (1994), is proved to be more robust than the original fan-out algorithm. Even more efficient is the proportional mapping, as shown by experiments on a 32-processor IBM SP1 and on a Cray T3D. Subforest-to-subcube mappings are also considered and give good results on the T3D.
joint international conference on vector and parallel processing parallel processing | 1992
Jean-Louis Roch; Gilles Villard
GCD and lattice basis reduction are two major problems in the field of parallel algebraic computation. Whether they are in NC is still an open question. We point out their connections and difficulties. Concerning lattice basis reduction, whose sequential cost is O(n^7), we propose a parallelization leading to the time bound O(n^3 log^2 n) for the reduction of good lattices. Experiments show that high speed-ups can be obtained.
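In dimension 2, lattice basis reduction specializes to the classical Gauss-Lagrange algorithm, which also makes the kinship with the GCD computation visible (it is essentially a Euclidean loop on vectors). A sketch for integer bases, given here only as background for the general problem discussed above:

```python
def lagrange_reduce(u, v):
    """Gauss-Lagrange reduction of a 2D integer lattice basis (u, v),
    assumed nonzero and linearly independent. Returns a reduced basis
    with u the shortest vector, mirroring the Euclidean GCD loop."""
    def norm2(w):
        return w[0] * w[0] + w[1] * w[1]
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        # subtract the nearest-integer multiple of u from v
        m = round((u[0] * v[0] + u[1] * v[1]) / norm2(u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u  # swap and continue, as in Euclid's algorithm
```

The general n-dimensional case (LLL-style reduction) iterates size reductions and swaps in the same spirit, which is what the proposed parallelization targets.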
advances in p2p systems | 2009
Thomas Roche; Mathieu Cunche; Jean-Louis Roch
P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach for fault tolerance in High Performance Computing systems. Our contribution goes in two directions: first, instead of restricting ourselves to 2D checksums that tolerate only a small number of node failures, we propose to base disk-less checkpointing on linear codes, which can tolerate a potentially large number of faults. Second, we compare and analyse the use of Low-Density Parity-Check (LDPC) codes against classical Reed-Solomon (RS) codes with respect to different fault models suited to P2P systems. Our LDPC disk-less checkpointing method is well suited when only node disconnections are considered, but cannot deal with Byzantine peers. Our RS disk-less checkpointing method tolerates such Byzantine errors, but is restricted to exact finite-field computations.
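The simplest instance of checksum-based disk-less checkpointing is a single XOR parity block, which tolerates one node failure; the paper generalises this idea to LDPC and RS codes for many failures. A toy sketch over integer state vectors:

```python
def parity_checkpoint(states):
    """XOR parity checkpoint over the workers' state vectors: any single
    lost state can be rebuilt from the survivors plus the parity block."""
    parity = [0] * len(states[0])
    for s in states:
        parity = [p ^ x for p, x in zip(parity, s)]
    return parity

def recover(states, lost_index, parity):
    """Rebuild the state at lost_index by XOR-ing the parity block
    with all surviving states."""
    rebuilt = list(parity)
    for i, s in enumerate(states):
        if i != lost_index:
            rebuilt = [r ^ x for r, x in zip(rebuilt, s)]
    return rebuilt

states = [[1, 2], [3, 4], [5, 6]]
p = parity_checkpoint(states)
print(recover(states, 1, p))  # rebuilds [3, 4]
```

Replacing the single XOR equation with the parity equations of an LDPC or RS code yields the multi-failure schemes compared in the paper.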
international symposium on symbolic and algebraic computation | 2010
Majid Khonji; Clément Pernet; Jean-Louis Roch; Thomas Roche; Thomas Stalinski
We study algorithm-based fault tolerance techniques for tolerating malicious errors in distributed computations based on the Chinese remainder theorem. The description holds both for computations with integers and for computations with polynomials over a field. It unifies the approaches of redundant residue number systems and redundant polynomial systems through the Reed-Solomon decoding algorithm proposed by Gao. We propose several variations on the application of the extended Euclidean algorithm in which the error correction rate is adaptive. Several improvements are studied, including the use of various termination criteria for the Euclidean algorithm, and an acceleration using Half-GCD techniques. When there is some redundancy in the input, a gap appears in the quotient sequence at the step corresponding to the error correction, which enables early termination and parallel computation. Experiments are presented to compare these approaches.
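The redundant-residue idea can be illustrated with a toy decoder that locates a single faulty residue by exhaustively dropping each one in turn. This is only a stand-in for the Gao / extended-Euclid decoders studied in the paper, which avoid this search:

```python
from math import prod

def crt(residues, moduli):
    """Chinese remainder reconstruction (moduli pairwise coprime)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return x % M

def decode_with_redundancy(residues, moduli, bound):
    """Try the full residue set, then each set with one residue dropped,
    and accept the first reconstruction below `bound`.  Toy decoder:
    assumes at most one faulty residue and enough redundant moduli that
    a wrong reconstruction almost surely exceeds the bound."""
    for skip in [None] + list(range(len(moduli))):
        rs = [r for i, r in enumerate(residues) if i != skip]
        ms = [m for i, m in enumerate(moduli) if i != skip]
        x = crt(rs, ms)
        if x < bound:
            return x
    raise ValueError("too many corrupted residues")

# 42 encoded mod (5, 7, 11, 13); the residue mod 11 is maliciously
# changed from 9 to 5, and the decoder still recovers 42.
print(decode_with_redundancy([2, 0, 5, 3], [5, 7, 11, 13], 100))
```

The Euclidean-algorithm decoders in the paper exploit the same redundancy algebraically, through the gap in the quotient sequence, instead of by exhaustive search.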
international conference on parallel architectures and languages europe | 1994
Jean-Louis Roch; Gilles Villard
We propose a new method for parallel computations on algebraic numbers and establish that computing the Jordan normal form of matrices over any commutative field F is in NC_F.
joint international conference on vector and parallel processing parallel processing | 1994
Jean-Louis Roch; A. Vermeerbergen; Gilles Villard
Most load-sharing and load-balancing techniques rely on load indexes that only consider the external behavior of parallel programs. For some applications, including symbolic computation methods, this amounts to making unrealistic assumptions about the stability of parallel programs. We present a new technique that provides reliable predictions of task completion times for a class of parallel applications including symbolic computations. Only a straightforward annotation of the initial program is required. The stability and limits of our load index are also discussed. We finally show how this technique can improve the writing of portable and scalable parallel libraries, even for a heterogeneous parallel machine.
parallel computing | 2016
Jean-Guillaume Dumas; Thierry Gautier; Clément Pernet; Jean-Louis Roch; Ziad Sultan
Highlights:
- The use of fast matrix arithmetic and modular reductions penalizes fine grain parallelism.
- Block recursive algorithms with OpenMP tasks can reach state-of-the-art efficiency.
- The libkomp library handles recursive tasks more efficiently than libgomp.
- Recursive PLUQ decomposition behaves best with explicit task synchronizations.
- Dataflow task synchronizations improve efficiency in finer grain implementations.

We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared-memory architectures. Contrary to the classical cubic algorithms of parallel numerical linear algebra, we focus here on recursive algorithms and coarse grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms, making coarse grain block algorithms more efficient than fine grain ones. This work is motivated by the design and implementation of dense linear algebra over a finite field, where fast matrix multiplication is used extensively and where costly modular reductions also advocate for coarse grain block decomposition. We incrementally build efficient kernels, first for matrix multiplication, then for triangular system solving, on top of which a recursive PLUQ decomposition algorithm is built. We study the parallelization of these kernels using several algorithmic variants, either iterative or recursive, with different splitting strategies. Experiments show that recursive adaptive methods for matrix multiplication, hybrid recursive-iterative methods for triangular system solving, and tile recursive versions of the PLUQ decomposition, together with various data mapping policies, provide the best performance on a 32-core NUMA architecture. Overall, we show that the overhead of modular reductions is more than compensated by the fast linear algebra algorithms, and that exact dense linear algebra matches the performance of full-rank reference numerical software even in the presence of rank deficiencies.
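As a plain point of reference for the exact linear algebra involved, the following sketch performs iterative Gaussian elimination over GF(p) and computes the rank, handling rank deficiencies by column skipping. It is not the paper's recursive PLUQ, and has none of its blocking, delayed reductions, or parallelism:

```python
def rank_mod_p(A, p):
    """Gaussian elimination of matrix A over GF(p); returns the rank.
    A is a list of rows of integers; the input is not modified."""
    A = [row[:] for row in A]
    rank, cols = 0, len(A[0])
    for c in range(cols):
        # find a pivot row with a nonzero entry in column c
        piv = next((r for r in range(rank, len(A)) if A[r][c] % p), None)
        if piv is None:
            continue  # rank-deficient column: skip it
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, p)          # modular inverse of pivot
        A[rank] = [x * inv % p for x in A[rank]]
        for r in range(len(A)):
            if r != rank and A[r][c] % p:
                f = A[r][c]
                A[r] = [(x - f * y) % p for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank

print(rank_mod_p([[1, 2], [2, 4]], 5))  # rank 1: second row is 2x the first
```

Every modular reduction here is per-entry and per-update; the paper's point is precisely that block recursive algorithms amortize these reductions and enable fast matrix multiplication inside the elimination.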