Théo Mary
University of Toulouse
Publications
Featured research published by Théo Mary.
SIAM Journal on Scientific Computing | 2017
Patrick R. Amestoy; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary
Matrices coming from elliptic partial differential equations have been shown to have a low-rank property: well-defined off-diagonal blocks of their Schur complements can be approximated by low-rank products, and this property can be efficiently exploited in multifrontal solvers to provide a substantial reduction of their complexity. Among the possible low-rank formats, the block low-rank (BLR) format is easy to use in a general purpose multifrontal solver and has been shown to provide significant gains compared to full-rank on practical applications. However, unlike hierarchical formats, such as $\mathcal{H}$ and HSS, its theoretical complexity was unknown. In this paper, we extend the theoretical work done on hierarchical matrices in order to compute the theoretical complexity of the BLR multifrontal factorization. We then present several variants of the BLR multifrontal factorization, depending on the strategies used to perform the updates in the frontal matrices and on the constraints on how numerical ...
IEEE International Conference on High Performance Computing, Data, and Analytics | 2015
Théo Mary; Ichitaro Yamazaki; Jakub Kurzak; Piotr Luszczek; Stanimire Tomov; Jack J. Dongarra
A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been demonstrated, this deterministic approach requires costly communication at each step of the factorization. Since such communication is becoming increasingly expensive on modern computers, an alternative approach based on random sampling, which can be implemented using communication-optimal kernels, is becoming attractive. To study its potential, in this paper, we compare the performance of random sampling with that of QRCP on an NVIDIA Kepler GPU. Our performance results demonstrate that random sampling can be up to 12.8x faster than the deterministic approach for computing the approximation of the same accuracy. We also present the parallel scaling of the random sampling over multiple GPUs on a single compute node, showing a speedup of 3.8x over three Kepler GPUs. These results demonstrate the potential of the random sampling as an excellent computational tool for many applications, and its potential is likely to grow on the emerging computers with the increasing communication costs.
International Conference on Big Data | 2014
Ichitaro Yamazaki; Théo Mary; Jakub Kurzak; Stanimire Tomov; Jack J. Dongarra
Low-rank matrix approximations play important roles in many statistical, scientific, and engineering applications. To compute such approximations, different algorithms have been developed by researchers from a wide range of areas including theoretical computer science, numerical linear algebra, statistics, applied mathematics, data analysis, machine learning, and physical and biological sciences. In this paper, to combine these efforts, we present an “access-averse” framework which encapsulates some of the existing algorithms for computing a truncated singular value decomposition (SVD). This framework not only allows us to develop software whose performance can be tuned based on domain specific knowledge, but it also allows a user from one discipline to test an algorithm from another, or to combine the techniques from different algorithms. To demonstrate this potential, we implement the framework on multicore CPUs with multiple GPUs and compare the performance of two representative algorithms, blocked variants of matrix power and Lanczos methods. Our performance studies with large-scale graphs from real applications demonstrate that, when combined with communication-avoiding and thick-restarting techniques, the Lanczos method can be competitive with the power method, which is one of the most popular methods currently used for these applications. In addition, though we only focus on the truncated SVDs, the two computational kernels used in our studies, the sparse-matrix dense-matrix multiply and tall-skinny QR factorization, are fundamental building blocks for computing low-rank approximations with other objectives. Hence, our studies may have a greater impact beyond the truncated SVDs.
Geophysics | 2016
Patrick R. Amestoy; Romain Brossier; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary; Ludovic Métivier; Alain Miniussi; Stéphane Operto
SEG Technical Program Expanded Abstracts | 2015
Patrick R. Amestoy; Romain Brossier; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary; Ludovic Métivier; Alain Miniussi; Stéphane Operto; Jean Virieux; Clement Weisbecker
Geophysical Journal International | 2017
Daniil Shantsev; Piyoosh Jaysaval; Sébastien de la Kethulle de Ryhove; Patrick R. Amestoy; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary
SEG Technical Program Expanded Abstracts | 2015
Patrick R. Amestoy; Romain Brossier; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary; Ludovic Métivier; Alain Miniussi; Stéphane Operto; Alessandra Ribodetti; Jean Virieux; Clement Weisbecker
SIAM Journal on Matrix Analysis and Applications | 2018
Nicholas J. Higham; Théo Mary
Archive | 2018
Patrick R. Amestoy; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary
Archive | 2017
Patrick R. Amestoy; Alfredo Buttari; Jean-Yves L'Excellent; Théo Mary