Paolo Bientinesi
RWTH Aachen University
Publication
Featured research published by Paolo Bientinesi.
ACM Transactions on Mathematical Software | 2005
Paolo Bientinesi; John A. Gunnels; Margaret E. Myers; Enrique S. Quintana-Ortí; Robert A. van de Geijn
In this article we present a systematic approach to the derivation of families of high-performance algorithms for a large set of frequently encountered dense linear algebra operations. As part of the derivation a constructive proof of the correctness of the algorithm is generated. The article is structured so that it can be used as a tutorial for novices. However, the method has been shown to yield new high-performance algorithms for well-studied linear algebra operations and should also be of interest to those who wish to produce best-in-class high-performance codes.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008
Ernie Chan; Field G. Van Zee; Paolo Bientinesi; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí; Robert A. van de Geijn
This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight entails viewing matrices hierarchically, consisting of blocks that serve as units of data where operations over those blocks are treated as units of computation. The implementation transparently enqueues the required operations, internally tracking dependencies, and then executes the operations utilizing out-of-order execution techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independent of the code that enqueues the operations. Results gathered on a 16 CPU ccNUMA Itanium2 server demonstrate excellent performance.
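The enqueue-then-execute idea in this abstract can be illustrated with a minimal sketch. It is not the SuperMatrix API: the names `TaskQueue`, `gemm_block`, and `run_blocked_gemm` are hypothetical, blocks are plain nested lists, and ready tasks are run one at a time on a single thread where a real runtime would hand them to idle cores.

```python
from collections import deque


class TaskQueue:
    """Toy runtime (hypothetical): enqueue block operations, infer dependencies, run ready tasks."""

    def __init__(self):
        self.tasks = []        # callables, indexed by task id
        self.deps = []         # deps[t] = ids of tasks that must finish before t
        self.last_writer = {}  # block id -> id of the last task that wrote it
        self.readers = {}      # block id -> ids of tasks that read it since that write

    def enqueue(self, func, reads, writes):
        tid = len(self.tasks)
        deps = set()
        for b in reads:                           # read-after-write
            if b in self.last_writer:
                deps.add(self.last_writer[b])
        for b in writes:
            if b in self.last_writer:             # write-after-write
                deps.add(self.last_writer[b])
            deps.update(self.readers.get(b, ()))  # write-after-read
        for b in reads:
            self.readers.setdefault(b, []).append(tid)
        for b in writes:
            self.last_writer[b] = tid
            self.readers[b] = []
        self.tasks.append(func)
        self.deps.append(deps)

    def execute(self):
        """Run every task whose dependencies are satisfied; return the execution order."""
        remaining = [set(d) for d in self.deps]
        dependents = [[] for _ in self.tasks]
        for t, d in enumerate(self.deps):
            for p in d:
                dependents[p].append(t)
        ready = deque(t for t, d in enumerate(remaining) if not d)
        order = []
        while ready:
            t = ready.popleft()
            self.tasks[t]()  # sequential stand-in for dispatch to an idle core
            order.append(t)
            for s in dependents[t]:
                remaining[s].discard(t)
                if not remaining[s]:
                    ready.append(s)
        return order


def gemm_block(A, B, C):
    """C += A @ B for small square blocks stored as nested lists."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            C[i][j] += sum(A[i][k] * B[k][j] for k in range(n))


def run_blocked_gemm(A, B, nb, bs):
    """Compute C = A @ B where A and B are dicts mapping (i, j) to bs-by-bs blocks."""
    C = {(i, j): [[0.0] * bs for _ in range(bs)]
         for i in range(nb) for j in range(nb)}
    q = TaskQueue()
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                # The library-level loop only enqueues; it never mentions threads.
                q.enqueue(
                    lambda a=A[i, k], b=B[k, j], c=C[i, j]: gemm_block(a, b, c),
                    reads=[("A", i, k), ("B", k, j), ("C", i, j)],
                    writes=[("C", i, j)],
                )
    return C, q.execute()
```

Calling `run_blocked_gemm` on a 2-by-2 grid of blocks enqueues eight block updates; the two updates to each block of C are chained by a dependency, so the four independent first updates run before any second update, which differs from the order in which the loop enqueued them. The scheduling policy lives entirely in `execute`, so it could be swapped without touching the loop that enqueues operations.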
SIAM Journal on Scientific Computing | 2005
Paolo Bientinesi; Inderjit S. Dhillon; Robert A. van de Geijn
We present a new parallel algorithm for the dense symmetric eigenvalue/eigenvector problem that is based upon the tridiagonal eigensolver, Algorithm $\mbox{\sf MR}^3$, recently developed by Dhillon and Parlett. Algorithm $\mbox{\sf MR}^3$ has a complexity of $O(n^2)$ operations for computing all eigenvalues and eigenvectors of a symmetric tridiagonal problem. Moreover, the algorithm requires only $O(n)$ extra workspace and can be adapted to compute any subset of $k$ eigenpairs in $O(nk)$ time. In contrast, all earlier stable parallel algorithms for the tridiagonal eigenproblem require $O(n^3)$ operations in the worst case, while some implementations, such as divide and conquer, have an extra $O(n^2)$ memory requirement. The proposed parallel algorithm balances the workload equally among the processors by traversing a matrix-dependent representation tree which captures the sequence of computations performed by Algorithm $\mbox{\sf MR}^3$. The resulting implementation allows problems of very large size to be solved efficiently; the largest dense eigenproblem solved in-core on a 256 processor machine with 2 GBytes of memory per processor is for a matrix of size 128,000 $\times$
ACM Transactions on Mathematical Software | 2008
Paolo Bientinesi; Brian Christopher Gunter; Robert A. van de Geijn
ACM Transactions on Mathematical Software | 2005
Paolo Bientinesi; Enrique S. Quintana-Ortí; Robert A. van de Geijn
Parallel Processing and Applied Mathematics | 2009
Paolo Bientinesi; Francisco D. Igual; Daniel Kressner; Enrique S. Quintana-Ortí
Journal of Computational and Applied Mathematics | 2013
Lukas Krämer; Edoardo Di Napoli; Martin Galgon; Bruno Lang; Paolo Bientinesi
IEEE International Conference on Cloud Computing Technology and Science | 2010
Paolo Bientinesi; Roman Iakymchuk; Jeff Napper
ACM Transactions on Mathematical Software | 2008
Field G. Van Zee; Paolo Bientinesi; Tze Meng Low; Robert A. van de Geijn
SIAM Journal on Scientific Computing | 2013
Matthias Petschow; Elmar Peise; Paolo Bientinesi