Paolo Bientinesi | Researchain

Publication

Featured research published by Paolo Bientinesi.

ACM Transactions on Mathematical Software | 2005

The science of deriving dense linear algebra algorithms

Paolo Bientinesi; John A. Gunnels; Margaret E. Myers; Enrique S. Quintana-Ortí; Robert A. van de Geijn

In this article we present a systematic approach to the derivation of families of high-performance algorithms for a large set of frequently encountered dense linear algebra operations. As part of the derivation a constructive proof of the correctness of the algorithm is generated. The article is structured so that it can be used as a tutorial for novices. However, the method has been shown to yield new high-performance algorithms for well-studied linear algebra operations and should also be of interest to those who wish to produce best-in-class high-performance codes.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

Ernie Chan; Field G. Van Zee; Paolo Bientinesi; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí; Robert A. van de Geijn

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight entails viewing matrices hierarchically, consisting of blocks that serve as units of data where operations over those blocks are treated as units of computation. The implementation transparently enqueues the required operations, internally tracking dependencies, and then executes the operations utilizing out-of-order execution techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independent of the code that enqueues the operations. Results gathered on a 16 CPU ccNUMA Itanium2 server demonstrate excellent performance.
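The enqueue-then-execute idea in this abstract can be illustrated with a toy sketch. It is not SuperMatrix's actual API: the names `Runtime`, `enqueue`, `execute`, and `factor_demo` are hypothetical, scalars stand in for matrix blocks, and tasks run one at a time in a single thread. What it does show is an algorithm-by-blocks (here a right-looking Cholesky factorization) that only enqueues operations, while a separate runtime infers dependencies from the blocks each operation reads and writes and is then free to run ready tasks out of program order.

```python
import math
from functools import partial


class Runtime:
    """Toy runtime (hypothetical): infers task dependencies from block accesses."""

    def __init__(self):
        self.tasks = []        # list of (callable, set of task indices it depends on)
        self.last_writer = {}  # block key -> index of the last task that wrote it
        self.readers = {}      # block key -> tasks that read it since that write

    def enqueue(self, func, reads, writes):
        idx = len(self.tasks)
        deps = set()
        for key in reads:                            # read-after-write
            if key in self.last_writer:
                deps.add(self.last_writer[key])
        for key in writes:
            if key in self.last_writer:              # write-after-write
                deps.add(self.last_writer[key])
            deps.update(self.readers.get(key, ()))   # write-after-read
        for key in reads:
            self.readers.setdefault(key, []).append(idx)
        for key in writes:
            self.last_writer[key] = idx
            self.readers[key] = []
        self.tasks.append((func, deps))

    def execute(self):
        done, order = set(), []
        pending = list(range(len(self.tasks)))
        while pending:
            ready = [i for i in pending if self.tasks[i][1] <= done]
            i = ready[-1]  # deliberately pick the latest ready task, not the oldest
            self.tasks[i][0]()
            done.add(i)
            order.append(i)
            pending.remove(i)
        return order


# Block operations; each "block" is a scalar here to keep the sketch short.
def chol(A, k):
    A[k, k] = math.sqrt(A[k, k])


def trsm(A, i, k):
    A[i, k] = A[i, k] / A[k, k]


def update(A, i, j, k):
    A[i, j] = A[i, j] - A[i, k] * A[j, k]


def cholesky_by_blocks(rt, A, n):
    """The 'library code': enqueues operations and never mentions scheduling."""
    for k in range(n):
        rt.enqueue(partial(chol, A, k), reads=[(k, k)], writes=[(k, k)])
        for i in range(k + 1, n):
            rt.enqueue(partial(trsm, A, i, k),
                       reads=[(i, k), (k, k)], writes=[(i, k)])
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                rt.enqueue(partial(update, A, i, j, k),
                           reads=[(i, j), (i, k), (j, k)], writes=[(i, j)])


def factor_demo():
    # Lower triangle of the SPD matrix [[4, 2, 2], [2, 5, 3], [2, 3, 6]].
    A = {(0, 0): 4, (1, 0): 2, (1, 1): 5, (2, 0): 2, (2, 1): 3, (2, 2): 6}
    rt = Runtime()
    cholesky_by_blocks(rt, A, 3)
    order = rt.execute()
    return A, order


if __name__ == "__main__":
    A, order = factor_demo()
    print("execution order:", order)
    print("Cholesky factor (lower triangle):", A)
```

Because the scheduling policy lives entirely in `execute`, a different heuristic could be substituted without touching `cholesky_by_blocks`, which mirrors the separation of concerns the abstract describes.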

SIAM Journal on Scientific Computing | 2005

A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations

Paolo Bientinesi; Inderjit S. Dhillon; Robert A. van de Geijn

We present a new parallel algorithm for the dense symmetric eigenvalue/eigenvector problem that is based upon the tridiagonal eigensolver, Algorithm $\mathsf{MR}^3$, recently developed by Dhillon and Parlett. Algorithm $\mathsf{MR}^3$ has a complexity of $O(n^2)$ operations for computing all eigenvalues and eigenvectors of a symmetric tridiagonal problem. Moreover, the algorithm requires only $O(n)$ extra workspace and can be adapted to compute any subset of $k$ eigenpairs in $O(nk)$ time. In contrast, all earlier stable parallel algorithms for the tridiagonal eigenproblem require $O(n^3)$ operations in the worst case, while some implementations, such as divide and conquer, have an extra $O(n^2)$ memory requirement. The proposed parallel algorithm balances the workload equally among the processors by traversing a matrix-dependent representation tree which captures the sequence of computations performed by Algorithm $\mathsf{MR}^3$. The resulting implementation allows problems of very large size to be solved efficiently: the largest dense eigenproblem solved in-core on a 256 processor machine with 2 GBytes of memory per processor is for a matrix of size 128,000

ACM Transactions on Mathematical Software | 2008

Families of algorithms related to the inversion of a Symmetric Positive Definite matrix

Paolo Bientinesi; Brian Christopher Gunter; Robert A. van de Geijn

ACM Transactions on Mathematical Software | 2005

Representing linear algebra algorithms in code: the FLAME application program interfaces

Paolo Bientinesi; Enrique S. Quintana-Ortí; Robert A. van de Geijn

Parallel Processing and Applied Mathematics | 2009

Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures

Paolo Bientinesi; Francisco D. Igual; Daniel Kressner; Enrique S. Quintana-Ortí

Journal of Computational and Applied Mathematics | 2013

Dissecting the FEAST algorithm for generalized eigenproblems

Lukas Krämer; Edoardo Di Napoli; Martin Galgon; Bruno Lang; Paolo Bientinesi

IEEE International Conference on Cloud Computing Technology and Science | 2010

HPC on Competitive Cloud Resources

Paolo Bientinesi; Roman Iakymchuk; Jeff Napper

ACM Transactions on Mathematical Software | 2008

Scalable parallelization of FLAME code via the workqueuing model

Field G. Van Zee; Paolo Bientinesi; Tze Meng Low; Robert A. van de Geijn

SIAM Journal on Scientific Computing | 2013