Publication


Featured research published by Paolo Bientinesi.


ACM Transactions on Mathematical Software | 2005

The science of deriving dense linear algebra algorithms

Paolo Bientinesi; John A. Gunnels; Margaret E. Myers; Enrique S. Quintana-Ortí; Robert A. van de Geijn

In this article we present a systematic approach to the derivation of families of high-performance algorithms for a large set of frequently encountered dense linear algebra operations. As part of the derivation, a constructive proof of the correctness of the algorithm is generated. The article is structured so that it can be used as a tutorial for novices. However, the method has been shown to yield new high-performance algorithms for well-studied linear algebra operations and should also be of interest to those who wish to produce best-in-class high-performance codes.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

Ernie Chan; Field G. Van Zee; Paolo Bientinesi; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí; Robert A. van de Geijn

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight entails viewing matrices hierarchically, consisting of blocks that serve as units of data where operations over those blocks are treated as units of computation. The implementation transparently enqueues the required operations, internally tracking dependencies, and then executes the operations utilizing out-of-order execution techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independent of the code that enqueues the operations. Results gathered on a 16 CPU ccNUMA Itanium2 server demonstrate excellent performance.
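The enqueue, track dependencies, then execute out-of-order idea in this abstract can be illustrated with a minimal sketch. It is not SuperMatrix's API: `ToyRuntime`, `enqueue_cholesky`, and the helper names are hypothetical, 1x1 scalar "blocks" stand in for real matrix blocks, and each wave of ready tasks is run sequentially where a real runtime would dispatch the tasks to threads.

```python
import math
from collections import defaultdict
from functools import partial


class ToyRuntime:
    """Hypothetical toy runtime: records block operations, runs them later."""

    def __init__(self):
        self.tasks = []  # each entry: (name, func, reads, writes)

    def enqueue(self, name, func, reads, writes):
        # Nothing is computed here; the operation is only recorded.
        self.tasks.append((name, func, tuple(reads), tuple(writes)))

    def _dependencies(self):
        deps = defaultdict(set)        # task index -> indices it must wait for
        last_writer = {}               # block -> last task that wrote it
        readers = defaultdict(list)    # block -> tasks that read it since then
        for t, (_, _, reads, writes) in enumerate(self.tasks):
            for b in reads:
                if b in last_writer:
                    deps[t].add(last_writer[b])   # read-after-write
            for b in writes:
                if b in last_writer:
                    deps[t].add(last_writer[b])   # write-after-write
                deps[t].update(readers[b])        # write-after-read
            for b in reads:
                readers[b].append(t)
            for b in writes:
                last_writer[b] = t
                readers[b] = []
        return deps

    def execute(self):
        deps, done, waves = self._dependencies(), set(), []
        while len(done) < len(self.tasks):
            # Every task whose dependencies are satisfied is ready, regardless
            # of enqueue order; tasks in one wave are mutually independent.
            ready = [t for t in range(len(self.tasks))
                     if t not in done and deps[t] <= done]
            for t in ready:
                self.tasks[t][1]()
            done.update(ready)
            waves.append([self.tasks[t][0] for t in ready])
        return waves


def chol(A, k):
    A[k, k] = math.sqrt(A[k, k])

def trsm(A, i, k):
    A[i, k] /= A[k, k]

def update(A, i, j, k):
    A[i, j] -= A[i, k] * A[j, k]


def enqueue_cholesky(q, A, n):
    """Library-style code: a plain sequential loop nest, no parallelism visible."""
    for k in range(n):
        q.enqueue(f"CHOL({k},{k})", partial(chol, A, k), [(k, k)], [(k, k)])
        for i in range(k + 1, n):
            q.enqueue(f"TRSM({i},{k})", partial(trsm, A, i, k),
                      [(k, k), (i, k)], [(i, k)])
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                q.enqueue(f"UPD({i},{j})", partial(update, A, i, j, k),
                          [(i, k), (j, k), (i, j)], [(i, j)])


# Lower triangle of the symmetric matrix [[4, 2, 2], [2, 5, 3], [2, 3, 6]].
A = {(0, 0): 4.0, (1, 0): 2.0, (1, 1): 5.0,
     (2, 0): 2.0, (2, 1): 3.0, (2, 2): 6.0}
q = ToyRuntime()
enqueue_cholesky(q, A, 3)
waves = q.execute()
for wave in waves:
    print(wave)
```

In this sketch the two `TRSM` tasks of the first iteration land in the same wave, so they could run concurrently even though the calling code is a sequential loop nest. Changing how `ready` tasks are ordered or grouped would change the scheduling heuristic without touching `enqueue_cholesky`.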


SIAM Journal on Scientific Computing | 2005

A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations

Paolo Bientinesi; Inderjit S. Dhillon; Robert A. van de Geijn

We present a new parallel algorithm for the dense symmetric eigenvalue/eigenvector problem that is based upon the tridiagonal eigensolver, Algorithm $\mathsf{MR}^3$, recently developed by Dhillon and Parlett. Algorithm $\mathsf{MR}^3$ has a complexity of $O(n^2)$ operations for computing all eigenvalues and eigenvectors of a symmetric tridiagonal problem. Moreover the algorithm requires only $O(n)$ extra workspace and can be adapted to compute any subset of $k$ eigenpairs in $O(nk)$ time. In contrast, all earlier stable parallel algorithms for the tridiagonal eigenproblem require $O(n^3)$ operations in the worst case, while some implementations, such as divide and conquer, have an extra $O(n^2)$ memory requirement. The proposed parallel algorithm balances the workload equally among the processors by traversing a matrix-dependent representation tree which captures the sequence of computations performed by Algorithm $\mathsf{MR}^3$. The resulting implementation allows problems of very large size to be solved efficiently; the largest dense eigenproblem solved in-core on a 256 processor machine with 2 GBytes of memory per processor is for a matrix of size 128,000 $\times$ 128,000.


ACM Transactions on Mathematical Software | 2008

Families of algorithms related to the inversion of a Symmetric Positive Definite matrix

Paolo Bientinesi; Brian Christopher Gunter; Robert A. van de Geijn


ACM Transactions on Mathematical Software | 2005

Representing linear algebra algorithms in code: the FLAME application program interfaces

Paolo Bientinesi; Enrique S. Quintana-Ortí; Robert A. van de Geijn


Parallel Processing and Applied Mathematics | 2009

Reduction to condensed forms for symmetric eigenvalue problems on multi-core architectures

Paolo Bientinesi; Francisco D. Igual; Daniel Kressner; Enrique S. Quintana-Ortí


Journal of Computational and Applied Mathematics | 2013

Dissecting the FEAST algorithm for generalized eigenproblems

Lukas Krämer; Edoardo Di Napoli; Martin Galgon; Bruno Lang; Paolo Bientinesi


IEEE International Conference on Cloud Computing Technology and Science | 2010

HPC on Competitive Cloud Resources

Paolo Bientinesi; Roman Iakymchuk; Jeff Napper


ACM Transactions on Mathematical Software | 2008

Scalable parallelization of FLAME code via the workqueuing model

Field G. Van Zee; Paolo Bientinesi; Tze Meng Low; Robert A. van de Geijn


SIAM Journal on Scientific Computing | 2013

High-Performance Solvers for Dense Hermitian Eigenproblems

Matthias Petschow; Elmar Peise; Paolo Bientinesi

Collaboration


Top Co-Authors

Elmar Peise, RWTH Aachen University

Yurii S. Aulchenko, Novosibirsk State University