Publication


Featured research published by Salvatore Filippone.


ACM Transactions on Mathematical Software | 2000

PSBLAS: a library for parallel linear algebra computation on sparse matrices

Salvatore Filippone; Michele Colajanni

Many computationally intensive problems in engineering and science give rise to the solution of large, sparse linear systems of equations. Fast and efficient methods for their solution are very important because these systems usually occur in the innermost loop of the computational scheme. Parallelization is often necessary to achieve an acceptable level of performance. This paper presents the design, implementation, and interface of a library of Basic Linear Algebra Subroutines for sparse matrices (PSBLAS) which is specifically tailored to distributed-memory computers. PSBLAS enables easy, efficient, and portable implementations of parallel iterative solvers for linear systems. The interface is designed around a Single Program Multiple Data (SPMD) programming model on distributed-memory machines. However, the architecture of the library does not exclude an implementation in different paradigms, such as those based on the shared-memory model.
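
The core kernel beneath such iterative solvers is the sparse matrix-vector product that each process applies to its local row block. As a minimal illustration (not PSBLAS's actual Fortran API; all names here are ours), a serial CSR matrix-vector product in C looks like this:

#include <stddef.h>

/* y = A*x for a CSR matrix: row_ptr has n+1 entries; col_idx and val
   have nnz entries each.  In an SPMD iterative solver, each process
   would run this on its local row block after exchanging the halo
   entries of x.  Illustrative sketch only. */
void csr_spmv(size_t n, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}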


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

OpenCoarrays: Open-source Transport Layers Supporting Coarray Fortran Compilers

Alessandro Fanfarillo; Tobias Burnus; Valeria Cardellini; Salvatore Filippone; Dan Nagle; Damian W. I. Rouson

Coarray Fortran is a set of features of the Fortran 2008 standard that make Fortran a PGAS parallel programming language. Two commercial compilers currently support coarrays: Cray and Intel. Here we present two coarray transport layers provided by the new OpenCoarrays project: one library based on MPI and the other on GASNet. We link the GNU Fortran (GFortran) compiler to either of the two OpenCoarrays implementations and present performance comparisons between executables produced by GFortran and the Cray and Intel compilers. The comparison includes synthetic benchmarks, application prototypes, and an application kernel. In our tests, Intel outperforms GFortran only on intra-node small transfers (in particular, scalars). GFortran outperforms Intel on intra-node array transfers and in all settings that require inter-node transfers. The Cray comparisons are mixed, with either GFortran or Cray being faster depending on the chosen hardware platform, network, and transport layer.
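
To illustrate what an MPI-based transport layer does, the following C sketch shows how a coarray assignment such as x[2] = v followed by sync all can be mapped onto MPI one-sided communication. This is only an assumption-laden sketch of the general approach; OpenCoarrays' actual internals are more involved, and all names here are ours. Run with at least two ranks (e.g., mpirun -np 2).

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    double x = 0.0;                  /* the "coarray" datum on every image */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Expose x for one-sided access, as a coarray transport layer would. */
    MPI_Win_create(&x, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && nranks > 1) {
        double v = 3.14;
        /* Analogue of the coarray assignment  x[2] = v  on image 1. */
        MPI_Put(&v, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);           /* plays the role of  sync all  */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}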


IEEE International Conference on High Performance Computing Data and Analytics | 2007

Performance Optimization and Modeling of Blocked Sparse Kernels

Alfredo Buttari; Victor Eijkhout; Julien Langou; Salvatore Filippone

We present a method for automatically selecting optimal implementations of sparse matrix-vector operations. Our software “AcCELS” (Accelerated Compress-storage Elements for Linear Solvers) involves a setup phase that probes machine characteristics, and a run-time phase where stored characteristics are combined with a measure of the actual sparse matrix to find the optimal kernel implementation. We present a performance model that is shown to be accurate over a large range of matrices.
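
Register blocking, the kind of kernel variant such a tuner selects among, stores small dense r-by-c blocks and unrolls the block multiply. A minimal C sketch of a 2x2 block-CSR matrix-vector product (our own names, not the AcCELS API):

#include <stddef.h>

/* y = A*x for 2x2 block CSR (BCSR): brow_ptr indexes block rows,
   bcol_idx holds block-column indices, and val stores each 2x2 block
   contiguously in row-major order.  The unrolled block multiply is the
   sort of candidate kernel an auto-tuner would benchmark. */
void bcsr2x2_spmv(size_t nbrows, const size_t *brow_ptr,
                  const size_t *bcol_idx, const double *val,
                  const double *x, double *y)
{
    for (size_t I = 0; I < nbrows; I++) {
        double y0 = 0.0, y1 = 0.0;
        for (size_t k = brow_ptr[I]; k < brow_ptr[I + 1]; k++) {
            const double *b  = val + 4 * k;          /* the 2x2 block   */
            const double *xx = x + 2 * bcol_idx[k];  /* matching x pair */
            y0 += b[0] * xx[0] + b[1] * xx[1];
            y1 += b[2] * xx[0] + b[3] * xx[1];
        }
        y[2 * I]     = y0;
        y[2 * I + 1] = y1;
    }
}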


ACM Transactions on Mathematical Software | 2012

Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003

Salvatore Filippone; Alfredo Buttari

The efficiency of a sparse linear algebra operation heavily relies on the ability of the sparse matrix storage format to exploit the computing power of the underlying hardware. Since no format is universally better than the others across all possible kinds of operations and computers, sparse linear algebra software packages should provide facilities to easily implement and integrate new storage formats within a sparse linear algebra application without the need to modify it; they should also allow the storage format to be changed dynamically at run time depending on the specific operations to be performed. Aiming at these features, we present an object-oriented design model for a sparse linear algebra package which relies on design patterns. We show that an implementation of our model can be efficiently achieved through some of the unique features of the Fortran 2003 language. Experimental results show that the proposed software infrastructure improves the modularity and ease of use of the code with no loss of performance.
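
The paper realizes this with Fortran 2003 classes; purely as an illustration of the underlying State pattern, the same pluggable-format idea can be sketched in C with a table of function pointers (all names here are ours, not the library's API):

#include <stddef.h>

typedef struct sparse_mat sparse_mat;

/* Format-specific operations; one table per storage format. */
typedef struct {
    void (*spmv)(const sparse_mat *A, const double *x, double *y);
    void (*destroy)(sparse_mat *A);
} mat_ops;

struct sparse_mat {
    const mat_ops *ops;   /* swap this pointer to change format at run time */
    void *data;           /* CSR, COO, ... format-specific payload          */
    size_t nrows, ncols;
};

/* Solver code stays format-agnostic: it only sees the common interface. */
static void apply(const sparse_mat *A, const double *x, double *y)
{
    A->ops->spmv(A, x, y);
}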


ACM Transactions on Mathematical Software | 2010

MLD2P4: A Package of Parallel Algebraic Multilevel Domain Decomposition Preconditioners in Fortran 95

Pasqua D’Ambra; Daniela di Serafino; Salvatore Filippone

Domain decomposition ideas have long been an essential tool for the solution of PDEs on parallel computers. In recent years many research efforts have been focused on recursively employing domain decomposition methods to obtain multilevel preconditioners to be used with Krylov solvers. In this context, we developed MLD2P4 (MultiLevel Domain Decomposition Parallel Preconditioners Package based on PSBLAS), a package of parallel multilevel preconditioners that combines additive Schwarz domain decomposition methods with a smoothed aggregation technique to build a hierarchy of coarse-level corrections in an algebraic way. The design of MLD2P4 was guided by objectives such as extensibility, flexibility, performance, portability, and ease of use. These objectives were achieved by following an object-based approach in the Fortran 95 language and by employing the PSBLAS library as a basic framework. In this article, we present MLD2P4, focusing on its design principles, software architecture, and use.
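
As a minimal instance of the preconditioners the package constructs (the notation here is ours), a two-level additive Schwarz preconditioner with a smoothed-aggregation coarse correction can be written as

    M^{-1} = \sum_{i=1}^{N} R_i^T A_i^{-1} R_i + P A_c^{-1} P^T,
    \qquad A_i = R_i A R_i^T, \quad A_c = P^T A P,

where R_i is the restriction onto the i-th (overlapping) subdomain and P is the prolongator built by smoothed aggregation; applying the same construction recursively to A_c yields the multilevel hierarchy.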


ACM Transactions on Mathematical Software | 2017

Sparse Matrix-Vector Multiplication on GPGPUs

Salvatore Filippone; Valeria Cardellini; Davide Barbieri; Alessandro Fanfarillo

The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific computing applications: it is the essential kernel for the solution of sparse linear systems and sparse eigenvalue problems by iterative methods. The efficient implementation of the sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed with every major new trend in high-performance computing architectures. The introduction of General-Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. With this article, we provide a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years. We discuss the issues and tradeoffs encountered by the various researchers and present a list of solutions, organized into categories according to common features. We also provide a performance comparison across different GPGPU models on a set of test matrices coming from various application domains.
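
One example of the GPU-friendly formats such surveys compare is ELLPACK, which pads every row to a fixed length and stores the data column-major, so that on a GPU consecutive threads (one per row) read consecutive memory. A serial C sketch of the layout (illustrative only; padding entries carry a zero value and a valid column index, and all names are ours):

#include <stddef.h>

/* y = A*x in ELLPACK format: n rows, each padded to max_nnz entries;
   val and col_idx are n*max_nnz arrays stored column-major. */
void ell_spmv(size_t n, size_t max_nnz, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < max_nnz; j++) {
            size_t k = j * n + i;   /* column-major: coalesced on GPUs */
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}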


International Multiconference on Computer Science and Information Technology | 2010

Use of hybrid recursive CSR/COO data structures in sparse matrix-vector multiplication

Michele Martone; Salvatore Filippone; Salvatore Tucci; Pawel Gepner; Marcin Paprzycki

Recently, we have introduced an approach to basic sparse matrix computations on multicore cache-based machines using recursive partitioning. Here, the memory representation of a sparse matrix consists of a set of submatrices which are used as leaves of a quad-tree structure. In this paper, we evaluate the performance impact on the Sparse Matrix-Vector Multiplication (SpMV) of a modification to our Recursive CSR implementation that allows the use of multiple data structures in leaf matrices (CSR or COO, with either 16- or 32-bit indices).
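
A hedged C sketch of the idea (the authors' actual implementation differs; all names here are ours): inner quad-tree nodes split the matrix into four quadrants, and the SpMV recursion dispatches per leaf on the stored format.

#include <stddef.h>

typedef enum { LEAF_CSR, LEAF_COO, INNER } node_kind;

typedef struct qnode {
    node_kind kind;
    size_t row_off, col_off;    /* position of this quadrant within A     */
    struct qnode *child[4];     /* INNER: four quadrants (may be NULL)    */
    /* Leaf payload; one of the two formats is populated: */
    size_t n, nnz;
    const size_t *row_ptr;      /* LEAF_CSR */
    const size_t *row_idx;      /* LEAF_COO */
    const size_t *col_idx;
    const double *val;
} qnode;

/* y += A*x over the quad-tree, dispatching on each leaf's format. */
void qtree_spmv(const qnode *t, const double *x, double *y)
{
    if (t == NULL) return;
    if (t->kind == INNER) {
        for (int q = 0; q < 4; q++)
            qtree_spmv(t->child[q], x, y);
    } else if (t->kind == LEAF_CSR) {
        for (size_t i = 0; i < t->n; i++)
            for (size_t k = t->row_ptr[i]; k < t->row_ptr[i + 1]; k++)
                y[t->row_off + i] += t->val[k] * x[t->col_off + t->col_idx[k]];
    } else {                    /* LEAF_COO */
        for (size_t k = 0; k < t->nnz; k++)
            y[t->row_off + t->row_idx[k]] += t->val[k] * x[t->col_off + t->col_idx[k]];
    }
}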


Symbolic and Numeric Algorithms for Scientific Computing | 2010

On the Usage of 16 Bit Indices in Recursively Stored Sparse Matrices

Michele Martone; Salvatore Filippone; Marcin Paprzycki; Salvatore Tucci

In our earlier work, we have investigated the feasibility of utilizing recursive partitioning in basic (BLAS-oriented) sparse matrix computations on multi-core cache-based computers. Following encouraging experimental results obtained for SpMV and SpSV operations, here we proceed to tune the storage format. To limit the memory bandwidth overhead, we introduce the use of shorter (16-bit) indices in leaf submatrices (at the end of the recursion). Experimental results obtained for the proposed approach on 8-core machines show speed improvements when performing sparse matrix-vector multiplication.
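
The saving comes from storing leaf indices relative to the leaf's corner: once recursion bounds leaf dimensions to at most 2^16, 16-bit offsets suffice, halving index memory traffic versus 32-bit indices. A minimal C sketch of a COO leaf with 16-bit indices (names here are ours):

#include <stddef.h>
#include <stdint.h>

/* y += A*x for one COO leaf whose indices are 16-bit offsets relative
   to the leaf's top-left corner (row_off, col_off). */
void coo16_spmv(size_t nnz, size_t row_off, size_t col_off,
                const uint16_t *row_idx, const uint16_t *col_idx,
                const double *val, const double *x, double *y)
{
    for (size_t k = 0; k < nnz; k++)
        y[row_off + row_idx[k]] += val[k] * x[col_off + col_idx[k]];
}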


Symbolic and Numeric Algorithms for Scientific Computing | 2010

On BLAS Operations with Recursively Stored Sparse Matrices

Michele Martone; Salvatore Filippone; Marcin Paprzycki; Salvatore Tucci

Recently, we have proposed a recursive-partitioning-based layout for multi-core computations on sparse matrices. Based on the positive results of our initial experiments with matrix-vector multiplication, we discuss how this storage format can be utilized across a range of BLAS-style matrix operations.


High Performance Computing and Communications | 2006

An enhanced parallel version of KIVA-3V, coupled with a 1D CFD code, and its use in general-purpose engine applications

Gino Bella; Fabio Bozza; Alessandro De Maio; Francesco Del Citto; Salvatore Filippone

Numerical simulations of reactive flows are among the most computationally demanding applications in the scientific computing world. KIVA-3V, a widely used CFD program specifically tailored to engine applications, has been deeply modified in order to improve accuracy and stability while reducing computational time. The original methods included in KIVA to solve the equations of fluid dynamics have been fully replaced by new solvers, with the aim of both improving performance and producing a fully parallel code. Almost every feature of the original KIVA-3V has been partially or entirely rewritten, a full 1D code has been included, and a strategy to directly link 3D zones with zero-dimensional models has been developed. The result is a reliable program, noticeably faster than the original KIVA-3V in serial mode and even more so in parallel, capable of treating more complex cases and larger grids, with the desired level of detail where required.

Collaboration


Dive into Salvatore Filippone's collaborations.

Top Co-Authors

Valeria Cardellini, University of Rome Tor Vergata
Damian W. I. Rouson, Sandia National Laboratories
Pasqua D'Ambra, National Research Council
Daniela di Serafino, Seconda Università degli Studi di Napoli
Alessandro Fanfarillo, National Center for Atmospheric Research
Alfredo Buttari, Centre national de la recherche scientifique
Davide Barbieri, University of Rome Tor Vergata
Karla Morris, City University of New York
Michele Martone, University of Rome Tor Vergata