Mauro Bianco | Researchain

Publication

Featured research published by Mauro Bianco.

Memory Performance: Dealing with Applications, Systems and Architecture | 2007

The STAPL pArray

Gabriel Tanase; Mauro Bianco; Nancy M. Amato; Lawrence Rauchwerger

The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming framework that extends C++ and STL with support for parallelism. STAPL provides parallel data structures (pContainers) and generic parallel algorithms (pAlgorithms), and a methodology for extending them to provide customized functionality. STAPL pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They provide views as a generic means to access data that can be passed as input to generic pAlgorithms. In this work, we present the STAPL pArray, the parallel equivalent of the sequential STL valarray, a fixed-size data structure optimized for storing and accessing data based on one-dimensional indices. We describe the pArray design and show how it can support a variety of underlying data distribution policies currently available in STAPL, such as blocked or blocked cyclic. We provide experimental results showing that pAlgorithms using the pArray scale well to more than 2,000 processors. We also provide results using different data distributions that illustrate that the performance of pAlgorithms and pArray methods is usually sensitive to the underlying data distribution, and moreover, that there is no one data distribution that performs best for all pAlgorithms, processor counts, or machines.
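The two distribution policies named in the abstract can be illustrated with a minimal sketch. This is not STAPL code: the function names and the index-to-owner formulas below are generic textbook definitions, assumed here only to show how a one-dimensional index maps to a processor under each policy.

```python
def blocked_owner(i, n, p):
    """Owner of index i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p) elements per processor
    return i // block


def block_cyclic_owner(i, b, p):
    """Owner of index i when blocks of b elements are dealt round-robin to p processors."""
    return (i // b) % p


# Hypothetical example: 12 elements on 3 processors.
n, p = 12, 3
blocked = [blocked_owner(i, n, p) for i in range(n)]
cyclic = [block_cyclic_owner(i, 2, p) for i in range(n)]
print(blocked)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(cyclic)   # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

A blocked layout keeps neighbouring indices on the same processor, while a block-cyclic layout spreads them out, which is one reason an algorithm may favour one layout over the other.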

Languages and Compilers for Parallel Computing | 2007

Associative Parallel Containers in STAPL

Gabriel Tanase; Chidambareswaran Raman; Mauro Bianco; Nancy M. Amato; Lawrence Rauchwerger

The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming framework that extends C++ and STL with support for parallelism. STAPL provides a collection of parallel data structures (pContainers) and algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. STAPL pContainers are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. They also provide appropriate interfaces that can be used by generic pAlgorithms. In this work, we present the design and implementation of the STAPL associative pContainers: pMap, pSet, pMultiMap, pMultiSet, pHashMap, and pHashSet. These containers provide optimal insert, search, and delete operations for a distributed collection of elements based on keys. Their methods include counterparts of the methods provided by the STL associative containers, and also some asynchronous (non-blocking) variants that can provide improved performance in parallel. We evaluate the performance of the STAPL associative pContainers on an IBM Power5 cluster, an IBM Power3 cluster, and on a Linux-based Opteron cluster, and show that the new pContainer asynchronous methods, generic pAlgorithms (e.g., pfind) and a sort application based on associative pContainers, all provide good scalability on more than $10^3$ processors.

European Conference on Parallel Processing | 2014

A Generic Strategy for Multi-stage Stencils

Mauro Bianco; Benjamin Cumming

Stencil computations on regular grids are widely used in scientific simulations. Optimization techniques for such stencil computations typically exploit temporal locality across time steps. More complex stencil applications, like those in meteorology and seismic simulations, cannot easily take advantage of these techniques, since the number of physical fields and computation stages to consider at each time step flush all data present in the cache at the beginning of the next time step. In this paper we present a technique for improving performance of such computations, based only on spatial tiling, which is implemented as a generic algorithm.
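As a rough illustration of spatial tiling, the sketch below sweeps a 5-point Laplacian over a 2D grid tile by tile. It is an assumption-laden toy, not the paper's generic algorithm: it handles a single stage, the function name and tile size are hypothetical, and a multi-stage version would also need halo handling between stages.

```python
def laplacian_tiled(grid, tile=4):
    """Apply a 5-point Laplacian to interior points, visiting them tile by tile."""
    ny, nx = len(grid), len(grid[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j0 in range(1, ny - 1, tile):            # loop over tiles
        for i0 in range(1, nx - 1, tile):
            for j in range(j0, min(j0 + tile, ny - 1)):      # loop inside one tile
                for i in range(i0, min(i0 + tile, nx - 1)):
                    out[j][i] = (grid[j - 1][i] + grid[j + 1][i]
                                 + grid[j][i - 1] + grid[j][i + 1]
                                 - 4.0 * grid[j][i])
    return out


# Hypothetical test field f(x, y) = x^2 + y^2, whose discrete Laplacian is 4.
grid = [[float(x * x + y * y) for x in range(10)] for y in range(10)]
result = laplacian_tiled(grid, tile=3)
print(result[5][5])
```

The tiled sweep produces the same values as an untiled one; only the visiting order changes, so that the points of one tile can stay in cache while they are processed.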

String Processing and Information Retrieval | 2009

Expectation of Strings with Mismatches under Markov Chain Distribution

Cinzia Pizzi; Mauro Bianco

We study a problem related to the extraction of over-represented words from a given source text $x$ of length $n$. The words are allowed to occur with $k$ mismatches, and $x$ is produced by a source over an alphabet $\Sigma$ according to a Markov chain of order $p$. We propose an online algorithm to compute the expected number of occurrences of a word $y$ of length $m$ in $O(mk|\Sigma|^{p+1})$. We also propose an offline algorithm to compute the probability of any word that occurs in the text in $O(k|\Sigma|^2)$ after $O(nk|\Sigma|^{p+1})$ pre-processing. This algorithm allows us to compute the expectation for all the words in a text of length $n$ in $O(kn^2|\Sigma|^2 + nk|\Sigma|^{p+1})$, rather than in $O(n^3|\Sigma|^{p+1})$ that can be obtained with other methods. Although this study was motivated by the motif discovery problem in bioinformatics, the results find their applications in any other domain involving combinatorics on words.
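To make the quantity concrete, here is a minimal sketch of a straightforward dynamic program for an order-1 Markov chain. It is not claimed to be the paper's algorithm. The names `match_probability` and `expected_occurrences` are hypothetical, and the expectation formula assumes that `start` is the stationary distribution, so that every window of the text has the same match probability.

```python
def match_probability(y, k, start, trans):
    """Probability that a random window of length len(y) equals y up to k mismatches.

    start[a]    : probability that the window begins with symbol a (assumed given)
    trans[a][b] : order-1 Markov transition probability from a to b
    """
    alphabet = list(start)
    # state[j][a] = P(prefix read so far has j mismatches and ends in symbol a)
    state = [{a: 0.0 for a in alphabet} for _ in range(k + 1)]
    for a in alphabet:
        j = 0 if a == y[0] else 1
        if j <= k:
            state[j][a] += start[a]
    for c in y[1:]:
        nxt = [{a: 0.0 for a in alphabet} for _ in range(k + 1)]
        for j in range(k + 1):
            for a in alphabet:
                for b in alphabet:
                    jj = j + (b != c)          # one more mismatch if b differs from y's symbol
                    if jj <= k:
                        nxt[jj][b] += state[j][a] * trans[a][b]
        state = nxt
    return sum(sum(row.values()) for row in state)


def expected_occurrences(y, k, n, start, trans):
    """Expected count over the n - len(y) + 1 windows, assuming start is stationary."""
    return (n - len(y) + 1) * match_probability(y, k, start, trans)


# Hypothetical two-letter source with uniform transitions.
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
print(match_probability("AB", 1, start, trans))
print(expected_occurrences("AB", 0, 9, start, trans))
```

The loop nest touches each of the $m$ positions, up to $k + 1$ mismatch counts, and $|\Sigma|^2$ symbol pairs, so the cost of this sketch grows as $O(mk|\Sigma|^2)$ for order 1.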

International Conference on Computational Science | 2004

A Fast Multifrontal Solver for Non-linear Multi-physics Problems

Alberto Bertoldo; Mauro Bianco; Geppino Pucci

The paper presents a highly optimized implementation of a multifrontal solver for linear systems arising in the FEM simulation of multi-physics problems related to the behaviour of porous media. The solver features a careful preprocessing phase that is crucial to considerably speed up both system assembly and Gaussian elimination. When run on a number of relevant test cases, the proposed solver compares very favourably with both its previous unifrontal counterpart and two general multifrontal solvers well known in the literature.

International Symposium on Parallel and Distributed Processing and Applications | 2006

A static parallel multifrontal solver for finite element meshes

Alberto Bertoldo; Mauro Bianco; Geppino Pucci

We present a static parallel implementation of the multifrontal method to solve unsymmetric sparse linear systems on distributed-memory architectures. We target Finite Element (FE) applications where numerical pivoting can be avoided, since an implicit minimum-degree ordering based on the FE mesh topology suffices to achieve numerical stability. Our strategy is static in the sense that work distribution and communication patterns are determined in a preprocessing phase preceding the actual numerical computation. To balance the load among the processors, we devise a simple model-driven partitioning strategy to precompute a high-quality balancing for a large family of structured meshes. The resulting approach is proved to be considerably more efficient than the strategies implemented by MUMPS and SuperLU_DIST, two state-of-the-art parallel multifrontal solvers.

International Conference on Computational Science and Its Applications | 2003

A high-performance UL factorization for the frontal method

Mauro Bianco

An optimized version of a frontal solver for the finite element simulation of non-linear coupled multiphysics problems arising in the simulation of concrete under fire is presented. A new algorithm for UL factorization using BLAS level 3 routines is developed, and then used to implement a pivoting strategy that has been shown to be well suited for the linear systems involved. Our implementation also features efficient algorithms to swap rows and columns of the matrix of the system. The resulting code proves effective for this kind of complex problem, achieving 850 MFlops on a 375 MHz IBM Power3 machine, with computational errors comparable with the round-off unit of double precision floating point numbers.
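For readers unfamiliar with the term, a UL factorization writes a matrix as $A = UL$ with $U$ upper triangular and $L$ lower triangular, eliminating from the last row and column backwards. The sketch below is an unblocked, pivot-free version in plain Python, given only to show the structure. It is not the paper's BLAS level 3 algorithm, and the names `ul_factor` and `matmul` are hypothetical.

```python
def ul_factor(a):
    """Factor a square matrix as a = upper * lower.

    upper is unit upper triangular, lower is lower triangular.
    No pivoting: every pivot used as a divisor must be non-zero.
    """
    n = len(a)
    w = [row[:] for row in a]                              # working copy of a
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    lower = [[0.0] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):                         # last row/column first
        for j in range(k + 1):
            lower[k][j] = w[k][j]                          # row k of lower
        for i in range(k):
            upper[i][k] = w[i][k] / w[k][k]                # column k of upper
            for j in range(k):
                w[i][j] -= upper[i][k] * w[k][j]           # rank-1 update of leading block
    return upper, lower


def matmul(x, y):
    """Dense product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 3x3 example.
a = [[4.0, 3.0, 2.0], [2.0, 5.0, 1.0], [1.0, 2.0, 2.0]]
upper, lower = ul_factor(a)
product = matmul(upper, lower)
print(product)
```

A blocked variant would replace the innermost rank-1 update with matrix-matrix products over panels of columns, which is where level 3 BLAS routines come in.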

European Conference on Parallel Processing | 2000

On the Predictive Quality of BSP-like Cost Functions for NOWs

Mauro Bianco; Geppino Pucci

The Bulk-Synchronous Parallel (BSP) model [16] provides a simple and portable programming discipline that is particularly suitable for coarse-grained parallel systems such as Networks of Workstations (NOWs). In this work we examine the issue of predictability of the BSP cost function for a NOW consisting of SUN workstations connected through a 10Mbps Ethernet network. In particular, we compare the original BSP cost function with a number of newly proposed variants, with the intent of improving predictability by having the cost function encompass those parameters of the hardware/software system which have the largest impact on performance.
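For reference, the standard BSP cost of a superstep is usually written as $w + h \cdot g + l$, where $w$ is the maximum local work, $h$ the maximum number of messages sent or received by any processor, $g$ the per-message communication cost, and $l$ the barrier synchronization cost. The sketch below evaluates that textbook formula. The function name and all numeric values are hypothetical, and the variants proposed in the paper are not reproduced here.

```python
def bsp_cost(supersteps, g, l):
    """Total cost of a BSP program: sum of w + h*g + l over its supersteps.

    supersteps : list of (w, h) pairs, one per superstep
    g          : cost per message (same time unit as w)
    l          : cost of one barrier synchronization
    """
    return sum(w + h * g + l for w, h in supersteps)


# Hypothetical program with two supersteps on a machine with g = 4 and l = 100.
total = bsp_cost([(1000, 50), (400, 10)], g=4, l=100)
print(total)  # 1840
```

Comparing such predictions with measured running times is the kind of check that predictability studies of a cost function rely on.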

Complex, Intelligent and Software Intensive Systems | 2007

Obtaining Performance Measures through Microbenchmarking in a Peer-to-Peer Overlay Computer

Paolo Bertasi; Mauro Bianco; Andrea Pietracaprina; Geppino Pucci

We address the problem of developing a suite of microbenchmarking experiments aimed at providing the basic functionalities of a measurement tool for a P2P-based globally distributed computing platform, usually referred to as overlay computer. We argue that such a measuring system should take into account the communication patterns generated by the applications in order to provide useful performance insights

International Journal for Numerical Methods in Engineering | 2003