Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Massimo Torquati is active.

Publication


Featured research published by Massimo Torquati.


International Conference on Parallel Processing | 2011

Accelerating Code on Multi-cores with FastFlow

Marco Aldinucci; Marco Danelutto; Peter Kilpatrick; Massimiliano Meneghin; Massimo Torquati

FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory-fence-free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper we present a new FastFlow programming methodology aimed at parallelizing existing sequential code by offloading work onto a dynamically created software accelerator. The methodology has been validated with a set of simple micro-benchmarks and some real applications.
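
The offloading idea can be sketched in plain C++. The Accelerator class below, its offload method, and the worker-pool internals are hypothetical names used only for illustration; this is not the FastFlow accelerator API, just a minimal sketch of offloading tasks from otherwise sequential code onto dynamically created worker threads.

    // Minimal sketch: a "software accelerator" fed by a sequential main loop.
    // Hypothetical names; not FastFlow code.
    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class Accelerator {
    public:
        explicit Accelerator(unsigned nworkers) {
            for (unsigned i = 0; i < nworkers; ++i)
                workers.emplace_back([this] { worker_loop(); });
        }
        // Offload one task without blocking the sequential caller.
        void offload(std::function<void()> task) {
            { std::lock_guard<std::mutex> lk(m); q.push(std::move(task)); }
            cv.notify_one();
        }
        // Signal end-of-stream, drain pending tasks, and join the workers.
        ~Accelerator() {
            { std::lock_guard<std::mutex> lk(m); done = true; }
            cv.notify_all();
            for (auto& w : workers) w.join();
        }
    private:
        void worker_loop() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m);
                    cv.wait(lk, [this] { return done || !q.empty(); });
                    if (q.empty()) return;              // done and drained
                    task = std::move(q.front()); q.pop();
                }
                task();
            }
        }
        std::queue<std::function<void()>> q;
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
        std::vector<std::thread> workers;
    };

    int main() {
        Accelerator acc(4);                             // dynamically created accelerator
        for (int i = 0; i < 8; ++i)                     // the pre-existing sequential loop
            acc.offload([i] { std::cout << "task " << i << "\n"; });
    }                                                   // destructor drains and joins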


European Conference on Parallel Processing | 2005

Dynamic reconfiguration of grid-aware applications in ASSIST

Marco Aldinucci; Alessandro Petrocelli; Edoardo Pistoletti; Massimo Torquati; Marco Vanneschi; Luca Veraldi; Corrado Zoccolo

Current grid-aware applications are implemented on top of low-level libraries by developers who are experts on grid middleware architecture. This approach can hardly support the additional complexity of QoS control in real applications. We discuss a novel approach used in the ASSIST programming environment to implement and guarantee user-provided QoS contracts in a transparent and effective way. Our approach is based on automatic run-time reconfiguration of ASSIST application executions, triggered by a mismatch between the user-provided QoS contract and the actual performance achieved.
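
A reconfiguration trigger of this kind can be illustrated with a small, purely hypothetical sketch: compare the measured performance against the user-provided contract and adjust the parallelism degree on a mismatch. Names, policy, and numbers are invented for illustration and are not ASSIST code.

    #include <iostream>

    struct QoSContract { double min_throughput; };       // tasks/s required by the user

    // Returns a new parallelism degree; a real manager would also redeploy
    // the affected application modules at run time.
    int reconfigure(const QoSContract& c, double measured, int workers) {
        if (measured < c.min_throughput)                  // contract violated
            return static_cast<int>(workers * (c.min_throughput / measured) + 0.5);
        return workers;                                   // contract satisfied: no change
    }

    int main() {
        QoSContract contract{1000.0};                     // ask for 1000 tasks/s
        std::cout << reconfigure(contract, 640.0, 4) << "\n";   // prints 6
    }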


Parallel, Distributed and Network-Based Processing | 2010

Efficient Smith-Waterman on Multi-core with FastFlow

Marco Aldinucci; Massimiliano Meneghin; Massimo Torquati

Shared-memory multiprocessors have returned to popularity thanks to the rapid spread of commodity multi-core architectures. However, little attention has been paid to supporting effective streaming applications on these architectures. We describe FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications. We compare FastFlow with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB. We experimentally demonstrate that FastFlow is consistently more efficient than these alternatives on a real-world application; the speedup of FastFlow over the other solutions can be substantial for fine-grain tasks, for example +35% over OpenMP, +226% over Cilk, and +96% over TBB for the alignment of protein P01111 against the UniProt DB using the Smith-Waterman algorithm.


European Conference on Parallel Processing | 2003

The Implementation of ASSIST, an Environment for Parallel and Distributed Programming

Marco Aldinucci; Sonia Campa; Pierpaolo Ciullo; Massimo Coppola; Silvia Magini; Paolo Pesciullesi; Laura Potiti; Roberto Ravazzolo; Massimo Torquati; Marco Vanneschi; Corrado Zoccolo

We describe the implementation of ASSIST, a programming environment for parallel and distributed programs. Its coordination language is based on the parallel skeleton model, extended with new features to enhance expressiveness, parallel software reuse, software component integration, and interfacing to external resources. The compilation process and the structure of the ASSIST run-time support are discussed with respect to the issues introduced by these new characteristics, together with an analysis of the first test results.


International Conference on Parallel Processing | 2012

An efficient unbounded lock-free queue for multi-core systems

Marco Aldinucci; Marco Danelutto; Peter Kilpatrick; Massimiliano Meneghin; Massimo Torquati

The use of efficient synchronization mechanisms is crucial for implementing fine-grained parallel programs on modern shared-cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the raw synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform, and a sketch proof of correctness is presented. The proposed queues have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications.
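
The idea of an unbounded SPSC queue can be illustrated with a minimal dynamic-list sketch: a dummy-node linked list with a single atomic link per node, where only the producer touches the tail and only the consumer touches the head. This is an illustration only; the algorithm proposed in the paper is a more refined design, and the class and member names below are invented.

    #include <atomic>
    #include <cassert>

    template <typename T>
    class SPSCQueue {                       // one producer thread, one consumer thread
        struct Node { T value{}; std::atomic<Node*> next{nullptr}; };
        Node* head;                         // touched only by the consumer
        Node* tail;                         // touched only by the producer
    public:
        SPSCQueue() : head(new Node), tail(head) {}           // start with a dummy node
        ~SPSCQueue() { while (Node* n = head) { head = n->next.load(); delete n; } }

        void push(T v) {                    // producer side: never blocks, never full
            Node* n = new Node;
            n->value = std::move(v);
            tail->next.store(n, std::memory_order_release);    // publish the new node
            tail = n;
        }
        bool pop(T& out) {                  // consumer side: false when empty
            Node* next = head->next.load(std::memory_order_acquire);
            if (!next) return false;
            out = std::move(next->value);
            delete head;                    // retire the old dummy
            head = next;
            return true;
        }
    };

    int main() {
        SPSCQueue<int> q;
        q.push(1); q.push(2);
        int x;
        assert(q.pop(x) && x == 1);
        assert(q.pop(x) && x == 2);
        assert(!q.pop(x));
    }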


International Conference on Parallel Processing | 2012

Targeting Distributed Systems in FastFlow

Marco Aldinucci; Sonia Campa; Marco Danelutto; Peter Kilpatrick; Massimo Torquati

FastFlow is a structured parallel programming framework targeting shared-memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at also supporting networks of multi-core workstations. The extension supports the execution of FastFlow programs by coordinating, in a structured way, the fine-grain parallel activities running on the single workstations. We discuss the design and implementation of this extension and present preliminary experimental results validating it on state-of-the-art networked multi-core nodes.


Lecture Notes in Computer Science | 2013

Structured Parallel Programming with “core” FastFlow

Marco Danelutto; Massimo Torquati

FastFlow is an open-source, structured parallel programming framework originally conceived to support highly efficient stream-parallel computation while targeting shared-memory multi-cores. Its efficiency mainly comes from the optimized implementation of the base communication mechanisms and from its layered design. FastFlow provides parallel application programmers with a set of ready-to-use, parametric algorithmic skeletons modeling the most common parallelism exploitation patterns. The algorithmic skeletons provided by FastFlow may be freely nested to model ever more complex parallelism exploitation patterns. This tutorial describes the “core” FastFlow, that is, the set of skeletons supported since FastFlow version 1.0, and outlines the recent advances aimed at (i) introducing new, higher-level skeletons and (ii) targeting networked multi-cores, possibly equipped with GPUs, in addition to single multi/many-core processing elements.
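
As a concrete illustration of nesting, the sketch below composes a three-stage pipeline whose middle stage is a farm. It assumes the pre-3.0 “core” FastFlow API (ff_node::svc, ff_pipeline, ff_farm<>); exact headers and signatures may differ between FastFlow releases, so treat it as a sketch rather than tested code.

    #include <ff/farm.hpp>
    #include <ff/pipeline.hpp>
    #include <iostream>
    #include <vector>
    using namespace ff;

    struct Generator : ff_node {             // first stage: emits a stream of tasks
        void* svc(void*) {
            for (long i = 1; i <= 10; ++i) ff_send_out((void*)i);
            return NULL;                     // NULL signals end-of-stream
        }
    };
    struct Square : ff_node {                // farm worker: processes one task
        void* svc(void* t) { long v = (long)t; return (void*)(v * v); }
    };
    struct Printer : ff_node {               // last stage: consumes the results
        void* svc(void* t) { std::cout << (long)t << "\n"; return GO_ON; }
    };

    int main() {
        Square w1, w2, w3, w4;
        std::vector<ff_node*> workers = { &w1, &w2, &w3, &w4 };

        ff_farm<> farm;                      // farm skeleton, nested inside the pipeline
        farm.add_workers(workers);
        farm.add_collector(NULL);            // default collector gathers workers' results

        Generator gen; Printer out;
        ff_pipeline pipe;                    // pipeline skeleton
        pipe.add_stage(&gen);
        pipe.add_stage(&farm);
        pipe.add_stage(&out);
        return pipe.run_and_wait_end();      // run the skeleton composition to completion
    }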


European Conference on Machine Learning | 2010

Porting decision tree algorithms to multicore using FastFlow

Marco Aldinucci; Salvatore Ruggieri; Massimo Torquati

The whole computer hardware industry has embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze out the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable for parallelisation. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code and achieves up to a 7× speedup on an Intel dual quad-core machine.
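
The natural concurrency mentioned above comes from the fact that the candidate split attributes at a tree node can be evaluated independently. The sketch below shows that idea with plain C++ futures; the gain function and names are placeholders, and this is not the paper's FastFlow-based port of C4.5.

    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <vector>

    // Placeholder gain function: real C4.5 computes the information gain
    // (or gain ratio) of splitting the node's cases on attribute a.
    double split_gain(std::size_t a) { return 1.0 / (a + 1); }

    std::size_t best_attribute(std::size_t nattributes) {
        std::vector<std::future<double>> gains;
        for (std::size_t a = 0; a < nattributes; ++a)        // evaluate attributes in parallel
            gains.push_back(std::async(std::launch::async, split_gain, a));

        std::size_t best = 0;
        double best_gain = -1.0;
        for (std::size_t a = 0; a < nattributes; ++a) {      // pick the best candidate
            double g = gains[a].get();
            if (g > best_gain) { best_gain = g; best = a; }
        }
        return best;
    }

    int main() { std::cout << "best attribute: " << best_attribute(8) << "\n"; }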


Future Generation Computer Systems | 2014

Parallel patterns for heterogeneous CPU/GPU architectures: structured parallelism from cluster to cloud

Sonia Campa; Marco Danelutto; Mehdi Goli; Horacio González-Vélez; Alina Madalina Popescu; Massimo Torquati

The widespread adoption of traditional heterogeneous systems has substantially improved the available computing power and, at the same time, raised optimisation issues related to the processing of task streams across both CPU and GPU cores. Following the heterogeneity gains seen in traditional systems, cloud computing has started to add heterogeneity support, typically through GPU instances, to the conventional CPU-based cloud resources. This optimisation of cloud resources will arguably have a real impact when running on-demand computationally intensive applications. In this work, we investigate the scaling of pattern-based parallel applications from physical, “local” mixed CPU/GPU clusters to a public cloud CPU/GPU infrastructure. Such parallel patterns are deployed via algorithmic skeletons to exploit a specific parallel behaviour while hiding implementation details. We propose a systematic methodology that exploits approximated analytical performance/cost models, and an integrated programming framework suitable for targeting both local and remote resources, to support the offloading of computations from structured parallel applications to heterogeneous cloud resources, so that performance levels not attainable on local resources alone may actually be achieved with the remote resources. The amount of remote resources necessary to achieve a given performance target is calculated through the performance models, allowing any user to hire exactly the cloud resources needed to reach a given target performance value. It is therefore expected that such models can be used to devise the optimal proportion of computations to be allocated to different remote nodes for Big Data computations. We present experiments run with a proof-of-concept implementation based on FastFlow, on small departmental clusters as well as on a public CPU/GPU cloud infrastructure using the Amazon Elastic Compute Cloud. In particular, we show how CPU-only and mixed CPU/GPU computations can be offloaded to remote cloud resources with predictable performance, and how data-intensive applications can be mapped to a mix of local and remote resources to guarantee optimal performance.
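
The role of the performance/cost models can be illustrated with a deliberately simplified sketch: estimate how many remote workers are needed to meet a target throughput given a per-task service time and a per-task offloading overhead, and price the result. The formula, parameters, and numbers are invented for illustration; they are not the models used in the paper.

    #include <cmath>
    #include <iostream>

    struct ModelParams {
        double service_time;    // seconds per task on one remote worker
        double overhead;        // seconds of transfer/coordination per task
        double cost_per_hour;   // price of one remote instance
    };

    // Workers needed so that the aggregate throughput reaches the target (tasks/s).
    int workers_for_target(const ModelParams& p, double target_throughput) {
        double per_worker = 1.0 / (p.service_time + p.overhead);
        return static_cast<int>(std::ceil(target_throughput / per_worker));
    }

    int main() {
        ModelParams gpu_instance{0.020, 0.005, 0.90};    // assumed, not measured, values
        int n = workers_for_target(gpu_instance, 1000.0);
        std::cout << n << " instances, ~" << n * gpu_instance.cost_per_hour
                  << " $/hour\n";                        // 25 instances, ~22.5 $/hour
    }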


Parallel Computing | 2004

A framework for experimenting with structured parallel programming environment design

Marco Aldinucci; Sonia Campa; Pierpaolo Ciullo; Massimo Coppola; Marco Danelutto; Paolo Pesciullesi; Roberto Ravazzolo; Massimo Torquati; Marco Vanneschi; Corrado Zoccolo

A software development system based on integrated skeleton technology (ASSIST) is a parallel programming environment aimed at providing programmers of complex parallel applications with a suitable and effective programming tool. Being based on algorithmic skeleton and coordination language technologies, the programming environment relieves the programmer from a number of cumbersome, error-prone activities that are required when using traditional parallel programming environments. ASSIST has been specifically designed to be easily customizable in order to experiment with different implementation techniques, solutions, algorithms, or back-ends any time new features are required or new technologies become available. This chapter explains how the ASSIST programming environment can be used to experiment with new implementation techniques, mechanisms, and solutions within the framework of structured parallel programming models. The ASSIST implementation structure is briefly outlined and some experiments aimed at extending the environment are discussed. Those experiments were aimed at modifying the ASSIST environment so that it can be used to program GRID architectures, include existing libraries in the application code, target heterogeneous cluster architectures, etc.

Collaboration


Dive into Massimo Torquati's collaborations.

Top Co-Authors

Peter Kilpatrick

Queen's University Belfast
