Daniele Buono | Researchain

Publication

Featured research published by Daniele Buono.

international conference on autonomic computing | 2009

Expressing Adaptivity and Context Awareness in the ASSISTANT Programming Model

Carlo Bertolli; Daniele Buono; Gabriele Mencagli; Marco Vanneschi

Pervasive Grid computing platforms are composed of a variety of fixed and mobile nodes, interconnected through multiple wireless and wired network technologies. Pervasive Grid applications must adapt themselves to the state of their surrounding environment (context), which includes the state of the resources on which they are executed. By focusing on a specific instance of an emergency management application, we show how a complex high-performance problem can be solved according to multiple parallelization methodologies. We introduce the ASSISTANT programming model, which allows programmers to express multiple versions of the same parallel module, each of them suitable for particular context situations. We show how the exemplified programs can be included in a single ASSISTANT parallel module and how their dynamic switching can be expressed. We provide experimental results demonstrating the effectiveness of the approach.

parallel, distributed and network-based processing | 2013

Parallel Patterns for General Purpose Many-Core

Daniele Buono; Marco Danelutto; Silvia Lametti; Massimo Torquati

Efficient programming of general purpose many-core accelerators poses several challenging problems. The high number of cores available, the peculiarity of the interconnection network, and the complex memory hierarchy organization, all contribute to make efficient programming of such devices difficult. We propose to use parallel design patterns, implemented using algorithmic skeletons, to abstract and hide most of the difficulties related to the efficient programming of many-core accelerators. In particular, we discuss the porting of the FastFlow framework on the Tilera TilePro64 architecture and the results obtained running synthetic benchmarks as well as true application kernels. These results demonstrate the efficiency achieved while using patterns on the TilePro64 both to program stand-alone skeleton-based parallel applications and to accelerate existing sequential code.

international conference on conceptual structures | 2010

Map, reduce and mapreduce, the skeleton way

Daniele Buono; Marco Danelutto; Silvia Lametti

The composition of Map and Reduce algorithmic skeletons was widely studied at the end of the last century and has proved effective on a wide class of problems. We recall the theoretical results motivating the introduction of these skeletons, then we discuss an experiment implementing three algorithmic skeletons: a map, a reduce, and an optimized composition of a map followed by a reduce skeleton (map+reduce). The map+reduce skeleton implemented computes the same kind of problems computed by Google MapReduce, but the data flow through the skeleton is streamed rather than relying on already distributed (and possibly quite large) data items. We discuss the implementation of the three skeletons on top of ProActive/GCM in the MareMare prototype and we present some experimental results obtained on a COTS cluster.
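As an illustration of the three skeletons named in this abstract, the following is a minimal sequential sketch in Python. It is not the MareMare or ProActive/GCM implementation: the function names (`map_skeleton`, `reduce_skeleton`, `map_reduce`) are hypothetical, and the sketch only shows how a fused map+reduce can consume a stream item by item without materializing the intermediate mapped collection.

```python
from operator import add
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_skeleton(f: Callable[[A], B], stream: Iterable[A]) -> Iterator[B]:
    """Apply f to every item of the stream, yielding results lazily."""
    for item in stream:
        yield f(item)


def reduce_skeleton(op: Callable[[B, B], B], stream: Iterable[B], initial: B) -> B:
    """Fold the stream with a binary operator, starting from an initial value."""
    acc = initial
    for item in stream:
        acc = op(acc, item)
    return acc


def map_reduce(f: Callable[[A], B], op: Callable[[B, B], B],
               stream: Iterable[A], initial: B) -> B:
    """Fused map followed by reduce: each item is mapped and folded as it arrives."""
    acc = initial
    for item in stream:
        acc = op(acc, f(item))
    return acc


# Sum of squares over a stream of integers, computed both ways.
composed = reduce_skeleton(add, map_skeleton(lambda x: x * x, range(5)), 0)
fused = map_reduce(lambda x: x * x, add, iter(range(5)), 0)
```

Both forms compute the same value; in a parallel setting, the fused variant is the one that leaves room for optimization, provided the reduce operator is associative.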

parallel, distributed and network-based processing | 2014

Optimizing message-passing on multicore architectures using hardware multi-threading

Daniele Buono; Tiziano De Matteis; Gabriele Mencagli; Marco Vanneschi

Shared-memory and message-passing are two opposing models for developing parallel computations. The shared-memory model, adopted by existing frameworks such as OpenMP, represents a de-facto standard on multi-/many-core architectures. However, message-passing deserves to be studied for its inherent properties in terms of portability and flexibility as well as for its better ease of debugging. Achieving good performance from the use of messages in shared-memory architectures requires an efficient implementation of the run-time support. This paper investigates the definition of a delegation mechanism on multi-threaded architectures able to: (i) overlap communications with calculation phases, (ii) parallelize distribution and collective operations. Our ideas have been exemplified using two parallel benchmarks on the Intel Phi, showing that in these applications our message-passing support outperforms MPI and reaches performance similar to that of standard OpenMP implementations.

high performance computing systems and applications | 2014

Run-time mechanisms for fine-grained parallelism on network processors: The TILEPro64 experience

Daniele Buono; Gabriele Mencagli

The efficient parallelization of very fine-grained computations is an old problem that remains challenging even on modern shared-memory architectures. Scalable parallelizations are possible if the base mechanisms provided by the run-time support (for inter-thread/inter-process synchronization/communication) are carefully designed and developed on top of parallel architectures. This requires a deep knowledge of the hardware behavior and the interaction patterns used by the parallelism paradigms. In this paper we present our experience in developing efficient inter-thread interaction mechanisms on the Tilera TILEPro64 network processor. Although it is a domain-specific parallel architecture, the TILEPro64 represents a notable example of how advanced architectural structures, such as user-accessible on-chip interconnection networks and configurable cache coherence protocols, are of great importance to design lightweight cooperation mechanisms enabling efficient parallel implementations of fine-grained problems. The paper presents our ideas and an experimental evaluation that compares our proposals with other existing run-time supports.

international conference on wireless communications and mobile computing | 2010

Resource discovery support for time-critical adaptive applications

Carlo Bertolli; Daniele Buono; Gabriele Mencagli; Massimo Torquati; Marco Vanneschi; Matteo Mordacchini; Franco Maria Nardini

Several complex and time-critical applications require the existence of novel distributed and dynamical platforms composed of a variety of fixed and mobile processing nodes and networks. Notable examples of such applications are crisis and emergency management and natural phenomenon prediction. In this scenario we need the development of applications able to adapt their behavior according to the dynamical platform conditions, such as the presence of specific classes of computing resources and the actual network availability. For these reasons such adaptive applications need to interact with a fast and reliable resource discovery support, which ensures required response times by means of a high degree of reconfigurability and selectivity. In this paper we present an integrated approach between our programming model for distributed adaptive time-critical computations and a suitable resource discovery support.

international conference on parallel and distributed computing and networks | 2014

A LIGHTWEIGHT RUN-TIME SUPPORT FOR FAST DENSE LINEAR ALGEBRA ON MULTI-CORE

Daniele Buono; Marco Danelutto; Tiziano De Matteis; Gabriele Mencagli; Massimo Torquati

The work proposes MDF, a lightweight dynamic run-time support able to achieve high performance in the execution of dense linear algebra kernels on shared-cache multi-core. MDF implements a dynamic macro-dataflow interpreter processing DAG graphs generated on-the-fly out of standard numeric kernel code. The experimental results demonstrate that the performance obtained using MDF on both fine-grain and coarse-grain problems is comparable with or even better than that achieved by de-facto standard solutions (notably PLASMA library), which use separate run-time supports specifically optimised for different computational grains on modern multi-core.
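To make the idea of a macro-dataflow interpreter concrete, here is a minimal sequential sketch in Python. It is not MDF itself and says nothing about how MDF generates or schedules its graphs: the `run_dataflow` function and the `tasks`/`deps` dictionaries are hypothetical names chosen for illustration. The sketch shows only the firing rule, in which a task becomes ready once all of its input results are available.

```python
from collections import deque
from typing import Any, Callable, Dict, List


def run_dataflow(tasks: Dict[str, Callable[..., Any]],
                 deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Execute a DAG of tasks; each task receives its dependencies' results in order."""
    # Number of inputs each task is still waiting for.
    pending = {name: len(deps.get(name, [])) for name in tasks}
    # Reverse edges: which tasks consume the result of a given task.
    consumers: Dict[str, List[str]] = {name: [] for name in tasks}
    for name, inputs in deps.items():
        for producer in inputs:
            consumers[producer].append(name)

    # Tasks with no inputs can fire immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    results: Dict[str, Any] = {}

    while ready:
        name = ready.popleft()
        args = [results[producer] for producer in deps.get(name, [])]
        results[name] = tasks[name](*args)
        # Deliver the result token to consumers; fire those that become complete.
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)

    if len(results) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return results


# Hypothetical four-task graph: c depends on a and b, d depends on c.
tasks = {
    "a": lambda: 2,
    "b": lambda: 3,
    "c": lambda x, y: x * y,
    "d": lambda z: z + 1,
}
deps = {"c": ["a", "b"], "d": ["c"]}
results = run_dataflow(tasks, deps)
```

A parallel run-time would hand ready tasks to a pool of worker threads instead of executing them in the interpreter loop; that part is omitted here.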

International Journal of Parallel, Emergent and Distributed Systems | 2014

Performance analysis and structured parallelisation of the space–time adaptive processing computational kernel on multi-core architectures

Daniele Buono; Gabriele Mencagli; Alessio Pascucci; Marco Vanneschi

The development of radar systems on general-purpose off-the-shelf parallel hardware represents an effective means of providing efficient implementations with reasonable realisation costs. However, the fulfilment of the required real-time constraints poses serious problems of performance and efficiency: parallel architectures need to be exploited to the fullest, providing scalable parallelisations able to reach the desired throughput and latency levels. In this paper we discuss the implementation issues of the computational kernel of a well-known radar filtering technique – the space–time adaptive processing – on today's general-purpose parallel architectures (multi-/many-core platforms). In order to address the performance constraints imposed by the real-time implementation of this filtering technique, we apply a structured approach (structured parallel programming) to develop parallel computations as instances and compositions of well-known parallelisation patterns. This paper provides a thorough description of the implementation issues and discusses the performance peaks achievable on a broad range of existing multi-core architectures.

iasted international conference on parallel and distributed computing and systems | 2009