Daniele De Sensi | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Daniele De Sensi is active.

Explore More

Publication

Featured researches published by Daniele De Sensi.

parallel, distributed and network-based processing | 2016

Predicting Performance and Power Consumption of Parallel Applications

Daniele De Sensi

Current architectures provide many control knobs for the reduction of power consumption of applications, like reducing the number of used cores or scaling down their frequency. However, choosing the right values for these knobs in order to satisfy requirements on performance and/or power consumption is a complex task and trying all the possible combinations of these values is an unfeasible solution since it would require too much time. For this reasons, there is the need for techniques that allow an accurate estimation of the performance and power consumption of an application when a specific configuration of the control knobs values is used. Usually, this is done by executing the application with different configurations and by using these information to predict its behaviour when the values of the knobs are changed. However, since this is a time consuming process, we would like to execute the application in the fewest number of configurations possible. In this work, we consider as control knobs the number of cores used by the application and the frequency of these cores. We show that on most Parsec benchmark programs, by executing the application in 1% of the total possible configurations and by applying a multiple linear regression model we are able to achieve an average accuracy of 96% in predicting its execution time and power consumption in all the other possible knobs combinations.

ACM Transactions on Architecture and Code Optimization | 2016

A Reconfiguration Algorithm for Power-Aware Parallel Applications

Daniele De Sensi; Massimo Torquati; Marco Danelutto

In current computing systems, many applications require guarantees on their maximum power consumption to not exceed the available power budget. On the other hand, for some applications, it could be possible to decrease their performance, yet maintain an acceptable level, in order to reduce their power consumption. To provide such guarantees, a possible solution consists in changing the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, power consumption and performance have different trends depending on the application considered and on its input. Finding a configuration of resources satisfying user requirements is, in the general case, a challenging task. In this article, we propose Nornir, an algorithm to automatically derive, without relying on historical data about previous executions, performance and power consumption models of an application in different configurations. By using these models, we are able to select a close-to-optimal configuration for the given user requirement, either performance or power consumption. The configuration of the application will be changed on-the-fly throughout the execution to adapt to workload fluctuations, external interferences, and/or application’s phase changes. We validate the algorithm by simulating it over the applications of the Parsec benchmark suit. Then, we implement our algorithm and we analyse its accuracy and overhead over some of these applications on a real execution environment. Eventually, we compare the quality of our proposal with that of the optimal algorithm and of some state-of-the-art solutions.

parallel, distributed and network-based processing | 2015

Energy Driven Adaptivity in Stream Parallel Computations

Marco Danelutto; Daniele De Sensi; Massimo Torquati

Determining the right amount of resources needed for a given computation is a critical problem. In many cases, computing systems are configured to use an amount of resources to manage high load peaks even though this cause energy waste when the resources are not fully utilised. To avoid this problem, adaptive approaches are used to dynamically increase/decrease computational resources depending on the real needs. A different approach based on Dynamic Voltage and Frequency Scaling (DVFS) is emerging as a possible alternative solution to reduce energy consumption of idle CPUs by lowering their frequencies. In this work, we propose to tackle the problem in stream parallel computations by using both the classic adaptivity concepts and the possibility provided by modern CPUs to dynamically change their frequency. We validate our approach showing a real network application that performs Deep Packet Inspection over network traffic. We are able to manage bandwidth changing over time, guaranteeing minimal packet loss during reconfiguration and minimal energy consumption.

symposium on applied computing | 2017

P 3 ARSEC: towards parallel patterns benchmarking

Marco Danelutto; Tiziano De Matteis; Daniele De Sensi; Gabriele Mencagli; Massimo Torquati

High-level parallel programming is a de-facto standard approach to develop parallel software with reduced time to development. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms, and that can be easily instantiated by the programmer to add a structure to the development of parallel software. In this paper we focus on this second approach and we propose P3ARSEC, a benchmark suite for parallel pattern-based frameworks consisting of a representative subset of PARSEC applications. We analyse the programmability advantages and the potential performance penalty of using such high-level methodology with respect to hand-made parallelisations using low-level mechanisms. The results are obtained on the new Intel Knights Landing multicore, and show a significantly reduced code complexity with comparable performance.

ACM Transactions on Architecture and Code Optimization | 2017

Bringing Parallel Patterns Out of the Corner: The P 3 ARSEC Benchmark Suite

Daniele De Sensi; Tiziano De Matteis; Massimo Torquati; Gabriele Mencagli; Marco Danelutto

High-level parallel programming is an active research topic aimed at promoting parallel programming methodologies that provide the programmer with high-level abstractions to develop complex parallel software with reduced time to solution. Pattern-based parallel programming is based on a set of composable and customizable parallel patterns used as basic building blocks in parallel applications. In recent years, a considerable effort has been made in empowering this programming model with features able to overcome shortcomings of early approaches concerning flexibility and performance. In this article, we demonstrate that the approach is flexible and efficient enough by applying it on 12 out of 13 PARSEC applications. Our analysis, conducted on three different multicore architectures, demonstrates that pattern-based parallel programming has reached a good level of maturity, providing comparable results in terms of performance with respect to both other parallel programming methodologies based on pragma-based annotations (i.e., Openmp and OmpSs) and native implementations (i.e., Pthreads). Regarding the programming effort, we also demonstrate a considerable reduction in lines of code and code churn compared to Pthreads and comparable results with respect to other existing implementations.

Parallel Processing Letters | 2017

A Power-Aware, Self-Adaptive Macro Data Flow Framework

Marco Danelutto; Daniele De Sensi; Massimo Torquati

The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime s...

International Journal of Parallel Programming | 2018

Analysing Multiple QoS Attributes in Parallel Design Patterns-Based Applications

Antonio Brogi; Marco Danelutto; Daniele De Sensi; Ahmad Ibrahim; Jacopo Soldani; Massimo Torquati

Parallel design patterns can be fruitfully combined to develop parallel software applications. Different combinations of patterns can feature different QoS while being functionally equivalent. To support application developers in selecting the best combinations of patterns to develop their applications, we hereby propose a probabilistic approach that permits analysing, at design time, multiple QoS attributes of parallel design patterns-based application. We also present a proof-of-concept implementation of our approach, together with some experimental results.

parallel, distributed and network-based processing | 2017

Evaluating Concurrency Throttling and Thread Packing on SMT Multicores

Marco Danelutto; Tiziano De Matteis; Daniele De Sensi; Massimo Torquati

Power-aware computing is gaining an increasing attention both in academic and industrial settings. The problem of guaranteeing a given QoS requirement (either in terms of performance or power consumption) can be faced by selecting and dynamically adapting the amount of physical and logical resources used by the application. In this study, we considered standard multicore platforms by taking as a reference approaches for power-aware computing two well-known dynamic reconfiguration techniques: Concurrency Throttling and Thread Packing. Furthermore, we also studied the impact of using simultaneous multithreading (e.g., Intels HyperThreading) in both techniques. In this work, leveraging on the applications of the PARSEC benchmark suite, we evaluate these techniques by considering performance-power trade-offs, resource efficiency, predictability and required programming effort. The results show that, according to the comparison criteria, these techniques complement each other.

International Journal of Parallel Programming | 2017

The RePhrase Extended Pattern Set for Data Intensive Parallel Computing

Marco Danelutto; Tiziano De Matteis; Daniele De Sensi; Gabriele Mencagli; Massimo Torquati; Marco Aldinucci; Peter Kilpatrick

We discuss the extended parallel pattern set identified within the EU-funded project RePhrase as a candidate pattern set to support data intensive applications targeting heterogeneous architectures. The set has been designed to include three classes of pattern, namely (1) core patterns, modelling common, not necessarily data intensive parallelism exploitation patterns, usually to be used in composition; (2) high level patterns, modelling common, complex and complete parallelism exploitation patterns; and (3) building block patterns, modelling the single components of data intensive applications, suitable for use—in composition—to implement patterns not covered by the core and high level patterns. We discuss the expressive power of the RePhrase extended pattern set and results illustrating the performances that may be achieved with the FastFlow implementation of the high level patterns.

knowledge discovery and data mining | 2018

D2K: Scalable Community Detection in Massive Networks via Small-Diameter k-Plexes

Alessio Conte; Tiziano De Matteis; Daniele De Sensi; Roberto Grossi; Andrea Marino; Luca Versari

This paper studies k-plexes, a well known pseudo-clique model for network communities. In a k-plex, each node can miss at most k-1 links. Our goal is to detect large communities in todays real-world graphs which can have hundreds of millions of edges. While many have tried, this task has been elusive so far due to its computationally challenging nature: k-plexes and other pseudo-cliques are harder to find and more numerous than cliques, a well known hard problem. We present D2K, which is the first algorithm able to find large k-plexes of very large graphs in just a few minutes. The good performance of our algorithm follows from a combination of graph-theoretical concepts, careful algorithm engineering and a high-performance implementation. In particular, we exploit the low degeneracy of real-world graphs, and the fact that large enough k-plexes have diameter 2. We validate a sequential and a parallel/distributed implementation of D2K on real graphs with up to half a billion edges.

Explore More