Tiziano De Matteis
University of Pisa
Publications
Featured research published by Tiziano De Matteis.
acm sigplan symposium on principles and practice of parallel programming | 2016
Tiziano De Matteis; Gabriele Mencagli
This paper addresses the problem of designing scaling strategies for elastic data stream processing. Elasticity allows applications to rapidly change their configuration on-the-fly (e.g., the amount of used resources) in response to dynamic workload fluctuations. We address this problem by adopting the Model Predictive Control technique, a control-theoretic method that finds the optimal application configuration over a limited prediction horizon by solving an online optimization problem. Our control strategies are designed to address latency constraints, using Queueing Theory models, and energy consumption, by changing the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) support available in modern multicore CPUs. The proactive capabilities, in addition to the latency- and energy-awareness, are the novel features of our approach. To validate our methodology, we carry out a thorough set of experiments on a high-frequency trading application. The results demonstrate the high degree of flexibility and configurability of our approach, and show the effectiveness of our elastic scaling strategies compared with existing state-of-the-art techniques used in similar scenarios.
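The control loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' controller: the M/M/1 latency estimate, the cubic power model, the switching cost, and all constants (`CORES`, `FREQS_GHZ`, `TUPLES_PER_GHZ`) are assumptions chosen only to show the shape of a Model Predictive Control step, namely enumerating configuration plans over a short horizon and applying only the first move.

```python
from itertools import product

CORES = [1, 2, 4, 8]            # hypothetical core counts
FREQS_GHZ = [1.2, 2.0, 2.8]     # hypothetical DVFS frequency steps
TUPLES_PER_GHZ = 1000.0         # assumed per-core service rate (tuples/s per GHz)


def latency(arrival_rate, cores, freq):
    """Mean response time of one M/M/1 replica receiving an even share of the load."""
    mu = TUPLES_PER_GHZ * freq
    lam = arrival_rate / cores
    return float("inf") if lam >= mu else 1.0 / (mu - lam)


def power(cores, freq):
    """Toy dynamic-power model (assumption): proportional to cores * f^3."""
    return cores * freq ** 3


def mpc_step(current, forecast, latency_bound, switch_cost=1.0):
    """Return the first configuration of the cheapest feasible plan over the horizon.

    `forecast` holds one predicted arrival rate per step of the horizon.
    Falls back to the largest configuration when no plan meets the bound.
    """
    configs = list(product(CORES, FREQS_GHZ))
    best_plan, best_cost = None, float("inf")
    for plan in product(configs, repeat=len(forecast)):
        cost, prev = 0.0, current
        for (n, f), lam in zip(plan, forecast):
            if latency(lam, n, f) > latency_bound:
                cost = float("inf")   # plan violates the latency constraint
                break
            cost += power(n, f) + (switch_cost if (n, f) != prev else 0.0)
            prev = (n, f)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan[0] if best_plan else max(configs)
```

With a cheap reconfiguration the sketch follows the load step by step, while a higher `switch_cost` makes it provision for the predicted peak in advance, which is the proactive behaviour the abstract refers to. Exhaustive enumeration is only viable for very short horizons.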
Journal of Systems and Software | 2017
Tiziano De Matteis; Gabriele Mencagli
Highlights: We design a predictive methodology for elastic data stream processing. We exploit Model Predictive Control to design the predictive controller. We regulate the number of used cores and the CPU frequency. The approach targets multicore-based shared-memory systems. The approach allows the design of strategies that achieve good SASO trade-offs.
Data stream processing applications have a long-running nature (24hr/7d), with workload conditions that may exhibit wide variations at run-time. Elasticity is the term coined to describe the capability of applications to dynamically change their resource usage in response to workload fluctuations. This paper focuses on strategies for elastic data stream processing targeting multicore systems. The key idea is to exploit Model Predictive Control, a control-theoretic method that takes into account the system behavior over a future time horizon in order to decide the best reconfiguration to execute. We design a set of energy-aware proactive strategies, optimized for throughput and latency QoS requirements, which regulate the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) support offered by modern multicore CPUs. We evaluate our strategies in a high-frequency trading application fed by synthetic and real-world workload traces. We introduce specific properties to effectively compare different elastic approaches, and the results show that our strategies are able to achieve the best outcome.
International Journal of Parallel Programming | 2017
Tiziano De Matteis; Gabriele Mencagli
The topic of Data Stream Processing is a recent and highly active research area dealing with the in-memory, tuple-by-tuple analysis of streaming data. Continuous queries typically consume huge volumes of data received at a great velocity. Solutions that persistently store all the input tuples and then perform off-line computation are impractical. Rather, queries must be executed continuously as data cross the streams. The goal of this paper is to present parallel patterns for window-based stateful operators, which are the most representative class of stateful data stream operators. Parallel patterns are presented “à la” Algorithmic Skeleton, by explaining the rationale of each pattern, the preconditions to safely apply it, and the outcome in terms of throughput, latency and memory consumption. The patterns have been implemented in the FastFlow framework targeting off-the-shelf multicores. To the best of our knowledge this is the first time that a similar effort to merge the Data Stream Processing domain and the field of Structured Parallelism has been made.
parallel, distributed and network-based processing | 2014
Daniele Buono; Tiziano De Matteis; Gabriele Mencagli; Marco Vanneschi
trust, security and privacy in computing and communications | 2015
Marco Danelutto; Tiziano De Matteis; Gabriele Mencagli; Massimo Torquati
symposium on applied computing | 2017
Marco Danelutto; Tiziano De Matteis; Daniele De Sensi; Gabriele Mencagli; Massimo Torquati
IEEE Transactions on Parallel and Distributed Systems | 2017
Gabriele Mencagli; Massimo Torquati; Marco Danelutto; Tiziano De Matteis
ACM Transactions on Architecture and Code Optimization | 2017
Daniele De Sensi; Tiziano De Matteis; Massimo Torquati; Gabriele Mencagli; Marco Danelutto
Shared-memory and message-passing are two opposite models for developing parallel computations. The shared-memory model, adopted by existing frameworks such as OpenMP, represents a de-facto standard on multi-/many-core architectures. However, message-passing deserves to be studied for its inherent properties in terms of portability and flexibility, as well as for being easier to debug. Achieving good performance from the use of messages in shared-memory architectures requires an efficient implementation of the run-time support. This paper investigates the definition of a delegation mechanism on multi-threaded architectures able to: (i) overlap communications with calculation phases, (ii) parallelize distribution and collective operations. Our ideas have been exemplified using two parallel benchmarks on the Intel Phi, showing that in these applications our message-passing support outperforms MPI and reaches similar performance compared to standard OpenMP implementations.
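The idea of delegating communication so that it overlaps with calculation can be sketched as follows. This is only a structural illustration, not the run-time support described in the abstract: `DelegatedChannel` and `producer` are hypothetical names, the list copy merely stands in for the cost of a transfer, and Python threads do not provide the hardware multi-threading the paper relies on.

```python
import queue
import threading


class DelegatedChannel:
    """Channel whose sends are carried out by a helper (delegate) thread."""

    def __init__(self):
        self._outbox = queue.Queue()   # requests handed over to the delegate
        self.inbox = queue.Queue()     # messages delivered to the receiver
        self._delegate = threading.Thread(target=self._run, daemon=True)
        self._delegate.start()

    def _run(self):
        # The delegate performs the transfer, so the sender is never blocked by it.
        while True:
            msg = self._outbox.get()
            if msg is None:            # shutdown marker
                break
            self.inbox.put(list(msg))  # the copy stands in for the communication cost

    def send(self, msg):
        """Hand the message to the delegate and return immediately."""
        self._outbox.put(msg)

    def close(self):
        """Flush pending sends and stop the delegate."""
        self._outbox.put(None)
        self._delegate.join()


def producer(channel, chunks):
    """Compute on each chunk; each send overlaps with the next calculation phase."""
    for chunk in chunks:
        result = [x * x for x in chunk]
        channel.send(result)
    channel.close()
```

Messages are delivered in the order they were sent, because a single delegate drains a FIFO queue; parallelizing distribution or collective operations would require several delegates and is outside this sketch.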
The Journal of Supercomputing | 2016
Marco Danelutto; Tiziano De Matteis; Gabriele Mencagli; Massimo Torquati
With the wide diffusion of parallel architectures, parallelism has become an indispensable factor in application design. However, the cost of parallelizing existing applications is still too high in terms of development time, and the process often requires considerable effort and expertise from the programmer. The REPARA methodology provides a systematic way to express parallel patterns by annotating the source code with C++11 attributes, which are automatically transformed into target parallel code based on parallel programming libraries (e.g. FastFlow, Intel TBB). In this paper we apply this approach to the parallelization of a real high-frequency trading application. The description shows the effectiveness of the approach in easily prototyping several parallel variants of the same code. We also propose an extension of a REPARA attribute to express a user-defined scheduling strategy, which makes it possible to design a high-throughput and low-latency parallelization of our code that outperforms the other parallel variants in most of the considered test cases.
parallel, distributed and network-based processing | 2017
Tiziano De Matteis; Gabriele Mencagli
High-level parallel programming is a de-facto standard approach to developing parallel software with reduced development time. High-level abstractions are provided by existing frameworks as pragma-based annotations in the source code, or through pre-built parallel patterns that recur frequently in parallel algorithms and that can be easily instantiated by the programmer to add structure to the development of parallel software. In this paper we focus on this second approach and we propose P3ARSEC, a benchmark suite for parallel pattern-based frameworks consisting of a representative subset of PARSEC applications. We analyse the programmability advantages and the potential performance penalty of using such a high-level methodology with respect to hand-made parallelisations using low-level mechanisms. The results are obtained on the new Intel Knights Landing multicore, and show a significantly reduced code complexity with comparable performance.
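What "instantiating a pre-built parallel pattern" looks like can be shown with a small sketch. The `farm` and `pipeline` helpers below are hypothetical and built on Python's standard `concurrent.futures`; they are not the API of P3ARSEC, FastFlow, or any framework evaluated in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(worker, n_workers=4):
    """Build a stage that applies `worker` to every item using a pool of replicas."""
    def stage(stream):
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(worker, stream))
    return stage


def pipeline(*stages):
    """Compose stages so that the output of one feeds the next."""
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


# Usage: the programmer supplies only the sequential business logic.
square_then_shift = pipeline(farm(lambda x: x * x), farm(lambda x: x + 1))
```

The point of the pattern-based style is visible in the last line: thread creation, scheduling, and result ordering are hidden inside the patterns, at the price of whatever overhead the pattern implementation adds.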