Mauro Ianni
Sapienza University of Rome
Publication
Featured research published by Mauro Ianni.
ieee international conference on high performance computing data and analytics | 2015
Emanuele Santini; Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
This article presents an innovative runtime support for speculative parallel processing of discrete event simulation models on multi-core architectures, which exploits Hardware-Transactional-Memory (HTM) facilities for the purpose of state recoverability. In this proposal, the speculative updates on the state of the simulation model are executed as concurrent HTM-based transactions that are also in charge of detecting whether the update is consistent with the advancement of logical-time along model execution. Our proposal is fully transparent to the application code. Hence, our HTM-based run-time support can host conventionally developed discrete event models relying on the concept of event-handlers to be dispatched by an underlying simulation engine. Experimental data show that our proposal provides 75% to 92% of the ideal speedup on an Intel Haswell based platform (equipped with 4 physical cores and HTM support) for discrete event models with event granularity ranging between 2 and 12 microseconds. The data also show that these same models cannot be executed efficiently on top of a last generation parallel discrete event simulation platform employing software-based recoverability.
principles of advanced discrete simulation | 2017
Romolo Marotta; Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
Emerging share-everything Parallel Discrete Event Simulation (PDES) platforms rely on worker threads fully sharing the workload of events to be processed. These platforms require efficient event pool data structures enabling high concurrency of extraction/insertion operations. Non-blocking event pool algorithms are emerging as promising solutions for this problem. However, the classical non-blocking paradigm forces concurrent conflicting operations, acting on the same portion of the event pool data structure, to abort and then retry. In this article we present a conflict-resilient non-blocking calendar queue that enables conflicting dequeue operations, concurrently attempting to extract the minimum element, to survive, thus improving the scalability of accesses to the hot portion of the data structure, namely the bucket to which the current locality of the events to be processed is bound. We have integrated our solution within an open source share-everything PDES platform and report the results of an experimental analysis of the proposed concurrent data structure compared to solutions from the literature.
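For readers unfamiliar with the underlying data structure, the following is a minimal sequential sketch of a calendar queue. It is not the conflict-resilient non-blocking algorithm of the paper: it has no concurrency support at all, and the class name `CalendarQueue`, the parameters `n_buckets` and `bucket_width`, and the fallback search are assumptions made for illustration. It only shows how timestamps map to buckets and why dequeues concentrate on one "hot" bucket.

```python
import bisect
import itertools


class CalendarQueue:
    """Sequential calendar queue (illustrative only, not thread-safe)."""

    def __init__(self, n_buckets=4, bucket_width=1.0):
        self.n_buckets = n_buckets
        self.bucket_width = bucket_width
        self.buckets = [[] for _ in range(n_buckets)]
        self.current_epoch = 0           # virtual bucket where the search starts
        self.size = 0
        self._seq = itertools.count()    # tie-breaker for equal timestamps

    def _epoch(self, timestamp):
        # Virtual bucket index: which bucket_width-wide slot the timestamp falls in.
        return int(timestamp // self.bucket_width)

    def enqueue(self, timestamp, payload):
        epoch = self._epoch(timestamp)
        bucket = self.buckets[epoch % self.n_buckets]
        bisect.insort(bucket, (timestamp, next(self._seq), payload))
        self.size += 1
        # An event earlier than the current slot moves the search start back.
        if epoch < self.current_epoch:
            self.current_epoch = epoch

    def dequeue(self):
        if self.size == 0:
            return None
        # Walk one full round of buckets, starting from the hot bucket.
        for step in range(self.n_buckets):
            epoch = self.current_epoch + step
            bucket = self.buckets[epoch % self.n_buckets]
            if bucket and self._epoch(bucket[0][0]) == epoch:
                return self._pop(bucket, epoch)
        # Nothing is due within one round: jump to the global minimum.
        bucket = min((b for b in self.buckets if b), key=lambda b: b[0])
        return self._pop(bucket, self._epoch(bucket[0][0]))

    def _pop(self, bucket, epoch):
        timestamp, _, payload = bucket.pop(0)
        self.current_epoch = epoch
        self.size -= 1
        return timestamp, payload
```

In this sketch every dequeue starts from `current_epoch`, so when the bucket width matches the event density most extractions touch only that one bucket. That bucket is the hot spot on which concurrent dequeues would conflict in a multi-threaded setting.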
distributed simulation and real time applications | 2016
Romolo Marotta; Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
The widespread adoption of highly-parallel shared-memory multi-core machines has led Parallel Discrete Event Simulation (PDES) platforms to shift towards a share-everything model. This model is based on loose coupling between simulation objects and threads, lasting (as an extreme) no more than the lifetime of individual events. Concurrent threads can therefore CPU-dispatch events destined to any object at any point in time, thus fully sharing the workload of events to be processed on a fine-grained basis. This calls for efficient mechanisms to share the overall pool of pending events by enabling parallelism in insertion and extraction operations. In this article we present a lock-free event pool which also provides amortized O(1) time complexity for both insertions and extractions. It can sustain highly concurrent accesses without noticeable performance degradation when scaling up the thread count. Experimental results demonstrate that our solution stands as a core facility capable of further increasing the practical impact of the emerging share-everything PDES paradigm.
distributed simulation and real time applications | 2017
Mauro Ianni; Romolo Marotta; Alessandro Pellegrini; Francesco Quaglia
Shared-memory multi-core platforms are changing the nature of Parallel Discrete Event Simulation (PDES) because of the possibility to fully share the workload of events to be processed across threads. In this context, one rising PDES paradigm, referred to as share-everything PDES, is no longer based on the concept of (temporary) binding of simulation objects to worker threads. Rather, each worker thread can, at any time, pick from a fully shared event pool an event to process, which can be destined to any simulation object. While attention has been paid to the design of concurrent shared pools allowing non-blocking parallel operations, the scenario where two (or more) threads pick events destined to the same simulation object still lacks adequate synchronization support. In fact, these events are currently sequentialized and processed in a critical section touching the simulation object state, thus leading threads to mutually block each other. In this article we present the design of a share-everything speculative PDES engine that prevents threads from blocking each other when accessing the same object state. In our design, the non-blocking property is seen as a vertical attribute of the engine (not only of the event pool). This vertical view calls for innovative event-dispatching schemes and, at the same time, innovative interactions with (and management of) the fully-shared event pool, both of which are features of our design.
reversible computation | 2016
Davide Cingolani; Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
Speculative parallel discrete event simulation requires support for reversing processed events, also called state recovery, when causal inconsistencies are revealed. In this article we present an approach where state recovery relies on a mix of hardware- and software-based techniques. We exploit the Hardware Transactional Memory (HTM) support, as offered by Intel Haswell CPUs, to process events as in-memory transactions, which are possibly committed only after their causal consistency is verified. At the same time, we exploit an innovative software-based reversibility technique, fully relying on transparent software instrumentation targeting x86/ELF objects, which enables undoing the side effects of events with no actual backward re-computation. Each thread within our speculative processing engine dynamically (on a per-event basis) selects which recovery mode to rely on (hardware vs software) depending on varying runtime dynamics. The latter are captured by a lightweight analytic model indicating to what extent the HTM support (not paying any instrumentation cost) is efficient, and beyond what level of event parallelism its performance starts degrading, e.g., due to excessive data conflicts while manipulating causality meta-data within HTM-based transactions. We released our implementation as open source software and provide experimental results for an assessment of its effectiveness.
principles of advanced discrete simulation | 2018
Mauro Ianni; Romolo Marotta; Davide Cingolani; Alessandro Pellegrini; Francesco Quaglia
The share-everything PDES (Parallel Discrete Event Simulation) paradigm is based on fully sharing the possibility to process any individual event across concurrent threads, rather than binding Logical Processes (LPs) and their events to threads. It allows concentrating, at any time, the computing power (the CPU-cores on board of a shared-memory machine) towards the unprocessed events that stand closest to the current commit horizon of the simulation run. This fruitfully biases the delivery of the computing power towards the hot portion of the model execution trajectory. In this article we present an innovative share-everything PDES system that provides (1) fully non-blocking coordination of the threads when accessing shared data structures and (2) fully speculative processing capabilities (Time Warp style processing) of the events. As we show via an experimental study, our proposal can cope with hard workloads where both classical Time Warp systems (based on LPs to threads binding) and previous share-everything proposals (not able to exploit fully speculative processing of the events) tend to fail in delivering adequate performance.
international conference on cluster computing | 2017
Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
We present a multi-word atomic (1,N) register for multi-core machines exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and the readers in a wait-free manner. Our proposal, called Anonymous Readers Counting (ARC), enables large-scale data sharing by admitting up to 2^{32}-2 concurrent readers on off-the-shelf 64-bit machines, as opposed to the most advanced RMW-based approach, which is limited to 58 readers. Further, ARC avoids making multiple copies of the register content while accessing it, a cost that affects classical register algorithms based on atomic read/write operations on single words. Thus, ARC allows for higher scalability with respect to the register size.
distributed simulation and real time applications | 2017
Mauro Ianni; Romolo Marotta; Alessandro Pellegrini; Francesco Quaglia
The increasing diffusion of shared-memory multi-core machines has given rise to a change in the design of Parallel Discrete Event Simulation (PDES) platforms. In particular, the possibility to share large amounts of memory among many worker threads has led to a boost in the adoption of non-blocking coordination algorithms, which have been proven to offer higher scalability when compared to their blocking counterparts based on critical sections. In this article we present an innovative non-blocking algorithm for computing Global Virtual Time (GVT), namely the current commit horizon, in multi-thread PDES engines to be run on top of multi-core machines. Beyond being non-blocking, our proposal has the advantage of requiring a logarithmic (rather than linear) number of per-thread memory operations (read/write operations on the values involved in the reduction for computing the GVT value) with respect to the number of threads participating in the GVT computation. This keeps the actual CPU time required for determining the new GVT value low. We compare our algorithm with a solution from the literature, still based on the non-blocking approach but entailing a linear number of memory operations, and quantify the advantages of our proposal, especially for very large numbers of threads participating in the GVT computation.
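To illustrate why a tree-shaped reduction needs only a logarithmic number of per-thread memory operations, here is a small sequential simulation of a butterfly (recursive-doubling) minimum. This is a generic textbook pattern and not the non-blocking algorithm of the paper: the function name `butterfly_min`, the round-by-round lock-step execution, and the power-of-two thread count are all assumptions of the sketch.

```python
def butterfly_min(local_values):
    """Simulate a butterfly min-reduction over per-thread local minima.

    Returns the value held by each simulated thread at the end and the
    number of partner reads each thread performed.
    Assumption of this sketch: the thread count is a power of two and
    all threads advance in lock-step rounds.
    """
    n = len(local_values)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch assumes a power-of-two number of threads")

    values = list(local_values)
    reads_per_thread = 0
    stride = 1
    while stride < n:
        # In each round, thread i reads the value of thread (i XOR stride)
        # from the previous round and keeps the smaller of the two.
        values = [min(values[i], values[i ^ stride]) for i in range(n)]
        reads_per_thread += 1
        stride *= 2
    return values, reads_per_thread


# Eight simulated threads, each with a local candidate for the minimum.
final_values, reads = butterfly_min([7.0, 3.5, 9.0, 4.0, 8.0, 6.0, 5.0, 10.0])
```

With eight threads every thread ends up holding 3.5 after three partner reads (log2 of 8), whereas a scheme in which each thread scans all other threads' values would need seven reads per thread.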
simulation tools and techniques for communications, networks and system | 2016
Romolo Marotta; Mauro Ianni; Alessandro Pellegrini; Francesco Quaglia
arXiv: Distributed, Parallel, and Cluster Computing | 2018
Romolo Marotta; Mauro Ianni; Alessandro Pellegrini; Andrea Scarselli; Francesco Quaglia