Is this you? Create Your Porfile

Marco Serafini

Qatar Computing Research Institute

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Marco Serafini is active.

Explore More

Publication

Featured researches published by Marco Serafini.

dependable systems and networks | 2011

Zab: High-performance broadcast for primary-backup systems

Flavio Junqueira; Benjamin Reed; Marco Serafini

Zab is a crash-recovery atomic broadcast algorithm we designed for the ZooKeeper coordination service. ZooKeeper implements a primary-backup scheme in which a primary process executes clients operations and uses Zab to propagate the corresponding incremental state changes to backup processes1. Due the dependence of an incremental state change on the sequence of changes previously generated, Zab must guarantee that if it delivers a given state change, then all other changes it depends upon must be delivered first. Since primaries may crash, Zab must satisfy this requirement despite crashes of primaries.

international conference on data engineering | 2015

The power of both choices: Practical load balancing for distributed stream processing engines

Muhammad Anis Uddin Nasir; Gianmarco De Francisci Morales; David García-Soriano; Nicolas Kourtellis; Marco Serafini

We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical “power of two choices” to a distributed streaming setting by leveraging two novel techniques: key splitting and local load estimation. In so doing, it achieves better load balancing than key grouping while being more scalable than shuffle grouping. We test PKG on several large datasets, both real-world and synthetic. Compared to standard hashing, PKG reduces the load imbalance by up to several orders of magnitude, and often achieves nearly-perfect load balance. This result translates into an improvement of up to 60% in throughput and up to 45% in latency when deployed on a real Storm cluster.

very large data bases | 2014

Accordion: elastic scalability for database systems supporting distributed transactions

Marco Serafini; Essam Mansour; Ashraf Aboulnaga; Kenneth Salem; Taha Rafiq; Umar Farooq Minhas

Providing the ability to elastically use more or fewer servers on demand (scale out and scale in) as the load varies is essential for database management systems (DBMSes) deployed on todays distributed computing platforms, such as the cloud. This requires solving the problem of dynamic (online) data placement, which has so far been addressed only for workloads where all transactions are local to one sever. In DBMSes where ACID transactions can access more than one partition, distributed transactions represent a major performance bottleneck. Scaling out and spreading data across a larger number of servers does not necessarily result in a linear increase in the overall system throughput, because transactions that used to access only one server may become distributed. In this paper we present Accordion, a dynamic data placement system for partition-based DBMSes that support ACID transactions (local or distributed). It does so by explicitly considering the affinity between partitions, which indicates the frequency in which they are accessed together by the same transactions. Accordion estimates the capacity of a server by explicitly considering the impact of distributed transactions and affinity on the maximum throughput of the server. It then integrates this estimation in a mixed-integer linear program to explore the space of possible configurations and decide whether to scale out. We implemented Accordion and evaluated it using H-Store, a shared-nothing in-memory DBMS. Our results using the TPC-C and YCSB benchmarks show that Accordion achieves benefits compared to alternative heuristics of up to an order of magnitude reduction in the number of servers used and in the amount of data migrated.

international conference on data engineering | 2016

When two choices are not enough: Balancing at scale in Distributed Stream Processing

Muhammad Anis Uddin Nasir; Gianmarco De Francisci Morales; Nicolas Kourtellis; Marco Serafini

Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated to keys which are significantly more frequent than others. Skew is remarkably more problematic in large deployments: having more workers implies fewer keys per worker, so it becomes harder to “average out” the cost of hot keys with cold keys. We propose a novel load balancing technique that uses a heavy hitter algorithm to efficiently identify the hottest keys in the stream. These hot keys are assigned to d ≥ 2 choices to ensure a balanced load, where d is tuned automatically to minimize the memory and computation cost of operator replication. The technique works online and does not require the use of routing tables. Our extensive evaluation shows that our technique can balance real-world workloads on large deployments, and improve throughput and latency by 150% and 60% respectively over the previous state-of-the-art when deployed on Apache Storm.

dependable systems and networks | 2011

Efficient model checking of fault-tolerant distributed protocols

Péter Bokor; Johannes Kinder; Marco Serafini; Neeraj Suri

To aid the formal verification of fault-tolerant distributed protocols, we propose an approach that significantly reduces the costs of their model checking. These protocols often specify atomic, process-local events that consume a set of messages, change the state of a process, and send zero or more messages. We call such events quorum transitions and leverage them to optimize state exploration in two ways. First, we generate fewer states compared to models where quorum transitions are expressed by single-message transitions. Second, we refine transitions into a set of equivalent, finer-grained transitions that allow partial-order algorithms to achieve better reduction. We implement the MP-Basset model checker, which supports refined quorum transitions. We model check protocols representing core primitives of deployed reliable distributed systems, namely: Paxos consensus, regular storage, and Byzantine-tolerant multicast. We achieve up to 92% memory and 85% time reduction compared to model checking with standard unrefined single-message transitions.

IEEE Transactions on Dependable and Secure Computing | 2011

Application-Level Diagnostic and Membership Protocols for Generic Time-Triggered Systems

Marco Serafini; Péter Bokor; Neeraj Suri; J Vinter; A Ademaj; Wolfgang Brandstätter; Fulvio Tagliabò; J Koch

We present online tunable diagnostic and membership protocols for generic time-triggered (TT) systems to detect crashes, send/receive omission faults, and network partitions. Compared to existing diagnostic and membership protocols for TT systems, our protocols do not rely on the single-fault assumption and also tolerate non-fail-silent (Byzantine) faults. They run at the application level and can be added on top of any TT system (possibly as a middleware component) without requiring modifications at the system level. The information on detected faults is accumulated using a penalty/reward algorithm to handle transient faults. After a fault is detected, the likelihood of node isolation can be adapted to different system configurations, including configurations where functions with different criticality levels are integrated. All protocols are formally verified using model checking. Using actual automotive and aerospace parameters, we also experimentally demonstrate the transient fault handling capabilities of the protocols.

automated software engineering | 2011

Supporting domain-specific state space reductions through local partial-order reduction

Péter Bokor; Johannes Kinder; Marco Serafini; Neeraj Suri

Model checkers offer to automatically prove safety and liveness properties of complex concurrent software systems, but they are limited by state space explosion. Partial-Order Reduction (POR) is an effective technique to mitigate this burden. However, applying existing notions of POR requires to verify conditions based on execution paths of unbounded length, a difficult task in general. To enable a more intuitive and still flexible application of POR, we propose local POR (LPOR). LPOR is based on the existing notion of statically computed stubborn sets, but its locality allows to verify conditions in single states rather than over long paths. As a case study, we apply LPOR to message-passing systems. We implement it within the Java Pathfinder model checker using our general Java-based LPOR library. Our experiments show significant reductions achieved by LPOR for model checking representative message-passing protocols and, maybe surprisingly, that LPOR can outperform dynamic POR.

hot topics in system dependability | 2013

Towards transparent hardening of distributed systems

Diogo Behrens; Christof Fetzer; Flavio Junqueira; Marco Serafini

In distributed systems, errors such as data corruption or arbitrary changes to the flow of programs might cause processes to propagate incorrect state across the system. To prevent error propagation in such systems, an efficient and effective technique is to harden processes against Arbitrary State Corruption (ASC) faults through local detection, without replication. For distributed systems designed from scratch, dealing with state corruption can be made fully transparent, but requires that developers follow a few concrete design patterns. In this paper, we discuss the problem of hardening existing code bases of distributed systems transparently. Existing systems have not been designed with ASC hardening in mind, so they do not necessarily follow required design patterns. For such systems, we focus here on both performance and number of changes to the existing code base. Using memcached as an example, we identify and discuss three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading. Our initial evaluation of memcached shows that our ASC-hardened version obtains a throughput that is roughly 76% of the throughput of stock memcached with 128-byte and 1k-byte messages.

social network systems | 2012

Shepherding social feed generation with Sheep

Flavio Junqueira; Vincent Leroy; Marco Serafini; Adam Silberstein

Social feeds are used in many popular Web applications. They let users produce events, as well as read feeds containing the events generated by their friends. This paper investigates the design of an in-memory platform to manage social feeds. We show that straightforward memcache implementations suffer from a low throughput due to bandwidth bottlenecks. Following these observations, we propose Sheep, a system to support applications based on social feeds that leverages data aggregation and co-location to alleviate these bottlenecks. We show experimentally that Sheep outperforms memcache implementations by a factor of 7.

Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware | 2010

Weak consistency as a last resort

Marco Serafini; Flavio Junqueira

It is well-known that using a replicated service requires a tradeoff between availability and consistency. Eventual Linearizability represents a midpoint in the design spectrum, since it can be implemented ensuring availability in worst-case periods and providing strong consistency in the regular case. In this paper we show how Eventual Linearizability can be used to support master-worker schemes such as workload sharding in Web applications. We focus on a scenario where sharding maps the workload to a pool of servers spread over an unreliable wide-area network. In this context, Linearizability is desirable, but it may prevent achieving sufficient availability and performance. We show that Eventual Linearizability can significantly reduce the time to completion of the workload in some settings.

Explore More