Panagiota Fatourou
University of Crete
Publications
Featured research published by Panagiota Fatourou.
principles of distributed computing | 2010
Faith Ellen; Panagiota Fatourou; Eric Ruppert; Franck van Breugel
This paper describes the first complete implementation of a non-blocking binary search tree in an asynchronous shared-memory system using single-word compare-and-swap operations. The implementation is linearizable and tolerates any number of crash failures. Insert and Delete operations that modify different parts of the tree do not interfere with one another, so they can run completely concurrently. Find operations only perform reads of shared memory.
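The sketch below illustrates, in C++, the flavour of such a CAS-based design on a leaf-oriented (external) tree: an Insert privately builds a small replacement subtree and publishes it with a single compare-and-swap on the parent's child pointer, retrying from a fresh search if the CAS fails. This is only a minimal illustration of the retry pattern with invented names; the flagging/marking protocol that the paper's algorithm uses to coordinate Inserts with concurrent Deletes, as well as memory reclamation, is omitted entirely.

```cpp
// Hypothetical sketch (invented names): the CAS retry pattern on a
// leaf-oriented BST. The flagging/marking that the paper's algorithm uses to
// coordinate with concurrent Delete operations is omitted, as is reclamation.
#include <algorithm>
#include <atomic>
#include <climits>

struct Node {
    int key;
    bool isLeaf;                      // internal nodes only route searches
    std::atomic<Node*> left{nullptr};
    std::atomic<Node*> right{nullptr};
    Node(int k, bool leaf) : key(k), isLeaf(leaf) {}
};

// Keys smaller than an internal node's key go left, others go right.
std::atomic<Node*>& childFor(Node* parent, int key) {
    return key < parent->key ? parent->left : parent->right;
}

// Assumed initialization: an internal root with sentinel keys, so every
// ordinary key is inserted somewhere below it.
Node* makeRoot() {
    Node* r = new Node(INT_MAX, false);
    r->left.store(new Node(INT_MAX - 1, true));
    r->right.store(new Node(INT_MAX, true));
    return r;
}

// Insert `key`; returns false if it is already present.
bool insert(Node* root, int key) {
    while (true) {
        // Search down to a leaf, remembering its parent.
        Node* parent = root;
        Node* leaf = childFor(parent, key).load();
        while (!leaf->isLeaf) {
            parent = leaf;
            leaf = childFor(parent, key).load();
        }
        if (leaf->key == key) return false;

        // Privately build the replacement subtree: a new internal node whose
        // children are the old leaf and a new leaf holding `key`.
        Node* newLeaf = new Node(key, true);
        Node* internal = new Node(std::max(key, leaf->key), false);
        internal->left.store(key < leaf->key ? newLeaf : leaf);
        internal->right.store(key < leaf->key ? leaf : newLeaf);

        // Publish it with a single-word CAS on the parent's child pointer.
        Node* expected = leaf;
        if (childFor(parent, key).compare_exchange_strong(expected, internal))
            return true;

        // The pointer changed underneath us (another operation ran):
        // discard the never-published nodes and retry from a fresh search.
        delete newLeaf;
        delete internal;
    }
}
```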
acm sigplan symposium on principles and practice of parallel programming | 2012
Panagiota Fatourou; Nikolaos D. Kallimanis
Fine-grain thread synchronization has been shown, in several cases, to be outperformed by efficient implementations of the combining technique, in which a single thread, called the combiner, holds a coarse-grain lock and serves, in addition to its own synchronization request, the active requests announced by other threads while they wait by performing some form of spinning. Efficient implementations of this technique significantly reduce the cost of synchronization, so in many cases they exhibit much better performance than the most efficient finely synchronized algorithms. In this paper, we revisit the combining technique with the goal of discovering where its real performance power resides and whether, or how, ensuring some desired properties (e.g., fairness in serving requests) would impact performance. We do so by presenting two new implementations of this technique; the first (CC-Synch) addresses systems that support coherent caches, whereas the second (DSM-Synch) works better in cacheless NUMA machines. In comparison to previous such implementations, the new implementations (1) provide bounds on the number of remote memory references (RMRs) that they perform, (2) support a stronger notion of fairness, and (3) use simpler and less basic primitives than previous approaches. In all our experiments, the new implementations outperform by far all previous state-of-the-art combining-based and fine-grain synchronization algorithms. Our experimental analysis sheds light on the questions we aimed to answer. Several modern multi-core systems organize the cores into clusters and provide fast communication within the same cluster and much slower communication across clusters. We present a hierarchical version of CC-Synch, called H-Synch, which exploits the hierarchical communication nature of such systems to achieve better performance. Experiments show that H-Synch significantly outperforms previous state-of-the-art hierarchical approaches. We provide new implementations of common shared data structures (such as stacks and queues) based on CC-Synch, DSM-Synch and H-Synch. Our experiments show that these implementations outperform by far all previous (fine-grain or combining-based) implementations of shared stacks and queues.
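For intuition, here is a minimal flat-combining-style sketch in C++ of the general idea the paper starts from, not of CC-Synch, DSM-Synch or H-Synch themselves: each thread announces its request in a private slot, and whichever thread happens to acquire the coarse lock acts as the combiner and serves every announced request while the others spin on their own slots. All names are hypothetical, and the RMR bounds and fairness guarantees of the paper's algorithms are not provided by this simplification.

```cpp
// Hypothetical flat-combining-style sketch (invented names), illustrating the
// combining idea only: the thread that acquires the lock serves every
// announced request, so waiting threads just spin on their own slot.
#include <atomic>
#include <mutex>
#include <vector>

class CombiningCounter {
    struct alignas(64) Slot {             // padded to limit false sharing
        std::atomic<long> request{0};     // amount to add
        std::atomic<bool> pending{false}; // true while the request is unserved
        std::atomic<long> result{0};      // previous value, filled by combiner
    };
    std::vector<Slot> slots_;             // one announcement slot per thread
    std::mutex lock_;                     // the coarse-grain lock
    long value_ = 0;                      // the shared object; lock_ protects it

public:
    explicit CombiningCounter(int nthreads) : slots_(nthreads) {}

    // Thread `tid` atomically adds `delta`; returns the previous value.
    long fetch_add(int tid, long delta) {
        Slot& me = slots_[tid];
        me.request.store(delta);
        me.pending.store(true);           // announce the request

        while (me.pending.load()) {
            if (lock_.try_lock()) {       // we become the combiner
                for (Slot& s : slots_) {
                    if (s.pending.load()) {
                        s.result.store(value_);
                        value_ += s.request.load();
                        s.pending.store(false);   // release that waiter
                    }
                }
                lock_.unlock();
            }
            // Otherwise keep spinning: some combiner will serve our request.
        }
        return me.result.load();
    }
};
```

Compared with fine-grain locking, all updates to value_ are executed by a single thread while it holds the relevant cache line, which is the effect the combining technique exploits.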
acm symposium on parallel algorithms and architectures | 2011
Panagiota Fatourou; Nikolaos D. Kallimanis
We present a new, simple, wait-free universal construction, called Sim, that uses just a Fetch&Add and an LL/SC object and performs a constant number of shared memory accesses. We have implemented Sim on a real shared-memory machine. In theory, our practical version of Sim, called P-Sim, has worse complexity than its theoretical analog; in practice, though, we experimentally show that P-Sim outperforms several state-of-the-art lock-based and lock-free techniques, even though it is wait-free, i.e., it satisfies a stronger progress condition than all the algorithms it outperforms. We have used P-Sim to obtain highly-efficient wait-free implementations of stacks and queues. Our experiments show that our implementations outperform the current state-of-the-art shared stack and queue implementations, which ensure only weaker progress properties than wait-freedom.
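As a rough illustration of the underlying idea, the C++ sketch below shows a simplified copy-apply-and-CAS universal construction: each thread publishes an operation record, and every thread helps by copying the current state, applying all announced-but-unapplied operations to the copy, and trying to install the copy with a single CAS. This is an assumption-laden sketch rather than Sim or P-Sim: it replaces LL/SC and Fetch&Add with a CAS on a pointer, leaks superseded versions instead of reclaiming them, and is only lock-free, whereas the real construction is wait-free.

```cpp
// Hypothetical sketch (invented names) of a copy-apply-and-CAS universal
// construction. Unlike Sim/P-Sim it uses CAS instead of LL/SC and Fetch&Add,
// leaks superseded versions, and is only lock-free rather than wait-free.
#include <array>
#include <atomic>
#include <functional>

template <class State, int N>             // N = number of threads
class SimpleUniversal {
    struct Version {                      // an installed copy of the object
        State state;
        std::array<long, N> applied{};    // per-thread count of applied ops
    };
    struct OpRecord {                     // immutable once published
        long seq;
        std::function<void(State&)> op;
    };
    std::atomic<Version*> current_;
    std::array<std::atomic<OpRecord*>, N> announce_;

public:
    explicit SimpleUniversal(const State& init)
        : current_(new Version{init, {}}) {
        for (auto& a : announce_)
            a.store(new OpRecord{0, [](State&) {}});   // dummy, already "applied"
    }

    // Thread `tid` applies `op`; returns once an installed version includes it.
    void apply(int tid, std::function<void(State&)> op) {
        long myseq = announce_[tid].load()->seq + 1;   // only we bump our seq
        announce_[tid].store(new OpRecord{myseq, std::move(op)});

        while (current_.load()->applied[tid] < myseq) {
            Version* old = current_.load();
            Version* next = new Version(*old);         // private copy of the state
            for (int t = 0; t < N; ++t) {              // help everyone announced
                OpRecord* rec = announce_[t].load();
                if (rec->seq > next->applied[t]) {
                    rec->op(next->state);
                    next->applied[t] = rec->seq;
                }
            }
            Version* expected = old;
            if (!current_.compare_exchange_strong(expected, next))
                delete next;              // someone else installed first; retry
            // Superseded versions and op records are leaked in this sketch.
        }
    }
};
```

A stack or queue could be layered on top by making State the sequential data structure and passing its operations as op, which mirrors, at a high level, how the paper derives its stack and queue implementations.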
symposium on the theory of computing | 2003
Panagiota Fatourou; Faith E. Fich; Eric Ruppert
A snapshot object consists of a collection of m > 1 components, each capable of storing a value, shared by n processes in an asynchronous shared-memory distributed system. It supports two operations: a process can UPDATE any individual component or atomically SCAN the entire collection to obtain the values of all the components. It is possible to implement a snapshot object using m registers so that each operation takes O(mn) time. In a previous paper, we proved that m registers are necessary to implement a snapshot object with m < n-1 components. Here we prove that, for any such space-optimal implementation, Ω(mn) steps are required to perform a SCAN operation in the worst case, matching the upper bound. We also extend our space and time lower bounds to implementations that use single-writer registers in addition to the multi-writer registers. Specifically, we prove that at least m multi-writer registers are still needed, provided the SCANs do not read a large fraction of the single-writer registers. We also prove that any implementation that uses single-writer registers in addition to m multi-writer registers uses Ω(√mn) steps in the worst case. Our proof yields insight into the structure of any implementation that uses only m multi-writer registers, showing that processes must access the multi-writer registers in a very constrained way.
principles of distributed computing | 2002
Panagiota Fatourou; Faith E. Fich; Eric Ruppert
principles of distributed computing | 2014
Faith Ellen; Panagiota Fatourou; Joanna Helga; Eric Ruppert
symposium on the theory of computing | 2006
Panagiota Fatourou; Faith E. Fich; Eric Ruppert
We consider the problem of wait-free implementation of a multi-writer snapshot object with m ≥ 2 components shared by n > m processes. It is known that this can be done using m multi-writer registers. We give a matching lower bound, slightly improving the previous space lower bound. The main focus of the paper, however, is on time complexity. The best known upper bound on the number of steps a process has to take to perform one operation of the snapshot is O(n). When m is much smaller than n, an implementation whose time complexity is a function of m rather than n would be better. We show that this cannot be achieved for any space-optimal implementation: We prove that Ω(n) steps are required to perform a SCAN operation in the worst case, even if m = 2. This significantly improves previous Ω(min(m, n)) lower bounds. Our proof also yields insight into the structure of any space-optimal implementation, showing that processes simulating the snapshot operations must access the registers in a very constrained way.
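To make the SCAN/UPDATE interface concrete, the following C++ sketch implements it with the classic double-collect technique (not one of the space-optimal constructions analysed in the paper): every UPDATE tags its value with a globally unique sequence number, and a SCAN repeatedly collects all components until two consecutive collects agree. This simple version uses one register per component plus a tag counter and is not wait-free, since a SCAN can be delayed forever by continual UPDATEs; the class and member names are made up for illustration.

```cpp
// Hypothetical sketch (invented names): the SCAN/UPDATE interface implemented
// with the classic double-collect technique. A SCAN returns when two
// consecutive collects agree, so it is not wait-free under continual UPDATEs.
#include <atomic>
#include <cstdint>
#include <vector>

class DoubleCollectSnapshot {
    struct Component {                 // fits in 8 bytes, usually lock-free
        uint32_t tag;                  // globally unique per UPDATE
        int32_t value;
    };
    std::vector<std::atomic<Component>> comps_;  // one register per component
    std::atomic<uint32_t> nextTag_{0};

    std::vector<Component> collect() {
        std::vector<Component> v;
        for (auto& c : comps_) v.push_back(c.load());
        return v;
    }

public:
    explicit DoubleCollectSnapshot(int m) : comps_(m) {
        for (auto& c : comps_) c.store({0, 0});
    }

    // UPDATE component i to value v.
    void update(int i, int v) {
        uint32_t t = nextTag_.fetch_add(1) + 1;   // unique tag for this write
        comps_[i].store({t, static_cast<int32_t>(v)});
    }

    // SCAN: collect repeatedly until two consecutive collects are identical;
    // unique tags ensure the agreed values form a consistent snapshot.
    std::vector<int> scan() {
        std::vector<Component> a = collect();
        while (true) {
            std::vector<Component> b = collect();
            bool same = true;
            for (std::size_t i = 0; i < a.size(); ++i)
                if (a[i].tag != b[i].tag) { same = false; break; }
            if (same) {
                std::vector<int> out;
                for (const Component& c : b) out.push_back(c.value);
                return out;
            }
            a = std::move(b);
        }
    }
};
```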
Bulletin of The European Association for Theoretical Computer Science | 2015
Dmytro Dziuma; Panagiota Fatourou; Eleni Kanellou
We improve upon an existing non-blocking implementation of a binary search tree from single-word compare-and-swap instructions. We show that the worst-case amortized step complexity of performing a Find, Insert or Delete operation op on the tree is O(h(op)+c(op)) where h(op) is the height of the tree at the beginning of op and c(op) is the maximum number of operations accessing the tree at any one time during op. This is the first bound on the complexity of a non-blocking implementation of a search tree.
international symposium on distributed computing | 2009
Panagiota Fatourou; Nikolaos D. Kallimanis
A snapshot object is an abstraction of the fundamental problem of obtaining a consistent view of the contents of the shared memory in a distributed system while other processes may concurrently update those contents. A snapshot object stores an array of m components and can be accessed by two operations: an UPDATE that changes the value of an individual component and a powerful SCAN that returns the contents of the entire array. This paper proves time-space tradeoffs for fault-tolerant implementations of a snapshot object from registers that support only Read and Write operations. For anonymous implementations (where all processes are programmed identically), we prove that a SCAN requires Ω(n/r) time, where n is the number of processes in the system and r is the number of registers used by the implementation. For the general non-anonymous case, we prove that, for any fixed r, the time required to do a SCAN grows without bound as n increases. These tradeoffs hold even in the case where the snapshot object has just two components. This is the first time a lower bound on the tradeoff between time complexity and the number of registers has been proved for any problem in asynchronous shared-memory systems. We introduce a new tool for proving distributed lower bounds: the notion of a shrinkable execution, from which an adversary can remove portions as necessary.
Distributed Computing | 2016
Faith Ellen; Panagiota Fatourou; Eleftherios Kosmas; Alessia Milani; Corentin Travers
This chapter provides formal definitions for a comprehensive collection of consistency conditions for transactional memory (TM) computing, expressed uniformly within a formal framework that we present. For each condition, we provide two versions: one that allows a transaction T to read the value of a data item written by another transaction T′ that may be live and not yet commit-pending, provided that T′ will eventually commit, and one that allows transactions to read only values written by transactions that have either committed before T starts or are commit-pending. Deriving the first version of each condition was not an easy task, but this version is weaker than the second, so it admits a wider universe of algorithms that there is no reason to exclude from being considered correct. The formalism for the presented consistency conditions does not rely on unrealistic assumptions, such as that transactional operations are executed atomically or that write operations write distinct values to data items. Making such assumptions significantly simplifies the task of formally expressing the consistency conditions, but it results in formal presentations that are unrealistic, i.e., that cannot be used to characterize the correctness of most of the executions produced by any reasonable TM algorithm.