Vincent Gramoli | Researchain

Publication

Featured research published by Vincent Gramoli.

Communications of the ACM | 2011

Why STM can be more than a research toy

Aleksandar Dragojevic; Pascal Felber; Vincent Gramoli; Rachid Guerraoui

Despite earlier claims, Software Transactional Memory outperforms sequential code.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2012

A speculation-friendly binary search tree

Tyler Crain; Vincent Gramoli; Michel Raynal

We introduce the first binary search tree algorithm designed for speculative executions. Prior to this work, tree structures were mainly designed for their pessimistic (non-speculative) accesses to have a bounded complexity. Researchers tried to evaluate transactional memory using such tree structures, a prominent example of which is the red-black tree library developed by Oracle Labs that is part of multiple benchmark distributions. Although well-engineered, such structures remain badly suited for speculative accesses, whose step complexity might rise dramatically with contention. We show that our speculation-friendly tree outperforms the existing transaction-based versions of the AVL and red-black trees. Its key novelty stems from the decoupling of update operations: they are split into one transaction that modifies the abstraction state and multiple ones that restructure its tree implementation in the background. In particular, the speculation-friendly tree is shown to be correct and reusable, and it speeds up a transaction-based travel reservation application by up to 3.5x.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2015

More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms

Vincent Gramoli

In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, Sun Microsystems and AMD. To this end, we developed in C/C++ and Java a new micro-benchmark suite, called Synchrobench, to help the community evaluate new data structures and synchronization techniques. The main conclusion of this evaluation is threefold: (i) although compare-and-swap helps achieve the best performance on multicores, doing so correctly is hard; (ii) optimistic locking offers varying performance results while transactional memory offers more consistent results; and (iii) copy-on-write and read-copy-update suffer more from contention than any other technique but could be combined with others to derive efficient algorithms.

Working IEEE/IFIP Conference on Software Architecture | 2016

The Blockchain as a Software Connector

Xiwei Xu; Cesare Pautasso; Liming Zhu; Vincent Gramoli; Alexander Ponomarev; An Binh Tran; Shiping Chen

Blockchain is an emerging technology for decentralized and transactional data sharing across a large network of untrusted participants. It enables new forms of distributed software architectures, where components can reach agreement on their shared states without trusting a central integration point or any particular participating component. Considering the blockchain as a software connector helps make explicit important architectural considerations on the resulting performance and quality attributes (for example, security, privacy, scalability and sustainability) of the system. Based on our experience in several projects using blockchain, in this paper we provide rationales to support the architectural decision on whether to employ a decentralized blockchain as opposed to other software solutions, like traditional shared data storage. Additionally, we explore specific implications of using the blockchain as a software connector, including design trade-offs regarding quality attributes.

International Conference on Parallel Processing | 2013

A contention-friendly binary search tree

Tyler Crain; Vincent Gramoli; Michel Raynal

This paper proposes a new lock-based concurrent binary tree using a methodology for writing concurrent data structures. This methodology limits the high contention induced by today's multicore environments to come up with efficient alternatives to the most widely used search structures. Data structures are generally constrained to guarantee a big-oh step complexity even in the presence of concurrency. By contrast, our methodology guarantees the big-oh complexity only in the absence of contention and limits the contention when concurrency appears. The key concept lies in dividing update operations into an eager abstract access that returns rapidly for efficiency reasons and a lazy structural adaptation that may be postponed to diminish contention. Our evaluation clearly shows that our lock-based tree is up to 2.2× faster than the most recent lock-based tree algorithm we are aware of.
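
The split between an eager abstract access and a lazy structural adaptation can be illustrated with a minimal sketch. It is not the algorithm from the paper: the class `LazyTree`, its method names, the single coarse lock, and the full rebuild in `adapt` are all assumptions made for brevity (a real contention-friendly tree would use fine-grained locking and local restructuring steps). The sketch only shows that a delete can return after flipping a flag, while the physical removal and rebalancing happen later.

```python
import threading


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.deleted = False  # logical-deletion flag: part of the abstract state


class LazyTree:
    """Toy set (hypothetical): eager logical updates, deferred physical cleanup."""

    def __init__(self):
        self.root = None
        self.lock = threading.Lock()  # one coarse lock, a simplification

    def _find(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def contains(self, key):
        with self.lock:
            node = self._find(key)
            return node is not None and not node.deleted

    def insert(self, key):
        """Eager abstract access: add a leaf or revive a logically deleted node."""
        with self.lock:
            if self.root is None:
                self.root = Node(key)
                return True
            node = self.root
            while True:
                if key == node.key:
                    was_deleted = node.deleted
                    node.deleted = False
                    return was_deleted
                child = node.left if key < node.key else node.right
                if child is None:
                    if key < node.key:
                        node.left = Node(key)
                    else:
                        node.right = Node(key)
                    return True
                node = child

    def delete(self, key):
        """Eager abstract access: only flips the flag, no restructuring."""
        with self.lock:
            node = self._find(key)
            if node is None or node.deleted:
                return False
            node.deleted = True
            return True

    def adapt(self):
        """Lazy structural adaptation, meant to run later or in the background.

        Assumption: rebuilds a balanced tree from the live keys, standing in
        for the incremental removals and rotations a real design would use.
        """
        with self.lock:
            keys = []
            self._collect(self.root, keys)
            self.root = self._build(keys, 0, len(keys))

    def _collect(self, node, out):
        if node is None:
            return
        self._collect(node.left, out)
        if not node.deleted:
            out.append(node.key)
        self._collect(node.right, out)

    def _build(self, keys, lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        node = Node(keys[mid])
        node.left = self._build(keys, lo, mid)
        node.right = self._build(keys, mid + 1, hi)
        return node

    def physical_size(self):
        """Number of nodes physically present, including logically deleted ones."""
        def count(node):
            return 0 if node is None else 1 + count(node.left) + count(node.right)
        with self.lock:
            return count(self.root)
```

In this sketch, a key removed with `delete` disappears from `contains` immediately, but its node stays in the tree until `adapt` runs, which is the postponement the abstract describes.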

International Conference on Stabilization, Safety, and Security of Distributed Systems | 2010

A provably starvation-free distributed directory protocol

Hagit Attiya; Vincent Gramoli; Alessia Milani

This paper presents COMBINE, a distributed directory protocol for shared objects, designed for large-scale distributed systems. Directory protocols support move requests, which allow the object to be written locally, as well as lookup requests, which provide a read-only copy of the object. They have been used in distributed shared memory implementations and in data-flow implementations of distributed software transactional memory in large-scale systems. The protocol runs on an overlay tree, whose leaves are the nodes of the system; it ensures that the cost of serving a request is proportional to the cost of the shortest path between the requesting node and the serving node in the overlay tree. The correctness of the protocol, including starvation freedom, is proved, despite asynchrony and concurrent requests. The protocol avoids race conditions by combining requests that overtake each other as they pass through the same node. Using an overlay tree with a good stretch factor yields an efficient protocol, even when requests are concurrent.
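
The cost measure mentioned in the abstract, the shortest path between two nodes in the overlay tree, can be computed with a short sketch. This is only the distance calculation, not the COMBINE protocol itself; the function name `tree_distance`, the `parent` and `weight` dictionaries, and the example node names are hypothetical.

```python
def tree_distance(parent, weight, a, b):
    """Cost of the unique path between nodes a and b in a rooted tree.

    parent[x] is the parent of x (the root has no entry);
    weight[x] is the cost of the edge between x and parent[x].
    Both encodings are assumptions made for this illustration.
    """
    # Record the cumulative cost from a to each of its ancestors.
    cost_from_a = {a: 0}
    node, cost = a, 0
    while node in parent:
        cost += weight[node]
        node = parent[node]
        cost_from_a[node] = cost

    # Climb from b until reaching the first ancestor shared with a.
    node, cost = b, 0
    while node not in cost_from_a:
        cost += weight[node]
        node = parent[node]  # raises KeyError if a and b share no ancestor
    return cost_from_a[node] + cost


# Hypothetical overlay: leaves n1, n2 under u; leaf n3 under v; u and v under root r.
parent = {"n1": "u", "n2": "u", "n3": "v", "u": "r", "v": "r"}
weight = {"n1": 1, "n2": 1, "n3": 1, "u": 1, "v": 1}

near = tree_distance(parent, weight, "n1", "n2")  # path n1 - u - n2
far = tree_distance(parent, weight, "n1", "n3")   # path n1 - u - r - v - n3
```

Two leaves under the same internal node are close in the overlay, while leaves in different subtrees pay for the climb to their common ancestor, which is why the abstract stresses an overlay tree with a good stretch factor.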

International Conference on Principles of Distributed Systems | 2008

Toward a Theory of Input Acceptance for Transactional Memories

Vincent Gramoli; Derin Harmanci; Pascal Felber

Transactional memory (TM) systems receive as an input a stream of events, also known as a workload, reschedule it with respect to several constraints, and output a consistent history. In multicore architectures, the transactional code executed by a processor is a stream of events whose interruption would waste processor cycles. In this paper, we formalize the notion of TM workload into classes of input patterns, whose acceptance helps in understanding the performance of a given TM.

Parallel Processing Letters | 2010

On the Input Acceptance of Transactional Memory

Vincent Gramoli; Derin Harmanci; Pascal Felber

We present the Input Acceptance of Transactional Memory (TM). Despite the large interest in the performance of TMs, no existing research work has investigated the impact of solving a conflict that does not need to be solved. The traditional solution for a TM to be correct is to delay or abort a transaction as soon as it risks violating consistency. Both alternatives are costly and should be avoided if consistency is actually preserved. To address this problem, we introduce the input acceptance of a TM as its ability to commit transactions; we upper-bound the input acceptance of existing TMs and propose a new TM with higher input acceptance.

International Conference on Distributed Computing Systems | 2013

No Hot Spot Non-blocking Skip List

Tyler Crain; Vincent Gramoli; Michel Raynal

This paper presents a new non-blocking skip list algorithm. The algorithm alleviates contention by localizing synchronization at the least contended part of the structure without altering consistency of the implemented abstraction. The key idea lies in decoupling a modification to the structure into two stages: an eager abstract modification that returns quickly and whose update affects only the bottom of the structure, and a lazy selective adaptation that potentially updates the entire structure but is executed continuously in the background. On SPECjbb as well as on micro-benchmarks, we compared the performance of our new non-blocking skip list against the performance of the JDK non-blocking skip list. The results indicate that our implementation can be more than twice as fast as the JDK skip list.

European Conference on Computer Systems | 2012

TM2C: a software transactional memory for many-cores

Vincent Gramoli; Rachid Guerraoui; Vasileios Trigonakis

Transactional memory is an appealing paradigm for concurrent programming. Many software implementations of the paradigm were proposed in the last decades for both shared memory multi-core systems and clusters of distributed machines. However, chip manufacturers have started producing many-core architectures, with low network-on-chip communication latency and limited support for cache-coherence, rendering existing transactional memory implementations inapplicable. This paper presents TM2C, the first software Transactional Memory protocol for Many-Core systems. TM2C exploits network-on-chip communications to get granted accesses to shared data through efficient message passing. In particular, it allows visible read accesses and hence effective distributed contention management with eager conflict detection. We also propose FairCM, a companion contention manager that ensures starvation-freedom, which we believe is an important property in many-core systems, as well as an implementation of elastic transactions in these settings. Our evaluation on four benchmarks, i.e., linked list and hash table data structures as well as bank and MapReduce-like applications, indicates better scalability than locks and up to 20-fold speedup (relative to bare sequential code) when running on 24 application cores.
