Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Nir Shavit is active.

Publication


Featured research published by Nir Shavit.


International Symposium on Distributed Computing | 2006

Transactional locking II

David Dice; Ori Shalev; Nir Shavit

The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs it fits seamlessly with any system's memory life-cycle, including those using malloc/free; (2) unlike all other lock-based STMs it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states; and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.
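The global version-clock validation described above can be sketched as a small Python model. This is illustrative only: the class and method names are my own, a single lock stands in for TL2's per-location versioned write-locks, and the real algorithm also checks lock bits during validation.

```python
import threading

class TL2Sketch:
    """Illustrative sketch of TL2's global version clock (not the full STM)."""
    def __init__(self):
        self.global_clock = 0
        self.clock_lock = threading.Lock()
        self.mem = {}        # addr -> value
        self.versions = {}   # addr -> version stamp of last committing writer

    def begin(self):
        # Sample the global clock: the transaction's read-version (rv).
        return {"rv": self.global_clock, "reads": {}, "writes": {}}

    def read(self, tx, addr):
        if addr in tx["writes"]:
            return tx["writes"][addr]
        value = self.mem.get(addr, 0)
        # Post-validation: abort if the location changed after rv was sampled,
        # so user code only ever observes consistent memory states.
        if self.versions.get(addr, 0) > tx["rv"]:
            raise RuntimeError("abort: inconsistent read")
        tx["reads"][addr] = value
        return value

    def write(self, tx, addr, value):
        tx["writes"][addr] = value    # buffered until commit time

    def commit(self, tx):
        with self.clock_lock:         # stands in for acquiring write-set locks
            # Re-validate the read-set against rv.
            for addr in tx["reads"]:
                if self.versions.get(addr, 0) > tx["rv"]:
                    raise RuntimeError("abort: validation failed")
            self.global_clock += 1    # increment-and-fetch: write-version (wv)
            wv = self.global_clock
            for addr, value in tx["writes"].items():
                self.mem[addr] = value
                self.versions[addr] = wv
        return wv
```

A transaction that began before a conflicting commit sees a version stamp greater than its read-version and aborts rather than observing a torn state.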


Distributed Computing | 1997

Software transactional memory

Nir Shavit; Dan Touitou

As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes style methods, it is not based on a costly “recursive helping” policy.
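The k-word compare&swap transaction at the core of STM has a simple semantics that can be sketched in Python. Note the heavy hedging: a single global lock stands in for the paper's non-blocking, LL/SC-based ownership-acquisition protocol, and the function name is my own.

```python
import threading

_lock = threading.Lock()  # stands in for the non-blocking ownership protocol

def k_word_cas(memory, expected, update):
    """Atomically: if memory[a] == expected[a] for every address a,
    install update[a] for every a and return True; else change nothing
    and return False. Illustrative semantics only; the paper achieves
    this without blocking, using per-word ownership records."""
    with _lock:
        if all(memory[a] == v for a, v in expected.items()):
            for a, v in update.items():
                memory[a] = v
            return True
        return False
```

A sequential object becomes non-blocking by expressing each operation as one such multi-word transaction and retrying on failure.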


ACM Symposium on Parallel Algorithms and Architectures | 2010

Flat combining and the synchronization-parallelism tradeoff

Danny Hendler; Itai Incze; Nir Shavit; Moran Tzafrir

Traditional data structure designs, whether lock-based or lock-free, provide parallelism via fine-grained synchronization among threads. We introduce a new synchronization paradigm based on coarse locking, which we call flat combining. The cost of synchronization in flat combining is so low that having a single thread holding a lock perform the combined access requests of all others delivers, up to a certain non-negligible concurrency level, better performance than the most effective parallel finely synchronized implementations. We use flat combining to devise, among other structures, new linearizable stack, queue, and priority queue algorithms that greatly outperform all prior algorithms.
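The flat-combining protocol (a publication list plus a combiner lock) can be sketched in Python as below. All names are mine, events stand in for the paper's spinning on publication records, and a stack is used as the underlying structure; this is a shape sketch, not the paper's implementation.

```python
import threading

class FlatCombiningStack:
    """Flat-combining sketch: threads publish requests; whichever thread
    acquires the lock becomes the combiner and applies all pending requests."""
    def __init__(self):
        self._items = []
        self._lock = threading.Lock()      # the single coarse lock
        self._requests = []                # published (op, arg, done, result box)
        self._pub_lock = threading.Lock()  # protects the publication list

    def _publish(self, op, arg=None):
        done = threading.Event()
        box = [None]
        with self._pub_lock:
            self._requests.append((op, arg, done, box))
        while not done.is_set():
            if self._lock.acquire(blocking=False):   # try to become combiner
                try:
                    with self._pub_lock:
                        pending, self._requests = self._requests, []
                    for o, a, d, b in pending:       # combine: serve everyone
                        if o == "push":
                            self._items.append(a)
                        else:                        # "pop"
                            b[0] = self._items.pop() if self._items else None
                        d.set()
                finally:
                    self._lock.release()
            else:
                done.wait(timeout=0.001)  # another combiner will serve us
        return box[0]

    def push(self, x): self._publish("push", x)
    def pop(self): return self._publish("pop")
```

The point of the design is that one cache-friendly sequential pass by the combiner replaces many contended fine-grained synchronization operations.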


Symposium on the Theory of Computing | 1993

The asynchronous computability theorem for t-resilient tasks

Maurice Herlihy; Nir Shavit

We give necessary and sufficient combinatorial conditions characterizing the computational tasks that can be solved by N asynchronous processes, up to t of which can fail by halting. The range of possible input and output values for an asynchronous task can be associated with a high-dimensional geometric structure called a simplicial complex. Our main theorem characterizes computability in terms of the topological properties of this complex. Most notably, a given task is computable only if it can be associated with a complex that is simply connected with trivial homology groups. In other words, the complex has “no holes!” Applications of this characterization include the first impossibility results for several long-standing open problems in distributed computing, such as the “renaming” problem of Attiya et al., the “k-set agreement” problem of Chaudhuri, and a generalization of the approximate agreement problem.


Journal of the ACM | 1994

Counting networks

James Aspnes; Maurice Herlihy; Nir Shavit

Many fundamental multi-processor coordination problems can be expressed as counting problems: processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention. Motivated by observations on the behavior of sorting networks, we offer a new approach to solving such problems, by introducing counting networks, a new class of networks that can be used to count. We give two counting network constructions, one of depth log n(1 + log n)/2 using n log n(1 + log n)/4 “gates,” and a second of depth log² n using n log² n/2 gates. These networks avoid the sequential bottlenecks inherent in earlier solutions and substantially lower the memory contention. Finally, to show that counting networks are not merely mathematical creatures, we provide experimental evidence that they outperform conventional synchronization techniques under a variety of circumstances.
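The “gates” are balancers, and the step property they guarantee is easiest to see in the smallest case. Below is a hedged Python sketch of a width-2 counting network: a single balancer feeding two per-wire counters, so the i-th token leaving wire w receives value w + 2i. Names are mine, and locks stand in for the shared-memory toggles of the paper.

```python
import threading

class Balancer:
    """A balancer forwards alternating tokens to its top (0) and bottom (1) wires."""
    def __init__(self):
        self._toggle = 0
        self._lock = threading.Lock()

    def traverse(self):
        with self._lock:
            out = self._toggle
            self._toggle ^= 1
            return out

class CountingNetworkWidth2:
    """Width-2 counting network: one balancer plus a local counter per output
    wire. Successive tokens receive successive values, with contention spread
    across the two wires instead of a single shared counter."""
    def __init__(self):
        self._balancer = Balancer()
        self._counts = [0, 0]
        self._wire_locks = [threading.Lock(), threading.Lock()]

    def next_value(self):
        wire = self._balancer.traverse()
        with self._wire_locks[wire]:
            i = self._counts[wire]
            self._counts[wire] += 1
        return wire + 2 * i   # step property: wire w's i-th token gets w + 2i
```

Larger widths compose layers of balancers (the paper's bitonic and periodic constructions) so no single balancer becomes a bottleneck.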


Journal of the ACM | 2006

Split-ordered lists: Lock-free extensible hash tables

Ori Shalev; Nir Shavit

We present the first lock-free implementation of an extensible hash table running on current architectures. Our algorithm provides concurrent insert, delete, and find operations with an expected O(1) cost. It consists of very simple code, easily implementable using only load, store, and compare-and-swap operations. The new mathematical structure at the core of our algorithm is recursive split-ordering, a way of ordering elements in a linked list so that they can be repeatedly “split” using a single compare-and-swap operation. Metaphorically speaking, our algorithm differs from prior known algorithms in that extensibility is derived by “moving the buckets among the items” rather than “the items among the buckets.” Though lock-free algorithms are expected to work best in multiprogrammed environments, empirical tests we conducted on a large shared memory multiprocessor show that even in non-multiprogrammed environments, the new algorithm performs as well as the most efficient known lock-based resizable hash-table algorithm, and in high load cases it significantly outperforms it.
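Recursive split-ordering itself is just bit-reversal of keys. A small Python sketch, assuming (as the paper describes) that regular items set the top bit before reversal, so every regular key is odd and sorts strictly after its bucket's even sentinel key; function names are mine.

```python
def reverse_bits(key, width):
    """Reverse the low `width` bits of `key`."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (key & 1)
        key >>= 1
    return r

def regular_key(key, width):
    # Regular item: set the top bit, then bit-reverse, yielding an odd key.
    return reverse_bits(key | (1 << (width - 1)), width)

def dummy_key(bucket, width):
    # Bucket sentinel: the bit-reversed bucket index, always even.
    return reverse_bits(bucket, width)
```

Sorting keys 0..7 by their split-order gives [0, 4, 2, 6, 1, 5, 3, 7]: all keys congruent mod 2 (and recursively mod 4, mod 8) are contiguous, which is why doubling the table only requires inserting a new sentinel to “split” a bucket with a single compare-and-swap.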


International Conference on Principles of Distributed Systems | 2005

A lazy concurrent list-based set algorithm

Steve Heller; Maurice Herlihy; Victor Luchangco; Mark Moir; William N. Scherer; Nir Shavit

List-based implementations of sets are a fundamental building block of many concurrent algorithms. A skiplist based on the lock-free list-based set algorithm of Michael will be included in the Java™ Concurrency Package of JDK 1.6.0. However, Michael's lock-free algorithm has several drawbacks, most notably that it requires all list traversal operations, including membership tests, to perform cleanup operations of logically removed nodes, and that it uses the equivalent of an atomically markable reference, a pointer that can be atomically “marked,” which is expensive in some languages and unavailable in others. We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes, eliminating the need to use the equivalent of an atomically markable reference. It also has a novel wait-free membership test operation (as opposed to Michael's lock-free one) that does not need to perform cleanup operations and is more efficient than that of all previous algorithms. Empirical testing shows that the new lazy-list algorithm consistently outperforms all known algorithms, including Michael's lock-free algorithm, throughout the concurrency range. At high load, with 90% membership tests, the lazy algorithm is more than twice as fast as Michael's. This is encouraging given that typical search structure usage patterns include around 90% membership tests. By replacing the lock-free membership test of Michael's algorithm with our new wait-free one, we achieve an algorithm that slightly outperforms our new lazy-list (though it may not be as efficient in other contexts as it uses Java's RTTI mechanism to create pointers that can be atomically marked).
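The lazy list's structure, optimistic traversal, post-lock validation, logical marking before physical unlinking, and a contains that never locks or cleans up, can be sketched in Python. This is a minimal illustrative model (names mine), not the paper's Java implementation.

```python
import threading

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.marked = False            # logical deletion flag
        self.lock = threading.Lock()

class LazyList:
    """Lazy list-based set sketch: optimistic add/remove with per-node locks,
    and a membership test that never locks and never performs cleanup."""
    def __init__(self):
        self.head = Node(float("-inf"))      # sentinel head
        self.head.next = Node(float("inf"))  # sentinel tail

    def _validate(self, pred, curr):
        # Both nodes unmarked and still adjacent: the optimistic traversal
        # result is still valid after locking.
        return not pred.marked and not curr.marked and pred.next is curr

    def add(self, key):
        while True:
            pred = self.head
            curr = pred.next
            while curr.key < key:            # traverse without locks
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key == key:
                        return False
                    node = Node(key)
                    node.next = curr
                    pred.next = node
                    return True
            # validation failed: a concurrent update interfered, retry

    def remove(self, key):
        while True:
            pred = self.head
            curr = pred.next
            while curr.key < key:
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key != key:
                        return False
                    curr.marked = True        # logical removal first
                    pred.next = curr.next     # then physical unlink
                    return True

    def contains(self, key):
        # Wait-free in the paper's model: one pass, no locks, no cleanup.
        curr = self.head
        while curr.key < key:
            curr = curr.next
        return curr.key == key and not curr.marked
```

The marked bit is what lets contains be correct without locking: a traversal that lands on a logically removed node simply reports absence.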


Journal of Parallel and Distributed Computing | 2010

A scalable lock-free stack algorithm

Danny Hendler; Nir Shavit; Lena Yerushalmi

The literature describes two high performance concurrent stack algorithms based on combining funnels and elimination trees. Unfortunately, the funnels are linearizable but blocking, and the elimination trees are non-blocking but not linearizable. Neither is used in practice since they perform well only at exceptionally high loads. The literature also describes a simple lock-free linearizable stack algorithm that works at low loads but does not scale as the load increases. The question of designing a stack algorithm that is non-blocking, linearizable, and scales well throughout the concurrency range, has thus remained open. This paper presents such a concurrent stack algorithm. It is based on the following simple observation: that a single elimination array used as a backoff scheme for a simple lock-free stack is lock-free, linearizable, and scalable. As our empirical results show, the resulting elimination-backoff stack performs as well as the simple stack at low loads, and increasingly outperforms all other methods (lock-based and non-blocking) as concurrency increases. We believe its simplicity and scalability make it a viable practical alternative to existing constructions for implementing concurrent stacks.
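The elimination-backoff observation can be sketched as follows. Since Python lacks a true compare-and-swap on a head pointer, a try-lock stands in for the failed-CAS signal that sends a thread into the elimination slot, so this is a shape sketch of the idea rather than a lock-free implementation; all names are mine.

```python
import threading

class Exchanger:
    """Elimination slot: a push and a pop that meet here cancel out
    without ever touching the central stack."""
    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = None          # (value, event) left by a pusher

    def exchange_push(self, value, timeout):
        done = threading.Event()
        with self._lock:
            if self._waiting is None:
                self._waiting = (value, done)
            else:
                return False          # slot busy, go back to the stack
        if done.wait(timeout):
            return True               # a popper took our value
        with self._lock:
            if self._waiting is not None and self._waiting[1] is done:
                self._waiting = None  # timed out: withdraw the offer
                return False
        return True                   # a popper matched at the last moment

    def exchange_pop(self):
        with self._lock:
            if self._waiting is not None:
                value, done = self._waiting
                self._waiting = None
                done.set()
                return True, value
        return False, None

class EliminationBackoffStack:
    """Central stack plus an elimination slot used as the backoff scheme."""
    def __init__(self):
        self._items = []
        self._lock = threading.Lock()  # stands in for CAS on the head pointer
        self._slot = Exchanger()

    def push(self, x):
        while True:
            if self._lock.acquire(blocking=False):   # the "CAS attempt"
                try:
                    self._items.append(x)
                    return
                finally:
                    self._lock.release()
            if self._slot.exchange_push(x, timeout=0.001):
                return                # eliminated against a concurrent pop

    def pop(self):
        while True:
            if self._lock.acquire(blocking=False):
                try:
                    return self._items.pop() if self._items else None
                finally:
                    self._lock.release()
            hit, value = self._slot.exchange_pop()
            if hit:
                return value          # eliminated against a concurrent push
```

At low load threads go straight through the central stack; under contention, matched push/pop pairs eliminate each other in the array, which is what makes the scheme scale.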


Parallel Processing Letters | 2007

A Lazy Concurrent List-Based Set Algorithm

Steve Heller; Maurice Herlihy; Victor Luchangco; Mark Moir; William N. Scherer; Nir Shavit

We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test. Our algorithm improves on the performance of all previous such algorithms.


Journal of the ACM | 1994

Are wait-free algorithms fast?

Hagit Attiya; Nancy A. Lynch; Nir Shavit

The time complexity of wait-free algorithms in “normal” executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an Ω(log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.

Collaboration


Dive into Nir Shavit's collaborations.

Top Co-Authors

Alexander Matveev
Massachusetts Institute of Technology

Danny Hendler
Ben-Gurion University of the Negev

Dan Alistarh
Institute of Science and Technology Austria