Is this you? Create Your Porfile

Alexander Matveev

Massachusetts Institute of Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Alexander Matveev is active.

Explore More

Publication

Featured researches published by Alexander Matveev.

european conference on computer systems | 2014

StackTrack: an automated transactional approach to concurrent memory reclamation

Dan Alistarh; Patrick Eugster; Maurice Herlihy; Alexander Matveev; Nir Shavit

Dynamic memory reclamation is arguably the biggest open problem in concurrent data structure design: all known solutions induce high overhead, or must be customized to the specific data structure by the programmer, or both. This paper presents StackTrack, the first concurrent memory reclamation scheme that can be applied automatically by a compiler, while maintaining efficiency. StackTrack eliminates most of the expensive bookkeeping required for memory reclamation by leveraging the power of hardware transactional memory (HTM) in a new way: it tracks thread variables dynamically, and in an atomic fashion. This effectively makes all memory references visible without having threads pay the overhead of writing out this information. Our empirical results show that this new approach matches or outperforms prior, non-automated, techniques.

international symposium on distributed computing | 2012

Pessimistic software lock-elision

Yehuda Afek; Alexander Matveev; Nir Shavit

Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes. This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with pessimistic (i.e. non-aborting) software transactional code that allows read-write concurrency even for contended code and even if the code includes system calls. On systems with hardware transactional support, PLE will allow failed transactions, or ones that contain system calls, to preserve read-write concurrency. Our PLE algorithm is based on a novel encounter-order design of a fully pessimistic STM system that in a variety of benchmarks spanning from counters to trees, even when up to 40% of calls are mutating the locked structure, provides up to 5 times the performance of a state-of-the-art read-write lock.

symposium on operating systems principles | 2015

Read-log-update: a lightweight synchronization mechanism for concurrent programming

Alexander Matveev; Nir Shavit; Pascal Felber; Patrick Marlier

This paper introduces read-log-update (RLU), a novel extension of the popular read-copy-update (RCU) synchronization mechanism that supports scalability of concurrent code by allowing unsynchronized sequences of reads to execute concurrently with updates. RLU overcomes the major limitations of RCU by allowing, for the first time, concurrency of reads with multiple writers, and providing automation that eliminates most of the programming difficulty associated with RCU programming. At the core of the RLU design is a logging and coordination mechanism inspired by software transactional memory algorithms. In a collection of micro-benchmarks in both the kernel and user space, we show that RLU both simplifies the code and matches or improves on the performance of RCU. As an example of its power, we show how it readily scales the performance of a real-world application, Kyoto Cabinet, a truly difficult concurrent programming feat to attempt in general, and in particular with classic RCU.

acm symposium on parallel algorithms and architectures | 2015

ThreadScan: Automatic and Scalable Memory Reclamation

Dan Alistarh; William M. Leiserson; Alexander Matveev; Nir Shavit

The concurrent memory reclamation problem is that of devising a way for a deallocating thread to verify that no other concurrent threads hold references to a memory block being deallocated. To date, in the absence of automatic garbage collection, there is no satisfactory solution to this problem. Existing tracking methods like hazard pointers, reference counters, or epoch-based techniques like RCU, are either prohibitively expensive or require significant programming expertise, to the extent that implementing them efficiently can be worthy of a publication. None of the existing techniques are automatic or even semi-automated. In this paper, we take a new approach to concurrent memory reclamation: instead of manually tracking access to memory locations as done in techniques like hazard pointers, or restricting shared accesses to specific epoch boundaries as in RCU, our algorithm, called ThreadScan, leverages operating system signaling to automatically detect which memory locations are being accessed by concurrent threads. Initial empirical evidence shows that ThreadScan scales surprisingly well and requires negligible programming effort beyond the standard use of Malloc and Free.

international symposium on distributed computing | 2015

Amalgamated Lock-Elision

Yehuda Afek; Alexander Matveev; Oscar Moll; Nir Shavit

Hardware lock-elision HLE introduces concurrency into legacy lock-based code by optimistically executing critical sections in a fast-path as hardware transactions. Its main limitation is that in case of repeated aborts, it reverts to a fallback-path that acquires a serial lock. This fallback-path lacks hardware-software concurrency, because all fast-path hardware transactions abort and wait for the completion of the fallback. Software lock elision has no such limitation, but the overheads incurred are simply too high. We propose amalgamated lock-elision ALE, a novel lock-elision algorithm that provides hardware-software concurrency and efficiency: the fallback-path executes concurrently with fast-path hardware transactions, while the common-path fast-path reads incur no overheads and proceed without any instrumentation. The key idea in ALE is to use a sequence of fine-grained locks in the fallback-path to detect conflicts with the fast-path, and at the same time reduce the costs of these locks by executing the fallback-path as a series segments, where each segment is a dynamic length short hardware transaction. We implemented ALE into GCC and tested the new system on Intel Haswell 16-way chip that provides hardware transactions. We benchmarked linked-lists, hash-tables and red-black trees, as well as converting KyotoCacheDB to use ALE in GCC, and all show that ALE significantly outperforms HLE.

european conference on computer systems | 2016

Hardware read-write lock elision

Pascal Felber; Shady Issa; Alexander Matveev; Paolo Romano

Hardware Lock Elision (HLE) represents a promising technique to enhance parallelism of concurrent applications relying on conventional, lock-based synchronization. The idea at the basis of current HLE approaches is to wrap critical sections into hardware transactions: this allows critical sections to be executed in parallel using a speculative approach, while leveraging on conflict detection capabilities provided by hardware transactions to ensure equivalent semantics to pessimistic lock-based synchronization. In this paper we present RW-LE, the first HLE approach targeting read-write locks. RW-LE introduces an innovative hardware-software co-design that exploits two recent micro-architectural features of POWER8 processors: suspending/resuming transaction execution and rollback-only transactions. RW-LEs original design provides two major benefits with respect to existing HLE techniques: i) eliding the read lock without resorting to the use of hardware transactions, and ii) avoiding to track read memory accesses issued in the write critical section. We evaluate RW-LE by means of an extensive experimental study based on a variety of benchmarks and real-life, complex applications. Our results demonstrate that RW-LE can provide striking performance gain of up to one order of magnitude with respect to state of the art HLE approaches.

european conference on computer systems | 2017

Forkscan: Conservative Memory Reclamation for Modern Operating Systems

Dan Alistarh; William M. Leiserson; Alexander Matveev; Nir Shavit

The problem of efficient concurrent memory reclamation in unmanaged languages such as C or C++ is one of the major challenges facing the parallelization of billions of lines of legacy code. Garbage collectors for C/C++ can be inefficient; thus, programmers are often forced to use finely-crafted concurrent memory reclamation techniques. These techniques can provide good performance, but require considerable programming effort to deploy, and have strict requirements, allowing the programmer very little room for error. In this work, we present Forkscan, a new conservative concurrent memory reclamation scheme which is fully automatic and surprisingly scalable. Forkscans semantics place it between automatic garbage collectors (it requires the programmer to explicitly retire nodes before they can be reclaimed), and concurrent memory reclamation techniques (as it does not assume that nodes are completely unlinked from the data structure for correctness). Forkscans implementation exploits these new semantics for efficiency: we leverage parallelism and optimized implementations of signaling and copy-on-write in modern operating systems to efficiently obtain and process consistent snapshots of memory that can be scanned concurrently with the normal program operation. Empirical evaluation on a range of classical concurrent data structure microbenchmarks shows that Forkscan can preserve the scalability of the original code, while maintaining an order of magnitude lower latency than automatic garbage collection, and demonstrating competitive performance with finely crafted memory reclamation techniques.

international conference on distributed computing systems | 2014

The LevelArray: A Fast, Practical Long-Lived Renaming Algorithm

Dan Alistarh; Justin Kopinsky; Alexander Matveev; Nir Shavit

The long-lived renaming problem appears in shared-memory systems where a set of threads need to register and deregister frequently from the computation, while concurrent operations scan the set of currently registered threads. Instances of this problem show up in concurrent implementations of transactional memory, flat combining, thread barriers, and memory reclamation schemes for lock-free data structures. In this paper, we analyze a randomized solution for long-lived renaming. The algorithmic technique we consider, called the Level Array, has previously been used for hashing and one-shot (single-use) renaming. Our main contribution is to prove that, in long-lived executions, where processes may register and deregister polynomially many times, the technique guarantees constant steps on average and O (log log n) steps with high probability for registering, unit cost for deregistering, and O (n) steps for collect queries, where n is an upper bound on the number of processes that may be active at any point in time. We also show that the algorithm has the surprising property that it is self-healing: under reasonable assumptions on the schedule, operations running while the data structure is in a degraded state implicitly help the data structure re-balance itself. This subtle mechanism obviates the need for expensive periodic rebuilding procedures. Our benchmarks validate this approach, showing that, for typical use parameters, the average number of steps a process takes to register is less than two and the worst-case number of steps is bounded by six, even in executions with billions of operations. We contrast this with other randomized implementations, whose worst-case behavior we show to be unreliable, and with deterministic implementations, whose cost is linear in n.

Archive | 2010