
Publications


Featured research published by Irina Calciu.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2013

NUMA-aware reader-writer locks

Irina Calciu; David Dice; Yossi Lev; Victor Luchangco; Virendra J. Marathe; Nir Shavit

Non-Uniform Memory Access (NUMA) architectures are gaining importance in mainstream computing systems due to the rapid growth of multi-core multi-chip machines. Extracting the best possible performance from these new machines will require us to revisit the design of the concurrent algorithms and synchronization primitives which form the building blocks of many of today's applications. This paper revisits one such critical synchronization primitive -- the reader-writer lock. We present what is, to the best of our knowledge, the first family of reader-writer lock algorithms tailored to NUMA architectures. We present several variations which trade fairness between readers and writers for higher concurrency among readers and better back-to-back batching of writers from the same NUMA node. Our algorithms leverage the lock cohorting technique to manage synchronization between writers in a NUMA-friendly fashion, binary flags to coordinate readers and writers, and simple distributed reader counter implementations to enable NUMA-friendly concurrency among readers. The end result is a collection of surprisingly simple NUMA-aware algorithms that outperform the state-of-the-art reader-writer locks by up to a factor of 10 in our microbenchmark experiments. To evaluate our algorithms in a realistic setting we also present performance results of the kccachetest benchmark of the Kyoto-Cabinet distribution, an open-source database which makes heavy use of pthread reader-writer locks. Our locks boost the performance of kccachetest by up to 40% over the best prior alternatives.
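The distributed reader-counter idea at the heart of these algorithms is easy to see in code. Below is a minimal C++ sketch, not the paper's algorithm: readers only touch a counter on their own NUMA node, while a writer raises a flag and waits for every node's counter to drain. The lock-cohorting layer for writers and the fairness variations are omitted, and the node count, padding width, and the numa_node_of_current_thread() helper are illustrative assumptions.

#include <atomic>
#include <thread>

constexpr int kNumaNodes = 4;                 // assumed topology

struct alignas(128) PaddedCounter {           // padded to avoid false sharing across nodes
    std::atomic<int> readers{0};
};

class DistributedRWLock {
    PaddedCounter node_readers_[kNumaNodes];  // one reader counter per NUMA node
    std::atomic<bool> writer_active_{false};  // binary flag coordinating readers and writers

    static int numa_node_of_current_thread() {
        return 0;                             // placeholder: a real lock would query the OS/libnuma
    }

public:
    void read_lock() {
        int node = numa_node_of_current_thread();
        for (;;) {
            node_readers_[node].readers.fetch_add(1, std::memory_order_acquire);
            if (!writer_active_.load(std::memory_order_acquire))
                return;                       // no writer: the read section may proceed
            node_readers_[node].readers.fetch_sub(1, std::memory_order_release);
            while (writer_active_.load(std::memory_order_acquire))
                std::this_thread::yield();    // back off while a writer is active, then retry
        }
    }

    void read_unlock() {
        int node = numa_node_of_current_thread();
        node_readers_[node].readers.fetch_sub(1, std::memory_order_release);
    }

    void write_lock() {
        // The paper serializes writers with a NUMA-aware cohort lock; a CAS on a
        // single flag stands in for it here.
        bool expected = false;
        while (!writer_active_.compare_exchange_weak(expected, true, std::memory_order_acquire)) {
            expected = false;
            std::this_thread::yield();
        }
        for (auto& c : node_readers_)         // wait for in-flight readers on every node to drain
            while (c.readers.load(std::memory_order_acquire) != 0)
                std::this_thread::yield();
    }

    void write_unlock() {
        writer_active_.store(false, std::memory_order_release);
    }
};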


International Conference on Parallel Architectures and Compilation Techniques | 2014

Invyswell: a hybrid transactional memory for Haswell's restricted transactional memory

Irina Calciu; Justin E. Gottschlich; Tatiana Shpeisman; Gilles Pokam; Maurice Herlihy

The Intel Haswell processor includes restricted transactional memory (RTM), which is the first commodity-based hardware transactional memory (HTM) to become publicly available. However, like other real HTMs, such as IBM's Blue Gene/Q, Haswell's RTM is best-effort, meaning it provides no transactional forward progress guarantees. Because of this, a software fallback system must be used in conjunction with Haswell's RTM to ensure transactional programs execute to completion. To complicate matters, Haswell does not provide escape actions. Without escape actions, non-transactional instructions cannot be executed within the context of a hardware transaction, thereby restricting the ways in which a software fallback can interact with the HTM. As such, the challenge of creating a scalable hybrid TM (HyTM) that uses Haswell's RTM and a software TM (STM) fallback is exacerbated. In this paper, we present Invyswell, a novel HyTM that exploits the benefits and manages the limitations of Haswell's RTM. After describing Invyswell's design, we show that it outperforms NOrec, a state-of-the-art STM, by 35%, Hybrid NOrec, NOrec's hybrid implementation, by 18%, and Haswell's hardware-only lock elision by 25% across all STAMP benchmarks.
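For readers unfamiliar with best-effort HTM, the sketch below shows the basic RTM-plus-fallback structure a hybrid TM such as Invyswell must manage: attempt the critical section inside a hardware transaction, and fall back to a software path once the hardware keeps aborting. This is not Invyswell itself, whose fallback is a full STM with several transaction types; the retry limit and the single fallback lock are illustrative assumptions. The intrinsics come from immintrin.h and require compiling with -mrtm on RTM-capable hardware.

#include <immintrin.h>
#include <atomic>
#include <thread>

std::atomic<bool> fallback_lock{false};           // is the software path active?

template <typename F>
void run_transaction(F&& critical_section) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Subscribe to the fallback lock so hardware transactions abort
            // whenever the software path is running concurrently.
            if (fallback_lock.load(std::memory_order_relaxed))
                _xabort(0xff);
            critical_section();
            _xend();                              // commit the hardware transaction
            return;
        }
        // The transaction aborted; `status` could be inspected before retrying.
    }
    // Software fallback: acquire the global lock and run non-speculatively.
    while (fallback_lock.exchange(true, std::memory_order_acquire))
        std::this_thread::yield();
    critical_section();
    fallback_lock.store(false, std::memory_order_release);
}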


International Symposium on Distributed Computing | 2014

The Adaptive Priority Queue with Elimination and Combining

Irina Calciu; Hammurabi Mendes; Maurice Herlihy

Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for parallelism of the add() operations. In addition, methods such as Flat Combining have been proposed to reduce contention, batching together multiple operations to be executed by a single thread. While this technique can decrease lock-switching overhead and the number of pointer changes required by the removeMin() operations in the priority queue, it can also create a sequential bottleneck and limit parallelism, especially for non-conflicting add() operations.
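As background, the flat-combining pattern the abstract refers to fits in a few dozen lines. In the sketch below each thread publishes its request in a per-thread slot, and whichever thread holds the combiner lock executes everyone's pending requests against a sequential priority queue. The elimination layer (matching concurrent add() and removeMin() calls directly) and the adaptive switching that are the paper's contributions are not shown; the slot array size and request encoding are illustrative assumptions.

#include <atomic>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <vector>

constexpr int kMaxThreads = 64;                    // assumed upper bound on threads

struct Slot {
    std::atomic<bool> pending{false};              // request published and not yet served
    bool is_add = false;                           // add(value) vs. removeMin()
    int value = 0;                                 // argument for add
    std::optional<int> result;                     // result of removeMin
};

class FlatCombiningPQ {
    Slot slots_[kMaxThreads];
    std::mutex combiner_lock_;
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq_;   // sequential min-heap

    void combine() {                               // executed only by the current combiner
        for (auto& s : slots_) {
            if (!s.pending.load(std::memory_order_acquire)) continue;
            if (s.is_add) {
                pq_.push(s.value);
            } else if (pq_.empty()) {
                s.result = std::nullopt;
            } else {
                s.result = pq_.top();
                pq_.pop();
            }
            s.pending.store(false, std::memory_order_release);
        }
    }

    void post_and_wait(int tid) {                  // spin until served, or become the combiner
        while (slots_[tid].pending.load(std::memory_order_acquire)) {
            if (combiner_lock_.try_lock()) {
                combine();
                combiner_lock_.unlock();
            }
        }
    }

public:
    void add(int tid, int v) {
        slots_[tid].is_add = true;
        slots_[tid].value = v;
        slots_[tid].pending.store(true, std::memory_order_release);
        post_and_wait(tid);
    }

    std::optional<int> removeMin(int tid) {
        slots_[tid].is_add = false;
        slots_[tid].pending.store(true, std::memory_order_release);
        post_and_wait(tid);
        return slots_[tid].result;
    }
};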


Symposium on Cloud Computing | 2017

Remote memory in the age of fast networks

Marcos Kawazoe Aguilera; Nadav Amit; Irina Calciu; Xavier Deguillard; Jayneel Gandhi; Pratap Subrahmanyam; Lalith Suresh; Kiran Tati; Rajesh Venkatasubramanian; Michael Wei

As the latency of the network approaches that of memory, it becomes increasingly attractive for applications to use remote memory---random-access memory at another computer that is accessed using the virtual memory subsystem. This is an old idea whose time has come, in the age of fast networks. To work effectively, remote memory must address many technical challenges. In this paper, we enumerate these challenges, discuss their feasibility, explain how some of them are addressed by recent work, and indicate other promising ways to tackle them. Some challenges remain as open problems, while others deserve more study. In this paper, we hope to provide a broad research agenda around this topic, by proposing more problems than solutions.


Architectural Support for Programming Languages and Operating Systems | 2017

Black-box Concurrent Data Structures for NUMA Architectures

Irina Calciu; Siddhartha Sen; Mahesh Balakrishnan; Marcos Kawazoe Aguilera

High-performance servers are Non-Uniform Memory Access (NUMA) machines. To fully leverage these machines, programmers need efficient concurrent data structures that are aware of the NUMA performance artifacts. We propose Node Replication (NR), a black-box approach to obtaining such data structures. NR takes an arbitrary sequential data structure and automatically transforms it into a NUMA-aware concurrent data structure satisfying linearizability. Using NR requires no expertise in concurrent data structure design, and the result is free of concurrency bugs. NR draws ideas from two disciplines: shared-memory algorithms and distributed systems. Briefly, NR implements a NUMA-aware shared log, and then uses the log to replicate data structures consistently across NUMA nodes. NR is best suited for contended data structures, where it can outperform lock-free algorithms by 3.1x, and lock-based solutions by 30x. To show the benefits of NR to a real application, we apply NR to the data structures of Redis, an in-memory storage system. The result outperforms other methods by up to 14x. The cost of NR is additional memory for its log and replicas.
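The core NR mechanism can be sketched compactly. In the toy version below, every update is appended to a shared log, and each NUMA node keeps a replica of the sequential structure that it brings up to date by replaying the log before serving a request. The real design replaces the single global mutex with a NUMA-aware log, per-node flat combining, and read-only operations that avoid log appends, none of which appears here; the SeqMap type and all other names are illustrative assumptions.

#include <functional>
#include <map>
#include <mutex>
#include <vector>

using SeqMap = std::map<int, int>;                 // the "black-box" sequential data structure
using Op     = std::function<void(SeqMap&)>;       // a logged update operation

class NodeReplicated {
    std::vector<Op> log_;                          // shared operation log
    std::mutex log_mutex_;                         // stands in for NR's NUMA-aware log protocol

    struct Replica {
        SeqMap state;                              // this node's copy of the structure
        size_t applied = 0;                        // log prefix already replayed
    };
    std::vector<Replica> replicas_;                // one replica per NUMA node

    void sync(Replica& r) {                        // replay log entries this replica has not seen
        while (r.applied < log_.size())
            log_[r.applied++](r.state);
    }

public:
    explicit NodeReplicated(int numa_nodes) : replicas_(numa_nodes) {}

    void update(int node, Op op) {
        std::lock_guard<std::mutex> g(log_mutex_);
        log_.push_back(std::move(op));             // 1. append the operation to the shared log
        sync(replicas_[node]);                     // 2. bring the local replica up to date
    }

    int read(int node, int key) {
        std::lock_guard<std::mutex> g(log_mutex_);
        sync(replicas_[node]);                     // reads also catch up with the log first
        auto it = replicas_[node].state.find(key);
        return it == replicas_[node].state.end() ? -1 : it->second;
    }
};

// Example use: nr.update(0, [](SeqMap& m) { m[7] = 42; }); a later nr.read(1, 7) returns 42.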


ACM Symposium on Parallel Algorithms and Architectures | 2017

Concurrent Data Structures for Near-Memory Computing

Zhiyu Liu; Irina Calciu; Maurice Herlihy; Onur Mutlu

The performance gap between memory and CPU has grown exponentially. To bridge this gap, hardware architects have proposed near-memory computing (also called processing-in-memory, or PIM), where a lightweight processor (called a PIM core) is located close to memory. Due to its proximity to memory, a memory access from a PIM core is much faster than that from a CPU core. New advances in 3D integration and die-stacked memory make PIM viable in the near future. Prior work has shown significant performance improvements by using PIM for embarrassingly parallel and data-intensive applications, as well as for pointer-chasing traversals in sequential data structures. However, current server machines have hundreds of cores, and algorithms for concurrent data structures exploit these cores to achieve high throughput and scalability, with significant benefits over sequential data structures. Thus, it is important to examine how PIM performs with respect to modern concurrent data structures and understand how concurrent data structures can be developed to take advantage of PIM. This paper is the first to examine the design of concurrent data structures for PIM. We show two main results: (1) naive PIM data structures cannot outperform state-of-the-art concurrent data structures, such as pointer-chasing data structures and FIFO queues, and (2) novel designs for PIM data structures, using techniques such as combining, partitioning and pipelining, can outperform traditional concurrent data structures, with a significantly simpler design.
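Of the three techniques named above, partitioning is the simplest to illustrate. In the toy C++ sketch below, ordinary threads stand in for PIM cores: the key space is split across partitions, each partition is owned by exactly one worker, and application threads route requests to the owner's inbox instead of synchronizing on a shared structure. Real PIM hardware and the combining and pipelining techniques are not modeled; all names and the polling loop are illustrative assumptions.

#include <atomic>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <vector>

struct Request { int key; bool insert; };          // insert(key) or lookup(key)

class PartitionedSet {
    struct Partition {
        std::set<int> data;                        // structure owned by one "PIM core"
        std::queue<Request> inbox;                 // requests routed here by CPU threads
        std::mutex m;                              // protects the inbox only
    };
    std::vector<Partition> parts_;
    std::vector<std::thread> workers_;
    std::atomic<bool> stop_{false};

public:
    explicit PartitionedSet(int n) : parts_(n) {
        for (int i = 0; i < n; ++i)
            workers_.emplace_back([this, i] {      // each worker owns exactly one partition
                Partition& p = parts_[i];
                while (!stop_.load(std::memory_order_acquire)) {
                    Request r{};
                    bool have = false;
                    {
                        std::lock_guard<std::mutex> g(p.m);
                        if (!p.inbox.empty()) { r = p.inbox.front(); p.inbox.pop(); have = true; }
                    }
                    if (!have) { std::this_thread::yield(); continue; }
                    if (r.insert) p.data.insert(r.key);   // no other thread ever touches p.data
                    else          p.data.count(r.key);    // lookup; result dropped in this sketch
                }
            });
    }

    void send(Request r) {                         // route the request to the owning partition
        Partition& p = parts_[static_cast<size_t>(r.key) % parts_.size()];
        std::lock_guard<std::mutex> g(p.m);
        p.inbox.push(r);
    }

    ~PartitionedSet() {
        stop_.store(true, std::memory_order_release);
        for (auto& w : workers_) w.join();
    }
};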


Principles of Distributed Computing | 2018

Passing Messages while Sharing Memory

Marcos Kawazoe Aguilera; Naama Ben-David; Irina Calciu; Rachid Guerraoui; Erez Petrank; Sam Toueg

We introduce a new distributed computing model called m&m that allows processes to both pass messages and share memory. Motivated by recent hardware trends, we find that this model is more powerful than either the pure message-passing or the pure shared-memory model. As we demonstrate by example with two fundamental problems---consensus and eventual leader election---the added power leads to new algorithms that are more robust against failures and asynchrony. Our consensus algorithm combines the superior scalability of message passing with the higher fault tolerance of shared memory, while our leader election algorithms reduce the system synchrony needed for correctness. These results point to a wide new space for future exploration of other problems, techniques, and benefits.
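To make the model concrete, the sketch below gives a bare-bones interface in which every process can send messages to any other process and additionally belongs to zero or more shared-memory domains whose registers it can read and write. It only illustrates the hybrid of the two communication primitives; the consensus and leader-election algorithms from the paper are not reproduced, and all class and method names are illustrative assumptions.

#include <atomic>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

struct Message { int from; std::string payload; };

// A shared-memory domain: registers accessible to a subset of the processes.
struct SharedDomain {
    std::vector<std::atomic<long>> registers;
    explicit SharedDomain(size_t n) : registers(n) {}
};

class Process {
    int id_;
    std::queue<Message> inbox_;                    // message-passing side
    std::mutex inbox_mutex_;
    std::vector<SharedDomain*> domains_;           // shared-memory side (possibly empty)

public:
    Process(int id, std::vector<SharedDomain*> domains)
        : id_(id), domains_(std::move(domains)) {}

    // Message passing: deliver a message into another process's inbox.
    void send(Process& to, std::string payload) {
        std::lock_guard<std::mutex> g(to.inbox_mutex_);
        to.inbox_.push({id_, std::move(payload)});
    }

    // Message passing: poll this process's own inbox.
    bool receive(Message& out) {
        std::lock_guard<std::mutex> g(inbox_mutex_);
        if (inbox_.empty()) return false;
        out = inbox_.front();
        inbox_.pop();
        return true;
    }

    // Shared memory: read and write registers in a domain this process belongs to.
    void sm_write(size_t domain, size_t reg, long v) {
        domains_[domain]->registers[reg].store(v, std::memory_order_seq_cst);
    }
    long sm_read(size_t domain, size_t reg) {
        return domains_[domain]->registers[reg].load(std::memory_order_seq_cst);
    }
};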


Operating Systems Review | 2017

How to implement any concurrent data structure for modern servers

Irina Calciu; Siddhartha Sen; Mahesh Balakrishnan; Marcos Kawazoe Aguilera

In this paper, we propose a method to implement any concurrent data structure. Our method produces implementations that work particularly well in non-uniform memory access (NUMA) machines. Due to recent architecture trends, highly concurrent servers today are NUMA machines, where the cost of accessing a memory location is not the same across every core. To fully leverage these machines, programmers need efficient concurrent data structures that are aware of the NUMA performance artifacts. We propose Node Replication (NR), a black-box approach to obtaining such data structures. NR takes an arbitrary sequential data structure and automatically transforms it into a NUMA-aware concurrent data structure satisfying linearizability. Using NR requires no expertise in concurrent data structure design, and the result is free of concurrency bugs. NR draws ideas from two disciplines: shared-memory algorithms and distributed systems. Briefly, NR implements a NUMA-aware shared log, and then uses the log to replicate data structures consistently across NUMA nodes. The cost of NR is additional memory for its log and replicas.


International Conference on Principles of Distributed Systems | 2013

Message Passing or Shared Memory: Evaluating the Delegation Abstraction for Multicores

Irina Calciu; Dave Dice; Tim Harris; Maurice Herlihy; Alex Kogan; Virendra J. Marathe; Mark S. Moir


Archive | 2012

System and Method for Implementing NUMA-Aware Reader-Writer Locks

Irina Calciu; David Dice; Victor Luchangco; Virendra J. Marathe; Nir N. Shavit; Yosef Lev
