Is this you? Create Your Porfile

Aleksandar Dragojevic

École Polytechnique Fédérale de Lausanne

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Aleksandar Dragojevic is active.

Explore More

Publication

Featured researches published by Aleksandar Dragojevic.

programming language design and implementation | 2009

Stretching transactional memory

Aleksandar Dragojevic; Rachid Guerraoui; Michal Kapalka

Transactional memory (TM) is an appealing abstraction for programming multi-core systems. Potential target applications for TM, such as business software and video games, are likely to involve complex data structures and large transactions, requiring specific software solutions (STM). So far, however, STMs have been mainly evaluated and optimized for smaller scale benchmarks. We revisit the main STM design choices from the perspective of complex workloads and propose a new STM, which we call SwissTM. In short, SwissTM is lock- and word-based and uses (1) optimistic (commit-time) conflict detection for read/write conflicts and pessimistic (encounter-time) conflict detection for write/write conflicts, as well as (2) a new two-phase contention manager that ensures the progress of long transactions while inducing no overhead on short ones. SwissTM outperforms state-of-the-art STM implementations, namely RSTM, TL2, and TinySTM, in our experiments on STMBench7, STAMP, Lee-TM and red-black tree benchmarks. Beyond SwissTM, we present the most complete evaluation to date of the individual impact of various STM design choices on the ability to support the mixed workloads of large applications.

Communications of The ACM | 2011

Why STM can be more than a research toy

Aleksandar Dragojevic; Pascal Felber; Vincent Gramoli; Rachid Guerraoui

Despite earlier claims, Software Transactional Memory outperforms sequential code.

principles of distributed computing | 2009

Preventing versus curing: avoiding conflicts in transactional memories

Aleksandar Dragojevic; Rachid Guerraoui; Anmol V. Singh; Vasu Singh

Transactional memories are typically speculative and rely on contention managers to cure conflicts. This paper explores a complementary approach that prevents conflicts by scheduling transactions according to predictions on their access sets. We first explore the theoretical boundaries of this approach and prove that (1) a TM scheduler with an accurate prediction can be 2-competitive with an optimal offline TM scheduler, but (2) even a slight inaccuracy in prediction makes the competitive ratio of the TM scheduler in the order of the number of transactions. We then show that, in practice, there is room for a pragmatic approach with good average case performance. We present Shrink, a scheduler that (1) bases its prediction of transactional accesses on the access patterns of the past transactions from the same thread, and (2) uses a novel heuristic, which we call serialization affinity, to schedule transactions with a probability proportional to the current amount of contention. Shrink obtains roughly 70% accurate read and write access predictions on STMBench7 and STAMP. In our experimental evaluation, Shrink significantly improves STM performance in cases the number of executing threads is higher than the number of available CPU cores. For SwissTM, Shrink improves the performance by up to 55% on STMBench7, and up to 120% on STAMP. For TinySTM, Shrink drastically improves the performance on STMBench7 and STAMP benchmarks.

symposium on operating systems principles | 2015

No compromises: distributed transactions with consistency, availability, and performance

Aleksandar Dragojevic; Dushyanth Narayanan; Edmund B. Nightingale; Matthew Renzelmann; Alex Shamis; Anirudh Badam; Miguel Castro

Transactions with strong consistency and high availability simplify building and reasoning about distributed systems. However, previous implementations performed poorly. This forced system designers to avoid transactions completely, to weaken consistency guarantees, or to provide single-machine transactions that require programmers to partition their data. In this paper, we show that there is no need to compromise in modern data centers. We show that a main memory distributed computing platform called FaRM can provide distributed transactions with strict serializability, high performance, durability, and high availability. FaRM achieves a peak throughput of 140 million TATP transactions per second on 90 machines with a 4.9 TB database, and it recovers from a failure in less than 50 ms. Key to achieving these results was the design of new transaction, replication, and recovery protocols from first principles to leverage commodity networks with RDMA and a new, inexpensive approach to providing non-volatile DRAM.

acm symposium on parallel algorithms and architectures | 2009

Optimizing transactions for captured memory

Aleksandar Dragojevic; Yang Ni; Ali-Reza Adl-Tabatabai

In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction, which cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (also called STM barriers). A compiler unaware of that, however, may translate simple memory load/store operations accessing such memory into more expensive STM barriers. This presents us opportunities to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler can be accesses to captured memory, which include 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. Similar techniques can also be used to elide barriers for accesses to thread-local and read-only data. We implemented those optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on a Intel Dunnington system (with 24 cores in a 4-node SMP system) showed that upto 18% performance improvement could be achieved at 16 threads.

international middleware conference | 2012

Unifying thread-level speculation and transactional memory

João Pedro Barreto; Aleksandar Dragojevic; Paulo Ferreira; Ricardo Filipe; Rachid Guerraoui

The motivation of this work is to ask whether Transactional Memory (TM) and Thread-Level Speculation (TLS), two prominent concurrency paradigms usually considered separately, can be combined into a hybrid approach that extracts untapped parallelism and speed-up from common programs. We show that the answer is positive by describing an algorithm, called TLSTM, that leverages an existing TM with TLS capabilities. We also show that our approach is able to achieve up to a 48% increase in throughput over the base TM, on read dominated workloads of long transactions in a multi-threaded application, among other results.

acm sigplan symposium on principles and practice of parallel programming | 2010

Leveraging parallel nesting in transactional memory

João Pedro Barreto; Aleksandar Dragojevic; Paulo Ferreira; Rachid Guerraoui; Michal Kapalka

Exploiting the emerging reality of affordable multi-core architectures goes through providing programmers with simple abstractions that would enable them to easily turn their sequential programs into concurrent ones that expose as much parallelism as possible. While transactional memory promises to make concurrent programming easy to a wide programmer community, current implementations either disallow nested transactions to run in parallel or do not scale to arbitrary parallel nesting depths. This is an important obstacle to the central goal of transactional memory, as programmers can only start parallel threads in restricted parts of their code. This paper addresses the intrinsic difficulty behind the support for parallel nesting in transactional memory, and proposes a novel solution that, to the best of our knowledge, is the first practical solution to meet the lowest theoretical upper bound known for the problem. Using a synthetic workload configured to test parallel transactions on a multi-core machine, a practical implementation of our algorithm yields substantial speed-ups (up to 22x with 33 threads) relatively to serial nesting, and shows that the time to start and commit transactions, as well as to detect conflicts, is independent of nesting depth.

acm sigplan symposium on principles and practice of parallel programming | 2016

ESTIMA: extrapolating scalability of in-memory applications

Georgios Chatzopoulos; Aleksandar Dragojevic; Rachid Guerraoui

This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple, yet important task: given the performance of an application on a small machine with a handful of cores, ESTIMA extrapolates its scalability to a larger machine with more cores, while requiring minimum input from the user. The key idea underlying ESTIMA is the use of stalled cycles (e.g. cycles that the processor spends waiting for various events, such as cache misses or waiting on a lock). ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be effectively used to predict the scalability of in-memory applications. For instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation on a large number of in-memory benchmarks shows that ESTIMA has generally low prediction errors.

parallel computing | 2017

ESTIMA: Extrapolating ScalabiliTy of In-Memory Applications

Georgios Chatzopoulos; Aleksandar Dragojevic; Rachid Guerraoui

This article presents estima , an easy-to-use tool for extrapolating the scalability of in-memory applications. estima is designed to perform a simple yet important task: Given the performance of an application on a small machine with a handful of cores, estima extrapolates its scalability to a larger machine with more cores, while requiring minimum input from the user. The key idea underlying estima is the use of stalled cycles (e.g., cycles that the processor spends waiting for missed cache line fetches or busy locks). estima measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. estima can be effectively used to predict the scalability of in-memory applications for bigger execution machines. For instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation shows the effectiveness of estima on a large number of in-memory benchmarks.

networked systems design and implementation | 2014