Justin Kopinsky
Massachusetts Institute of Technology
Publications
Featured research published by Justin Kopinsky.
International Symposium on Distributed Computing | 2015
Dan Alistarh; Justin Kopinsky; Srivatsan Ravi; Nir Shavit
Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the costs of providing concurrency between hardware and software transactions in HyTM are still not well understood. In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses. The model allows us to formally quantify and analyze the amount of overhead (instrumentation) caused by the potential presence of software transactions. We prove that (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for very weak progress guarantees, and (2) the instrumentation cost incurred by a hardware transaction in any progressive opaque HyTM is linear in the size of the transaction's data set. We further describe two implementations which exhibit optimal instrumentation costs for two different progress conditions. In sum, this paper proposes the first formal HyTM model and captures for the first time the trade-off between the degree of hardware-software TM concurrency and the amount of instrumentation overhead.
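The instrumentation trade-off described above can be pictured with a toy model: before a "hardware" transaction touches a shared location, it checks per-location metadata for a concurrent software transaction, paying one check per location in its data set. The Python sketch below is a sequential illustration only; the class and method names are invented for this example and do not come from the paper's formal model.

```python
# Toy, sequential model of instrumented vs. uninstrumented reads in a HyTM.
# All names (Location, HardwareTxn, AbortTxn) are illustrative, not the paper's.

class AbortTxn(Exception):
    """Raised when the simulated hardware transaction must abort."""

class Location:
    def __init__(self, value=0):
        self.value = value
        self.sw_owned = False   # metadata: is a software transaction writing here?

class HardwareTxn:
    def __init__(self):
        self.read_set = []      # locations read so far (buffered accesses)
        self.metadata_checks = 0

    def instrumented_read(self, loc):
        # The instrumentation: one metadata check per location in the data set,
        # which is where the linear-in-|data set| overhead comes from.
        self.metadata_checks += 1
        if loc.sw_owned:
            raise AbortTxn("concurrent software transaction detected")
        self.read_set.append(loc)
        return loc.value

    def uninstrumented_read(self, loc):
        # No metadata check: cheap, but blind to software writers, which is
        # the intuition behind the impossibility result for strictly
        # serializable HyTM with fully uninstrumented reads and writes.
        self.read_set.append(loc)
        return loc.value

if __name__ == "__main__":
    xs = [Location(i) for i in range(8)]
    txn = HardwareTxn()
    total = sum(txn.instrumented_read(x) for x in xs)
    print(total, txn.metadata_checks)   # prints "28 8": one check per location read
```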
Principles of Distributed Computing | 2018
Dan Alistarh; Trevor Brown; Justin Kopinsky; Giorgi Nadiradze
There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler. In this work, we take a step towards unifying these two research directions, by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of k in terms of the maximum allowed priority inversions on a task, and any graph on n vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly(k) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion.
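To make the idea concrete, the sketch below runs greedy (lowest-id-first) MIS under a MultiQueue-style relaxed delete-min that serves the smaller top of two randomly chosen queues. This is a sequential toy simulation with a re-insertion rule of my own, not the paper's scheduler model; it only illustrates how priority inversions turn into wasted iterations while the greedy MIS output is preserved.

```python
# Sequential toy simulation: greedy (lowest-id-first) MIS driven by a relaxed,
# MultiQueue-style delete-min. The relaxation and the re-insertion rule are
# illustrative simplifications, not the paper's scheduler model.
import heapq
import random

def greedy_mis_relaxed(adj, num_queues=4, seed=0):
    """Return the greedy MIS of `adj` (dict: vertex -> set of neighbors) and
    the number of iterations wasted because of priority inversions."""
    rng = random.Random(seed)
    queues = [[] for _ in range(num_queues)]
    for v in adj:                                    # priority = vertex id
        heapq.heappush(queues[rng.randrange(num_queues)], v)
    undecided = set(adj)
    in_mis, wasted = set(), 0
    while undecided:
        nonempty = [q for q in queues if q]
        a, b = rng.choice(nonempty), rng.choice(nonempty)
        v = heapq.heappop(a if a[0] <= b[0] else b)  # relaxed delete-min: best of two
        if v not in undecided:
            continue                                 # stale entry: neighbor of an MIS vertex
        if any(u in undecided and u < v for u in adj[v]):
            wasted += 1                              # priority inversion: retry later
            heapq.heappush(queues[rng.randrange(num_queues)], v)
            continue
        in_mis.add(v)                                # v is a local minimum: decide it
        undecided.discard(v)
        undecided.difference_update(adj[v])
    return in_mis, wasted

if __name__ == "__main__":
    adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}   # a 6-cycle
    mis, wasted = greedy_mis_relaxed(adj)
    print(sorted(mis), wasted)   # always [0, 2, 4]; wasted varies with the seed
```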
ACM Symposium on Parallel Algorithms and Architectures | 2018
Dan Alistarh; Trevor Brown; Justin Kopinsky; Jerry Zheng Li; Giorgi Nadiradze
Relaxed concurrent data structures have become increasingly popular, due to their scalability in graph processing and machine learning applications [Nguyen13, gonzalez2012powergraph]. Despite considerable interest, there exist families of natural, high-performing randomized relaxed concurrent data structures, such as the popular MultiQueue pattern [MQ] for implementing relaxed priority queue data structures, for which no guarantees are known in the concurrent setting [AKLN17]. Our main contribution is in showing for the first time that, under a set of analytic assumptions, a family of relaxed concurrent data structures, including variants of MultiQueues, but also a new approximate counting algorithm we call the MultiCounter, provides strong probabilistic guarantees on the degree of relaxation with respect to the sequential specification, in arbitrary concurrent executions. We formalize these guarantees via a new correctness condition called distributional linearizability, tailored to concurrent implementations with randomized relaxations. Our result is based on a new analysis of an asynchronous variant of the classic power-of-two-choices load balancing algorithm, in which placement choices can be based on inconsistent, outdated information (this result may be of independent interest). We validate our results empirically, showing that the MultiCounter algorithm can implement scalable relaxed timestamps.
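The power-of-two-choices mechanism underlying the MultiCounter can be illustrated with a sequential toy: each increment reads two randomly chosen cells and bumps the smaller one, which keeps all cells close to the average count. The sketch below is an illustrative simplification (the class name and the scaling rule are my own); the paper's contribution is the analysis of the concurrent setting with possibly stale reads, which this toy does not model.

```python
# Sequential toy version of a power-of-two-choices "MultiCounter".
# Real MultiCounters are concurrent and may act on stale reads; this sketch
# only illustrates why two choices keep the per-cell values tightly balanced.
import random

class ToyMultiCounter:
    def __init__(self, num_cells=8, seed=0):
        self.cells = [0] * num_cells
        self.rng = random.Random(seed)

    def increment(self):
        # Read two random cells and increment the smaller one.
        i = self.rng.randrange(len(self.cells))
        j = self.rng.randrange(len(self.cells))
        k = i if self.cells[i] <= self.cells[j] else j
        self.cells[k] += 1
        # The returned "relaxed timestamp" is the new cell value scaled by the
        # number of cells, an estimate of the true number of increments so far.
        return self.cells[k] * len(self.cells)

if __name__ == "__main__":
    c = ToyMultiCounter()
    stamps = [c.increment() for _ in range(10_000)]
    print(max(c.cells) - min(c.cells))          # spread stays small (typically 1-2)
    print(stamps[-1], "vs true count", 10_000)  # last timestamp tracks the true count
```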
Distributed Computing | 2018
Dan Alistarh; Justin Kopinsky; Srivatsan Ravi; Nir Shavit
Several hybrid transactional memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort nature of hardware transactional memory with a slow, reliable software backup. However, the costs of providing concurrency between hardware and software transactions in HyTM are still not well understood. In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses. The model allows us to formally quantify and analyze the amount of overhead (instrumentation) caused by the potential presence of software transactions. We prove that (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for very weak progress guarantees, and (2) the instrumentation cost incurred by a hardware transaction in any progressive opaque HyTM is linear in the size of the transaction’s data set. We further describe two implementations which exhibit optimal instrumentation costs for two different progress conditions. In sum, this paper proposes the first formal HyTM model and captures for the first time the trade-off between the degree of hardware-software TM concurrency and the amount of instrumentation overhead.
Journal of Information Processing | 2017
Adam Hesterberg; Justin Kopinsky
In sliding maze puzzles such as Ricochet Robots and Atomix, solvers must maneuver agents around a grid board subject to the constraint that whenever an agent moves in some direction, it must move as far as possible in that direction. In general, finding an optimal solution to these puzzles is known to be PSPACE-complete. This paper further shows that these puzzles are W[SAT]-hard with respect to the number of robots in the puzzle instance (and therefore unlikely to be fixed-parameter tractable).
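The "move as far as possible" rule is easy to state but is what drives the hardness once many robots interact; the breadth-first solver below handles only a single robot on a small board and uses a board encoding of my own, purely to make the sliding rule concrete.

```python
# Toy BFS solver for a single sliding robot ("move as far as possible" rule).
# Board encoding and names are illustrative only.
from collections import deque

def slide(pos, direction, blockers, rows, cols):
    """Slide from pos until the next cell is off the board or a blocker."""
    r, c = pos
    dr, dc = direction
    while True:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in blockers:
            return (r, c)
        r, c = nr, nc

def shortest_solution(start, goal, blockers, rows, cols):
    """Return the minimum number of sliding moves to reach goal, or None."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        pos, dist = queue.popleft()
        if pos == goal:
            return dist
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = slide(pos, d, blockers, rows, cols)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

if __name__ == "__main__":
    # 4x4 board, one blocker at (0, 3): sliding right from (0, 0) stops just
    # short of the blocker, landing on the goal in a single move.
    print(shortest_solution(start=(0, 0), goal=(0, 2),
                            blockers={(0, 3)}, rows=4, cols=4))  # prints 1
```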
International Conference on Distributed Computing Systems | 2014
Dan Alistarh; Justin Kopinsky; Alexander Matveev; Nir Shavit
The long-lived renaming problem appears in shared-memory systems where a set of threads need to register and deregister frequently from the computation, while concurrent operations scan the set of currently registered threads. Instances of this problem show up in concurrent implementations of transactional memory, flat combining, thread barriers, and memory reclamation schemes for lock-free data structures. In this paper, we analyze a randomized solution for long-lived renaming. The algorithmic technique we consider, called the Level Array, has previously been used for hashing and one-shot (single-use) renaming. Our main contribution is to prove that, in long-lived executions, where processes may register and deregister polynomially many times, the technique guarantees constant steps on average and O(log log n) steps with high probability for registering, unit cost for deregistering, and O(n) steps for collect queries, where n is an upper bound on the number of processes that may be active at any point in time. We also show that the algorithm has the surprising property that it is self-healing: under reasonable assumptions on the schedule, operations running while the data structure is in a degraded state implicitly help the data structure re-balance itself. This subtle mechanism obviates the need for expensive periodic rebuilding procedures. Our benchmarks validate this approach, showing that, for typical use parameters, the average number of steps a process takes to register is less than two and the worst-case number of steps is bounded by six, even in executions with billions of operations. We contrast this with other randomized implementations, whose worst-case behavior we show to be unreliable, and with deterministic implementations, whose cost is linear in n.
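A rough picture of the Level Array: registration probes a random slot per level and falls through to a further level on a collision, so most registrations finish in one or two probes. The Python sketch below is sequential and illustrative only; the level sizes, the single probe per level, and the linear-scan backup level are simplifications of my own, and the collect operation and self-healing mechanism from the paper are omitted.

```python
# Sequential, illustrative sketch of Level Array-style registration.
# A plain boolean stands in for an atomic test-and-set; parameters are not
# the paper's. Collect queries and self-healing are omitted.
import random

class ToyLevelArray:
    def __init__(self, n, num_levels=4, seed=0):
        # Randomized levels plus one deterministic backup level, each of size 2n.
        self.levels = [[False] * (2 * n) for _ in range(num_levels + 1)]
        self.rng = random.Random(seed)

    def register(self):
        """Claim a (level, slot) name; return the name and the probes used."""
        probes = 0
        for lvl in range(len(self.levels) - 1):
            slots = self.levels[lvl]
            idx = self.rng.randrange(len(slots))
            probes += 1
            if not slots[idx]:          # stands in for an atomic test-and-set
                slots[idx] = True
                return (lvl, idx), probes
        # Backup level: a linear scan always succeeds while at most n names are held.
        backup = self.levels[-1]
        for idx, taken in enumerate(backup):
            probes += 1
            if not taken:
                backup[idx] = True
                return (len(self.levels) - 1, idx), probes
        raise RuntimeError("capacity exceeded")

    def deregister(self, name):
        lvl, idx = name
        self.levels[lvl][idx] = False   # unit cost

if __name__ == "__main__":
    la = ToyLevelArray(n=64)
    probes = [la.register()[1] for _ in range(64)]
    print(sum(probes) / len(probes))    # average probes per registration, well under 2
```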
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2015
Dan Alistarh; Justin Kopinsky; Jerry Zheng Li; Nir Shavit
Principles of Distributed Computing | 2017
Dan Alistarh; Justin Kopinsky; Jerry Li; Giorgi Nadiradze
Fun with Algorithms | 2018
Zachary Abel; Jeffrey Bosboom; Erik D. Demaine; Linus Hamilton; Adam Hesterberg; Justin Kopinsky; Jayson Lynch; Mikhail Rudoy