Daniel Nussbaum | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Daniel Nussbaum is active.

Explore More

Publication

Featured researches published by Daniel Nussbaum.

architectural support for programming languages and operating systems | 2009

Early experience with a commercial hardware transactional memory implementation

David Dice; Yossi Lev; Mark Moir; Daniel Nussbaum

We report on our experience with the hardware transactional memory (HTM) feature of two pre-production revisions of a new commercial multicore processor. Our experience includes a number of promising results using HTM to improve performance in a variety of contexts, and also identifies some ways in which the feature could be improved to make it even better. We give detailed accounts of our experiences, sharing techniques we used to achieve the results we have, as well as describing challenges we faced in doing so.

acm symposium on parallel algorithms and architectures | 2005

Using elimination to implement scalable and lock-free FIFO queues

Mark Moir; Daniel Nussbaum; Ori Shalev; Nir Shavit

This paper shows for the first time that elimination, a scaling technique formerly applied only to counters and LIFO structures, can be applied to FIFO data structures, specifically, to linearizable FIFO queues. We show how to transform existing nonscalable FIFO queue implementations into scalable implementations using the elimination technique, while preserving lock-freedom and linearizablity.We apply our transformation to the FIFO queue algorithm of Michael and Scott, which is included in the Java™ Concurrency Package. Empirical evaluation on a state-of-the-art CMT multiprocessor chip shows that by using elimination as a backoff technique for the Michael and Scott queue algorithm, we can achieve comparable performance at low loads, and improved scalability as load increases.

acm symposium on parallel algorithms and architectures | 2008

The adaptive transactional memory test platform: a tool for experimenting with transactional code for rock (poster)

Mark Moir; Kevin Moore; Daniel Nussbaum

Sun has recently announced that its forthcoming multicore processor, code-named Rock, will support a form of hardware transactional memory (HTM). Our poster describes this feature, and presents the Adaptive Transactional Memory Test Platform (ATMTP)---a simulator we have developed that allows us and others to experiment with code that uses it, as well as the results of some preliminary experiments conducted using ATMTP.

international conference on parallel processing | 2006

A hierarchical CLH queue lock

Victor Luchangco; Daniel Nussbaum; Nir Shavit

Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality. A recent paper by Radovic and Hagersten shows that performance gains can be obtained by developing general-purpose mutual-exclusion locks that encourage threads with high mutual memory locality to acquire the lock consecutively, thus reducing the overall cost due to cache misses. Radovic and Hagersten present the first such hierarchical locks. Unfortunately, their locks are backoff locks, which are known to incur higher cache miss rates than queue-based locks, suffer from various fundamental fairness issues, and are hard to tune so as to maximize locality of lock accesses. Extending queue-locking algorithms to be hierarchical requires that requests from threads with high mutual memory locality be consecutive in the queue. Until now, it was not clear that one could design such locks because collecting requests locally and moving them into a global queue seemingly requires a level of coordination whose cost would defeat the very purpose of hierarchical locking. This paper presents a hierarchical version of the Craig, Landin, and Hagersten CLH queue lock, which we call the HCLH queue lock. In this algorithm, threads build implicit local queues of waiting threads, splicing them into a global queue at the cost of only a single CAS operation. In a set of microbenchmarks run on a large scale multiprocessor machine and a state-of-the-art multi-threaded multi-core chip, the HLCH algorithm exhibits better performance and significantly better fairness than the hierarchical backoff locks of Radovic and Hagersten.

acm sigops european workshop | 2004

Chip multithreading systems need a new operating system scheduler

Alexandra Fedorova; Christopher Small; Daniel Nussbaum; Margo I. Seltzer

The unpredictable nature of modern workloads, characterized by frequent branches and control transfers, can result in processor pipeline utilization as low as 19%. Chip multithreading (CMT), a processor architecture combining chip multiprocessing and hardware multithreading, is designed to address this issue. Hardware vendors plan to ship CMT systems within the next two years; understanding how such systems will perform is crucial if we are to use them to full advantage.Our simulation experiments show that a CMT-savvy operating system scheduler could improve application performance by a factor of two. In this paper we describe our initial analysis of application performance on CMT systems and propose a design for a scheduler tailored for the needs of a CMT system.

architectural support for programming languages and operating systems | 2006