Daniel Nussbaum
Sun Microsystems Laboratories
Publication
Featured research published by Daniel Nussbaum.
architectural support for programming languages and operating systems | 2009
David Dice; Yossi Lev; Mark Moir; Daniel Nussbaum
We report on our experience with the hardware transactional memory (HTM) feature of two pre-production revisions of a new commercial multicore processor. Our experience includes a number of promising results using HTM to improve performance in a variety of contexts, and also identifies some ways in which the feature could be improved to make it even better. We give detailed accounts of our experiences, sharing techniques we used to achieve the results we have, as well as describing challenges we faced in doing so.
acm symposium on parallel algorithms and architectures | 2005
Mark Moir; Daniel Nussbaum; Ori Shalev; Nir Shavit
This paper shows for the first time that elimination, a scaling technique formerly applied only to counters and LIFO structures, can be applied to FIFO data structures, specifically, to linearizable FIFO queues. We show how to transform existing nonscalable FIFO queue implementations into scalable implementations using the elimination technique, while preserving lock-freedom and linearizability. We apply our transformation to the FIFO queue algorithm of Michael and Scott, which is included in the Java™ Concurrency Package. Empirical evaluation on a state-of-the-art CMT multiprocessor chip shows that by using elimination as a backoff technique for the Michael and Scott queue algorithm, we can achieve comparable performance at low loads, and improved scalability as load increases.
acm symposium on parallel algorithms and architectures | 2008
Mark Moir; Kevin Moore; Daniel Nussbaum
Sun has recently announced that its forthcoming multicore processor, code-named Rock, will support a form of hardware transactional memory (HTM). Our poster describes this feature and presents the Adaptive Transactional Memory Test Platform (ATMTP), a simulator we have developed that allows us and others to experiment with code that uses it, as well as the results of some preliminary experiments conducted using ATMTP.
international conference on parallel processing | 2006
Victor Luchangco; Daniel Nussbaum; Nir Shavit
Modern multiprocessor architectures such as CC-NUMA machines or CMPs have nonuniform communication architectures that render programs sensitive to memory access locality. A recent paper by Radovic and Hagersten shows that performance gains can be obtained by developing general-purpose mutual-exclusion locks that encourage threads with high mutual memory locality to acquire the lock consecutively, thus reducing the overall cost due to cache misses. Radovic and Hagersten present the first such hierarchical locks. Unfortunately, their locks are backoff locks, which are known to incur higher cache miss rates than queue-based locks, suffer from various fundamental fairness issues, and are hard to tune so as to maximize locality of lock accesses. Extending queue-locking algorithms to be hierarchical requires that requests from threads with high mutual memory locality be consecutive in the queue. Until now, it was not clear that one could design such locks because collecting requests locally and moving them into a global queue seemingly requires a level of coordination whose cost would defeat the very purpose of hierarchical locking. This paper presents a hierarchical version of the Craig, Landin, and Hagersten CLH queue lock, which we call the HCLH queue lock. In this algorithm, threads build implicit local queues of waiting threads, splicing them into a global queue at the cost of only a single CAS operation. In a set of microbenchmarks run on a large-scale multiprocessor machine and a state-of-the-art multi-threaded multi-core chip, the HCLH algorithm exhibits better performance and significantly better fairness than the hierarchical backoff locks of Radovic and Hagersten.
acm sigops european workshop | 2004
Alexandra Fedorova; Christopher Small; Daniel Nussbaum; Margo I. Seltzer
The unpredictable nature of modern workloads, characterized by frequent branches and control transfers, can result in processor pipeline utilization as low as 19%. Chip multithreading (CMT), a processor architecture combining chip multiprocessing and hardware multithreading, is designed to address this issue. Hardware vendors plan to ship CMT systems within the next two years; understanding how such systems will perform is crucial if we are to use them to full advantage. Our simulation experiments show that a CMT-savvy operating system scheduler could improve application performance by a factor of two. In this paper we describe our initial analysis of application performance on CMT systems and propose a design for a scheduler tailored to the needs of a CMT system.
architectural support for programming languages and operating systems | 2006
Peter C. Damron; Alexandra Fedorova; Yossi Lev; Victor Luchangco; Mark Moir; Daniel Nussbaum
usenix annual technical conference | 2005
Alexandra Fedorova; Margo I. Seltzer; Christopher Small; Daniel Nussbaum
Archive | 2005
Daniel Nussbaum; Victor Luchangco; Mark Moir; Ori Shalev; Nir Shavit
Archive | 2006
Mark Moir; Daniel Nussbaum; Ori Shalev; Nir Shavit
Archive | 2004
Daniel Nussbaum; Alexandra Fedorova; Christopher Small