William N. Scherer | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where William N. Scherer is active.

Explore More

Publication

Featured researches published by William N. Scherer.

principles of distributed computing | 2005

Advanced contention management for dynamic software transactional memory

William N. Scherer; Michael L. Scott

The obstruction-free Dynamic Software Transactional Memory (DSTM) system of Herlihy et al@. allows only one transaction at a time to acquire an object for writing. Should a second require an object currently in use, a contention manager must determine which may proceed and which must wait or abort.We analyze both new and existing policies for this contention management problem, using experimental results from a 16-processor SunFire machine. We consider both visible and invisible versions of read access, and benchmarks that vary in complexity, level of contention, tendency toward circular dependence, and mix of reads and writes. We present fair proportional-share prioritized versions of several policies, and identify a candidate default policy: one that provides, for the first time, good performance in every case we test. The tradeoff between visible and invisible reads remains application-specific: visible reads reduce the overhead for incremental validation when opening new objects, but the requisite bookkeeping exacerbates contention for the memory interconnect.

international symposium on distributed computing | 2005

Adaptive software transactional memory

Virendra J. Marathe; William N. Scherer; Michael L. Scott

Software Transactional Memory (STM) is a generic synchronization construct that enables automatic conversion of correct sequential objects into correct nonblocking concurrent objects. Recent STM systems, though significantly more practical than their predecessors, display inconsistent performance: differing design decisions cause different systems to perform best in different circumstances, often by dramatic margins. In this paper we consider four dimensions of the STM design space: (i) when concurrent objects are acquired by transactions for modification; (ii) how they are acquired; (iii) what they look like when not acquired; and (iv) the non-blocking semantics for transactions (lock-freedom vs. obstruction-freedom). In this 4-dimensional space we highlight the locations of two leading STM systems: the DSTM of Herlihy et al. and the OSTM of Fraser and Harris. Drawing motivation from the performance of a series of application benchmarks, we then present a new Adaptive STM (ASTM) system that adjusts to the offered workload, allowing it to match the performance of the best known existing system on every tested workload.

international symposium on distributed computing | 2006

Conflict detection and validation strategies for software transactional memory

Michael F. Spear; Virendra J. Marathe; William N. Scherer; Michael L. Scott

In a software transactional memory (STM) system, conflict detection is the problem of determining when two transactions cannot both safely commit. Validation is the related problem of ensuring that a transaction never views inconsistent data, which might potentially cause a doomed transaction to exhibit irreversible, externally visible side effects. Existing mechanisms for conflict detection vary greatly in their degree of speculation and their relative treatment of read-write and write-write conflicts. Validation, for its part, appears to be a dominant factor—perhaps the dominant factor—in the cost of complex transactions. We present the most comprehensive study to date of conflict detection strategies, characterizing the tradeoffs among them and identifying the ones that perform the best for various types of workload. In the process we introduce a lightweight heuristic mechanism—the global commit counter—that can greatly reduce the cost of validation and of single-threaded execution. The heuristic also allows us to experiment with mixed invalidation, a more opportunistic interleaving of reading and writing transactions. Experimental results on a 16-processor SunFire machine running our RSTM system indicate that the choice of conflict detection strategy can have a dramatic impact on performance, and that the best choice is workload dependent. In workloads whose transactions rarely conflict, the commit counter does little to help (and can even hurt) performance. For less scalable applications, however—those in which STM performance has traditionally been most problematic—it can improve transaction throughput many fold.

international conference on principles of distributed systems | 2005

A lazy concurrent list-based set algorithm

Steve Heller; Maurice Herlihy; Victor Luchangco; Mark Moir; William N. Scherer; Nir Shavit

List-based implementations of sets are a fundamental building block of many concurrent algorithms. A skiplist based on the lock-free list-based set algorithm of Michael will be included in the JavaTM Concurrency Package of JDK 1.6.0. However, Michaels lock-free algorithm has several drawbacks, most notably that it requires all list traversal operations, including membership tests, to perform cleanup operations of logically removed nodes, and that it uses the equivalent of an atomically markable reference, a pointer that can be atomically “marked,” which is expensive in some languages and unavailable in others. We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes, eliminating the need to use the equivalent of an atomically markable reference. It also has a novel wait-free membership test operation (as opposed to Michaels lock-free one) that does not need to perform cleanup operations and is more efficient than that of all previous algorithms. Empirical testing shows that the new lazy-list algorithm consistently outperforms all known algorithms, including Michaels lock-free algorithm, throughout the concurrency range. At high load, with 90% membership tests, the lazy algorithm is more than twice as fast as Michaels. This is encouraging given that typical search structure usage patterns include around 90% membership tests. By replacing the lock-free membership test of Michaels algorithm with our new wait-free one, we achieve an algorithm that slightly outperforms our new lazy-list (though it may not be as efficient in other contexts as it uses Javas RTTI mechanism to create pointers that can be atomically marked).

international conference on supercomputing | 2008

Phasers: a unified deadlock-free construct for collective and point-to-point synchronization

Jun Shirako; David M. Peixotto; Vivek Sarkar; William N. Scherer

Coordination and synchronization of parallel tasks is a major source of complexity in parallel programming. These constructs take many forms in practice including mutual exclusion in accesses to shared resources, termination detection of child tasks, collective barrier synchronization, and point-to-point synchronization. In this paper, we introduce phasers, a new coordination construct that unifies collective and point-to-point synchronizations. We establish two safety properties for phasers: deadlock-freedom and phase-ordering. Performance results obtained from a portable implementation of phasers on three different SMP platforms demonstrate that phasers can deliver superior performance to existing barrier implementations, in addition to the productivity benefits that result from their generality and safety properties.

Parallel Processing Letters | 2007

A Lazy Concurrent List-Based Set Algorithm

Steve Heller; Maurice Herlihy; Victor Luchangco; Mark Moir; William N. Scherer; Nir Shavit

We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test. Our algorithm improves on the performance of all previous such algorithms.

Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems | 2004

Design tradeoffs in modern software transactional memory systems

Virendra J. Marathe; William N. Scherer; Michael L. Scott

Software Transactional Memory (STM) is a generic non-blocking synchronization construct that enables automatic conversion of correct sequential objects into correct concurrent objects. Because it is nonblocking, STM avoids traditional performance and correctness problems due to thread failure, preemption, page faults, and priority inversion.In this paper we compare and analyze two recent object-based STM systems, the DSTM of Herlihy et al. and the FSTM of Fraser, both of which support dynamic transactions, in which the set of objects to be modified is not known in advance. We highlight aspects of these systems that lead to performance tradeoffs for various concurrent data structures. More specifically, we consider object ownership acquisition semantics, concurrent object referencing style, the overhead of ordering and bookkeeping, contention management versus helping semantics, and transaction validation. We demonstrate for each system simple benchmarks on which it outperforms the other by a significant margin. This in turn provides us with a preliminary characterization of the applications for which each system is best suited.

Proceedings of the Third Conference on Partitioned Global Address Space Programing Models | 2009

A new vision for coarray Fortran

John M. Mellor-Crummey; Laksono Adhianto; William N. Scherer; Guohua Jin

In 1998, Numrich and Reid proposed Coarray Fortran as a simple set of extensions to Fortran 95 [7]. Their principal extension to Fortran was support for shared data known as coarrays. In 2005, the Fortran Standards Committee began exploring the addition of coarrays to Fortran 2008, which is now being finalized. Careful review of drafts of the emerging Fortran 2008 standard led us to identify several shortcomings with the proposed coarray extensions. In this paper, we briefly critique the coarray extensions proposed for Fortran 2008, outline a new vision for coarrays in Fortran language that is far more expressive, and briefly describe our strategy for implementing the language extensions that we propose.

ieee international conference on high performance computing data and analytics | 2005

Preemption adaptivity in time-published queue-based spin locks

Bijun He; William N. Scherer; Michael L. Scott

The proliferation of multiprocessor servers and multithreaded applications has increased the demand for high-performance synchronization. Traditional scheduler-based locks incur the overhead of a full context switch between threads and are thus unacceptably slow for many applications. Spin locks offer low overhead, but they either scale poorly (test-and-set style locks) or handle preemption badly (queue-based locks). Previous work has shown how to build preemption-tolerant locks using an extended kernel interface, but such locks are neither portable to nor even compatible with most operating systems. In this work, we propose a time-publishing heuristic in which each thread periodically records its current timestamp to a shared memory location. Given the high resolution, roughly synchronized clocks of modern processors, this convention allows threads to guess accurately which peers are active based on the currency of their timestamps. We implement two queue-based locks, MCS-TP and CLH-TP, and evaluate their performance relative to traditional spin locks on a 32-processor IBM p690 multiprocessor. These results show that time-published locks make it feasible, for the first time, to use queue-based spin locks on multiprogrammed systems with a standard kernel interface.

international parallel and distributed processing symposium | 2009

Phaser accumulators: A new reduction construct for dynamic parallelism

Jun Shirako; David M. Peixotto; Vivek Sarkar; William N. Scherer

A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser accumulators, a new reduction construct that meshes seamlessly with phasers to support dynamic parallelism in a phased (iterative) setting. By separating reduction computations into the parts of sending data, performing the computation itself, and retrieving the result, we enable overlap of communication and computation in a manner analogous to that of split-phase barriers. Additionally, this separation enables exploration of implementation strategies that differ as to when the reduction itself is performed: eagerly when the data is supplied, or lazily when a synchronization point is reached. We implement accumulators as extensions to phasers in the Habanero dialect of the X10 programming language. Performance evaluations of the EPCC Syncbench, Spectral-norm, and CG benchmarks on AMD Opteron, Intel Xeon, and Sun UltraSPARC T2 multicore SMPs show superior performance and scalability over OpenMP reductions (on two platforms) and X10 code (on three platforms) written with atomic blocks, with improvements of up to 2.5× on the Opteron and 14.9× on the UltraSPARC T2 relative to OpenMP and 16.5× on the Opteron, 26.3× on the Xeon and 94.8× on the UltraSPARC T2 relative to X10 atomic blocks. To the best of our knowledge, no prior reduction construct supports the dynamic parallelism and asynchronous capabilities of phaser accumulators.

Explore More