Richard L. Hudson | Researchain

Publications

Featured research published by Richard L. Hudson.

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2006

McRT-STM: a high performance software transactional memory system for a multi-core runtime

Bratin Saha; Ali-Reza Adl-Tabatabai; Richard L. Hudson; Chi Cao Minh; Benjamin C. Hertzberg

Applications need to become more concurrent to take advantage of the increased computational power provided by chip-level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex-based synchronization). Unfortunately, lock-based synchronization often leads to deadlocks, makes fine-grained synchronization difficult, hinders composition of atomic primitives, and provides no support for error recovery. Transactions avoid many of these problems and therefore promise to ease concurrent programming.

We describe a software transactional memory (STM) system that is part of McRT, an experimental Multi-Core RunTime. The McRT-STM implementation uses a number of novel algorithms and supports advanced features such as nested transactions with partial aborts, conditional signaling within a transaction, and object-based conflict detection for C/C++ applications. The McRT-STM exports interfaces that can be used from C/C++ programs directly or as a target for compilers translating higher-level linguistic constructs.

We present a detailed performance analysis of various STM design tradeoffs, such as pessimistic versus optimistic concurrency, undo logging versus write buffering, and cache-line-based versus object-based conflict detection. We also show an MCAS implementation that works on arbitrary values, coexists with the STM, and can be used as a more efficient form of transactional memory. To provide a baseline, we compare the performance of the STM with that of fine-grained and coarse-grained locking using a number of concurrent data structures on a 16-processor SMP system. We also show our STM performance on a non-synthetic workload: the Linux sendmail application.
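Two of the design tradeoffs listed above, undo logging versus write buffering, differ in where a transaction's tentative writes live. The following is a minimal, single-threaded sketch of that difference only; it is not McRT-STM's code, it does no conflict detection or locking, and the class and function names (`UndoLogTx`, `WriteBufferTx`, `run`) are hypothetical.

```python
# Hypothetical sketch contrasting two STM update strategies named in the abstract.
# Single-threaded: no conflict detection, no locking; not McRT-STM's code.


class Abort(Exception):
    """Raised by a transaction body to abandon the transaction."""


class UndoLogTx:
    """Eager (in-place) updates; old values are logged so an abort can roll back."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (key, previous value), in write order

    def read(self, key):
        return self.memory[key]

    def write(self, key, value):
        self.undo_log.append((key, self.memory[key]))
        self.memory[key] = value

    def commit(self):
        self.undo_log.clear()  # new values are already in memory

    def abort(self):
        # Replay the log backwards so each key ends at its pre-transaction value.
        for key, old in reversed(self.undo_log):
            self.memory[key] = old
        self.undo_log.clear()


class WriteBufferTx:
    """Lazy updates; writes stay in a private buffer until commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}

    def read(self, key):
        # A transaction must see its own buffered writes.
        if key in self.buffer:
            return self.buffer[key]
        return self.memory[key]

    def write(self, key, value):
        self.buffer[key] = value

    def commit(self):
        self.memory.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()  # shared memory was never modified


def run(tx_cls, memory, body):
    """Run body(tx); commit on normal return, roll back on Abort. Returns True if committed."""
    tx = tx_cls(memory)
    try:
        body(tx)
    except Abort:
        tx.abort()
        return False
    tx.commit()
    return True


def transfer(tx, amount=10):
    tx.write("a", tx.read("a") - amount)
    tx.write("b", tx.read("b") + amount)


def transfer_then_abort(tx):
    transfer(tx)
    raise Abort()
```

With undo logging, commit is cheap and reads go straight to memory, but an abort must restore old values; with write buffering, an abort simply discards the buffer, but every read has to check the buffer first.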

Programming Language Design and Implementation | 2007

Enforcing isolation and ordering in STM

Tatiana Shpeisman; Vijay Menon; Ali-Reza Adl-Tabatabai; Steven Balensiefer; Dan Grossman; Richard L. Hudson; Katherine F. Moore; Bratin Saha

Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems via strong atomicity. We demonstrate how to implement non-transactional data accesses via efficient read and write barriers, and we present compiler optimizations that further reduce the overheads of these barriers. We introduce a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper and a static not-accessed-in-transaction analysis that removes many barriers completely. Our results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.
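The dynamic escape analysis described above lets a non-transactional barrier skip expensive work for data no other thread can reach. A minimal sketch of the idea, assuming objects start private, become public when made reachable from a global or from another public object, and that publication is transitive. `Obj`, `publish`, and `Barriers` are hypothetical names, and the slow path is only a placeholder for whatever isolation-preserving protocol the STM uses.

```python
# Hypothetical sketch of a dynamic escape check guarding non-transactional barriers.
# Obj, publish and Barriers are illustrative names; the "slow path" is a placeholder.


class Obj:
    def __init__(self, **fields):
        self.public = False  # new objects start private to the allocating thread
        self.fields = dict(fields)


def publish(obj):
    """Mark obj and every object reachable from it as public (iterative DFS)."""
    stack = [obj]
    while stack:
        o = stack.pop()
        if o.public:
            continue
        o.public = True
        stack.extend(v for v in o.fields.values() if isinstance(v, Obj) and not v.public)


class Barriers:
    def __init__(self):
        self.globals = {}
        self.slow_path_writes = 0

    def set_global(self, name, obj):
        publish(obj)  # anything reachable from a global may be seen by other threads
        self.globals[name] = obj

    def write(self, obj, field, value):
        """Write barrier for a non-transactional store obj.field = value."""
        if not obj.public:
            obj.fields[field] = value  # fast path: no other thread can reach obj
            return
        if isinstance(value, Obj):
            publish(value)  # storing into a public object makes value reachable
        self.slow_path_writes += 1
        self._isolated_write(obj, field, value)

    def _isolated_write(self, obj, field, value):
        # Placeholder: a strongly atomic STM would coordinate with concurrent
        # transactions here before performing the store.
        obj.fields[field] = value
```

Only the runtime check is shown here; the static not-accessed-in-transaction analysis mentioned in the abstract would remove some barriers at compile time instead.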

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2007

Open nesting in software transactional memory

Yang Ni; Vijay Menon; Ali-Reza Adl-Tabatabai; Antony L. Hosking; Richard L. Hudson; J. Eliot B. Moss; Bratin Saha; Tatiana Shpeisman

Transactional memory (TM) promises to simplify concurrent programming while providing scalability competitive with fine-grained locking. Language-based constructs allow programmers to denote atomic regions declaratively and to rely on the underlying system to provide transactional guarantees along with concurrency. In contrast with fine-grained locking, TM allows programmers to write simpler programs that are composable and deadlock-free. TM implementations operate by tracking loads and stores to memory and by detecting concurrent conflicting accesses by different transactions. By automating this process, they greatly reduce the programmer's burden, but they are also forced to be conservative. In certain cases, conflicting memory accesses may not actually violate the higher-level semantics of a program, and a programmer may wish to allow seemingly conflicting transactions to execute concurrently. Open nested transactions enable expert programmers to differentiate between physical conflicts, at the level of memory, and logical conflicts that actually violate application semantics. A TM system with open nesting can permit physical conflicts that are not logical conflicts, and thus increase concurrency among application threads. Here we present an implementation of open nested transactions in a Java-based software transactional memory (STM) system. We describe new language constructs to support open nesting in Java, and we discuss new abstract locking mechanisms that a programmer can use to prevent logical conflicts. We demonstrate how these constructs can be mapped efficiently to existing STM data structures. Finally, we evaluate our system on a set of Java applications and data structures, demonstrating how open nesting can enhance application scalability.
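The distinction between physical and logical conflicts can be illustrated with a shared set: two outer transactions inserting different keys touch the same underlying structure but do not conflict logically. The following single-threaded sketch shows only the bookkeeping, under these assumptions: an open-nested operation commits at once, takes an abstract lock on its key until the outer transaction ends, and registers a compensating action that runs if the outer transaction aborts. `AbstractLocks`, `OuterTx`, and `insert` are hypothetical names, not the paper's Java constructs.

```python
# Hypothetical, single-threaded sketch of open-nesting bookkeeping: each open-nested
# operation commits at once, takes an abstract lock, and registers a compensating action.


class LogicalConflict(Exception):
    """Another outer transaction holds the abstract lock for this key."""


class AbstractLocks:
    def __init__(self):
        self.owner = {}  # abstract key -> outer transaction holding it

    def acquire(self, key, tx):
        holder = self.owner.get(key)
        if holder is not None and holder is not tx:
            raise LogicalConflict(key)
        self.owner[key] = tx

    def release_all(self, tx):
        for key in [k for k, t in self.owner.items() if t is tx]:
            del self.owner[key]


class OuterTx:
    def __init__(self, locks):
        self.locks = locks
        self.compensations = []

    def open_nested(self, action, compensation, lock_key):
        self.locks.acquire(lock_key, self)  # guard against logical conflicts
        action()  # commits now; its accesses are not kept in the outer read/write sets
        self.compensations.append(compensation)

    def commit(self):
        self.compensations.clear()
        self.locks.release_all(self)

    def abort(self):
        for undo in reversed(self.compensations):  # newest effect undone first
            undo()
        self.compensations.clear()
        self.locks.release_all(self)


def insert(tx, shared, item):
    """Open-nested set insert whose compensation removes the item only if this call added it."""
    added = []

    def action():
        if item not in shared:
            shared.add(item)
            added.append(item)

    def undo():
        for x in added:
            shared.discard(x)

    tx.open_nested(action, undo, lock_key=item)
```

Locking the abstract key rather than memory locations is what lets inserts of 5 and 7 proceed together while a second insert of 5 by another transaction is refused.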

International Symposium on Memory Management | 1992

Incremental Collection of Mature Objects

Richard L. Hudson; J. Eliot B. Moss

We present a garbage collection algorithm that extends generational scavenging to collect large older generations (mature objects) non-disruptively. The algorithm's approach is to process bounded-size pieces of mature object space at each collection; the subtleties lie in guaranteeing that it eventually collects any and all garbage. The algorithm does not assume any special hardware or operating system support, e.g., for forwarding pointers or protection traps. The algorithm copies objects, so it naturally supports compaction and reclustering.
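The "bounded-size pieces" idea is usually described in terms of cars (fixed-size blocks) grouped into trains, the organization behind the Mature Object Space (train) algorithm. The following heavily simplified, single-threaded sketch illustrates that organization under stated assumptions: a fixed car capacity; incoming references found by scanning every object (a real collector keeps remembered sets); and no young generation, write barrier, or concurrency. Each step either reclaims the oldest train whole, which is how cyclic garbage spanning several cars is eventually collected, or processes only its first car. `MatureSpace`, `collect_step`, and `CAR_CAPACITY` are hypothetical names.

```python
# Heavily simplified, single-threaded sketch of incremental mature-space collection
# in the car/train style. Assumptions: fixed car capacity, incoming references found
# by scanning every object (a real collector keeps remembered sets), and no young
# generation, write barrier, or concurrency.

CAR_CAPACITY = 3  # hypothetical bound on objects per car


class MatureSpace:
    def __init__(self):
        self.refs = {}      # object id -> ids it references
        self.train_of = {}  # object id -> train (list of cars) that holds it
        self.trains = []    # oldest train first; each train is a list of cars (lists of ids)
        self.roots = set()

    def _place(self, train, obj):
        if not train or len(train[-1]) >= CAR_CAPACITY:
            train.append([])  # start a new car at the end of the train
        train[-1].append(obj)
        self.train_of[obj] = train

    def add(self, obj, refs=(), new_train=False):
        self.refs[obj] = list(refs)
        if new_train or not self.trains:
            self.trains.append([])
        self._place(self.trains[-1], obj)

    def _drop(self, dead):
        for o in dead:
            del self.refs[o]
            del self.train_of[o]
        return sorted(dead)

    def collect_step(self):
        """Reclaim the oldest train if nothing outside it points in; else process its first car."""
        if not self.trains:
            return []
        train = self.trains[0]
        in_train = {o for car in train for o in car}
        referenced_from_outside = bool(self.roots & in_train) or any(
            t in in_train
            for s, ts in self.refs.items() if s not in in_train
            for t in ts)
        if not referenced_from_outside:
            self.trains.pop(0)  # everything here, including cross-car cycles, is garbage
            return self._drop(in_train)

        car = train.pop(0)
        in_car = set(car)
        dest = {}  # surviving object -> train it is evacuated to
        if self.roots & in_car:
            if len(self.trains) == 1:
                self.trains.append([])
            for o in car:
                if o in self.roots:
                    dest[o] = self.trains[-1]  # root-referenced objects leave the oldest train
        for s, ts in self.refs.items():
            if s in in_car:
                continue
            for t in ts:
                if t in in_car and t not in dest:
                    dest[t] = self.train_of[s]  # follow the referencing object's train
        stack = list(dest)
        while stack:  # objects in this car reachable from survivors move with them
            o = stack.pop()
            for t in self.refs[o]:
                if t in in_car and t not in dest:
                    dest[t] = dest[o]
                    stack.append(t)
        for o in car:
            if o in dest:
                self._place(dest[o], o)
        if not train:
            self.trains.pop(0)
        return self._drop(in_car - set(dest))
```

Each step touches at most one car's objects for evacuation, which is the source of the bounded pause; the whole-train check is what guarantees that garbage cycles larger than a car are not retained forever.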

ACM Symposium on Parallel Algorithms and Architectures | 2008

Practical weak-atomicity semantics for Java STM

Vijay Menon; Steven Balensiefer; Tatiana Shpeisman; Ali-Reza Adl-Tabatabai; Richard L. Hudson; Bratin Saha; Adam Welc

As memory transactions have been proposed as a language-level replacement for locks, there is a growing need for well-defined semantics. In contrast to database transactions, transactional memory (TM) semantics are complicated by the fact that programs may access the same memory locations both inside and outside transactions. Strongly atomic semantics, where non-transactional accesses are treated as implicit single-operation transactions, remain difficult to provide without specialized hardware support or significant performance overhead. As an alternative, many in the community have informally proposed that single global lock semantics [18,10], in which transaction semantics are mapped to those of regions protected by a single global lock, provides an intuitive and efficiently implementable model for programmers. In this paper, we explore the implementation and performance implications of single global lock semantics in a weakly atomic STM from the perspective of Java, and we discuss why even recent STM implementations fall short of these semantics. We describe a new weakly atomic Java STM implementation that provides single global lock semantics while permitting concurrent execution, but we show that this comes at a significant performance cost. We also propose and implement various alternative semantics that loosen single lock requirements while still providing strong guarantees. We compare our new implementations to previous ones, including a strongly atomic STM.[24]
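Single global lock semantics can be stated as a reference model: a program with atomic blocks should behave as if every atomic block acquired one process-wide lock. The following minimal sketch expresses that model directly; it describes the semantics an STM is meant to match, not how a concurrent STM executes. The example uses privatization, a commonly discussed case (assumed here for illustration, not named in the abstract): once `privatize` returns, no atomic block can still be updating the detached item, so the caller may access it outside any transaction. The names `atomic`, `privatize`, and `increment_if_shared` are hypothetical.

```python
# Hypothetical sketch of the single-global-lock reference semantics: every atomic
# block behaves as if it held one process-wide lock. This models the semantics
# an STM should match; it is not how a concurrent STM executes.
import functools
import threading

_GLOBAL_LOCK = threading.RLock()  # reentrant so an atomic block may call another


def atomic(fn):
    """Decorator: run fn as if protected by the single global lock."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _GLOBAL_LOCK:
            return fn(*args, **kwargs)
    return wrapper


shared = {"item": {"value": 0}}


@atomic
def increment_if_shared():
    item = shared["item"]
    if item is not None:
        item["value"] += 1


@atomic
def privatize():
    """Detach the item so the caller may use it outside any transaction."""
    item = shared["item"]
    shared["item"] = None
    return item


def worker(n=1000):
    for _ in range(n):
        increment_if_shared()
```

Under this model, the value read right after `privatize` can never change afterward. A weakly atomic STM that lets a doomed or still-committing transaction write to the item after privatization would violate that expectation.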

International Symposium on Memory Management | 2006

McRT-Malloc: a scalable transactional memory allocator

Richard L. Hudson; Bratin Saha; Ali-Reza Adl-Tabatabai; Benjamin C. Hertzberg

Emerging multi-core processors promise to provide an exponentially increasing number of hardware threads with every generation. Applications will need to be highly concurrent to fully use the power of these processors. To enable maximum concurrency, libraries (such as malloc/free packages) would therefore need to use non-blocking algorithms. But lock-free algorithms are notoriously difficult to reason about and inappropriate for average programmers. Transactional memory promises to significantly ease concurrent programming for the average programmer. This paper describes a highly efficient non-blocking malloc/free algorithm that supports memory allocation and deallocation inside transactional code blocks. Thus, this paper describes a memory allocator that is suitable for emerging multi-core applications while supporting modern concurrency constructs.

This paper makes several novel contributions. It is the first to integrate a software transactional memory system with a malloc/free-based memory allocator. We present the first algorithm that ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup. Unlike previous lock-free malloc packages, our algorithm avoids atomic operations on typical code paths, making our algorithm substantially more efficient.
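The transactional requirement highlighted above can be separated from the non-blocking allocator itself: blocks allocated inside a transaction must be returned if it aborts, and blocks freed inside a transaction must not be reused until it commits. The following single-threaded sketch shows only that bookkeeping. `FreeListAllocator` is a hypothetical stand-in for the underlying allocator, and `TxAllocator` is an illustrative per-transaction wrapper, not McRT-Malloc's design.

```python
# Hypothetical, single-threaded sketch of transaction-aware malloc/free bookkeeping.
# FreeListAllocator stands in for the underlying (non-blocking) allocator.


class FreeListAllocator:
    def __init__(self, nblocks):
        self.free_list = list(range(nblocks))  # block ids currently available

    def malloc(self):
        if not self.free_list:
            raise MemoryError("out of blocks")
        return self.free_list.pop()

    def free(self, block):
        self.free_list.append(block)


class TxAllocator:
    """Per-transaction view of the allocator."""

    def __init__(self, base):
        self.base = base
        self.allocated = []      # blocks to reclaim if the transaction aborts
        self.pending_frees = []  # blocks to release only if it commits

    def malloc(self):
        block = self.base.malloc()
        self.allocated.append(block)
        return block

    def free(self, block):
        # Deferred: after an abort the block is still live, so it must not be reused yet.
        self.pending_frees.append(block)

    def commit(self):
        for block in self.pending_frees:
            self.base.free(block)
        self._reset()

    def abort(self):
        # Space allocated by an aborted transaction goes back, avoiding a space blowup.
        for block in self.allocated:
            self.base.free(block)
        self._reset()

    def _reset(self):
        self.allocated.clear()
        self.pending_frees.clear()
```

Repeatedly allocating in transactions that abort leaves the pool unchanged, which is the property the abstract describes as avoiding a space blowup.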

Programming Language Design and Implementation | 1992

Compiler support for garbage collection in a statically typed language

Amer Diwan; J. Eliot B. Moss; Richard L. Hudson

We consider the problem of supporting compacting garbage collection in the presence of modern compiler optimizations. Since our collector may move any heap object, it must accurately locate, follow, and update all pointers and values derived from pointers. To assist the collector, we extend the compiler to emit tables describing live pointers, and values derived from pointers, at each program location where collection may occur. Significant results include identification of a number of problems posed by optimizations, solutions to those problems, a working compiler, and experimental data concerning table sizes, table compression, and time overhead of decoding tables during collection. While GC support can affect the code produced, our sample programs show no significant changes, the table sizes are a modest fraction of the size of the optimized code, and stack tracing is a small fraction of total GC time. Since the compiler enhancements are also modest, we conclude that the approach is practical.
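When the collector moves an object, a derived value (for example, a pointer into the middle of an array) must move with its base. A minimal sketch of how a per-GC-point table could drive that update, assuming each derived value is its base pointer plus a constant offset and that the base is kept live and listed in the table. The table layout, slot names, and `relocate_frame` are hypothetical, not the paper's encoding.

```python
# Hypothetical sketch of using a per-GC-point table to update base and derived
# pointers after a copying collection. Slot names, the table layout, and the
# "derived = base + constant offset" restriction are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class GcPointTable:
    base_slots: list = field(default_factory=list)  # slots holding live heap pointers
    derived: dict = field(default_factory=dict)     # derived slot -> slot of its base


def relocate_frame(frame, table, forward):
    """frame: slot -> address; forward: old object address -> new object address."""
    # Capture each derived value's offset before any base pointer changes.
    offsets = {d: frame[d] - frame[b] for d, b in table.derived.items()}
    for b in table.base_slots:
        frame[b] = forward(frame[b])
    for d, b in table.derived.items():
        frame[d] = frame[b] + offsets[d]
```

Slots not named in the table (non-pointer values) are left untouched, which is why the tables must be accurate at every point where collection may occur.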

Programming Language Design and Implementation | 2004

Prefetch injection based on hardware monitoring and object metadata

Ali-Reza Adl-Tabatabai; Richard L. Hudson; Mauricio J. Serrano; Sreenivas Subramoney

Cache miss stalls hurt performance because of the large gap between memory and processor speeds; for example, the popular server benchmark SPEC JBB2000 spends 45% of its cycles stalled waiting for memory requests on the Itanium® 2 processor. Traversing linked data structures causes a large portion of these stalls. Prefetching for linked data structures remains a major challenge because serial data dependencies between elements in a linked data structure preclude the timely materialization of prefetch addresses. This paper presents Mississippi Delta (MS Delta), a novel technique for prefetching linked data structures that closely integrates the hardware performance monitor (HPM), the garbage collector's global view of heap and object layout, the type-level metadata inherent in type-safe programs, and JIT compiler analysis. The garbage collector uses the HPM's data cache miss information to identify cache-miss-intensive traversal paths through linked data structures, and then discovers regular distances (deltas) between these linked objects. JIT compiler analysis injects prefetch instructions that use these deltas to materialize prefetch addresses.

We have implemented MS Delta in a fully dynamic profile-guided optimization system: the StarJIT dynamic compiler [1] and the ORP Java virtual machine [9]. We demonstrate a 28-29% reduction in stall cycles attributable to the high-latency cache misses targeted by MS Delta and a speedup of 11-14% on the cache-miss-intensive SPEC JBB2000 benchmark.
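The delta idea sidesteps the serial dependency described above: if linked objects tend to sit at a regular distance from each other, a child's address can be predicted from its parent's address without first loading the pointer. The following sketch finds a dominant delta from sampled (parent, child) address pairs. The sampling format, the `min_share` threshold, and the function names are assumptions for illustration, not the MS Delta implementation.

```python
# Hypothetical sketch of the "regular delta" idea: given sampled (parent, child)
# address pairs along a cache-miss-intensive traversal path, find a dominant
# distance and use it to compute prefetch addresses. The sampling format and
# the min_share threshold are assumptions.
from collections import Counter


def dominant_delta(pairs, min_share=0.5):
    """Return the most common child - parent distance if it covers at least min_share of samples."""
    if not pairs:
        return None
    counts = Counter(child - parent for parent, child in pairs)
    delta, hits = counts.most_common(1)[0]
    return delta if hits / len(pairs) >= min_share else None


def prefetch_address(parent_addr, delta):
    # The child's address is predicted from the parent alone, so the prefetch
    # need not wait for the pointer load that would otherwise serialize traversal.
    return parent_addr + delta
```

If no delta is common enough, the sketch returns `None` and no prefetch would be injected. A wrong prediction only wastes a prefetch, but it does not change program results.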

Conference on Object-Oriented Programming Systems, Languages, and Applications | 1997

Garbage collecting the world: one car at a time

Richard L. Hudson; Ronald Morrison; J. Eliot B. Moss; David S. Munro

A new garbage collection algorithm for distributed object systems, called DMOS (Distributed Mature Object Space), is presented. It is derived from two previous algorithms, MOS (Mature Object Space), sometimes called the train algorithm, and PMOS (Persistent Mature Object Space). The contribution of DMOS is that it provides the following unique combination of properties for a distributed collector: safety, completeness, non-disruptiveness, incrementality, and scalability. Furthermore, the DMOS collector is non-blocking and does not use global tracing.

SIGPLAN Notices | 2008

Single global lock semantics in a weakly atomic STM

Vijay Menon; Steven Balensiefer; Tatiana Shpeisman; Ali-Reza Adl-Tabatabai; Richard L. Hudson; Bratin Saha; Adam Welc

As memory transactions have been proposed as a language-level replacement for locks, there is a growing need for well-defined semantics. In contrast to database transactions, transactional memory (TM) semantics are complicated by the fact that programs may access the same memory locations both inside and outside transactions. Strongly atomic semantics, where non-transactional accesses are treated as implicit single-operation transactions, remain difficult to provide without specialized hardware support and/or significant performance overhead. As an alternative, many in the community have informally proposed that single global lock semantics [16, 9], in which transaction semantics are mapped to those of regions protected by a single global lock, provides an intuitive and efficiently implementable model for programmers. In this paper, we explore the implementation and performance implications of single global lock semantics in a weakly atomic STM from the perspective of Java, and we discuss why even recent STM implementations fall short of these semantics. We describe a new weakly atomic Java STM implementation that provides single global lock semantics while permitting concurrent execution, but we show that this comes at a significant performance cost. We also propose and implement various alternative semantics that loosen single lock requirements while still providing strong guarantees. We compare our new implementations to previous ones, including a strongly atomic STM. [22]