Brian D. Carlstrom | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Brian D. Carlstrom is active.

Explore More

Publication

Featured researches published by Brian D. Carlstrom.

international symposium on computer architecture | 2004

Transactional Memory Coherence and Consistency

Lance Hammond; Vicky Wong; Michael K. Chen; Brian D. Carlstrom; John D. Davis; Ben Hertzberg; Manohar K. Prabhu; Honggo Wijaya; Christos Kozyrakis; Kunle Olukotun

In this paper, we propose a new shared memory model: transactional memory coherence and consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is higher interprocessor bandwidth. To explore the costs and benefits of TCC, we study the characteristics of an optimal transaction-based memory system, and examine how different design parameters could affect the performance of real systems. Across a spectrum of applications, the TCC model itself did not limit available parallelism. Most applications are easily divided into transactions requiring only small write buffers, on the order of 4-8 KB. The broadcast requirements of TCC are high, but are well within the capabilities of CMPs and small-scale SMPs with high-speed interconnects.

international symposium on computer architecture | 2006

Architectural Semantics for Practical Transactional Memory

Austen McDonald; JaeWoong Chung; Brian D. Carlstrom; Chi Cao Minh; Hassan Chafi; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current TM systems cannot support basic features of modern programming languages and operating systems such as transparent library calls, conditional synchronization, system calls, I/O, and runtime exceptions. This paper presents a comprehensive instruction set architecture (ISA) for TM systems. Our proposal introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback. These mechanisms provide a flexible interface to implement programming language and operating system functionality. We also show that these mechanisms are practical to implement at the ISA and microarchitecture level for various TM systems. Using an execution-driven simulation, we demonstrate both the functionality (e.g., I/O and conditional scheduling within transactions) and performance potential (2.2� improvement for SPECjbb2000) of the proposed mechanisms. Overall, this paper establishes a rich and efficient interface to foster both hardware and software research on transactional memory.

programming language design and implementation | 2006

The Atomos transactional programming language

Brian D. Carlstrom; Austen McDonald; Hassan Chafi; JaeWoong Chung; Chi Cao Minh; Christoforos E. Kozyrakis; Kunle Olukotun

Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with simpler transactional alternatives.The Atomos watch statement allows programmers to specify fine-grained watch sets used with the Atomos retry conditional waiting statement for efficient transactional conflict-driven wakeup even in transactional memory systems with a limited number of transactional contexts. Atomos supports open-nested transactions, which are necessary for building both scalable application programs and virtual machine implementations.The implementation of the Atomos scheduler demonstrates the use of open nesting within the virtual machine and introduces the concept of transactional memory violation handlers that allow programs to recover from data dependency violations without rolling back.Atomos programming examples are given to demonstrate the usefulness of transactional programming primitives. Atomos and Java are compared through the use of several benchmarks. The results demonstrate both the improvements in parallel programming ease and parallel program performance provided by Atomos.

architectural support for programming languages and operating systems | 2004

Programming with transactional coherence and consistency (TCC)

Lance Hammond; Brian D. Carlstrom; Vicky Wong; Ben Hertzberg; Michael K. Chen; Christos Kozyrakis; Kunle Olukotun

Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fundamental unit of parallel work, communication and coherence. As each transaction completes, it writes all of its newly produced state to shared memory atomically, while restarting other processors that have speculatively read stale data. With this mechanism, a TCC-based system automatically handles data synchronization correctly, without programmer intervention. To gain the benefits of TCC, programs must be decomposed into transactions. We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism. With these constructs, writing correct parallel programs requires only small, incremental changes to correct sequential programs. The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.

high-performance computer architecture | 2007

A Scalable, Non-blocking Approach to Transactional Memory

Hassan Chafi; Jared Casper; Brian D. Carlstrom; Austen McDonald; Chi Cao Minh; Woongki Baek; Christos Kozyrakis; Kunle Olukotun

Transactional memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, not only does it need to deliver on these promises, but it needs to scale to a high number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on transactional coherence and consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors

high-performance computer architecture | 2006

The common case transactional behavior of multithreaded programs

JaeWoong Chung; Hassan Chafi; Chi Cao Minh; Austen McDonald; Brian D. Carlstrom; Christos Kozyrakis; Kunle Olukotun

Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common case behavior of real application. This work addresses this problem by analyzing the common case transactional behavior for 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, distribution of read-set and write-set size, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both non-blocking synchronization and speculative parallelization.

architectural support for programming languages and operating systems | 2006

Tradeoffs in transactional memory virtualization

JaeWoong Chung; Chi Cao Minh; Austen McDonald; Travis Skare; Hassan Chafi; Brian D. Carlstrom; Christos Kozyrakis; Kunle Olukotun

For transactional memory (TM) to achieve widespread acceptance, transactions should not be limited to the physical resources of any specific hardware implementation. TM systems should guarantee correct execution even when transactions exceed scheduling quanta, overflow the capacity of hardware caches and physical memory, or include more independent nesting levels than what is supported in hardware. Existing proposals for TM virtualization are either incomplete or rely on complex hardware implementations, which are an overkill if virtualization is invoked infrequently in the common case.We present eXtended Transactional Memory (XTM), the first TM virtualization system that virtualizes all aspects of transactional execution (time, space, and nesting depth). XTM is implemented in software using virtual memory support. It operates at page granularity, using private copies of overflowed pages to buffer memory updates until the transaction commits and snapshots of pages to detect interference between transactions. We also describe two enhancements to XTM that use limited hardware support to address key performance bottlenecks.We compare XTM to hardwarebased virtualization using both real applications and synthetic microbenchmarks. We show that despite being software-based, XTM and its enhancements are competitive with hardware-based alternatives. Overall, we demonstrate that XTM provides a complete, flexible, and low-cost mechanism for practical TM virtualization.

international symposium on microarchitecture | 2004

Transactional coherence and consistency: simplifying parallel hardware and software

Lance Hammond; Brian D. Carlstrom; Vicky Wong; Michael K. Chen; C. Koryrakis; Kunle Olukotun

Transactional coherence and consistency (TCC) simplifies parallel hardware and software design by eliminating the need for conventional cache coherence and consistency models and letting programmers parallelize a wide range of applications with a simple, lock-free transactional model. TCC eases both parallel programming and parallel architecture design by relying on programmer-defined transactions as the basic unit of parallel work, communication, memory coherence, and memory consistency

international conference on parallel architectures and compilation techniques | 2005

Characterization of TCC on chip-multiprocessors

Austen McDonald; JaeWoong Chung; Hassan Chafi; Chi Cao Minh; Brian D. Carlstrom; Lance Hammond; Christos Kozyrakis; Kunle Olukotun

Transactional coherence and consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization, coherence, and consistency. TCC has the potential to simplify parallel program development and optimization by providing a smooth transition from sequential to parallel programs. In this paper, we study the implementation of TCC on chip-multiprocessors (CMPs). We explore design alternatives such as the granularity of state tracking, double-buffering, and write-update and write-invalidate protocols. Furthermore, we characterize the performance of TCC in comparison to conventional snoopy cache coherence (SCC) using parallel applications optimized for each scheme. We conclude that the two coherence schemes perform similarly, with each scheme having a slight advantage for some applications. The bandwidth requirements of TCC are slightly higher but well within the capabilities of CMP systems. Also, we find that overflow of speculative state can be effectively handled by a simple victim cache. Our results suggest TCC can provide its programming advantages without compromising the performance expected from well-tuned parallel applications.

acm sigplan symposium on principles and practice of parallel programming | 2007

Transactional collection classes

Brian D. Carlstrom; Austen McDonald; Michael Carbin; Christos Kozyrakis; Kunle Olukotun

While parallel programmers find it easier to reason about large atomic regions, the conventional mutual exclusion-based primitives for synchronization force them to interleave many small operations to achieve performance. Transactional memory promises that programmer scan use large atomic regions while achieving similar performance. However, these large transactions can conflict when operating on shared data structures, even for logically independent operations. Transactional collection classes address this problem by allowing long-running transactions to operate on shared data while eliminating unnecessary conflicts. Transactional collection classes wrap existing data structures, without the need for custom implementations or knowledge of data structure internals. Without transactional collection classes, access to shared datafrom within long-running transactions can suffer from data dependency conflicts that are logically unnecessary, but are artifacts of the data structure implementation such as hash table collisions or tree-balancing rotations. Our transactional collection classes use the concept of semantic concurrency control to eliminate these unnecessary data dependencies, replacing them with conflict detection based on the operations of the abstract data type. The design and behavior of these transactional collection classes is discussed with reference to the related work from the database community such as multi-level transactions and semantic concurrency control, as well as other concurrent data structures such as java.util.concurrent. The required transactional semantics needed for implementing transactional collection are enumerated, including open-nested transactions and commit and abort handlers. We also discuss how isolation can be reduced for greater concurrency. Finally, we provide guidelines on the construction of classes that preserve isolation and serializability. The performance of these classes is evaluated with a number of benchmarks including targeted micro-benchmarks and a version of SPECjbb2000 with increased contention. The results show that easier-to-use long transactions can still allow programs to deliver scalable performance by simply wrapping existing data structures with transactional collection classes.

Explore More