Nathan Grasso Bronson

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nathan Grasso Bronson is active.

Explore More

Publication

Featured researches published by Nathan Grasso Bronson.

international symposium on computer architecture | 2007

An effective hybrid transactional memory system with strong isolation guarantees

Chi Cao Minh; Martin Trautmann; JaeWoong Chung; Austen McDonald; Nathan Grasso Bronson; Jared Casper; Christos Kozyrakis; Kunle Olukotun

We propose signature-accelerated transactional memory (SigTM), ahybrid TM system that reduces the overhead of software transactions. SigTM uses hardware signatures to track the read-set and write-set forpending transactions and perform conflict detection between concurrent threads. All other transactional functionality, including dataversioning, is implemented in software. Unlike previously proposed hybrid TM systems, SigTM requires no modifications to the hardware caches, which reduces hardware cost and simplifies support for nested transactions and multithreaded processor cores. SigTM is also the first hybrid TM system to provide strong isolation guarantees between transactional blocks and non-transactional accesses without additional read and write barriers in non-transactional code. Using a set of parallel programs that make frequent use of coarse-grain transactions, we show that SigTM accelerates software transactions by 30% to 280%. For certain workloads, SigTM can match the performance of a full-featured hardware TM system, while for workloads with large read-sets it can be up to two times slower. Overall, we show that SigTM combines the performance characteristics and strong isolation guarantees of hardware TM implementations with the low cost and flexibility of software TM systems.

acm sigplan symposium on principles and practice of parallel programming | 2010

A practical concurrent binary search tree

Nathan Grasso Bronson; Jared Casper; Hassan Chafi; Kunle Olukotun

We propose a concurrent relaxed balance AVL tree algorithm that is fast, scales well, and tolerates contention. It is based on optimistic techniques adapted from software transactional memory, but takes advantage of specific knowledge of the the algorithm to reduce overheads and avoid unnecessary retries. We extend our algorithm with a fast linearizable clone operation, which can be used for consistent iteration of the tree. Experimental evidence shows that our algorithm outperforms a highly tuned concurrent skip list for many access patterns, with an average of 39% higher single-threaded throughput and 32% higher multi-threaded throughput over a range of contention levels and operation mixes.

acm sigplan symposium on principles and practice of parallel programming | 2012

Concurrent tries with efficient non-blocking snapshots

Aleksandar Prokopec; Nathan Grasso Bronson; Phil Bagwell

We describe a non-blocking concurrent hash trie based on shared-memory single-word compare-and-swap instructions. The hash trie supports standard mutable lock-free operations such as insertion, removal, lookup and their conditional variants. To ensure space-efficiency, removal operations compress the trie when necessary. We show how to implement an efficient lock-free snapshot operation for concurrent hash tries. The snapshot operation uses a single-word compare-and-swap and avoids copying the data structure eagerly. Snapshots are used to implement consistent iterators and a linearizable size retrieval. We compare concurrent hash trie performance with other concurrent data structures and evaluate the performance of the snapshot operation.

conference on object-oriented programming systems, languages, and applications | 2011

Testing atomicity of composed concurrent operations

Ohad Shacham; Nathan Grasso Bronson; Alex Aiken; Mooly Sagiv; Martin T. Vechev; Eran Yahav

We address the problem of testing atomicity of composed concurrent operations. Concurrent libraries help programmers exploit parallel hardware by providing scalable concurrent operations with the illusion that each operation is executed atomically. However, client code often needs to compose atomic operations in such a way that the resulting composite operation is also atomic while preserving scalability. We present a novel technique for testing the atomicity of client code composing scalable concurrent operations. The challenge in testing this kind of client code is that a bug may occur very rarely and only on a particular interleaving with a specific thread configuration. Our technique is based on modular testing of client code in the presence of an adversarial environment; we use commutativity specifications to drastically reduce the number of executions explored to detect a bug. We implemented our approach in a tool called COLT, and evaluated its effectiveness on a range of 51 real-world concurrent Java programs. Using COLT, we found 56 atomicity violations in Apache Tomcat, Cassandra, MyFaces Trinidad, and other applications.

ieee international symposium on workload characterization | 2010

Eigenbench: A simple exploration tool for orthogonal TM characteristics

Sungpack Hong; Tayo Oguntebi; Jared Casper; Nathan Grasso Bronson; Christos Kozyrakis; Kunle Olukotun

There are a significant number of Transactional Memory(TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. In this paper, we use Eigenbench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.

principles of distributed computing | 2010

Transactional predication: high-performance concurrent sets and maps for STM

Nathan Grasso Bronson; Jared Casper; Hassan Chafi; Kunle Olukotun

Concurrent collection classes are widely used in multi-threaded programming, but they provide atomicity only for a fixed set of operations. Software transactional memory (STM) provides a convenient and powerful programming model for composing atomic operations, but concurrent collection algorithms that allow their operations to be composed using STM are significantly slower than their non-composable alternatives. We introduce transactional predication, a method for building transactional maps and sets on top of an underlying non-composable concurrent map. We factor the work of most collection operations into two parts: a portion that does not need atomicity or isolation, and a single transactional memory access. The result approximates semantic conflict detection using the STMs structural conflict detection mechanism. The separation also allows extra optimizations when the collection is used outside a transaction. We perform an experimental evaluation that shows that predication has better performance than existing transactional collection algorithms across a range of workloads.

conference on object-oriented programming systems, languages, and applications | 2011

Automatic fine-grain locking using shape properties

Guy Golan-Gueta; Nathan Grasso Bronson; Alex Aiken; G. Ramalingam; Mooly Sagiv; Eran Yahav

We present a technique for automatically adding fine-grain locking to an abstract data type that is implemented using a dynamic forest -i.e., the data structures may be mutated, even to the point of violating forestness temporarily during the execution of a method of the ADT. Our automatic technique is based on Domination Locking, a novel locking protocol. Domination locking is designed specifically for software concurrency control, and in particular is designed for object-oriented software with destructive pointer updates. Domination locking is a strict generalization of existing locking protocols for dynamically changing graphs. We show our technique can successfully add fine-grain locking to libraries where manually performing locking is extremely challenging. We show that automatic fine-grain locking is more efficient than coarse-grain locking, and obtains similar performance to hand-crafted fine-grain locking.

architectural support for programming languages and operating systems | 2011

Hardware acceleration of transactional memory on commodity systems

Jared Casper; Tayo Oguntebi; Sungpack Hong; Nathan Grasso Bronson; Christos Kozyrakis; Kunle Olukotun

The adoption of transactional memory is hindered by the high overhead of software transactional memory and the intrusive design changes required by previously proposed TM hardware. We propose that hardware to accelerate software transactional memory (STM) can reside outside an unmodified commodity processor core, thereby substantially reducing implementation costs. This paper introduces Transactional Memory Acceleration using Commodity Cores (TMACC), a hardware-accelerated TM system that does not modify the processor, caches, or coherence protocol. We present a complete hardware implementation of TMACC using a rapid prototyping platform. Using this hardware, we implement two unique conflict detection schemes which are accelerated using Bloom filters on an FPGA. These schemes employ novel techniques for tolerating the latency of fine-grained asynchronous communication with an out-of-core accelerator. We then conduct experiments to explore the feasibility of accelerating TM without modifying existing system hardware. We show that, for all but short transactions, it is not necessary to modify the processor to obtain substantial improvement in TM performance. In these cases, TMACC outperforms an STM by an average of 69% in applications using moderate-length transactions, showing maximum speedup within 8% of an upper bound on TM acceleration. Overall, we demonstrate that hardware can substantially accelerate the performance of an STM on unmodified commodity processors.

symposium on principles of programming languages | 2009

Feedback-directed barrier optimization in a strongly isolated STM

Nathan Grasso Bronson; Christos Kozyrakis; Kunle Olukotun

Speed improvements in todays processors have largely been delivered in the form of multiple cores, increasing the importance of abstractions that ease parallel programming. Software transactional memory (STM) addresses many of the complications of concurrency by providing a simple and composable model for safe access to shared data structures. Software transactions extend a language with an atomic primitive that declares that the effects of a block of code should not be interleaved with actions executing concurrently on other threads. Adding barriers to shared memory accesses provides atomicity, consistency and isolation. Strongly isolated STMs preserve the safety properties of transactions for all memory operations in a program, not just those inside an atomic block. Isolation barriers are added to non-transactional loads and stores in such a system to prevent those accesses from observing or corrupting a partially completed transaction. Strong isolation is especially important when integrating transactions into an existing language and memory model. Isolation barriers have a prohibitive performance overhead, however, so most STM proposals have chosen not to provide strong isolation. In this paper we reduce the costs of strong isolation by customizing isolation barriers for their observed usage. The customized barriers provide accelerated execution by blocking threads whose accesses do not follow the expected pattern. We use hot swap to tighten or loosen the hypothesized pattern, while preserving strong isolation. We introduce a family of optimization hypotheses that balance verification cost against generality. We demonstrate the feasibility of dynamic barrier optimization by implementing it in a bytecode-rewriting Java STM. Feedback-directed customization reduces the overhead of strong isolation from 505% to 38% across 11 non-transactional benchmarks; persistent feedback data further reduces the overhead to 16%. Dynamic optimization accelerates a multi-threaded transactional benchmark by 31% for weakly-isolated execution and 34% for strongly-isolated execution.

acm symposium on parallel algorithms and architectures | 2010

Implementing and evaluating nested parallel transactions in software transactional memory

Woongki Baek; Nathan Grasso Bronson; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) is a promising technique that simplifies parallel programming for shared-memory applications. To date, most TM systems have been designed to efficiently support single-level parallelism. To achieve widespread use and maximize performance gains, TM must support nested parallelism available in many applications and supported by several programming models. We present NesTM, a software TM (STM) system that supports closed-nested parallel transactions. NesTM is based on a high-performance, blocking STM that uses eager version management and word-granularity conflict detection. Its algorithm targets the state and runtime overheads of nested parallel transactions. We also describe several subtle correctness issues in supporting nested parallel transactions in NesTM and discuss their performance impact. Through our evaluation, we quantitatively analyze the performance of NesTM using STAMP applications and microbenchmarks based on concurrent data structures. First, we show that the performance overhead of NesTM is reasonable when single-level parallelism is used. Second, we quantify the incremental overhead of NesTM when the parallelism is exploited in deeper nesting levels and draw conclusions that can be useful in designing a nesting-aware TM runtime environment. Finally, we demonstrate a use-case where nested parallelism improves the performance of a transactional microbenchmark.

Explore More