Bratin Saha | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bratin Saha is active.

Explore More

Publication

Featured researches published by Bratin Saha.

acm sigplan symposium on principles and practice of parallel programming | 2006

McRT-STM: a high performance software transactional memory system for a multi-core runtime

Bratin Saha; Ali-Reza Adl-Tabatabai; Richard L. Hudson; Chi Cao Minh; Benjamin C. Hertzberg

Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex based synchronization). Unfortunately, lock based synchronization often leads to deadlocks, makes fine-grained synchronization difficult, hinders composition of atomic primitives, and provides no support for error recovery. Transactions avoid many of these problems, and therefore, promise to ease concurrent programming.We describe a software transactional memory (STM) system that is part of McRT, an experimental Multi-Core RunTime. The McRT-STM implementation uses a number of novel algorithms, and supports advanced features such as nested transactions with partial aborts, conditional signaling within a transaction, and object based conflict detection for C/C++ applications. The McRT-STM exports interfaces that can be used from C/C++ programs directly or as a target for compilers translating higher level linguistic constructs.We present a detailed performance analysis of various STM design tradeoffs such as pessimistic versus optimistic concurrency, undo logging versus write buffering, and cache line based versus object based conflict detection. We also show a MCAS implementation that works on arbitrary values, coexists with the STM, and can be used as a more efficient form of transactional memory. To provide a baseline we compare the performance of the STM with that of fine-grained and coarse-grained locking using a number of concurrent data structures on a 16-processor SMP system. We also show our STM performance on a non-synthetic workload -- the Linux sendmail application.

programming language design and implementation | 2006

Compiler and runtime support for efficient software transactional memory

Ali-Reza Adl-Tabatabai; Brian T. Lewis; Vijay Menon; Brian R. Murphy; Bratin Saha; Tatiana Shpeisman

Programmers have traditionally used locks to synchronize concurrent access to shared data. Lock-based synchronization, however, has well-known pitfalls: using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming. Transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications, opening the door for new compiler optimizations that target the overheads of transactional memory.This paper presents compiler and runtime optimizations for transactional memory language constructs. We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. Our system efficiently implements nested transactions that support both composition of transactions and partial roll back. Our JIT compiler is the first to optimize the overheads of STM, and we show novel techniques for enabling JIT optimizations on STM operations. We measure the performance of our optimizations on a 16-way SMP running multi-threaded transactional workloads. Our results show that these techniques enable transactional memorys performance to compete with that of well-tuned synchronization.

programming language design and implementation | 2007

Enforcing isolation and ordering in STM

Tatiana Shpeisman; Vijay Menon; Ali-Reza Adl-Tabatabai; Steven Balensiefer; Dan Grossman; Richard L. Hudson; Katherine F. Moore; Bratin Saha

Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems via strong atomicity. We demonstrate how to implement non-transactional data accesses via efficient read and write barriers, and we present compiler optimizations that further reduce the overheads of these barriers. We introduce a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper and a static not-accessed-in-transaction analysis that removes many barriers completely. Our results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.

acm sigplan symposium on principles and practice of parallel programming | 2007

Open nesting in software transactional memory

Yang Ni; Vijay Menon; Ali-Reza Adl-Tabatabai; Antony L. Hosking; Richard L. Hudson; J. Eliot B. Moss; Bratin Saha; Tatiana Shpeisman

Transactional memory (TM) promises to simplify concurrent programming while providing scalability competitive to fine-grained locking. Language-based constructs allow programmers to denote atomic regions declaratively and to rely on the underlying system to provide transactional guarantees along with concurrency. In contrast with fine-grained locking, TM allows programmers to write simpler programs that are composable and deadlock-free. TM implementations operate by tracking loads and stores to memory and by detecting concurrent conflicting accesses by different transactions. By automating this process, they greatly reduce the programmers burden, but they also are forced to be conservative. Incertain cases, conflicting memory accesses may not actually violate the higher-level semantics of a program, and a programmer may wish to allow seemingly conflicting transactions to execute concurrently. Open nested transactions enable expert programmers to differentiate between physical conflicts, at the level of memory, and logical conflicts that actually violate application semantics. A TMsystem with open nesting can permit physical conflicts that are not logical conflicts, and thus increase concurrency among application threads. Here we present an implementation of open nested transactions in a Java-based software transactional memory (STM)system. We describe new language constructs to support open nesting in Java, and we discuss new abstract locking mechanisms that a programmer can use to prevent logical conflicts. We demonstrate how these constructs can be mapped efficiently to existing STM data structures. Finally, we evaluate our system on a set of Java applications and data structures, demonstrating how open nesting can enhance application scalability.

international symposium on microarchitecture | 2006

Architectural Support for Software Transactional Memory

Bratin Saha; Ali-Reza Adl-Tabatabai; Quinn A. Jacobson

Transactional memory provides a concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. Researchers have proposed several different implementations of transactional memory, broadly classified into software transactional memory (STM) and hardware transactional memory (HTM). Both approaches have their pros and cons: STMs provide rich and flexible transactional semantics on stock processors but incur significant overheads. HTMs, on the other hand, provide high performance but implement restricted semantics or add significant hardware complexity. This paper is the first to propose architectural support for accelerating transactions executed entirely in software. We propose instruction set architecture (ISA) extensions and novel hardware mechanisms that improve STM performance. We adapt a high-performance STM algorithm supporting rich transactional semantics to our ISA extensions (called hardware accelerated software transactional memory or HASTM). HASTM accelerates fully virtualized nested transactions, supports language integration, and provides both object-based and cache-line based conflict detection. We have implemented HASTM in an accurate multi-core IA32 simulator. Our simulation results show that (1) HASTM single-thread performance is comparable to a conventional HTM implementation; (2) HASTM scaling is comparable to a STM implementation; and (3) HASTM is resilient to spurious aborts and can scale better than HTM in a multi-core setting. Thus, HASTM provides the flexibility and rich semantics of STM, while giving the performance of HTM

symposium on code generation and optimization | 2007

Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language

Cheng Wang; Wei-Yu Chen; Youfeng Wu; Bratin Saha; Ali-Reza Adl-Tabatabai

Transactional memory offers significant advantages for concurrency control compared to locks. This paper presents the design and implementation of transactional memory constructs in an unmanaged language. Unmanaged languages pose a unique set of challenges to transactional memory constructs - for example, lack of type and memory safety, use of function pointers, aliasing of local variables, and others. This paper describes novel compiler and runtime mechanisms that address these challenges and optimize the performance of transactions in an unmanaged environment. We have implemented these mechanisms in a production-quality C compiler and a high-performance software transactional memory runtime. We measure the effectiveness of these optimizations and compare the performance of lock-based versus transaction-based programming on a set of concurrent data structures and the SPLASH-2 benchmark suite. On a 16 processor SMP system, the transaction-based version of the SPLASH-2 benchmarks scales much better than the coarse-grain locking version and performs comparably to the fine-grain locking version. Compiler optimizations significantly reduce the overheads of transactional memory so that, on a single thread, the transaction-based version incurs only about 6.4% overhead compared to the lock-based version for the SPLASH-2 benchmark suite. Thus, our system is the first to demonstrate that transactions integrate well with an unmanaged language, and can perform as well as fine-grain locking while providing the programming ease of coarse-grain locking even on an unmanaged environment

conference on object-oriented programming systems, languages, and applications | 2008

Design and implementation of transactional constructs for C/C++

Yang Ni; Adam Welc; Ali-Reza Adl-Tabatabai; Moshe Bach; Sion Berkowits; James Cownie; Robert Geva; Sergey Kozhukow; Ravi Narayanaswamy; Jeffrey V. Olivier; Serguei V. Preis; Bratin Saha; Ady Tal; Xinmin Tian

This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library. The transactional language constructs support C++ language features including classes, inheritance, virtual functions, exception handling, and templates. The compiler automatically instruments the program for transactional execution and optimizes TM overheads. The runtime library implements multiple execution modes and implements a novel STM algorithm that supports both optimistic and pessimistic concurrency control. The runtime switches a transactions execution mode dynamically to improve performance and to handle calls to precompiled functions and I/O libraries. We present experimental results on 8 cores (two quad-core CPUs) running a set of 20 non-trivial parallel programs. Our measurements show that our system scales well as the numbers of cores increases and that our compiler and runtime optimizations improve scalability.

acm symposium on parallel algorithms and architectures | 2008

Irrevocable transactions and their applications

Adam Welc; Bratin Saha; Ali-Reza Adl-Tabatabai

Transactional memory (TM) provides a safer, more modular, and more scalable alternative to traditional lock-based synchronization. Implementing high performance TM systems has recently been an active area of research. However, current TM systems provide limited, if any, support for transactions executing irrevocable actions, such as I/O and system calls, whose side effects cannot in general be rolled back. This severely limits the ability of these systems to run commercial workloads. This paper describes the design of a transactional memory system that allows irrevocable actions to be executed inside of transactions. While one transaction is executing an irrevocable action, other transactions can still execute and commit concurrently. We use a novel mechanism called single-owner read locks to implement irrevocable actions inside transactions that maximizes concurrency and avoids overhead when the mechanism is not used. We also show how irrevocable transactions can be leveraged for contention management to handle actions whose effects may be expensive to roll back. Finally, we present a thorough performance evaluation of the irrevocability mechanism for the different usage models.

acm symposium on parallel algorithms and architectures | 2008

Practical weak-atomicity semantics for java stm

Vijay Menon; Steven Balensiefer; Tatiana Shpeisman; Ali-Reza Adl-Tabatabai; Richard L. Hudson; Bratin Saha; Adam Welc

As memory transactions have been proposed as a language-level replacement for locks, there is growing need for well-defined semantics. In contrast to database transactions, transaction memory (TM) semantics are complicated by the fact that programs may access the same memory locations both inside and outside transactions. Strongly atomic semantics, where non transactional accesses are treated as implicit single-operation transactions, remain difficult to provide without specialized hardware support or significant performance overhead. As an alternative, many in the community have informally proposed that a single global lock semantics [18,10], where transaction semantics are mapped to those of regions protected by a single global lock, provide an intuitive and efficiently implementable model for programmers. In this paper, we explore the implementation and performance implications of single global lock semantics in a weakly atomic STM from the perspective of Java, and we discuss why even recent STM implementations fall short of these semantics. We describe a new weakly atomic Java STM implementation that provides single global lock semantics while permitting concurrent execution, but we show that this comes at a significant performance cost. We also propose and implement various alternative semantics that loosen single lock requirements while still providing strong guarantees. We compare our new implementations to previous ones, including a strongly atomic STM.[24]

international symposium on memory management | 2006

McRT-Malloc: a scalable transactional memory allocator

Richard L. Hudson; Bratin Saha; Ali-Reza Adl-Tabatabai; Benjamin C. Hertzberg

Emerging multi-core processors promise to provide an exponentially increasing number of hardware threads with every generation. Applications will need to be highly concurrent to fullyuse the power of these processors. To enable maximum concurrency, libraries (such as malloc-free packages) would therefore need to use non-blocking algorithms. But lock-free algorithms are notoriously difficult to reason about and inappropriate for average programmers. Transactional memory promises to significantly ease concurrent programming for the average programmer. This paper describes a highly efficient non-blocking malloc/free algorithm that supports memory allocation and deallocation inside transactional code blocks. Thus this paper describes a memory allocator that is suitable for emerging multi-core applications, while supporting modern concurrency constructs.This paper makes several novel contributions. It is the first to integrate a software transactional memory system with a malloc/free based memory allocator. We present the first algorithm which ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup. Unlike previous lock-free malloc packages, our algorithm avoids atomic operations on typical code paths, making our algorithm substantially more efficient.

Explore More