Chi Cao Minh | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Chi Cao Minh is active.

Explore More

Publication

Featured researches published by Chi Cao Minh.

ieee international symposium on workload characterization | 2008

STAMP: Stanford Transactional Applications for Multi-Processing

Chi Cao Minh; JaeWoong Chung; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

international symposium on computer architecture | 2007

An effective hybrid transactional memory system with strong isolation guarantees

Chi Cao Minh; Martin Trautmann; JaeWoong Chung; Austen McDonald; Nathan Grasso Bronson; Jared Casper; Christos Kozyrakis; Kunle Olukotun

We propose signature-accelerated transactional memory (SigTM), ahybrid TM system that reduces the overhead of software transactions. SigTM uses hardware signatures to track the read-set and write-set forpending transactions and perform conflict detection between concurrent threads. All other transactional functionality, including dataversioning, is implemented in software. Unlike previously proposed hybrid TM systems, SigTM requires no modifications to the hardware caches, which reduces hardware cost and simplifies support for nested transactions and multithreaded processor cores. SigTM is also the first hybrid TM system to provide strong isolation guarantees between transactional blocks and non-transactional accesses without additional read and write barriers in non-transactional code. Using a set of parallel programs that make frequent use of coarse-grain transactions, we show that SigTM accelerates software transactions by 30% to 280%. For certain workloads, SigTM can match the performance of a full-featured hardware TM system, while for workloads with large read-sets it can be up to two times slower. Overall, we show that SigTM combines the performance characteristics and strong isolation guarantees of hardware TM implementations with the low cost and flexibility of software TM systems.

international symposium on computer architecture | 2006

Architectural Semantics for Practical Transactional Memory

Austen McDonald; JaeWoong Chung; Brian D. Carlstrom; Chi Cao Minh; Hassan Chafi; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current TM systems cannot support basic features of modern programming languages and operating systems such as transparent library calls, conditional synchronization, system calls, I/O, and runtime exceptions. This paper presents a comprehensive instruction set architecture (ISA) for TM systems. Our proposal introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback. These mechanisms provide a flexible interface to implement programming language and operating system functionality. We also show that these mechanisms are practical to implement at the ISA and microarchitecture level for various TM systems. Using an execution-driven simulation, we demonstrate both the functionality (e.g., I/O and conditional scheduling within transactions) and performance potential (2.2� improvement for SPECjbb2000) of the proposed mechanisms. Overall, this paper establishes a rich and efficient interface to foster both hardware and software research on transactional memory.

programming language design and implementation | 2006

The Atomos transactional programming language

Brian D. Carlstrom; Austen McDonald; Hassan Chafi; JaeWoong Chung; Chi Cao Minh; Christoforos E. Kozyrakis; Kunle Olukotun

Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with simpler transactional alternatives.The Atomos watch statement allows programmers to specify fine-grained watch sets used with the Atomos retry conditional waiting statement for efficient transactional conflict-driven wakeup even in transactional memory systems with a limited number of transactional contexts. Atomos supports open-nested transactions, which are necessary for building both scalable application programs and virtual machine implementations.The implementation of the Atomos scheduler demonstrates the use of open nesting within the virtual machine and introduces the concept of transactional memory violation handlers that allow programs to recover from data dependency violations without rolling back.Atomos programming examples are given to demonstrate the usefulness of transactional programming primitives. Atomos and Java are compared through the use of several benchmarks. The results demonstrate both the improvements in parallel programming ease and parallel program performance provided by Atomos.

high-performance computer architecture | 2007

A Scalable, Non-blocking Approach to Transactional Memory

Hassan Chafi; Jared Casper; Brian D. Carlstrom; Austen McDonald; Chi Cao Minh; Woongki Baek; Christos Kozyrakis; Kunle Olukotun

Transactional memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, not only does it need to deliver on these promises, but it needs to scale to a high number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on transactional coherence and consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors

high-performance computer architecture | 2006

The common case transactional behavior of multithreaded programs

JaeWoong Chung; Hassan Chafi; Chi Cao Minh; Austen McDonald; Brian D. Carlstrom; Christos Kozyrakis; Kunle Olukotun

Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for the upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common case behavior of real application. This work addresses this problem by analyzing the common case transactional behavior for 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, distribution of read-set and write-set size, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both non-blocking synchronization and speculative parallelization.

architectural support for programming languages and operating systems | 2006

Tradeoffs in transactional memory virtualization

JaeWoong Chung; Chi Cao Minh; Austen McDonald; Travis Skare; Hassan Chafi; Brian D. Carlstrom; Christos Kozyrakis; Kunle Olukotun

For transactional memory (TM) to achieve widespread acceptance, transactions should not be limited to the physical resources of any specific hardware implementation. TM systems should guarantee correct execution even when transactions exceed scheduling quanta, overflow the capacity of hardware caches and physical memory, or include more independent nesting levels than what is supported in hardware. Existing proposals for TM virtualization are either incomplete or rely on complex hardware implementations, which are an overkill if virtualization is invoked infrequently in the common case.We present eXtended Transactional Memory (XTM), the first TM virtualization system that virtualizes all aspects of transactional execution (time, space, and nesting depth). XTM is implemented in software using virtual memory support. It operates at page granularity, using private copies of overflowed pages to buffer memory updates until the transaction commits and snapshots of pages to detect interference between transactions. We also describe two enhancements to XTM that use limited hardware support to address key performance bottlenecks.We compare XTM to hardwarebased virtualization using both real applications and synthetic microbenchmarks. We show that despite being software-based, XTM and its enhancements are competitive with hardware-based alternatives. Overall, we demonstrate that XTM provides a complete, flexible, and low-cost mechanism for practical TM virtualization.

international conference on parallel architectures and compilation techniques | 2007

The OpenTM Transactional Application Programming Interface

Woongki Baek; Chi Cao Minh; Martin Trautmann; Christos Kozyrakis; Kunle Olukotun

Transactional Memory (TM) simplifies parallel programming by supporting atomic and isolated execution of user-identified tasks. To date, TM programming has re quired the use of libraries that make it difficult to achieve scalable performance with code that is easy to develop and maintain. For TM programming to become practical, it is important to integrate TM into familiar, high-level environments for parallel programming. This paper presents OpenTM, an application programming interface (API) for parallel programming with transactions. OpenTM extends OpenMP, a widely used API for shared-memory parallel programming, with a set of compiler directives to express non-blocking synchronization and speculative parallelization based on memory transactions. We also present a portable OpenTM implementation that produces code for hardware, software, and hybrid TM systems. The implementation builds upon the OpenMP support in the GCC compiler and includes a runtime for the C programming language. We evaluate the performance and programmability features of OpenTM. We show that it delivers the performance of fine-grain locks at the programming simplicity of coarse- grain locks. Compared to transactional programming with lower-level interfaces, it removes the burden of manual annotations for accesses to shared variables and enables easy changes of the scheduling and contention management policies. Overall, OpenTM provides a practical and efficient TM programming environment within the familiar scope of OpenMP.

international conference on parallel architectures and compilation techniques | 2005

Characterization of TCC on chip-multiprocessors

Austen McDonald; JaeWoong Chung; Hassan Chafi; Chi Cao Minh; Brian D. Carlstrom; Lance Hammond; Christos Kozyrakis; Kunle Olukotun

Transactional coherence and consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization, coherence, and consistency. TCC has the potential to simplify parallel program development and optimization by providing a smooth transition from sequential to parallel programs. In this paper, we study the implementation of TCC on chip-multiprocessors (CMPs). We explore design alternatives such as the granularity of state tracking, double-buffering, and write-update and write-invalidate protocols. Furthermore, we characterize the performance of TCC in comparison to conventional snoopy cache coherence (SCC) using parallel applications optimized for each scheme. We conclude that the two coherence schemes perform similarly, with each scheme having a slight advantage for some applications. The bandwidth requirements of TCC are slightly higher but well within the capabilities of CMP systems. Also, we find that overflow of speculative state can be effectively handled by a simple victim cache. Our results suggest TCC can provide its programming advantages without compromising the performance expected from well-tuned parallel applications.

ieee international symposium on workload characterization | 2010

Analysis on semantic transactional memory footprint for hardware transactional memory

Jaewoong Chung; Dhruva R. Chakrabarti; Chi Cao Minh

We analyze various characteristics of semantic transactional memory footprint (STMF) that consists of only the memory accesses the underlying hardware transactional memory (HTM) system has to manage for the correct execution of transactional programs. Our analysis shows that STMF can be significantly smaller than declarative transactional memory footprint (DTMF) that contains all memory accesses within transaction boundaries (i.e., only 8.3% of DTMF in the applications examined). This result encourages processor designers and software toolchain developers to explore new design points for low-cost HTM systems and intelligent software toolchains to find and leverage STMF efficiently. We identify seven code patterns that belong to DTMF, but not to STMF, and show that they take up 91.7% of all memory accesses in transactional boundaries, on average, for the transactional programs examined. A new instruction prefix is proposed to express STMF efficiently, and the existing compiler techniques are examined to check their applicability to deduce STMF from DTMF. Our trace analysis shows that using STMF significantly reduces the ratio of transactions overflowing a 32KB L1 cache, from 12.80% to 2.00%, and substantially lowers the false positive probability of Bloom filters used for transaction signature management, from 23.60% to less than 0.001%. The simulation result shows that the STAMP applications with the STMF expression run 40% faster on average than those with the DTMF expression.

Explore More