James Cownie | Researchain

Publication

Featured research published by James Cownie.

symposium on code generation and optimization | 2010

PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs

Harish Patil; Cristiano Pereira; Mack Stallcup; Gregory Lueck; James Cownie

Analysis of parallel programs is hard mainly because their behavior changes from run to run. We present an execution capture and deterministic replay system that enables repeatable analysis of parallel programs. Our goal is to provide an easy-to-use framework for capturing, deterministically replaying, and analyzing the execution of large programs with reasonable runtime and disk usage. Our system, called PinPlay, is based on the popular Pin dynamic instrumentation system and is therefore easy to use. PinPlay extends the capability of Pin-based analysis by providing a tool for capturing one execution instance of a program (as log files called pinballs) and by allowing Pin-based tools to run off the captured execution. Most Pintools can be trivially modified to work off pinballs, so they perform their usual analysis with guaranteed repeatability. Furthermore, capture and replay work across operating systems (Windows to Linux) because the pinball format is independent of the operating system. We have used PinPlay to analyze and deterministically debug large parallel programs running trillions of instructions. This paper describes the design of PinPlay and its applications for analyses such as simulation point selection, tracing, and debugging.

conference on object-oriented programming systems, languages, and applications | 2008

Design and implementation of transactional constructs for C/C++

Yang Ni; Adam Welc; Ali-Reza Adl-Tabatabai; Moshe Bach; Sion Berkowits; James Cownie; Robert Geva; Sergey Kozhukow; Ravi Narayanaswamy; Jeffrey V. Olivier; Serguei V. Preis; Bratin Saha; Ady Tal; Xinmin Tian

This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library. The transactional language constructs support C++ language features including classes, inheritance, virtual functions, exception handling, and templates. The compiler automatically instruments the program for transactional execution and optimizes TM overheads. The runtime library implements multiple execution modes and a novel STM algorithm that supports both optimistic and pessimistic concurrency control. The runtime switches a transaction's execution mode dynamically to improve performance and to handle calls to precompiled functions and I/O libraries. We present experimental results on 8 cores (two quad-core CPUs) running a set of 20 non-trivial parallel programs. Our measurements show that our system scales well as the number of cores increases and that our compiler and runtime optimizations improve scalability.

international workshop on openmp | 2003

DMPL: an OpenMP DLL debugging interface

James Cownie; John DelSignore; Bronis R. de Supinski; Karen H. Warren

OpenMP is a widely adopted standard for threading directives across compiler implementations. The standard is very successful since it provides application writers with a simple, portable programming model for introducing shared memory parallelism into their codes. However, the standards do not address key issues for supporting that programming model in development tools such as debuggers. In this paper, we present DMPL, an OpenMP debugger interface that can be implemented as a dynamically loaded library. DMPL is currently being considered by the OpenMP Tools Committee as a mechanism to bridge the development tool gap in the OpenMP standard.

international workshop on openmp | 2014

A User-Guided Locking API for the OpenMP* Application Program Interface

Hansang Bae; James Cownie; Michael Klemm; Christian Terboven

Although the OpenMP API specification defines a set of runtime routines for simple and nested locks, there is no standardized way to select different lock implementations. Programmers have to use vendor extensions to globally alter the lock implementation for the application; fine-grained control is not possible. Proper use of hardware-based speculative locks can achieve significant runtime improvements but, if used inappropriately, they can lead to severe performance penalties. Thus programmers need to be able to explicitly choose the right lock implementation on a per-lock basis. In this paper, we extend the OpenMP API for locks with functions to provide such hints to the implementation. We also extend the syntax and semantics of the critical construct with clauses to contain hints. Our performance results for micro-benchmarks show that the runtime selection of lock implementations does not add any noticeable overhead. We also show that using an appropriate runtime hint can improve application performance.
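
A per-lock hint and a hinted critical construct might look like the following C sketch. It uses the names standardized in OpenMP 4.5 (`omp_init_lock_with_hint`, `omp_lock_hint_speculative`, and the `hint` clause on a named `critical`). Whether these match the names proposed in the paper is an assumption, since the abstract does not list them. The function names and the serial stand-ins in the `#else` branch are hypothetical and exist only so the sketch also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: illustration only, not a real runtime). */
typedef int omp_lock_t;
enum { omp_lock_hint_speculative = 8 };
static void omp_init_lock_with_hint(omp_lock_t *l, int hint) { (void)hint; *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Per-lock hint: this one lock asks the runtime for a speculative
   implementation; other locks in the program are unaffected. */
long count_with_hinted_lock(long n)
{
    long counter = 0;
    omp_lock_t lock;
    omp_init_lock_with_hint(&lock, omp_lock_hint_speculative);

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        omp_set_lock(&lock);
        counter++;              /* protected update of the shared counter */
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    return counter;
}

/* The same hint expressed as a clause on a named critical construct. */
long count_with_hinted_critical(long n)
{
    long counter = 0;

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp critical (counter_update) hint(omp_lock_hint_speculative)
        counter++;
    }

    return counter;
}
```

The hint is advisory: a runtime without speculative lock support may fall back to an ordinary lock, so both functions return `n` either way. Any performance difference depends on the hardware and on how often the protected regions conflict.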

international workshop on openmp | 2015

On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms II: User-Guided Speculative Locks

Barna L. Bihari; Hansang Bae; James Cownie; Michael Klemm; Christian Terboven; Lori A. Diachin

In this paper we continue our investigations started in [8] into the effects of using different synchronization mechanisms in OpenMP-threaded iterative mesh optimization algorithms. We port our test code to the Intel® Xeon® processor (former codename “Haswell”) by employing a user-guided locking API for OpenMP [4] that provides a general and unified user interface and runtime framework. Since the Intel® Transactional Synchronization Extensions (TSX) provide two different options for speculation — Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM) — we compare a total of four different run modes: (i) HLE, (ii) RTM, (iii) OpenMP critical, and (iv) “unsynchronized”. As we did in [8], we find that either speculative execution option always outperforms the other two modes in terms of their convergence characteristics. Even with their higher overhead, the TSX options are very competitive when it comes to runtime performance measured with the “time-to-convergence” criterion introduced in [8].

Archive | 2004