Justin E. Gottschlich
Intel
Publications
Featured research published by Justin E. Gottschlich.
international conference on parallel architectures and compilation techniques | 2014
Irina Calciu; Justin E. Gottschlich; Tatiana Shpeisman; Gilles Pokam; Maurice Herlihy
The Intel Haswell processor includes restricted transactional memory (RTM), which is the first commodity-based hardware transactional memory (HTM) to become publicly available. However, like other real HTMs, such as IBM's Blue Gene/Q, Haswell's RTM is best-effort, meaning it provides no transactional forward progress guarantees. Because of this, a software fallback system must be used in conjunction with Haswell's RTM to ensure transactional programs execute to completion. To complicate matters, Haswell does not provide escape actions. Without escape actions, non-transactional instructions cannot be executed within the context of a hardware transaction, thereby restricting the ways in which a software fallback can interact with the HTM. As such, the challenge of creating a scalable hybrid TM (HyTM) that uses Haswell's RTM and a software TM (STM) fallback is exacerbated. In this paper, we present Invyswell, a novel HyTM that exploits the benefits and manages the limitations of Haswell's RTM. After describing Invyswell's design, we show that it outperforms NOrec, a state-of-the-art STM, by 35%, Hybrid NOrec, NOrec's hybrid implementation, by 18%, and Haswell's hardware-only lock elision by 25% across all STAMP benchmarks.
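To make the best-effort nature of RTM concrete, the following is a minimal sketch of the generic retry-then-fall-back-to-a-lock idiom such systems build on; it is not Invyswell's algorithm, and the names (g_fallback, run_atomically) are illustrative assumptions.

```cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort; compile with -mrtm on RTM hardware
#include <atomic>

// Sketch of the classic best-effort RTM pattern: retry in hardware a few
// times, then fall back to a global software lock. Not Invyswell itself.
static std::atomic<bool> g_fallback{false};

template <typename F>
void run_atomically(F&& critical_section, int max_retries = 3) {
    for (int attempt = 0; attempt < max_retries; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Subscribe to the fallback lock: abort if another thread holds
            // it, so hardware transactions serialize with the fallback path.
            if (g_fallback.load(std::memory_order_relaxed))
                _xabort(0xff);
            critical_section();
            _xend();
            return;
        }
        // Transaction aborted; 'status' encodes the hardware abort reason.
    }
    // Software fallback: run non-speculatively under the global lock.
    while (g_fallback.exchange(true, std::memory_order_acquire)) { /* spin */ }
    critical_section();
    g_fallback.store(false, std::memory_order_release);
}
```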
international symposium on computer architecture | 2013
Gilles Pokam; Klaus Danne; Cristiano Pereira; Rolf Kassa; Tim Kranich; Shiliang Hu; Justin E. Gottschlich; Nima Honarmand; Nathan Dautenhahn; Samuel T. King; Josep Torrellas
There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality. This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel. This paper's focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.
international symposium on microarchitecture | 2011
Gilles Pokam; Cristiano Pereira; Shiliang Hu; Ali-Reza Adl-Tabatabai; Justin E. Gottschlich; Jungwoo Ha; Youfeng Wu
Shared memory multiprocessors are difficult to program because of the non-deterministic ways in which the memory operations from different threads interleave. To address this issue, many hardware-based memory race recorders have been proposed that efficiently log an ordering of the shared memory interleavings between threads for deterministic replay. These approaches are challenging to integrate into current processors because they change the cache subsystem or the coherence protocol, and they mostly support a sequentially consistent memory model. In this paper, we describe CoreRacer, a chunk-based memory race recorder architecture for multicore x86 TSO processors. CoreRacer does not modify the cache subsystem and yet it still integrates into the x86 TSO memory model. We show that by leveraging a specific x86 feature, the invariant timestamp, CoreRacer maintains ordering among chunks without piggybacking on cache coherence messages. We provide a detailed implementation and evaluation of CoreRacer on a cycle-accurate x86 simulator. We show that its integration cost into x86 is minimal and its overhead has negligible effect on performance.
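As a rough illustration of what a chunk-based race recorder logs, here is a sketch of a per-chunk log record; the field names and layout are assumptions for illustration, not CoreRacer's actual format.

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: one log record per "chunk" (a run of instructions a
// core executed without an observed conflict), stamped with an invariant
// timestamp so chunks from different cores can be totally ordered at replay.
struct ChunkLogEntry {
    uint64_t invariant_timestamp;  // globally comparable timestamp (e.g. invariant TSC)
    uint32_t core_id;              // core that executed the chunk
    uint32_t instruction_count;    // chunk size in retired instructions
};

// A replayer sorts entries by timestamp and re-executes each core's chunks
// in that global order to reproduce the recorded interleaving.
using ChunkLog = std::vector<ChunkLogEntry>;
```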
international conference on parallel architectures and compilation techniques | 2012
Justin E. Gottschlich; Maurice Herlihy; Gilles Pokam; Jeremy G. Siek
This paper presents TMProf, a transactional memory (TM) profiler, based on three visualization principles. These principles are (i) the precise graphical representation of transaction interactions including cross-correlated information and source code, (ii) visualized soft real-time playback of concurrently executing transactions, and (iii) dynamic visualizations of multiple executions. We describe how these principles break new ground and create new challenges for TM profilers. We discuss our experience using TMProf with InvalSTM, a state-of-the-art software TM, and show how TMProf's feedback led to the design of two new contention managers (CMs). We demonstrate the performance benefits of these CMs, which generally led to improved performance as the amount of work and the number of threads per benchmark increase. Our experimental results show that iBalanced, one of our newly designed CMs, can increase transaction throughput by nearly 10× over iFair, InvalSTM's previously best-performing CM.
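For readers unfamiliar with contention managers, the sketch below shows the role a CM plays when two transactions conflict; the interface and heuristic are illustrative assumptions, not InvalSTM's iBalanced.

```cpp
#include <cstdint>

// Illustrative contention-manager interface: on a conflict between two
// transactions, decide which one aborts. Not InvalSTM's iBalanced.
struct TxStats {
    uint64_t work_done;    // e.g. reads + writes performed so far
    uint32_t num_retries;  // times this transaction has already aborted
};

enum class Decision { AbortSelf, AbortOther };

// A balance-style heuristic: keep the transaction that has invested more
// work, weighting transactions that have already retried several times.
Decision resolve_conflict(const TxStats& self, const TxStats& other) {
    const uint64_t self_score  = self.work_done  + 100ull * self.num_retries;
    const uint64_t other_score = other.work_done + 100ull * other.num_retries;
    return self_score >= other_score ? Decision::AbortOther : Decision::AbortSelf;
}
```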
international conference on parallel architectures and compilation techniques | 2013
Justin E. Gottschlich; Gilles Pokam; Cristiano Pereira; Youfeng Wu
To reduce the complexity of debugging multithreaded programs, researchers have developed many techniques that automatically detect bugs that arise from shared memory errors. These techniques can identify a wide range of bugs, but it can be challenging for a programmer to use them to reproduce a specific bug of interest, because they were designed not for individual bug reproduction but for an exploratory search for possible bugs. To address this concern we present concurrent predicates (CPs) and concurrent predicate expressions (CPEs), which allow programmers to single out a specific bug by specifying the schedule and program state that must be satisfied for the bug to be reproduced. We present the recipes, that is, the mechanical processes, we use to reproduce data races, atomicity violations, and deadlocks with CP and CPE. We then show how these recipes apply to the diagnosis and reproduction of 13 handcrafted bugs, five real-world application bugs from RADBench, and three previously unresolved bugs from TBoost.STM, which now includes the fixes we generated using CP and CPE.
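The toy class below conveys the flavor of forcing a specific interleaving by blocking on a predicate over program state; it is an illustration only and not the CP/CPE API from the paper.

```cpp
#include <condition_variable>
#include <mutex>

// Toy illustration: one thread blocks until another thread has established
// the program state needed to expose the bug, pinning down the schedule.
// This class is not the paper's CP/CPE mechanism.
class ConcurrentPredicate {
public:
    void signal() {                       // call where the buggy state is produced
        { std::lock_guard<std::mutex> g(m_); satisfied_ = true; }
        cv_.notify_all();
    }
    void wait() {                         // call where the bug requires that state
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [this] { return satisfied_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool satisfied_ = false;
};
```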
languages and compilers for parallel computing | 2011
Justin E. Gottschlich; Jaewoong Chung
Transactional memory (TM) is a promising alternative to mutual exclusion. In spite of this, it may be unrealistic for TM programs to be devoid of locks due to their abundant use in legacy software systems. Consequently, for TMs to be practical they may need to manage the interaction of transactions and locks when they access the same shared-memory. This paper presents two algorithms, one coarse-grained and one fine-grained, that improve the state-of-the-art performance for TMs that support the concurrent execution of locks and transactions. We also discuss the programming language constructs that are necessary to implement such algorithms and present analyses that compare and contrast our approach with prior work. Our analyses demonstrate that, (i) in general, our proposed coarse- and fine-grained algorithms improve program concurrency but (ii) an algorithm’s concurrent throughput potential does not always lead to realized performance gains.
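One plausible coarse-grained scheme for making transactions aware of lock activity is sketched below, purely to make the design space concrete; it is not claimed to be either of the paper's algorithms, and all names are invented for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// A plausible coarse-grained scheme, not the paper's algorithms: locks bump
// global counters, and software transactions consult them so they never
// commit concurrently with lock-protected critical sections.
static std::atomic<uint64_t> g_lock_acquires{0};  // total acquisitions ever
static std::atomic<uint64_t> g_locks_held{0};     // locks currently held

class TxAwareLock {
public:
    void lock() {
        m_.lock();
        g_lock_acquires.fetch_add(1, std::memory_order_acq_rel);
        g_locks_held.fetch_add(1, std::memory_order_acq_rel);
    }
    void unlock() {
        g_locks_held.fetch_sub(1, std::memory_order_acq_rel);
        m_.unlock();
    }
private:
    std::mutex m_;
};

// Transaction-side checks (pseudo-usage inside an STM):
//   at begin : abort unless g_locks_held == 0; remember s = g_lock_acquires
//   at commit: abort unless g_lock_acquires == s
```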
international workshop on openmp | 2014
Michael Wong; Eduard Ayguadé; Justin E. Gottschlich; Victor Luchangco; Bronis R. de Supinski; Barna L. Bihari
The OpenMP specification lacks a composable shared memory concurrency mechanism: the current OpenMP concurrency mechanisms, such as OMP critical, locks, or atomics, do not support composition. In this paper, we motivate the need for transactional memory (TM) in OpenMP. The chief reason is to support composition of realistic programs, but we also consider whether TM is easier to program than locks, the use case for TM, and whether a software-only TM can outperform traditional locking through a survey of recent publications. This paper advances upon previous proposals of OpenMP TM by introducing a new construct specifically to handle irrevocable actions, which is also composable. It also proposes a pure atomic transaction construct as well as the concept of transaction safety. Further, we examine how our proposed construct integrates with current OpenMP constructs.
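The composition problem with omp critical can be seen in a few lines; the account-transfer example below is illustrative and not taken from the paper.

```cpp
#include <omp.h>

// Each helper is atomic on its own, but transfer() is not atomic as a whole,
// and wrapping its body in another critical(accounts) region is disallowed
// (same-name nesting) and deadlocks in practice. This is the composition gap
// a transaction construct would close. Names here are illustrative.
static double balance_a = 100.0, balance_b = 0.0;

void withdraw_a(double amt) {
    #pragma omp critical(accounts)
    balance_a -= amt;
}

void deposit_b(double amt) {
    #pragma omp critical(accounts)
    balance_b += amt;
}

void transfer(double amt) {   // another thread may observe the intermediate state
    withdraw_a(amt);
    deposit_b(amt);
}
```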
international conference on parallel architectures and compilation techniques | 2015
Yujie Liu; Justin E. Gottschlich; Gilles Pokam; Michael F. Spear
The availability of commercial hardware transactional memory (TM) systems has not yet been met with a rise in the number of large-scale programs that use memory transactions explicitly. A significant impediment to the use of TM is the lack of tool support, specifically profilers that can identify and explain performance anomalies. In this paper, we introduce an end-to-end system that enables low-overhead performance profiling of large-scale transactional programs. We present algorithms and an implementation for Intel's Haswell processors. With our system, it is possible to record a transactional program's execution with minimal overhead, and then replay it within a custom profiling tool to identify causes of contention and aborts, down to the granularity of individual memory accesses. Evaluation shows that our algorithms have low overhead, and our tools enable programmers to effectively explain performance anomalies.
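The raw signal such a profiler starts from is the RTM abort status word returned by _xbegin(); the sketch below merely decodes it using the standard intrinsics header and is not the paper's record-and-replay tool.

```cpp
#include <immintrin.h>   // RTM intrinsics and _XABORT_* status bits
#include <cstdio>

// Decodes why a hardware transaction aborted, given the status value
// returned by _xbegin(). A profiler like the one described aggregates this
// kind of information per transaction; this sketch is not that tool.
void log_abort_reason(unsigned status) {
    if (status & _XABORT_CONFLICT) std::printf("abort: data conflict\n");
    if (status & _XABORT_CAPACITY) std::printf("abort: read/write set overflow\n");
    if (status & _XABORT_EXPLICIT)
        std::printf("abort: explicit _xabort, code=%u\n", _XABORT_CODE(status));
    if (status & _XABORT_RETRY)    std::printf("abort: hardware suggests retrying\n");
    if (status & _XABORT_NESTED)   std::printf("abort: occurred in a nested transaction\n");
}
```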
arXiv: Artificial Intelligence | 2018
Justin E. Gottschlich; Armando Solar-Lezama; Nesime Tatbul; Michael Carbin; Martin C. Rinard; Regina Barzilay; Saman P. Amarasinghe; Joshua B. Tenenbaum; Tim Mattson
In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.
Archive | 2012
Tim Kranich; Gilles Pokam; Justin E. Gottschlich; Klaus Danne; Rolf Kassa; Shiliang Hu; Cristiano Pereira