Gilles Pokam | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Gilles Pokam is active.

Explore More

Publication

Featured researches published by Gilles Pokam.

international symposium on computer architecture | 2005

BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Satish Narayanasamy; Gilles Pokam; Brad Calder

Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the developers working in their execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point can enable deterministic replaying of a programs execution. BugNet focuses on being able to replay the applications execution and the libraries it uses, but not the operating system. But our approach provides the ability to replay an applications execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.

conference on object-oriented programming systems, languages, and applications | 2012

Maple: a coverage-driven testing tool for multithreaded programs

Jie Yu; Satish Narayanasamy; Cristiano Pereira; Gilles Pokam

Testing multithreaded programs is a hard problem, because it is challenging to expose those rare interleavings that can trigger a concurrency bug. We propose a new thread interleaving coverage-driven testing tool called Maple that seeks to expose untested thread interleavings as much as possible. It memoizes tested interleavings and actively seeks to expose untested interleavings for a given test input to increase interleaving coverage. We discuss several solutions to realize the above goal. First, we discuss a coverage metric based on a set of interleaving idioms. Second, we discuss an online technique to predict untested interleavings that can potentially be exposed for a given test input. Finally, the predicted untested interleavings are exposed by actively controlling the thread schedule while executing for the test input. We discuss our experiences in using the tool to expose several known and unknown bugs in real-world applications such as Apache and MySQL.

programming language design and implementation | 2014

Race detection for event-driven mobile applications

Chun-Hung Hsiao; Jie Yu; Satish Narayanasamy; Ziyun Kong; Cristiano Pereira; Gilles Pokam; Peter M. Chen; Jason Flinn

Mobile systems commonly support an event-based model of concurrent programming. This model, used in popular platforms such as Android, naturally supports mobile devices that have a rich array of sensors and user input modalities. Unfortunately, most existing tools for detecting concurrency errors of parallel programs focus on a thread-based model of concurrency. If one applies such tools directly to an event-based program, they work poorly because they infer false dependencies between unrelated events handled sequentially by the same thread. In this paper we present a race detection tool named CAFA for event-driven mobile systems. CAFA uses the causality model that we have developed for the Android event-driven system. A novel contribution of our model is that it accounts for the causal order due to the event queues, which are not accounted for in past data race detectors. Detecting races based on low-level races between memory accesses leads to a large number of false positives. CAFA overcomes this problem by checking for races between high-level operations. We discuss our experience in using CAFA for finding and understanding a number of known and unknown harmful races in open-source Android applications.

international conference on parallel architectures and compilation techniques | 2014

Invyswell: a hybrid transactional memory for haswell's restricted transactional memory

Irina Calciu; Justin E. Gottschlich; Tatiana Shpeisman; Gilles Pokam; Maurice Herlihy

The Intel Haswell processor includes restricted transactional memory (RTM), which is the first commodity-based hardware transactional memory (HTM) to become publicly available. However, like other real HTMs, such as IBMs Blue Gene/Q, Haswells RTM is best-effort, meaning it provides no transactional forward progress guarantees. Because of this, a software fallback system must be used in conjunction with Haswells RTM to ensure transactional programs execute to completion. To complicate matters, Haswell does not provide escape actions. Without escape actions, non-transactional instructions cannot be executed within the context of a hardware transaction, thereby restricting the ways in which a software fallback can interact with the HTM. As such, the challenge of creating a scalable hybrid TM (HyTM) that uses Haswells RTM and a software TM (STM) fallback is exacerbated. In this paper, we present Invyswell, a novel HyTM that exploits the benefits and manages the limitations of Haswells RTM. After describing Invyswells design, we show that it outperforms NOrec, a state-of-the-art STM, by 35%, Hybrid NOrec, NOrecs hybrid implementation, by 18%, and Haswells hardware-only lock elision by 25% across all STAMP benchmarks.

international symposium on microarchitecture | 2009

Architecting a chunk-based memory race recorder in modern CMPs

Gilles Pokam; Cristiano Pereira; Klaus Danne; Rolf Kassa; Ali-Reza Adl-Tabatabai

Prior work on HW support for memory race recording piggybacks time stamps on coherence messages and logs the outcome of memory races using point-to-point or chunk-based approaches. These memory race recorder (MRR) techniques are effective, but they require modifications to the cache coherence protocol that can hurt performance. In addition, prior work has mostly focused on directory coherence and considered only CMP systems with single-level cache hierarchies. Most modern CMP systems shipped today, however, implement snoop coherence and feature multilevel cache hierarchies. To be practical, a MRR must target CMPs with multilevel caches, mitigate the coherence overhead due to piggybacking, and emphasize on replay speed to broaden applicability of deterministic replay. This paper contributes three new solutions for making chunk-based MRR practical for modern CMPs. We show that MRR interactions with a cache hierarchy can degrade performance and present a novel mechanism that mitigates this degradation. We propose new mechanisms for snoop-based caches that eliminate coherence traffic overhead due to piggybacking. We finally propose new techniques for improving replay speed and introduce a novel framework for evaluating the replay speed potential of MRR designs.

international symposium on software testing and analysis | 2013

Selective mutation testing for concurrent code

Milos Gligoric; Lingming Zhang; Cristiano Pereira; Gilles Pokam

Concurrent code is becoming increasingly important with the advent of multi-cores, but testing concurrent code is challenging. Researchers are developing new testing techniques and test suites for concurrent code, but evaluating these techniques and test suites often uses a small number of real or manually seeded bugs. Mutation testing allows creating a large number of buggy programs to evaluate test suites. However, performing mutation testing is expensive even for sequential code, and the cost is higher for concurrent code where each test has to be executed for many (possibly all) thread schedules. The most widely used technique to speed up mutation testing is selective mutation, which reduces the number of mutants by applying only a subset of mutation operators such that test suites that kill all mutants generated by this subset also kill (almost) all mutants generated by all mutation operators. To date, selective mutation has been used only for sequential mutation operators. This paper explores selective mutation for concurrent mutation operators. Our results identify several sets of concurrent mutation operators that can effectively reduce the number of mutants, show that operator-based selection is slightly better than random mutant selection, and show that sequential and concurrent mutation operators are independent, demonstrating the importance of studying concurrent mutation operators.

international symposium on microarchitecture | 2006

BugNet: Recording Application-Level Execution for Deterministic Replay Debugging

Satish Narayanasamy; Gilles Pokam; Brad Calder

With softwares increasing complexity, providing efficient hardware support for software debugging is critical. Hardware support is necessary to observe and capture, with little or no overhead, the exact execution of a program. Providing this ability to developers will allow them to deterministically replay and debug an application to pin-point the root cause of a bug

international symposium on computer architecture | 2013

QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs

Gilles Pokam; Klaus Danne; Cristiano Pereira; Rolf Kassa; Tim Kranich; Shiliang Hu; Justin E. Gottschlich; Nima Honarmand; Nathan Dautenhahn; Samuel T. King; Josep Torrellas

There has been significant interest in hardware-assisted deterministic Record and Replay (RnR) systems for multithreaded programs on multiprocessors. However, no proposal has implemented this technique in a hardware prototype with full operating system support. Such an implementation is needed to assess RnR practicality. This paper presents QuickRec, the first multicore Intel Architecture (IA) prototype of RnR for multithreaded programs. QuickRec is based on QuickIA, an Intel emulation platform for rapid prototyping of new IA extensions. QuickRec is composed of a Xeon server platform with FPGA-emulated second-generation Pentium cores, and Capo3, a full software stack for managing the recording hardware from within a modified Linux kernel. This papers focus is understanding and evaluating the implementation issues of RnR on a real platform. Our effort leads to some lessons learned, as well as to some pointers for future research. We demonstrate that RnR can be implemented efficiently on a real multicore IA system. In particular, we show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead. However, the software stack incurs an average recording overhead of nearly 13%, which must be reduced to enable always-on use of RnR.

architectural support for programming languages and operating systems | 2013

Cyrus: unintrusive application-level record-replay for replay parallelism

Nima Honarmand; Nathan Dautenhahn; Josep Torrellas; Samuel T. King; Gilles Pokam; Cristiano Pereira

Architectures for deterministic record-replay (R&R) of multithreaded code are attractive for program debugging, intrusion analysis, and fault-tolerance uses. However, very few of the proposed designs have focused on maximizing replay speed -- a key enabling property of these systems. The few efforts that focus on replay speed require intrusive hardware or software modifications, or target whole-system R&R rather than the more useful application-level R&R. This paper presents the first hardware-based scheme for unintrusive, application-level R&R that explicitly targets high replay speed. Our scheme, called Cyrus, requires no modification to commodity snoopy cache coherence. It introduces the concept of an on-the-fly software Backend Pass during recording which, as the log is being generated, transforms it for high replay parallelism. This pass also fixes-up the log, and can flexibly trade-off replay parallelism for log size. We analyze the performance of Cyrus using full system (OS plus hardware) simulation. Our results show that Cyrus has negligible recording overhead. In addition, for 8-processor runs of SPLASH-2, Cyrus attains an average replay parallelism of 5, and a replay speed that is, on average, only about 50% lower than the recording speed.

international symposium on microarchitecture | 2011

CoreRacer: a practical memory race recorder for multicore x86 TSO processors

Gilles Pokam; Cristiano Pereira; Shiliang Hu; Ali-Reza Adl-Tabatabai; Justin E. Gottschlich; Jungwoo Ha; Youfeng Wu

Shared memory multiprocessors are difficult to program because of the non-deterministic ways in which the memory operations from different threads interleave. To address this issue, many hardware-based memory race recorders have been proposed that efficiently log an ordering of the shared memory interleavings between threads for deterministic replay. These approaches are challenging to integrate into current processors because they change the cache subsystem or the coherence protocol, and they mostly support a sequentially consistent memory model. In this paper, we describe CoreRacer, a chunk-based memory race recorder architecture for multicore x86 TSO processors. CoreRacer does not modify the cache subsystem and yet it still integrates into the x86 TSO memory model. We show that by leveraging a specific x86 feature, the invariant timestamp, CoreRacer maintains ordering among chunks without piggybacking on cache coherence messages. We provide a detailed implementation and evaluation of CoreRacer on a cycle-accurate x86 simulator. We show that its integration cost into x86 is minimal and its overhead has negligible effect on performance.

Explore More