Publications


Featured research published by Satish Narayanasamy.


International Symposium on Computer Architecture | 2005

BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging

Satish Narayanasamy; Gilles Pokam; Brad Calder

Significant time is spent by companies trying to reproduce and fix the bugs that occur in released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by developers working in their own execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point, can enable deterministic replaying of a program's execution. BugNet focuses on being able to replay the application's execution and the libraries it uses, but not the operating system. However, our approach provides the ability to replay an application's execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would otherwise have required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.
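
The recording insight lends itself to a compact sketch. The following is a minimal software analogue of BugNet's idea, not the paper's hardware design, and all class and method names are hypothetical: a checkpoint of register state plus a log of subsequent load values suffices for deterministic re-execution, because every non-deterministic input reaches the program through loads.

```python
# Minimal software sketch of BugNet's recording insight (hypothetical
# names, not the paper's hardware design): snapshot register state once,
# then log the value returned by every load. Replay restores the
# snapshot and feeds loads back from the log, so I/O, interrupts, and
# DMA never need to be modeled -- their effects arrive via load values.

class BugNetRecorder:
    def __init__(self, registers):
        self.checkpoint = dict(registers)   # register-file snapshot
        self.load_log = []                  # values observed by loads

    def on_load(self, address, memory):
        value = memory[address]
        self.load_log.append(value)         # record what the load saw
        return value

class BugNetReplayer:
    def __init__(self, checkpoint, load_log):
        self.registers = dict(checkpoint)
        self._loads = iter(load_log)

    def on_load(self, address):
        # Replay ignores real memory: the logged value is authoritative.
        return next(self._loads)
```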


Programming Language Design and Implementation | 2007

Automatically classifying benign and harmful data races using replay analysis

Satish Narayanasamy; Zhenghao Wang; Jordan Tigani; Andrew James Edwards; Brad Calder

Many concurrency bugs in multi-threaded programs are due to data races. There have been many efforts to develop static and dynamic mechanisms to automatically find the data races. Most of the prior work has focused on finding the data races and eliminating the false positives. In this paper, we instead focus on a dynamic analysis technique to automatically classify the data races into two categories: data races that are potentially benign and data races that are potentially harmful. A harmful data race is a real bug that needs to be fixed. This classification is needed to focus the triaging effort on those data races that are potentially harmful; without prioritizing the data races, we have found that there are too many data races to triage. Our second focus is to automatically provide the developer with a reproducible scenario of the data race, which allows the developer to understand the different effects of a harmful data race on a program's execution. To achieve the above, we record a multi-threaded program's execution in a replay log. The replay log is used to replay the multi-threaded program, and during replay we find the data races using a happens-before based algorithm. To automatically classify whether a data race that we find is potentially benign or potentially harmful, we replay the execution twice for a given data race, once for each possible order between the conflicting memory operations. If the two replays for the two orders produce the same result, then we classify the data race as potentially benign. We discuss our experiences in using our replay-based dynamic data race checker on several Microsoft applications.
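
The classification step reduces to a simple comparison, sketched below under the assumption of a replay function that can force either order of the racing pair; the names here are hypothetical illustrations, not the paper's implementation.

```python
# Hypothetical sketch of replay-based race classification: re-execute
# the logged run twice, forcing each order of the two conflicting
# memory operations, and compare the resulting program states.

def classify_race(replay_fn, op_a, op_b):
    state_ab = replay_fn(first=op_a, second=op_b)
    state_ba = replay_fn(first=op_b, second=op_a)
    return "potentially benign" if state_ab == state_ba else "potentially harmful"

# Toy stand-in for the replayer: applies the racing operations to a
# shared state in the requested order.
def replay_counter(first, second):
    state = {"x": 0}
    first(state)
    second(state)
    return state

# Two racy writes that happen to store the same value: either order
# yields identical state, so the race is classified potentially benign.
print(classify_race(replay_counter,
                    lambda s: s.__setitem__("x", 1),
                    lambda s: s.__setitem__("x", 1)))
```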


Programming Language Design and Implementation | 2009

LiteRace: effective sampling for lightweight data-race detection

Daniel Marino; Madanlal Musuvathi; Satish Narayanasamy

Data races are one of the most common and subtle causes of pernicious concurrency bugs. Static techniques for preventing data races are overly conservative and do not scale well to large programs. Past research has produced several dynamic data race detectors that can be applied to large programs. They are precise in the sense that they only report actual data races. However, dynamic data race detectors incur a high performance overhead, slowing down a program's execution by an order of magnitude. In this paper we present LiteRace, a very lightweight data race detector that samples and analyzes only selected portions of a program's execution. We show that it is possible to sample a multithreaded program at a low frequency, and yet find infrequently occurring data races. We implemented LiteRace using Microsoft's Phoenix compiler. Our experiments with several Microsoft programs, Apache, and Firefox show that LiteRace is able to find more than 70% of data races by sampling less than 2% of memory accesses in a given program execution.
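
A sampler in this spirit fits in a few lines. The adaptive back-off below is a hypothetical illustration (the rates and names are assumptions, not LiteRace's actual parameters) of the idea that cold code is sampled aggressively while hot code is sampled rarely, which is how infrequent races stay visible at low overhead.

```python
# Sketch of a per-function adaptive sampler in the spirit of LiteRace
# (thresholds hypothetical): each function starts fully sampled, and
# its rate decays as the function gets hot, down to a small floor.
# Only sampled executions forward their memory accesses to the
# underlying race detector.

import random
from collections import defaultdict

class AdaptiveSampler:
    def __init__(self, initial_rate=1.0, backoff=0.9, floor=0.001):
        self._rate = defaultdict(lambda: initial_rate)
        self._backoff = backoff
        self._floor = floor

    def should_sample(self, function_id):
        rate = self._rate[function_id]
        self._rate[function_id] = max(rate * self._backoff, self._floor)
        return random.random() < rate
```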


Architectural Support for Programming Languages and Operating Systems | 2011

DoublePlay: parallelizing sequential logging and replay

Kaushik Veeraraghavan; Dongyoon Lee; Benjamin Wester; Jessica Ouyang; Peter M. Chen; Jason Flinn; Satish Narayanasamy

Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because of the need to reproduce the order of, or the values read by, shared-memory operations performed by multiple threads. In this paper, we present DoublePlay, a new way to efficiently guarantee replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier because each epoch runs on a single processor (so threads in an epoch never simultaneously access the same memory) and different epochs operate on different copies of the memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. DoublePlay runs an additional execution of the program on multiple processors to generate checkpoints so that epochs run in parallel. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.
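
The logging simplification that uniparallelism buys can be seen in a small sketch, a hypothetical software analogue with toy thread steps: because each epoch timeslices its threads on one core, the only record needed is the order in which threads ran.

```python
# Hypothetical sketch of one uniparallel epoch: threads are timesliced
# on a single core, so no two threads touch memory simultaneously and
# the log is just the thread schedule. Separate epochs start from
# checkpoints and can run concurrently on other cores.

def step_thread(state, thread_id):
    # Toy thread step: each step bumps a per-thread counter.
    state[thread_id] = state.get(thread_id, 0) + 1

def run_epoch(checkpoint, schedule):
    state = dict(checkpoint)
    log = []
    for thread_id in schedule:        # one thread at a time
        step_thread(state, thread_id)
        log.append(thread_id)         # the only thing that must be logged
    return state, log

# Replaying the epoch is just re-running it with the logged schedule.
final_state, log = run_epoch({}, ["t0", "t1", "t0", "t2"])
replayed_state, _ = run_epoch({}, log)
assert replayed_state == final_state
```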


International Symposium on Computer Architecture | 2009

A case for an interleaving constrained shared-memory multi-processor

Jie Yu; Satish Narayanasamy

Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in so many different ways. To ensure correctness, a programmer has to test all possible thread interleavings, which, however, is impractical. Many rare thread interleavings remain untested in production systems, and they are the root cause for a majority of concurrency bugs. We propose a shared-memory multi-processor design that avoids untested interleavings to improve the correctness of a multi-threaded program. Since untested interleavings tend to occur infrequently at runtime, the performance cost of avoiding them is not high. We propose to encode the set of tested correct interleavings in a program's binary executable using Predecessor Set (PSet) constraints. These constraints are efficiently enforced at runtime using processor support, which ensures that the runtime follows a tested interleaving. We analyze several bugs in open source applications such as MySQL, Apache, Mozilla, etc., and show that, by enforcing PSet constraints, we can avoid not only data races and atomicity violations, but also other forms of concurrency bugs.
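
A software analogue of the enforcement check might look like the sketch below; all names are hypothetical, and the paper proposes processor support rather than this kind of instrumentation. Each memory access verifies that the last remote access to the same location appears in its tested predecessor set.

```python
# Hypothetical software analogue of PSet enforcement: every static
# memory instruction carries the set of remote instructions seen as its
# immediate predecessor in tested runs. At runtime, an access whose
# actual predecessor is outside that set signals an untested
# interleaving, which the proposed hardware would stall or recover from.

last_access = {}   # address -> (thread_id, static instruction id)

def on_access(thread_id, insn_id, addr, psets):
    prev = last_access.get(addr)
    if prev is not None and prev[0] != thread_id:
        if prev[1] not in psets.get(insn_id, set()):
            # Untested interleaving detected; a real design would stall
            # or roll back rather than raise.
            raise RuntimeError(
                f"untested interleaving: {prev[1]} -> {insn_id} at {addr:#x}")
    last_access[addr] = (thread_id, insn_id)
```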


Architectural Support for Programming Languages and Operating Systems | 2006

Recording shared memory dependencies using strata

Satish Narayanasamy; Cristiano Pereira; Brad Calder

Significant time is spent by companies trying to reproduce and fix bugs. BugNet and FDR are recent architecture proposals that provide architecture support for deterministic replay debugging. They focus on continuously recording information about the program's execution, which can be communicated back to the developer. Using that information, the developer can deterministically replay the program's execution to reproduce and fix the bugs. In this paper, we propose using Strata to efficiently capture the shared memory dependencies. A stratum creates a time layer across all the logs for the running threads, which separates all the memory operations executed before and after the stratum. A strata log allows us to determine all the shared memory dependencies during replay and thereby supports deterministic replay debugging for multi-threaded programs.
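
The ordering information a stratum captures can be expressed compactly. The sketch below is a hypothetical simplification in which a stratum is a vector of per-thread log positions; the names are illustrative, not the paper's.

```python
# Hypothetical sketch of a stratum as a cut across per-thread logs: it
# records how many memory operations each thread had executed when the
# stratum was logged. Any operation at or below the cut is ordered
# before every operation above it, in every thread, which is how the
# cross-thread dependencies are recovered during replay.

def log_stratum(op_counts, strata_log):
    strata_log.append(dict(op_counts))   # snapshot all per-thread positions

def ordered_by(stratum, tid_a, idx_a, tid_b, idx_b):
    """True if the stratum orders op idx_a of tid_a before op idx_b of tid_b."""
    return idx_a <= stratum[tid_a] and idx_b > stratum[tid_b]
```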


Architectural Support for Programming Languages and Operating Systems | 2010

Respec: efficient online multiprocessor replay via speculation and external determinism

Dongyoon Lee; Benjamin Wester; Kaushik Veeraraghavan; Satish Narayanasamy; Peter M. Chen; Jason Flinn

Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, replaying shared memory multiprocessor systems at low overhead on commodity hardware is still an open problem. This paper presents Respec, a new way to support deterministic replay of shared memory multithreaded programs on commodity multiprocessor hardware. Respec targets online replay, in which the recorded and replayed processes execute concurrently. Respec uses two strategies to reduce overhead while still ensuring correctness: speculative logging and externally deterministic replay. Speculative logging optimistically logs less information about shared memory dependencies than is needed to guarantee deterministic replay, then recovers and retries if the replayed process diverges from the recorded process. Externally deterministic replay relaxes the degree to which the two executions must match by requiring only that their system output and final program states match. We show that the combination of these two techniques results in low recording and replay overhead for the common case of data-race-free execution intervals and still ensures correct replay for execution intervals that have data races. We modified the Linux kernel to implement our techniques. Our software system adds on average about 18% overhead to the execution time for recording and replaying programs with two threads and 55% overhead for programs with four threads.
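
The epoch-level check-and-retry loop can be sketched as below; this is a hypothetical software analogue, with `execute` as a stub standing in for running one epoch of the program. Only external output and final state are compared, and a divergence triggers rollback to the checkpoint and a retry.

```python
# Hypothetical sketch of Respec's speculate/check/rollback loop. The
# execute() stub stands in for running one epoch of the program from a
# checkpoint, either recording or replaying. Only externally visible
# output and the epoch's final state are compared; a match is enough
# for external determinism even if internal interleavings differed.

def execute(checkpoint, epoch, record):
    # Stub: a real system would run the program; here the result is
    # deterministic, so recording and replaying always agree.
    state = checkpoint + epoch
    output = f"out:{state}"
    return state, output

def run_epoch_checked(checkpoint, epoch, max_retries=3):
    for _ in range(max_retries):
        rec_state, rec_out = execute(checkpoint, epoch, record=True)
        rep_state, rep_out = execute(checkpoint, epoch, record=False)
        if rec_out == rep_out and rec_state == rep_state:
            return rec_state        # externally deterministic: commit
        # Divergence: roll back to the checkpoint and retry the epoch.
    raise RuntimeError("epoch kept diverging; fall back to full logging")

print(run_epoch_checked(checkpoint=10, epoch=5))   # -> 15
```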


Conference on Object-Oriented Programming Systems, Languages, and Applications | 2012

Maple: a coverage-driven testing tool for multithreaded programs

Jie Yu; Satish Narayanasamy; Cristiano Pereira; Gilles Pokam

Testing multithreaded programs is a hard problem, because it is challenging to expose those rare interleavings that can trigger a concurrency bug. We propose a new thread-interleaving, coverage-driven testing tool called Maple that seeks to expose as many untested thread interleavings as possible. It memoizes tested interleavings and actively seeks to expose untested interleavings for a given test input to increase interleaving coverage. We discuss several solutions to realize the above goal. First, we discuss a coverage metric based on a set of interleaving idioms. Second, we discuss an online technique to predict untested interleavings that can potentially be exposed for a given test input. Finally, the predicted untested interleavings are exposed by actively controlling the thread schedule while executing the program on the test input. We discuss our experiences in using the tool to expose several known and unknown bugs in real-world applications such as Apache and MySQL.
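
Memoization of interleaving coverage is the part that persists across runs. The sketch below is a hypothetical simplification in which a coverage unit is an ordered pair of inter-thread accesses, in the spirit of the paper's idiom instances; the file format and function names are assumptions.

```python
# Hypothetical sketch of Maple-style coverage memoization: a coverage
# unit here is an ordered pair of static memory accesses from different
# threads. Covered pairs persist across test runs, and the active
# scheduler only spends effort exposing predicted pairs that are still
# uncovered.

import json

def load_covered(path="coverage.json"):
    try:
        with open(path) as f:
            return {tuple(pair) for pair in json.load(f)}
    except FileNotFoundError:
        return set()               # first run: nothing covered yet

def save_covered(covered, path="coverage.json"):
    with open(path, "w") as f:
        json.dump(sorted(covered), f)

def untested(predicted, covered):
    """The pairs worth spending active-scheduling effort on."""
    return [pair for pair in predicted if pair not in covered]
```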


International Symposium on Computer Architecture | 2013

Catnap: energy proportional multiple network-on-chip

Reetuparna Das; Satish Narayanasamy; Sudhir Satpathy; Ronald G. Dreslinski

Multiple networks have been used in several processor implementations to scale bandwidth and to ensure protocol-level deadlock freedom for different message classes. In this paper, we observe that a multiple-network design is also attractive from a power perspective and can be leveraged to achieve energy proportionality through effective power gating. Unlike a single-network design, a multiple-network design is more amenable to power gating, as its subnetworks (subnets) can be power gated without compromising the connectivity of the network. To exploit this opportunity, we propose the Catnap architecture, which consists of synergistic subnet-selection and power-gating policies. Catnap maximizes the number of consecutive idle cycles in a router while avoiding performance loss due to overloading a subnet. We evaluate a 256-core processor with a concentrated mesh topology using synthetic traffic and 35 applications. We show that the average network power of a power-gating-optimized multiple-network design with four subnets can be 44% lower than a bandwidth-equivalent single-network design, for an average performance cost of about 5%.
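
A subnet-selection policy of this flavor fits in a few lines. The thresholds and names below are hypothetical, but the intent matches the stated goal: pack traffic onto few subnets so the rest accumulate the long idle windows that power gating needs.

```python
# Hypothetical sketch of a Catnap-flavored subnet-selection policy:
# traffic is packed onto the lowest-numbered subnets until each nears a
# load threshold, so higher-numbered subnets stay idle for long
# stretches and can be power gated without disconnecting any node
# (each subnet is a complete network on its own).

def select_subnet(subnet_loads, threshold=0.7):
    for subnet, load in enumerate(subnet_loads):
        if load < threshold:
            return subnet            # keep using the low subnets
    return len(subnet_loads) - 1     # all busy: spill to the last one

def gating_decision(idle_cycles, wakeup_latency, min_idle=64):
    # Gate only when the observed idle window outweighs the cost of
    # waking the subnet's routers back up.
    return idle_cycles >= max(min_idle, wakeup_latency)
```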


Collaboration


Dive into Satish Narayanasamy's collaborations.

Top Co-Authors

Brad Calder (University of California)

Jason Flinn (University of Michigan)