
Publication


Featured research published by Robert H. B. Netzer.


ACM Letters on Programming Languages and Systems | 1992

What are race conditions?: Some issues and formalizations

Robert H. B. Netzer; Barton P. Miller

In shared-memory parallel programs that use explicit synchronization, race conditions result when accesses to shared memory are not properly synchronized. Race conditions are often considered to be manifestations of bugs, since their presence can cause the program to behave unexpectedly. Unfortunately, there has been little agreement in the literature as to precisely what constitutes a race condition. Two different notions have been implicitly considered: one pertaining to programs intended to be deterministic (which we call general races) and the other to nondeterministic programs containing critical sections (which we call data races). However, the differences between general races and data races have not yet been recognized. This paper examines these differences by characterizing races using a formal model and exploring their properties. We show that two variations of each type of race exist: feasible general races and data races capture the intuitive notions desired for debugging and apparent races capture less accurate notions implicitly assumed by most dynamic race detection methods. We also show that locating feasible races is an NP-hard problem, implying that only the apparent races, which are approximations to feasible races, can be detected in practice. The complexity of dynamically locating apparent races depends on the type of synchronization used by the program. Apparent races can be exhaustively located efficiently only for weak types of synchronization that are incapable of implementing mutual exclusion. This result has important implications since we argue that debugging general races requires exhaustive race detection and is inherently harder than debugging data races (which requires only partial race detection). Programs containing data races can therefore be efficiently debugged by locating certain easily identifiable races. In contrast, programs containing general races require more complex debugging techniques.
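The distinction between feasible and apparent races can be illustrated with a small sketch. The checker below reports *apparent* data races from a logged execution: two accesses to the same variable by different threads race when neither is ordered before the other by the recorded synchronization and at least one is a write. The trace format and names here are invented for illustration and are far simpler than the paper's formal model.

```python
# Hypothetical sketch: flag apparent data races in a logged execution
# using vector clocks.  The event format is invented, not the paper's.
from itertools import combinations

def happens_before(vc_a, vc_b):
    """True if vector clock vc_a is ordered strictly before vc_b."""
    return all(x <= y for x, y in zip(vc_a, vc_b)) and vc_a != vc_b

def apparent_races(events):
    """events: list of (thread, op, var, vector_clock).  Two accesses
    to the same variable by different threads form an apparent race
    if neither happens before the other and at least one is a write."""
    races = []
    for e1, e2 in combinations(events, 2):
        t1, op1, v1, vc1 = e1
        t2, op2, v2, vc2 = e2
        if v1 != v2 or t1 == t2:
            continue
        if 'write' not in (op1, op2):
            continue
        if not happens_before(vc1, vc2) and not happens_before(vc2, vc1):
            races.append((e1, e2))
    return races

trace = [
    (0, 'write', 'x', (1, 0)),
    (1, 'write', 'x', (0, 1)),   # unordered with the write above
    (0, 'read',  'x', (2, 2)),   # after synchronization: ordered
]
print(len(apparent_races(trace)))  # the two unordered writes: one race
```

As the abstract notes, a detector like this only reports apparent races; deciding which of them were actually *feasible* is the NP-hard part.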


IEEE Transactions on Parallel and Distributed Systems | 1995

Necessary and sufficient conditions for consistent global snapshots

Robert H. B. Netzer; Jian Xu

Consistent global snapshots are important in many distributed applications. We prove the exact conditions for an arbitrary checkpoint, or a set of checkpoints, to belong to a consistent global snapshot, a previously open problem. To describe the conditions, we introduce a generalization of Lamport's (1978) happened-before relation called a zigzag path.
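A toy version of the zigzag condition can be checked mechanically. In the sketch below (the data layout and interval numbering are invented for illustration), a zigzag path chains messages where each next message is sent in the same or a later checkpoint interval than the one in which the previous message was received; the "zigzag" is that it may be sent before that message actually arrives. Two checkpoints connected by such a path cannot belong to the same consistent snapshot.

```python
# Illustrative sketch (not the paper's formalism verbatim): test the
# zigzag-path condition of Netzer and Xu on a toy execution.
from collections import deque

def zigzag_reachable(msgs, start_cp, end_cp):
    """msgs: list of (send_proc, send_ival, recv_proc, recv_ival),
    where interval i of a process lies between its checkpoints i-1
    and i.  A zigzag path from checkpoint (p, i) to (q, j) starts
    with a message sent by p after the checkpoint (interval >= i+1),
    chains messages sent in the same or a later interval than the one
    where the previous message was received, and ends with a message
    received by q before its checkpoint j (interval <= j)."""
    p, i = start_cp
    q, j = end_cp
    frontier = deque(m for m in msgs if m[0] == p and m[1] >= i + 1)
    seen = set(frontier)
    while frontier:
        sp, si, rp, ri = frontier.popleft()
        if rp == q and ri <= j:
            return True
        for m in msgs:
            if m[0] == rp and m[1] >= ri and m not in seen:
                seen.add(m)
                frontier.append(m)
    return False

# P0 sends to P1; P1 sends onward to P2 from the same interval in
# which the first message arrives -- together they form a zigzag path.
msgs = [(0, 2, 1, 2), (1, 2, 2, 2)]
print(zigzag_reachable(msgs, (0, 1), (2, 2)))  # True: pair is incompatible
print(zigzag_reachable(msgs, (2, 1), (0, 2)))  # False
```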


ACM Transactions on Programming Languages and Systems | 1991

Techniques for debugging parallel programs with flowback analysis

Jong-Deok Choi; Barton P. Miller; Robert H. B. Netzer

Flowback analysis is a powerful technique for debugging programs. It allows the programmer to examine dynamic dependences in a program’s execution history without having to re-execute the program. The goal is to present to the programmer a graphical view of the dynamic program dependences. We are building a system, called PPD, that performs flowback analysis while keeping the execution time overhead low. We also extend the semantics of flowback analysis to parallel programs. This paper describes details of the graphs and algorithms needed to implement efficient flowback analysis for parallel programs. Execution time overhead is kept low by recording only a small amount of trace during a program’s execution. We use semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, PPD uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic information produced during execution time. Parallel programs have been accommodated in two ways. First, the flowback dependences can span process boundaries; i.e., the most recent modification to a variable might be traced to a different process than the one that contains the current reference. The static and dynamic program dependence graphs of the individual processes are tied together with synchronization and data dependence information to form complete graphs that represent the entire program. Second, our algorithms will detect potential data race conditions in the access to shared variables. The programmer can be directed to the cause of the race condition. PPD is currently being implemented for the C programming language on a Sequent Symmetry shared-memory multiprocessor. Index Items − debugging, parallel program, flowback analysis, incremental tracing, semantic analysis, program dependence graph. 
Research supported in part by National Science Foundation grants CCR-8703373 and CCR-8815928, Office of Naval Research contract N00014-89-J-1222, and a Digital Equipment Corporation External Research Grant. TR 786; to appear in ACM Transactions on Programming Languages and Systems.


Workshop on Parallel & Distributed Debugging | 1993

Optimal tracing and replay for debugging shared-memory parallel programs

Robert H. B. Netzer

Execution replay is a crucial part of debugging. Because explicitly parallel shared-memory programs can be nondeterministic, a tool is required that traces executions so they can be replayed for debugging. We present an adaptive tracing strategy that is optimal and records the minimal number of shared-memory references required to exactly replay executions. Our algorithm makes runtime tracing decisions by detecting and tracing a certain type of race condition on-the-fly. Unlike past schemes, we make no assumptions about the execution's correctness (it need not be race free). Experiments show that only 0.01-2% of the shared-memory references are usually traced, a two-to-four order-of-magnitude reduction over past techniques, which trace every access.
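The core idea of adaptive tracing can be sketched roughly: an access ordered after the previous access to the same location replays identically without a trace entry, so only racing (unordered) accesses need to be logged. The sketch below is a loose approximation under invented names and a toy trace format, far simpler than the paper's algorithm.

```python
# Hedged sketch of the *idea* of adaptive tracing: log a shared access
# only when it is unordered with the previous access to the same
# location.  Format and names are invented for illustration.

def happens_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def adaptive_trace(accesses):
    """accesses: list of (thread, var, vector_clock) in observed order.
    Returns only the accesses that must be logged for replay."""
    last = {}       # var -> vector clock of the previous access
    logged = []
    for thread, var, vc in accesses:
        prev = last.get(var)
        if prev is not None and not happens_before(prev, vc):
            logged.append((thread, var, vc))   # racing: must trace
        last[var] = vc
    return logged

run = [
    (0, 'x', (1, 0)),
    (1, 'x', (1, 1)),   # ordered after the previous access: no entry
    (1, 'y', (1, 2)),
    (0, 'y', (2, 0)),   # unordered with previous access to y: traced
]
print(adaptive_trace(run))  # only the final racing access is logged
```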


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1991

Improving the accuracy of data race detection

Robert H. B. Netzer; Barton P. Miller

For shared-memory parallel programs that use explicit synchronization, data race detection is an important part of debugging. A data race exists when concurrently executing sections of code access common shared variables. In programs intended to be data race free, they are sources of nondeterminism usually considered bugs. Previous methods for detecting data races in executions of parallel programs can determine when races occurred, but can report many data races that are artifacts of others and not direct manifestations of program bugs. Artifacts exist because some races can cause others and can also make false races appear real. Such artifacts can overwhelm the programmer with information irrelevant for debugging. This paper presents results showing how to identify nonartifact data races by validation and ordering. Data race validation attempts to determine which races involve events that either did execute concurrently or could have (called feasible data races). We show how each detected race can either be guaranteed feasible, or when insufficient information is available, sets of races can be identified within which at least one is guaranteed feasible. Data race ordering attempts to identify races that did not occur only as a result of others. Data races can be partitioned so that it is known whether a race in one partition may have affected a race in another. The first partitions are guaranteed to contain at least one feasible data race that is not an artifact of any kind. By combining validation and ordering, the programmer can be directed to those data races that should be investigated first for debugging.

Research supported in part by National Science Foundation grant CCR-8815928, Office of Naval Research grant N00014-89-J-1222, and a Digital Equipment Corporation External Research Grant. To appear in Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Williamsburg, VA, April 1991.


International Symposium on Computer Architecture | 1991

Detecting data races on weak memory systems

Sarita V. Adve; Mark D. Hill; Barton P. Miller; Robert H. B. Netzer

For shared-memory systems, the most commonly assumed programmer's model of memory is sequential consistency. The weaker models of weak ordering, release consistency with sequentially consistent synchronization operations, data-race-free-0, and data-race-free-1 provide higher performance by guaranteeing sequential consistency to only a restricted class of programs, mainly programs that do not exhibit data races. To allow programmers to use the intuition and algorithms already developed for sequentially consistent systems, it is important to determine when a program written for a weak system exhibits no data races. In this paper, we investigate the extension of dynamic data race detection techniques developed for sequentially consistent systems to weak systems. A potential problem is that in the presence of a data race, weak systems fail to guarantee sequential consistency and therefore dynamic techniques may not give meaningful results. However, we reason that in practice a weak system will preserve sequential consistency at least until the “first” data races since it cannot predict if a data race will occur. We formalize this condition and show that it allows data races to be dynamically detected. Further, since this condition is already obeyed by all proposed implementations of weak systems, the full performance of weak systems can be exploited.


Distributed Computing | 2000

Communication-based prevention of useless checkpoints in distributed computations

Jean-michel Hélary; Achour Mostefaoui; Robert H. B. Netzer; Michel Raynal

A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following problem. Given a set of processes that take (basic) local checkpoints in an independent and unknown way, the problem is to design communication-induced checkpointing protocols that direct processes to take additional local (forced) checkpoints to ensure no local checkpoint is useless. The paper first proves two properties related to integer timestamps which are associated with each local checkpoint. The first property is a necessary and sufficient condition that these timestamps must satisfy for no checkpoint to be useless. The second property provides an easy timestamp-based determination of consistent global checkpoints. Then, a general communication-induced checkpointing protocol is proposed. This protocol, derived from the two previous properties, actually defines a family of timestamp-based communication-induced checkpointing protocols. It is shown that several existing checkpointing protocols for the same problem are particular instances of the general protocol. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in “consistent global checkpoint”-based distributed applications such as the detection of stable or unstable properties and the determination of distributed breakpoints.
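One simple member of this family of timestamp-based protocols can be sketched as follows (a simplified illustration in the spirit of the protocols the paper generalizes, not the paper's general protocol itself): each process keeps a logical clock, piggybacks it on every message, and takes a *forced* checkpoint before delivering a message whose timestamp exceeds its own clock, keeping checkpoint timestamps monotone.

```python
# Minimal sketch of one timestamp-based communication-induced
# checkpointing rule.  Class and field names are invented; the real
# protocols carry more state than a single integer clock.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0       # logical clock, advanced at checkpoints
        self.forced = 0      # count of forced checkpoints taken

    def basic_checkpoint(self):
        self.clock += 1      # independent (basic) checkpoint

    def send(self):
        return self.clock    # piggyback the current clock on the message

    def receive(self, ts):
        if ts > self.clock:  # delivering now could make the last local
            self.clock = ts  # checkpoint useless, so take a forced
            self.forced += 1 # checkpoint before delivery

p0, p1 = Process(0), Process(1)
p0.basic_checkpoint()        # p0's clock moves ahead of p1's
ts = p0.send()
p1.receive(ts)               # p1 is forced to checkpoint first
print(p1.forced)             # one forced checkpoint was taken
```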


IEEE Transactions on Parallel and Distributed Systems | 1997

Finding consistent global checkpoints in a distributed computation

D. Manivannan; Robert H. B. Netzer; Mukesh Singhal

Consistent global checkpoints have many uses in distributed computations. A central question in applications that use consistent global checkpoints is to determine whether a consistent global checkpoint that includes a given set of local checkpoints can exist. Netzer and Xu (1995) presented the necessary and sufficient conditions under which such a consistent global checkpoint can exist, but they did not explore what checkpoints could be constructed. In this paper, we prove exactly which local checkpoints can be used for constructing such consistent global checkpoints. We illustrate the use of our results with a simple and elegant algorithm to enumerate all such consistent global checkpoints.
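A brute-force version of such an enumeration is easy to state with vector-clock timestamps (the data layout below is invented for illustration; the paper's algorithm is far more selective than exhaustive search): a global checkpoint, one local checkpoint per process, is consistent exactly when its checkpoints are pairwise concurrent.

```python
# Toy enumeration sketch: list all consistent global checkpoints from
# per-process checkpoint lists tagged with vector clocks.  The input
# format is invented for illustration.
from itertools import product

def happens_before(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def consistent(snapshot):
    """A global checkpoint is consistent iff no local checkpoint in it
    happened before another (pairwise concurrency)."""
    return not any(happens_before(a, b) or happens_before(b, a)
                   for i, a in enumerate(snapshot)
                   for b in snapshot[i + 1:])

def enumerate_snapshots(checkpoints):
    """checkpoints: per-process lists of checkpoint vector clocks."""
    return [s for s in product(*checkpoints) if consistent(s)]

cps = [
    [(1, 0), (3, 2)],    # process 0's checkpoints
    [(0, 1), (2, 2)],    # process 1's checkpoints
]
print(len(enumerate_snapshots(cps)))  # only the earliest pair is concurrent
```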


Measurement and Modeling of Computer Systems | 1996

Debugging race conditions in message-passing programs

Robert H. B. Netzer; Timothy W. Brennan; Suresh K. Damodaran-Kamal

In this paper we address the problem of dynamically locating unwanted nondeterminism (race conditions) in executions of explicitly parallel message-passing programs. We formally define what it means for a race to exist and show conceptually how to dynamically locate races. We also show the importance of accurate race detection as a starting point for debugging, reporting only races directly caused by bugs and not by other races. We argue that accurate detection using a pure on-the-fly algorithm requires space bounded by the execution's length, an impractical requirement for long program runs. To address this problem, we present a two-pass on-the-fly algorithm that requires space independent of the execution's length. Our algorithm is simple, efficient, and accurately locates races in runs of any length.


International Parallel and Distributed Processing Symposium | 1993

Adaptive independent checkpointing for reducing rollback propagation

Jian Xu; Robert H. B. Netzer

Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. However, it can suffer from the domino effect, which causes the rollback of one process to potentially propagate to others. In this paper we present an adaptive checkpointing algorithm to practically eliminate rollback propagation for independent checkpointing. Our algorithm is based on proofs of the conditions necessary and sufficient for a checkpoint to belong to some consistent global checkpoint, previously an open question. We characterize these conditions with a generalization of Lamport's happened-before relation called a zigzag path. Our algorithm tracks zigzag paths on-line and checkpoints when certain paths are detected. Experiments on an iPSC/860 hypercube show that our algorithm reduces the average rollback required to recover from any fault to less than one checkpoint interval per process, and checkpoints only 4% more often than traditional periodic checkpointing algorithms. We thus eliminate rollback propagation without the runtime overhead of coordinated checkpoints or other schemes that attempt to reduce rollback propagation.

Collaboration


Dive into Robert H. B. Netzer's collaborations.

Top Co-Authors

Barton P. Miller
University of Wisconsin-Madison

Franco Zambonelli
University of Modena and Reggio Emilia

Michel Raynal
Institut Universitaire de France