Ajei Sarat Gopal | Researchain

Publication

Featured research published by Ajei Sarat Gopal.

workshop on parallel & distributed debugging | 1991

Restoring consistent global states of distributed computations

Arthur P. Goldberg; Ajei Sarat Gopal; Andy Lowry; Rob Strom

We present a mechanism for restoring any consistent global state of a distributed computation. This capability can form the basis of support for rollback and replay of computations, an activity we view as essential in a comprehensive environment for debugging distributed programs. Our mechanism records occasional state checkpoints and logs all messages communicated between processes. Our mechanism offers flexibility in the following ways: any consistent global state of the computation can be restored; execution can be replayed either exactly as it occurred initially or with user-controlled variations; there is no need to know a priori what states might be of interest. In addition, if checkpoints and logs are written to stable storage, our mechanism can be used to restore states of computations that cause the system to crash.
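The checkpoint-plus-log idea in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' mechanism: the class `LoggedProcess`, its `checkpoint_every` parameter, and the toy state (a running total) are all assumptions made for illustration, and the process is assumed to be deterministic. Restoring a consistent global state would additionally require choosing restore points across processes so that no message is received before it is sent, which is not shown here.

```python
import copy


class LoggedProcess:
    """Hypothetical deterministic process: its state is a running total."""

    def __init__(self, checkpoint_every=3):
        self.state = {"total": 0, "received": 0}
        self.checkpoint_every = checkpoint_every
        # Each checkpoint is (number of messages applied, copy of the state).
        self.checkpoints = [(0, copy.deepcopy(self.state))]
        self.log = []  # every delivered message, in delivery order

    @staticmethod
    def _apply(state, message):
        state["total"] += message
        state["received"] += 1

    def deliver(self, message):
        """Log the message, apply it, and checkpoint occasionally."""
        self.log.append(message)
        self._apply(self.state, message)
        if len(self.log) % self.checkpoint_every == 0:
            self.checkpoints.append((len(self.log), copy.deepcopy(self.state)))

    def restore(self, n_messages):
        """Rebuild the state as it was after the first n_messages deliveries."""
        if not 0 <= n_messages <= len(self.log):
            raise ValueError("no such point in the recorded execution")
        # Start from the latest checkpoint at or before the target point.
        index, snapshot = max(
            (c for c in self.checkpoints if c[0] <= n_messages),
            key=lambda c: c[0],
        )
        state = copy.deepcopy(snapshot)
        # Replay the logged messages between the checkpoint and the target.
        for message in self.log[index:n_messages]:
            self._apply(state, message)
        return state
```

Because every message is logged, any intermediate point can be rebuilt without deciding in advance which states matter; the checkpoints only shorten the replay.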

principles of distributed computing | 1993

Unifying self-stabilization and fault-tolerance

Ajei Sarat Gopal; Kenneth J. Perry

In this paper we combine two previously disparate aspects of reliable distributed computing: self-stabilization, i.e., tolerance of systemic failures, and fault-tolerance, i.e., tolerance of process failures. We define what it means for a protocol to solve a problem while tolerating both types of failures and demonstrate a “compiler” that transforms a process failure-tolerant protocol for a synchronous system into a process and systemic failure-tolerant protocol. For asynchronous systems, we present a protocol that solves a crucial problem (Consensus) while tolerating both process and systemic failures.

international workshop on distributed algorithms | 1989

Reliable Broadcast in Synchronous and Asynchronous Environments (Preliminary Version)

Ajei Sarat Gopal; Sam Toueg

This paper studies the problem of reliable broadcast of a sequence of values in a system subject to processor failures. We consider three failure models — crash, in which a processor may stop executing at any time, send omission, in which processors may intermittently fail to send messages and general omission, in which processors may intermittently fail to send and receive messages — in both synchronous (the “round model”) and asynchronous systems. In contrast to the Byzantine Generals formulation of reliable broadcast, the problem we consider can be solved for asynchronous systems. In synchronous systems, we first present an algorithm tolerant of crash failures, and use translation techniques to derive algorithms tolerant of send omission failures and general omission failures. For asynchronous systems, we present simple algorithms tolerant of all three failure models.
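To make the crash-failure case concrete, here is a small simulation of a textbook relay scheme: a process that receives the value for the first time forwards it to everyone before delivering it. This is an assumption-laden sketch, not necessarily the algorithm of the paper, and it handles a single value rather than a sequence. The function name `relay_broadcast` and the failure-injection parameter `crash_after` are hypothetical.

```python
from collections import deque


def relay_broadcast(n, sender, crash_after=None):
    """Simulate relay-based broadcast of one value among processes 0..n-1.

    crash_after[p] = k means process p stops just before its (k+1)-th send
    (a hypothetical knob for injecting crash failures).
    Returns (delivered, crashed) as two sets of process ids.
    """
    crash_after = crash_after or {}
    sends = {p: 0 for p in range(n)}
    seen, delivered, crashed = set(), set(), set()
    in_flight = deque()  # destinations of messages still in transit

    def relay(p):
        """Send to every other process; return False if p crashes midway."""
        for q in range(n):
            if q == p:
                continue
            if sends[p] >= crash_after.get(p, float("inf")):
                crashed.add(p)
                return False
            sends[p] += 1
            in_flight.append(q)
        return True

    # The sender relays first and delivers only if the relay completes.
    seen.add(sender)
    if relay(sender):
        delivered.add(sender)

    while in_flight:
        q = in_flight.popleft()
        if q in crashed or q in seen:
            continue  # crashed processes and duplicates are ignored
        seen.add(q)
        if relay(q):
            delivered.add(q)
    return delivered, crashed
```

In this sketch, if the sender crashes after reaching only one process, that process still relays the value, so every process that does not crash delivers it.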

international conference on distributed computing systems | 1997

Extensible resource management for cluster computing

Nayeem Islam; Andreas L. Prodromidis; Mark S. Squillante; Liana Liyow Fong; Ajei Sarat Gopal

Advanced general purpose parallel systems should be able to support diverse applications with different resource requirements without compromising effectiveness and efficiency. We present a resource management model for cluster computing that allows multiple scheduling policies to co-exist dynamically. In particular, we have built Octopus, an extensible and distributed hierarchical scheduler that implements new space sharing, gang scheduling and load sharing strategies. A series of experiments performed on an IBM SP2 suggest that Octopus can effectively match application requirements to available resources, and improve the performance of a variety of parallel applications within a cluster.

international conference on computer languages | 1992

High-level language support for programming distributed systems

Joshua S. Auerbach; David F. Bacon; Arthur P. Goldberg; Germán S. Goldszmidt; Ajei Sarat Gopal; Mark T. Kennedy; Andy Lowry; James R. Russell; William Silverman; Robert E. Strom; Daniel M. Yellin; Shaula Yemini

A strategy for simplifying the programming of heterogeneous distributed systems is presented. The approach used is based on integrating a high-level distributed programming model, the process model, directly into programming languages. Distributed applications written in such languages are portable across different environments, are shorter, and are simpler to develop than similar applications developed using conventional approaches. The process model is discussed, and Hermes and Concert/C, two languages that implement this model, are described. Hermes is a secure, representation-independent language designed explicitly around the process model. Concert/C is the C language augmented with a small set of extensions to support the process model while allowing reuse of existing C code. Hermes has been prototyped; an implementation of Concert/C is in development.

principles of distributed computing | 1990

Early-delivery atomic broadcast

Ajei Sarat Gopal; H. Raymond Strong; Sam Toueg; Flaviu Cristian

This paper presents early-delivery atomic broadcast protocols for systems in which processors are subject to arbitrary failures, but have access to a message authentication facility. Informally, atomic broadcast requires that all the non-faulty processors deliver the same set of messages in the same order. Atomic broadcast has no solution in asynchronous systems. If it did, the agreement on message delivery order could be used to contradict the impossibility of fault-tolerant agreement in an asynchronous system [FLP85] [DDS87]. Thus, we only consider synchronous systems. However, we avoid the traditional “round” model of synchronous computation, as many issues can be hidden in the implicit synchronization of that model. Furthermore, unlike [BJ87] and others, and in the spirit of Lamport’s technique [Lam84], we use time explicitly in our protocols.
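One way to picture what "using time explicitly" can mean is the classic fixed-delay scheme: each message carries a timestamp, and a processor delivers it once its local clock passes the timestamp plus a bound on dissemination time, in timestamp order. The sketch below shows only that baseline idea; it is not the early-delivery protocol of the paper and ignores failures, authentication, and clock skew. The class `TimedDelivery` and the parameter `delta` are hypothetical names.

```python
import heapq


class TimedDelivery:
    """Deliver messages in timestamp order after a fixed delay delta."""

    def __init__(self, delta):
        self.delta = delta  # assumed upper bound on dissemination time
        self.pending = []   # min-heap of (timestamp, sender, payload)

    def receive(self, timestamp, sender, payload):
        """Buffer a message; arrival order does not matter."""
        heapq.heappush(self.pending, (timestamp, sender, payload))

    def deliverable(self, now):
        """Pop every message whose delivery time has been reached."""
        out = []
        while self.pending and self.pending[0][0] + self.delta <= now:
            out.append(heapq.heappop(self.pending))
        return out
```

Two processors that receive the same messages in different arrival orders produce the same delivery order, because ordering depends only on the (timestamp, sender) pair.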

international conference on computer communications | 1990

Broadcast in fast networks

Ajei Sarat Gopal; Inder S. Gopal; Shay Kutten

The current trend in network technology is to implement as much of the switching function as possible directly in specialized high-speed hardware. A broadcast algorithm for such a network that is tolerant of failures in the form of message loss is presented. The model used is based on the one introduced by Cidon et al. (see Proc. of the Seventh Annual ACM Symp. on Principles of Distributed Computing, Toronto, Canada, pp. 75-89, 1988); the hardware functions assumed are simple enough to be implemented in high-speed logic. The basic idea is to forward broadcast messages directly in hardware, thereby avoiding software-introduced delays. Software intervention (possible only after the broadcast message has already been forwarded) is required only to ensure termination in case of failures. With high probability, the broadcast will terminate in time O(n τ_max), where n is the number of nodes and τ_max is an upper bound on (variable) message delivery time across a link.

principles of distributed computing | 1991

Inconsistency and contamination (preliminary version)

Ajei Sarat Gopal; Sam Toueg

For many applications, the usual specifications of broadcasts and multicasts are inadequate as they do not sufficiently restrict the behavior of faulty processes. In particular, a faulty process is allowed to reach an inconsistent state (for example by failing to deliver a message that is delivered by correct processes), and to subsequently contaminate the rest of the system by successfully broadcasting messages based on its erroneous state. It is often desirable to prevent contamination and/or inconsistency. We give precise definitions of inconsistency and contamination, develop protocols for their prevention, and derive lower bounds on time and fault-tolerance. In this preliminary version of the paper, we concentrate on atomic broadcast and atomic multicast in the presence of general omission failures. Other kinds of broadcasts and multicasts and other types of failures are treated in ...

Authors' affiliation: Department of Computer Science, Cornell University, Ithaca, New York 14853. The research was partially supported by NSF grants CCR-8901780 and CCR-9102231, with additional partial support from an IBM Graduate Student Fellowship (Gopal) and DARPA/NASA Ames Grant NAG-2-593.

international conference on distributed computing systems | 1994

Concert/C: supporting distributed programming with language extensions and a portable multiprotocol runtime

Joshua S. Auerbach; Ajei Sarat Gopal; James R. Russell; Mark T. Kennedy

IEEE/ACM Transactions on Networking | 1999