Thomas A. Joseph | Researchain

Publication

Featured research published by Thomas A. Joseph.

ACM Transactions on Computer Systems | 1987

Reliable communication in the presence of failures

Kenneth P. Birman; Thomas A. Joseph

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
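
The causal delivery rule can be illustrated with vector timestamps, the formulation later popularized for causal multicast. The 1987 paper may enforce causality by a different mechanism, so the following is a minimal Python sketch of the delivery condition only, not the paper's protocol. The names `CausalMember`, `broadcast`, and `receive` are hypothetical; membership is assumed fixed, and the network is assumed to eventually deliver every message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    vt: tuple      # sender's vector timestamp when the message was sent
    payload: str


class CausalMember:
    """Hypothetical group member that delivers broadcasts in causal order."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vt = [0] * n    # vt[j] = number of j's broadcasts delivered here
        self.pending = []    # received but not yet deliverable
        self.delivered = []  # payloads in delivery order

    def broadcast(self, payload: str) -> Message:
        self.vt[self.pid] += 1
        self.delivered.append(payload)  # a sender delivers its own message at once
        return Message(self.pid, tuple(self.vt), payload)

    def _deliverable(self, m: Message) -> bool:
        # m must be the next message from its sender, and every message the
        # sender had delivered from other members must already be delivered here.
        if m.vt[m.sender] != self.vt[m.sender] + 1:
            return False
        return all(m.vt[k] <= self.vt[k]
                   for k in range(len(self.vt)) if k != m.sender)

    def receive(self, m: Message) -> None:
        self.pending.append(m)
        progress = True
        while progress:  # one delivery may unblock others still pending
            progress = False
            for p in list(self.pending):
                if self._deliverable(p):
                    self.pending.remove(p)
                    self.vt[p.sender] += 1
                    self.delivered.append(p.payload)
                    progress = True


if __name__ == "__main__":
    p0, p1, p2 = CausalMember(0, 3), CausalMember(1, 3), CausalMember(2, 3)
    m1 = p0.broadcast("a")
    p1.receive(m1)
    m2 = p1.broadcast("b")  # causally after "a"
    p2.receive(m2)          # arrives first at p2, so it is held back
    p2.receive(m1)
    print(p2.delivered)     # ['a', 'b']
```

A message from `p1` that causally follows `p0`'s message is buffered at `p2` until `p0`'s message arrives, so `p2` delivers both in causal order even though the network reversed them.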

Symposium on Operating Systems Principles | 1987

Exploiting virtual synchrony in distributed systems

Kenneth P. Birman; Thomas A. Joseph

We describe applications of a virtually synchronous environment for distributed programming, which underlies a collection of distributed programming tools in the ISIS system. A virtually synchronous environment allows processes to be structured into process groups, and makes events like broadcasts to the group as an entity, group membership changes, and even migration of an activity from one place to another appear to occur instantaneously, that is, synchronously. A major advantage of this approach is that many aspects of a distributed application can be treated independently without compromising correctness. Moreover, user code that is designed as if the system were synchronous can often be executed concurrently. We argue that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches.

IEEE Transactions on Software Engineering | 1985

Implementing Fault-Tolerant Distributed Objects

Kenneth P. Birman; Thomas A. Joseph; Thomas Räuchle; A. El Abbadi

This paper describes a technique for implementing k-resilient objects: distributed objects that remain available, and whose operations are guaranteed to progress to completion, despite up to k site failures. The implementation is derived from the object specification automatically, and does not require any information beyond what would be required for a non-resilient, non-distributed implementation. It is therefore unnecessary for an applications programmer to have knowledge of the complex protocols normally employed to implement fault-tolerant objects. Our technique is used in ISIS, a system being developed at Cornell to support resilient objects.

ACM Transactions on Computer Systems | 1986

Low cost management of replicated data in fault-tolerant distributed systems

Thomas A. Joseph; Kenneth P. Birman

Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. In this paper, we describe a technique that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. We also discuss how this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism.

Distributed Systems | 1990

Exploiting replication in distributed systems

Kenneth P. Birman; Thomas A. Joseph

Techniques are examined for replicating data and execution in directly distributed systems: systems in which multiple processes interact directly with one another while continuously respecting constraints on their joint behavior. Directly distributed systems are often required to solve difficult problems, ranging from management of replicated data to dynamic reconfiguration in response to failures. It is shown that these problems reduce to more primitive, order-based consistency problems, which can be solved using primitives such as reliable broadcast protocols. Moreover, given a system that implements reliable broadcast primitives, a flexible set of high-level tools can be provided for building a wide variety of directly distributed application programs.
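
The reduction from replicated-data consistency to message ordering can be sketched in a few lines: if every replica applies the same updates in the same order, the copies stay identical. The Python sketch below obtains a total order from a single hypothetical `Sequencer`, a technique chosen here for brevity rather than taken from the chapter. It assumes every stamped update eventually reaches every `Replica` and ignores sequencer failure.

```python
import random


class Sequencer:
    """Hypothetical fixed sequencer: stamps each update with a global number."""

    def __init__(self):
        self.next_seq = 0

    def order(self, update):
        stamped = (self.next_seq, update)
        self.next_seq += 1
        return stamped


class Replica:
    """Applies (key, value) updates strictly in sequence-number order."""

    def __init__(self):
        self.state = {}
        self.expected = 0  # next sequence number to apply
        self.buffer = {}   # updates that arrived ahead of their turn

    def receive(self, stamped):
        seq, update = stamped
        self.buffer[seq] = update
        while self.expected in self.buffer:  # apply any contiguous run
            key, value = self.buffer.pop(self.expected)
            self.state[key] = value
            self.expected += 1


if __name__ == "__main__":
    seq = Sequencer()
    updates = [seq.order(u) for u in [("x", 1), ("x", 2), ("y", 7)]]
    rng = random.Random(0)
    replicas = [Replica() for _ in range(3)]
    for r in replicas:
        arrival = updates[:]
        rng.shuffle(arrival)  # arrival order may differ per replica
        for m in arrival:
            r.receive(m)
    print([r.state for r in replicas])  # every replica ends with x=2, y=7
```

Without the sequence numbers, a replica that happened to receive `("x", 2)` before `("x", 1)` would end with a different value for `x`; ordering alone restores consistency.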

Distributed Systems | 1990

Reliable Broadcast Protocols

Thomas A. Joseph; Kenneth P. Birman

The distinguishing feature of a distributed program is not just that its various parts are distributed over a number of processors but that these parts communicate with one another. The hardware in a distributed system allows a processor to send messages to other processors; the operating system usually extends this facility to allow a process on one machine to send messages to a process on another. The operating system may also provide facilities to set up virtual circuits between processes and may include protocols that ensure a certain degree of reliability in the communication. From the point of view of a programming language, however, these facilities are still rather low-level, and this has led to a search for appropriate high-level abstractions for interprocess communication. Some researchers suggest that distribution be completely hidden from the programmer. They argue for an abstraction that looks like a global shared memory. This abstraction has the advantage that it is simple to program with; writing a distributed program is no different from writing a non-distributed one.

Proceedings of the Asilomar Workshop on Fault-Tolerant Distributed Computing | 1990

Communication Support for Reliable Distributed Computing

Kenneth P. Birman; Thomas A. Joseph

We describe a collection of communication primitives integrated with a mechanism for handling process failure and recovery. These primitives facilitate the implementation of fault-tolerant process groups, which can be used to provide distributed services in an environment subject to non-malicious crash failures.

Science of Computer Programming | 1986

State machines and assertions: An integrated approach to modeling and verification of distributed systems

Thomas A. Joseph; Thomas Räuchle; Sam Toueg

This paper describes a methodology for modeling and verifying protocols for asynchronous message passing systems. It combines the techniques of finite state analysis and axiomatic verification. It overcomes the problem of state explosion by using variables and logical assertions where the finite state approach would require a large number of states. By explicitly including states where interactions between processes occur, the complexity of assertional proofs is significantly reduced. Properties like freedom from deadlock, freedom from unspecified message receptions, boundedness of channel size, and partial correctness can be proved. Properties of channels like losing or garbling messages can be modeled, as can premature and non-premature timeouts. The technique is illustrated by proving a sliding window flow control protocol and an alternating bit protocol that is correct only if timeouts are non-premature.
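
For readers unfamiliar with the alternating bit protocol mentioned above, the following Python simulation is a minimal sketch of its behavior, not the paper's model or proof. It assumes one frame in flight at a time over channels that may lose but never reorder or garble frames, and it models each timeout as firing only after a frame or its acknowledgement has actually been lost, so timeouts are never premature. The function name `alternating_bit` and its parameters are hypothetical.

```python
import random


def alternating_bit(data, loss_rate=0.3, seed=1):
    """Simulate the alternating bit protocol over lossy, non-reordering channels.

    Returns (items delivered to the receiver, number of data-frame transmissions).
    loss_rate must be below 1.0 or the simulation never finishes.
    """
    rng = random.Random(seed)

    def lost():
        return rng.random() < loss_rate  # True means this frame is dropped

    delivered = []
    send_bit = 0    # bit attached to the sender's current frame
    expect_bit = 0  # bit the receiver expects on the next new frame
    transmissions = 0
    for item in data:
        while True:
            transmissions += 1
            if not lost():                  # data frame (send_bit, item) arrives
                if send_bit == expect_bit:
                    delivered.append(item)  # new frame: deliver it once
                    expect_bit ^= 1
                # otherwise a duplicate: discard it, but acknowledge again
                ack_bit = expect_bit ^ 1    # bit of the last frame accepted
                if not lost() and ack_bit == send_bit:
                    send_bit ^= 1           # acknowledged: move to next item
                    break
            # data frame or its ack was lost: the timeout fires, retransmit


    return delivered, transmissions


if __name__ == "__main__":
    print(alternating_bit(list("hello")))
```

Retransmitted duplicates carry the old bit, so the receiver recognizes and discards them; each item is therefore delivered exactly once and in order, at the cost of extra transmissions when frames are lost.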

Fehlertolerierende Rechensysteme, 2. GI/NTG/GMR-Fachtagung | 1984

Extending resilient objects efficiently

Kenneth P. Birman; Thomas A. Joseph; Thomas Räuchle

Resilient objects are instances of distributed abstract data types that are tolerant to failures. Due to the distributed nature of resilient objects and the replication of data, potential for a high degree of concurrency exists within them. This paper introduces a new concurrency control algorithm that achieves higher concurrency than conventional methods such as two-phase locking. Objects are specified in a high-level specification language, and the algorithm uses this specification to take advantage of the structure of resilient objects and to exploit semantic information about operations.
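
One common way to exploit semantic information about operations is a commutativity-based lock compatibility table. The Python sketch below contrasts it with plain read/write locking for a hypothetical counter object whose increments commute; it illustrates the general idea only and is not the algorithm introduced in the paper. The names `LockTable`, `RW_COMPAT`, and `SEMANTIC_COMPAT` are illustrative, and lock release and waiting are omitted.

```python
# Lock compatibility for a hypothetical counter object.
# True means two different transactions may hold the two modes at once.
RW_COMPAT = {  # plain read/write locking: an increment must take a write lock
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}
SEMANTIC_COMPAT = {  # semantic locking: increments commute with one another
    ("read", "read"): True,
    ("read", "inc"): False,
    ("inc", "read"): False,
    ("inc", "inc"): True,
}


class LockTable:
    """Grants a lock unless another transaction holds an incompatible mode."""

    def __init__(self, compat):
        self.compat = compat
        self.held = []  # (transaction, mode) pairs currently granted

    def try_lock(self, txn, mode):
        for other_txn, other_mode in self.held:
            if other_txn != txn and not self.compat[(other_mode, mode)]:
                return False  # conflict: the caller would have to wait
        self.held.append((txn, mode))
        return True


if __name__ == "__main__":
    rw = LockTable(RW_COMPAT)
    print(rw.try_lock("T1", "write"), rw.try_lock("T2", "write"))  # True False
    sem = LockTable(SEMANTIC_COMPAT)
    print(sem.try_lock("T1", "inc"), sem.try_lock("T2", "inc"))    # True True
```

Two concurrent increments proceed in parallel under the semantic table, whereas read/write locking forces the second transaction to wait; a read still conflicts with pending increments in both schemes.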

Archive | 1990