
Publication


Featured research published by Kenneth P. Birman.


ACM Transactions on Computer Systems | 1991

Lightweight causal and atomic group multicast

Kenneth P. Birman; André Schiper; Pat Stephenson

Reference LSR-ARTICLE-1991-001. View record in Web of Science. Record created on 2005-05-20, modified on 2016-08-08.


ACM Transactions on Computer Systems | 1987

Reliable communication in the presence of failures

Kenneth P. Birman; Thomas A. Joseph

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
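The causal delivery ordering mentioned above can be sketched with vector timestamps. The following is a minimal illustration of the CBCAST-style deliverability test (function names are ours, not from the ISIS code): a message is deliverable at a receiver only when it is the sender's next message and everything it causally depends on has already been delivered.

```python
# Minimal sketch of a causal-delivery test using vector timestamps.
# A message carrying vector timestamp msg_vc from process `sender` is
# deliverable only if it is the sender's next message and no causally
# prior message from any other process is still missing.

def deliverable(msg_vc, sender, local_vc):
    for k, v in enumerate(msg_vc):
        if k == sender:
            if v != local_vc[k] + 1:    # must be the sender's next message
                return False
        elif v > local_vc[k]:           # a causally prior message is missing
            return False
    return True

def deliver(sender, local_vc):
    """Record a delivery by advancing the receiver's vector clock."""
    local_vc[sender] += 1
```

A message that arrives too early is simply held until `deliverable` becomes true, which is how such protocols respect causality without blocking the sender.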


ACM Transactions on Computer Systems | 2003

Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

Robbert van Renesse; Kenneth P. Birman; Werner Vogels

Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as it evolves over time. The combination of features makes it possible to solve a wide variety of management and self-configuration problems. This paper describes the design of the system with a focus upon its scalability. After describing the Astrolabe service, we present examples of the use of Astrolabe for locating resources, publish-subscribe, and distributed synchronization in large systems. Astrolabe is implemented using a peer-to-peer protocol, and uses a restricted form of mobile code based on the SQL query language for aggregation. This protocol gives rise to a novel consistency model. Astrolabe addresses several security considerations using a built-in PKI. The scalability of the system is evaluated using both simulation and experiments; these confirm that Astrolabe could scale to thousands and perhaps millions of nodes, with information propagation delays in the tens of seconds.
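Astrolabe's on-the-fly attribute aggregation can be illustrated with a toy sketch. The dict-based "zones" and plain Python reducers below are our stand-in for Astrolabe's restricted SQL aggregation queries, not its actual API: a parent zone summarizes its child zones' attributes as they change.

```python
# Toy sketch of hierarchical attribute aggregation: a parent zone's
# attributes are computed by reducing over its child zones' attributes.

def aggregate(children, funcs):
    """children: attribute dicts for child zones.
    funcs: attribute name -> reducer over the children's values."""
    return {attr: f([c[attr] for c in children]) for attr, f in funcs.items()}

# Leaf zones report per-host state...
hosts = [
    {"load": 0.9, "free_mb": 120},
    {"load": 0.2, "free_mb": 800},
    {"load": 0.5, "free_mb": 300},
]
# ...and the parent zone's summary corresponds roughly to
# "SELECT MIN(load) AS load, SUM(free_mb) AS free_mb":
summary = aggregate(hosts, {"load": min, "free_mb": sum})
```

Applied recursively up a zone hierarchy, summaries of this kind are what let an application locate a lightly loaded resource without examining every host.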


Communications of the ACM | 1993

The process group approach to reliable distributed computing

Kenneth P. Birman

The difficulty of developing reliable distributed software is an impediment to applying distributed computing technology in many settings. Experience with the ISIS system suggests that a structured approach based on virtually synchronous process groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of research on ISIS, describing the model, its implementation challenges, and the types of applications to which ISIS has been applied.


Communications of the ACM | 1996

Horus: a flexible group communication system

Robbert van Renesse; Kenneth P. Birman; Silvano Maffeis

The Horus system offers flexible group communication support for distributed applications. It is extensively layered and highly reconfigurable, allowing applications to only pay for services they use, and for groups with different communication needs to coexist in a single system. The approach encourages experimentation with new communication properties and incremental extension of the system, and enables us to support a variety of application-oriented interfaces.
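Horus's layering idea can be sketched as follows. The layer names and the `down`/`up` interface are hypothetical, intended only to show how stacked layers transform a message symmetrically on send and on delivery, so that an application pays only for the layers it stacks.

```python
# Toy sketch of layered protocol composition: each layer transforms a
# message on the way down (send) and undoes it on the way up (deliver).

class Layer:
    def down(self, msg):
        return msg
    def up(self, msg):
        return msg

class FragLayer(Layer):
    """Stand-in for a framing/fragmentation layer."""
    def down(self, msg):
        return "frag:" + msg
    def up(self, msg):
        return msg[len("frag:"):]

class CryptoLayer(Layer):
    """Stand-in for an encryption layer (reversal is a placeholder)."""
    def down(self, msg):
        return msg[::-1]
    def up(self, msg):
        return msg[::-1]

def send(stack, msg):
    for layer in stack:
        msg = layer.down(msg)
    return msg

def deliver(stack, wire):
    for layer in reversed(stack):
        wire = layer.up(wire)
    return wire
```

Because delivery applies the layers in reverse order, two groups with different needs (say, one with `CryptoLayer` and one without) can coexist in the same system simply by running different stacks.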


ACM Transactions on Computer Systems | 1999

Bimodal multicast

Kenneth P. Birman; Mark Hayden; Oznur Ozkasap; Zhen Xiao; Mihai Budiu; Yaron Minsky

There are many methods for making a multicast protocol “reliable.” At one end of the spectrum, a reliable multicast protocol might offer atomicity guarantees, such as all-or-nothing delivery, delivery ordering, and perhaps additional properties such as virtually synchronous addressing. At the other end are protocols that use local repair to overcome transient packet loss in the network, offering “best effort” reliability. Yet none of this prior work has treated stability of multicast delivery as a basic reliability property, such as might be needed in an internet radio, television, or conferencing application. This article looks at reliability with a new goal: development of a multicast protocol which is reliable in a sense that can be rigorously quantified and includes throughput stability guarantees. We characterize this new protocol as a “bimodal multicast” in reference to its reliability model, which corresponds to a family of bimodal probability distributions. Here, we introduce the protocol, provide a theoretical analysis of its behavior, review experimental results, and discuss some candidate applications. These confirm that bimodal multicast is reliable, scalable, and that the protocol provides remarkably stable delivery throughput.
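The epidemic-repair idea behind bimodal multicast can be caricatured as rounds of gossip. This is a deliberate simplification: the real protocol gossips message digests and lets a peer pull what it lacks, with bounded per-round overhead, whereas the sketch below simply pushes held messages to one random peer per round.

```python
import random

# Toy sketch of gossip-based repair: after an unreliable best-effort
# multicast seeds a few nodes, periodic gossip rounds spread the
# message to the rest with high probability.

def gossip_round(nodes, rng):
    """nodes: list of sets of message ids held by each node."""
    for i, held in enumerate(nodes):
        if not held:
            continue
        peer = rng.choice([j for j in range(len(nodes)) if j != i])
        nodes[peer] |= held            # peer receives what it was missing

def rounds_to_full_coverage(nodes, all_msgs, rng, max_rounds=200):
    """Gossip until every node holds all_msgs; return the round count."""
    for r in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(h == all_msgs for h in nodes):
            return r
    return None
```

Because the set of infected nodes grows roughly exponentially per round, coverage is reached in a number of rounds logarithmic in the group size, which is what gives the protocol its predictable, stable throughput.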


Symposium on Operating Systems Principles | 1987

Exploiting virtual synchrony in distributed systems

Kenneth P. Birman; Thomas A. Joseph

We describe applications of a virtually synchronous environment for distributed programming, which underlies a collection of distributed programming tools in the ISIS2 system. A virtually synchronous environment allows processes to be structured into process groups, and makes events like broadcasts to the group as an entity, group membership changes, and even migration of an activity from one place to another appear to occur instantaneously — in other words, synchronously. A major advantage to this approach is that many aspects of a distributed application can be treated independently without compromising correctness. Moreover, user code that is designed as if the system were synchronous can often be executed concurrently. We argue that this approach to building distributed and fault-tolerant software is more straightforward, more flexible, and more likely to yield correct solutions than alternative approaches.
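The virtually synchronous contract described above can be illustrated with a toy model (this is not the ISIS implementation): broadcasts and membership changes are totally ordered, so every member replays the same event sequence. A central sequencer stands in here for the group's distributed agreement protocol.

```python
# Toy illustration of virtual synchrony's observable guarantee:
# group events (messages and view changes) form one total order,
# so all members see membership changes at the same logical instant.

class GroupSequencer:
    def __init__(self):
        self.log = []

    def broadcast(self, msg):
        self.log.append(("msg", msg))

    def view_change(self, members):
        # a membership change lands at a definite point in the sequence
        self.log.append(("view", tuple(members)))

    def replay(self):
        """What any member observes: the same totally ordered events."""
        return list(self.log)
```

A member can therefore tell unambiguously whether a message was delivered before or after a failure was announced, which is what lets application code be written as if the system were synchronous.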


Lecture Notes in Computer Science | 1997

Building Secure and Reliable Network Applications

Kenneth P. Birman

Abstractly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of “properties” – a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcoming limited levels of message loss, disorder and even duplication would be easy to develop (Figure 4-4). For each process to which it issues requests, a client process maintains a message sequence number. Each message transmitted carries a unique sequence number, and (in most RPC protocols) a time stamp from a global clock – one that returns roughly the same value throughout the network, up to clock synchronization limits. This information can be used by the server to detect very old or duplicate copies of messages, which are discarded, and to identify received messages using what are called acknowledgment protocol-messages. The basic idea, then, is that the client process transmits its request and, until acknowledgments have been received, continues to retransmit the same messages periodically. The server collects messages and, when the full request has been received, performs the appropriate procedure invocation.
When it transmits its reply, the same sort of reliable communication protocol is used. Often, the acknowledgement is delayed briefly in the hope that the reply will be sent soon, and can be used in place of a separate acknowledgement.

A number of important optimizations have been proposed by developers of RPC-oriented distributed computing environments. For example, if one request will require the transmission of multiple messages, because the request is large, it is common to inhibit the sending of acknowledgments during the transmission of the burst of messages. In this case, a negative acknowledgement is sent if the receiver detects a missing packet; a single ack confirms reception of the entire burst when all packets have been successfully received (Figure 4-5). Similarly, it is common to delay the transmission of acknowledgment packets in the hope that the reply message itself can be transmitted instead of an acknowledgment: obviously, the receipt of a reply implies that the corresponding request was delivered and executed. Process and machine failures, unfortunately, render this very simple approach inadequate. The essential problem is that because communication is over unreliable networking technologies, when a process is unable to communicate with some other process, there is no way to determine whether the problem is a network failure, a machine failure, or both (if a process fails but the machine remains operational the operating system will often provide some status information, permitting this one case to be accurately sensed). When an RPC protocol fails by timing out, but the client or server (or both) remains operational, it is impossible to know what has occurred. Perhaps the request was never received, perhaps it was received and executed but the reply was lost, and perhaps the client or server crashed while the protocol was executing.
This creates a substantial challenge for the application programmer who wishes to build an application that will operate reliably despite failures of some of the services upon which it depends. A related problem concerns the issue of what are called exactly once semantics. When a programmer employs LPC, the invoked procedure will be executed exactly once for each invocation. In the case of RPC, however, it is not evident that this problem can be solved. Consider a process c that issues an RPC to a service offered by process s. Depending upon the assumptions we make, it may be very difficult even to guarantee that s performs this request at most once. (Obviously, the possibility of a failure precludes a solution in which s would perform the operation exactly once). To understand the origin of the problem, consider the possible behaviors of an arbitrary communication network. Messages can be lost in transmission, and as we have seen this can prevent process c from accurately detecting failures of process s. But the network might also misbehave by duplicating messages or delivering them out of order.

Figure 4-4: Simple RPC interaction, showing packets that contain data (dark) and acknowledgements (light)
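The duplicate-suppression logic implied by the discussion above can be sketched as a per-client table on the server (names here are illustrative, not from any particular RPC system): the server remembers the highest sequence number it has executed for each client, along with the cached reply, so a retransmitted request is answered without re-executing the procedure. This yields at-most-once execution.

```python
# Sketch of server-side at-most-once semantics: a retransmission with
# a previously seen sequence number returns the cached reply instead
# of invoking the handler again; stale duplicates are discarded.

class RpcServer:
    def __init__(self, handler):
        self.handler = handler
        self.last = {}              # client id -> (seqno, cached reply)

    def handle(self, client, seqno, request):
        seen, reply = self.last.get(client, (-1, None))
        if seqno == seen:           # retransmission: resend cached reply
            return reply
        if seqno < seen:            # very old duplicate: discard
            return None
        reply = self.handler(request)
        self.last[client] = (seqno, reply)
        return reply
```

Note what this does not solve: if the server crashes and loses its table, a retransmitted request may be re-executed, which is exactly why exactly-once semantics cannot be guaranteed in the presence of failures.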


International Workshop on Peer-to-Peer Systems | 2003

Kelips: Building an efficient and stable P2P DHT through increased memory and background overhead

Indranil Gupta; Kenneth P. Birman; Prakash Linga; Alan J. Demers; Robbert van Renesse

A peer-to-peer (p2p) distributed hash table (DHT) system allows hosts to join and fail silently (or leave), as well as to insert and retrieve files (objects). This paper explores a new point in design space in which increased memory usage and constant background communication overheads are tolerated to reduce file lookup times and increase stability to failures and churn. Our system, called Kelips, uses peer-to-peer gossip to partially replicate file index information. In Kelips, (a) under normal conditions, file lookups are resolved within 1 RPC, independent of system size, and (b) membership changes (e.g., even when a large number of nodes fail) are detected and disseminated to the system quickly. Per-node memory requirements are small in medium-sized systems. When there are failures, lookup success is ensured through query rerouting. Kelips achieves load balancing comparable to existing systems. Locality is supported by using topologically aware gossip mechanisms. Initial results of an ongoing experimental study are also discussed.
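The one-RPC lookup can be sketched with affinity groups. This is a simplification of Kelips (in the real system, k is chosen near the square root of the node count and each group's file index is replicated among its members by gossip): files and nodes hash into one of k groups, and any member of a file's group can resolve the lookup in a single hop.

```python
import hashlib

# Sketch of Kelips-style affinity groups: a stable hash partitions
# names into k groups, and each group index maps its files to their
# home nodes, so a lookup touches only the file's group.

def group_of(name, k):
    """Map a name into one of k affinity groups via a stable hash."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % k

def build_index(files, k):
    """files: file name -> home node. Returns k per-group indexes."""
    index = [dict() for _ in range(k)]
    for fname, node in files.items():
        index[group_of(fname, k)][fname] = node
    return index

def lookup(index, fname, k):
    """Resolve a file in one step by consulting its affinity group."""
    return index[group_of(fname, k)].get(fname)
```

The trade-off the paper describes is visible here: each node stores an entire group's index (more memory than a conventional DHT's routing table) in exchange for lookups whose cost is independent of system size.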


Symposium on Operating Systems Principles | 1985

Replication and fault-tolerance in the ISIS system

Kenneth P. Birman

The ISIS system transforms abstract type specifications into fault-tolerant distributed implementations while insulating users from the mechanisms used to achieve fault-tolerance. This paper discusses techniques for obtaining a fault-tolerant implementation from a non-distributed specification and for achieving improved performance by concurrently updating replicated data. The system itself is based on a small set of communication primitives, which are interesting because they achieve high levels of concurrency while respecting higher level ordering requirements. The performance of distributed fault-tolerant services running on this initial version of ISIS is found to be nearly as good as that of non-distributed, fault-intolerant ones.

Collaboration


Dive into Kenneth P. Birman's collaborations.

Top Co-Authors

Danny Dolev

Hebrew University of Jerusalem
