Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where David L. Oppenheimer is active.

Publication


Featured research published by David L. Oppenheimer.


High Performance Distributed Computing | 2005

Design and implementation tradeoffs for wide-area resource discovery

David L. Oppenheimer; Jeannie R. Albrecht; David A. Patterson; Amin Vahdat

This paper describes the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired resources as a topology of interconnected groups with required intragroup, intergroup, and per-node characteristics, along with the utility that the application derives from various ranges of values of those characteristics. This design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics, and allows the system to rank acceptable configurations based on their quality for that application. We explore a variety of architectures to deliver SWORD's functionality in a scalable and highly available manner. A 1000-node ModelNet evaluation using a workload of measurements collected from PlanetLab shows that an architecture based on 4-node server cluster sites at network peering facilities outperforms a decentralized DHT-based resource discovery infrastructure for all but the smallest number of sites. While such a centralized architecture shows significant promise, we find that our decentralized implementation, both in emulation and running continuously on over 200 PlanetLab nodes, performs well while benefiting from the DHT's self-healing properties.
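The abstract describes SWORD queries as groups of nodes with required per-node attribute ranges, ranked by an application-supplied utility. A minimal sketch of that query model, with hypothetical node attributes and a made-up utility function (not SWORD's actual API):

```python
# Hypothetical sketch of a SWORD-style grouped query: each group specifies
# per-node attribute ranges, and acceptable nodes are ranked by a
# user-supplied utility function.

def matches(node, constraints):
    """True if every constrained attribute falls inside its [lo, hi] range."""
    return all(lo <= node[attr] <= hi for attr, (lo, hi) in constraints.items())

def select_group(nodes, constraints, size, utility):
    """Return the `size` acceptable nodes with the highest utility, or []."""
    acceptable = [n for n in nodes if matches(n, constraints)]
    ranked = sorted(acceptable, key=utility, reverse=True)
    return ranked[:size] if len(ranked) >= size else []

# Made-up candidate nodes and constraints for illustration.
nodes = [
    {"name": "planetlab1", "load": 0.2, "free_mem_mb": 900},
    {"name": "planetlab2", "load": 0.9, "free_mem_mb": 1500},
    {"name": "planetlab3", "load": 0.4, "free_mem_mb": 700},
]
constraints = {"load": (0.0, 0.5), "free_mem_mb": (500, 2000)}
group = select_group(nodes, constraints, size=2,
                     utility=lambda n: n["free_mem_mb"] - 1000 * n["load"])
# planetlab2 is excluded by the load constraint; the rest are ranked by utility.
```

The real system additionally handles intergroup and network (e.g. latency) constraints and distributes the query over a DHT; this sketch only shows the per-node filter-and-rank step.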


Workshop on Hot Topics in Operating Systems | 1999

ISTORE: introspective storage for data-intensive network services

Aaron B. Brown; David L. Oppenheimer; Kimberly Keeton; Randi Thomas; John Kubiatowicz; David A. Patterson

Today's fast-growing data-intensive network services place heavy demands on the back-end servers that support them. This paper introduces ISTORE, a novel server architecture that couples LEGO-like plug-and-play hardware with a generic framework for constructing adaptive software that leverages continuous self-monitoring. ISTORE exploits introspection to provide high availability, performance, and scalability while drastically reducing the cost and complexity of administration. An ISTORE-based server monitors and adapts to changes in the imposed workload and to unexpected system events such as hardware failure. This adaptability is enabled by a combination of intelligent self-monitoring hardware components and an extensible software framework that allows the target application to specify monitoring and adaptation policies to the system.
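The last sentence describes a framework in which the application registers its own adaptation policies for monitored events. A toy sketch of that idea, with entirely hypothetical names (not the ISTORE API):

```python
# Toy sketch of application-specified adaptation policies: the framework
# observes monitoring events and dispatches each to whatever handler the
# application registered for that event type.

class AdaptiveFramework:
    def __init__(self):
        self.policies = {}

    def on(self, event_type, handler):
        """Register an application-supplied policy for one event type."""
        self.policies[event_type] = handler

    def observe(self, event_type, detail):
        """Dispatch a monitored event to its policy, if any."""
        handler = self.policies.get(event_type)
        return handler(detail) if handler else "ignored"

fw = AdaptiveFramework()
fw.on("disk_failure", lambda disk: f"rebuild replica of {disk}")
fw.on("load_spike", lambda frac: "add node" if frac > 0.8 else "no-op")
```

The point of the design is the separation of concerns: the hardware and framework do the monitoring, while the policy for reacting stays with the application.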


IEEE Transactions on Computers | 2002

ROC-1: hardware support for recovery-oriented computing

David L. Oppenheimer; Aaron B. Brown; J. Beck; Daniel Hettena; Jon Kuroda; Noah Treuhaft; David A. Patterson; Katherine A. Yelick

We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of recovery-oriented computing (ROC) by emphasizing detection of and recovery from the failures that inevitably occur in Internet service environments, rather than simple avoidance of such failures. ROC-1 promises greater availability than existing server systems by incorporating four techniques applied from the ground up to both hardware and software: redundancy and isolation, online self-testing and verification, support for problem diagnosis, and concern for human interaction with the system.


International Conference on Supercomputing | 1999

Shared virtual memory with automatic update support

Liviu Iftode; Matthias A. Blumrich; Cezary Dubnicki; David L. Oppenheimer; Jaswinder Pal Singh; Kai Li

Shared virtual memory systems provide the abstraction of a shared address space on top of a message-passing communication architecture. The overall performance of an SVM system therefore depends on both the raw performance of the underlying communication mechanism and the efficiency with which the SVM protocol uses that mechanism. The Automatic Update Release Consistency (AURC) protocol was proposed to take advantage of simple memory-mapped communication and automatic update support to accelerate a shared virtual memory protocol. However, there has not yet been a real system on which an implementation of this protocol could be evaluated. This paper reports our evaluation of AURC on the SHRIMP multicomputer, the only hardware platform that supports an automatic update mechanism. Automatic update propagates local memory writes to remote memory locations automatically. We compare the AURC protocol with its all-software counterpart protocol, Home-based Lazy Release Consistency (HLRC). By integrating AU support into the protocol as well, the AURC protocol can improve performance. For applications with write-write false sharing, an AU-based multiple-writer protocol can significantly outperform an all-software home-based multiple-writer LRC protocol that uses diffs. For applications without much write-write false sharing, the two protocols perform similarly. Our results also show that write-through caching and automatic update traffic do not perturb the computation, validating the implementation as achieving its goals.


ACM SIGOPS European Workshop | 2002

Studying and using failure data from large-scale Internet services

David L. Oppenheimer; David A. Patterson

Large-scale Internet services are the newest and arguably the most commercially important class of systems requiring 24x7 availability. Despite this, very little information has been published about their causes of failure. In an attempt to address this deficiency, we have analyzed detailed failure reports from three large-scale Internet services. Our goals are to (1) identify the major factors contributing to user-visible failures, (2) evaluate the (potential) effectiveness of various techniques for preventing and mitigating service failure, and (3) build a fault model for service-level dependability and recovery benchmarks. Our initial results indicate that operator error and network problems are the leading contributors to user-visible failures, that failures in custom-written front-end software are significant, and that online testing and more thoroughly exposing and handling component failures would reduce failure rates in at least one service.
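The kind of analysis behind goal (1) amounts to classifying each failure report by its contributing factor and ranking the factors by frequency. A minimal sketch with made-up report data (the paper's actual dataset and taxonomy are richer):

```python
# Hypothetical sketch: tally failure reports by contributing factor to
# find the leading causes of user-visible failures.

from collections import Counter

# Fabricated example reports for illustration only.
reports = [
    {"id": 1, "factor": "operator error"},
    {"id": 2, "factor": "network"},
    {"id": 3, "factor": "operator error"},
    {"id": 4, "factor": "front-end software"},
    {"id": 5, "factor": "operator error"},
    {"id": 6, "factor": "network"},
]

tally = Counter(r["factor"] for r in reports)
leading = tally.most_common(2)  # [("operator error", 3), ("network", 2)]
```

In this toy data, as in the abstract's finding, operator error and network problems dominate; the real study additionally weighs each factor by downtime and by which mitigation techniques would have prevented it.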


Symposium on Operating Systems Principles | 2005

Service placement in shared wide-area platforms

David L. Oppenheimer; Brent N. Chun; David A. Patterson; Alex C. Snoeren; Amin Vahdat

Federated geographically-distributed computing platforms such as PlanetLab [1] and the Grid [2, 3] have recently become popular for evaluating and deploying network services and scientific computations. As the size, reach, and user population of such infrastructures grow, resource discovery and resource selection become increasingly important. Although a number of resource discovery and allocation services have been built, there is little data on the utilization of the distributed computing platforms they target. Yet the design and efficacy of such services depend on the characteristics of the target platform.


Scientific Programming | 1999

U-Net/SLE: A Java-based user-customizable virtual network interface

Matt Welsh; David L. Oppenheimer; David E. Culler

We describe U-Net/SLE (Safe Language Extensions), a user-level network interface architecture which enables per-application customization of communication semantics through downloading of user extension applets, implemented as Java classfiles, to the network interface. This architecture permits applications to safely specify code to be executed within the NI on message transmission and reception. By leveraging the existing U-Net model, applications may implement protocol code at the user level, within the NI, or using some combination of the two. Our current implementation, using the Myricom Myrinet interface and a small Java Virtual Machine subset, allows host communication overhead to be reduced and improves the overlap of communication and computation during protocol processing.


USENIX Symposium on Internet Technologies and Systems | 2003

Why do Internet services fail, and what can be done about it?

David L. Oppenheimer; Archana Ganapathi; David A. Patterson


Archive | 2002

Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies

David Patterson; Aaron B. Brown; Pete Broadwell; George Candea; Mike Chen; James W. Cutler; Patricia Enriquez; Armando Fox; Matthew Merzbacher; David L. Oppenheimer; Naveen Sastry; William H. Tetzlaff; Jonathan Traupman; Noah Treuhaft


WORLDS | 2004

Distributed Resource Discovery on PlanetLab with SWORD.

David L. Oppenheimer; Jeannie R. Albrecht; David A. Patterson; Amin Vahdat

Collaboration


Dive into David L. Oppenheimer's collaboration network.

Top Co-Authors

Noah Treuhaft (University of California)
George Candea (École Polytechnique Fédérale de Lausanne)
Armando Fox (University of California)
Naveen Sastry (University of California)