Brent N. Chun

University of California, Berkeley

Publication

Featured research published by Brent N. Chun.

parallel computing | 2004

The ganglia distributed monitoring system: design, implementation, and experience

Matthew L. Massie; Brent N. Chun; David E. Culler

Ganglia is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years.

acm special interest group on data communication | 2003

PlanetLab: an overlay testbed for broad-coverage services

Brent N. Chun; David E. Culler; Timothy Roscoe; Andy C. Bavier; Larry L. Peterson; Mike Wawrzoniak; Mic Bowman

PlanetLab is a global overlay network for developing and accessing broad-coverage network services. Our goal is to grow to 1000 geographically distributed nodes, connected by a diverse collection of links. PlanetLab allows multiple services to run concurrently and continuously, each in its own slice of PlanetLab. This paper describes our initial implementation of PlanetLab, including the mechanisms used to implement virtualization, and the collection of core services used to manage PlanetLab.

symposium on operating systems principles | 2003

SHARP: an architecture for secure resource peering

Yun Fu; Jeffrey S. Chase; Brent N. Chun; Stephen Schwab; Amin Vahdat

This paper presents Sharp, a framework for secure distributed resource management in an Internet-scale computing infrastructure. The cornerstone of Sharp is a construct to represent cryptographically protected resource claims (promises or rights to control resources for designated time intervals), together with secure mechanisms to subdivide and delegate claims across a network of resource managers. These mechanisms enable flexible resource peering: sites may trade their resources with peering partners or contribute them to a federation according to local policies. A separation of claims into tickets and leases allows coordinated resource management across the system while preserving site autonomy and local control over resources. Sharp also introduces mechanisms for controlled, accountable oversubscription of resource claims as a fundamental tool for dependable, efficient resource management. We present experimental results from a Sharp prototype for PlanetLab, and illustrate its use with a decentralized barter economy for global PlanetLab resources. The results demonstrate the power and practicality of the architecture, and the effectiveness of oversubscription for protecting resource availability in the presence of failures.

cluster computing and the grid | 2002

User-Centric Performance Analysis of Market-Based Cluster Batch Schedulers

Brent N. Chun; David E. Culler

This paper presents a performance analysis of market-based batch schedulers for clusters of workstations. In contrast to previous work, we use user-centric performance metrics as the basis for system evaluation. Each user is modeled as having a utility function for each job which measures value delivered to the user as a function of execution time. Summing over all utility functions in the workload, we use aggregate utility as a measure of overall value delivered to users. With aggregate utility as the performance metric, simulations are used to quantify the performance of both market-based and traditional batch scheduling algorithms under a variety of synthetic workloads. Results show that an auction-based batch scheduling algorithm improves performance by a factor of up to 2-5x for sequential workloads and up to 14x for highly parallel workloads compared to traditional scheduling algorithms.

hawaii international conference on system sciences | 2004

Decentralized trust management and accountability in federated systems

Brent N. Chun; Andy C. Bavier

In this paper, we describe three key problems for trust management in federated systems and present a layered architecture for addressing them. The three problems we address include how to express and verify trust in a flexible and scalable manner, how to monitor the use of trust relationships over time, and how to manage and reevaluate trust relationships based on historical traces of past behavior. While previous work provides the basis for expressing and verifying trust, it does not address the concurrent problems of how to continuously monitor and manage trust relationships over time. These problems close the loop on trust management and are especially relevant in the context of federated systems where remote resources can be acquired across multiple administrative domains and used in potentially undesirable ways (e.g., to launch denial-of-service attacks).

international symposium on microarchitecture | 1998

Virtual network transport protocols for Myrinet

Brent N. Chun; Alan M. Mainwaring; David E. Culler

Bringing direct and protected network multiprogramming into mainstream cluster computing requires innovations in three key areas: application programming interfaces, network virtualization systems, and lightweight communication protocols for high-speed interconnects. The AM-II API extends traditional active messages with support for client-server computing and facilitates the construction of parallel clients and distributed servers. Our virtual network segment driver enables a large number of arbitrary sequential and parallel applications to access network interface resources directly in a concurrent but fully protected manner. The NIC-to-NIC communication protocols provide reliable and at-most-once message delivery between communication endpoints. The NIC-to-NIC protocols perform well as the number of endpoints and the number of hosts in the cluster are scaled. The flexibility afforded by the underlying protocols enables a diverse set of timely research efforts. Other Berkeley researchers are actively using this system to investigate implicit techniques for the coscheduling of communicating processes, an essential part of high-performance communications in multiprogrammed clusters of uni- and multiprocessor servers. Other researchers are extending the active message protocols described here for clusters of symmetric multiprocessors, using so-called multiprotocol techniques and multiple network interfaces per machine.

parallel computing | 2000

REXEC: A Decentralized, Secure Remote Execution Environment for Clusters

Brent N. Chun; David E. Culler

Bringing clusters of computers into the mainstream as general-purpose computing systems requires that better facilities for transparent remote execution of parallel and sequential applications be developed. While much research has been done in this area, most of this work remains inaccessible for clusters built using contemporary hardware and operating systems. Implementations are either too old and/or not publicly available, require use of operating systems which are not supported by modern hardware, or simply do not meet the functional requirements demanded by practical use in real world settings. To address these issues, we designed REXEC, a decentralized, secure remote execution facility. It provides high availability, scalability, transparent remote execution, dynamic cluster configuration, decoupled node discovery and selection, a well-defined failure and cleanup model, parallel and distributed program support, and strong authentication and encryption. The system is implemented and is currently installed and in use on a 32-node cluster of 2-way SMPs running the Linux 2.2.5 operating system.

acm special interest group on data communication | 2005

Addressing strategic behavior in a deployed microeconomic resource allocator

Chaki Ng; Philip Buonadonna; Brent N. Chun; Alex C. Snoeren; Amin Vahdat

While market-based systems have long been proposed as solutions for distributed resource allocation, few have been deployed for production use in real computer systems. Towards this end, we present our initial experience using Mirage, a microeconomic resource allocation system based on a repeated combinatorial auction. Mirage allocates time on a heavily-used 148-node wireless sensor network testbed. In particular, we focus on observed strategic user behavior over a four-month period in which 312,148 node hours were allocated across 11 research projects. Based on these results, we present a set of key challenges for market-based resource allocation systems based on repeated combinatorial auctions. Finally, we propose refinements to the system's current auction scheme to mitigate the strategies observed to date and also comment on some initial steps toward building an approximately strategyproof repeated combinatorial auction.

symposium on operating systems principles | 2005

Service placement in shared wide-area platforms

David L. Oppenheimer; Brent N. Chun; David A. Patterson; Alex C. Snoeren; Amin Vahdat

Federated geographically-distributed computing platforms such as PlanetLab [1] and the Grid [2, 3] have recently become popular for evaluating and deploying network services and scientific computations. As the size, reach, and user population of such infrastructures grow, resource discovery and resource selection become increasingly important. Although a number of resource discovery and allocation services have been built, there is little data on the utilization of the distributed computing platforms they target. Yet the design and efficacy of such services depend on the characteristics of the target platform.

acm symposium on parallel algorithms and architectures | 1997

System area network mapping

Brent N. Chun; Alan M. Mainwaring; Saul Schleimer; Daniel Shawcross Wilkerson

This paper presents a network mapping algorithm and proves its correctness assuming a traffic-free network. Respecting well-defined parameters, the algorithm produces a graph isomorphic to N F, where N is the network of switches and hosts and F is the set of switches connected by a switch-bridge to the set of hosts H. We show its performance on a Myrinet system-area network with a fat-tree-like topology. It can map 36 nodes, 13 switches, and 64 links in 248 ms and 100 nodes, 40 switches, and 193 links in 981 ms. From such maps, the system computes mutually deadlock-free routes and distributes them to all network interfaces. Switched, multi-gigabyte per second, system area networks are the enabling building-blocks for networks of workstations. Because of their core role, these networks should be dynamically reconfigurable, automatically adapting to the addition or removal of hosts, switches and links.
