William E. Allcock
Argonne National Laboratory
Publications
Featured research published by William E. Allcock.
conference on high performance computing (supercomputing) | 2005
William E. Allcock; John Bresnahan; Rajkumar Kettimuthu; Michael Link; Catalin L. Dumitrescu; Ioan Raicu; Ian T. Foster
The GridFTP extensions to the File Transfer Protocol define a general-purpose mechanism for secure, reliable, high-performance data movement. We report here on the Globus striped GridFTP framework, a set of client and server libraries designed to support the construction of data-intensive tools and applications. We describe the design of both this framework and a striped GridFTP server constructed within the framework. We show that this server is faster than other FTP servers in both single-process and striped configurations, achieving, for example, speeds of 27.3 Gbit/s memory-to-memory and 17 Gbit/s disk-to-disk over a 30 Gbit/s network with a 60 millisecond round-trip time. In another experiment, we show that the server can support 1800 concurrent clients without excessive load. We argue that this combination of performance and modular structure makes the Globus GridFTP framework both a good foundation on which to build tools and applications, and a unique testbed for the study of innovative data management techniques and network protocols.
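To make the striping idea concrete, the sketch below splits one transfer into disjoint byte ranges and fetches them over parallel connections before reassembling the file. It is an illustration only, not the Globus GridFTP implementation; the URL, file size, and stripe count are assumptions.

```python
# Illustrative sketch of "striped" data movement: fetch disjoint byte ranges
# of one file over several parallel connections, then reassemble them in order.
# NOT the Globus GridFTP implementation; URL, size, and stripe count are assumed.
import concurrent.futures
import urllib.request

URL = "http://example.org/large-file.bin"   # assumed source
STRIPES = 4                                  # number of parallel streams

def fetch_range(start: int, end: int) -> bytes:
    """Fetch bytes [start, end] of the remote file with an HTTP Range request."""
    req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def striped_download(total_size: int) -> bytes:
    # Partition the file into STRIPES contiguous byte ranges.
    step = total_size // STRIPES
    ranges = [(i * step, total_size - 1 if i == STRIPES - 1 else (i + 1) * step - 1)
              for i in range(STRIPES)]
    # Transfer each stripe on its own connection, concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=STRIPES) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)  # reassemble in stripe order

if __name__ == "__main__":
    data = striped_download(total_size=1 << 20)  # assumes a 1 MiB file
    print(f"received {len(data)} bytes")
```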
ieee international conference on high performance computing data and analytics | 2009
Samuel Lang; Philip H. Carns; Robert Latham; Robert B. Ross; Kevin Harms; William E. Allcock
Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges in deploying scalable I/O systems. In this paper we present a case study of the I/O challenges to performance and scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the top 5 fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands on the I/O system. We show that Intrepid's file and storage systems sustain high performance under varying workloads as the applications scale with the number of processes.
ACM Transactions on Storage | 2011
Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross
Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.
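The low-overhead style of characterization referred to here can be illustrated with a small sketch that keeps compact per-file counters rather than logging every I/O operation. This is only an illustration of the idea, not the instrumentation used in the paper; all names below are assumptions.

```python
# Sketch of lightweight I/O characterization: wrap file opens so reads and
# writes update per-file counters, and report a compact summary at exit.
import atexit
import builtins
from collections import defaultdict

_stats = defaultdict(lambda: {"reads": 0, "writes": 0, "bytes_read": 0, "bytes_written": 0})
_real_open = builtins.open

class _CountingFile:
    def __init__(self, f, path):
        self._f, self._path = f, path
    def read(self, *args):
        data = self._f.read(*args)
        _stats[self._path]["reads"] += 1
        _stats[self._path]["bytes_read"] += len(data)
        return data
    def write(self, data):
        n = self._f.write(data)
        _stats[self._path]["writes"] += 1
        _stats[self._path]["bytes_written"] += n
        return n
    def __getattr__(self, name):      # delegate close, seek, etc. to the real file
        return getattr(self._f, name)
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self._f.close()

def _counting_open(path, mode="r", *args, **kwargs):
    return _CountingFile(_real_open(path, mode, *args, **kwargs), str(path))

builtins.open = _counting_open        # install the wrapper process-wide

@atexit.register
def _report():
    for path, s in _stats.items():
        print(f"{path}: {s}")
```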
ADVANCED COMPUTING AND ANALYSIS TECHNIQUES IN PHYSICS RESEARCH: VII International Workshop; ACAT 2000 | 2002
William E. Allcock; Ian T. Foster; Steven Tuecke; Ann L. Chervenak; Carl Kesselman
We describe work being performed in the Globus project to develop enabling protocols and services for distributed data-intensive science. These services include:
* High-performance, secure data transfer protocols based on FTP, plus a range of libraries and tools that use these protocols
* Replica catalog services supporting the creation and location of file replicas in distributed systems
These components leverage the substantial body of “Grid” services and protocols developed within the Globus project and by its collaborators, and are being used in a number of data-intensive application projects.
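The replica catalog concept can be shown with a minimal sketch that maps logical file names to the physical locations where copies live. This illustrates the idea only; it is not the Globus Replica Catalog interface, and the hosts and paths are made up.

```python
# Minimal sketch of a replica catalog: map a logical file name to the physical
# locations (URLs) where copies of that file are stored.
from collections import defaultdict

class ReplicaCatalog:
    def __init__(self):
        self._replicas = defaultdict(set)   # logical name -> set of physical URLs

    def register(self, logical_name: str, physical_url: str) -> None:
        """Record that a replica of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name: str, physical_url: str) -> None:
        self._replicas[logical_name].discard(physical_url)

    def locate(self, logical_name: str) -> list[str]:
        """Return all known physical locations of a logical file."""
        return sorted(self._replicas[logical_name])

# Example usage with assumed hosts and paths.
catalog = ReplicaCatalog()
catalog.register("climate/run42.nc", "gsiftp://storage1.example.org/data/run42.nc")
catalog.register("climate/run42.nc", "gsiftp://storage2.example.org/data/run42.nc")
print(catalog.locate("climate/run42.nc"))
```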
local computer networks | 2002
Ravi K. Madduri; Cynthia S. Hood; William E. Allcock
Grid-based computing environments are becoming increasingly popular for scientific computing. One of the key issues for scientific computing is the efficient transfer of large amounts of data across the Grid. In this paper we present a reliable file transfer (RFT) service that significantly improves the efficiency of large-scale file transfer. RFT can detect a variety of failures and restart the file transfer from the point of failure. It also has capabilities for improving transfer performance through TCP tuning.
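The restart-from-point-of-failure behavior described above can be sketched by persisting the number of bytes already copied and resuming from that offset. This is an illustration of the idea, not the RFT service itself; the checkpoint file name and chunk size are assumptions.

```python
# Sketch of restartable file transfer: persist progress so a later attempt
# resumes from the last recorded offset instead of starting over.
import json
import os

CHUNK = 1 << 20  # 1 MiB per write

def copy_with_restart(src: str, dst: str, checkpoint: str = "transfer.ckpt") -> None:
    # Load the last known good offset, if a previous attempt was interrupted.
    offset = 0
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            offset = json.load(f).get("offset", 0)

    with open(src, "rb") as fin, open(dst, "r+b" if offset else "wb") as fout:
        fin.seek(offset)
        fout.seek(offset)
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)
            offset += len(chunk)
            # Record progress so a failure here loses at most one chunk.
            with open(checkpoint, "w") as f:
                json.dump({"offset": offset}, f)

    os.remove(checkpoint)  # transfer complete; discard the checkpoint
```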
ieee conference on mass storage systems and technologies | 2011
Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross
Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.
international parallel and distributed processing symposium | 2005
William E. Allcock; John Bresnahan; K. Kettimuthu; Joseph M. Link
In distributed heterogeneous grid environments the protocols used to exchange bits are crucial. As researchers work hard to discover the best new protocol for the grid, application developers struggle with ways to use these new protocols. A stable, consistent, and intuitive framework is needed to aid in the implementation and use of these protocols. While the application must not be burdened with protocol details, some of them may need to be exposed to take advantage of potential optimizations. In this paper we examine how the Globus XIO API provides this framework. We explore the performance implications of using this abstraction layer and the benefits gained in application as well as protocol development.
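The design idea of a stackable I/O abstraction, where the application sees one read/write interface and protocol-specific behavior is supplied by drivers underneath, can be sketched as follows. This is an illustration of the pattern, not the Globus XIO API; the driver names are assumptions.

```python
# Sketch of a stackable I/O abstraction: the application composes a stack of
# drivers once, then uses a uniform read/write/close interface.
import zlib

class FileDriver:
    """Bottom of the stack: moves bytes to and from a local file."""
    def __init__(self, path, mode):
        self._f = open(path, mode)
    def write(self, data: bytes) -> None:
        self._f.write(data)
    def read(self) -> bytes:
        return self._f.read()
    def close(self) -> None:
        self._f.close()

class CompressionDriver:
    """Transform driver: compresses on write, decompresses on read."""
    def __init__(self, below):
        self._below = below
    def write(self, data: bytes) -> None:
        self._below.write(zlib.compress(data))
    def read(self) -> bytes:
        return zlib.decompress(self._below.read())
    def close(self) -> None:
        self._below.close()

# Write through the stack, then read back through the same stack shape.
stack = CompressionDriver(FileDriver("payload.bin", "wb"))
stack.write(b"example payload" * 1000)
stack.close()

stack = CompressionDriver(FileDriver("payload.bin", "rb"))
print(len(stack.read()), "bytes recovered")
stack.close()
```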
high performance distributed computing | 2002
William E. Allcock; Joseph Bester; John Bresnahan; Ian T. Foster; Jarek Gawor; Joseph A. Insley; Joseph M. Link; Michael E. Papka
Grid applications can combine the use of computation, storage, network, and other resources. These resources are often geographically distributed, adding to application complexity and thus the difficulty of understanding application performance. We present GridMapper, a tool for monitoring and visualizing the behavior of such distributed systems. GridMapper builds on basic mechanisms for registering, discovering, and accessing performance information sources, as well as for mapping from domain names to physical locations. The visualization system itself then supports the automatic layout of distributed sets of such sources and animation of their activities. We use a set of examples to illustrate how the system can provide valuable insights into the behavior and performance of a range of different applications.
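One small step described above, mapping monitored hosts to physical locations so their activity can be drawn on a map, is sketched below. The coordinate table, site names, and event list are assumptions; GridMapper's own discovery and layout mechanisms are not reproduced here.

```python
# Sketch of mapping host names to sites and aggregating activity per site,
# so each location can be sized or animated on a map view.
from collections import Counter

# Assumed static table: site domain -> (latitude, longitude)
SITE_LOCATIONS = {
    "anl.gov": (41.71, -87.98),
    "isi.edu": (33.98, -118.44),
}

def site_of(hostname: str) -> str:
    """Map a fully qualified host name to its registered site domain."""
    parts = hostname.split(".")
    return ".".join(parts[-2:])

def aggregate_by_location(transfer_events):
    """Count events per site, pairing each count with the site's coordinates."""
    counts = Counter(site_of(host) for host in transfer_events)
    return {site: (SITE_LOCATIONS.get(site), n) for site, n in counts.items()}

events = ["login1.anl.gov", "dtn2.anl.gov", "gridftp.isi.edu"]
print(aggregate_by_location(events))
```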
ieee international conference on services computing | 2004
Honghai Zhang; Katarzyna Keahey; William E. Allcock
Over the last decade, grids have become a successful tool for providing distributed environments for secure and coordinated execution of applications. The successful deployment of many realistic applications in such environments on a large scale has motivated their use in experimental science [L. C. Pearlman et al. (2004), K. Keahey et al. (2004)], where grid-based computations are used to assist in ongoing experiments. In such scenarios, quality of service (QoS) guarantees on execution as well as data transfer are desirable. The recently proposed WS-Agreement model [K. Czajkowski et al., K. Keahey et al. (2004)] provides an infrastructure within which such quality of service can be negotiated and obtained. We have designed and implemented a data transfer service that exposes an interface based on this model and defines agreements which guarantee that, within a certain confidence level, a file transfer can be completed under a specified time. The data transfer service accepts a client's request for data transfer and makes an agreement with the client based on QoS metrics (such as the transfer time and the confidence level with which the service can be provided). In our approach we use prediction as a basis for formulating an agreement with the client, and we combine prediction and rate limiting to adaptively ensure that the agreement is met.
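A prediction step of this kind can be sketched as follows: estimate whether a transfer of a given size can finish within a deadline at a chosen confidence level, using past throughput samples. The percentile-based predictor and the numbers below are assumptions for illustration; the paper's actual predictor and rate-limiting mechanism are not reproduced here.

```python
# Sketch of deadline admission based on a conservative throughput estimate.

def achievable_rate(samples_mbps, confidence=0.95):
    """Return a throughput (Mbit/s) we expect to meet or exceed with the given confidence."""
    ordered = sorted(samples_mbps)
    index = int((1.0 - confidence) * (len(ordered) - 1))
    return ordered[index]   # conservative low quantile of observed throughput

def can_accept(size_mbits, deadline_s, samples_mbps, confidence=0.95):
    """Accept the agreement only if the predicted rate finishes the transfer in time."""
    rate = achievable_rate(samples_mbps, confidence)
    predicted_time = size_mbits / rate
    return predicted_time <= deadline_s, predicted_time

history = [820, 760, 900, 640, 710, 880, 790]   # assumed past transfer rates, Mbit/s
ok, eta = can_accept(size_mbits=80_000, deadline_s=150, samples_mbps=history)
print(f"accept={ok}, predicted completion in {eta:.1f} s")
```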
international conference on cluster computing | 2015
Gideon Juve; Benjamín Tovar; Rafael Ferreira da Silva; Dariusz Król; Douglas Thain; Ewa Deelman; William E. Allcock; Miron Livny
Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these tasks, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.
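One of the monitoring/enforcement trade-offs discussed above can be sketched by asking the operating system both to enforce a hard memory limit on a task and to report its resource usage after it exits. This is a POSIX-only illustration under assumed limits and command, not one of the paper's monitoring tools.

```python
# Sketch of OS-level enforcement and measurement: cap a child's address space
# before exec, then read its CPU time and peak memory from wait4().
import os
import resource
import subprocess

MEMORY_LIMIT_BYTES = 512 * 1024 * 1024   # assumed per-task limit

def _apply_limits():
    # Runs in the child just before exec: cap its virtual address space.
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_BYTES, MEMORY_LIMIT_BYTES))

def run_monitored(cmd):
    proc = subprocess.Popen(cmd, preexec_fn=_apply_limits)
    # wait4 returns the child's accumulated resource usage alongside its status.
    _, status, usage = os.wait4(proc.pid, 0)
    return {
        "exit_status": os.waitstatus_to_exitcode(status),
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        "max_rss_kb": usage.ru_maxrss,    # peak resident set size (KiB on Linux)
    }

if __name__ == "__main__":
    print(run_monitored(["python3", "-c", "print(sum(range(10**6)))"]))
```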