Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where David A Dillow is active.

Publication


Featured research published by David A Dillow.


Petascale Data Storage Workshop | 2010

Workload characterization of a leadership class storage cluster

Young-Jae Kim; Raghul Gunasekaran; Galen M. Shipman; David A Dillow; Zhe Zhang; Bradley W. Settlemyer

Understanding workload characteristics is critical for optimizing and improving the performance of current systems and software, and for architecting new storage systems based on observed workload patterns. In this paper, we characterize the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF). Spider provides an aggregate bandwidth of over 240 GB/s with over 10 petabytes of RAID 6 formatted capacity. OLCF's flagship petascale simulation platform, Jaguar, and other large HPC clusters, totaling more than 250,000 compute cores, depend on Spider for their I/O needs. We characterize the system utilization, the demands of reads and writes, idle time, and the distribution of read requests to write requests for the storage system observed over a period of 6 months. From this study we develop synthesized workloads, and we show that the read and write I/O bandwidth usage, as well as the inter-arrival time of requests, can be modeled as a Pareto distribution.
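
The abstract's modeling claim can be illustrated with a minimal sketch: fit a Pareto distribution to observed request inter-arrival times and draw a synthetic workload from the fitted model. The input data below is made up for illustration, not taken from the Spider traces.

```python
# Minimal sketch: fit a Pareto model to observed request inter-arrival times
# and sample a synthetic workload from it. The observations here are synthetic
# placeholders, not measurements from Spider.
import numpy as np
from scipy import stats

# Hypothetical observed inter-arrival times (seconds) from a storage trace.
inter_arrivals = np.random.pareto(2.5, size=10_000) + 1e-3

# Fit a Pareto distribution (shape b, location, scale) to the observations.
b, loc, scale = stats.pareto.fit(inter_arrivals, floc=0)

# Synthesize a new workload by sampling from the fitted model.
synthetic = stats.pareto.rvs(b, loc=loc, scale=scale, size=10_000)

print(f"fitted shape={b:.2f}, scale={scale:.4f}")
print(f"observed mean={inter_arrivals.mean():.4f}s, "
      f"synthetic mean={synthetic.mean():.4f}s")
```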


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Characterizing output bottlenecks in a supercomputer

Bing Xie; Jeffrey S. Chase; David A Dillow; Oleg Drokin; Scott Klasky; H Sarp Oral; Norbert Podhorszki

Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
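
The sampling-based analysis described above can be sketched roughly as follows: collect per-interval bandwidth samples across clients and storage targets, summarize the distribution, and estimate how much a straggling target drags down coupled (striped) output. The numbers below are synthetic, not measurements from Jaguar.

```python
# Toy illustration of a sampling-based bandwidth study: summarize the
# distribution of per-(client, target, interval) write bandwidth and the
# straggler penalty on striped output. All values are synthetic.
import numpy as np

rng = np.random.default_rng(0)
clients, targets, intervals = 32, 16, 200

# Hypothetical bandwidth samples in MB/s for each (client, target, interval).
bw = rng.lognormal(mean=5.0, sigma=0.6, size=(clients, targets, intervals))

# Distribution across all samples.
print("median MB/s:", np.median(bw))
print("p10 / p90 MB/s:", np.percentile(bw, [10, 90]))

# A striped write completes only when its slowest target finishes, so the
# effective rate is gated by the minimum across targets in each interval.
striped_effective = bw.min(axis=1)   # slowest target per client/interval
independent = bw.mean(axis=1)        # balanced, unshared files
print("straggler penalty:",
      1.0 - striped_effective.mean() / independent.mean())
```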


High Performance Interconnects | 2011

The Common Communication Interface (CCI)

Scott Atchley; David A Dillow; Galen M. Shipman; Patrick Geoffray; Jeffrey M. Squyres; George Bosilca; Ronald G. Minnich

There are many APIs for connecting and exchanging data between network peers. Each interface varies widely on metrics including performance, portability, and complexity. Specifically, many interfaces make design or implementation choices emphasizing some of the more desirable metrics (e.g., performance) while sacrificing others (e.g., portability). As a direct result, software developers building large, network-based applications are forced to choose a specific network API based on a complex, multi-dimensional set of criteria. Such trade-offs inevitably result in an interface that fails to deliver some desirable features. In this paper, we introduce a novel interface that both supports many features that have become standard (or otherwise generally expected) in other communication interfaces, and strives to export a small, yet powerful, interface. This new interface draws upon years of experience, ranging from network-oriented software development best practices to systems-level implementations. The goal is to create a relatively simple, high-level communication interface with low barriers to adoption while still providing important features such as scalability, resiliency, and performance. The result is the Common Communication Interface (CCI): an intuitive API that is portable, efficient, scalable, and robust to meet the needs of network-intensive applications common in HPC and cloud computing.
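
To make the "small, yet powerful" interface idea concrete, here is a toy sketch of a connect/send/poll-style, event-driven messaging interface of the general shape the paper argues for. These class and method names are hypothetical stand-ins and are not the actual CCI C API.

```python
# Illustrative only: a toy event-driven messaging interface whose shape
# (endpoint, connection, single event queue) echoes the paper's design goals.
# Names are hypothetical; this is not the real CCI API.
import queue
from dataclasses import dataclass

@dataclass
class Event:
    kind: str        # "connect", "recv", or "send_done"
    data: bytes = b""

class Endpoint:
    """A single communication endpoint with one event queue."""
    def __init__(self):
        self.events = queue.Queue()

    def connect(self, peer_uri: str) -> "Connection":
        # A real transport would resolve peer_uri and negotiate here.
        self.events.put(Event("connect"))
        return Connection(self, peer_uri)

    def poll(self, timeout: float = 0.0):
        # One polling call drains all completions: connects, receives, sends.
        try:
            return self.events.get(timeout=timeout)
        except queue.Empty:
            return None

class Connection:
    def __init__(self, endpoint: Endpoint, peer_uri: str):
        self.endpoint, self.peer_uri = endpoint, peer_uri

    def send(self, payload: bytes):
        # Loopback stand-in for an asynchronous send completion.
        self.endpoint.events.put(Event("send_done", payload))

# Usage: one endpoint, one connection, one polling loop.
ep = Endpoint()
conn = ep.connect("toy://peer0")
conn.send(b"hello")
while (ev := ep.poll()) is not None:
    print(ev.kind)
```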


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems

Sarp Oral; James A Simmons; Jason J Hill; Dustin B Leverman; Feiyi Wang; Matt Ezell; Ross Miller; Douglas Fuller; Raghul Gunasekaran; Young-Jae Kim; Saurabh Gupta; Devesh Tiwari; Sudharshan S. Vazhkudai; James H. Rogers; David A Dillow; Galen M. Shipman; Arthur S. Bland

The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.


IEEE Transactions on Computers | 2014

Coordinating Garbage Collection for Arrays of Solid-State Drives

Young-Jae Kim; Junghee Lee; H Sarp Oral; David A Dillow; Feiyi Wang; Galen M. Shipman

Although solid-state drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) for a number of workloads, they can exhibit substantial variance in request latency and throughput as a result of garbage collection (GC). When GC conflicts with an I/O stream, the stream can make no forward progress until the GC cycle completes. GC cycles are scheduled by logic internal to the SSD based on several factors such as the pattern, frequency, and volume of write requests. When SSDs are used in a RAID with currently available technology, the lack of coordination of the SSD-local GC cycles amplifies this performance variance. We propose a global garbage collection (GGC) mechanism to improve response times and reduce performance variability for a RAID of SSDs. We include a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, along with algorithms to coordinate the GGC cycles. We develop reactive and proactive GC coordination algorithms and evaluate their I/O performance and block erase counts for various workloads. Our simulations show that GC coordination by a reactive scheme improves average response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominated workloads, response time was improved by 69 percent and performance variability was reduced by 71 percent. We show that a proactive GC coordination algorithm can further improve the I/O response times by up to 9 percent and the performance variability by up to 15 percent. We also observe that it could increase the lifetimes of SSDs with some workloads (e.g., Financial) by reducing block erase counts by up to 79 percent relative to a reactive algorithm for write-dominant enterprise workloads.
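
The reactive coordination idea can be sketched in a few lines: when any member SSD signals that it needs garbage collection, a hypothetical array controller triggers GC on every member so the stalls overlap instead of being paid one device at a time across stripes. Thresholds, block counts, and device counts below are made up.

```python
# Minimal sketch of reactive global GC coordination across a toy SSD array.
# Thresholds and reclaim amounts are arbitrary illustration values.
from dataclasses import dataclass

@dataclass
class ToySSD:
    free_blocks: int = 1000
    erases: int = 0

    def needs_gc(self, threshold: int = 200) -> bool:
        return self.free_blocks < threshold

    def run_gc(self, reclaimed: int = 300):
        self.free_blocks += reclaimed
        self.erases += 1

    def write(self, blocks: int):
        self.free_blocks -= blocks

class ReactiveGGCController:
    """Array controller that globally coordinates GC across member SSDs."""
    def __init__(self, ssds):
        self.ssds = ssds

    def submit_stripe_write(self, blocks_per_ssd: int):
        for ssd in self.ssds:
            ssd.write(blocks_per_ssd)
        # Reactive policy: if any member needs GC, collect on all members now,
        # so a later full-stripe request never waits on one straggling device.
        if any(ssd.needs_gc() for ssd in self.ssds):
            for ssd in self.ssds:
                ssd.run_gc()

array = ReactiveGGCController([ToySSD() for _ in range(8)])
for _ in range(100):
    array.submit_stripe_write(blocks_per_ssd=10)
print("erase counts:", [s.erases for s in array.ssds])
```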


International Performance Computing and Communications Conference | 2011

Enhancing I/O throughput via efficient routing and placement for large-scale parallel file systems

David A Dillow; Galen M. Shipman; H Sarp Oral; Zhe Zhang; Young-Jae Kim

As storage systems grow larger to meet the demands of petascale systems, careful planning must be applied to avoid congestion points and extract the maximum performance. In addition, the large data sets generated by such systems make it desirable for all compute resources to have common access to this data without needing to copy it to each machine. This paper describes a method of placing I/O close to the storage nodes to minimize contention on Cray's SeaStar2+ network, and extends it to a routed Lustre configuration to gain the same benefits when running against a center-wide file system. Our experiments using half of the resources of Spider, the center-wide file system at the Oak Ridge Leadership Computing Facility, show that I/O write bandwidth can be improved by up to 45% (from 71.9 to 104 GB/s) for a direct-attached configuration and by 137% (from 47.6 GB/s to 115 GB/s) for a routed configuration. We demonstrated up to a 20.7% reduction in run-time for production scientific applications. With the full Spider system, we demonstrated over 240 GB/s of aggregate bandwidth using our techniques.
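
The placement idea can be sketched as a simple greedy assignment: map each writer to a nearby I/O router while keeping per-router load balanced, so traffic avoids piling onto a few congested links. The 1-D distance metric and the node counts below are hypothetical simplifications; the actual work operated on the torus topology of Cray's SeaStar2+ network.

```python
# Illustrative placement sketch: assign writers to the closest router that
# still has capacity, keeping load balanced. Coordinates are hypothetical.
def place_writers(writer_coords, router_coords, max_per_router):
    load = {r: 0 for r in router_coords}
    placement = {}
    for w in writer_coords:
        # Prefer the closest router that still has spare capacity.
        for r in sorted(router_coords, key=lambda r: abs(r - w)):
            if load[r] < max_per_router:
                placement[w] = r
                load[r] += 1
                break
    return placement

writers = list(range(0, 64, 2))   # hypothetical writer node coordinates
routers = [8, 24, 40, 56]         # hypothetical router coordinates
mapping = place_writers(writers, routers, max_per_router=8)
print({r: sum(1 for v in mapping.values() if v == r) for r in routers})
```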


IEEE Conference on Mass Storage Systems and Technologies | 2011

Harmonia: A globally coordinated garbage collector for arrays of Solid-State Drives

Young-Jae Kim; H Sarp Oral; Galen M. Shipman; Junghee Lee; David A Dillow; Feiyi Wang

Solid-State Drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) on a number of workloads. The frequency of garbage collection (GC) activity is directly correlated with the pattern, frequency, and volume of write requests, and scheduling of GC is controlled by logic internal to the SSD. SSDs can exhibit significant performance degradations when garbage collection conflicts with an ongoing I/O request stream. When using SSDs in a RAID array, the lack of coordination of the local GC processes amplifies these performance degradations. No RAID controller or SSD available today has the technology to overcome this limitation. This paper presents Harmonia, a Global Garbage Collection (GGC) mechanism to improve response times and reduce performance variability for a RAID array of SSDs. Our proposal includes a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, as well as algorithms to coordinate the global GC cycles. Our simulations show that this design improves response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominant workloads, response time was improved by 69% while performance variability was reduced by 71%.
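
A small back-of-the-envelope illustration (not from the paper) of why uncoordinated per-device GC amplifies tail latency: a striped request waits for its slowest member, so independent random pauses compound, while synchronized pauses are paid once per array. All probabilities and durations below are made up.

```python
# Toy Monte Carlo comparing stripe latency with independent vs. synchronized
# GC pauses. Every number here is arbitrary and only for illustration.
import random

random.seed(1)
NUM_SSDS, TRIALS, GC_PROB, GC_PAUSE, BASE = 8, 100_000, 0.05, 5.0, 1.0

def stripe_latency(coordinated: bool) -> float:
    if coordinated:
        # One array-wide GC window: the pause is paid once, if at all.
        return BASE + (GC_PAUSE if random.random() < GC_PROB else 0.0)
    # Uncoordinated: any member may pause independently; stripe waits for max.
    return BASE + max(
        (GC_PAUSE if random.random() < GC_PROB else 0.0)
        for _ in range(NUM_SSDS)
    )

for mode in (False, True):
    samples = sorted(stripe_latency(mode) for _ in range(TRIALS))
    label = "coordinated" if mode else "uncoordinated"
    print(label, "mean=%.2f p99=%.2f"
          % (sum(samples) / TRIALS, samples[int(0.99 * TRIALS)]))
```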


Petascale Data Storage Workshop | 2013

Asynchronous object storage with QoS for scientific and commercial big data

Michael J. Brim; David A Dillow; Sarp Oral; Bradley W. Settlemyer; Feiyi Wang

This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains motivate the key abstractions in the application programming interface (API). The architecture of the Scalable Object Store (SOS), a prototype object storage system that supports the API's facilities, is presented. The SOS serves as a vehicle for future research into scalable and resilient big data object storage. We briefly review our research into building efficient storage servers capable of providing quality of service (QoS) contracts relevant to big data use cases.
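
To show the flavor of an asynchronous object API with per-request QoS, here is a hypothetical client sketch in the spirit of the design described above. The class and method names, and the contract fields, are illustrative only; they are not the SOS API.

```python
# Hypothetical asynchronous object-store client with a QoS contract attached
# to each request. Names and fields are illustrative, not the SOS API.
import asyncio
from dataclasses import dataclass

@dataclass
class QoSContract:
    min_bandwidth_mbps: float   # floor the server tries to honor
    deadline_s: float           # request fails if not finished by this point

class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    async def put(self, key: str, data: bytes, qos: QoSContract) -> None:
        # A real server would admit or reject based on the contract; here we
        # only enforce the deadline on a simulated transfer.
        transfer_s = len(data) / (qos.min_bandwidth_mbps * 1e6)
        await asyncio.wait_for(asyncio.sleep(transfer_s), timeout=qos.deadline_s)
        self._objects[key] = data

    async def get(self, key: str, qos: QoSContract) -> bytes:
        await asyncio.sleep(0)  # placeholder for an asynchronous read path
        return self._objects[key]

async def main():
    store = ToyObjectStore()
    qos = QoSContract(min_bandwidth_mbps=100.0, deadline_s=2.0)
    await store.put("run42/checkpoint.0", b"x" * 1_000_000, qos)
    data = await store.get("run42/checkpoint.0", qos)
    print("got", len(data), "bytes")

asyncio.run(main())
```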


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

Output Performance Study on a Production Petascale Filesystem

Bing Xie; Jeffrey S. Chase; David A Dillow; Scott Klasky; Jay F. Lofstead; H Sarp Oral; Norbert Podhorszki

This paper reports our observations from Titan, a top-tier supercomputer, and its Lustre parallel file stores under production load. In summary, we find that supercomputer file systems are highly variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write-sharing of files across clients (compute nodes). I/O parallelism is most effective when the application (or its I/O middleware system) distributes the I/O load so that each client writes separate files on multiple targets, and each target stores files for multiple clients, in a balanced way. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify "good spots" in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe regular diurnal load patterns.
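
The balanced, file-per-process layout the study recommends can be sketched as a simple round-robin mapping: each client writes its own file, and files are spread across storage targets so every target serves several clients without any write-sharing. The client and target counts below are hypothetical.

```python
# Minimal sketch of a balanced file-per-process placement: clients map to
# targets round-robin, no striping or shared files. Counts are hypothetical.
def assign_targets(num_clients: int, num_targets: int):
    """Return {client_rank: target_index} with a balanced round-robin layout."""
    return {rank: rank % num_targets for rank in range(num_clients)}

layout = assign_targets(num_clients=1024, num_targets=96)

# Each client then writes its own file on its assigned target (conceptually,
# one file per rank with a stripe count of 1, so no two clients share a file).
per_target = {}
for rank, target in layout.items():
    per_target.setdefault(target, []).append(rank)
print("clients per target (first 3):",
      {t: len(v) for t, v in list(per_target.items())[:3]})
```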


Storage Network Architecture and Parallel I/Os | 2011

Harmonia: A globally coordinated garbage collector for arrays of Solid-State Drives

Young-Jae Kim; Sarp Oral; Galen M. Shipman; Junghee Lee; David A Dillow; Feiyi Wang

Collaboration


Dive into David A Dillow's collaboration.

Top Co-Authors

Galen M. Shipman, Oak Ridge National Laboratory

Feiyi Wang, Oak Ridge National Laboratory

H Sarp Oral, Oak Ridge National Laboratory

Raghul Gunasekaran, Oak Ridge National Laboratory

Jason J Hill, Oak Ridge National Laboratory

Sarp Oral, Oak Ridge National Laboratory

Don Maxwell, Oak Ridge National Laboratory

Ross Miller, Oak Ridge National Laboratory

Junghee Lee, University of Texas at San Antonio