Quincey Koziol
Lawrence Berkeley National Laboratory
Publications
Featured research published by Quincey Koziol.
ieee international conference on high performance computing data and analytics | 2013
Babak Behzad; Huong Luu; Joseph Huchette; Surendra Byna; Prabhat; Ruth A. Aydt; Quincey Koziol; Marc Snir
We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. We consistently demonstrate I/O write speedups between 2× and 100× for test configurations.
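The "dynamically intercepted HDF5 calls" are the piece that lets the tuner change settings without touching application code. The sketch below shows one common way such interception can be done on Linux: an LD_PRELOAD shim that wraps H5Fcreate and injects a tuned file-access property list before calling the real library. The alignment values and the shim itself are illustrative assumptions, not the paper's actual implementation.

```c
/* Illustrative LD_PRELOAD shim: wrap H5Fcreate and inject tuned file-access
 * properties before handing off to the real HDF5 library. Compile roughly as
 *   cc -shared -fPIC shim.c -o libshim.so -ldl -lhdf5
 * and run the unmodified application with LD_PRELOAD=./libshim.so.
 * The alignment values below are placeholders for whatever a tuner chose. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <hdf5.h>

hid_t H5Fcreate(const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id)
{
    /* Look up the real H5Fcreate that the application would have called. */
    static hid_t (*real_H5Fcreate)(const char *, unsigned, hid_t, hid_t);
    if (!real_H5Fcreate)
        real_H5Fcreate = (hid_t (*)(const char *, unsigned, hid_t, hid_t))
                             dlsym(RTLD_NEXT, "H5Fcreate");

    /* Start from the caller's property list (or a fresh one) and apply the
     * tuned setting, e.g. align objects of 1 MiB or more on 4 MiB boundaries. */
    hid_t tuned_fapl = (fapl_id == H5P_DEFAULT) ? H5Pcreate(H5P_FILE_ACCESS)
                                                : H5Pcopy(fapl_id);
    H5Pset_alignment(tuned_fapl, 1048576, 4194304);

    hid_t file_id = real_H5Fcreate(name, flags, fcpl_id, tuned_fapl);
    H5Pclose(tuned_fapl);
    return file_id;
}
```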
ieee international conference on high performance computing data and analytics | 2016
Jay F. Lofstead; Ivo Jimenez; Carlos Maltzahn; Quincey Koziol; John M. Bent; Eric Barton
The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project will have a significant impact on storage system design within and beyond the HPC community. With phase two of the project starting, this is an excellent opportunity to explore the complete design and how it will address the needs of extreme-scale platforms. This paper examines each layer of the proposed stack in some detail, along with cross-cutting topics such as transactions and metadata management. It not only provides a timely summary of important aspects of the design specifications but also captures the underlying reasoning, which is not available elsewhere. We encourage the broader community to understand the design, its intent, and its future directions, to foster the discussion guiding phase two and the ultimate production storage stack based on this work. An initial performance evaluation of the early prototype implementation is also provided to validate the presented design.
high performance distributed computing | 2013
Babak Behzad; Joseph Huchette; Huong Luu; Ruth A. Aydt; Surendra Byna; Yushu Yao; Quincey Koziol; Prabhat
The modern parallel I/O stack consists of several software layers with complex inter-dependencies and performance characteristics. While each layer exposes tunable parameters, it is often unclear to users how different parameter settings interact with each other and affect overall I/O performance. As a result, users often resort to default system settings, which typically obtain poor I/O bandwidth. In this research, we develop a benchmark-guided auto-tuning framework for tuning the HDF5, MPI-IO, and Lustre layers on production supercomputing facilities. Our framework consists of three main components: H5Tuner uses a control file to adjust I/O parameters without modifying or recompiling the application; H5PerfCapture records performance metrics for HDF5 and MPI-IO; and H5Evolve uses a genetic algorithm to explore the parameter space and determine well-performing configurations. We demonstrate I/O performance results for three HDF5 application-based benchmarks on a Sun HPC system. All the benchmarks, running on 512 MPI processes, perform 3× to 5.5× faster with the auto-tuned I/O parameters than with the default system parameters.
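H5Evolve's role, searching the parameter space with measured bandwidth as the fitness signal, can be pictured with a heavily simplified evolutionary loop (elitist selection plus random mutation, no crossover). The run_write_benchmark() function below is a hypothetical stand-in for an instrumented benchmark run, and the parameter ranges are arbitrary examples; none of this is H5Evolve's actual code.

```c
/* Heavily simplified evolutionary search over a small I/O parameter space:
 * keep the best-measured configuration each generation and refill the rest
 * of the population with single-parameter mutations of it. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int stripe_count;   /* Lustre stripe count                     */
    int stripe_size_mb; /* Lustre stripe size (MiB)                */
    int cb_nodes;       /* MPI-IO collective-buffering aggregators */
    int alignment_mb;   /* HDF5 object alignment (MiB)             */
} io_config;

/* Placeholder measurement: in a real framework this would launch an
 * instrumented benchmark with the given settings and return the measured
 * write bandwidth (MB/s). A synthetic score is returned here so the
 * skeleton runs standalone. */
static double run_write_benchmark(const io_config *cfg)
{
    return cfg->stripe_count * 10.0 + cfg->cb_nodes * 2.0 + (rand() % 100) / 10.0;
}

static int pick(const int *vals, int n) { return vals[rand() % n]; }

static io_config random_config(void)
{
    static const int stripes[] = {4, 8, 16, 32, 64, 128};
    static const int sizes[]   = {1, 4, 8, 16, 32};
    static const int cbs[]     = {1, 2, 4, 8, 16};
    static const int aligns[]  = {1, 2, 4, 8};
    io_config c = { pick(stripes, 6), pick(sizes, 5), pick(cbs, 5), pick(aligns, 4) };
    return c;
}

int main(void)
{
    enum { POP = 16, GENERATIONS = 10 };
    io_config pop[POP];
    double fit[POP];

    for (int i = 0; i < POP; i++) pop[i] = random_config();

    for (int g = 0; g < GENERATIONS; g++) {
        int best = 0;
        for (int i = 0; i < POP; i++) {          /* evaluate fitness */
            fit[i] = run_write_benchmark(&pop[i]);
            if (fit[i] > fit[best]) best = i;
        }
        for (int i = 0; i < POP; i++) {          /* elitism + mutation */
            if (i == best) continue;
            io_config child = pop[best], rnd = random_config();
            switch (rand() % 4) {                /* mutate one parameter */
            case 0:  child.stripe_count   = rnd.stripe_count;   break;
            case 1:  child.stripe_size_mb = rnd.stripe_size_mb; break;
            case 2:  child.cb_nodes       = rnd.cb_nodes;       break;
            default: child.alignment_mb   = rnd.alignment_mb;   break;
            }
            pop[i] = child;
        }
        printf("generation %d: best %.1f MB/s with %d stripes, %d aggregators\n",
               g, fit[best], pop[best].stripe_count, pop[best].cb_nodes);
    }
    return 0;
}
```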
International Journal of Big Data Intelligence | 2016
Vishwanath Venkatesan; Mohamad Chaarawi; Quincey Koziol; Neil Fortner; Edgar Gabriel
Data-intensive applications are strongly influenced by I/O performance on high performance computing systems, and the scalability of such applications primarily depends on the scalability of the I/O subsystem itself. To mitigate the cost of I/O operations, recent high-end systems use staging nodes to delegate I/O requests and thus decouple I/O and compute operations. The data staging layers, however, lack the benefits that could be obtained from the collective-I/O-style optimisations that client-side parallel I/O operations provide. In this paper, we present the compactor framework developed as part of the Exascale FastForward I/O stack. The compactor framework introduces optimisations at the data staging nodes, including collective buffering across requests from multiple processes, write stealing to service read requests at the staging node, and write morphing to optimise multiple write requests from the same process. The compactor framework is evaluated on a PVFS2 file system using micro-benchmarks, the FLASH I/O benchmark, and a parallel image processing application. Our results indicate significant performance benefits of up to 70% from the optimisations of the compactor.
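Of the three optimisations, write morphing is the simplest to illustrate: pending writes from a single process are merged whenever they touch or overlap, so fewer, larger requests reach the file system. The sketch below is a generic offset/length coalescing pass under that assumption; it is not the compactor's actual code.

```c
/* Generic write-coalescing pass: sort pending (offset, length) requests and
 * merge any that touch or overlap, so fewer, larger writes are issued. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { long offset; long length; } write_req;

static int by_offset(const void *a, const void *b)
{
    const write_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Coalesce in place; returns the new number of requests. */
static int coalesce(write_req *reqs, int n)
{
    if (n == 0) return 0;
    qsort(reqs, n, sizeof *reqs, by_offset);
    int out = 0;
    for (int i = 1; i < n; i++) {
        long cur_end = reqs[out].offset + reqs[out].length;
        if (reqs[i].offset <= cur_end) {
            /* Touching or overlapping: extend the current merged request. */
            long new_end = reqs[i].offset + reqs[i].length;
            if (new_end > cur_end)
                reqs[out].length = new_end - reqs[out].offset;
        } else {
            reqs[++out] = reqs[i];   /* gap: start a new merged request */
        }
    }
    return out + 1;
}

int main(void)
{
    write_req reqs[] = { {0, 4096}, {4096, 4096}, {16384, 1024}, {8192, 2048} };
    int n = coalesce(reqs, 4);
    for (int i = 0; i < n; i++)
        printf("write at offset %ld, %ld bytes\n", reqs[i].offset, reqs[i].length);
    return 0;   /* expected: one write covering [0, 10240) and one at 16384 */
}
```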
international conference on cluster computing | 2014
Jay F. Lofstead; Ivo Jimenez; Carlos Maltzahn; Quincey Koziol; John M. Bent; Eric Barton
Current production HPC I/O stack designs are unlikely to offer sufficient features and performance to adequately serve extreme-scale science platform requirements as well as Big Data problems. A joint effort between the US Department of Energy's Office of Advanced Simulation and Computing and Advanced Scientific Computing Research commissioned a project to develop a design and prototype for an I/O stack suitable for the extreme-scale environment, referred to as the Fast Forward Storage and IO (FFSIO) project. This is a joint effort led by Lawrence Livermore National Laboratory, with the DOE Data Management Nexus leads Rob Ross and Gary Grider as coordinators and Mark Gary as contract lead.
ieee international conference on high performance computing data and analytics | 2012
Babak Behzad; Joey Huchette; Huong Luu; Ruth A. Aydt; Quincey Koziol; Prabhat; Surendra Byna; Mohamad Chaarawi; Yushu Yao
Parallel I/O is an unavoidable part of modern high-performance computing (HPC), but its system-wide dependencies mean it has eluded optimization across platforms and applications. This can introduce bottlenecks in otherwise computationally efficient code, especially as scientific computing becomes increasingly data-driven. Various studies have shown that dramatic improvements are possible when the parameters are set appropriately. However, because the HPC I/O stack has multiple layers, each with its own optimization parameters, and because a test run takes nontrivial time to execute, finding the optimal parameter values is a very complex problem. Additionally, optimal settings do not necessarily translate between use cases, since I/O tuning can be highly dependent on the individual application, the problem size, and the compute platform being used. Tunable parameters are exposed primarily at three levels in the I/O stack: the system, middleware, and high-level data-organization layers. HPC systems need a parallel file system, such as Lustre, to store data intelligently in a parallelized fashion. Middleware communication layers, such as MPI-IO, support this kind of parallel I/O and offer a variety of optimizations, such as collective buffering. Scientists and application developers often use HDF5, a high-level cross-platform I/O library that offers a hierarchical, object-database representation of scientific data.
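A concrete way to see these three levels is the set of knobs an application can already turn by hand: Lustre striping and collective-buffering aggregators through MPI-IO hints, and alignment through an HDF5 file-access property list. The sketch below shows one such manual configuration; the hint names are standard ROMIO/Lustre hints, but the values are illustrative only, not recommendations.

```c
/* Manually setting tunables at the three levels discussed above:
 *   - file system : Lustre striping, via MPI-IO hints
 *   - middleware  : MPI-IO collective-buffering aggregators
 *   - high level  : HDF5 alignment on the file-access property list
 * The values are illustrative only; real tuning depends on the platform,
 * the application, and the problem size. Requires a parallel HDF5 build. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* File-system and middleware layers: ROMIO hints. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "64");    /* Lustre stripe count   */
    MPI_Info_set(info, "striping_unit", "4194304"); /* 4 MiB stripe size     */
    MPI_Info_set(info, "cb_nodes", "16");           /* collective buffering aggregators */

    /* High-level layer: HDF5 file-access properties. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);   /* use MPI-IO underneath */
    H5Pset_alignment(fapl, 1048576, 4194304);       /* align >=1 MiB objects to 4 MiB */

    hid_t file = H5Fcreate("tuned.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... create datasets and write with collective transfers ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```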
Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS10), Heraklion, Crete, Greece, September 24, 2010 | 2010
Mark Howison; Quincey Koziol; David Knaak; John Mainzer; John Shalf
Archive | 2011
Arie Shoshani; Terence Critchlow; Scott Klasky; James P. Ahrens; E. Wes Bethel; Hank Childs; Jian Huang; Kenneth I. Joy; Quincey Koziol; Gerald Fredrick Lofstead; Jeremy Meredith; Kenneth Moreland; George Ostrouchov; Michael E. Papka; Venkatram Vishwanath; Matthew Wolf; Nicholas Wright; Kesheng Wu
cluster computing and the grid | 2018
Houjun Tang; Suren Byna; Francois Tessier; Teng Wang; Bin Dong; Jingqing Mu; Quincey Koziol; Jerome Soumagne; Venkatram Vishwanath; Jialin Liu; Richard Warren
international conference on cluster computing | 2017
Houjun Tang; Suren Byna; Bin Dong; Jialin Liu; Quincey Koziol