Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kevin Harms is active.

Publication


Featured research published by Kevin Harms.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

I/O performance challenges at leadership scale

Samuel Lang; Philip H. Carns; Robert Latham; Robert B. Ross; Kevin Harms; William E. Allcock

Today's top high performance computing systems run applications with hundreds of thousands of processes, contain hundreds of storage nodes, and must meet massive I/O requirements for capacity and performance. These leadership-class systems face daunting challenges to deploying scalable I/O systems. In this paper we present a case study of the I/O challenges to performance and scalability on Intrepid, the IBM Blue Gene/P system at the Argonne Leadership Computing Facility. Listed among the five fastest supercomputers of 2008, Intrepid runs computational science applications with intensive demands on the I/O system. We show that Intrepid's file and storage systems sustain high performance under varying workloads as the applications scale with the number of processes.


ACM Transactions on Storage | 2011

Understanding and Improving Computational Science Storage Access through Continuous Characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.
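
The approach described here amounts to lightweight, always-on instrumentation. As a rough illustration of the idea, and not the paper's actual tool or its API, the following Python sketch accumulates per-process write counters continuously and emits a compact summary at exit; the function and file names are hypothetical.

```python
# Minimal sketch (not the paper's tool): accumulate per-process I/O counters
# continuously and emit a compact summary at exit, so characterization can
# run full-time with negligible overhead.
import atexit
import json
import time

_counters = {"bytes_written": 0, "write_calls": 0, "write_time": 0.0}

def instrumented_write(fh, data):
    """Wrap a write call and record volume, call count, and elapsed time."""
    start = time.perf_counter()
    n = fh.write(data)
    _counters["write_time"] += time.perf_counter() - start
    _counters["bytes_written"] += n
    _counters["write_calls"] += 1
    return n

@atexit.register
def _dump_summary():
    # In a production setting this record would be reduced across processes
    # and appended to a site-wide log for workload analysis.
    print(json.dumps(_counters))

if __name__ == "__main__":
    with open("example.dat", "wb") as f:
        for i in range(1000):
            instrumented_write(f, f"record {i}\n".encode())
```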


IEEE Conference on Mass Storage Systems and Technologies | 2011

Understanding and improving computational science storage access through continuous characterization

Philip H. Carns; Kevin Harms; William E. Allcock; Charles Bacon; Samuel Lang; Robert Latham; Robert B. Ross

Computational science applications are driving a demand for increasingly powerful storage systems. While many techniques are available for capturing the I/O behavior of individual application trial runs and specific components of the storage system, continuous characterization of a production system remains a daunting challenge for systems with hundreds of thousands of compute cores and multiple petabytes of storage. As a result, these storage systems are often designed without a clear understanding of the diverse computational science workloads they will support.


High Performance Distributed Computing | 2010

Lessons learned from moving earth system grid data sets over a 20 Gbps wide-area network

Rajkumar Kettimuthu; Alex Sim; Dan Gunter; Bill Allcock; Peer-Timo Bremer; John Bresnahan; Andrew Cherry; Lisa Childers; Eli Dart; Ian T. Foster; Kevin Harms; Jason Hick; Jason Lee; Michael Link; Jeff Long; Keith Miller; Vijaya Natarajan; Valerio Pascucci; Ken Raffenetti; David Ressman; Dean N. Williams; Loren Wilson; Linda Winkler

In preparation for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report, the climate community will run the Coupled Model Intercomparison Project phase 5 (CMIP-5) experiments, which are designed to answer crucial questions about future regional climate change and the results of carbon feedback for different mitigation scenarios. The CMIP-5 experiments will generate petabytes of data that must be replicated seamlessly, reliably, and quickly to hundreds of research teams around the globe. As an end-to-end test of the technologies that will be used to perform this task, a multi-disciplinary team of researchers moved a small portion (10 TB) of the multimodel Coupled Model Intercomparison Project, Phase 3 data set used in the IPCC Fourth Assessment Report from three sources---the Argonne Leadership Computing Facility (ALCF), Lawrence Livermore National Laboratory (LLNL) and National Energy Research Scientific Computing Center (NERSC)---to the 2009 Supercomputing conference (SC09) show floor in Portland, Oregon, over circuits provided by DOE's ESnet. The team achieved a sustained data rate of 15 Gb/s on a 20 Gb/s network. More important, this effort provided critical feedback on how to deploy, tune, and monitor the middleware that will be used to replicate the upcoming petascale climate datasets. We report on obstacles overcome and the key lessons learned from this successful bandwidth challenge effort.
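
A quick back-of-the-envelope check of the quoted figures, assuming decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits): moving the 10 TB test set at the sustained 15 Gb/s takes roughly an hour and a half, at 75% utilization of the 20 Gb/s circuit.

```python
# Back-of-the-envelope check of the numbers quoted above (decimal units assumed).
data_bits = 10e12 * 8          # 10 TB expressed in bits
rate_bps = 15e9                # sustained 15 Gb/s
utilization = 15 / 20          # fraction of the 20 Gb/s circuit used
seconds = data_bits / rate_bps
print(f"{seconds / 3600:.2f} h at {utilization:.0%} link utilization")
# -> about 1.48 h at 75% link utilization
```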


Parallel, Distributed and Network-Based Processing | 2014

Scalable Parallel I/O on a Blue Gene/Q Supercomputer Using Compression, Topology-Aware Data Aggregation, and Subfiling

Huy Bui; Hal Finkel; Venkatram Vishwanath; Salman Habib; Katrin Heitmann; Jason Leigh; Michael E. Papka; Kevin Harms

In this paper, we propose an approach to improving the I/O performance of an IBM Blue Gene/Q supercomputing system using a novel framework that can be integrated into high performance applications. We take advantage of the system's tremendous computing resources and high interconnection bandwidth among compute nodes to efficiently exploit I/O bandwidth. This approach focuses on lossless data compression, topology-aware data movement, and subfiling. The efficacy of this solution is demonstrated using microbenchmarks and an application-level benchmark.
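
As a rough, self-contained illustration of combining lossless compression with subfiling (not the paper's framework, and with the topology-aware aggregation over the interconnect omitted), the sketch below compresses each rank's buffer with zlib and appends groups of ranks to a shared subfile; the rank counts and file names are hypothetical.

```python
# Toy sketch (not the paper's framework): compress each rank's buffer and
# aggregate groups of ranks into a smaller number of subfiles, so fewer,
# larger, compressed writes reach the file system.
import zlib

NUM_RANKS = 16         # hypothetical job size
RANKS_PER_SUBFILE = 4  # hypothetical aggregation factor

def write_subfiles(rank_buffers):
    for rank, payload in enumerate(rank_buffers):
        subfile = rank // RANKS_PER_SUBFILE
        compressed = zlib.compress(payload, level=1)  # cheap lossless compression
        # Append this rank's compressed block to its group's subfile,
        # prefixed by a small header recording the block length.
        with open(f"out.{subfile:04d}.dat", "ab") as f:
            f.write(len(compressed).to_bytes(8, "little"))
            f.write(compressed)

if __name__ == "__main__":
    buffers = [bytes([rank]) * (1 << 20) for rank in range(NUM_RANKS)]
    write_subfiles(buffers)
```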


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

A Case for Optimistic Coordination in HPC Storage Systems

Philip H. Carns; Kevin Harms; Dries Kimpe; Justin M. Wozniak; Robert B. Ross; Lee Ward; Matthew L. Curry; Ruth Klundt; Geoff Danielson; Cengiz Karakoyunlu; John A. Chandy; Bradley W. Settlemyer; William Gropp

High-performance computing (HPC) storage systems rely on access coordination to ensure that concurrent updates do not produce incoherent results. HPC storage systems typically employ pessimistic distributed locking to provide this functionality in cases where applications cannot perform their own coordination. This approach, however, introduces significant performance overhead and complicates fault handling. In this work we evaluate the viability of optimistic conditional storage operations as an alternative to distributed locking in HPC storage systems. We investigate design strategies and compare the two approaches in a prototype object storage system using a parallel read/modify/write benchmark. Our prototype illustrates that conditional operations can be easily integrated into distributed object storage systems and can outperform standard coordination primitives for simple update workloads. Our experiments show that conditional updates can achieve over two orders of magnitude higher performance than pessimistic locking for some parallel read/modify/write workloads.
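
The core pattern being evaluated is a conditional update that succeeds only if the target object is unchanged, with the client retrying on conflict rather than holding a distributed lock. A minimal in-process sketch of that pattern (not the prototype object storage system itself) follows; the class and method names are hypothetical, and the internal mutex only keeps the single-process example safe.

```python
# Minimal sketch of the optimistic pattern discussed above: a conditional
# update succeeds only if the object's version is unchanged, and the client
# retries on conflict instead of acquiring a distributed lock.
import threading

class VersionedObject:
    """Stand-in for one object on a storage server."""
    def __init__(self, data=b""):
        self.data, self.version = data, 0
        self._mutex = threading.Lock()  # server-local guard, not a distributed lock

    def read(self):
        with self._mutex:
            return self.data, self.version

    def conditional_write(self, data, expected_version):
        """Apply the write only if no other writer intervened."""
        with self._mutex:
            if self.version != expected_version:
                return False            # conflict: caller must retry
            self.data, self.version = data, self.version + 1
            return True

def read_modify_write(obj, modify):
    while True:                         # optimistic retry loop
        data, version = obj.read()
        if obj.conditional_write(modify(data), version):
            return

if __name__ == "__main__":
    obj = VersionedObject(b"a")
    read_modify_write(obj, lambda d: d + b"b")
    print(obj.read())
```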


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2014

A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems

Shane Snyder; Philip H. Carns; Jonathan Jenkins; Kevin Harms; Robert B. Ross; Misbah Mubarak; Christopher D. Carothers

Fault response strategies are crucial to maintaining performance and availability in HPC storage systems, and the first responsibility of a successful fault response strategy is to detect failures and maintain an accurate view of group membership. This is a nontrivial problem given the unreliable nature of communication networks and other system components. As with many engineering problems, trade-offs must be made to account for the competing goals of fault detection efficiency and accuracy.
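
One common epidemic scheme, sketched below purely for illustration (the paper's exact protocol may differ), has each member periodically push its heartbeat table to a random peer and suspect any member whose heartbeat stops advancing within a timeout; all names and the timeout value are hypothetical.

```python
# Hedged sketch of a gossip-style (epidemic) failure detector: members
# exchange heartbeat tables with random peers and suspect members whose
# heartbeats stop advancing.
import random
import time

FAIL_TIMEOUT = 5.0  # seconds without heartbeat progress before suspicion

class Member:
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers
        # per-member heartbeat counter and local time it last advanced
        self.table = {name: (0, time.monotonic())}

    def tick(self):
        hb, _ = self.table[self.name]
        self.table[self.name] = (hb + 1, time.monotonic())

    def gossip(self):
        """Push our table to one random peer (epidemic dissemination)."""
        random.choice(self.peers).merge(self.table)

    def merge(self, remote_table):
        now = time.monotonic()
        for node, (hb, _) in remote_table.items():
            local_hb, _ = self.table.get(node, (-1, now))
            if hb > local_hb:
                self.table[node] = (hb, now)   # progress observed: refresh

    def suspected(self):
        now = time.monotonic()
        return [n for n, (_, seen) in self.table.items()
                if now - seen > FAIL_TIMEOUT]

if __name__ == "__main__":
    members = [Member(f"node{i}", []) for i in range(4)]
    for m in members:
        m.peers = [p for p in members if p is not m]
    for _ in range(3):
        for m in members:
            m.tick()
            m.gossip()
    print(members[0].table)
```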


networking architecture and storages | 2012

AESOP: Expressing Concurrency in High-Performance System Software

Dries Kimpe; Philip H. Carns; Kevin Harms; Justin M. Wozniak; Samuel Lang; Robert B. Ross

High-performance computing (HPC) and distributed systems rely on a diverse collection of system software to provide application services, including file systems, schedulers, and web services. Such system software services must manage highly concurrent requests, interact with a wide range of resources, and scale well in order to be successful. Unfortunately, no single programming model for distributed system software currently offers optimal performance and productivity for all these tasks. While numerous libraries, languages, and language extensions have been developed in recent years to simplify parallel computation, they do not address the challenges of distributed system software in which concurrency control involves a variety of hardware and network devices, not just computational resources. In this work we present AESOP, a new programming language and programming model designed to implement distributed system software with high development productivity and run-time efficiency. AESOP is a superset of the C language that describes blocks of code to be executed concurrently without dictating whether that concurrency will be provided by a threading, event, or other model. This decoupling enables system software to adjust to different architectures, device APIs, and workloads without any change to the core algorithm implementation. AESOP also provides additional language constructs to simplify common system software development tasks. We evaluate AESOP by implementing a basic file server and comparing its performance, memory efficiency, and developer productivity with several thread-based and event-based implementations. AESOP is shown to provide competitive performance to traditional distributed system software development models while at the same time reducing code complexity and enhancing developer productivity.
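
AESOP's own syntax is not reproduced here. The Python sketch below only illustrates the underlying idea of decoupling the core algorithm from the concurrency provider: the request handler is written once as plain blocking code, and the executor supplying concurrency can be swapped without touching it.

```python
# Illustration only, not AESOP syntax: the service logic is written once as
# ordinary blocking code, and the concurrency provider (a thread pool here,
# but it could be a serial or event-driven executor) is chosen at startup.
from concurrent.futures import Executor, ThreadPoolExecutor

def handle_request(request_id: int) -> str:
    # Core algorithm: unaware of how concurrency is provided.
    return f"served request {request_id}"

def run_server(executor: Executor, requests):
    # Swap in a different Executor to change the concurrency model
    # without changing handle_request.
    return list(executor.map(handle_request, requests))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(run_server(pool, range(4)))
```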


Proceedings of the 5th Workshop on Extreme-Scale Programming Tools | 2016

Modular HPC I/O characterization with Darshan

Shane Snyder; Philip H. Carns; Kevin Harms; Robert B. Ross; Glenn K. Lockwood; Nicholas J. Wright

Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem that provide a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization. In this work, we consider how the I/O profiling tool Darshan can be improved to allow for more flexible, comprehensive instrumentation of current and future HPC I/O workloads. We evaluate the performance and scalability of our design to ensure that it is lightweight enough for full-time deployment on production HPC systems. We also present two case studies illustrating how a more comprehensive instrumentation of application I/O workloads can enable insights into I/O behavior that were not previously possible. Our results indicate that Darshan's modular instrumentation methods can provide valuable feedback to both users and system administrators, while imposing negligible overheads on user applications.
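
The modular design can be pictured as independent instrumentation modules registering with a small core that gathers one record per module at shutdown. The sketch below is conceptual only and does not reflect Darshan's actual API; all class and counter names are hypothetical.

```python
# Conceptual sketch only (not Darshan's actual API): independent
# instrumentation modules register with a lightweight core, each contributes
# its own counters, and the core gathers one record per module at shutdown.
import atexit

class InstrumentationCore:
    def __init__(self):
        self.modules = []
        atexit.register(self.shutdown)

    def register(self, module):
        self.modules.append(module)

    def shutdown(self):
        # In a real tool these records would be compressed and written
        # to a single per-job log file.
        for m in self.modules:
            print(m.name, m.counters)

class PosixModule:
    """Hypothetical module tracking POSIX-level I/O."""
    name = "POSIX"

    def __init__(self, core):
        self.counters = {"bytes_written": 0}
        core.register(self)

    def record_write(self, nbytes):
        self.counters["bytes_written"] += nbytes

if __name__ == "__main__":
    core = InstrumentationCore()
    posix = PosixModule(core)
    posix.record_write(4096)
```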


Volume 2: Emissions Control Systems; Instrumentation, Controls, and Hybrids; Numerical Simulation; Engine Design and Mechanical Development | 2015

Development of a Stiffness-Based Chemistry Load Balancing Scheme, and Optimization of I/O and Communication, to Enable Massively Parallel High-Fidelity Internal Combustion Engine Simulations

Janardhan Kodavasal; Kevin Harms; Priyesh Srivastava; Sibendu Som; Shaoping Quan; Keith Richards; Marta García

A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, and in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.
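
A stiffness-based balancer of this kind weights each cell by its estimated chemistry cost rather than treating all cells equally. The generic greedy sketch below (not the paper's exact algorithm) assigns the most expensive cells first to the currently least-loaded rank; the cost values and rank count are made up for illustration.

```python
# Generic illustration (not the paper's exact algorithm): balance chemistry
# work by placing each cell on the currently least-loaded rank, weighting
# cells by an estimated cost (e.g., a stiffness measure) rather than by
# cell count alone.
import heapq

def balance_by_cost(cell_costs, num_ranks):
    """cell_costs: list of per-cell cost estimates (e.g., stiffness-based)."""
    heap = [(0.0, rank) for rank in range(num_ranks)]  # (current load, rank)
    heapq.heapify(heap)
    assignment = [None] * len(cell_costs)
    # Greedy: place the most expensive cells first onto the least-loaded rank.
    for cell in sorted(range(len(cell_costs)), key=lambda c: -cell_costs[c]):
        load, rank = heapq.heappop(heap)
        assignment[cell] = rank
        heapq.heappush(heap, (load + cell_costs[cell], rank))
    return assignment

# Example: the two stiff (costly) cells end up on different ranks.
print(balance_by_cost([5.0, 1.0, 1.0, 4.0, 1.0, 1.0], num_ranks=2))
```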

Collaboration


Dive into Kevin Harms's collaborations.

Top Co-Authors

Robert B. Ross, Argonne National Laboratory
Philip H. Carns, Argonne National Laboratory
Robert Latham, Argonne National Laboratory
Samuel Lang, Argonne National Laboratory
Shane Snyder, Argonne National Laboratory
Dries Kimpe, Argonne National Laboratory
Marta García, Argonne National Laboratory
Sibendu Som, Argonne National Laboratory
William E. Allcock, Argonne National Laboratory