Publications


Featured research published by Shane Snyder.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

Techniques for modeling large-scale HPC I/O workloads

Shane Snyder; Philip H. Carns; Robert Latham; Misbah Mubarak; Robert B. Ross; Christopher D. Carothers; Babak Behzad; Huong Luu; Surendra Byna; Prabhat

Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, I/O analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single workload modeling technique is appropriate for all use cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary workload consumer components to obtain I/O workloads from a range of diverse input sources. Thus, researchers can choose specific I/O workload generators based on the resources they have available and the type of evaluation they wish to perform. As part of this research, we also outline the design of three distinct workload generation methods, based on I/O traces, synthetic I/O kernels, and I/O characterizations. We analyze and contrast each of these workload generation techniques in the context of storage system simulation models as well as production storage system measurements. We found that each generator mechanism offers varying levels of accuracy, flexibility, and breadth of use that should be considered before performing I/O analyses. We also recommend a set of best practices for HPC I/O workload modeling based on challenges that we encountered while performing our evaluation.
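
The following sketch illustrates the kind of abstraction the paper describes: a workload consumer programmed against a single generator interface, with trace-replay and synthetic-kernel back ends that can be swapped freely. All class and function names here are hypothetical illustrations, not the actual IOWA API.

```python
# Hypothetical sketch (not the actual IOWA interface): a workload consumer
# pulls I/O events from interchangeable generator back ends, so the same
# consumer works with traces, synthetic kernels, or characterization data.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass
class IOEvent:
    rank: int      # MPI rank issuing the operation
    op: str        # "open", "read", "write", or "close"
    offset: int    # file offset in bytes
    size: int      # request size in bytes


class WorkloadGenerator(ABC):
    """Common interface every workload source implements."""

    @abstractmethod
    def events(self) -> Iterator[IOEvent]:
        ...


class TraceReplayGenerator(WorkloadGenerator):
    """Replays events recorded in a trace file (one event per line)."""

    def __init__(self, trace_path: str):
        self.trace_path = trace_path

    def events(self) -> Iterator[IOEvent]:
        with open(self.trace_path) as f:
            for line in f:
                rank, op, offset, size = line.split()
                yield IOEvent(int(rank), op, int(offset), int(size))


class SyntheticKernelGenerator(WorkloadGenerator):
    """Emits a simple contiguous, checkpoint-style write pattern."""

    def __init__(self, ranks: int, writes_per_rank: int, size: int):
        self.ranks, self.writes_per_rank, self.size = ranks, writes_per_rank, size

    def events(self) -> Iterator[IOEvent]:
        for r in range(self.ranks):
            for i in range(self.writes_per_rank):
                offset = (r * self.writes_per_rank + i) * self.size
                yield IOEvent(r, "write", offset, self.size)


def total_bytes_moved(gen: WorkloadGenerator) -> int:
    """A consumer that works unchanged with any generator back end."""
    return sum(e.size for e in gen.events() if e.op in ("read", "write"))
```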


International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems | 2014

A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems

Shane Snyder; Philip H. Carns; Jonathan Jenkins; Kevin Harms; Robert B. Ross; Misbah Mubarak; Christopher D. Carothers

Fault response strategies are crucial to maintaining performance and availability in HPC storage systems, and the first responsibility of a successful fault response strategy is to detect failures and maintain an accurate view of group membership. This is a nontrivial problem given the unreliable nature of communication networks and other system components. As with many engineering problems, trade-offs must be made to account for the competing goals of fault detection efficiency and accuracy.
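
As a rough illustration of the epidemic (gossip-based) approach the title refers to, the sketch below shows a toy SWIM-style detector in which each node pings one random peer per round and piggybacks its suspicion set on the exchange. The protocol details, constants, and names are assumptions for illustration only, not the implementation evaluated in the paper.

```python
# Toy epidemic failure detector: local pings plus gossiped suspicions.
import random

SUSPECT_AFTER = 3   # consecutive missed pings before suspecting a peer (assumed)


class Node:
    def __init__(self, node_id, all_ids):
        self.node_id = node_id
        self.alive = True
        self.missed = {p: 0 for p in all_ids if p != node_id}   # local ping bookkeeping
        self.suspects = set()                                    # epidemically spread suspicions

    def gossip_round(self, nodes):
        """Ping one random peer; piggyback suspicion gossip on the exchange."""
        if not self.alive:
            return
        target_id = random.choice(list(self.missed))
        target = nodes[target_id]
        if target.alive:
            self.missed[target_id] = 0
            # epidemic dissemination: both sides merge their suspicion sets
            merged = (self.suspects | target.suspects) - {self.node_id, target_id}
            self.suspects |= merged
            target.suspects |= merged
        else:
            self.missed[target_id] += 1
            if self.missed[target_id] >= SUSPECT_AFTER:
                self.suspects.add(target_id)


# Toy run: eight nodes, one injected failure, fifty gossip rounds.
nodes = {i: Node(i, range(8)) for i in range(8)}
nodes[5].alive = False
for _ in range(50):
    for n in nodes.values():
        n.gossip_round(nodes)
print(sorted(nodes[0].suspects))   # node 5 is eventually suspected everywhere
```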


Proceedings of the 5th Workshop on Extreme-Scale Programming Tools | 2016

Modular HPC I/O characterization with Darshan

Shane Snyder; Philip H. Carns; Kevin Harms; Robert B. Ross; Glenn K. Lockwood; Nicholas J. Wright

Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem, providing a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization. In this work, we consider how the I/O profiling tool Darshan can be improved to allow for more flexible, comprehensive instrumentation of current and future HPC I/O workloads. We evaluate the performance and scalability of our design to ensure that it is lightweight enough for full-time deployment on production HPC systems. We also present two case studies illustrating how a more comprehensive instrumentation of application I/O workloads can enable insights into I/O behavior that were not previously possible. Our results indicate that Darshan's modular instrumentation methods can provide valuable feedback to both users and system administrators, while imposing negligible overheads on user applications.
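
A minimal sketch of the modular-instrumentation idea follows: independent modules own counters for one I/O interface each, wrap the corresponding calls, and a small core emits a combined report at shutdown. This is not Darshan's actual API; every name here is invented for illustration.

```python
# Illustrative module-based instrumentation (NOT Darshan's real interface).
import atexit
import time

_modules = {}                       # module name -> counter dict


def register_module(name):
    _modules[name] = {"calls": 0, "bytes": 0, "time": 0.0}
    return _modules[name]


def instrumented(counters, nbytes_arg=None):
    """Wrap an I/O function so each call updates its module's counters."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            counters["calls"] += 1
            counters["time"] += time.perf_counter() - start
            if nbytes_arg is not None:
                counters["bytes"] += len(args[nbytes_arg])
            return result
        return inner
    return wrap


@atexit.register
def write_report():
    # A real tool would emit a compact binary log at MPI_Finalize instead.
    for name, c in _modules.items():
        print(f"{name}: {c['calls']} calls, {c['bytes']} bytes, {c['time']:.3f}s")


# Example: a "posix-like" module instrumenting a plain write function.
posix_counters = register_module("posix")


@instrumented(posix_counters, nbytes_arg=1)
def write_chunk(f, data):
    f.write(data)
```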


International Conference on Cluster Computing | 2017

Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers

Misbah Mubarak; Philip H. Carns; Jonathan Jenkins; Jianping Kelvin Li; Nikhil Jain; Shane Snyder; Robert B. Ross; Christopher D. Carothers; Abhinav Bhatele; Kwan-Liu Ma

HPC systems have shifted to burst buffer storage and high radix interconnect topologies in order to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies. In this work, we analyze I/O and network traffic interference on burst-buffer-equipped dragonfly-based systems using the high-resolution packet-level simulations provided by the CODES storage and interconnect simulation framework. The analysis is performed using realistic I/O workload sizes, a variety of resource allocation and network routing strategies employed in production environments, and a dragonfly network configuration modeled after current vendor options. We analyze the impact of interference on both I/O and communication traffic. We observe that although average network packet latency is stable across a wide variety of configurations, the maximum network packet latency in the presence of concurrent I/O traffic is highly sensitive to subtle policy changes. Our simulations reveal a worst-case single packet latency of 4,700 times the average latency for sub-optimal configurations. While a topology-aware mapping of compute nodes to burst buffer storage nodes can minimize the variation in maximum packet latency, it can slow down the I/O traffic by creating contention on the burst buffer nodes. Overall, balancing I/O and network performance requires careful selection of routing, data placement, and job placement policies.
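
The sketch below mimics the latency analysis described above on synthetic per-packet samples (not CODES output): it contrasts average and worst-case packet latency under two hypothetical placement policies to show how a small heavy tail can dominate the maximum while leaving the mean nearly unchanged.

```python
# Synthetic illustration of mean vs. worst-case packet latency per policy.
import random

random.seed(0)


def sample_latencies(n, base_us, tail_prob, tail_scale):
    """Mostly well-behaved packets plus a small heavy tail."""
    return [
        base_us * (tail_scale if random.random() < tail_prob else 1.0)
        * random.uniform(0.8, 1.2)
        for _ in range(n)
    ]


configs = {
    "topology-aware placement": sample_latencies(100_000, 2.0, 0.0, 1.0),
    "random placement":         sample_latencies(100_000, 2.0, 1e-4, 4000.0),
}

for name, lat in configs.items():
    mean = sum(lat) / len(lat)
    worst = max(lat)
    print(f"{name}: mean {mean:.1f} us, max {worst:.1f} us, "
          f"max/mean {worst / mean:.0f}x")
```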


Proceedings of the 4th Workshop on Extreme Scale Programming Tools | 2015

HPC I/O trace extrapolation

Xiaoqing Luo; Frank Mueller; Philip H. Carns; John Jenkins; Robert Latham; Robert B. Ross; Shane Snyder

Today's rapid development of supercomputers has caused I/O performance to become a major performance bottleneck for many scientific applications. Trace analysis tools have thus become vital for diagnosing root causes of I/O problems. This work contributes an I/O tracing framework with elastic traces. After gathering a set of smaller traces, we extrapolate the application trace to large numbers of nodes. The traces can in principle be extrapolated even beyond the scale of present-day systems. Experiments with I/O benchmarks on up to 320 processors indicate that extrapolated I/O trace replays closely resemble the I/O behavior of equivalent applications.
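
A minimal sketch of the extrapolation idea follows, under the simplifying assumption that a trace parameter scales as a power law in node count; the data points and the fitting choice are illustrative, not the paper's actual model.

```python
# Fit how one trace parameter (total write volume) scales with node count
# from a few small runs, then extrapolate to a larger node count.
import numpy as np

# (nodes, total gigabytes written) from small traced runs -- made-up data
nodes = np.array([16, 32, 64, 128, 320])
gbytes = np.array([4.1, 8.0, 16.3, 32.1, 80.4])

# Weak-scaling-style workloads are often close to linear in node count,
# so a degree-1 fit in log space is a reasonable first model.
slope, intercept = np.polyfit(np.log(nodes), np.log(gbytes), 1)


def predict_gbytes(n):
    return float(np.exp(intercept) * n ** slope)


print(predict_gbytes(4096))   # extrapolated write volume at 4,096 nodes
```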


Networking, Architecture, and Storage | 2017

Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity

Sandeep Madireddy; Prasanna Balaprakash; Philip H. Carns; Robert Latham; Robert B. Ross; Shane Snyder; Stefan M. Wild

Storage resources in high-performance computing are shared across all user applications. Consequently, storage performance can vary markedly, depending not only on an application's workload but also on what other activity is concurrently running across the system. This variability in storage performance is directly reflected in overall execution time variability, thus confounding efforts to predict job performance for scheduling or capacity planning. I/O variability also complicates the seemingly straightforward process of performance measurement when evaluating application optimizations. In this work we present a methodology to measure I/O contention with more rigor than in prior work. We apply statistical techniques to gain insight from application-level statistics and storage-side logging. We examine different correlation metrics for relating system workload to job I/O performance and identify an effective and generally applicable metric for measuring job I/O performance. We further demonstrate that the system-wide monitoring granularity can directly affect the strength of correlation observed; insufficient measurement granularity can hide the correlations between application I/O performance and system-wide I/O activity.
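
The sketch below illustrates the correlation step with synthetic data standing in for the application-level statistics and storage-side logs: it compares a linear (Pearson) and a rank (Spearman) correlation between background system I/O traffic and per-job throughput. The specific metric identified in the paper is not reproduced here.

```python
# Correlating per-job I/O throughput with concurrent system-wide I/O traffic.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)

# Per-job measurements: concurrent system-wide I/O traffic (GB/s) and the
# job's achieved write throughput (GB/s) -- made-up numbers for illustration.
background = rng.uniform(0, 100, size=200)
throughput = 10.0 / (1.0 + 0.05 * background) + rng.normal(0, 0.3, size=200)

# Spearman is often more robust when the relationship is monotonic but
# not linear, as is typical for contention effects.
r, _ = pearsonr(background, throughput)
rho, _ = spearmanr(background, throughput)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```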


International Parallel and Distributed Processing Symposium | 2017

ScalaIOExtrap: Elastic I/O Tracing and Extrapolation

Xiaoqing Luo; Frank Mueller; Philip H. Carns; Jonathan Jenkins; Robert Latham; Robert B. Ross; Shane Snyder

Today's rapid development of supercomputers has caused I/O performance to become a major performance bottleneck for many scientific applications. Trace analysis tools have thus become vital for diagnosing root causes of I/O problems. This work contributes an I/O tracing framework with (a) techniques to gather a set of lossless, elastic I/O trace files for a small number of nodes, (b) a mathematical model to analyze trace data and extrapolate it to a larger number of nodes, and (c) a replay engine for the extrapolated trace file to verify its accuracy. The traces can in principle be extrapolated even beyond the scale of present-day systems and provide a test of whether applications scale in terms of I/O. We conducted our experiments on three platforms: a commodity Linux cluster, an IBM BG/Q system, and a discrete event simulation of an IBM BG/P system. We investigate a combination of synthetic benchmarks on all platforms as well as a production scientific application on the BG/Q system. The extrapolated I/O trace replays closely resemble the I/O behavior of equivalent applications in all cases.
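
As a small illustration of the verification step (c), the sketch below compares aggregate statistics from a replay of an extrapolated trace against an equivalent real run using a simple relative-error metric; the numbers and the metric are assumptions, not results from the paper.

```python
# Compare extrapolated-trace replay statistics with an equivalent real run.
def relative_error(replayed, observed):
    return abs(replayed - observed) / observed


# Hypothetical aggregate statistics at 1,024 nodes.
replay = {"write_gbytes": 255.0, "io_time_s": 41.2}
real = {"write_gbytes": 262.3, "io_time_s": 43.8}

for key in replay:
    err = relative_error(replay[key], real[key])
    print(f"{key}: {err:.1%} relative error")
```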


Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems | 2017

UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis

Glenn K. Lockwood; Wucherl Yoo; Suren Byna; Nicholas J. Wright; Shane Snyder; Kevin Harms; Zachary Nault; Philip H. Carns

I/O efficiency is essential to productivity in scientific computing, especially as many scientific domains become more data-intensive. Many characterization tools have been used to elucidate specific aspects of parallel I/O performance, but analyzing components of complex I/O subsystems in isolation fails to provide insight into critical questions: how do the I/O components interact, what are reasonable expectations for application performance, and what are the underlying causes of I/O performance problems? To address these questions while capitalizing on existing component-level characterization tools, we propose an approach that combines on-demand, modular synthesis of I/O characterization data into a unified monitoring and metrics interface (UMAMI) to provide a normalized, holistic view of I/O behavior. We evaluate the feasibility of this approach by applying it to a month-long benchmarking study on two distinct large-scale computing platforms. We present three case studies that highlight the importance of analyzing application I/O performance in context with both contemporaneous and historical component metrics, and we provide new insights into the factors affecting I/O performance. By demonstrating the generality of our approach, we lay the groundwork for a production-grade framework for holistic I/O analysis.
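
One way to picture the "normalized, holistic view" is shown below: each component metric is placed in the context of its own history by converting today's value to a z-score, so quantities with different units can be read side by side. Metric names, values, and the normalization choice are illustrative assumptions, not UMAMI's actual output.

```python
# Normalize heterogeneous I/O metrics against their own history via z-scores.
import statistics

history = {
    "application write bandwidth (GB/s)": [38.0, 41.5, 40.2, 37.8, 42.1],
    "file system fullness (%)":           [61.0, 63.5, 64.0, 66.2, 67.0],
    "concurrent jobs doing I/O":          [12, 9, 15, 11, 14],
}
today = {
    "application write bandwidth (GB/s)": 21.3,
    "file system fullness (%)":           82.0,
    "concurrent jobs doing I/O":          27,
}

for metric, past in history.items():
    mu, sigma = statistics.mean(past), statistics.stdev(past)
    z = (today[metric] - mu) / sigma
    print(f"{metric}: {today[metric]} (z = {z:+.1f})")
```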


IEEE International Conference on High Performance Computing, Data, and Analytics | 2018

Machine Learning Based Parallel I/O Predictive Modeling: A Case Study on Lustre File Systems

Sandeep Madireddy; Prasanna Balaprakash; Philip H. Carns; Robert Latham; Robert B. Ross; Shane Snyder; Stefan M. Wild

Parallel I/O hardware and software infrastructure is a key contributor to performance variability for applications running on large-scale HPC systems. This variability confounds efforts to predict application performance for characterization, modeling, optimization, and job scheduling. We propose a modeling approach that improves predictive ability by explicitly treating the variability and by leveraging the sensitivity of performance to application parameters in order to group applications with similar characteristics. We develop a Gaussian process-based machine learning algorithm to model I/O performance and its variability as a function of application and file system characteristics. We demonstrate the effectiveness of the proposed approach using data collected from the Edison system at the National Energy Research Scientific Computing Center. The results show that the proposed sensitivity-based models are better at prediction when compared with application-partitioned or unpartitioned models. We highlight modeling techniques that are robust to the outliers that can occur in production parallel file systems. Using the developed metrics and modeling approach, we provide insights into the file system metrics that have a significant impact on I/O performance.
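
A minimal sketch of Gaussian-process regression for I/O performance prediction follows, using scikit-learn and synthetic features in place of the real application and Lustre file system characteristics from the study; it illustrates the general technique, not the authors' model.

```python
# Gaussian-process model of I/O bandwidth with an explicit noise term to
# capture run-to-run variability; all data here is synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Features: [number of writing processes, stripe count]; target: bandwidth (GB/s).
X = rng.uniform([16, 1], [1024, 64], size=(80, 2))
y = 0.02 * X[:, 0] * np.log1p(X[:, 1]) + rng.normal(0, 1.0, size=80)

# WhiteKernel lets the model treat performance variability explicitly.
kernel = RBF(length_scale=[100.0, 10.0]) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gp.predict(np.array([[512, 16]]), return_std=True)
print(f"predicted bandwidth: {mean[0]:.1f} GB/s +/- {std[0]:.1f}")
```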


Operating Systems Design and Implementation | 2016

Enabling NVM for Data-Intensive Scientific Services

Philip H. Carns; John Jenkins; Charles D. Cranor; Scott Atchley; Sangmin Seo; Shane Snyder; Robert B. Ross

Collaboration


Dive into Shane Snyder's collaborations.

Top Co-Authors

Philip H. Carns, Argonne National Laboratory
Robert B. Ross, Pennsylvania State University
Robert Latham, Argonne National Laboratory
Kevin Harms, Argonne National Laboratory
Christopher D. Carothers, Rensselaer Polytechnic Institute
Jonathan Jenkins, Argonne National Laboratory
Misbah Mubarak, Argonne National Laboratory
Frank Mueller, North Carolina State University