
Publication


Featured research published by Matthew L. Curry.


ACM International Conference on Systems and Storage | 2012

GPUstore: harnessing GPU computing for storage systems in the OS kernel

Weibin Sun; Robert Ricci; Matthew L. Curry

Many storage systems include computationally expensive components. Examples include encryption for confidentiality, checksums for integrity, and error correcting codes for reliability. As storage systems become larger, faster, and serve more clients, the demands placed on their computational components increase and they can become performance bottlenecks. Many of these computational tasks are inherently parallel: they can be run independently for different blocks, files, or I/O requests. This makes them a good fit for GPUs, a class of processor designed specifically for high degrees of parallelism: consumer-grade GPUs have hundreds of cores and are capable of running hundreds of thousands of concurrent threads. However, because the software frameworks built for GPUs have been designed primarily for the long-running, data-intensive workloads seen in graphics or high-performance computing, they are not well-suited to the needs of storage systems. In this paper, we present GPUstore, a framework for integrating GPU computing into storage systems. GPUstore is designed to match the programming models already used in these systems. We have prototyped GPUstore in the Linux kernel and demonstrate its use in three storage subsystems: file-level encryption, block-level encryption, and RAID 6 data recovery. Comparing our GPU-accelerated drivers with the mature CPU-based implementations in the Linux kernel, we show performance improvements of up to an order of magnitude.
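The property GPUstore exploits is that per-block computations such as checksumming are independent of one another and can therefore be dispatched in parallel. The sketch below illustrates that independence on the CPU using Python's standard library; it is a conceptual illustration only, not GPUstore code, and the block size and hash choice are arbitrary assumptions.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def checksum_block(block: bytes) -> str:
    """Checksum one block; each block is processed independently of the others."""
    return hashlib.sha256(block).hexdigest()

def checksum_device_image(data: bytes) -> list[str]:
    """Split a device image into blocks and checksum them in parallel.

    A GPU implementation would map each block (or group of blocks) to GPU
    threads instead of worker processes; the per-block independence is the same.
    """
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ProcessPoolExecutor() as pool:
        return list(pool.map(checksum_block, blocks))
```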


Concurrency and Computation: Practice and Experience | 2011

Gibraltar: A Reed-Solomon coding library for storage applications on programmable graphics processors

Matthew L. Curry; Anthony Skjellum; H. Lee Ward; Ron Brightwell

Reed–Solomon coding is a method for generating arbitrary amounts of erasure correction information from original data via matrix–vector multiplication in finite fields. Previous work has shown that modern CPUs are not well-matched to this type of computation, requiring applications that depend on Reed–Solomon coding at high speeds (such as high-performance storage arrays) to use hardware implementations. This work demonstrates that high performance is possible with current cost-effective graphics processing units across a wide range of operating conditions and describes how performance will likely evolve in similar architectures. It describes the characteristics of the graphics processing unit architecture that enable high-speed Reed–Solomon coding. A high-performance practical library, Gibraltar, has been prototyped that performs Reed–Solomon coding on graphics processors in a manner suitable for storage arrays, along with applications with similar data resiliency needs. This library enables variably resilient erasure correcting codes to be used in a broad range of applications. Its performance is compared with that of a widely available CPU implementation, and a rationale for its API is presented. Its practicality is demonstrated through a usage example.
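As the abstract notes, Reed-Solomon encoding reduces to a matrix-vector multiplication over a finite field, which is exactly the kind of regular, data-parallel work a GPU handles well. The minimal sketch below encodes parity in GF(2^8) on the CPU to show the shape of that computation; it illustrates the general technique, not Gibraltar's implementation, and the coding matrix shown is an arbitrary example.

```python
# Minimal Reed-Solomon-style encoding sketch: parity = C * data over GF(2^8).
# Illustrative only; Gibraltar uses its own coding matrices and GPU kernels.

GF_POLY = 0x11D  # a commonly used irreducible polynomial for GF(2^8)

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (carry-less multiply, reduced mod GF_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result

def encode_parity(coding_matrix, data_shards):
    """Compute parity shards as a matrix-vector product over GF(2^8).

    coding_matrix: m rows of k coefficients (one row per parity shard)
    data_shards:   k byte strings of equal length
    """
    shard_len = len(data_shards[0])
    parity = []
    for row in coding_matrix:
        out = bytearray(shard_len)
        for coeff, shard in zip(row, data_shards):
            for i, byte in enumerate(shard):
                out[i] ^= gf_mul(coeff, byte)  # field multiply, then XOR-accumulate
        parity.append(bytes(out))
    return parity

# Example: k = 3 data shards, m = 2 parity shards (arbitrary coefficients).
data = [b"\x01\x02\x03\x04", b"\x05\x06\x07\x08", b"\x09\x0a\x0b\x0c"]
parity = encode_parity([[1, 1, 1], [1, 2, 3]], data)
```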


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

A Case for Optimistic Coordination in HPC Storage Systems

Philip H. Carns; Kevin Harms; Dries Kimpe; Justin M. Wozniak; Robert B. Ross; Lee Ward; Matthew L. Curry; Ruth Klundt; Geoff Danielson; Cengiz Karakoyunlu; John A. Chandy; Bradley W. Settlemyer; William Gropp

High-performance computing (HPC) storage systems rely on access coordination to ensure that concurrent updates do not produce incoherent results. HPC storage systems typically employ pessimistic distributed locking to provide this functionality in cases where applications cannot perform their own coordination. This approach, however, introduces significant performance overhead and complicates fault handling. In this work we evaluate the viability of optimistic conditional storage operations as an alternative to distributed locking in HPC storage systems. We investigate design strategies and compare the two approaches in a prototype object storage system using a parallel read/modify/write benchmark. Our prototype illustrates that conditional operations can be easily integrated into distributed object storage systems and can outperform standard coordination primitives for simple update workloads. Our experiments show that conditional updates can achieve over two orders of magnitude higher performance than pessimistic locking for some parallel read/modify/write workloads.
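An optimistic conditional update avoids holding a lock across the read/modify/write: the client reads a version, modifies the data locally, and asks the server to apply the write only if the version is unchanged, retrying otherwise. The sketch below shows that pattern against a toy in-memory object store; it is a generic illustration of conditional (compare-and-swap style) updates, not the paper's prototype API, and all names are assumptions.

```python
import threading

class ToyObjectStore:
    """In-memory stand-in for an object server that supports conditional writes."""

    def __init__(self):
        self._objects = {}              # name -> (version, value)
        self._lock = threading.Lock()   # server-internal; clients never hold it

    def read(self, name):
        return self._objects.get(name, (0, b""))

    def conditional_write(self, name, expected_version, value):
        """Apply the write only if the object's version still matches."""
        with self._lock:
            current_version, _ = self._objects.get(name, (0, b""))
            if current_version != expected_version:
                return False            # another client won the race; caller retries
            self._objects[name] = (current_version + 1, value)
            return True

def read_modify_write(store, name, modify):
    """Optimistic read/modify/write loop: no distributed lock is ever taken."""
    while True:
        version, value = store.read(name)
        if store.conditional_write(name, version, modify(value)):
            return

# Usage: append to an object without locking it.
store = ToyObjectStore()
read_modify_write(store, "counter", lambda v: v + b"x")
```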


Archive | 2012

Report of experiments and evidence for ASC L2 milestone 4467: demonstration of a legacy application's path to exascale.

Matthew L. Curry; Kurt Brian Ferreira; Kevin Pedretti; Vitus J. Leung; Kenneth Moreland; Gerald Fredrick Lofstead; Ann C. Gentile; Ruth Klundt; H. Lee Ward; James H. Laros; Karl Scott Hemmert; Nathan D. Fabian; Michael J. Levenhagen; Ronald B. Brightwell; Richard Frederick Barrett; Kyle Bruce Wheeler; Suzanne M. Kelly; Arun F. Rodrigues; James M. Brandt; David C. Thompson; John P. VanDyke; Ron A. Oldfield; Thomas Tucker

This report documents thirteen of Sandia's contributions to the Computational Systems and Software Environment (CSSE) within the Advanced Simulation and Computing (ASC) program between fiscal years 2009 and 2012. It describes their impact on ASC applications. Most contributions are implemented in lower software levels, allowing for application improvement without source code changes. Improvements are identified in such areas as reduced run time, characterizing power usage, and Input/Output (I/O). Other experiments are more forward looking, demonstrating potential bottlenecks using mini-application versions of the legacy codes and simulating their network activity on exascale-class hardware. The purpose of this report is to prove that the team has completed milestone 4467, Demonstration of a Legacy Application's Path to Exascale. Cielo is expected to be the last capability system on which existing ASC codes can run without significant modifications. This assertion will be tested to determine where the breaking point is for an existing highly scalable application. The goal is to stretch the performance boundaries of the application by applying recent CSSE R&D in areas such as resilience, power, I/O, visualization services, SMARTMAP, lightweight kernels (LWKs), virtualization, simulation, and feedback loops. Dedicated system time reservations and/or CCC allocations will be used to quantify the impact of system-level changes to extend the life and performance of the ASC code base. Finally, a simulation of anticipated exascale-class hardware will be performed using SST to supplement the calculations. Determining where the breaking point is for an existing highly scalable application: Chapter 15 presents the CSSE work that sought to identify the breaking point in two ASC legacy applications, Charon and CTH. Their mini-app versions were also employed to complete the task. There is no single breaking point, as more than one issue was found with the two codes. The results show that applications can expect to encounter performance issues related to the computing environment, system software, and algorithms. Careful profiling of runtime performance, combined with knowledge of system software and application source code, will be needed to identify the source of an issue.


Petascale Data Storage Workshop | 2011

Power use of disk subsystems in supercomputers

Matthew L. Curry; H. Lee Ward; Gary Grider; Jill B. Gemmill; Jay Harris; David Martinez

Exascale will present many challenges to the HPC community, but the primary problem will likely be power consumption. Current petascale systems already use a significant fraction of the power that an exascale system will be allotted. In this paper, we show measurements for real I/O power use in three large systems. We show that I/O power use is proportionally fairly low per machine, between 4.4% and 5.5% of the total consumption. We use these measurements to motivate a burst-buffer checkpointing solution for power-efficient I/O at exascale. We estimated this solution to use approximately 6.6% of the exascale machine power budget, which is on par with today's systems.
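To put those percentages in concrete terms, the short calculation below applies them to a hypothetical machine power budget; the 20 MW figure is an illustrative assumption, not a number from the paper.

```python
# Illustrative only: the 20 MW exascale budget is a hypothetical assumption;
# the 4.4-5.5% and 6.6% figures are the fractions reported in the paper.
machine_budget_mw = 20.0

io_power_low_mw = 0.044 * machine_budget_mw    # ~0.88 MW
io_power_high_mw = 0.055 * machine_budget_mw   # ~1.10 MW
burst_buffer_mw = 0.066 * machine_budget_mw    # ~1.32 MW

print(f"I/O power at today's proportions: {io_power_low_mw:.2f}-{io_power_high_mw:.2f} MW")
print(f"Estimated burst-buffer checkpointing power: {burst_buffer_mw:.2f} MW")
```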


Archive | 2015

Motivation and Design of the Sirocco Storage System Version 1.0.

Matthew L. Curry; H. Lee Ward; Geoffrey Charles Danielson

Sirocco is a massively parallel, high performance storage system for the exascale era. It emphasizes client-to-client coordination, low server-side coupling, and free data movement to improve resilience and performance. Its architecture is inspired by peer-to-peer and victim-cache architectures. By leveraging these ideas, Sirocco natively supports several media types, including RAM, flash, disk, and archival storage, with automatic migration between levels. Sirocco also includes storage interfaces and support that are more advanced than typical block storage. Sirocco enables clients to efficiently use key-value storage or block-based storage with the same interface. It also provides several levels of transactional data updates within a single storage command, including full ACID-compliant updates. This transaction support extends to updating several objects within a single transaction. Further support is provided for concurrency control, enabling greater performance for workloads while providing safe concurrent modification. By pioneering these and other technologies and techniques in the storage system, Sirocco is poised to fulfill a need for a massively scalable, write-optimized storage system for exascale systems. This is version 1.0 of a document reflecting the current and planned state of Sirocco. Further versions of this document will be accessible at http://www.cs.sandia.gov/Scalable_IO/sirocco.
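One way to read the "key-value storage or block-based storage with the same interface" claim is that block access is just keyed access with a fixed key scheme and record size. The sketch below shows that mapping on top of a generic put/get interface; it is a conceptual illustration under assumed names, not Sirocco's actual API.

```python
class KeyValueStore:
    """Generic keyed record store; stands in for a storage server interface."""

    def __init__(self):
        self._records = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._records[key] = value

    def get(self, key: bytes) -> bytes:
        return self._records.get(key, b"")


class BlockView:
    """Block-style reads/writes layered on the same keyed interface.

    Block n of an object simply becomes the key "<object>/<n>", and the record
    size plays the role of the block size. Hypothetical naming, for illustration.
    """

    def __init__(self, store: KeyValueStore, object_name: str, block_size: int = 4096):
        self.store = store
        self.object_name = object_name
        self.block_size = block_size

    def _key(self, block_no: int) -> bytes:
        return f"{self.object_name}/{block_no}".encode()

    def write_block(self, block_no: int, data: bytes) -> None:
        assert len(data) <= self.block_size
        self.store.put(self._key(block_no), data)

    def read_block(self, block_no: int) -> bytes:
        return self.store.get(self._key(block_no))
```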


Petascale Data Storage Workshop | 2013

Fourier-assisted machine learning of hard disk drive access time models

Adam Crume; Carlos Maltzahn; Lee Ward; Thomas M. Kroeger; Matthew L. Curry; Ron A. Oldfield

Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. Others have created behavioral models of hard disk drive performance, but none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We show how hard disk drive access times can be predicted to within 0.83 ms using a neural net after these frequencies are found using Fourier analysis.
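The core trick is to recover the high, unknown frequencies with a Fourier transform and then hand them to the learner as explicit periodic features. The sketch below shows one way to do that with NumPy, assuming uniformly spaced samples; it is an illustrative pipeline, not the authors' code.

```python
import numpy as np

def dominant_frequencies(x, y, n_freqs=3):
    """Estimate the strongest periodic components of y sampled at positions x.

    Assumes x is uniformly spaced (e.g., seek distance in sectors); this is an
    illustrative sketch, not the paper's pipeline.
    """
    y = y - y.mean()                       # remove the DC component
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=x[1] - x[0])
    top = np.argsort(spectrum)[-n_freqs:]  # indices of the largest spectral peaks
    return freqs[top]

def fourier_features(x, freqs):
    """Sin/cos features at the detected frequencies, ready to feed a regressor."""
    cols = [np.sin(2 * np.pi * f * x) for f in freqs]
    cols += [np.cos(2 * np.pi * f * x) for f in freqs]
    return np.stack(cols, axis=1)

# Usage sketch: append fourier_features(x, dominant_frequencies(x, y)) to the
# inputs of a neural network that predicts per-request access time.
```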


ACM Transactions on Storage | 2014

A Lightweight Data Location Service for Nondeterministic Exascale Storage Systems

Zhiwei Sun; Anthony Skjellum; Lee Ward; Matthew L. Curry

In this article, we present LWDLS, a lightweight data location service designed for exascale storage systems (storage systems on the order of 10^18 bytes) and geo-distributed storage systems (large storage systems with physically distributed locations). LWDLS provides a search-based data location solution and enables free data placement, movement, and replication. In LWDLS, probe and prune protocols are introduced that reduce topology mismatch, and a heuristic flooding search algorithm (HFS) is presented that achieves higher search efficiency than pure flooding search while having comparable search speed and coverage. LWDLS is lightweight and scalable: it has low overhead, high search efficiency, no global state, and no periodic messages. LWDLS is fully distributed and can be used in nondeterministic storage systems and in deterministic storage systems to handle cases where search is needed. Extensive simulations modeling large-scale High Performance Computing (HPC) storage environments provide representative performance outcomes. Performance is evaluated by metrics including search scope, search efficiency, and average neighbor distance. Results show that LWDLS is able to locate data efficiently with low state-maintenance cost in arbitrary network environments. Through these simulations, we demonstrate the effectiveness of LWDLS's protocols and search algorithm.
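A flooding search locates data by asking a node's neighbors, who ask their neighbors, until the object is found or a hop limit (TTL) is reached; heuristics such as HFS choose which neighbors to ask. The sketch below shows a plain TTL-bounded flood over an in-memory peer graph; it is a generic illustration of flooding search, not LWDLS's HFS algorithm, and all names are assumptions.

```python
from collections import deque

def flood_search(neighbors, holds_data, start, target, ttl=4):
    """TTL-bounded flooding search for `target` starting at node `start`.

    neighbors:  dict node -> list of neighbor nodes (the overlay topology)
    holds_data: dict node -> set of object IDs stored at that node
    Returns the first node found holding the object, or None.
    """
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if target in holds_data.get(node, set()):
            return node                       # located the data
        if hops_left == 0:
            continue                          # stop flooding along this branch
        for peer in neighbors.get(node, []):
            if peer not in visited:           # avoid re-querying nodes
                visited.add(peer)
                queue.append((peer, hops_left - 1))
    return None
```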


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

An Overview of the Sirocco Parallel Storage System

Matthew L. Curry; H. Lee Ward; Geoff Danielson; Jay F. Lofstead

Sirocco is a massively parallel, high performance storage system that breaks from the classical Zebra-style file system design paradigm. Its architecture is inspired by peer-to-peer and victim-cache architectures, and emphasizes client-to-client coordination, low server-side coupling, and free data movement and placement. By leveraging these ideas, Sirocco natively supports automatic migration between several media types, including RAM, flash, disk, and archival storage.


IEEE Conference on Mass Storage Systems and Technologies | 2014

Automatic generation of behavioral hard disk drive access time models

Adam Crume; Carlos Maltzahn; Lee Ward; Thomas M. Kroeger; Matthew L. Curry

Predicting access times is a crucial part of predicting hard disk drive performance. Existing approaches use white-box modeling and require intimate knowledge of the internal layout of the drive, which can take months to extract. Automatically learning this behavior is a much more desirable approach, requiring less expert knowledge, fewer assumptions, and less time. While previous research has created black-box models of hard disk drive performance, none have shown low per-request errors. A barrier to machine learning of access times has been the existence of periodic behavior with high, unknown frequencies. We identify these high frequencies with Fourier analysis and include them explicitly as input to the model. In this paper we focus on the simulation of access times for random read workloads within a single zone. We are able to automatically generate and tune request-level access time models with mean absolute error less than 0.15 ms. To our knowledge this is the first time such fidelity has been achieved with modern disk drives using machine learning. We are confident that our approach forms the core for automatic generation of access time models that include other workloads and span entire disk drives, but more work remains.

Collaboration


Dive into Matthew L. Curry's collaborations.

Top Co-Authors

H. Lee Ward, Sandia National Laboratories
Anthony Skjellum, University of Alabama at Birmingham
Ron A. Oldfield, University of Texas at El Paso
Adam Crume, University of California
Lee Ward, Sandia National Laboratories
Ruth Klundt, Sandia National Laboratories
Patrick M. Widener, Sandia National Laboratories
Thomas M. Kroeger, Sandia National Laboratories
Geoff Danielson, Sandia National Laboratories