
Publications


Featured research published by H Sarp Oral.


International Symposium on Performance Analysis of Systems and Software | 2011

A semi-preemptive garbage collector for solid state drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Feiyi Wang; Jongman Kim

NAND flash memory is a preferred storage medium for platforms ranging from embedded systems to enterprise-scale systems. Flash devices have no mechanical moving parts, provide low-latency access, and require less power than rotating media. Unlike hard disks, flash devices use out-of-place update operations and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptive GC scheme that can preempt ongoing GC processing to service pending I/O requests in the queue. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-preemptive GC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with an 83.30% reduction in response-time variance compared to the non-preemptive GC scheme.
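
The core preemption mechanism lends itself to a short sketch. The following minimal Python model is illustrative only, not the paper's implementation; the page/block granularity and helper names (copy_page, erase_victim_block) are assumptions. It shows GC yielding to queued host I/O at page-copy boundaries:

```python
from collections import deque

# Minimal model of the semi-preemptive cycle: between individual page
# copies, the GC loop drains pending host I/O before resuming. Helper
# names and the page/block granularity are assumptions.

def service_io(request):
    print(f"servicing I/O request: {request}")

def copy_page(page):
    print(f"copying valid page {page}")

def erase_victim_block():
    print("erasing victim block")

def semi_preemptive_gc(valid_pages, io_queue: deque):
    """Reclaim a victim block, preempting at page-copy boundaries
    whenever host I/O is waiting in the queue."""
    for page in valid_pages:
        while io_queue:                 # preemption point
            service_io(io_queue.popleft())
        copy_page(page)
    erase_victim_block()

semi_preemptive_gc(range(3), deque(["write A", "read B"]))
```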


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Characterizing output bottlenecks in a supercomputer

Bing Xie; Jeffrey S. Chase; David A Dillow; Oleg Drokin; Scott Klasky; H Sarp Oral; Norbert Podhorszki

Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
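
The sampling methodology can be pictured with a toy example. The sketch below uses synthetic numbers and a hypothetical probe_bandwidth helper; it shows the idea of summarizing delivered bandwidth as a distribution across node/target samples rather than a single average, which is what exposes stragglers:

```python
import random
import statistics

# Sketch of the sampling idea: probe write bandwidth between sampled
# compute nodes and storage targets, then report the distribution
# rather than a single mean. All numbers are synthetic; a real probe
# would time a fixed-size write to one storage target.

def probe_bandwidth(node, target):
    base = 400.0                            # MB/s, hypothetical per-target peak
    contention = random.uniform(0.3, 1.0)   # stand-in for shared load
    return base * contention

samples = [probe_bandwidth(n, t)
           for n in range(32)          # sampled compute nodes
           for t in range(8)]          # sampled storage targets

deciles = statistics.quantiles(samples, n=10)
print(f"median {statistics.median(samples):.0f} MB/s, "
      f"p10 {deciles[0]:.0f} MB/s, p90 {deciles[-1]:.0f} MB/s")
# A heavy lower tail (p10 far below the median) is the straggler
# signature that hurts coupled output such as striped writes.
```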


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2013

Preemptible I/O Scheduling of Garbage Collection for Solid State Drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Jongman Kim

Unlike hard disks, flash devices use out-of-place update operations and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads meet at different intervals. This results in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptible GC (PGC) scheme that allows GC processing to be preempted while pending I/O requests in the queue are serviced. Moreover, we further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-PGC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time with an 83.30% reduction in response-time variance compared to the non-PGC scheme. In addition, we explore opportunities of a new NAND flash device that supports suspend/resume commands for read, write, and erase operations, enabling fully preemptible GC (F-PGC). Our experiments with an F-PGC-enabled flash device show that request response time can be improved by up to 14.57% compared to semi-PGC.
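
The F-PGC extension can be sketched by modeling a long NAND operation as resumable slices. This is a loose illustration under assumed semantics, not the device command set; names and timings are invented:

```python
# Loose illustration of F-PGC under assumed semantics: a long erase is
# modeled as resumable slices, and the controller suspends it to serve
# host I/O that arrives mid-operation. Names and timings are invented.

def erase_block(block, slices=4):
    """Model an erase as several resumable time slices."""
    for step in range(slices):
        yield f"erase block {block}: slice {step + 1}/{slices}"

def run_with_suspend(operation, io_arrival_tick):
    for tick, progress in enumerate(operation):
        print(progress)
        if tick == io_arrival_tick:
            print("  suspend: service host I/O, then resume erase")

run_with_suspend(erase_block(7), io_arrival_tick=1)
```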


European Conference on Parallel Processing | 2008

Empirical Analysis of a Large-Scale Hierarchical Storage System

Weikuan Yu; H Sarp Oral; R. Shane Canon; Jeffrey S. Vetter; Ramanan Sankaran

To prepare for future peta- or exa-scale computing, it is important to gain a good understanding of the impact a hierarchical storage system has on the performance of data-intensive applications and, accordingly, how to leverage its strengths and mitigate possible risks. To this end, this paper adopts a user-level perspective to empirically reveal the implications of storage organization for parallel programs running on Jaguar at the Oak Ridge National Laboratory. We first describe the hierarchical configuration of Jaguar's storage system. Then we evaluate the performance of individual storage components. In addition, we examine the scalability of metadata- and data-intensive benchmarks on Jaguar. We have discovered that the file distribution pattern can impact the aggregate I/O bandwidth. Based on our analysis, we demonstrate that it is possible to improve the scalability of S3D, a representative application, by as much as 15%.
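
The observation that file distribution patterns affect aggregate bandwidth can be illustrated with a toy placement model. The sketch below is hypothetical (the OST count and placement policies are invented) and simply shows how the most heavily loaded target bounds aggregate throughput:

```python
import random
from collections import Counter

# Toy model of why file distribution matters: stripes land on object
# storage targets (OSTs), and aggregate bandwidth is bounded by the
# most heavily loaded target. OST count and policies are invented.

N_OSTS = 16

def place_round_robin(n_files, stripes_per_file):
    load = Counter()
    stripe = 0
    for _ in range(n_files):
        for _ in range(stripes_per_file):
            load[stripe % N_OSTS] += 1
            stripe += 1
    return load

def place_random(n_files, stripes_per_file):
    load = Counter()
    for _ in range(n_files * stripes_per_file):
        load[random.randrange(N_OSTS)] += 1
    return load

for name, load in [("round-robin", place_round_robin(64, 4)),
                   ("random", place_random(64, 4))]:
    print(f"{name}: busiest OST holds {max(load.values())} stripes")
```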


International Conference on Cluster Computing | 2015

TRIO: Burst Buffer Based I/O Orchestration

Teng Wang; H Sarp Oral; Michael Pritchard; Bin Wang; Weikuan Yu

The growing computing power of leadership HPC systems is often accompanied by ever-escalating failure rates. Checkpointing is a common defensive mechanism used by scientific applications for failure recovery. However, directly writing the large, bursty checkpointing datasets to parallel file systems can incur significant I/O contention on storage servers. Such contention in turn degrades bandwidth utilization of storage servers and prolongs the average job I/O time of concurrent applications. Recently, burst buffers have been proposed as an intermediate layer to absorb the bursty I/O traffic from compute nodes to the storage backend, but an I/O orchestration mechanism is still needed to efficiently move checkpointing data from burst buffers to the storage backend. In this paper, we propose a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape bursty writes into better sequential write traffic to storage servers. Meanwhile, TRIO coordinates the flushing order among concurrent burst buffers to alleviate contention on the storage servers. Our experimental results demonstrate that TRIO can efficiently utilize storage bandwidth and reduce the average job I/O time by 37% on average for data-intensive applications in typical checkpointing scenarios.
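
TRIO's two ingredients, reshaping and coordinated flushing, can be sketched briefly. The following Python fragment is a rough stand-in, not TRIO's code; the record layout and one-buffer-per-round flush order are assumptions:

```python
# Rough stand-in for TRIO's two ingredients (record layout and flush
# order are assumptions): reshape staged writes into sequential order
# per storage server, and let burst buffers flush one at a time so
# they do not contend for the same server simultaneously.

def reshape(staged_writes):
    """Order a buffer's writes by (server, offset) for sequential I/O."""
    return sorted(staged_writes, key=lambda w: (w["server"], w["offset"]))

def orchestrate(burst_buffers):
    # Coordinated flushing: one buffer drains per round (a stand-in
    # for TRIO's inter-buffer ordering).
    for i, buf in enumerate(burst_buffers):
        print(f"burst buffer {i} flushing:")
        for w in reshape(buf):
            print(f"  server {w['server']}, offset {w['offset']}")

bufs = [
    [{"server": 0, "offset": 8}, {"server": 0, "offset": 0}],
    [{"server": 1, "offset": 4}, {"server": 0, "offset": 16}],
]
orchestrate(bufs)
```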


IEEE Transactions on Computers | 2014

Coordinating Garbage Collection for Arrays of Solid-State Drives

Young-Jae Kim; Junghee Lee; H Sarp Oral; David A Dillow; Feiyi Wang; Galen M. Shipman

Although solid-state drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) for a number of workloads, they can exhibit substantial variance in request latency and throughput as a result of garbage collection (GC). When GC conflicts with an I/O stream, the stream can make no forward progress until the GC cycle completes. GC cycles are scheduled by logic internal to the SSD based on several factors, such as the pattern, frequency, and volume of write requests. When SSDs are used in a RAID with currently available technology, the lack of coordination of the SSD-local GC cycles amplifies this performance variance. We propose a global garbage collection (GGC) mechanism to improve response times and reduce performance variability for a RAID of SSDs. We include a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, and algorithms to coordinate the GGC cycles. We develop reactive and proactive GC coordination algorithms and evaluate their I/O performance and block erase counts for various workloads. Our simulations show that GC coordination by a reactive scheme improves average response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominated workloads, response time improved by 69 percent and performance variability was reduced by 71 percent. We show that a proactive GC coordination algorithm can further improve I/O response times by up to 9 percent and performance variability by up to 15 percent. We also observe that it can increase the lifetime of SSDs for some workloads (e.g., Financial) by reducing the number of block erases by up to 79 percent relative to a reactive algorithm for write-dominant enterprise workloads.
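
The reactive and proactive policies can be contrasted in a few lines. The sketch below is a toy model under invented parameters (watermark, reclaim amount, method names), not the simulator used in the paper:

```python
# A toy contrast of the two coordination policies described above.
# The watermark, reclaim amount, and method names are invented for
# illustration; real controllers would use vendor-specific commands.

class SSD:
    def __init__(self, free_blocks, watermark=10):
        self.free_blocks = free_blocks
        self.watermark = watermark

    def needs_gc(self):
        return self.free_blocks < self.watermark

    def run_gc(self):
        self.free_blocks += 5          # reclaim a few blocks (toy model)

def reactive_ggc(array):
    # Reactive: when any one SSD must collect, trigger GC on all of them
    # so the GC stalls overlap instead of hitting the stripe one by one.
    if any(ssd.needs_gc() for ssd in array):
        for ssd in array:
            ssd.run_gc()

def proactive_ggc(array, idle):
    # Proactive: additionally use idle periods to collect early, before
    # any device reaches its low watermark.
    if idle or any(ssd.needs_gc() for ssd in array):
        for ssd in array:
            ssd.run_gc()

raid = [SSD(free_blocks=8), SSD(free_blocks=30)]
reactive_ggc(raid)
print([ssd.free_blocks for ssd in raid])   # both collected together
proactive_ggc(raid, idle=True)
print([ssd.free_blocks for ssd in raid])   # idle-time collection
```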


International Performance Computing and Communications Conference | 2011

Enhancing I/O throughput via efficient routing and placement for large-scale parallel file systems

David A Dillow; Galen M. Shipman; H Sarp Oral; Zhe Zhang; Young-Jae Kim

As storage systems get larger to meet the demands of petascale systems, careful planning must be applied to avoid congestion points and extract the maximum performance. In addition, the large data sets generated by such systems make it desirable for all compute resources to have common access to this data without needing to copy it to each machine. This paper describes a method of placing I/O close to the storage nodes to minimize contention on Cray's SeaStar2+ network, and extends it to a routed Lustre configuration to gain the same benefits when running against a center-wide file system. Our experiments using half of the resources of Spider, the center-wide file system at the Oak Ridge Leadership Computing Facility, show that I/O write bandwidth can be improved by up to 45% (from 71.9 to 104 GB/s) for a direct-attached configuration and by 137% (47.6 GB/s to 115 GB/s) for a routed configuration. We demonstrated up to a 20.7% reduction in run-time for production scientific applications. With the full Spider system, we demonstrated over 240 GB/s of aggregate bandwidth using our techniques.
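
The placement idea reduces to assigning writers to nearby routers. The sketch below uses Manhattan distance on an invented 3-D mesh as a stand-in for SeaStar2+ hop counts; the coordinates and router positions are made up:

```python
# Assign each writer to its nearest I/O router; Manhattan distance on
# an invented 3-D mesh stands in for SeaStar2+ hop counts. Coordinates
# and router positions are made up for illustration.

def hops(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

routers = [(0, 0, 0), (4, 4, 4), (7, 0, 7)]

def nearest_router(node):
    return min(routers, key=lambda r: hops(node, r))

for writer in [(1, 0, 0), (5, 4, 3), (7, 7, 7)]:
    r = nearest_router(writer)
    print(f"writer {writer} -> router {r} ({hops(writer, r)} hops)")
# Fewer hops per write means fewer shared links and fewer congestion
# points, the effect the paper measures at scale on Spider.
```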


IEEE Conference on Mass Storage Systems and Technologies | 2011

Harmonia: A globally coordinated garbage collector for arrays of Solid-State Drives

Young-Jae Kim; H Sarp Oral; Galen M. Shipman; Junghee Lee; David A Dillow; Feiyi Wang

Solid-State Drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) on a number of workloads. The frequency of garbage collection (GC) activity is directly correlated with the pattern, frequency, and volume of write requests, and scheduling of GC is controlled by logic internal to the SSD. SSDs can exhibit significant performance degradation when GC conflicts with an ongoing I/O request stream. When using SSDs in a RAID array, the lack of coordination among the local GC processes amplifies these performance degradations. No RAID controller or SSD available today has the technology to overcome this limitation. This paper presents Harmonia, a Global Garbage Collection (GGC) mechanism to improve response times and reduce performance variability for a RAID array of SSDs. Our proposal includes a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, as well as algorithms to coordinate the global GC cycles. Our simulations show that this design improves response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominant workloads, response time improved by 69% while performance variability was reduced by 71%.


IEEE Conference on Mass Storage Systems and Technologies | 2014

SSD-Optimized Workload Placement with Adaptive Learning and Classification in HPC Environments

Lipeng Wan; Zheng Lu; Qing Cao; Feiyi Wang; H Sarp Oral; Bradley W. Settlemyer

In recent years, non-volatile memory devices such as SSDs have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Given the unique capability and capacity requirements of large-scale HPC (High Performance Computing) storage environments, a hybrid configuration (SSD and HDD) may represent one of the most available and balanced solutions with respect to cost and performance. Under this setting, effective data placement, as well as movement with controlled overhead, becomes a pressing challenge. In this paper, we propose an integrated object placement and movement framework with adaptive learning algorithms to address these issues. Specifically, we present a method that shuffles data objects across storage tiers to optimize data access performance. The method integrates an adaptive learning algorithm in which real-time classification is employed to predict the popularity of data object accesses, so that objects can be placed on, or migrated between, SSD and HDD tiers in the most efficient manner. We discuss preliminary results based on a simulator we developed, showing that the proposed methods can dynamically adapt storage placement as workloads evolve to achieve the best system-level performance, such as throughput.
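
The placement policy can be approximated with a simple frequency rule as a stand-in for the paper's adaptive classifier. Everything below (the feature, threshold, and tier names) is an assumption for illustration:

```python
from collections import defaultdict

# Simple frequency rule as a stand-in for the paper's adaptive
# classifier; the feature (recent access count), threshold, and tier
# names are assumptions for illustration.

recent_accesses = defaultdict(int)     # object id -> accesses this window

def record_access(obj):
    recent_accesses[obj] += 1

def predicted_hot(obj, threshold=3):
    return recent_accesses[obj] >= threshold

def place(objects):
    # Keep predicted-hot objects on the SSD tier, the rest on HDD.
    return {obj: ("SSD" if predicted_hot(obj) else "HDD")
            for obj in objects}

for obj in ["a", "a", "a", "b"]:
    record_access(obj)
print(place(["a", "b"]))               # {'a': 'SSD', 'b': 'HDD'}
```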


Archive | 2014

A Report on Simulation-Driven Reliability and Failure Analysis of Large-Scale Storage Systems

Lipeng Wan; Feiyi Wang; H Sarp Oral; Sudharshan S. Vazhkudai; Qing Cao

High-performance computing (HPC) storage systems provide data availability and reliability using various hardware and software fault-tolerance techniques. Usually, reliability and availability are calculated at the subsystem or component level using limited metrics such as mean time to failure (MTTF) or mean time to data loss (MTTDL). This often means settling on simple and disconnected failure models (such as an exponential failure rate) to achieve tractable, closed-form solutions. However, such models have been shown to be insufficient for assessing end-to-end storage system reliability and availability. We propose a generic simulation framework aimed at analyzing the reliability and availability of storage systems at scale and investigating what-if scenarios. The framework is designed for an end-to-end storage system, accommodating the various components and subsystems, their interconnections, and failure patterns and propagation, and it performs dependency analysis to capture a wide range of failure cases. We evaluate the framework against a large-scale production storage system and analyze its failure projections toward and beyond the end of its lifecycle. We also examine the potential operational impact by studying how different types of components affect overall system reliability and availability, and present preliminary results.
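
The simulation approach can be sketched as a Monte Carlo estimate with a non-exponential failure model. The sketch below uses invented parameters and a toy two-failures-within-a-repair-window loss rule; it is not the paper's framework:

```python
import random

# Monte Carlo sketch with a non-exponential (Weibull) failure model
# and a toy loss rule: a RAID group loses data if two disks fail
# within one repair window during the horizon. All parameters are
# invented; this is not the paper's framework.

def disk_failure_time(scale_hours, shape):
    return random.weibullvariate(scale_hours, shape)

def group_loses_data(n_disks, horizon_hours, repair_hours):
    failures = sorted(disk_failure_time(8760 * 5, 1.2)
                      for _ in range(n_disks))
    failures = [t for t in failures if t < horizon_hours]
    return any(later - earlier < repair_hours
               for earlier, later in zip(failures, failures[1:]))

trials = 10_000
losses = sum(group_loses_data(10, horizon_hours=8760, repair_hours=24)
             for _ in range(trials))
print(f"estimated one-year data-loss probability: {losses / trials:.4f}")
```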

Collaboration


Dive into H Sarp Oral's collaborations.

Top Co-Authors

Feiyi Wang, Oak Ridge National Laboratory
Galen M. Shipman, Oak Ridge National Laboratory
David A Dillow, Oak Ridge National Laboratory
Jason J Hill, Oak Ridge National Laboratory
Junghee Lee, University of Texas at San Antonio
Weikuan Yu, Florida State University
Bradley W. Settlemyer, Oak Ridge National Laboratory
James A Simmons, Oak Ridge National Laboratory
Lipeng Wan, University of Tennessee