Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sarp Oral is active.

Publication


Featured research published by Sarp Oral.


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems

Sarp Oral; James A Simmons; Jason J Hill; Dustin B Leverman; Feiyi Wang; Matt Ezell; Ross Miller; Douglas Fuller; Raghul Gunasekaran; Young-Jae Kim; Saurabh Gupta; Devesh Tiwari; Sudharshan S. Vazhkudai; James H. Rogers; David A Dillow; Galen M. Shipman; Arthur S. Bland

The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.


Petascale Data Storage Workshop | 2013

Asynchronous object storage with QoS for scientific and commercial big data

Michael J. Brim; David A Dillow; Sarp Oral; Bradley W. Settlemyer; Feiyi Wang

This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions used in the application programming interface (API). The architecture of the Scalable Object Store (SOS), a prototype object storage system that supports the API's facilities, is presented. The SOS serves as a vehicle for future research into scalable and resilient big data object storage. We briefly review our research into building efficient storage servers capable of providing quality of service (QoS) contracts relevant for big data use cases.
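
To make the API idea above concrete, the following is a minimal sketch of what an asynchronous object-store client with per-request QoS hints might look like in Python. The names (ObjectStoreClient, QoS, put, get) and the in-memory backing store are hypothetical illustrations, not the actual SOS API described in the paper.

# Hypothetical sketch of an asynchronous object-store client with per-request
# QoS hints, loosely modeled on the ideas described above (not the actual SOS API).
import asyncio
from dataclasses import dataclass

@dataclass
class QoS:
    # Illustrative QoS contract: a target bandwidth and an upper latency bound.
    min_bandwidth_mbps: float
    max_latency_ms: float

class ObjectStoreClient:
    def __init__(self):
        self._store = {}  # in-memory stand-in for remote object servers

    async def put(self, key: bytes, value: bytes, qos: QoS) -> None:
        # A real implementation would schedule the write against servers that
        # can honor the QoS contract; here we just simulate the I/O delay.
        await asyncio.sleep(min(qos.max_latency_ms, 1.0) / 1000.0)
        self._store[key] = value

    async def get(self, key: bytes, qos: QoS) -> bytes:
        await asyncio.sleep(min(qos.max_latency_ms, 1.0) / 1000.0)
        return self._store[key]

async def main():
    client = ObjectStoreClient()
    qos = QoS(min_bandwidth_mbps=500.0, max_latency_ms=10.0)
    # Issue many puts concurrently; completion order is not guaranteed.
    await asyncio.gather(*[
        client.put(f"obj-{i}".encode(), b"x" * 4096, qos) for i in range(16)
    ])
    print(len(await client.get(b"obj-0", qos)))

asyncio.run(main())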


Petascale Data Storage Workshop | 2015

Comparative I/O workload characterization of two leadership class storage clusters

Raghul Gunasekaran; Sarp Oral; Jason J Hill; Ross Miller; Feiyi Wang; Dustin B Leverman

The Oak Ridge Leadership Computing Facility (OLCF) is a leader in large-scale parallel file system development, design, deployment, and continuous operation. Over the last decade, the OLCF has designed and deployed two large center-wide parallel file systems. The first instantiation, Spider 1, served the Jaguar supercomputer and its predecessor; Spider 2 now serves the Titan supercomputer, among many other OLCF computational resources. The OLCF has been rigorously collecting file and storage system statistics from these Spider systems since their transition to production state. In this paper we present the collected I/O workload statistics from the Spider 2 system and compare them to the Spider 1 data. Our analysis shows that the Spider 2 workload is more write-heavy than that of Spider 1 (75% vs. 60% writes, respectively). The data also show that OLCF storage policies, such as periodic purges, are effectively managing the capacity of Spider 2. Furthermore, due to improvements in the tdm_multipath and ib_srp software, we are utilizing the Spider 2 bandwidth and latency resources more effectively. The Spider 2 bandwidth usage statistics show that the system is operating within its design specifications. However, it is also evident that our scientific applications could be served more effectively by a burst buffer storage layer. All of the data were collected by monitoring tools developed for the Spider ecosystem. We believe the observed data set and insights will help us better design the next-generation Spider file and storage system, and will also be helpful to the larger community for building more effective large-scale file and storage systems.
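
As a small illustration of the kind of aggregation behind the write-heavy figures quoted above, the sketch below computes the write share of total traffic from per-interval read/write byte counters. The field names and sample values are made up; the actual Spider monitoring tools are not reproduced here.

# Minimal sketch: compute the write share of total I/O traffic from per-interval
# (read_bytes, write_bytes) samples, as one might do with server-side storage
# statistics. The sample data below is illustrative, not measured.
samples = [
    {"read_bytes": 2_000_000_000, "write_bytes": 6_000_000_000},
    {"read_bytes": 1_500_000_000, "write_bytes": 4_500_000_000},
    {"read_bytes": 3_000_000_000, "write_bytes": 9_000_000_000},
]

total_read = sum(s["read_bytes"] for s in samples)
total_write = sum(s["write_bytes"] for s in samples)
write_fraction = total_write / (total_read + total_write)
print(f"write-heavy share: {write_fraction:.0%}")  # 75% for this toy data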


Petascale Data Storage Workshop | 2013

Performance and scalability evaluation of the Ceph parallel file system

Feiyi Wang; Mark Nelson; Sarp Oral; Scott Atchley; Sage A. Weil; Bradley W. Settlemyer; Blake A Caldwell; Jason J Hill

Ceph is an emerging open-source parallel distributed file and storage system. By design, Ceph leverages unreliable commodity storage and network hardware, and provides reliability and fault tolerance via controlled object placement and data replication. This paper presents our file and block I/O performance and scalability evaluation of Ceph for scientific high-performance computing (HPC) environments. Our work makes two unique contributions. First, our evaluation is performed under a realistic setup for a large-scale capability HPC environment using a commercial high-end storage system. Second, our path of investigation, tuning efforts, and findings made direct contributions to Ceph's development and improved its code quality, scalability, and performance. These changes should benefit both Ceph and the HPC community at large.
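
The benchmarks used in the paper are not reproduced here; as a rough illustration of file I/O bandwidth measurement, the sketch below times a sequential write against a mounted file system. The path, block size, and block count are arbitrary choices for the example, not the paper's methodology.

# Toy sketch of a sequential-write bandwidth probe against a mounted file
# system (e.g. a Ceph mount point); the path and sizes are illustrative only
# and this is not the benchmark suite used in the paper.
import os, time

def write_bandwidth(path: str, block_size: int = 4 << 20, blocks: int = 64) -> float:
    buf = os.urandom(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # include flush time in the measurement
    elapsed = time.perf_counter() - start
    return (block_size * blocks) / elapsed / 1e6  # MB/s

print(f"{write_bandwidth('/tmp/io_probe.bin'):.1f} MB/s")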


High Performance Distributed Computing | 2017

Predicting Output Performance of a Petascale Supercomputer

Bing Xie; Yezhou Huang; Jeffrey S. Chase; Jong Youl Choi; Scott Klasky; Jay F. Lofstead; Sarp Oral

In this paper, we develop a predictive model useful for output performance prediction of supercomputer file systems under production load. Our target environment is Titan---the 3rd fastest supercomputer in the world---and its Lustre-based multi-stage write path. We observe from Titan that although output performance is highly variable at small time scales, the mean performance is stable and consistent over typical application run times. Moreover, we find that output performance is non-linearly related to its correlated parameters due to interference and saturation on individual stages of the path. These observations enable us to build a predictive model of expected write times for output patterns and I/O configurations, using feature transformations to capture non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a model space with 135,000 candidate models. By searching for the minimal mean square error in this space, we identify a good model and show that it is effective.
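
The sketch below illustrates the general model-search idea: enumerate a small space of feature transformations, fit a linear model for each combination, and keep the one with the lowest mean squared error. The features (write_size, num_writers), the transformation set, and the synthetic data are invented for illustration and are not the paper's 135,000-model space.

# Generic sketch of the model-search idea on synthetic data: try candidate
# feature transformations, fit a least-squares model for each combination, and
# keep the one with the lowest mean squared error.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 500
write_size = rng.uniform(1, 1024, n)          # MB per write burst (made up)
num_writers = rng.integers(1, 512, n)         # concurrent writers (made up)
write_time = 0.02 * write_size**0.8 + 0.001 * num_writers + rng.normal(0, 0.05, n)

transforms = {"id": lambda x: x, "log": np.log1p, "sqrt": np.sqrt}

best = None
for t_size, t_writers in itertools.product(transforms, repeat=2):
    X = np.column_stack([transforms[t_size](write_size),
                         transforms[t_writers](num_writers),
                         np.ones(n)])
    coef, *_ = np.linalg.lstsq(X, write_time, rcond=None)
    mse = np.mean((X @ coef - write_time) ** 2)
    if best is None or mse < best[0]:
        best = (mse, t_size, t_writers)

print(f"best transforms: {best[1:]} with MSE {best[0]:.4f}")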


Journal of Parallel and Distributed Computing | 2017

Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems

Lipeng Wan; Qing Cao; Feiyi Wang; Sarp Oral

Non-volatile devices, such as SSDs, will be an integral part of the deepening storage hierarchy on large-scale HPC systems. These devices can be on the compute nodes as part of a distributed burst buffer service, or they can be external. Wherever they are located in the hierarchy, one critical design issue is SSD endurance under write-heavy workloads, such as checkpoint I/O for scientific applications. For these environments, it is widely assumed that checkpoint operations can occur once every 60 minutes and that at each checkpoint step as much as half of the system memory can be written out. Unfortunately, for large-scale HPC applications, the burst buffer SSDs can be worn out much more quickly given the extensive amount of data written at every checkpoint step. One possible solution is to control the amount of data written by reducing the checkpoint frequency. However, a direct effect of reduced checkpoint frequency is an increased vulnerability window to system failures and therefore potentially wasted computation time, especially for large-scale compute jobs. In this paper, we propose a new checkpoint placement optimization model which collaboratively utilizes both the burst buffer and the parallel file system to store checkpoints, with the design goals of maximizing computation efficiency while guaranteeing the SSD endurance requirements. Moreover, we present an adaptive algorithm which can dynamically adjust the checkpoint placement based on the system's dynamic runtime characteristics and continuously optimize burst buffer utilization. The evaluation results show that by using our adaptive checkpoint placement algorithm, we can guarantee the burst buffer endurance with at most 5% performance degradation per application and less than 3% for the entire system. Highlights: a thorough analysis of both failure patterns and runtime characteristics of HPC systems; a new checkpoint placement model for optimizing the usage of large-scale hierarchical storage systems; and a novel adaptive algorithm that can dynamically optimize checkpoint placement.
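
A heavily simplified sketch of the placement idea follows: route each checkpoint to the burst buffer only while a write budget derived from an SSD endurance target allows it, and fall back to the parallel file system otherwise. The function name, the DWPD-style endurance parameter, and all numbers are illustrative; the paper's optimization model and adaptive algorithm are more sophisticated than this.

# Simplified sketch of burst-buffer-aware checkpoint placement: each checkpoint
# goes to the burst buffer only if doing so stays within a write budget derived
# from an SSD endurance target; otherwise it is redirected to the parallel file
# system. Numbers are illustrative, not from the paper.
def plan_placements(checkpoint_sizes_tb, bb_capacity_tb=1.0,
                    endurance_dwpd=3.0, days=1.0):
    budget_tb = bb_capacity_tb * endurance_dwpd * days  # allowed burst buffer writes
    written_tb = 0.0
    plan = []
    for size in checkpoint_sizes_tb:
        if written_tb + size <= budget_tb:
            written_tb += size
            plan.append("burst_buffer")
        else:
            plan.append("parallel_fs")
    return plan

print(plan_placements([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))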


Concurrency and Computation: Practice and Experience | 2018

Are We Witnessing the Spectre of an HPC Meltdown?

Verónica G. Vergara Larrea; Michael J. Brim; Wayne Joubert; Swen Boehm; Matthew B. Baker; Oscar R. Hernandez; Sarp Oral; James A Simmons; Don Maxwell

We measure and analyze the performance observed when running applications and benchmarks before and after the Meltdown and Spectre fixes have been applied to the Cray supercomputers and supporting systems at the Oak Ridge Leadership Computing Facility (OLCF). Of particular interest is the effect of these fixes on applications selected from the OLCF portfolio when running at scale. This comprehensive study presents results from experiments run on Titan, Eos, Cumulus, and Percival supercomputers at the OLCF. The results from this study are useful for HPC users running on Cray supercomputers and serve to better understand the impact that these two vulnerabilities have on diverse HPC workloads at scale.
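
As a trivial illustration of how such before/after measurements can be summarized, the sketch below reports the relative runtime change per application. The application names and runtimes are invented, not results from the study.

# Tiny sketch: summarize the relative runtime change of each application before
# and after the Meltdown/Spectre mitigations. Runtimes below are made up.
runtimes = {            # seconds: (before patches, after patches)
    "app_a": (1200.0, 1245.0),
    "app_b": (860.0, 902.0),
    "benchmark_c": (300.0, 331.0),
}

for name, (before, after) in runtimes.items():
    slowdown = (after - before) / before
    print(f"{name}: {slowdown:+.1%} runtime change after mitigations")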


IEEE International Conference on High Performance Computing, Data and Analytics | 2017

GUIDE: a scalable information directory service to collect, federate, and analyze logs for operational insights into a leadership HPC facility

Sudharshan S. Vazhkudai; Ross Miller; Devesh Tiwari; Christopher Zimmer; Feiyi Wang; Sarp Oral; Raghul Gunasekaran; Deryl Steinert

In this paper, we describe the GUIDE framework used to collect, federate, and analyze log data from the Oak Ridge Leadership Computing Facility (OLCF), and how we use that data to derive insights into facility operations. We collect system logs and extract monitoring data at every level of the various OLCF subsystems, and have developed a suite of pre-processing tools to make the raw data consumable. The cleansed logs are then ingested and federated into a central, scalable data warehouse, Splunk, that offers storage, indexing, querying, and visualization capabilities. We have further developed and deployed a set of tools to analyze these multiple disparate log streams in concert and derive operational insights. We describe our experience from developing and deploying the GUIDE infrastructure, and deriving valuable insights on the various subsystems, based on two years of operations in the production OLCF environment.
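
The GUIDE pre-processing tools and Splunk configuration are not shown in the abstract; the sketch below only illustrates the general pre-processing step of turning a raw syslog-style line into a structured record suitable for indexing. The log line, regular expression, and field names are assumptions made for the example.

# Minimal sketch of a log pre-processing step: parse a raw syslog-style line
# into a structured JSON record that a log indexer (e.g. Splunk) can ingest.
# The log format and field names are illustrative.
import json
import re

LOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+(?P<service>[\w\-]+):\s+(?P<message>.*)$"
)

def normalize(line):
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None

raw = "Mar  3 14:02:11 nid00042 lustre-client: evicted by server during recovery"
record = normalize(raw)
print(json.dumps(record, indent=2))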


Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems | 2017

Diving into petascale production file systems through large scale profiling and analysis

Feiyi Wang; Hyogi Sim; Cameron Harr; Sarp Oral

As leadership computing facilities grow their storage capacity into the multi-petabyte range, the number of files and directories leaps into the scale of billions. A complete profiling of such a parallel file system in a production environment presents a unique challenge. On one hand, the time, resources, and negative performance impact on production users can make regular profiling difficult. On the other hand, the result of such profiling can yield much needed understanding of the file system's general characteristics, as well as provide insight into how users write and access their data on a grand scale. This paper presents a lightweight and scalable profiling solution that can efficiently walk, analyze, and profile multi-petabyte parallel file systems. This tool has been deployed and is in regular use on very large-scale production parallel file systems at both Oak Ridge National Laboratory's Oak Ridge Leadership Computing Facility (OLCF) and Lawrence Livermore National Laboratory's Livermore Computing (LC) facility. We present the results of our initial analysis on the data collected from these two large-scale production systems, organized into three use cases: (1) file system snapshot and composition, (2) striping pattern analysis for Lustre, and (3) simulated storage capacity utilization in preparation for future file systems. Our analysis shows that on the OLCF file system, over 96% of user files exhibit the default stripe width, potentially limiting performance on large files by underutilizing storage servers and disks. Our simulated block analysis quantitatively shows the space overhead of a forklift system migration. It also reveals that, due to the difference in system compositions (OLCF vs. LC), we can achieve better performance and space trade-offs by employing different native file system block sizes.
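
The deployed profiler is not reproduced here; the following single-process sketch only shows the walk-and-aggregate shape of such a tool, building a power-of-two histogram of file sizes under a directory tree. A production profiler would walk in parallel at far larger scale and also record per-file striping metadata.

# Simplified, single-process stand-in for a file system profiler: walk a
# directory tree and histogram file sizes by power-of-two bucket. This only
# illustrates the walk-and-aggregate structure, not the deployed parallel tool.
import os
from collections import Counter

def size_bucket(size: int) -> str:
    if size == 0:
        return "0 B"
    return f"2^{size.bit_length() - 1} B"

def profile(root: str) -> Counter:
    histogram = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                histogram[size_bucket(os.path.getsize(os.path.join(dirpath, name)))] += 1
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return histogram

for bucket, count in sorted(profile("/tmp").items()):
    print(f"{bucket:>8}: {count}")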


Symposium on Computer Architecture and High Performance Computing | 2016

Using Balanced Data Placement to Address I/O Contention in Production Environments

Sarah Neuwirth; Feiyi Wang; Sarp Oral; Sudharshan S. Vazhkudai; James H. Rogers; Ulrich Bruening

Designed for capacity and capability, HPC I/O systems are inherently complex and shared among multiple concurrent jobs competing for resources. The lack of centralized coordination and control often renders the end-to-end I/O paths vulnerable to load imbalance and contention. With the emergence of data-intensive HPC applications, storage systems face even greater contention for performance and scalability. This paper proposes to unify two key approaches to tackle the imbalanced use of I/O resources and to achieve an end-to-end I/O performance improvement in the most transparent way. First, it utilizes a topology-aware Balanced Placement I/O method (BPIO) for mitigating resource contention. Second, it takes advantage of the platform-neutral ADIOS middleware, which provides a flexible I/O mechanism for scientific applications. By integrating BPIO with ADIOS, referred to as Aequilibro, we obtain an end-to-end, per-job I/O performance improvement for ADIOS-enabled HPC applications without requiring any code changes. Aequilibro can be applied to almost any HPC platform and is most suitable for systems that lack a centralized file system resource manager. We demonstrate the effectiveness of our integration on the Titan system at Oak Ridge National Laboratory. Our experiments with a synthetic benchmark and a real-world HPC workload show that, even in a noisy production environment, Aequilibro can significantly improve large-scale application performance.
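
As a simplified illustration of balanced placement, the sketch below assigns each new write stream to the currently least-loaded storage target. The real BPIO method is topology-aware; only the load-balancing core is shown here, and the target names and stream sizes are made up.

# Simplified sketch of balanced placement: assign each new write stream to the
# least-loaded storage target, so load spreads evenly instead of piling onto a
# few targets. Target names and stream sizes are illustrative.
import heapq

def balanced_placement(stream_sizes, targets):
    # Min-heap of (current load, target name).
    heap = [(0.0, t) for t in targets]
    heapq.heapify(heap)
    assignment = {}
    for stream_id, size in enumerate(stream_sizes):
        load, target = heapq.heappop(heap)  # least-loaded target
        assignment[stream_id] = target
        heapq.heappush(heap, (load + size, target))
    return assignment

targets = [f"ost{i:03d}" for i in range(4)]
print(balanced_placement([4.0, 1.0, 3.0, 2.0, 2.0, 1.0], targets))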

Collaboration


Dive into Sarp Oral's collaborations.

Top Co-Authors

Feiyi Wang, Oak Ridge National Laboratory
David A Dillow, Oak Ridge National Laboratory
Galen M. Shipman, Oak Ridge National Laboratory
Jason J Hill, Oak Ridge National Laboratory
Michael J. Brim, Oak Ridge National Laboratory
Ross Miller, Oak Ridge National Laboratory
Dustin B Leverman, Oak Ridge National Laboratory
James H. Rogers, Oak Ridge National Laboratory
Raghul Gunasekaran, Oak Ridge National Laboratory