Publication


Featured research published by Feiyi Wang.


International Symposium on Performance Analysis of Systems and Software | 2011

A semi-preemptive garbage collector for solid state drives

Junghee Lee; Young-Jae Kim; Galen M. Shipman; H Sarp Oral; Feiyi Wang; Jongman Kim

NAND flash memory is a preferred storage medium for platforms ranging from embedded systems to enterprise-scale systems. Flash devices have no mechanical moving parts, provide low-latency access, and require less power than rotating media. Unlike hard disks, flash devices use out-of-place updates and require a garbage collection (GC) process to reclaim invalid pages and create free blocks. This GC process is a major cause of performance degradation when running concurrently with other I/O operations, as internal bandwidth is consumed to reclaim these invalid pages. The invocation of the GC process is generally governed by a low watermark on free blocks and other internal device metrics that different workloads reach at different intervals, resulting in I/O performance that is highly dependent on workload characteristics. In this paper, we examine the GC process and propose a semi-preemptive GC scheme that can preempt ongoing GC processing to service pending I/O requests in the queue. We further enhance flash performance by pipelining internal GC operations and merging them with pending I/O requests whenever possible. Our experimental evaluation of this semi-preemptive GC scheme with realistic workloads demonstrates both improved performance and reduced performance variability. Write-dominant workloads show up to a 66.56% improvement in average response time and an 83.30% reduction in response-time variance compared to the non-preemptive GC scheme.
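
To make the idea concrete, here is a minimal, hypothetical sketch of a semi-preemptive GC loop in Python (names such as Device, copy_valid_page, and erase_block are invented; this is not the paper's FTL implementation): the collector checks the host I/O queue at page-copy boundaries and services pending requests before continuing, instead of running the whole GC cycle to completion.

```python
# Illustrative sketch of a semi-preemptive garbage-collection loop.
# All class and method names are hypothetical placeholders.
from collections import deque

class Device:
    def __init__(self):
        self.io_queue = deque()   # pending host I/O requests
        self.free_blocks = 10

    def service_io(self, request):
        print(f"servicing I/O: {request}")

    def copy_valid_page(self, block, page):
        print(f"copying valid page {page} of block {block}")

    def erase_block(self, block):
        print(f"erasing block {block}")
        self.free_blocks += 1

    def semi_preemptive_gc(self, victim_block, valid_pages):
        """Reclaim one victim block, yielding to host I/O between page copies."""
        for page in valid_pages:
            # Preemption point: drain any I/O that arrived during GC.
            while self.io_queue:
                self.service_io(self.io_queue.popleft())
            self.copy_valid_page(victim_block, page)
        self.erase_block(victim_block)

dev = Device()
dev.io_queue.extend(["read A", "write B"])
dev.semi_preemptive_gc(victim_block=7, valid_pages=[0, 1, 2])
```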


Future Generation Computer Systems | 2014

The Earth System Grid Federation: An open infrastructure for access to distributed geospatial data

Luca Cinquini; Daniel J. Crichton; Chris A. Mattmann; John Harney; Galen M. Shipman; Feiyi Wang; Rachana Ananthakrishnan; Neill Miller; Sebastian Denvil; Mark Morgan; Zed Pobre; Gavin M. Bell; Charles Doutriaux; Robert S. Drach; Dean N. Williams; Philip Kershaw; Stephen Pascoe; Estanislao Gonzalez; Sandro Fiore; Roland Schweitzer

The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration that aims to develop the software infrastructure needed to facilitate and empower the study of climate change on a global scale. The ESGF's architecture employs a system of geographically distributed peer nodes, which are independently administered yet united by the adoption of common federation protocols and application programming interfaces (APIs). The cornerstones of its interoperability are the peer-to-peer messaging continuously exchanged among all nodes in the federation; a shared architecture and API for search and discovery; and a security infrastructure based on industry standards (OpenID, SSL, GSI, and SAML). The ESGF software stack integrates custom components (for data publishing, search, user interfaces, security, and messaging), developed collaboratively by the team, with popular application engines (Tomcat, Solr) available from the open source community. The full ESGF infrastructure has now been adopted by multiple Earth science projects and provides access to petabytes of geophysical data, including the entire Coupled Model Intercomparison Project Phase 5 (CMIP5) output used by the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5), as well as a suite of satellite observations (obs4MIPs) and reanalysis data sets (ana4MIPs). This paper presents ESGF as a successful example of integrating disparate open source technologies into a cohesive, widely functional system, and describes our experience in building and operating a distributed and federated infrastructure to serve the needs of the global climate science community.
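
As an illustration of the shared search and discovery API, the sketch below issues a federated metadata query from Python; the node URL, facet names, and response layout shown here are examples only and should be checked against the current ESGF search API documentation.

```python
# Illustrative ESGF search query; the endpoint and facets are examples,
# not authoritative documentation.
import json
import urllib.parse
import urllib.request

ESGF_SEARCH = "https://esgf-node.llnl.gov/esg-search/search"  # example node

params = {
    "project": "CMIP5",                 # facet: federation-wide project name
    "variable": "tas",                  # near-surface air temperature
    "distrib": "true",                  # fan the query out to peer nodes
    "format": "application/solr+json",  # machine-readable (Solr JSON) results
    "limit": 5,
}

url = ESGF_SEARCH + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url, timeout=30) as resp:
    results = json.load(resp)

# Assumes the Solr-style layout {"response": {"docs": [...]}}.
for doc in results["response"]["docs"]:
    print(doc.get("id"))
```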


IEEE International Conference on High Performance Computing, Data and Analytics | 2014

Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems

Sarp Oral; James A Simmons; Jason J Hill; Dustin B Leverman; Feiyi Wang; Matt Ezell; Ross Miller; Douglas Fuller; Raghul Gunasekaran; Young-Jae Kim; Saurabh Gupta; Devesh Tiwari; Sudharshan S. Vazhkudai; James H. Rogers; David A Dillow; Galen M. Shipman; Arthur S. Bland

The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe these lessons will be useful to the wider HPC community.


IEEE Transactions on Computers | 2014

Coordinating Garbage Collection for Arrays of Solid-State Drives

Young-Jae Kim; Junghee Lee; H Sarp Oral; David A Dillow; Feiyi Wang; Galen M. Shipman

Although solid-state drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) for a number of workloads, they can exhibit substantial variance in request latency and throughput as a result of garbage collection (GC). When GC conflicts with an I/O stream, the stream can make no forward progress until the GC cycle completes. GC cycles are scheduled by logic internal to the SSD based on several factors, such as the pattern, frequency, and volume of write requests. When SSDs are used in a RAID with currently available technology, the lack of coordination of the SSD-local GC cycles amplifies this performance variance. We propose a global garbage collection (GGC) mechanism to improve response times and reduce performance variability for a RAID of SSDs. We include a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, along with algorithms to coordinate the GGC cycles. We develop reactive and proactive GC coordination algorithms and evaluate their I/O performance and block erase counts for various workloads. Our simulations show that GC coordination by a reactive scheme improves average response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominated workloads, response time was improved by 69 percent and performance variability was reduced by 71 percent. We show that a proactive GC coordination algorithm can further improve I/O response times by up to 9 percent and performance variability by up to 15 percent. We also observe that, for write-dominant enterprise workloads such as the Financial trace, proactive coordination can increase SSD lifetimes by reducing block erase counts by up to 79 percent relative to the reactive algorithm.
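
The toy sketch below illustrates the reactive coordination idea under the assumption of GGC-capable devices that expose "needs GC" and "start GC" hooks to the array controller; class names and thresholds are invented for illustration and do not come from the paper.

```python
# Reactive global garbage collection (GGC) coordination, sketched with
# hypothetical device hooks.
class GGCCapableSSD:
    def __init__(self, name, free_blocks=100):
        self.name = name
        self.free_blocks = free_blocks

    def needs_gc(self, low_watermark=20):
        return self.free_blocks < low_watermark

    def start_gc(self):
        print(f"{self.name}: running local GC")
        self.free_blocks += 10   # pretend some blocks were reclaimed

class SSDAwareRAIDController:
    def __init__(self, ssds):
        self.ssds = ssds

    def reactive_ggc(self):
        # If any device must collect, have every device collect in the same
        # window, so the array sees one short stall instead of many
        # staggered ones.
        if any(ssd.needs_gc() for ssd in self.ssds):
            for ssd in self.ssds:
                ssd.start_gc()

array = SSDAwareRAIDController(
    [GGCCapableSSD("ssd0", free_blocks=15)] +
    [GGCCapableSSD(f"ssd{i}") for i in range(1, 4)])
array.reactive_ggc()   # ssd0 is below the watermark, so all four collect
```

A proactive variant could, for example, trigger coordinated GC during idle periods before any device reaches its watermark.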


IEEE Conference on Mass Storage Systems and Technologies | 2011

Harmonia: A globally coordinated garbage collector for arrays of Solid-State Drives

Young-Jae Kim; H Sarp Oral; Galen M. Shipman; Junghee Lee; David A Dillow; Feiyi Wang

Solid-State Drives (SSDs) offer significant performance improvements over hard disk drives (HDDs) for a number of workloads. The frequency of garbage collection (GC) activity is directly correlated with the pattern, frequency, and volume of write requests, and the scheduling of GC is controlled by logic internal to the SSD. SSDs can exhibit significant performance degradation when garbage collection conflicts with an ongoing I/O request stream. When using SSDs in a RAID array, the lack of coordination among the local GC processes amplifies these performance degradations. No RAID controller or SSD available today has the technology to overcome this limitation. This paper presents Harmonia, a Global Garbage Collection (GGC) mechanism to improve response times and reduce performance variability for a RAID array of SSDs. Our proposal includes a high-level design of an SSD-aware RAID controller and GGC-capable SSD devices, as well as algorithms to coordinate the global GC cycles. Our simulations show that this design improves response time and reduces performance variability for a wide variety of enterprise workloads. For bursty, write-dominant workloads, response time was improved by 69% while performance variability was reduced by 71%.


Petascale Data Storage Workshop | 2013

Asynchronous object storage with QoS for scientific and commercial big data

Michael J. Brim; David A Dillow; Sarp Oral; Bradley W. Settlemyer; Feiyi Wang

This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains are used to motivate the key abstractions in the application programming interface (API). We present the architecture of the Scalable Object Store (SOS), a prototype object storage system that supports the API's facilities. SOS serves as a vehicle for future research into scalable and resilient big data object storage. We also briefly review our research into providing efficient storage servers capable of honoring quality of service (QoS) contracts relevant to big data use cases.
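
The abstract does not spell out the API itself, so the sketch below is a purely hypothetical illustration of what an asynchronous object interface with a QoS contract might look like; every name in it is invented and none of it is the SOS prototype's actual interface.

```python
# Hypothetical asynchronous object-store client with a QoS hint.
import asyncio
from dataclasses import dataclass

@dataclass
class QoSContract:
    min_bandwidth_mbps: int   # bandwidth the server should try to reserve
    max_latency_ms: int       # acceptable per-request latency bound

class AsyncObjectStore:
    def __init__(self):
        self._objects = {}    # stand-in for remote storage servers

    async def put(self, key: str, data: bytes, qos: QoSContract) -> None:
        await asyncio.sleep(0)   # placeholder for a non-blocking RPC
        self._objects[key] = data

    async def get(self, key: str, qos: QoSContract) -> bytes:
        await asyncio.sleep(0)
        return self._objects[key]

async def main():
    store = AsyncObjectStore()
    qos = QoSContract(min_bandwidth_mbps=100, max_latency_ms=50)
    # Issue several writes concurrently, then read one object back.
    await asyncio.gather(*(store.put(f"obj{i}", b"payload", qos)
                           for i in range(4)))
    print(await store.get("obj0", qos))

asyncio.run(main())
```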


Petascale Data Storage Workshop | 2015

Comparative I/O workload characterization of two leadership class storage clusters

Raghul Gunasekaran; Sarp Oral; Jason J Hill; Ross Miller; Feiyi Wang; Dustin B Leverman

The Oak Ridge Leadership Computing Facility (OLCF) is a leader in large-scale parallel file system development, design, deployment, and continuous operation. Over the last decade, the OLCF has designed and deployed two large center-wide parallel file systems. The first instantiation, Spider 1, served the Jaguar supercomputer; its successor, Spider 2, now serves the Titan supercomputer, among many other OLCF computational resources. The OLCF has been rigorously collecting file and storage system statistics from these Spider systems since their transition to production. In this paper we present the collected I/O workload statistics from the Spider 2 system and compare them to the Spider 1 data. Our analysis shows that the Spider 2 workload is more write-heavy than that of Spider 1 (75% vs. 60% writes, respectively). The data also show that OLCF storage policies, such as periodic purges, are effectively managing the capacity of Spider 2. Furthermore, due to improvements in the tdm_multipath and ib_srp software, we are utilizing the Spider 2 bandwidth and latency resources more effectively. The Spider 2 bandwidth usage statistics show that the system is operating within its design specifications. However, it is also evident that our scientific applications could be served more effectively by a burst buffer storage layer. All of the data were collected by monitoring tools developed for the Spider ecosystem. We believe the observed data set and insights will help us better design the next-generation Spider file and storage system, and will also help the larger community build more effective large-scale file and storage systems.


Petascale Data Storage Workshop | 2013

Performance and scalability evaluation of the Ceph parallel file system

Feiyi Wang; Mark Nelson; Sarp Oral; Scott Atchley; Sage A. Weil; Bradley W. Settlemyer; Blake A Caldwell; Jason J Hill

Ceph is an emerging open-source parallel distributed file and storage system. By design, Ceph leverages unreliable commodity storage and network hardware, and provides reliability and fault tolerance via controlled object placement and data replication. This paper presents our file and block I/O performance and scalability evaluation of Ceph for scientific high-performance computing (HPC) environments. Our work makes two unique contributions. First, our evaluation is performed under a realistic setup for a large-scale capability HPC environment using a commercial high-end storage system. Second, our path of investigation, tuning efforts, and findings made direct contributions to Ceph's development and improved its code quality, scalability, and performance. These changes should benefit both Ceph and the HPC community at large.
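
For readers who want to reproduce a much smaller-scale measurement, the sketch below runs a simple RADOS write-throughput probe with the python-rados bindings; it assumes a reachable Ceph cluster, a valid /etc/ceph/ceph.conf, and an existing pool named "testpool" (the pool name and object count are illustrative), and it is in no way a substitute for the paper's evaluation methodology.

```python
# Minimal RADOS write-throughput probe using python-rados.
# Assumes a reachable cluster and a pool named "testpool".
import time
import rados

OBJ_SIZE = 4 * 1024 * 1024      # 4 MiB per object
NUM_OBJS = 64
payload = b"\0" * OBJ_SIZE

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("testpool")

start = time.time()
for i in range(NUM_OBJS):
    ioctx.write_full(f"bench_obj_{i}", payload)
elapsed = time.time() - start

mib_written = OBJ_SIZE * NUM_OBJS / (1024 * 1024)
print(f"wrote {mib_written:.0f} MiB in {elapsed:.2f} s "
      f"({mib_written / elapsed:.1f} MiB/s)")

ioctx.close()
cluster.shutdown()
```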


International Conference on Parallel and Distributed Systems | 2014

Improving large-scale storage system performance via topology-aware and balanced data placement

Feiyi Wang; Sarp Oral; Saurabh Gupta; Devesh Tiwari; Sudharshan S. Vazhkudai

With the advent of big data, the I/O subsystems of large-scale compute clusters are becoming a center of focus. More applications are placing greater demands on end-to-end I/O performance. These subsystems are often complex in design, comprising multiple hardware and software layers to cope with the increasing capacity, capability, and scalability requirements of data-intensive applications. However, the shared nature of storage resources and the intrinsic interactions across these layers make it a great challenge to realize end-to-end performance gains. This paper proposes a topology-aware strategy that balances load across resources to improve per-application I/O performance. We demonstrate the effectiveness of our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge Leadership Computing Facility (OLCF). Our experiments with both synthetic benchmarks and a real-world application show that, even under congestion, the proposed algorithm can significantly improve large-scale application I/O performance, resulting in both reduced application run time and the ability to run simulations at higher resolution.
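
The paper's algorithm is not reproduced here, but the toy sketch below conveys the flavor of topology-aware, load-balanced placement: prefer storage targets that are close in the network and break ties by current load so no single target becomes a hotspot. The distances and load counters are illustrative placeholders.

```python
# Toy topology-aware, load-balanced placement policy (illustrative only).
from dataclasses import dataclass, field

@dataclass
class StorageTarget:
    name: str
    hops: int                 # network distance from the client's node
    outstanding: int = 0      # requests currently queued on this target

@dataclass
class Placer:
    targets: list = field(default_factory=list)

    def choose(self):
        # Prefer nearby targets; break ties by current load.
        best = min(self.targets, key=lambda t: (t.hops, t.outstanding))
        best.outstanding += 1
        return best

placer = Placer([StorageTarget("ost0", hops=1),
                 StorageTarget("ost1", hops=1),
                 StorageTarget("ost2", hops=3)])
for _ in range(4):
    print(placer.choose().name)   # alternates between the two nearby targets
```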


IEEE Conference on Mass Storage Systems and Technologies | 2014

SSD-Optimized Workload Placement with Adaptive Learning and Classification in HPC Environments

Lipeng Wan; Zheng Lu; Qing Cao; Feiyi Wang; H Sarp Oral; Bradley W. Settlemyer

In recent years, non-volatile memory devices such as SSDs have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Given the unique capability and capacity requirements of large-scale HPC (high-performance computing) storage environments, a hybrid configuration (SSD and HDD) may represent one of the most practical and balanced solutions in terms of cost and performance. Under this setting, effective data placement, as well as data movement with controlled overhead, becomes a pressing challenge. In this paper, we propose an integrated object placement and movement framework and adaptive learning algorithms to address these issues. Specifically, we present a method that shuffles data objects across storage tiers to optimize data access performance. The method integrates an adaptive learning algorithm in which real-time classification is employed to predict the popularity of data object accesses, so that objects can be placed on, or migrated between, SSD and HDD drives in the most efficient manner. We discuss preliminary results based on this approach, using a simulator we developed, showing that the proposed methods can dynamically adapt storage placement as workloads and access patterns evolve, to achieve the best system-level performance, such as throughput.
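
As a purely illustrative stand-in for the adaptive classifier, the sketch below predicts whether an object will stay "hot" from a window of recent accesses and places hot objects on SSD and cold ones on HDD; the window size, threshold, and features are invented and are not the paper's learned model.

```python
# Toy popularity-driven tier placement (illustrative thresholds and features).
from collections import defaultdict, deque

class TierPlacer:
    def __init__(self, window=8, hot_threshold=4):
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.hot_threshold = hot_threshold

    def record_access(self, obj_id, accessed_this_epoch):
        self.history[obj_id].append(1 if accessed_this_epoch else 0)

    def predict_hot(self, obj_id):
        # Stand-in for the adaptive classifier: "hot" if the object was
        # accessed in enough of the recent epochs.
        return sum(self.history[obj_id]) >= self.hot_threshold

    def place(self, obj_id):
        return "SSD" if self.predict_hot(obj_id) else "HDD"

placer = TierPlacer()
for epoch in range(8):
    placer.record_access("checkpoint.dat", accessed_this_epoch=True)
    placer.record_access("archive.tar", accessed_this_epoch=(epoch % 4 == 0))
print(placer.place("checkpoint.dat"))   # SSD
print(placer.place("archive.tar"))      # HDD
```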

Collaboration


Dive into Feiyi Wang's collaborations.

Top Co-Authors

Galen M. Shipman (Oak Ridge National Laboratory)
David A Dillow (Oak Ridge National Laboratory)
Sarp Oral (Oak Ridge National Laboratory)
Jason J Hill (Oak Ridge National Laboratory)
Ross Miller (Oak Ridge National Laboratory)
Junghee Lee (University of Texas at San Antonio)
Qing Cao (University of Tennessee)