
Publication


Featured research published by Guanying Wu.


European Conference on Computer Systems | 2012

Delta-FTL: improving SSD lifetime via exploiting content locality

Guanying Wu; Xubin He

NAND flash-based SSDs suffer from limited lifetime because NAND flash can only be programmed or erased a limited number of times. Among various approaches to this problem, we propose to reduce the number of writes to the flash by exploiting the content locality between the write data and its corresponding old version in the flash. Content locality means that the new version, i.e., the content of a new write request, shares some degree of similarity with its old version. The information redundancy in the difference (delta) between the new and old data allows the delta to be compressed to a small size. The key idea of our approach, named Delta-FTL (Delta Flash Translation Layer), is to store this compressed delta in the SSD instead of the original new data, in order to reduce the number of writes committed to the flash. This write reduction further extends the lifetime of SSDs by making the garbage collection process, a significant source of write amplification in SSDs, less frequent. Experimental results based on our Delta-FTL prototype show that Delta-FTL can significantly reduce the number of writes and garbage collection operations and thus improve SSD lifetime, at the cost of a trivial overhead in read latency.
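
Below is a minimal Python sketch of the delta-write idea described in this abstract. It uses zlib as a stand-in compressor; the page size, the compressibility threshold, and the dispatch logic are illustrative assumptions rather than the actual Delta-FTL design.

```python
import zlib

PAGE_SIZE = 4096            # assumed flash page size
DELTA_THRESHOLD = 0.5       # assumed: keep the delta only if it compresses below 50% of a page

def xor_delta(new_page: bytes, old_page: bytes) -> bytes:
    """Byte-wise XOR of the new write data against its old version in flash."""
    return bytes(a ^ b for a, b in zip(new_page, old_page))

def handle_write(new_page: bytes, old_page: bytes):
    """Decide whether to commit the full page or a compressed delta.

    Content locality makes the XOR delta mostly zeros, so it compresses well;
    storing the small compressed delta instead of the full page reduces the
    number of flash writes and, in turn, garbage collection traffic.
    """
    delta = xor_delta(new_page, old_page)
    compressed = zlib.compress(delta)
    if len(compressed) < DELTA_THRESHOLD * PAGE_SIZE:
        return ("delta", compressed)          # write the compressed delta
    return ("full", new_page)                 # poor locality: fall back to a full-page write

def reconstruct(old_page: bytes, compressed_delta: bytes) -> bytes:
    """Serve a read of a delta-encoded page: decompress and XOR back onto the old version."""
    return xor_delta(old_page, zlib.decompress(compressed_delta))

if __name__ == "__main__":
    old = bytearray(PAGE_SIZE)
    new = bytearray(old)
    new[100:108] = b"modified"                # small in-place update: high content locality
    kind, payload = handle_write(bytes(new), bytes(old))
    print(kind, len(payload))                 # expect "delta" with a payload far below 4096 bytes
    assert reconstruct(bytes(old), payload) == bytes(new)
```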


Dependable Systems and Networks | 2011

HDP code: A Horizontal-Diagonal Parity Code to Optimize I/O load balancing in RAID-6

Chentao Wu; Xubin He; Guanying Wu; Shenggang Wan; Xiaohua Liu; Qiang Cao; Changsheng Xie

With higher reliability requirements in clusters and data centers, RAID-6 has gained popularity due to its capability to tolerate concurrent failures of any two disks, which has been shown to be of increasing importance in large-scale storage systems. Among various implementations of erasure codes in RAID-6, a typical class known as Maximum Distance Separable (MDS) codes aims to offer data protection against disk failures with optimal storage efficiency. However, because of the limitations of the horizontal parity or diagonal/anti-diagonal parities used in MDS codes, storage systems based on RAID-6 suffer from unbalanced I/O and thus low performance and reliability. To address this issue, in this paper we propose a new parity called Horizontal-Diagonal Parity (HDP), which takes advantage of both horizontal and diagonal/anti-diagonal parities. The corresponding MDS code, called HDP code, distributes parity elements uniformly across the disks to balance the I/O workload. HDP also achieves high reliability by speeding up recovery from single or double disk failures. Our analysis shows that HDP provides better-balanced I/O and higher reliability compared to other popular MDS codes.
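
The toy sketch below is not the actual HDP construction; it only computes horizontal and wrapped-diagonal XOR parities over a small stripe to show that each data update touches two parity locations, which is why rotating those locations across disks (as HDP code does) balances I/O while fixing them on dedicated disks creates hot spots. The stripe size and layout are assumptions.

```python
# Toy RAID-6-style stripe: DISKS x DISKS data elements, one horizontal parity per
# row and one wrapped-diagonal parity per diagonal, all plain XOR.
from functools import reduce
from operator import xor

DISKS = 4          # assumed number of data disks in the toy stripe
data = [[(r * DISKS + c) & 0xFF for c in range(DISKS)] for r in range(DISKS)]

def horizontal_parity(row: int) -> int:
    """XOR of all data elements in one row (classic horizontal parity)."""
    return reduce(xor, data[row])

def diagonal_parity(d: int) -> int:
    """XOR of the elements on one wrapped diagonal."""
    return reduce(xor, (data[r][(r + d) % DISKS] for r in range(DISKS)))

if __name__ == "__main__":
    # Every element (r, c) contributes to exactly one horizontal and one diagonal
    # parity, so a single small write updates two parity elements; where those
    # parity elements live determines which disks absorb the extra I/O.
    print([horizontal_parity(r) for r in range(DISKS)])
    print([diagonal_parity(d) for d in range(DISKS)])
```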


IEEE Conference on Mass Storage Systems and Technologies | 2010

BPAC: An adaptive write buffer management scheme for flash-based Solid State Drives

Guanying Wu; Benjamin Eckart; Xubin He

Solid State Drives (SSDs) have shown promise as a candidate to replace traditional hard disk drives, but due to certain physical characteristics of NAND flash, there remain challenging areas for improvement and further research. We focus on the layout and management of the small amount of RAM that serves as a cache between the SSD and the system that uses it. Among the techniques that have previously been proposed to manage this cache, we identify several sources of inefficient cache space management due to the way pages are clustered into blocks and the limited replacement policy. We develop a hybrid page/block architecture along with an advanced replacement policy, called BPAC (Block-Page Adaptive Cache), to exploit both temporal and spatial locality. Our technique adaptively partitions the SSD on-disk cache to separately hold pages with high temporal locality in a page list and clusters of pages with low temporal but high spatial locality in a block list. We run trace-driven simulations to verify our design and find that it outperforms other popular flash-aware cache schemes under different workloads.
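
A hedged sketch of the page-list/block-list split described above: hot individual pages sit in an LRU page list, while pages belonging to sequential runs are grouped per block so they can be flushed together. The capacity accounting, the eviction rule, and the `sequential` hint are simplifying assumptions, not the actual BPAC policy (which also adapts the partition sizes).

```python
from collections import OrderedDict, defaultdict

PAGES_PER_BLOCK = 64   # assumed cluster (block) size

class DualListBuffer:
    """Toy page-list/block-list write buffer; ignores cross-list duplicates."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.page_list = OrderedDict()              # lpn -> data, LRU order
        self.block_list = defaultdict(dict)         # block -> {lpn: data}
        self.size = 0

    def write(self, lpn: int, data: bytes, sequential: bool):
        target_block = lpn // PAGES_PER_BLOCK
        if sequential:
            new_entry = lpn not in self.block_list[target_block]
            self.block_list[target_block][lpn] = data
        else:
            new_entry = self.page_list.pop(lpn, None) is None
            self.page_list[lpn] = data              # (re)insert at MRU position
        if new_entry:
            self.size += 1
        if self.size > self.capacity:
            self.evict()

    def evict(self):
        # Prefer flushing the fullest block cluster (one cheap sequential flash
        # write); otherwise evict the least-recently-used single page.
        if self.block_list:
            victim = max(self.block_list, key=lambda b: len(self.block_list[b]))
            self.size -= len(self.block_list.pop(victim))
        else:
            self.page_list.popitem(last=False)
            self.size -= 1

if __name__ == "__main__":
    buf = DualListBuffer(capacity_pages=4)
    for lpn in (0, 1, 2, 3):                        # a sequential run -> block list
        buf.write(lpn, b"x", sequential=True)
    buf.write(999, b"y", sequential=False)          # a hot random page -> page list
    print(len(buf.page_list), {b: len(p) for b, p in buf.block_list.items()})
```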


ACM Transactions on Storage | 2012

An adaptive write buffer management scheme for flash-based SSDs

Guanying Wu; Xubin He; Benjamin Eckart

Solid State Drives (SSDs) have shown promise as a candidate to replace traditional hard disk drives. The benefits of SSDs over HDDs include better durability, higher performance, and lower power consumption, but due to certain physical characteristics of the NAND flash from which SSDs are built, there remain challenging areas for improvement and further research. We focus on the layout and management of the small amount of RAM that serves as a cache between the SSD and the system that uses it. Among the techniques that have previously been proposed to manage this cache, we identify several sources of inefficient cache space management due to the way pages are clustered into blocks and the limited replacement policy. We find that in many traces hot pages reside in otherwise cold blocks, and that the spatial locality of most clusters can be fully exploited in a limited time period, so we develop a hybrid page/block architecture along with an advanced replacement policy, called BPAC (Block-Page Adaptive Cache), to exploit both temporal and spatial locality. Our technique adaptively partitions the SSD on-disk cache to separately hold pages with high temporal locality in a page list and clusters of pages with low temporal but high spatial locality in a block list. In addition, we have developed a novel mechanism for flash-based SSDs to characterize the spatial locality of the disk I/O workload and an approach to dynamically identify the set of low spatial locality clusters. We run trace-driven simulations to verify our design and find that it outperforms other popular flash-aware cache schemes under different workloads. For instance, compared to BPLRU, a popular flash-aware cache algorithm, BPAC reduces the number of cache evictions by up to 79.6% and by 34% on average.
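
As a companion to the buffer sketch after the conference version above, here is one hedged way a cluster's spatial locality could be scored: track how many distinct pages of each block appear in a sliding window of recent requests, and flag clusters whose coverage stays low. The window length, threshold, and class name are assumptions for illustration, not the mechanism from the article.

```python
from collections import defaultdict, deque

PAGES_PER_BLOCK = 64
WINDOW = 1024                  # assumed: number of recent requests considered
LOW_LOCALITY_THRESHOLD = 0.25  # assumed: coverage below this counts as low spatial locality

class SpatialLocalityTracker:
    """Toy per-cluster spatial-locality score over a sliding window of requests."""

    def __init__(self):
        self.window = deque()                       # recent (block, page_offset) touches
        self.touched = defaultdict(set)             # block -> distinct offsets seen in window

    def record(self, lpn: int):
        block, offset = divmod(lpn, PAGES_PER_BLOCK)
        self.window.append((block, offset))
        self.touched[block].add(offset)
        if len(self.window) > WINDOW:               # slide the window
            old_block, old_off = self.window.popleft()
            # A production tracker would reference-count offsets; this sketch
            # simply drops the offset, which is good enough for illustration.
            self.touched[old_block].discard(old_off)

    def low_spatial_locality(self, block: int) -> bool:
        coverage = len(self.touched[block]) / PAGES_PER_BLOCK
        return coverage < LOW_LOCALITY_THRESHOLD

if __name__ == "__main__":
    tracker = SpatialLocalityTracker()
    for lpn in range(64):            # one fully sequential block
        tracker.record(lpn)
    tracker.record(5000)             # a single stray page in another block
    print(tracker.low_spatial_locality(0), tracker.low_spatial_locality(5000 // PAGES_PER_BLOCK))
```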


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2010

DiffECC: Improving SSD Read Performance Using Differentiated Error Correction Coding Schemes

Guanying Wu; Xubin He; Ningde Xie; Tong Zhang

This paper presents a cross-layer co-design approach to reduce SSD read response latency. The key is to cohesively exploit the NAND flash memory device write speed vs. raw storage reliability trade-off at the physical layer and run-time data access workload variation at the system level. Leveraging run-time data access workload variation, we can opportunistically slow down NAND flash memory write speed and hence improve NAND flash memory raw storage reliability. This naturally enables an opportunistic use of weaker error correction schemes that can directly reduce SSD read access latency. We develop a disk-level scheduling scheme to effectively smooth the write workload in order to maximize the occurrence of run-time opportunistic NAND flash memory write slow-down. Using 2 bits/cell NAND flash memory with BCH-based error correction coding as a test vehicle, we carry out extensive simulations over various workloads and demonstrate that this cross-layer co-design solution can reduce the average SSD read latency by up to 96%.
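
A minimal sketch of the scheduling idea: when the pending write queue is shallow, program pages slowly (better raw reliability, so a weaker ECC and a faster subsequent read); when the queue is deep, fall back to fast programming with strong ECC. The latencies, the queue threshold, and the bookkeeping names are assumed for illustration only.

```python
from dataclasses import dataclass

# Illustrative numbers only: slow programming improves the raw bit error rate,
# which permits a weaker ECC whose decode adds less to the read path.
SLOW_WRITE_US, FAST_WRITE_US = 1600, 800         # assumed program latencies
WEAK_ECC_READ_US, STRONG_ECC_READ_US = 60, 110   # assumed read + decode latencies
QUEUE_THRESHOLD = 8                              # assumed: headroom for smoothed writes

@dataclass
class WriteRequest:
    lpn: int
    data: bytes

def choose_write_mode(pending_writes: int) -> str:
    """Opportunistically slow down writes when the write queue is shallow."""
    return "slow" if pending_writes < QUEUE_THRESHOLD else "fast"

def issue_write(req: WriteRequest, pending_writes: int, page_table: dict) -> int:
    mode = choose_write_mode(pending_writes)
    # Record which ECC strength protects the page, so reads know what to decode.
    page_table[req.lpn] = "weak_ecc" if mode == "slow" else "strong_ecc"
    return SLOW_WRITE_US if mode == "slow" else FAST_WRITE_US

def read_latency(lpn: int, page_table: dict) -> int:
    return WEAK_ECC_READ_US if page_table.get(lpn) == "weak_ecc" else STRONG_ECC_READ_US

if __name__ == "__main__":
    table = {}
    issue_write(WriteRequest(lpn=42, data=b"a"), pending_writes=2, page_table=table)   # shallow -> slow write
    issue_write(WriteRequest(lpn=43, data=b"b"), pending_writes=20, page_table=table)  # deep -> fast write
    print(read_latency(42, table), read_latency(43, table))                            # 60 110
```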


European Conference on Computer Systems | 2014

An aggressive worn-out flash block management scheme to alleviate SSD performance degradation

Ping Huang; Guanying Wu; Xubin He; Weijun Xiao

Since NAND flash cannot be updated in place, SSDs must perform all writes in pre-erased pages. Consequently, pages containing superseded data must be invalidated and garbage collected. This garbage collection adds significant cost in terms of the extra writes necessary to relocate valid pages from erasure candidates to clean blocks, causing the well-known write amplification problem. SSDs reserve a certain amount of flash space which is invisible to users, called over-provisioning space, to alleviate the write amplification problem. However, NAND blocks can support only a limited number of program/erase cycles. As blocks are retired due to exceeding the limit, the reduced size of the over-provisioning pool leads to degraded SSD performance. In this work, we propose a novel system design that we call the Smart Retirement FTL (SR-FTL) to reuse the flash blocks which have been cycled to the maximum specified P/E endurance. We take advantage of the fact that the specified P/E limit guarantees data retention time of at least one year, while most active data becomes stale in a period much shorter than one year, as observed in a variety of disk workloads. Our approach aggressively manages worn blocks to store data that requires only short retention time. In the meantime, the data reliability on worn blocks is carefully guaranteed. We evaluate the SR-FTL by both simulation on an SSD simulator and prototype implementation on an OpenSSD platform. Experimental results show that the SR-FTL successfully maintains consistent over-provisioning space levels as blocks wear, and thus limits SSD performance degradation near end-of-life. In addition, we show that our scheme reduces block wear near end-of-life by as much as 84% in some scenarios.
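
A hedged sketch of steering short-lived data onto worn-out blocks with a retention deadline, in the spirit of the scheme above; the P/E limit, the retention window, the hotness test, and the class names are assumptions, not the paper's algorithm.

```python
import time

P_E_LIMIT = 3000                          # assumed rated program/erase cycles
SHORT_RETENTION_SECONDS = 7 * 24 * 3600   # assumed safe retention window for worn blocks

class Block:
    def __init__(self, block_id: int):
        self.block_id = block_id
        self.pe_cycles = 0
        self.written_at = None            # timestamp of the last program, for retention tracking

class WornBlockPool:
    """Toy allocator that keeps worn-out blocks usable for short-lived (hot) data."""

    def __init__(self, blocks):
        self.healthy = [b for b in blocks if b.pe_cycles < P_E_LIMIT]
        self.worn = [b for b in blocks if b.pe_cycles >= P_E_LIMIT]

    def allocate(self, data_is_hot: bool) -> Block:
        # Hot data is expected to be overwritten long before the shortened
        # retention of a worn block matters; cold data stays on healthy blocks
        # with full retention guarantees.
        pool = self.worn if (data_is_hot and self.worn) else (self.healthy or self.worn)
        blk = pool.pop()
        blk.written_at = time.time()
        return blk

    def needs_refresh(self, blk: Block) -> bool:
        """Worn blocks must be rewritten/migrated before the short retention window expires."""
        if blk.pe_cycles < P_E_LIMIT or blk.written_at is None:
            return False
        return time.time() - blk.written_at > SHORT_RETENTION_SECONDS

if __name__ == "__main__":
    blocks = [Block(i) for i in range(4)]
    blocks[0].pe_cycles = P_E_LIMIT            # one block already worn out
    pool = WornBlockPool(blocks)
    hot_block = pool.allocate(data_is_hot=True)
    print(hot_block.block_id, pool.needs_refresh(hot_block))   # worn block 0, no refresh needed yet
```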


ACM Transactions on Design Automation of Electronic Systems | 2013

Exploiting workload dynamics to improve SSD read latency via differentiated error correction codes

Guanying Wu; Xubin He; Ningde Xie; Tong Zhang

This article presents a cross-layer co-design approach to reduce SSD read response latency. The key is to cohesively exploit the NAND flash memory device write speed vs. raw storage reliability trade-off at the physical layer and runtime data access workload dynamics at the system level. Leveraging runtime data access workload variation, we can opportunistically slow down NAND flash memory write speed and hence improve NAND flash memory raw storage reliability. This naturally enables an opportunistic use of weaker error correction schemes that can directly reduce SSD read access latency. We develop a disk-level scheduling scheme to effectively smooth the write workload in order to maximize the occurrence of runtime opportunistic NAND flash memory write slowdown. Using 2 bits/cell NAND flash memory with BCH-based error correction coding as a test vehicle, we carry out extensive simulations over various workloads and demonstrate that this cross-layer co-design solution can reduce the average SSD read latency by up to 59.4% without sacrificing write throughput.
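
A back-of-the-envelope model of the expected read latency under differentiated ECC, parameterized by the fraction of reads that land on slowly-written, weak-ECC pages; all numbers are illustrative assumptions, not measurements from this article.

```python
# Expected average read latency when only part of the data is protected by the
# weaker (faster-to-decode) ECC. Numbers are illustrative assumptions.
WEAK_ECC_READ_US = 60      # read + short decode for slowly-written, more reliable pages
STRONG_ECC_READ_US = 110   # read + long decode for fast-written pages

def average_read_latency(weak_ecc_fraction: float) -> float:
    """Expected read latency when `weak_ecc_fraction` of reads hit weak-ECC pages."""
    return weak_ecc_fraction * WEAK_ECC_READ_US + (1 - weak_ecc_fraction) * STRONG_ECC_READ_US

if __name__ == "__main__":
    for f in (0.0, 0.5, 0.9):
        print(f"{f:.0%} weak-ECC reads -> {average_read_latency(f):.1f} us on average")
```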


Southeastern Symposium on System Theory | 2009

Evaluation of the impacts of data hot-spots in disk arrays on performance and availability

Jeremy Langston; Guanying Wu; Xubin He

This paper investigates the impacts of data hot-spots on the performance and system availability of disk arrays. We perform experiments on RAID 5 to show quantitatively how hot-spots affect performance in terms of I/O throughput and availability in terms of mean time to repair (MTTR). We find that for small-access workloads, performance and system availability are dominated by the access pattern (read/write), with negligible change as hot-spot activity increases. Workloads with larger accesses see a much greater performance and availability impact as hot-spot activity increases.


Journal of Systems Architecture | 2014

Reducing SSD access latency via NAND flash program and erase suspension

Guanying Wu; Ping Huang; Xubin He

In NAND flash memory, once a page program or block erase (P/E) command is issued to a NAND flash chip, subsequent read requests have to wait for the time-consuming P/E operation to complete. Preliminary results show that these lengthy P/E operations increase read latency by 2× on average. The increased read latency caused by this contention may significantly degrade overall system performance. Inspired by the internal mechanism of NAND flash P/E algorithms, we propose in this paper a low-overhead P/E suspension scheme, which suspends the on-going P/E operation to service pending reads and resumes the suspended P/E afterwards. While reads enjoy the highest priority, we further extend our approach by allowing writes to preempt erase operations in order to improve write latency. In our experiments, we simulate a realistic SSD model with multiple chips and channels and evaluate both SLC and MLC NAND flash as storage media with diverse performance characteristics. Experimental results show that the proposed technique achieves near-optimal performance in servicing read requests. Write latency is significantly reduced as well. Specifically, read latency is reduced on average by 46.5% compared to RPS (Read Priority Scheduling), and with write-suspend-erase the write latency is reduced by 13.6% relative to FIFO.
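
A tiny event-timeline sketch of the suspension idea: a read arriving while a program is in flight suspends it, is serviced almost immediately, and the program's completion slips by the time spent suspended. The latencies and the suspend overhead are assumed values, since real suspend/resume support and timing are device-specific, and resume overhead is ignored here.

```python
# Illustrative latencies in microseconds (assumed; real values are device-specific).
READ_US, PROGRAM_US, SUSPEND_OVERHEAD_US = 50.0, 1300.0, 20.0

def simulate_chip(requests):
    """Single-chip timeline: a read arriving during a program suspends it, is
    serviced immediately, and the program resumes from where it stopped.

    `requests` is a list of (arrival_us, kind), kind in {"read", "program"},
    sorted by arrival time. Returns (kind, arrival, completion) tuples.
    """
    results = []
    busy_until = 0.0          # when the chip finishes everything scheduled so far
    program_end = 0.0         # completion time of the in-flight program, if any

    for arrival, kind in requests:
        if kind == "program":
            start = max(arrival, busy_until)
            program_end = start + PROGRAM_US
            busy_until = program_end
            results.append((kind, arrival, program_end))
        else:  # read
            if arrival < program_end:
                # Suspend the in-flight program: the read starts after a small
                # suspend overhead, and the program's completion slips by the
                # time spent suspended (overhead + read service).
                read_done = arrival + SUSPEND_OVERHEAD_US + READ_US
                program_end += SUSPEND_OVERHEAD_US + READ_US
                busy_until = max(busy_until, program_end)
                # Push back the recorded completion of the suspended program.
                results = [(k, a, program_end) if k == "program" and c >= arrival else (k, a, c)
                           for k, a, c in results]
            else:
                read_done = max(arrival, busy_until) + READ_US
                busy_until = read_done
            results.append((kind, arrival, read_done))
    return results

if __name__ == "__main__":
    timeline = [(0, "program"), (100, "read"), (400, "read")]
    for kind, arrival, done in simulate_chip(timeline):
        print(f"{kind} arriving at {arrival} us completes at {done:.0f} us")
```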


File and Storage Technologies | 2012

Reducing SSD read latency via NAND flash program and erase suspension

Guanying Wu; Xubin He

Collaboration


Dive into Guanying Wu's collaboration.

Top Co-Authors

Xubin He (Virginia Commonwealth University)

Benjamin Eckart (Tennessee Technological University)

Tong Zhang (Rensselaer Polytechnic Institute)

Jeremy Langston (Tennessee Technological University)

Weijun Xiao (Virginia Commonwealth University)

Changsheng Xie (Huazhong University of Science and Technology)

Chentao Wu (Shanghai Jiao Tong University)

Qiang Cao (Huazhong University of Science and Technology)