Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Suzhen Wu is active.

Publication


Featured research published by Suzhen Wu.


International Parallel and Distributed Processing Symposium | 2010

HPDA: A hybrid parity-based disk array for enhanced performance and reliability

Bo Mao; Hong Jiang; Dan Feng; Suzhen Wu; Jianxi Chen; Lingfang Zeng; Lei Tian

A single flash-based Solid State Drive (SSD) cannot satisfy the capacity, performance, and reliability requirements of a modern storage system supporting increasingly demanding data-intensive computing applications. Applying RAID schemes to SSDs to meet these requirements, while a logical and viable solution, faces many challenges. In this paper, we propose a Hybrid Parity-based Disk Array architecture, HPDA, which combines a group of SSDs and two hard disk drives (HDDs) to improve the performance and reliability of SSD-based storage systems. In HPDA, the SSDs (data disks) and part of one HDD (parity disk) compose a RAID4 disk array. Meanwhile, the second HDD and the free space of the parity disk are mirrored to form a RAID1-style write buffer that temporarily absorbs small write requests and acts as a surrogate set during recovery when a disk fails. The buffered write data is reclaimed back to the data disks during lightly loaded or idle periods of the system. Reliability analysis shows that the reliability of HPDA, in terms of MTTDL (Mean Time To Data Loss), is better than that of either a pure HDD-based or a pure SSD-based disk array. Our prototype implementation of HPDA and performance evaluations show that HPDA significantly outperforms both HDD-based and SSD-based disk arrays.
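
As a rough illustration of the write-routing idea described above, the following Python sketch absorbs small writes into the mirrored buffer and reclaims them when the system is idle. The device objects, the small-write cutoff, and all method names are hypothetical stand-ins, not interfaces from the paper.

```python
SMALL_WRITE_LIMIT = 64 * 1024  # assumed cutoff for "small" writes, in bytes

class HPDA:
    def __init__(self, raid4_ssds, raid1_buffer):
        self.raid4 = raid4_ssds      # SSD data disks + HDD parity region (RAID4)
        self.buffer = raid1_buffer   # mirrored HDD write buffer (RAID1-style)
        self.pending = {}            # addresses whose latest data lives in the buffer

    def write(self, addr, data):
        # Small writes are absorbed by the RAID1 buffer, avoiding
        # small-write parity updates on the RAID4 SSD array.
        if len(data) <= SMALL_WRITE_LIMIT:
            self.buffer.write(addr, data)
            self.pending[addr] = True
        else:
            self.raid4.write(addr, data)
            self.pending.pop(addr, None)

    def read(self, addr):
        # Serve from the buffer if it holds the freshest copy.
        return self.buffer.read(addr) if addr in self.pending else self.raid4.read(addr)

    def reclaim_when_idle(self):
        # During lightly loaded or idle periods, migrate buffered
        # writes back to the RAID4 data disks.
        for addr in list(self.pending):
            self.raid4.write(addr, self.buffer.read(addr))
            del self.pending[addr]
```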


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2008

GRAID: A Green RAID Storage Architecture with Improved Energy Efficiency and Reliability

Bo Mao; Dan Feng; Hong Jiang; Suzhen Wu; Jianxi Chen; Lingfang Zeng

Existing power-aware optimization schemes for disk-array systems tend to strike a delicate balance between energy consumption and performance while ignoring reliability. To achieve a reasonably good trade-off among these three important design objectives, in this paper we introduce an energy-efficient disk array architecture called Green RAID (GRAID), which extends the data-mirroring redundancy of RAID 10 by incorporating a dedicated log disk. The goal of GRAID is to significantly improve the energy efficiency or reliability of existing RAID-based systems without noticeably sacrificing their reliability or energy efficiency. The main idea behind GRAID is to update the mirroring disks only periodically, while storing all updates since the last mirror-disk update on a log disk; this makes it possible to keep all the mirroring disks (half of the total disks) spun down to a lower-power mode most of the time to save energy without sacrificing reliability. Reliability analysis shows that the reliability of GRAID, in terms of MTTDL (Mean Time To Data Loss), is only slightly worse than that of RAID 10. On the other hand, our prototype implementation of GRAID and performance evaluation show that GRAID's energy efficiency is better than that of RAID 10 by up to 32.1%, with an average of 25.4%.
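
A minimal sketch of the deferred-mirroring scheme the abstract describes, assuming a hypothetical device API; the spin-up/spin-down calls, the log interface, and the sync policy are illustrative only.

```python
class GRAID:
    def __init__(self, primary_disks, mirror_disks, log_disk):
        self.primary = primary_disks   # always-on half of the RAID 10 pairs
        self.mirrors = mirror_disks    # spun-down half, updated periodically
        self.log = log_disk            # dedicated log disk covering the gap
        self.updates_since_sync = []

    def write(self, addr, data):
        # Writes hit the primary copies and the log disk, so two copies
        # always exist even while the mirroring disks sleep.
        self.primary.write(addr, data)
        self.log.append(addr, data)
        self.updates_since_sync.append(addr)

    def periodic_sync(self):
        # Spin the mirrors up, replay the logged updates, then spin them
        # back down to a low-power mode and truncate the log.
        self.mirrors.spin_up()
        for addr in self.updates_since_sync:
            self.mirrors.write(addr, self.primary.read(addr))
        self.mirrors.spin_down()
        self.log.truncate()
        self.updates_since_sync.clear()
```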


Networking, Architecture, and Storage | 2012

SAR: SSD Assisted Restore Optimization for Deduplication-Based Storage Systems in the Cloud

Bo Mao; Hong Jiang; Suzhen Wu; Yinjin Fu; Lei Tian

The explosive growth of digital content puts enormous strain on storage systems in the cloud environment. Data deduplication has been demonstrated to be very effective in shortening the backup window and saving network bandwidth and storage space in cloud backup, archiving, and primary storage systems such as VM platforms. However, the delay and power consumption of restore operations from a deduplicated storage system can be significantly higher than those without deduplication. The main reason lies in the fact that after deduplication a file or block is split into multiple small data chunks that are often located in non-sequential locations on HDDs, which can cause a subsequent read operation to invoke many HDD I/O requests involving multiple disk seeks. To address this problem, in this paper we propose SAR, an SSD-Assisted Restore scheme that effectively exploits the high random-read performance and low power consumption of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference counts, small sizes, and non-sequential characteristics. In this way, many critical random-read requests to HDDs are replaced by read requests to SSDs, significantly improving system performance and energy efficiency. Extensive trace-driven and VM-restore evaluations on the prototype implementation of SAR show that SAR significantly outperforms traditional deduplication-based schemes in terms of both restore performance and energy efficiency.
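
The placement heuristic suggested by the abstract might look like the following sketch, where chunks that are highly referenced, small, and non-sequential are steered to SSDs; the threshold values and the chunk fields are assumptions, not parameters from the paper.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    size: int            # bytes
    ref_count: int       # number of files/blocks sharing this chunk
    sequential: bool     # whether the chunk lies in a sequential run on HDD

def place_on_ssd(chunk, min_refs=4, max_size=16 * 1024):
    # Favor chunks that are highly shared, small, and non-sequential:
    # the chunks whose HDD reads would each cost a disk seek.
    return (chunk.ref_count >= min_refs
            and chunk.size <= max_size
            and not chunk.sequential)

def restore(chunk_list, ssd, hdd):
    # During restore, reads for SSD-resident chunks replace what would
    # otherwise be random HDD reads with many seeks.
    return b"".join(
        (ssd if place_on_ssd(c) else hdd).read(c.chunk_id) for c in chunk_list
    )
```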


International Parallel and Distributed Processing Symposium | 2014

POD: Performance Oriented I/O Deduplication for Primary Storage Systems in the Cloud

Bo Mao; Hong Jiang; Suzhen Wu; Lei Tian

Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than on disks, due to the relatively high temporal access locality associated with small I/O requests to redundant data. On the other hand, we also observe that directly applying data deduplication to primary storage systems in the cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a Performance-Oriented I/O Deduplication approach, called POD, rather than a capacity-oriented approach, represented by iDedup, to improve the I/O performance of primary storage systems in the cloud without sacrificing the capacity savings of the latter. The salient feature of POD is its focus not only on the capacity-sensitive large writes and files, as in iDedup, but also on the performance-sensitive but capacity-insensitive small writes and files. Experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in terms of I/O performance, by up to 87.9% with an average of 58.8%. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup.
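
A hedged sketch of an inline write-path deduplicator in the spirit of POD, which fingerprints small writes as well as large ones; the disk interface (including the remap call) and the unbounded index are simplifications for illustration, not the paper's design.

```python
import hashlib

class DedupWritePath:
    def __init__(self, disk):
        self.disk = disk
        self.index = {}   # fingerprint -> block address; bounded in a real system

    def write(self, addr, block):
        # Unlike capacity-oriented schemes that skip small writes, the
        # performance-sensitive small writes are fingerprinted too, since
        # redundant small I/Os dominate the I/O path.
        fp = hashlib.sha1(block).hexdigest()
        if fp in self.index:
            # Redundant write: record a mapping instead of issuing disk I/O.
            self.disk.remap(addr, self.index[fp])
        else:
            self.disk.write(addr, block)
            self.index[fp] = addr
```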


International Conference on Computer Design | 2015

Exploiting request characteristics and internal parallelism to improve SSD performance

Bo Mao; Suzhen Wu

In this paper, we propose a new I/O scheduler for SSDs, called Amphibian, which exploits upper-level request characteristics and the low-level internal parallelism of flash chips to improve the performance of SSD-based storage systems. Amphibian consists of two parts: size-based request ordering, which gives small requests higher processing priority, and Garbage Collection (GC)-aware request dispatching, which avoids issuing requests to flash chips that are in the GC state. By processing small requests first and holding back GC-conflicting requests in the I/O waiting queue, Amphibian significantly reduces the average waiting time of requests. Extensive evaluation results show that, compared with existing I/O schedulers, Amphibian significantly improves both throughput and average response time, thereby improving the I/O performance of SSD-based storage systems.
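
The two policies combine naturally in a small scheduler loop, sketched below under assumed interfaces: requests carrying size and chip_id attributes, and chips exposing in_gc() and issue(). None of these names come from the paper.

```python
import heapq
import itertools

class AmphibianScheduler:
    def __init__(self, chips):
        self.chips = chips             # flash chips, each with in_gc() and issue()
        self.queue = []                # min-heap keyed by request size: small first
        self._seq = itertools.count()  # tie-breaker for equal-sized requests

    def submit(self, request):
        heapq.heappush(self.queue, (request.size, next(self._seq), request))

    def dispatch(self):
        # Pop the smallest queued request whose target chip is not busy
        # with garbage collection; GC-conflicting requests are requeued.
        deferred, issued = [], None
        while self.queue:
            entry = heapq.heappop(self.queue)
            request = entry[2]
            if self.chips[request.chip_id].in_gc():
                deferred.append(entry)     # avoid GC-conflicting chips
            else:
                issued = request
                break
        for entry in deferred:
            heapq.heappush(self.queue, entry)
        if issued is not None:
            self.chips[issued.chip_id].issue(issued)
        return issued
```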


International Conference on Parallel and Distributed Systems | 2009

JOR: A Journal-guided Reconstruction Optimization for RAID-Structured Storage Systems

Suzhen Wu; Dan Feng; Hong Jiang; Bo Mao; Lingfang Zeng; Jianxi Chen

This paper proposes a simple and practical RAID reconstruction optimization scheme, called JOurnal-guided Reconstruction (JOR). JOR exploits the fact that significant portions of the data blocks in typical disk arrays are unused. It monitors storage-space utilization at the block level to guide the reconstruction process so that only failed data on used stripes is recovered to the spare disk. In JOR, data consistency is ensured by the requirement that all blocks in a disk array be initialized to zero (written with the value zero) during synchronization, while all blocks on the spare disk are likewise zeroed in the background. JOR can be easily incorporated into any existing reconstruction approach to optimize it, because the former is independent of and orthogonal to the latter. Experimental results obtained from our JOR prototype implementation demonstrate that JOR reduces the reconstruction times of two state-of-the-art reconstruction schemes by an amount that is approximately proportional to the percentage of unused storage space, while ensuring data consistency.
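
In sketch form, journal-guided reconstruction reduces to consulting a usage bitmap before rebuilding each stripe; the bitmap, the rebuild call, and the pre-zeroed spare are illustrative assumptions consistent with the description above.

```python
def reconstruct(array, spare, used_bitmap):
    # Assumes array blocks were zero-filled at synchronization time and the
    # spare disk is zeroed in the background, so skipped stripes stay consistent.
    for stripe in range(array.num_stripes):
        if used_bitmap[stripe]:
            data = array.rebuild_stripe(stripe)   # e.g., XOR of surviving disks
            spare.write_stripe(stripe, data)
        # Unused stripes are skipped: zeros on the spare already match them.
```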


IEEE Transactions on Computers | 2015

Proactive Data Migration for Improved Storage Availability in Large-Scale Data Centers

Suzhen Wu; Hong Jiang; Bo Mao

In the face of high partial and complete disk failure rates and untimely system crashes, the execution of low-priority background tasks has become increasingly frequent in large-scale data centers. However, the existing algorithms are all reactive optimizations that only exploit the temporal locality of workloads to reduce user I/O requests during low-priority background tasks. To address this problem, this paper proposes Intelligent Data Outsourcing (IDO), a zone-based and proactive data-migration optimization, to significantly improve the efficiency of low-priority background tasks. The main idea of IDO is to proactively identify the hot data zones of RAID-structured storage systems in the normal operational state. By leveraging prediction tools to identify upcoming events, IDO proactively migrates the data blocks belonging to hot data zones on the degraded device to a surrogate RAID set in the large-scale data center. Upon a disk failure or crash reboot, most user I/O requests addressed to the degraded RAID set can be serviced directly by the surrogate RAID set rather than the much slower degraded RAID set. Consequently, the performance of the background tasks and user I/O performance during those tasks are improved simultaneously. Our lightweight prototype implementation of IDO and extensive trace-driven experiments on two case studies demonstrate that, compared with existing state-of-the-art approaches, IDO effectively improves the performance of low-priority background tasks. Moreover, IDO is portable and can be easily incorporated into existing algorithms for RAID-structured storage systems.
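
A minimal sketch of the proactive loop described above: track zone hotness during normal operation, migrate hot zones when a prediction tool flags an upcoming event, and serve hot reads from the surrogate afterwards. The zone size, threshold, and interfaces are hypothetical.

```python
class IDO:
    def __init__(self, raid_set, surrogate, zone_size, hot_threshold):
        self.raid = raid_set
        self.surrogate = surrogate
        self.zone_size = zone_size
        self.hot_threshold = hot_threshold
        self.zone_hits = {}        # zone id -> recent access count
        self.migrated = set()

    def record_access(self, addr):
        # Track hotness at zone granularity during normal operation.
        zone = addr // self.zone_size
        self.zone_hits[zone] = self.zone_hits.get(zone, 0) + 1

    def on_failure_predicted(self):
        # A prediction tool (e.g., SMART-based) signals an upcoming event;
        # proactively copy hot zones to the surrogate RAID set.
        for zone, hits in self.zone_hits.items():
            if hits >= self.hot_threshold:
                self.surrogate.copy_zone_from(self.raid, zone)
                self.migrated.add(zone)

    def read(self, addr):
        # After a failure, hot-zone reads are served by the surrogate
        # instead of the slower degraded RAID set.
        zone = addr // self.zone_size
        src = self.surrogate if zone in self.migrated else self.raid
        return src.read(addr)
```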


IEEE Transactions on Computers | 2011

Improving Availability of RAID-Structured Storage Systems by Workload Outsourcing

Suzhen Wu; Hong Jiang; Dan Feng; Lei Tian; Bo Mao

Due to contention for the shared disk bandwidth, user I/O intensity can significantly impact the performance of online low-priority background tasks, thus reducing the reliability and availability of RAID-structured storage systems. In this paper, we propose a novel and practical scheme, called WorkOut (I/O Workload Outsourcing), to significantly boost the performance of these low-priority background tasks. WorkOut effectively outsources all write requests and popular read requests originally targeted at the degraded RAID set that is performing the low-priority background tasks to a surrogate RAID set. Our lightweight prototype implementation of WorkOut and extensive trace-driven and benchmark-driven experiments on two case studies demonstrate that, compared with existing approaches, WorkOut effectively improves the performance of low-priority background tasks such as RAID reconstruction and RAID resynchronization. Importantly, WorkOut is portable and can be easily incorporated into any existing optimization algorithm for RAID-structured storage systems.
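
The redirection policy might be sketched as follows, with all writes and popular reads outsourced to the surrogate set while the background task runs, then reclaimed afterwards; the popularity set and device methods are assumed for illustration.

```python
class WorkOut:
    def __init__(self, degraded_set, surrogate_set, popular_reads):
        self.degraded = degraded_set     # RAID set running reconstruction/resync
        self.surrogate = surrogate_set
        self.popular = popular_reads     # addresses deemed read-popular
        self.redirected = set()          # addresses whose newest data is surrogate-side

    def write(self, addr, data):
        # All writes are outsourced, freeing the degraded set's bandwidth
        # for the background task.
        self.surrogate.write(addr, data)
        self.redirected.add(addr)

    def read(self, addr):
        if addr in self.redirected:
            return self.surrogate.read(addr)
        data = self.degraded.read(addr)
        if addr in self.popular:
            # Copy popular read data to the surrogate so repeat reads
            # no longer touch the degraded set.
            self.surrogate.write(addr, data)
            self.redirected.add(addr)
        return data

    def reclaim_after_task(self):
        # Once the background task completes, write outsourced data back.
        for addr in self.redirected:
            self.degraded.write(addr, self.surrogate.read(addr))
        self.redirected.clear()
```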


International Parallel and Distributed Processing Symposium | 2015

Improving Storage Availability in Cloud-of-Clouds with Hybrid Redundant Data Distribution

Bo Mao; Suzhen Wu; Hong Jiang

With the increasing utilization and popularity of cloud infrastructure, more and more data is moved to cloud storage systems. This makes the availability of cloud storage services critically important, particularly given that outages of cloud storage services have indeed happened from time to time. Thus, depending solely on a single cloud storage provider for storage services risks violating the service-level agreement (SLA) due to weakened service availability. This has led to the notion of Cloud-of-Clouds, in which data redundancy is introduced to distribute data among multiple independent cloud storage providers. The key to the effectiveness of Cloud-of-Clouds approaches lies in how the data redundancy is incorporated and distributed among the clouds. However, existing Cloud-of-Clouds approaches utilize either replication or erasure codes to redundantly distribute data across multiple clouds, thus incurring either high space or high performance overheads. In this paper, we propose a hybrid redundant data distribution approach, called HyRD, to improve cloud storage availability in Cloud-of-Clouds by exploiting workload characteristics and the diversity of cloud providers. In HyRD, large files are distributed across multiple cost-efficient cloud storage providers with erasure-coded data redundancy, while small files and file-system metadata are replicated on multiple high-performance cloud storage providers. Experiments conducted on our lightweight prototype implementation of HyRD show that HyRD improves cost efficiency by 33.4% and 20.4%, and reduces access latency by 58.7% and 34.8%, compared with the DuraCloud and RACS schemes, respectively.
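
HyRD's placement rule is essentially a size-and-type test, sketched below; the size cutoff, the cloud client API, and the erasure coder are placeholders rather than the paper's actual parameters.

```python
LARGE_FILE_CUTOFF = 1 * 1024 * 1024  # assumed small/large boundary, not from the paper

def store(name, data, is_metadata, fast_clouds, cheap_clouds, coder):
    if is_metadata or len(data) < LARGE_FILE_CUTOFF:
        # Small files and metadata: full replicas on high-performance clouds,
        # so a read needs only one fast provider.
        for cloud in fast_clouds:
            cloud.put(name, data)
    else:
        # Large files: erasure-coded across cost-efficient clouds, keeping
        # storage overhead low while tolerating provider outages.
        for cloud, shard in zip(cheap_clouds, coder.encode(data)):
            cloud.put(name, shard)
```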


Future Generation Computer Systems | 2017

DAC: Improving storage availability with Deduplication-Assisted Cloud-of-Clouds

Suzhen Wu; Kuan-Ching Li; Bo Mao; Minghong Liao

With the increasing popularity and rapid development of cloud storage technology, more and more users are uploading their data to cloud storage platforms. However, depending solely on a particular cloud storage provider entails a number of potentially serious problems, such as vendor lock-in, availability, and security. To address these problems, we propose a Deduplication-Assisted primary storage system in Cloud-of-Clouds (DAC for short) in this paper. DAC eliminates redundant data blocks in the cloud computing environment and distributes the data among multiple independent cloud storage providers by exploiting the data-reference characteristics. In DAC, the data blocks are stored across multiple cloud storage providers by combining the replication and erasure-code schemes. To better utilize the advantages of both schemes and exploit the reference characteristics revealed by data deduplication, highly referenced data blocks are stored with the replication scheme while the other data blocks are stored with the erasure-code scheme. Experiments conducted on our lightweight prototype implementation show that DAC improves performance and cost efficiency significantly, compared with existing schemes.
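
Combining deduplication with the hybrid redundancy choice gives a sketch like the following, where a block's redundancy scheme is chosen (and upgraded) by its reference count; the promotion threshold, the coder, and the cloud API are hypothetical.

```python
import hashlib

class DAC:
    def __init__(self, clouds, coder, hot_refs=3):
        self.clouds = clouds
        self.coder = coder           # erasure coder with encode(block) -> shards
        self.hot_refs = hot_refs     # reference count at which a block is replicated
        self.refs = {}               # fingerprint -> reference count

    def put_block(self, block):
        fp = hashlib.sha1(block).hexdigest()
        count = self.refs.get(fp, 0) + 1
        self.refs[fp] = count
        if count == 1:
            # New unique block: start with space-efficient erasure coding.
            for cloud, shard in zip(self.clouds, self.coder.encode(block)):
                cloud.put(fp, shard)
        elif count == self.hot_refs:
            # Highly referenced block: promote to full replication so the
            # many files sharing it get fast, single-cloud reads.
            for cloud in self.clouds:
                cloud.put(fp, block)
        # Otherwise it is a duplicate: only the reference count changes.
        return fp
```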

Collaboration


Dive into Suzhen Wu's collaborations.

Top Co-Authors

Hong Jiang
University of Texas at Arlington

Dan Feng
Huazhong University of Science and Technology

Jianxi Chen
Huazhong University of Science and Technology

Lei Tian
University of Nebraska–Lincoln

Lingfang Zeng
Huazhong University of Science and Technology

Yaodong Yang
University of Nebraska–Lincoln