
Yuchong Hu

Huazhong University of Science and Technology



Publication

Featured research published by Yuchong Hu.

IEEE Transactions on Information Theory | 2013

Cooperative Regenerating Codes

Kenneth W. Shum; Yuchong Hu

One of the design objectives in distributed storage systems is the minimization of the data traffic during the repair of failed storage nodes. By repairing multiple failures simultaneously and cooperatively rather than successively and independently, further reduction of repair traffic is made possible. A closed-form expression of the optimal tradeoff between the repair traffic and the amount of storage in each node for cooperative repair is given. We show that the points on the tradeoff curve can be achieved by linear cooperative regenerating codes, with an explicit bound on the required finite-field size. The proof relies on a max-flow-min-cut-type theorem from combinatorial optimization for submodular flows. Two families of explicit constructions are given.

IEEE Transactions on Computers | 2014

NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds

Henry C. H. Chen; Yuchong Hu; Patrick P. C. Lee; Yang Tang

To provide fault tolerance for cloud storage, recent studies propose to stripe data across multiple cloud vendors. However, if a cloud suffers from a permanent failure and loses all its data, we need to repair the lost data with the help of the other surviving clouds to preserve data redundancy. We present a proxy-based storage system for fault-tolerant multiple-cloud storage called NCCloud, which achieves cost-effective repair for a permanent single-cloud failure. NCCloud is built on top of a network-coding-based storage scheme called functional minimum-storage regenerating (FMSR) codes, which maintain the same fault tolerance and data redundancy as traditional erasure codes (e.g., RAID-6), but use less repair traffic and, hence, incur less monetary cost due to data transfer. One key design feature of our FMSR codes is that we relax the encoding requirement of storage nodes during repair, while preserving the benefits of network coding in repair. We implement a proof-of-concept prototype of NCCloud and deploy it atop both local and commercial clouds. We validate that FMSR codes provide significant monetary cost savings in repair over RAID-6 codes, while having comparable response time performance in normal cloud storage operations such as upload/download.
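The repair-traffic saving claimed above can be illustrated with a little arithmetic. The sketch below is not taken from the NCCloud implementation. It assumes the standard minimum-storage regenerating (MSR) repair-traffic expression M(n-1)/(k(n-k)) for a file of size M when all n-1 surviving nodes help, and compares it with a conventional (n, k) erasure code that downloads the equivalent of the whole file to rebuild one node. The function names are hypothetical.

```python
from fractions import Fraction


def conventional_repair_traffic(file_size, n, k):
    """Traffic to rebuild one node with a conventional (n, k) MDS code.

    Assumption: the repairer downloads k chunks of size file_size / k,
    i.e. the equivalent of the whole file.
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return Fraction(file_size)


def msr_repair_traffic(file_size, n, k):
    """Traffic at the MSR point when all n - 1 survivors act as helpers.

    Assumption: the standard expression M * (n - 1) / (k * (n - k)).
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return Fraction(file_size) * (n - 1) / (k * (n - k))


def repair_saving(n, k):
    """Fraction of repair traffic saved relative to conventional repair."""
    conventional = conventional_repair_traffic(1, n, k)
    return 1 - msr_repair_traffic(1, n, k) / conventional


if __name__ == "__main__":
    # Double-fault-tolerant configurations (k = n - 2), as with RAID-6.
    for n in (4, 6, 8, 12):
        print(n, n - 2, repair_saving(n, n - 2))
```

Under these assumptions, a four-node, double-fault-tolerant configuration (n = 4, k = 2) transfers three quarters of the file size during repair instead of the whole file.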

International Symposium on Information Theory | 2011

Exact minimum-repair-bandwidth cooperative regenerating codes for distributed storage systems

Kenneth W. Shum; Yuchong Hu

In order to provide high data reliability, distributed storage systems disperse data with redundancy to multiple storage nodes. Regenerating codes are a new class of erasure codes that introduce redundancy for the purpose of improving the data repair performance in distributed storage. Most of the studies on regenerating codes focus on single-failure recovery, but it is not uncommon to see two or more node failures at the same time in large storage networks. To exploit the opportunity of repairing multiple failed nodes simultaneously, a cooperative repair mechanism, in the sense that the nodes to be repaired can exchange data among themselves, is investigated. A lower bound on the repair-bandwidth for cooperative repair is derived and a construction of a family of exact cooperative regenerating codes matching this lower bound is presented.

2011 International Symposium on Network Coding | 2011

NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System

Yuchong Hu; Chiu-man Yu; Yan Kit Li; Patrick P. C. Lee; John C. S. Lui

An emerging application of network coding is to improve the robustness of distributed storage. Recent theoretical work has shown that a class of regenerating codes, which are based on the concept of network coding, can improve the data repair performance over traditional storage schemes such as erasure coding. However, there remain open issues regarding the feasibility of deploying regenerating codes in practical storage systems. We present NCFS, a distributed file system that realizes regenerating codes under real network settings. NCFS transparently stripes data across multiple storage nodes, without requiring the storage nodes to coordinate among themselves. It adopts a layered design that allows extensibility, such that different storage schemes can be readily included into NCFS. We deploy and evaluate our NCFS prototype in different real network settings. In particular, we use NCFS to conduct an empirical study of different storage schemes, including the traditional erasure codes RAID-5 and RAID-6, and a special family of regenerating codes that are based on E-MBR [16]. Our work provides a practical and extensible platform for realizing theories of regenerating codes in distributed file systems.

IEEE Conference on Mass Storage Systems and Technologies | 2012

On the speedup of single-disk failure recovery in XOR-coded storage systems: Theory and practice

Yunfeng Zhu; Patrick P. C. Lee; Yuchong Hu; Liping Xiang; Yinlong Xu

Modern storage systems stripe redundant data across multiple disks to provide availability guarantees against disk failures. One form of data redundancy is based on XOR-based erasure codes, which use only XOR operations for encoding and decoding. In addition to providing failure tolerance, a storage system must also provide fast failure recovery to avoid data unavailability. We consider the problem of speeding up the recovery of a single-disk failure for arbitrary XOR-based erasure codes. We address this problem from both theoretical and practical perspectives. We propose a replace recovery algorithm, which uses a hill-climbing technique to search for a fast recovery solution, such that the solution search can be completed within a short time period. We further implement our replace recovery algorithm atop a parallelized architecture to demonstrate its practicality. We evaluate our replace recovery algorithm and its parallelized implementation on a networked storage system testbed, and demonstrate that our replace recovery algorithm uses less recovery time than the conventional approach.
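For readers unfamiliar with XOR-based codes, the following minimal sketch shows single-disk recovery for the simplest such code, a single XOR parity block (as in RAID-5-style striping). It is not the replace recovery algorithm from the paper, which searches for a fast recovery solution over arbitrary XOR-based codes. The function names and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a non-empty list of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_parity(data_blocks):
    """Compute the single parity block of a stripe."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks):
    """Rebuild the one missing block from all surviving blocks of the stripe.

    The survivors are the remaining data blocks plus the parity block;
    XOR-ing them cancels every block except the missing one.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    # Simulate losing the disk that holds data[1].
    rebuilt = recover_block([data[0], data[2], parity])
    print(rebuilt == data[1])
```

With a single parity block there is only one way to recover a lost block. Codes with more parity blocks offer several choices of which blocks to read, and that choice is what a recovery search such as the one described above optimizes.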

International Conference on Computer Communications | 2013

Analysis and construction of functional regenerating codes with uncoded repair for distributed storage systems

Yuchong Hu; Patrick P. C. Lee; Kenneth W. Shum

Modern distributed storage systems apply redundancy coding techniques to stored data. One form of redundancy is based on regenerating codes, which can minimize the repair bandwidth, i.e., the amount of data transferred when repairing a failed storage node. Existing regenerating codes mainly require surviving storage nodes to encode data during repair. In this paper, we study functional minimum storage regenerating (FMSR) codes, which enable uncoded repair without the encoding requirement in surviving nodes, while preserving the minimum repair bandwidth guarantees and also minimizing disk reads. Under double-fault tolerance settings, we formally prove the existence of FMSR codes, and provide a deterministic FMSR code construction that can significantly speed up the repair process. We further implement and evaluate our deterministic FMSR codes to show the benefits. Our work is built atop a practical cloud storage system that implements FMSR codes, and we provide theoretical validation to justify the practicality of FMSR codes.

International Symposium on Information Theory | 2012

Functional-repair-by-transfer regenerating codes

Kenneth W. Shum; Yuchong Hu

In a distributed storage system, a data file is distributed to several storage nodes, such that the original file can be decoded from any subset of the storage nodes of size larger than or equal to a certain threshold. Upon the failure of a storage node, we would like to regenerate it with a minimal amount of data transmission from the surviving nodes to the new node. This performance metric is called the repair-bandwidth. Another performance metric is the disk input/output (I/O) cost, which measures the number of bits a storage node needs to read out from its memory in order to repair the failed node. In this paper, we give examples of linear regenerating codes with minimal disk I/O cost and repair-bandwidth, without any linear mixing in the helping storage nodes.

Dependable Systems and Networks | 2014

Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters

Runhui Li; Patrick P. C. Lee; Yuchong Hu

We have witnessed an increasing adoption of erasure coding in modern clustered storage systems to reduce the storage overhead of traditional 3-way replication. However, how to customize the data analytics paradigm for erasure-coded storage remains an open issue, especially when the storage system operates in failure mode. We propose degraded-first scheduling, a new MapReduce scheduling scheme that improves MapReduce performance in erasure-coded clustered storage systems in failure mode. Its main idea is to launch degraded tasks earlier so as to leverage the unused network resources. We conduct mathematical analysis and discrete event simulation to show the performance gain of degraded-first scheduling over Hadoop's default locality-first scheduling. We further implement degraded-first scheduling on Hadoop and conduct testbed experiments in a 13-node cluster. We show that degraded-first scheduling reduces the MapReduce runtime of locality-first scheduling.

2011 International Symposium on Network Coding | 2011

Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes

Kenneth W. Shum; Yuchong Hu

In distributed storage systems, a new class of fault-tolerant codes, called regenerating codes, was introduced in order to minimize the traffic required in repairing a failed storage node. Studies of regenerating codes in the literature mainly focus on repairing a single-node failure. Nevertheless, multiple-node failure is common in real systems. In this paper, we consider the problem of regenerating multiple failed nodes simultaneously and cooperatively. We give a lower bound on the repair-bandwidth under cooperative repair. The tightness of this lower bound is proved by constructing a flow in the information flow graph which matches this lower bound. Based on the construction of flow, we prove the existence of linear regenerating codes with repair-bandwidth equal to the lower bound, with an explicit bound on the required finite field size.

Dependable Systems and Networks | 2015

Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems

Runhui Li; Yuchong Hu; Patrick P. C. Lee

To balance performance and storage efficiency, modern clustered file systems often first store data with replication, followed by encoding the replicated data with erasure coding. We argue that the commonly used random replication does not take into account erasure coding in its design, thereby raising both performance and availability issues in the subsequent encoding operation. We propose encoding-aware replication, which carefully places the replicas so as to (i) eliminate cross-rack downloads of data blocks during the encoding operation, (ii) preserve availability without data relocation after the encoding operation, and (iii) maintain load balancing across replicas as in random replication before the encoding operation. We conduct extensive HDFS-based testbed experiments and discrete-event simulations, and demonstrate the performance gains of encoding-aware replication over random replication.
