Optimal Construction of Regenerating Code through Rate-matching in Hostile Networks

Abstract

Regenerating code is a class of code very suitable for distributed storage systems, which can maintain optimal bandwidth and storage space. Two important types of regenerating code have been constructed: the minimum storage regeneration (MSR) code and the minimum bandwidth regeneration (MBR) code. However, in hostile networks where adversaries can compromise storage nodes, the storage capacity of the network can be significantly affected. In this paper, we propose two optimal constructions of regenerating codes through rate-matching that can combat this kind of adversary in hostile networks: the 2-layer rate-matched regenerating code and the m-layer rate-matched regenerating code. For the 2-layer code, we can achieve the optimal storage efficiency for given system requirements. Our comprehensive analysis shows that our code can detect and correct malicious nodes with higher storage efficiency than the universally resilient regenerating code, which is a straightforward extension of regenerating code with error detection and correction capability. Then we propose the m-layer code by extending the 2-layer code and achieve the optimal error correction efficiency by matching the code rate of each layer's regenerating code. We also demonstrate that the optimized parameters can achieve the maximum storage capacity under the same constraint. Compared to the universally resilient regenerating code, our code can achieve much higher error correction efficiency.


Jian Li, Tongtong Li, Jian Ren

Index Terms

Optimal regenerating code, MDS code, error-correction, adversary.

I. INTRODUCTION

Distributed storage is a popular method to store files securely without requiring data encryption. Instead of storing a file and its replicas on multiple servers, we can break the file into components and store the components on multiple servers. In this way, both the reliability and the security of the file can be increased. A typical approach is to encode the file using an $(n, k)$ Reed-Solomon (RS) code and distribute the encoded file to $n$ servers. To recover the file, we only need to collect the encoded parts from $k$ servers, which achieves a trade-off between reliability and efficiency. However, when repairing or regenerating the contents of a failed node, the whole file has to be recovered first, which wastes bandwidth.

The authors are with the Department of ECE, Michigan State University, East Lansing, MI 48824-1226. Email: {lijian6, tongli, renjian}@msu.edu. November 4, 2015 DRAFT. arXiv [cs.IT].

The concept of regenerating code was introduced in [1], where a replacement node is allowed to connect to some individual nodes directly and regenerate a substitute of the failed node, instead of first recovering the original data and then regenerating the failed component. Compared to the RS code, regenerating code achieves an optimal tradeoff between bandwidth and storage within the minimum storage regeneration (MSR) and the minimum bandwidth regeneration (MBR) points. However, when malicious behaviors exist in the network, both the regeneration of the failed node and the reconstruction of the original file will fail. The error resilience of the Reed-Solomon code based regenerating code in a network with errors and erasures was analyzed in [2]. In our previous work [3], a Hermitian code based regenerating code was proposed to provide better error correction capability than the Reed-Solomon code based approach.

Inspired by the good performance of Hermitian code based regenerating codes, in this paper we go a step further and construct optimal regenerating codes that have a layered structure similar to that of the Hermitian code in distributed storage.
The main contributions of this paper are:
• We propose an optimal construction of 2-layer rate-matched regenerating code. Both theoretical analysis and performance evaluation show that this code can achieve higher storage efficiency than the universally resilient regenerating code proposed in [2].
• We propose an optimal construction of $m$-layer rate-matched regenerating code. The $m$-layer code can achieve higher error correction efficiency than the code proposed in [2] and the Hermitian code based regenerating code proposed in [3]. Furthermore, the $m$-layer code is easier to understand and more flexible than the Hermitian based code.

Here we focus on error correction and malicious node locating in data regeneration and reconstruction in distributed storage. When no error occurs and no malicious node exists, data regeneration and reconstruction can be processed in the same way as in the existing works.

It is worth noting that although there are two types of regenerating codes, the MSR code on the MSR point and the MBR code on the MBR point, in this paper we only focus on the optimization of the MSR code, for the following two reasons:
1) The processes and results of the optimization for these two codes are similar. The optimization for the MSR code can be directly applied to the MBR code with similar optimization results.
2) The differences between the constructions of the MSR code and the MBR code have little impact on the optimization proposed in this paper.

The rest of this paper is organized as follows: in Section II we introduce the related work. In Section III, the preliminaries of this paper are presented. In Section IV, we propose two component codes for the rate-matched regenerating codes. We propose and analyze the 2-layer rate-matched regenerating code in Section V. Then we propose and analyze the $m$-layer rate-matched regenerating code in Section VI. The paper is concluded in Section VII.

II. RELATED WORK

When a storage node fails in a distributed storage network that employs the conventional $(n, k)$ RS code (such as OceanStore [4] and Total Recall [5]), the replacement node connects to $k$ nodes and downloads the whole file to recover the symbols stored in the failed node. This approach wastes bandwidth because the whole file has to be downloaded to recover a fraction of it. To overcome this drawback, Dimakis et al. [1] introduced the concept of the $\{n, k, d, \alpha, \beta, B\}$ regenerating code based on network coding. In the context of regenerating code, the contents stored in a failed node can be regenerated by the replacement node through downloading $\gamma$ help symbols from $d$ helper nodes. The bandwidth consumption for the failed node regeneration can be far less than the size of the whole file. A data collector (DC) can reconstruct the original file stored in the network by downloading $\alpha$ symbols from each of $k$ storage nodes. In [1], the authors proved that there is a tradeoff between the bandwidth $\gamma$ and the per-node storage $\alpha$. They found two optimal points: the minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) points. Many works focus on the design of optimal regenerating codes [6]–[17]. In [18], [19] the implementation of the regenerating code was studied.

Regenerating codes can be divided into functional regeneration and exact regeneration. In functional regeneration, the replacement node regenerates a new component that can functionally replace the failed component instead of being the same as the originally stored component. [20] formulated data regeneration as a multicast network coding problem and constructed functional regenerating codes. [21] implemented random linear regenerating codes for distributed storage systems. [22] proved that by allowing data exchange among the replacement nodes, a better tradeoff between the repair bandwidth $\gamma$ and the per-node storage $\alpha$ can be achieved.

In exact regeneration, the replacement node regenerates the exact symbols of a failed node. [23] proposed to reduce the regeneration bandwidth through algebraic alignment. [24] provided a code structure for exact regeneration using the interference alignment technique. [25] presented optimal exact constructions of MBR codes and MSR codes under the product-matrix framework. This is the first work that allows independent selection of the number of nodes $n$ in the network.

None of the works above considered code regeneration under node corruption or adversarial manipulation attacks in hostile networks. In fact, all these schemes will fail in both regeneration and reconstruction if there are nodes in the storage cloud sending out incorrect responses to the regeneration and reconstruction requests.

In [26], the Byzantine fault tolerance of regenerating codes was studied. In [27], the authors discussed the amount of information that can be safely stored against passive eavesdropping and active adversarial attacks based on the regeneration structure. In [28], the authors proposed to add CRC codes to the regenerating code to check the integrity of the data in hostile networks. Unfortunately, the CRC checks can also be manipulated by the malicious nodes, resulting in the failure of the regeneration and reconstruction. In [29], the authors proposed to add data integrity protection in distributed storage. In [30], the authors proposed an erasure-coded distributed storage based on threshold cryptography. In [31], the authors analyzed the verification cost for both the client read and write operations in workloads with idle periods. In [2], the authors analyzed the error resilience of the RS code based regenerating code in a network with errors and erasures and provided the theoretical error correction capability. In [3] the authors proposed a Hermitian code based regenerating code, which could provide better error correction capability. In [32] the authors proposed the universally secure regenerating code to achieve information theoretic data confidentiality, but the extra computational cost and bandwidth have to be considered for this code. In [33] the authors proposed to apply a linear feedback shift register (LFSR) to protect the data confidentiality.

III. PRELIMINARY AND ASSUMPTIONS

A. Regenerating Code

Regenerating code introduced in [1] is a linear code over the finite field $\mathbb{F}_q$ with a set of parameters $\{n, k, d, \alpha, \beta, B\}$. A file of size $B$ is stored in $n$ storage nodes, each of which stores $\alpha$ symbols. A replacement node can regenerate the contents of a failed node by downloading $\beta$ symbols from each of $d$ randomly selected storage nodes. So the total bandwidth needed to regenerate a failed node is $\gamma = d\beta$. The data collector (DC) can reconstruct the whole file by downloading $\alpha$ symbols from each of $k \le d$ randomly selected storage nodes. In [1], the following theoretical bound was derived:

$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$

From equation (1), a trade-off between the regeneration bandwidth $\gamma$ and the storage requirement $\alpha$ was derived: $\gamma$ and $\alpha$ cannot be decreased at the same time. There are two special cases: the minimum storage regeneration (MSR) point, in which the storage parameter $\alpha$ is minimized,

$$(\alpha_{MSR}, \gamma_{MSR}) = \left(\frac{B}{k}, \frac{Bd}{k(d-k+1)}\right), \tag{2}$$

and the minimum bandwidth regeneration (MBR) point, in which the bandwidth $\gamma$ is minimized,

$$(\alpha_{MBR}, \gamma_{MBR}) = \left(\frac{2Bd}{2kd-k^2+k}, \frac{2Bd}{2kd-k^2+k}\right). \tag{3}$$

B. System Assumptions and Adversarial Model

In this paper, we assume there is a secure server that is responsible for encoding and distributing the data to storage nodes. Replacement nodes will also be initialized by the secure server. The DC and the secure server can be implemented in the same computer and can never be compromised. We use the notation $C_H$/$C_L$ to refer to either the full rate/fractional rate MSR code or a codeword of the full rate/fractional rate MSR code. The exact meaning will be clear from the context.

We assume some network nodes may be corrupted due to hardware failure or communication errors, and/or be compromised by malicious users. As a result, upon request, these nodes may send out incorrect responses to disrupt the data regeneration and reconstruction. The adversary model is the same as in [2]. We assume that the malicious users can take full control of $\tau$ storage nodes ($\tau \le n$; $\tau$ corresponds to $s$ in [2]) and collude to perform attacks.

We will refer to the symbols in these incorrect responses as bogus symbols, without making a distinction between corrupted symbols and compromised symbols. We will also use the terms corrupted nodes, malicious nodes and compromised nodes interchangeably.

IV. COMPONENT CODES OF RATE-MATCHED REGENERATING CODE

In this section, we introduce two different component codes for the rate-matched MSR code on the MSR point with $d = 2k - 2$. The code based on the MSR point with $d > 2k - 2$ can be derived in the same way through truncating operations. In the rate-matched MSR code, there are two types of MSR codes with different code rates: the full rate code and the fractional rate code.

A. Full Rate Code

1) Encoding: The full rate code is encoded based on the product-matrix code framework in [25]. According to equation (2), we have $\alpha_H = d/2$, $\beta_H = 1$ for one block of data with the size $B_H = (\alpha + 1)\alpha$. The data will be arranged into two $\alpha \times \alpha$ symmetric matrices $S_1, S_2$, each of which will contain $B_H/2$ data symbols. The codeword $C_H$ is defined as

$$C_H = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} = \Psi M_H = \begin{bmatrix} \mathbf{ch}_0 \\ \vdots \\ \mathbf{ch}_{n-1} \end{bmatrix}, \tag{4}$$

where

$$\Phi = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & g & g^2 & \cdots & g^{\alpha-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & g^{n-1} & (g^{n-1})^2 & \cdots & (g^{n-1})^{\alpha-1} \end{bmatrix} \tag{5}$$

is a Vandermonde matrix and $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \cdots, \lambda_n]$ such that $\lambda_i \in \mathbb{F}_q$ and $\lambda_i \ne \lambda_j$ for $1 \le i, j \le n$, $i \ne j$, $g$ is a primitive element in $\mathbb{F}_q$, and any $d$ rows of $\Psi$ are linearly independent. Then each row $\mathbf{ch}_i = \psi_i M_H$ ($0 \le i < n$) of the codeword matrix $C_H$ will be stored in storage node $i$, where the encoding vector $\psi_i$ is the $i$th row of $\Psi$.
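The encoding in equations (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the field is taken to be the prime field GF(257) with primitive element 3, and $\lambda_i$ is chosen as $x_i^{\alpha}$ with $x_i = g^i$, so that $\Psi = [\Phi \;\; \Lambda\Phi]$ is itself a Vandermonde matrix and any $d$ of its rows are linearly independent. The function names are hypothetical.

```python
# Sketch of full rate product-matrix encoding over GF(257).
# Assumptions (not from the paper): prime field p = 257, primitive element g = 3,
# and lambda_i = x_i**alpha with x_i = g**i, which makes Psi a Vandermonde matrix.
P_FIELD = 257
G = 3

def vandermonde(n, cols, p=P_FIELD, g=G):
    """Row i is [1, x_i, x_i^2, ...] with x_i = g^i mod p."""
    return [[pow(pow(g, i, p), j, p) for j in range(cols)] for i in range(n)]

def symmetric_from_symbols(symbols, alpha):
    """Fill the upper triangle row by row and mirror it (alpha x alpha)."""
    S = [[0] * alpha for _ in range(alpha)]
    it = iter(symbols)
    for r in range(alpha):
        for c in range(r, alpha):
            S[r][c] = S[c][r] = next(it)
    return S

def matmul(A, B, p=P_FIELD):
    """Matrix product modulo p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def encode_full_rate(data, n, alpha, p=P_FIELD):
    """Return (C_H, Psi) for one block of B_H = alpha*(alpha+1) symbols."""
    half = alpha * (alpha + 1) // 2
    assert len(data) == 2 * half
    S1 = symmetric_from_symbols(data[:half], alpha)
    S2 = symmetric_from_symbols(data[half:], alpha)
    Psi = vandermonde(n, 2 * alpha)   # columns 0..alpha-1 are Phi, the rest Lambda*Phi
    M_H = S1 + S2                     # S1 stacked on top of S2 (2*alpha x alpha)
    return matmul(Psi, M_H, p), Psi

# Example: alpha = 2 (d = 4, k = 3), n = 6 nodes, one block of 6 symbols.
C_H, Psi = encode_full_rate([1, 2, 3, 4, 5, 6], n=6, alpha=2)
```

Row `C_H[i]` is what storage node $i$ would hold; with real data the symbols would be field elements rather than small integers.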

2) Regeneration: Suppose node $z$ fails. The replacement node $z'$ will send regeneration requests to the remaining $n - 1$ helper nodes. Upon receiving the regeneration request, helper node $i$ will calculate and send out the help symbol $p_i = \mathbf{ch}_i \phi_z^T = \psi_i M_H \phi_z^T$, where $\phi_z$ is the $z$th row of $\Phi$. $z'$ will perform Algorithm 1 to regenerate the contents of the failed node $z$. For convenience, we define $\Psi_{i \to j} = [\psi_i^T, \psi_{i+1}^T, \cdots, \psi_j^T]^T$, where $\psi_t$ is the $t$th row of $\Psi$ ($i \le t \le j$), and $x^{(j)}$ is the vector containing the first $j$ symbols of $M_H \phi_z^T$.

Suppose $p'_i = p_i + e_i$ is the response from the $i$th helper node. If $p_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q \setminus \{0\}$. We can successfully regenerate the symbols in node $z$ when the number of errors in the received help symbols $p'_i$ from the $n - 1$ helper nodes is at most $\lfloor (n - d - 1)/2 \rfloor$, where $\lfloor \cdot \rfloor$ is the floor operation. Without loss of generality, we assume $0 \le i \le n - 2$.

Algorithm 1. $z'$ regenerates symbols of the failed node $z$

Step 1: Decode $p'$ to $p_{cw}$, where $p' = [p'_0, p'_1, \cdots, p'_{n-2}]^T$ can be viewed as an MDS code with parameters $(n - 1, d, n - d)$ since $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p'$.

Step 2: Solve $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p_{cw}$ and compute $\mathbf{ch}_z = \phi_z S_1 + \lambda_z \phi_z S_2$ as described in [25].

Proposition 1.

For regeneration, the full rate code can correct errors from $\lfloor (n - d - 1)/2 \rfloor$ malicious nodes, where $\lfloor \cdot \rfloor$ is the floor operation.

3) Reconstruction: When the DC needs to reconstruct the original file, it will send reconstruction requests to $n$ storage nodes. Upon receiving the request, node $i$ will send out the symbol vector $c_i$ to the DC. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha} \setminus \{\mathbf{0}\}$.

The DC will reconstruct the file as follows. Let $R' = [\mathbf{ch}'^T_0, \mathbf{ch}'^T_1, \cdots, \mathbf{ch}'^T_{n-1}]^T$; we have

$$\Psi \begin{bmatrix} S'_1 \\ S'_2 \end{bmatrix} = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S'_1 \\ S'_2 \end{bmatrix} = R', \qquad \Phi S'_1 \Phi^T + \Lambda \Phi S'_2 \Phi^T = R' \Phi^T. \tag{6}$$

Let $C = \Phi S'_1 \Phi^T$, $D = \Phi S'_2 \Phi^T$, and $\widehat{R}' = R' \Phi^T$; then

$$C + \Lambda D = \widehat{R}'. \tag{7}$$

Since $C$ and $D$ are both symmetric, we can solve for the non-diagonal elements of $C$ and $D$ as follows:

$$\begin{cases} C_{i,j} + \lambda_i \cdot D_{i,j} = \widehat{R}'_{i,j} \\ C_{i,j} + \lambda_j \cdot D_{i,j} = \widehat{R}'_{j,i} \end{cases} \tag{8}$$

Because matrices $C$ and $D$ have the same structure, here we only focus on $C$ (corresponding to $S'_1$). It is straightforward to see that if node $i$ is malicious and there are errors in the $i$th row of $R'$, there will be errors in the $i$th row of $\widehat{R}'$. Furthermore, there will be errors in the $i$th row and $i$th column of $C$. Define $\widehat{S}'_1 = S'_1 \Phi^T$; we have $\Phi \widehat{S}'_1 = C$. Here we can view each column of $C$ as an $(n - 1, \alpha, n - \alpha)$ MDS code because $\Phi$ is a Vandermonde matrix. The length of the code is $n - 1$ since the diagonal elements of $C$ are unknown. Suppose node $j$ is a legitimate node; we can decode the MDS code to recover the $j$th column of $C$ and locate the malicious nodes. Eventually $C$ can be recovered. So the DC can reconstruct $S_1$ using a method similar to [3], [25]. For $S_2$, the recovering process is similar.

Proposition 2.

For reconstruction, the full rate code can correct errors from $\lfloor (n - \alpha - 1)/2 \rfloor$ malicious nodes.

B. Fractional Rate Code

1) Encoding: For the fractional rate code, we also have $\alpha_L = d/2$, $\beta_L = 1$ for one block of data with the size

$$B_L = \begin{cases} xd(1 + xd)/2, & x \in (0, 0.5] \\ \alpha(\alpha + 1)/2 + (x - 0.5)d\,(1 + (x - 0.5)d)/2, & x \in (0.5, 1] \end{cases} \tag{9}$$

where $x$ is the match factor of the rate-matched MSR code. It is easy to see that the fractional rate code becomes the full rate code with $x = 1$. The data $m = [m_1, m_2, \ldots, m_{B_L}] \in (\mathbb{F}_q)^{B_L}$ will be processed as follows:

When $x \le 0.5$, the data will be arranged into a symmetric matrix $S_1$ of the size $\alpha \times \alpha$:

$$S_1 = \begin{bmatrix} m_1 & m_2 & \cdots & m_{xd} & 0 & \cdots & 0 \\ m_2 & m_{xd+1} & \cdots & m_{2xd-1} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ m_{xd} & m_{2xd-1} & \cdots & m_{B_L} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{10}$$

The codeword $C_L$ is defined as

$$C_L = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S_1 \\ \mathbf{0} \end{bmatrix} = \Psi M_L, \tag{11}$$

where $\mathbf{0}$ is the $\alpha \times \alpha$ zero matrix and $\Phi$, $\Lambda$, $\Psi$ are the same as for the full rate code.
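As a small illustration of equations (9) and (10), the following Python sketch computes the block size $B_L$ for a given match factor and builds the zero-padded symmetric matrix $S_1$ for the case $x \le 0.5$. The function names are hypothetical, and the match factor is passed as an exact `Fraction` so that $xd$ is an integer; both are choices made for this sketch, not part of the paper.

```python
from fractions import Fraction

def fractional_block_size(x, d):
    """B_L from equation (9); x is the match factor in (0, 1], d is even."""
    x = Fraction(x)
    alpha = d // 2
    if x <= Fraction(1, 2):
        w = x * d                                   # width of the nonzero block of S1
        size = w * (w + 1) / 2
    else:
        w = (x - Fraction(1, 2)) * d                # width of the nonzero block of S2
        size = Fraction(alpha * (alpha + 1), 2) + w * (w + 1) / 2
    assert w.denominator == 1, "x*d must be an integer"
    return int(size)

def fractional_data_matrix(symbols, x, d):
    """S1 of equation (10) for x <= 0.5: an xd-by-xd symmetric block, zero padded."""
    alpha = d // 2
    w = int(Fraction(x) * d)
    assert len(symbols) == w * (w + 1) // 2
    S = [[0] * alpha for _ in range(alpha)]
    it = iter(symbols)
    for r in range(w):
        for c in range(r, w):
            S[r][c] = S[c][r] = next(it)            # fill upper triangle, mirror it
    return S

# Example: d = 8 (alpha = 4) and x = 1/4, so only a 2-by-2 corner carries data.
S1 = fractional_data_matrix([1, 2, 3], Fraction(1, 4), 8)
```

With $x = 1$ the size returned by `fractional_block_size` equals $\alpha(\alpha+1)$, which is the full rate block size $B_H$.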

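When $x > 0.5$, the first $\alpha(\alpha + 1)/2$ data symbols will be arranged into an $\alpha \times \alpha$ symmetric matrix $S_1$. The rest of the data $m_{\alpha(\alpha+1)/2+1}, \ldots, m_{B_L}$ will be arranged into another $\alpha \times \alpha$ symmetric matrix $S_2$. Writing $a = \alpha(\alpha + 1)/2$ and $w = (x - 0.5)d$ for brevity, where $w$ is the width of the nonzero block implied by equation (9):

$$S_2 = \begin{bmatrix} m_{a+1} & \cdots & m_{a+w} & 0 & \cdots & 0 \\ m_{a+2} & \cdots & m_{a+2w-1} & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ m_{a+w} & \cdots & m_{B_L} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{12}$$

The codeword $C_L$ is defined in the same way as in equation (4), with the same parameters $\Phi$, $\Lambda$ and $\Psi$.

Then each row $\mathbf{cl}_i$ ($0 \le i < n$) of the codeword matrix $C_L$ will be stored in storage node $i$, where the encoding vector $\psi_i$ is the $i$th row of $\Psi$.

Proposition 3.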

The fractional rate code can achieve the MSR point in equation (2) since it is encoded under the product-matrix MSR code framework in [25].

2) Regeneration: The regeneration for the fractional rate code is the same as the regeneration for the full rate code described in Section IV-A2, with only a minor difference. If we define $x^{(j)}$ as the vector containing the first $j$ symbols of $M_L \phi_z^T$, there will be only $xd$ nonzero elements in the vector. According to $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p'$, the received symbol vector $p'$ for the fractional rate code in Step 1 of Algorithm 1 can be viewed as an $(n - 1, xd, n - xd)$ MDS code. Since $x < 1$, we can detect and correct more errors in data regeneration using the fractional rate code than using the full rate code.

Proposition 4.

For regeneration, the fractional rate code can correct errors from $\lfloor (n - xd - 1)/2 \rfloor$ malicious nodes.

3) Reconstruction: The reconstruction for the fractional rate code is similar to that for the full rate code described in Section IV-A3. Let $R' = [\mathbf{cl}'^T_0, \mathbf{cl}'^T_1, \cdots, \mathbf{cl}'^T_{n-1}]^T$.

When the match factor $x > 0.5$, reconstruction for the fractional rate code is the same as that for the full rate code.

When $x \le 0.5$, equation (6) can be written as

$$\Phi S'_1 = R'. \tag{13}$$

So we can view each column of $R'$ as an $(n, xd, n - xd + 1)$ MDS code. After decoding $R'$ to $R_{cw}$, we can recover the data matrix $S_1$ by solving the equation $\Phi S_1 = R_{cw}$. Meanwhile, if the $i$th rows of $R'$ and $R_{cw}$ are different, we can mark node $i$ as corrupted.

Proposition 5.

For reconstruction, when the match factor $x > 0.5$, the fractional rate code can correct errors from $\lfloor (n - \alpha - 1)/2 \rfloor$ malicious nodes. When the match factor $x \le 0.5$, the fractional rate code can correct errors from $\lfloor (n - xd)/2 \rfloor$ malicious nodes.

V. 2-LAYER RATE-MATCHED REGENERATING CODE

In this section, we show our first optimization of the rate-matched MSR code: the 2-layer rate-matched MSR code. In the code design, we utilize two layers of the MSR code: the fractional rate code for one layer and the full rate code for the other. The purpose of the fractional rate code is to correct the erroneous symbols sent by malicious nodes and locate the corresponding malicious nodes. Then we can treat the errors in the received symbols as erasures when regenerating with the full rate code. However, the rates of the two codes must match to achieve optimal performance. Here we mainly focus on the rate-matching for data regeneration. We will see in the later analysis that the performance of data reconstruction can also be improved with this design criterion.

We will first fix the error correction capabilities of the full rate code and the fractional rate code. Then we will derive the optimal rate matching criteria to optimize the data storage efficiency under the fixed error correction capability.
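To make the difference between the two layers concrete, the error correction bounds stated in Propositions 1, 2, 4 and 5 can be tabulated with a short Python sketch. The helper names and the example values $n = 30$, $d = 18$, $x = 7/18$ are assumptions chosen for illustration only.

```python
from fractions import Fraction

def regen_capability(n, d, x=1):
    """Propositions 1 and 4: floor((n - x*d - 1) / 2); x = 1 is the full rate code."""
    return (n - int(Fraction(x) * d) - 1) // 2

def recon_capability(n, d, x=1):
    """Propositions 2 and 5: bound for reconstruction."""
    alpha = d // 2
    x = Fraction(x)
    if x > Fraction(1, 2):
        return (n - alpha - 1) // 2      # full rate code, or fractional with x > 0.5
    return (n - int(x * d)) // 2         # fractional rate code with x <= 0.5

# Illustrative values (assumptions for this sketch).
n, d, x = 30, 18, Fraction(7, 18)
table = {
    "full, regeneration": regen_capability(n, d),
    "fractional, regeneration": regen_capability(n, d, x),
    "full, reconstruction": recon_capability(n, d),
    "fractional, reconstruction": recon_capability(n, d, x),
}
```

For these values the fractional rate code tolerates 11 malicious nodes in regeneration while the full rate code alone tolerates 5, which is the gap the 2-layer design exploits.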

A. Rate Matching

From the analysis above, we know that during regeneration, the fractional rate code can correct up to $\lfloor (n - xd - 1)/2 \rfloor$ errors, which is more than the $\lfloor (n - d - 1)/2 \rfloor$ errors that the full rate code can correct. In the 2-layer rate-matched MSR code design, our goal is to match the fractional rate code with the full rate code. The main task of the fractional rate code is to detect and correct errors, while the main task of the full rate code is to maintain the storage efficiency. So if the fractional rate code can locate all the malicious nodes, the full rate code can simply treat the symbols received from these malicious nodes as erasures, which requires the minimum redundancy for the full rate code. The full rate code can correct up to $n - d - 1$ erasures. Thus we have the following optimal rate-matching equation:

$$\lfloor (n - xd - 1)/2 \rfloor = n - d - 1, \tag{14}$$

from which we can derive the match factor $x$.

B. Encoding

To encode a file with size $B_F$ using the 2-layer rate-matched MSR code, the file will first be divided into $\theta_H$ blocks of data with the size $B_H$ and $\theta_L$ blocks of data with the size $B_L$, where the parameters should satisfy

$$B_F = \theta_H B_H + \theta_L B_L. \tag{15}$$

Then the $\theta_H$ blocks of data will be encoded into code matrices $C_{H,1}, \ldots, C_{H,\theta_H}$ using the full rate code and the $\theta_L$ blocks of data will be encoded into code matrices $C_{L,1}, \ldots, C_{L,\theta_L}$ using the fractional rate code. To prevent the malicious nodes from corrupting the fractional rate code only, the secure server will randomly concatenate all the matrices together to form the final $n \times \alpha(\theta_H + \theta_L)$ codeword matrix:

$$C_M = [\mathrm{Perm}(C_{H,1}, \ldots, C_{H,\theta_H}, C_{L,1}, \ldots, C_{L,\theta_L})], \tag{16}$$

where $\mathrm{Perm}(\cdot)$ is the random matrix permutation operation. The secure server will also record the order of the permutation for future code regeneration and reconstruction. Then each row $c_i = [\mathrm{Perm}(\mathbf{ch}_{1,i}, \ldots, \mathbf{ch}_{\theta_H,i}, \mathbf{cl}_{1,i}, \ldots, \mathbf{cl}_{\theta_L,i})]$ ($0 \le i \le n - 1$) of the codeword matrix $C_M$ will be stored in storage node $i$, where $\mathbf{ch}_{j,i}$ is the $i$th row of $C_{H,j}$ ($1 \le j \le \theta_H$), and $\mathbf{cl}_{j,i}$ is the $i$th row of $C_{L,j}$ ($1 \le j \le \theta_L$). The encoding vector $\psi_i$ for storage node $i$ is the $i$th row of $\Psi$ in equation (4). Therefore, we have the following theorem.

Theorem 1.

The encoding of the 2-layer rate-matched MSR code can achieve the MSR point in equation (2) since both the full rate code and the fractional rate code are MSR codes.
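The random concatenation of equation (16) and the recorded permutation order can be sketched as follows. This is a minimal illustration, not the authors' implementation: code matrices are plain lists of rows, the function names are hypothetical, and a seeded `random.Random` stands in for the secure server's randomness (a deployment would more plausibly use `random.SystemRandom` or another cryptographically strong source).

```python
import random

def concatenate_permuted(full_blocks, frac_blocks, rng):
    """Return (C_M, order): the concatenated codeword matrix and the secret order.

    Each block is an n x alpha matrix given as a list of n rows.
    order[pos] = (layer, j) says which block sits at column group pos.
    """
    tagged = [("H", j, blk) for j, blk in enumerate(full_blocks)] + \
             [("L", j, blk) for j, blk in enumerate(frac_blocks)]
    rng.shuffle(tagged)                               # Perm(.) of equation (16)
    n = len(tagged[0][2])
    c_m = [[sym for _, _, blk in tagged for sym in blk[i]] for i in range(n)]
    order = [(layer, j) for layer, j, _ in tagged]    # kept by the secure server
    return c_m, order

def split_row(row, order, alpha):
    """Use the recorded order to cut one node's row back into per-block pieces."""
    return {key: row[pos * alpha:(pos + 1) * alpha] for pos, key in enumerate(order)}

# Example with n = 3 nodes, alpha = 2, two full rate blocks and one fractional block.
H = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
L = [[[13, 14], [15, 16], [17, 18]]]
C_M, order = concatenate_permuted(H, L, random.Random(7))
pieces = split_row(C_M[1], order, alpha=2)
```

Only a party that knows `order` (the secure server, the DC, or a freshly initialized replacement node) can tell which column groups belong to the fractional rate code.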


C. Regeneration

Suppose node $z$ fails. The secure server will initialize a replacement node $z'$ with the order information of the fractional rate code and the full rate code in the 2-layer rate-matched MSR code. Then the replacement node $z'$ will send regeneration requests to the remaining $n - 1$ helper nodes. Upon receiving the regeneration request, helper node $i$ will calculate and send out the help symbol $p_i = c_i \phi_z^T$. $z'$ will perform Algorithm 2 to regenerate the contents of the failed node $z$. After the regeneration is finished, $z'$ will erase the order information. So even if $z'$ is compromised later, the adversary will not get the permutation order of the fractional rate code and the full rate code.

Algorithm 2. $z'$ regenerates symbols of the failed node $z$ for the 2-layer rate-matched MSR code

Step 1: According to the order information, regenerate all the symbols related to the $\theta_L$ data blocks encoded by the fractional rate code, using Algorithm 1. If errors are detected in the symbols sent by node $i$, it will be marked as a malicious node.

Step 2: Regenerate all the symbols related to the $\theta_H$ data blocks encoded by the full rate code, using Algorithm 1. During the regeneration, all the symbols sent from nodes marked as malicious nodes will be replaced by erasures $\otimes$.

It is easy to see that Algorithm 2 can correct errors and locate malicious nodes using the fractional rate code while achieving high storage efficiency using the full rate code. We summarize the result as the following theorem.

Theorem 2.

For regeneration, the 2-layer rate-matched MSR code can correct errors from $\lfloor (n - xd - 1)/2 \rfloor$ malicious nodes.

D. Parameters Optimization

We have the following design requirements for a given distributed storage system applying the 2-layer rate-matched MSR code:
• The maximum number of malicious nodes $M$ that the system can detect and locate using the fractional rate code. We have
$$\lfloor (n - xd - 1)/2 \rfloor = M. \tag{17}$$
• The probability $P_{det}$ that the system can detect all the malicious nodes. The detection will be successful if each malicious node modifies at least one help symbol corresponding to the fractional rate code and sends it to the replacement node. Suppose the malicious nodes modify each help symbol to be sent to the replacement node with probability $P$; we have
$$(1 - (1 - P)^{\theta_L})^M \ge P_{det}. \tag{18}$$

So there is a trade-off between $\theta_L$ and $\theta_H$: the number of data blocks encoded by the fractional rate code and the number of data blocks encoded by the full rate code. If we encode using too much full rate code, we may not meet the detection probability requirement $P_{det}$. If too much fractional rate code is used, the redundancy may be too high.

The storage efficiency is defined as the ratio between the actual size of the data to be stored and the total storage space needed by the encoded data:

$$\delta_S = \frac{\theta_H B_H + \theta_L B_L}{(\theta_H + \theta_L) n \alpha} = \frac{B_F}{(\theta_H + \theta_L) n \alpha}. \tag{19}$$

Thus we can calculate the optimized parameters $x$, $d$, $\theta_H$, $\theta_L$ by maximizing equation (19) under the constraints defined by equations (14), (15), (17), (18). $d$ and $x$ can be determined by equations (14) and (17):

$$d = n - M - 1, \tag{20}$$

$$x = (n - 2M - 1)/(n - M - 1). \tag{21}$$

Since $B_F$ is constant, maximizing $\delta_S$ is equivalent to minimizing $\theta_H + \theta_L$. So we can rewrite the optimization problem as follows:

$$\text{Minimize } \theta_H + \theta_L, \text{ subject to (15) and (18)}. \tag{22}$$

This is a simple linear programming problem. Here we show the optimization results directly:

$$\theta_L = \log_{(1-P)}\left(1 - P_{det}^{1/M}\right), \tag{23}$$

$$\theta_H = (B_F - \theta_L B_L)/B_H. \tag{24}$$

In this paper we assume that we are storing large files, which means $B_F > \theta_L B_L$. So an optimal solution for the 2-layer rate-matched MSR code can always be found. We have the following theorem:

Theorem 3.

When the number of blocks of the fractional rate code $\theta_L$ equals $\log_{(1-P)}(1 - P_{det}^{1/M})$ and the number of blocks of the full rate code $\theta_H$ equals $(B_F - \theta_L B_L)/B_H$, the 2-layer rate-matched MSR code can achieve the optimal storage efficiency.

E. Reconstruction

When the DC needs to reconstruct the original file, it will send reconstruction requests to $n$ storage nodes. Upon receiving the request, node $i$ will send out the symbol vector $c_i$. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha(\theta_L + \theta_H)} \setminus \{\mathbf{0}\}$. Since the DC has the permutation information of the fractional rate code and the full rate code, similar to the regeneration of the 2-layer rate-matched MSR code, the DC will perform the reconstruction using Algorithm 3.

Algorithm 3. DC reconstructs the original file for the 2-layer rate-matched MSR code

Step 1: According to the order information, reconstruct each of the $\theta_L$ data blocks encoded by the fractional rate code and locate the malicious nodes.

Step 2: Reconstruct each of the data blocks encoded by the full rate code. During the reconstruction, all the symbols sent from malicious nodes will be replaced by erasures $\otimes$.

In Section V-D, we optimized the parameters for the data regeneration, considering the trade-off between the successful malicious node detection probability and the storage efficiency. For data reconstruction, we have the following theorem:

Theorem 4 (Optimized Parameters). When the number of blocks of the fractional rate code $\theta_L$ equals $\log_{(1-P)}(1 - P_{det}^{1/M})$ and the number of blocks of the full rate code $\theta_H$ equals $(B_F - \theta_L B_L)/B_H$, the 2-layer rate-matched MSR code can guarantee that the same constraints as for data regeneration (equations (17), (18)) are satisfied for the data reconstruction.

Proof: The maximum number of malicious nodes that can be detected for the data reconstruction is no smaller than $M$: if $x > 0.5$, the number is $\lfloor (n - \alpha - 1)/2 \rfloor$, and we have $\lfloor (n - \alpha - 1)/2 \rfloor \ge \lfloor (n - xd - 1)/2 \rfloor = M$. If $x \le 0.5$, the number is $\lfloor (n - xd)/2 \rfloor$, and we have $\lfloor (n - xd)/2 \rfloor \ge \lfloor (n - xd - 1)/2 \rfloor = M$.

The successful malicious node detection probability for the data reconstruction is larger than $P_{det}$: the probability is $(1 - (1 - P)^{\alpha\theta_L})^M$, so we have $(1 - (1 - P)^{\alpha\theta_L})^M > (1 - (1 - P)^{\theta_L})^M \ge P_{det}$.
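The optimized parameters of Theorems 3 and 4 can be evaluated numerically with the sketch below, which follows equations (19) to (24). Several points are assumptions of this sketch rather than statements of the paper: block counts are rounded up to whole blocks, and the example uses $n = 30$, $M = 11$ from Section V-F together with $P = 0.2$ and $B_F = 14000$ symbols, values chosen here because they reproduce the block counts plotted in Fig. 1. The function names are hypothetical.

```python
import math
from fractions import Fraction

def optimize_two_layer(n, M, P, P_det, B_F):
    """Return (d, x, theta_L, theta_H, delta_S) following equations (19)-(24)."""
    d = n - M - 1                                  # equation (20)
    assert d % 2 == 0, "alpha = d/2 must be an integer"
    w = n - 2 * M - 1                              # x*d, from (17) with n - xd - 1 = 2M
    x = Fraction(w, d)                             # equation (21)
    alpha = d // 2
    B_H = alpha * (alpha + 1)
    if 2 * w <= d:                                 # x <= 0.5 branch of equation (9)
        B_L = w * (w + 1) // 2
    else:                                          # x > 0.5 branch of equation (9)
        v = w - alpha
        B_L = alpha * (alpha + 1) // 2 + v * (v + 1) // 2
    # Equation (23); expm1/log1p keep precision when P_det is close to 1.
    target = -math.expm1(math.log(P_det) / M)      # 1 - P_det**(1/M)
    theta_L = math.ceil(math.log(target) / math.log1p(-P))
    theta_H = -(-(B_F - theta_L * B_L) // B_H)     # equation (24), rounded up
    delta_S = B_F / ((theta_H + theta_L) * n * alpha)   # equation (19)
    return d, x, theta_L, theta_H, delta_S

def detection_probability(P, symbols_per_node, M):
    """Left-hand side of equation (18) for a given number of symbols per node."""
    return (1 - (1 - P) ** symbols_per_node) ** M

# Example under the assumptions stated above.
d, x, theta_L, theta_H, delta_S = optimize_two_layer(30, 11, 0.2, 0.99, 14000)
# Regeneration uses theta_L help symbols per node; reconstruction uses alpha*theta_L.
p_regen = detection_probability(0.2, theta_L, 11)
p_recon = detection_probability(0.2, (d // 2) * theta_L, 11)
```

Under these assumptions the sketch gives $d = 18$, $x = 7/18$, $\theta_L = 32$ and $\theta_H = 146$ for $P_{det} = 0.99$, and `p_recon` is at least as large as `p_regen`, in line with the proof of Theorem 4.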

Although the rate-matching equation (14) does not apply to the data reconstruction, the reconstruction strategy in Algorithm 3 can still benefit from the different rates of the two codes. When $x \le 0.5$, the fractional rate code can detect and correct $\lfloor (n - xd)/2 \rfloor$ malicious nodes, which is at least as many as the $\lfloor (n - d/2 - 1)/2 \rfloor$ malicious nodes that the full rate code can detect. When $x > 0.5$, the full rate code and the fractional rate code can detect and correct the same number of malicious nodes: $\lfloor (n - \alpha - 1)/2 \rfloor$.

From the analysis above we can see that the same optimized parameters, obtained for the data regeneration, maintain the optimized trade-off between malicious node detection and storage efficiency for the data reconstruction.

F. Performance Evaluation

From the analysis above, we know that for a distributed storage system with $n$ storage nodes, of which at most $M$ are malicious, the 2-layer rate-matched MSR code can guarantee detection and correction of the malicious nodes during data regeneration and reconstruction with probability at least $P_{det}$.

For a distributed storage system with $n = 30$, $M = 11$ and $P = 0.[…]$, suppose we have a file of size $B_F = 14000$M symbols to be stored in the system. The number of fractional rate code blocks $\theta_L$ and the number of full rate code blocks $\theta_H$ for different detection probabilities $P_{det}$ are shown in Fig. 1. From the figure we can see that the number of fractional rate code blocks increases as the detection probability becomes larger. Accordingly, the number of full rate code blocks decreases.

For the universally resilient MSR code constructed in [2], the efficiency of the code with the same regeneration performance as the 2-layer rate-matched MSR code is defined as

$$\delta'_S = \frac{\alpha'(\alpha' + 1)}{\alpha' n} = \frac{\alpha' + 1}{n} = \frac{xd/2 + 1}{n}. \tag{25}$$

Fig. 2 shows the efficiency ratios $\eta = \delta_S/\delta'_S$ between the 2-layer rate-matched MSR code and the universally resilient MSR code under different detection probabilities $P_{det}$. From the figure we can see that the 2-layer rate-matched MSR code has higher efficiency than the universally resilient MSR code. When the successful malicious node detection probability is […], the efficiency of the 2-layer rate-matched MSR code is about […] higher.

Fig. 1. The number of fractional/full rate code blocks for different $P_{det}$. (Horizontal axis: $P_{det}$; vertical axis: number of data blocks. Labeled points for $\theta_L$: (0.99, 32), (0.999, 42), (0.9999, 53), (0.99999, 63), (0.999999, 73). Labeled points for $\theta_H$: (0.99, 146), (0.999, 143), (0.9999, 140), (0.99999, 136), (0.999999, 133).)

VI. m-LAYER RATE-MATCHED REGENERATING CODE

In this section, we present our second optimization of the rate-matched MSR code: the $m$-layer rate-matched MSR code. In the code design, we extend the design concept of the 2-layer rate-matched MSR code. Instead of encoding the data using two MSR codes with different match factors, we utilize $m$ layers of full rate MSR codes with different values of the parameter $d$, written as $d_i$ for layer $L_i$, $1 \le i \le m$, which satisfy

$$d_i \le d_j, \quad \forall\, 1 \le i \le j \le m. \tag{26}$$

The data will be divided into $m$ parts and each part will be encoded by a distinct full rate MSR code. According to the analysis above, a code with a lower code rate has better error correction capability.

The codewords will be decoded layer by layer, in order from layer $L_1$ to layer $L_m$. That is, the codewords encoded by the full rate MSR code with a lower $d$ are decoded prior to those encoded by the full rate MSR code with a higher $d$. If errors are found by the full rate MSR code with a lower $d$, the corresponding nodes are marked as malicious. The symbols sent from these nodes are treated as erasures in the subsequent decoding of the full rate MSR codes with higher $d$'s. The purpose of this arrangement is to correct as many erroneous symbols sent by malicious nodes as possible, and to locate the corresponding malicious nodes, using the full rate MSR code with a lower rate. However, the rates of the $m$ full rate MSR codes must match to achieve optimal performance. Here we mainly focus on rate-matching for data regeneration. We will see in the later analysis that the performance of data reconstruction can also be improved with this design criterion.

The main idea of this optimization is to optimize the overall error correction capability by matching the code rates of the different full rate MSR codes.

Fig. 2. Efficiency ratios between the 2-layer rate-matched MSR code and the normal error correction MSR code for different $P_{det}$. (Horizontal axis: $P_{det}$; vertical axis: efficiency ratio $\eta$. Labeled points: (0.99, 1.95), (0.999, 1.87), (0.9999, 1.80), (0.99999, 1.74), (0.999999, 1.68).)

A. Rate Matching and Parameters Optimization

According to Section IV-A2, the full rate MSR code $CH_i$ for layer $L_i$ can be viewed as an $(n - 1, d_i, n - d_i)$ MDS code for $1 \le i \le m$. During the optimization, we set the summation of the $d$'s of all the layers to a constant $d_0$:

$$\sum_{i=1}^{m} d_i = d_0. \tag{27}$$

Here we first show the optimization through an illustrative example. Then we present the general result.

1) Optimization for $m = 3$: There are three layers of full rate MSR codes for $m = 3$: $CH_1$, $CH_2$ and $CH_3$.

The first layer code $CH_1$ can correct $t_1$ errors:

$$t_1 = \lfloor (n - d_1 - 1)/2 \rfloor = (n - d_1 - 1 - \varepsilon_1)/2, \tag{28}$$

where $\varepsilon_1 = 0$ or $1$ depending on whether $n - d_1 - 1$ is even or odd.

By regarding the symbols from the $t_1$ nodes where errors are found by $CH_1$ as erasures, the second layer code $CH_2$ can correct $t_2$ errors:

$$t_2 = \lfloor (n - d_2 - 1 - t_1)/2 \rfloor + t_1 = (n - d_2 - 1 - t_1 - \varepsilon_2)/2 + t_1 = \bigl(2(n - d_2) + n - d_1 - 2\varepsilon_2 - \varepsilon_1 - 3\bigr)/4, \tag{29}$$

where $\varepsilon_2 = 0$ or $1$, with the restriction that $n - d_2 - 1 \ge t_1$, which can be written as:

$$-d_1 + 2d_2 \le n + \varepsilon_1 - 1. \tag{30}$$

The third layer code $CH_3$ also treats the symbols from the $t_2$ nodes as erasures. $CH_3$ can correct $t_3$ errors:

$$t_3 = \lfloor (n - d_3 - 1 - t_2)/2 \rfloor + t_2 = (n - d_3 - 1 - t_2 - \varepsilon_3)/2 + t_2 = \bigl(4(n - d_3) + 2(n - d_2) + n - d_1 - 4\varepsilon_3 - 2\varepsilon_2 - \varepsilon_1 - 7\bigr)/8, \tag{31}$$

where $\varepsilon_3 = 0$ or $1$, with the restriction that $n - d_3 - 1 \ge t_2$, which can be written as:

$$-d_1 - 2d_2 + 4d_3 \le n + \varepsilon_1 + 2\varepsilon_2 - 1. \tag{32}$$

According to the analysis above, the $d$'s of the three layers satisfy:

$$d_1 - d_2 \le 0, \tag{33}$$

$$d_2 - d_3 \le 0. \tag{34}$$

And we can rewrite equation (27) as:

$$d_1 + d_2 + d_3 \le d_0, \tag{35}$$

$$-d_1 - d_2 - d_3 \le -d_0. \tag{36}$$

To maximize the error correction capability of the $m$-layer rate-matched MSR code for $m = 3$, we have to maximize $t_3$, the number of errors that the third layer code $CH_3$ can correct, since $t_3$ includes all the malicious nodes from which errors are found by the codes of the first two layers. With all the constraints listed above, the optimization problem can be written as:

$$\text{Maximize } t_3 \text{ in (31), subject to (30), (32), (33), (34), (35), (36).} \tag{37}$$

We have now turned this optimization problem into a typical linear programming problem, which has a feasible solution. We solve it using the SIMPLEX algorithm [34]. When $d_1 = d_2 = d_3 = \mathrm{Round}(d_0/3) = \tilde{d}$, the $m$-layer rate-matched MSR code can correct errors from at most

$$\tilde{t}_3 = (7n - 7\tilde{d} - 4\varepsilon_3 - 2\varepsilon_2 - \varepsilon_1 - 7)/8 \ge (7n - 7\tilde{d} - 14)/8 \quad \text{(worst case)} \tag{38}$$

malicious nodes, where $\mathrm{Round}(\cdot)$ is the rounding operation.
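
The layered recursion and the equal-split result for $m = 3$ can be checked on a small instance. The sketch below is illustrative only: it replaces the SIMPLEX algorithm used in the text with an exhaustive search over integer splits, and the values $n = 30$, $d_0 = 30$ are assumed for the example. The function names are hypothetical.

```python
from itertools import product


def layered_capacity(n, ds):
    """Number of correctable errors after decoding the layers in order.

    Uses t_1 = floor((n - d_1 - 1)/2) and
    t_i = floor((n - d_i - 1 - t_{i-1})/2) + t_{i-1}.
    Returns None if a restriction n - d_i - 1 >= t_{i-1} is violated.
    """
    t = 0
    for d in ds:
        slack = n - d - 1 - t
        if slack < 0:
            return None
        t = slack // 2 + t
    return t


def best_split(n, d0, m):
    """Exhaustively search non-decreasing (d_1, ..., d_m) with sum d0 for the largest t_m."""
    best_t, best_ds = -1, None
    for ds in product(range(1, d0 + 1), repeat=m):
        if sum(ds) != d0 or list(ds) != sorted(ds):
            continue
        t = layered_capacity(n, ds)
        if t is not None and t > best_t:
            best_t, best_ds = t, ds
    return best_t, best_ds


if __name__ == "__main__":
    n, d0, m = 30, 30, 3  # assumed example values
    print("equal split:", layered_capacity(n, (10, 10, 10)))
    print("best found :", best_split(n, d0, m))
```

On this instance the equal split $d_1 = d_2 = d_3 = 10$ reaches the maximum found by the search. Other splits, such as $(9, 10, 11)$, can tie with it because of the floor operations, but none exceeds it.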

2) Evaluation of the Optimization for $m = 3$: Similar to the storage efficiency $\delta_S$ defined in Section V, here we define the error correction efficiency $\delta_C$ of the $m$-layer rate-matched MSR code as the ratio between the maximum number of malicious nodes that can be found and the total number of storage nodes in the network:

$$\delta_C = (7n - 7\tilde{d} - 14)/(8n). \tag{39}$$

The universally resilient MSR code with the same code rate can be viewed as an $(n - 1, \tilde{d}, n - \tilde{d})$ MDS code, which can correct errors from at most $(n - \tilde{d} - 1)/2$ malicious nodes (best case). So its error correction efficiency $\delta'_C$ is

$$\delta'_C = (n - \tilde{d} - 1)/(2n). \tag{40}$$

The comparison of the error correction capability between the $m$-layer rate-matched MSR code for $m = 3$ and the universally resilient MSR code is shown in Fig. 3. In this comparison, we set the number of storage nodes in the network to $n = 30$. From the figure we can see that the $m$-layer rate-matched MSR code for $m = 3$ improves the error correction efficiency by more than […].

Fig. 3. Comparison of the error correction capability between the $m$-layer rate-matched MSR code for $m = 3$ and the universally resilient MSR code. (Horizontal axis: $d$; vertical axis: error correction efficiency; curves: $\delta_C$ and $\delta'_C$.)
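
As a worked instance of equations (39) and (40), take $n = 30$ as in the comparison and assume $\tilde{d} = 10$. This value is chosen here only for illustration and is not a data point read from the figure.

```latex
\begin{align*}
\delta_C  &= \frac{7n - 7\tilde{d} - 14}{8n}
           = \frac{210 - 70 - 14}{240} = \frac{126}{240} = 0.525, \\
\delta'_C &= \frac{n - \tilde{d} - 1}{2n}
           = \frac{30 - 10 - 1}{60} = \frac{19}{60} \approx 0.317, \\
\frac{\delta_C}{\delta'_C} &\approx 1.66 .
\end{align*}
```

Note that this compares the worst case of the 3-layer code with the best case of the universally resilient code, so the ratio is a conservative one for this sample point.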

3) General Optimization Result: For the general $m$-layer rate-matched MSR code, the optimization process is similar.

The first layer code $CH_1$ can correct $t_1$ errors as in equation (28). By regarding the symbols from the $t_{i-1}$ nodes where errors are found by $CH_{i-1}$ as erasures, the $i$th layer code can correct $t_i$ errors for $2 \le i \le m$:

$$t_i = \lfloor (n - d_i - 1 - t_{i-1})/2 \rfloor + t_{i-1} = (n - d_i - 1 - t_{i-1} - \varepsilon_i)/2 + t_{i-1} = \Bigl(\sum_{j=1}^{i} 2^{j-1}(n - d_j) - \sum_{j=1}^{i} 2^{j-1}\varepsilon_j - 2^i + 1\Bigr)/2^i, \tag{41}$$

where $\varepsilon_i = 0$ or $1$, with the restriction that $n - d_i - 1 \ge t_{i-1}$, which can be written as:

$$-\sum_{j=1}^{i-1} 2^{j-1} d_j + 2^{i-1} d_i \le n + \sum_{j=1}^{i-1} 2^{j-1}\varepsilon_j - 1. \tag{42}$$

Similarly, the parameter $d$ of the $i$th layer for $2 \le i \le m$ must satisfy

$$d_{i-1} - d_i \le 0. \tag{43}$$

And equation (27) can be written as:

$$\sum_{j=1}^{m} d_j \le d_0, \tag{44}$$

$$-\sum_{j=1}^{m} d_j \le -d_0. \tag{45}$$

We can maximize the error correction capability of the $m$-layer rate-matched MSR code by maximizing $t_m$. With all the constraints listed above, the optimization problem can be written as:

$$\text{Maximize } t_i \text{ for } i = m \text{ in (41), subject to (42) and (43) for } 2 \le i \le m, \text{ (44), (45).} \tag{46}$$

After verifying that this linear programming problem has a feasible solution, we can use the SIMPLEX algorithm to solve it. The optimization result can be summarized as follows:

Theorem 5. For the regeneration of the $m$-layer rate-matched MSR code, when

$$d_i = \mathrm{Round}(d_0/m) = \tilde{d} \quad \text{for } 1 \le i \le m, \tag{47}$$

it can correct errors from at most

$$\tilde{t}_m = \Bigl((2^m - 1)(n - \tilde{d}) - \sum_{j=1}^{m} 2^{j-1}\varepsilon_j - 2^m + 1\Bigr)/2^m \ge \bigl((2^m - 1)(n - \tilde{d}) - 2^{m+1} + 2\bigr)/2^m \quad \text{(worst case)} \tag{48}$$

malicious nodes.

The error correction efficiency for the $m$-layer rate-matched MSR code is

$$\delta_C = \bigl((2^m - 1)(n - \tilde{d}) - 2^{m+1} + 2\bigr)/(2^m n). \tag{49}$$

This is a monotonically increasing function of $m$, so we have:

Corollary 1. The error correction efficiency of the $m$-layer rate-matched MSR code increases with $m$, the number of layers.
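
The monotonic behavior stated in Corollary 1 can be checked directly from equation (49). The sketch below is a minimal illustration using exact rational arithmetic. The helper `upper_bound` is not from the text: it is the limit of (49) as $m$ grows, obtained by factoring the numerator as $(2^m - 1)(n - \tilde{d} - 2)$, and the example values of $n$ and $\tilde{d}$ are assumptions.

```python
from fractions import Fraction


def delta_c(n: int, d_tilde: int, m: int) -> Fraction:
    """Worst-case error correction efficiency of equation (49), as an exact fraction."""
    numerator = (2**m - 1) * (n - d_tilde) - 2 ** (m + 1) + 2
    return Fraction(numerator, 2**m * n)


def upper_bound(n: int, d_tilde: int) -> Fraction:
    """Limit of delta_c as m grows: (n - d_tilde - 2) / n (derived here by factoring (49))."""
    return Fraction(n - d_tilde - 2, n)


if __name__ == "__main__":
    n, d_tilde = 30, 10  # assumed example values
    for m in range(1, 9):
        eff = delta_c(n, d_tilde, m)
        # Print m, the efficiency, and its fraction of the limiting value.
        print(m, float(eff), float(eff / upper_bound(n, d_tilde)))
```

Because (49) factors as $(1 - 2^{-m})(n - \tilde{d} - 2)/n$, the efficiency increases with $m$ whenever $n - \tilde{d} - 2 > 0$, and the fraction of the limiting value reached after $m$ layers is $1 - 2^{-m}$, independent of $n$ and $\tilde{d}$.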

Remark 1. During the optimization, we set the code rate of the rate-matched MSR code to a constant value and maximize the error correction capability. To optimize the rate-matched MSR code, we can also set the error correction capability $t_i$ for $i = m$ in (41) to a constant value

$$t_m = t_0 \tag{50}$$

and maximize the code rate. The problem can be written as:

$$\text{Maximize } \sum_{j=1}^{m} d_j \text{ subject to (42) and (43) for } 2 \le i \le m, \text{ (50).} \tag{51}$$

The optimization result is the same as that of (46): when all the $d_i$'s for $1 \le i \le m$ are the same, the code rate is maximized. $d_i$, $1 \le i \le m$, satisfies the following inequality:

$$d_i \ge n - \frac{2^m t_0 + 2^{m+1} - 2}{2^m - 1} \quad \text{(worst case)}. \tag{52}$$
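
The bound in (52) can be recovered from (48) in a few steps. The derivation below is a sketch under the assumption that all layers use the same $\tilde{d}$ and that the fixed capability in (50) is written $t_0$. The numeric check uses the assumed values $m = 3$, $n = 30$, $t_0 = 16$.

```latex
\begin{align*}
2^m t_0 &= (2^m - 1)(n - \tilde{d}) - \sum_{j=1}^{m} 2^{j-1}\varepsilon_j - 2^m + 1
  && \text{(from (48) with } \tilde{t}_m = t_0\text{)} \\
\tilde{d} &= n - \frac{2^m t_0 + \sum_{j=1}^{m} 2^{j-1}\varepsilon_j + 2^m - 1}{2^m - 1} \\
          &\ge n - \frac{2^m t_0 + 2^{m+1} - 2}{2^m - 1}
  && \text{(since } \textstyle\sum_{j=1}^{m} 2^{j-1}\varepsilon_j \le 2^m - 1\text{)} \\[4pt]
\text{Check: } & 30 - \frac{8 \cdot 16 + 16 - 2}{7} = 30 - \frac{142}{7} \approx 9.71 \le 10 .
\end{align*}
```

In the check, $\tilde{d} = 10$ is the value for which three layers with $n = 30$ correct exactly 16 errors, so the worst-case bound is consistent with it.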

4) Evaluation of the Optimization: Although at the beginning of this section we proposed to decode the code with a lower rate first in the $m$-layer rate-matched MSR code, equation (55) shows that we get the optimized error correction capability when all the rates of the codes in the $m$-layer code are equal. However, this result is not in conflict with our assumption in equation (26).

a) Comparison with the Hermitian code based MSR code in [3]: The Hermitian code based MSR code (H-MSR code) in [3] has better error correction capability than the universally resilient MSR code. However, because the structure of the underlying Hermitian code is predetermined, the error correction capability might not be optimal. Fig. 4 shows the maximum number of malicious nodes from which the errors can be corrected by the H-MSR code. Here we set the parameter $q$ of the Hermitian code [35] from 4 to 16 with a step of 2. In the figure, we also plot the performance of the $m$-layer rate-matched MSR code with the same code rates as the H-MSR code. The comparison demonstrates that the rate-matched MSR code has better error correction capability than the H-MSR code. Moreover, the rate-matched code is easier to understand and more flexible than the H-MSR code.

Fig. 4. Comparison of error correction capability between the $m$-layer rate-matched MSR code and the H-MSR code. (Vertical axis: maximum number of malicious nodes from which the errors can be corrected; curves: Rate-matched MSR, H-MSR, Normal Error Correction MSR.)

b) Number of layers and error correction efficiency: Since we have seen the advantage of the rate-matched MSR code over the universally resilient MSR code in Section VI-A2, here we mainly discuss how the number of layers affects the error correction efficiency. The error correction efficiency of the $m$-layer rate-matched MSR code is shown in Fig. 5, where we set $n = 30$ and $d_0 = 50$. We also plot the error correction efficiency $\delta'_C$ of the universally resilient MSR code with the same code rates for comparison. From the figure we can see that when $n$ and $d_0$ are fixed, the optimal error correction efficiency increases with the number of layers $m$, as in Corollary 1.

c) Optimized storage capacity: Moreover, the optimization condition in equation (55) also leads to maximum storage capacity besides the optimal error correction capability. We have the following theorem:

Theorem 6. The $m$-layer rate-matched MSR code can achieve the maximum storage capacity if the parameters $d$ of all the layers are the same, under the constraint in equation (27).

Proof: The code of the $i$th layer can store one block of data of size $B_i = \alpha_i(\alpha_i + 1) = (d_i/2)(d_i/2 + 1)$. So the $m$-layer code can store data of size $B = \sum_{i=1}^{m} (d_i/2)(d_i/2 + 1)$. Our goal here is to maximize $B$ under the constraint in equation (27). We can use Lagrange multipliers to find the point of maximum $B$. Let

$$\Lambda_L(d_1, \ldots, d_m, \lambda) = \sum_{i=1}^{m} (d_i/2)(d_i/2 + 1) - \lambda\Bigl(\sum_{i=1}^{m} d_i - d_0\Bigr). \tag{53}$$

We can find the maximum value of $B$ by setting the partial derivatives of this equation to zero:

$$\frac{\partial \Lambda_L}{\partial d_i} = \frac{d_i + 1}{2} - \lambda = 0, \quad \forall\, 1 \le i \le m. \tag{54}$$

Here we can see that when the parameters $d$ of all the layers are the same, we get the maximum storage capacity $B$. This maximization condition coincides with the optimization condition for achieving the goal of this section: optimizing the overall error correction capability of the rate-matched MSR code.

Fig. 5. The optimal error correction efficiency of the $m$-layer rate-matched MSR code under different $m$ for […] $\le m \le$ […]. (Vertical axis: error correction efficiency; curves: $\delta_C$ and $\delta'_C$.)

B. Practical Consideration of the Optimization

So far, we have implicitly presumed that there is only one data block of size $B_i = \alpha_i(\alpha_i + 1)$ for each layer $i$. In practical distributed storage, it is the parameter $d_i$ that is fixed instead of $d_0$, the summation of the $d_i$. However, as long as we use $m$ layers of MSR codes with the same parameter $d = \tilde{d}$, we still get the optimal solution for $d_0 = m\tilde{d}$. In fact, the $m$-layer rate-matched MSR code here becomes a single full rate MSR code with parameter $d = \tilde{d}$ and $m$ data blocks. Based on the dependent decoding idea described at the beginning of Section VI, we can achieve the optimal performance.

Fig. 6. The optimal error correction efficiency for […] $\le m \le$ […]. (Horizontal axis: $m$; vertical axis: error correction efficiency; curves: $\tilde{d} = 5$ and $\tilde{d} = 10$.)

So when the file size $B_F$ is larger than one data block size $\tilde{B}$ of the single full rate MSR code with parameter $d = \tilde{d}$, we divide the file into $\lceil B_F/\tilde{B} \rceil$ data blocks and encode them separately. If we decode these data blocks dependently, we get the optimal error correction efficiency.
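
The block count used above follows from the block size of a single full rate MSR code. The following minimal sketch assumes $\tilde{d}$ is even with $\tilde{d} = 2\alpha$ and $\tilde{B} = \alpha(\alpha + 1)$, and measures the file size in symbols. The function names and the example file size are hypothetical.

```python
def block_size(d_tilde: int) -> int:
    """Symbols per data block, B = alpha * (alpha + 1) with d_tilde = 2 * alpha."""
    if d_tilde <= 0 or d_tilde % 2 != 0:
        raise ValueError("this sketch assumes a positive, even d_tilde")
    alpha = d_tilde // 2
    return alpha * (alpha + 1)


def num_blocks(file_size: int, d_tilde: int) -> int:
    """Number of data blocks, ceil(B_F / B), computed with integer arithmetic."""
    size = block_size(d_tilde)
    return (file_size + size - 1) // size


if __name__ == "__main__":
    # Assumed example: a 1000-symbol file and d_tilde = 10.
    print(block_size(10), num_blocks(1000, 10))
```

When the file size is not a multiple of the block size, the last block is only partly filled, and how it is padded is left open here.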

1) Evaluation of the Optimal Error Correction Efficiency: In the practical case, $\tilde{d}$ in equation (49) is fixed. So here we study the relationship between the number of dependently decoded data blocks $m$ and the error correction efficiency $\delta_C$, which is shown in Fig. 6. We set $n = 30$ and $\tilde{d} = 5, 10$. From the figure we can see that although $\delta_C$ becomes higher as the number of dependently decoded data blocks $m$ increases, the efficiency improvement is negligible for $m \ge$ […]. In fact, when $m = 7$ the efficiency has already reached […] of the upper bound of $\delta_C$.

On the other hand, there exist parallel algorithms for fast MDS code decoding [36]. We can decode blocks of MDS codewords in parallel, in a pipeline fashion, to accelerate the overall decoding speed. The more blocks of codewords we decode in parallel, the faster we finish the whole decoding process. For large files that can be divided into a large number of data blocks ($\theta$ blocks), we can get a trade-off between the optimal error correction efficiency and the decoding speed by setting the number of dependently decoded data blocks $m$ and the number of parallel decoded data blocks $\rho$ under the constraint $\theta = m\rho$.

C. Encoding

From the analysis above we know that encoding a file of size $B_F$ using the optimal $m$-layer rate-matched MSR code amounts to encoding the file using a full rate MSR code with predetermined parameter $d = 2\alpha = \tilde{d}$. First the file is divided into $\theta$ blocks of data of size $\tilde{B}$, where $\theta = \lceil B_F/\tilde{B} \rceil$. Then the $\theta$ blocks of data are encoded into code matrices $CH_1, \ldots, CH_\theta$, which form the final $n \times \alpha\theta$ codeword matrix $CM = [CH_1, \ldots, CH_\theta]$. Each row $c_i = [ch_{1,i}, \ldots, ch_{\theta,i}]$, $0 \le i \le n - 1$, of the codeword matrix $CM$ is stored in storage node $i$, where $ch_{j,i}$ is the $i$th row of $CH_j$, $1 \le j \le \theta$. The encoding vector $\psi_i$ for storage node $i$ is the $i$th row of $\Psi$ in equation (4).

Theorem 7. The encoding of the $m$-layer rate-matched MSR code can achieve the MSR point in equation (2) since both the full rate code and the fractional code are MSR codes.

D. Regeneration

Suppose node $z$ fails. The replacement node $z'$ sends regeneration requests to the remaining $n - 1$ helper nodes. Upon receiving the regeneration request, helper node $i$ calculates and sends out the help symbol $p_i = c_i \phi_z^T$.

As discussed above, combining dependent decoding and parallel decoding achieves a trade-off between optimal error correction efficiency and decoding speed. Although all $\theta$ blocks of data are encoded with the same MSR code, $z'$ places the received help symbols into a 2-dimensional lattice of size $m \times \rho$, as shown in Fig. 7. In each grid of the lattice there are $n - 1$ help symbols corresponding to one data block, received from the $n - 1$ helper nodes. We can view each row of the lattice as a layer of an $m$-layer rate-matched MSR code with $\rho$ blocks of data, which are decoded in parallel. We can also view each column of the lattice as the $m$ layers of an $m$-layer rate-matched MSR code with one block of data in each layer, which are decoded dependently. $z'$ performs Algorithm 4 to regenerate the contents of the failed node $z$.

Fig. 7. Lattice of received help symbols for regeneration. (Layer 1 holds data blocks $1, 2, \ldots, \rho$; Layer 2 holds data blocks $\rho + 1, \rho + 2, \ldots, 2\rho$; Layer $m$ holds data blocks $(m-1)\rho + 1, (m-1)\rho + 2, \ldots, m\rho$. Each row is decoded in parallel; each column is decoded dependently. Note: in each grid $i$ there are $n - 1$ help symbols received from $n - 1$ helper nodes, corresponding to data block $i$.)

Algorithm 4. $z'$ regenerates symbols of the failed node $z$ for the $m$-layer rate-matched MSR code

Arrange the received help symbols according to Fig. 7. Repeat the following steps from Layer 1 to Layer $m$:

Step 1: For a certain grid, if errors were detected in the symbols sent by node $i$ in previous layers of the same column, replace the symbol sent from node $i$ by an erasure $\bigotimes$.

Step 2: Regenerate in parallel all the symbols related to the $\rho$ data blocks using an algorithm similar to Algorithm 1, with only one difference: decode all the $\rho$ MDS codes in Step 1 of Algorithm 1 in parallel.

The error correction capability of the regeneration is described in Theorem 5.
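
The control flow of Algorithm 4 can be sketched as follows. This is an illustration of the lattice bookkeeping only, not an MDS decoder: `decode_block` is a hypothetical caller-supplied callback standing in for the decoding step, and it is assumed to return the set of nodes in which it found errors. The row-by-row numbering of blocks follows Fig. 7.

```python
def build_lattice(theta: int, m: int) -> list:
    """Arrange block ids 1..theta row by row into an m x rho lattice (theta = m * rho)."""
    if m <= 0 or theta % m != 0:
        raise ValueError("theta must equal m * rho for a positive integer rho")
    rho = theta // m
    return [[layer * rho + col + 1 for col in range(rho)] for layer in range(m)]


def dependent_decode(lattice, decode_block):
    """Process Layer 1..m in order, carrying flagged nodes down each column.

    decode_block(block_id, erased_nodes) is a stand-in for the real decoder.
    Returns one set of flagged (malicious) node ids per column.
    """
    rho = len(lattice[0])
    flagged = [set() for _ in range(rho)]
    for row in lattice:
        # Blocks within a row are independent, so a real system could
        # hand this inner loop to parallel workers.
        for col, block_id in enumerate(row):
            erased = frozenset(flagged[col])          # Step 1: erase known-bad nodes
            found = decode_block(block_id, erased)    # Step 2: decode this block
            flagged[col] |= set(found)
    return flagged
```

A usage example with a stub decoder: if node 3 corrupts block 1 and node 7 corrupts block 4 in a $3 \times 2$ lattice, then node 3 is treated as an erasure for blocks 3 and 5 (same column as block 1), and node 7 for block 6.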


E. Reconstruction

When the DC needs to reconstruct the original file, it sends reconstruction requests to the $n$ storage nodes. Upon receiving the request, node $i$ sends out the symbol vector $c_i$. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha\theta} \setminus \{0\}$. The strategy of combining dependent decoding and parallel decoding for reconstruction is similar to that for regeneration. The DC places the received symbols into a 2-dimensional lattice of size $m \times \rho$. The only difference is that in a grid of the lattice there are $n$ symbol vectors $ch'_{j,0}, \ldots, ch'_{j,n-1}$ corresponding to data block $j$, received from the $n$ storage nodes. The DC performs the reconstruction using Algorithm 5.

Algorithm 5. DC reconstructs the original file for the $m$-layer rate-matched MSR code

Arrange the received symbols similarly to Fig. 7, placing the received codeword matrix $CH'_j$ into grid $j$ instead of the help symbols received from the $n - 1$ helper nodes. Repeat the following steps from Layer 1 to Layer $m$:

Step 1: For a certain grid, if errors were detected in the symbols sent by node $i$ in previous layers of the same column, replace the symbols sent from node $i$ by erasures $\bigotimes$.

Step 2: Reconstruct in parallel all the symbols of the $\rho$ data blocks using an algorithm similar to that of Section IV-A3, with only one difference: decode all the MDS codes in Section IV-A3 in parallel.

For data reconstruction, we have the following theorem:

Theorem 8 (Optimized Parameters). For the reconstruction of the $m$-layer rate-matched MSR code, when

$$d_i = \mathrm{Round}(d_0/m) = \tilde{d} \quad \text{for } 1 \le i \le m, \tag{55}$$

the number of malicious nodes from which the errors can be corrected is maximized.

Proof: From Section VI-A we know that for regeneration of an optimal $m$-layer rate-matched MSR code, the parameters $d$ of all the layers are the same, which implies that the parameters $\alpha$ of all the layers are also the same. Since the optimization of regeneration is derived based on the decoding of $(n - 1, d, n - d)$ MDS codes, and in reconstruction we have to decode $(n - 1, \alpha, n - \alpha)$ MDS codes, if the parameters $\alpha$ of all the layers are the same, we achieve the same optimization results for reconstruction.

VII. CONCLUSION

In this paper, we develop two rate-matched regenerating codes for malicious node detection and correction in hostile networks: the 2-layer rate-matched regenerating code and the $m$-layer rate-matched regenerating code. We propose the encoding, regeneration and reconstruction algorithms for both codes. For the 2-layer rate-matched code, we optimize the parameters for the data regeneration, considering the trade-off between the malicious node detection probability and the storage efficiency. Theoretical analysis shows that the code can successfully detect and correct malicious nodes using the optimized parameters. Our analysis also shows that the code has higher storage efficiency compared to the universally resilient regenerating code ([…] higher for the detection probability […]). Then we extend the 2-layer code to the $m$-layer code and optimize the overall error correction efficiency by matching the code rate of each layer's regenerating code. Theoretical analysis shows that the optimized parameter also achieves the maximum storage capacity under the same constraint. Furthermore, analysis shows that compared to the universally resilient regenerating code, our code can improve the error correction efficiency by more than […].

REFERENCES

[1] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, pp. 4539–4551, 2010.
[2] K. Rashmi, N. Shah, K. Ramchandran, and P. Kumar, “Regenerating codes for errors and erasures in distributed storage,” in International Symposium on Information Theory (ISIT) 2012, pp. 1202–1206, 2012.
[3] J. Li, T. Li, and J. Ren, “Beyond the MDS bound in distributed cloud storage,” in INFOCOM, 2014 Proceedings IEEE, pp. 307–315, April 2014.
[4] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz, “Maintenance-free global data storage,” IEEE Internet Computing, vol. 5, pp. 40–49, 2001.
[5] R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G. M. Voelker, “Total recall: System support for automated availability management,” in Proc. Symp. Netw. Syst. Design Implementation, pp. 337–350, 2004.
[6] D. Cullina, A. G. Dimakis, and T. Ho, “Searching for minimum storage regenerating codes,” Available: arXiv:0910.2245, 2009.
[7] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran, “Explicit codes minimizing repair bandwidth for distributed storage,” in Information Theory Workshop (ITW), 2010 IEEE, pp. 1–5, 2010.
[8] C. Suh and K. Ramchandran, “Exact-repair MDS codes for distributed storage using interference alignment,” in […], pp. 161–165, 2010.
[9] Y. Wu, “A construction of systematic MDS codes with minimum repair bandwidth,” IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3738–3741, 2011.
[10] D. Papailiopoulos, J. Luo, A. Dimakis, C. Huang, and J. Li, “Simple regenerating codes: Network coding for cloud storage,” in INFOCOM, 2012 Proceedings IEEE, pp. 2801–2805, 2012.
[11] S. El Rouayheb and K. Ramchandran, “Fractional repetition codes for repair in distributed storage systems,” in […], pp. 1510–1517, 2010.
[12] I. Tamo, Z. Wang, and J. Bruck, “MDS array codes with optimal rebuilding,” in […], pp. 1240–1244, 2011.
[13] V. R. Cadambe, C. Huang, S. A. Jafar, and J. Li, “Optimal repair of MDS codes in distributed storage via subspace interference alignment,” Available: arXiv:1106.1250, 2011.
[14] D. Papailiopoulos, A. Dimakis, and V. Cadambe, “Repair optimal erasure codes through Hadamard designs,” IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 3021–3037, 2013.
[15] N. Shah, K. V. Rashmi, and P. Kumar, “A flexible class of regenerating codes for distributed storage,” in […], pp. 1943–1947, 2010.
[16] K. Shum and Y. Hu, “Existence of minimum-repair-bandwidth cooperative regenerating codes,” in […], pp. 1–6, 2011.
[17] A. Wang and Z. Zhang, “Exact cooperative regenerating codes with minimum-repair-bandwidth for distributed storage,” in INFOCOM, 2013 Proceedings IEEE, pp. 400–404, 2013.
[18] H. Hou, K. W. Shum, M. Chen, and H. Li, “Basic regenerating code: Binary addition and shift for exact repair,” in […], pp. 1621–1625, 2013.
[19] Y.-L. Chen, G.-M. Li, C.-T. Tsai, S.-M. Yuan, and H.-T. Chiao, “Regenerating code based P2P storage scheme with caching,” in ICCIT ’09. Fourth International Conference on Computer Sciences and Convergence Information Technology, 2009, pp. 927–932, 2009.
[20] Y. Wu, A. G. Dimakis, and K. Ramchandran, “Deterministic regenerating codes for distributed storage,” in […], 2007.
[21] A. Duminuco and E. Biersack, “A practical study of regenerating codes for peer-to-peer backup systems,” in ICDCS ’09. 29th IEEE International Conference on Distributed Computing Systems, 2009, pp. 376–384, June 2009.
[22] K. Shum, “Cooperative regenerating codes for distributed storage systems,” in […], pp. 1–5, 2011.
[23] Y. Wu and A. G. Dimakis, “Reducing repair traffic for erasure coding-based storage via interference alignment,” in IEEE International Symposium on Information Theory, 2009. ISIT 2009., pp. 2276–2280, 2009.
[24] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran, “Interference alignment in regenerating codes for distributed storage: Necessity and code constructions,” IEEE Transactions on Information Theory, vol. 58, pp. 2134–2158, 2012.
[25] K. Rashmi, N. Shah, and P. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,” IEEE Transactions on Information Theory, vol. 57, pp. 5227–5239, 2011.
[26] F. Oggier and A. Datta, “Byzantine fault tolerance of regenerating codes,” in […], pp. 112–121, 2011.
[27] S. Pawar, S. El Rouayheb, and K. Ramchandran, “Securing dynamic distributed storage systems against eavesdropping and adversarial attacks,” IEEE Transactions on Information Theory, vol. 57, pp. 6734–6753, 2011.
[28] Y. Han, R. Zheng, and W. H. Mow, “Exact regenerating codes for Byzantine fault tolerance in distributed storage,” in Proceedings IEEE INFOCOM, pp. 2498–2506, 2012.
[29] H. Chen and P. Lee, “Enabling data integrity protection in regenerating-coding-based cloud storage,” in […], pp. 51–60, 2012.
[30] C. Cachin and S. Tessaro, “Optimal resilience for erasure-coded Byzantine distributed storage,” in DSN 2006. International Conference on Dependable Systems and Networks, 2006, pp. 115–124, 2006.
[31] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie, “Lazy verification in fault-tolerant distributed storage systems,” in SRDS 2005. 24th IEEE Symposium on Reliable Distributed Systems, 2005, pp. 179–190, 2005.
[32] N. B. Shah, K. V. Rashmi, K. Ramchandran, and P. V. Kumar, “Privacy-preserving and secure distributed storage codes,” ∼ nihar/publications/privacy security.pdf/ .
[33] J. Li, T. Li, and J. Ren, “Secure regenerating code,” in IEEE GLOBECOM 2014, pp. 770–774, 2014.
[34] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 3rd ed., 2009.
[35] J. Ren, “On the structure of Hermitian codes and decoding for burst errors,” IEEE Transactions on Information Theory, vol. 50, pp. 2850–2854, 2004.
[36] D. Dabiri and I. Blake, “Fast parallel algorithms for decoding Reed-Solomon codes based on remainder polynomials,” IEEE Transactions on Information Theory, vol. 41, pp. 873–885, Jul 1995.