Optimal Construction of Regenerating Code through Rate-matching in Hostile Networks
Jian Li Tongtong Li Jian Ren
Abstract
Regenerating codes are a class of codes well suited to distributed storage systems, since they can maintain optimal bandwidth and storage space. Two important types of regenerating code have been constructed: the minimum storage regeneration (MSR) code and the minimum bandwidth regeneration (MBR) code. However, in hostile networks where adversaries can compromise storage nodes, the storage capacity of the network can be significantly affected. In this paper, we propose two optimal constructions of regenerating codes through rate-matching that can combat such adversaries in hostile networks: the 2-layer rate-matched regenerating code and the $m$-layer rate-matched regenerating code. For the 2-layer code, we can achieve the optimal storage efficiency for given system requirements. Our comprehensive analysis shows that our code can detect and correct malicious nodes with higher storage efficiency than the universally resilient regenerating code, which is a straightforward extension of regenerating code with error detection and correction capability. We then propose the $m$-layer code by extending the 2-layer code, and achieve the optimal error correction efficiency by matching the code rate of each layer's regenerating code. We also demonstrate that the optimized parameters can achieve the maximum storage capacity under the same constraint. Compared to the universally resilient regenerating code, our code can achieve much higher error correction efficiency.
Index Terms
Optimal regenerating code, MDS code, error-correction, adversary.
I. INTRODUCTION
Distributed storage is a popular method to store files securely without requiring data encryption. Instead of storing a file and its replicas on multiple servers, we can break the file into components and store the components on multiple servers. In this way, both the reliability and the security of the file can be increased. A typical approach is to encode the file using an $(n, k)$ Reed-Solomon (RS) code and distribute the encoded file to $n$ servers. To recover the file, we only need to collect the encoded parts from $k$ servers, which achieves a trade-off between reliability and efficiency. However, when repairing or regenerating the contents of a failed node, the whole file has to be recovered first, which wastes bandwidth.
(The authors are with the Department of ECE, Michigan State University, East Lansing, MI 48824-1226. Email: {lijian6, tongli, renjian}@msu.edu. November 4, 2015 DRAFT. arXiv [cs.IT].)
The concept of regenerating code was introduced in [1], where a replacement node is allowed to connect to some individual nodes directly and regenerate a substitute of the failed node, instead of first recovering the original data and then regenerating the failed component. Compared to the RS code, regenerating code achieves an optimal tradeoff between bandwidth and storage within the minimum storage regeneration (MSR) and the minimum bandwidth regeneration (MBR) points.
However, when malicious behaviors exist in the network, both the regeneration of the failed node and the reconstruction of the original file will fail. The error resilience of the Reed-Solomon code based regenerating code in the network with errors and erasures was analyzed in [2]. In our previous work, a Hermitian code based regenerating code was proposed to provide better error correction capability compared to the Reed-Solomon code based approach.
Inspired by the good performance of Hermitian code based regenerating codes, in this paper we take a step further and construct optimal regenerating codes that have a layered structure similar to that of the Hermitian code in distributed storage.
The main contributions of this paper are:
• We propose an optimal construction of the 2-layer rate-matched regenerating code. Both theoretical analysis and performance evaluation show that this code can achieve storage efficiency higher than the universally resilient regenerating code proposed in [2].
• We propose an optimal construction of the $m$-layer rate-matched regenerating code. The $m$-layer code can achieve higher error correction efficiency than the code proposed in [2] and the Hermitian code based regenerating code proposed in [3]. Furthermore, the $m$-layer code is easier to understand and more flexible than the Hermitian based code.
Here we focus on error correction and malicious node location in data regeneration and reconstruction in distributed storage. When no error occurs and no malicious node exists, data regeneration and reconstruction can be processed in the same way as in existing works.
It is worth noting that although there are two types of regenerating codes, the MSR code at the MSR point and the MBR code at the MBR point, in this paper we focus only on the optimization of the MSR code, for the following two reasons:
1) The processes and results of the optimization for these two codes are similar. The optimization for the MSR code can be directly applied to the MBR code with similar optimization results.
2) The differences between the constructions of the MSR code and the MBR code have little impact on the optimization proposed in this paper.
The rest of this paper is organized as follows. In Section II we introduce the related work. In Section III, the preliminaries of this paper are presented. In Section IV, we propose two component codes for the rate-matched regenerating codes. We propose and analyze the 2-layer rate-matched regenerating code in Section V. Then we propose and analyze the $m$-layer rate-matched regenerating code in Section VI. The paper is concluded in Section VII.
II. RELATED WORK
When a storage node fails in a distributed storage network that employs the conventional $(n, k)$ RS code (such as OceanStore [4] and Total Recall [5]), the replacement node connects to $k$ nodes and downloads the whole file to recover the symbols stored in the failed node. This approach wastes bandwidth because the whole file has to be downloaded to recover a fraction of it. To overcome this drawback, Dimakis et al. [1] introduced the concept of the $\{n, k, d, \alpha, \beta, B\}$ regenerating code based on network coding. In the context of regenerating code, the contents stored in a failed node can be regenerated by the replacement node through downloading $\gamma$ help symbols from $d$ helper nodes. The bandwidth consumption for the failed node regeneration can be far less than the whole file. A data collector (DC) can reconstruct the original file stored in the network by downloading $\alpha$ symbols from each of $k$ storage nodes. In [1], the authors proved that there is a tradeoff between the bandwidth $\gamma$ and the per node storage $\alpha$. They found two optimal points: the minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) points. There is now a large body of literature on the design of optimal regenerating codes [6]–[17]. The implementation of regenerating codes was studied in [18], [19].
The regenerating code can be divided into functional regeneration and exact regeneration. In functional regeneration, the replacement node regenerates a new component that can functionally replace the failed component instead of being the same as the originally stored component. [20] formulated data regeneration as a multicast network coding problem and constructed functional regenerating codes. [21] implemented random linear regenerating codes for distributed storage systems. [22] proved that by allowing data exchange among the replacement nodes, a better tradeoff between the repair bandwidth $\gamma$ and the per node storage $\alpha$ can be achieved.
In exact regeneration, the replacement node regenerates the exact symbols of a failed node. [23] proposed to reduce the regeneration bandwidth through algebraic alignment. [24] provided a code structure for exact regeneration using the interference alignment technique. [25] presented optimal exact constructions of MBR codes and MSR codes under the product-matrix framework. This is the first work that allows independent selection of the number of nodes $n$ in the network.
None of the works above considered code regeneration under node corruption or adversarial manipulation attacks in hostile networks. In fact, all these schemes will fail in both regeneration and reconstruction if there are nodes in the storage cloud sending out incorrect responses to the regeneration and reconstruction requests.
In [26], the Byzantine fault tolerance of regenerating codes was studied. In [27], the authors discussed the amount of information that can be safely stored against passive eavesdropping and active adversarial attacks based on the regeneration structure. In [28], the authors proposed to add CRC codes in the regenerating code to check the integrity of the data in hostile networks. Unfortunately, the CRC checks can also be manipulated by the malicious nodes, resulting in the failure of regeneration and reconstruction. In [29], the authors proposed to add data integrity protection in distributed storage. In [30], the authors proposed an erasure-coded distributed storage based on threshold cryptography. In [31], the authors analyzed the verification cost for both the client read and write operations in workloads with idle periods. In [2], the authors analyzed the error resilience of the RS code based regenerating code in the network with errors and erasures. They provided the theoretical error correction capability. In [3], the authors proposed a Hermitian code based regenerating code, which could provide better error correction capability.
In [32], the authors proposed the universally secure regenerating code to achieve information theoretic data confidentiality. However, the extra computational cost and bandwidth have to be considered for this code. In [33], the authors proposed to apply a linear feedback shift register (LFSR) to protect the data confidentiality.
III. PRELIMINARY AND ASSUMPTIONS
A. Regenerating Code
The regenerating code introduced in [1] is a linear code over the finite field $\mathbb{F}_q$ with a set of parameters $\{n, k, d, \alpha, \beta, B\}$. A file of size $B$ is stored in $n$ storage nodes, each of which stores $\alpha$ symbols. A replacement node can regenerate the contents of a failed node by downloading $\beta$ symbols from each of $d$ randomly selected storage nodes. So the total bandwidth needed to regenerate a failed node is $\gamma = d\beta$. The data collector (DC) can reconstruct the whole file by downloading $\alpha$ symbols from each of $k \le d$ randomly selected storage nodes. In [1], the following theoretical bound was derived:
$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \tag{1}$$
From equation (1), a trade-off between the regeneration bandwidth $\gamma$ and the storage requirement $\alpha$ was derived: $\gamma$ and $\alpha$ cannot be decreased at the same time. There are two special cases: the minimum storage regeneration (MSR) point, in which the storage parameter $\alpha$ is minimized,
$$(\alpha_{MSR}, \gamma_{MSR}) = \left(\frac{B}{k}, \frac{Bd}{k(d-k+1)}\right), \tag{2}$$
and the minimum bandwidth regeneration (MBR) point, in which the bandwidth $\gamma$ is minimized,
$$(\alpha_{MBR}, \gamma_{MBR}) = \left(\frac{2Bd}{2kd-k^2+k}, \frac{2Bd}{2kd-k^2+k}\right). \tag{3}$$
B. System Assumptions and Adversarial Model
In this paper, we assume there is a secure server that is responsible for encoding and distributing the data to storage nodes. Replacement nodes will also be initialized by the secure server. The DC and the secure server can be implemented on the same computer and can never be compromised. We use the notation $CH$/$CL$ to refer to either the full rate/fractional rate MSR code or a codeword of the full rate/fractional rate MSR code. The exact meaning is clear from the context.
We assume some network nodes may be corrupted due to hardware failure or communication errors, and/or be compromised by malicious users. As a result, upon request, these nodes may send out incorrect responses to disrupt the data regeneration and reconstruction. The adversary model is the same as in [2]. We assume that the malicious users can take full control of $\tau$ storage nodes ($\tau \le n$; $\tau$ corresponds to $s$ in [2]) and collude to perform attacks.
We will refer to the incorrect symbols sent by these nodes as bogus symbols, without distinguishing between corrupted symbols and compromised symbols. We will also use the terms corrupted nodes, malicious nodes and compromised nodes interchangeably.
IV. COMPONENT CODES OF RATE-MATCHED REGENERATING CODE
In this section, we introduce two different component codes for the rate-matched MSR code on the MSR point with $d = 2k - 2$. The code based on the MSR point with $d > 2k - 2$ can be derived in the same way through truncating operations. In the rate-matched MSR code, there are two types of MSR codes with different code rates: the full rate code and the fractional rate code.
A. Full Rate Code
1) Encoding:
The full rate code is encoded based on the product-matrix code framework in [25]. According to equation (2), we have $\alpha_H = d/2$, $\beta_H = 1$ for one block of data with the size $B_H = (\alpha+1)\alpha$. The data will be arranged into two $\alpha \times \alpha$ symmetric matrices $S_1$, $S_2$, each of which will contain $B_H/2$ data symbols. The codeword $CH$ is defined as
$$CH = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S_1 \\ S_2 \end{bmatrix} = \Psi M_H = \begin{bmatrix} ch_0 \\ \vdots \\ ch_{n-1} \end{bmatrix}, \tag{4}$$
where
$$\Phi = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & g & g^2 & \cdots & g^{\alpha-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & g^{n-1} & (g^{n-1})^2 & \cdots & (g^{n-1})^{\alpha-1} \end{bmatrix} \tag{5}$$
is a Vandermonde matrix, $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \cdots, \lambda_n]$ such that $\lambda_i \in \mathbb{F}_q$ and $\lambda_i \ne \lambda_j$ for $1 \le i, j \le n$, $i \ne j$, $g$ is a primitive element in $\mathbb{F}_q$, and any $d$ rows of $\Psi$ are linearly independent. Then each row $ch_i = \psi_i M_H$ ($0 \le i < n$) of the codeword matrix $CH$ will be stored in storage node $i$, where the encoding vector $\psi_i$ is the $i$th row of $\Psi$.
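As a minimal sketch of the encoding in (4) and (5), the following Python fragment builds $\Psi = [\Phi \; \Lambda\Phi]$ over a small prime field and encodes one block. The field size 257, the primitive element 3, the choice $\lambda_i = (g^i)^\alpha$ (which makes $\Psi$ itself a Vandermonde matrix), the zero-based node indices, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
P = 257  # small prime standing in for the field size q (assumption)
G = 3    # a primitive element of GF(257) (assumption)


def mat_mul(a, b):
    """Multiply two matrices (lists of rows) over GF(P)."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def build_matrices(n, alpha):
    """Return (Phi, lam, Psi) with Psi = [Phi  Lambda*Phi] as in (4)-(5)."""
    points = [pow(G, i, P) for i in range(n)]            # g^0, ..., g^(n-1)
    phi = [[pow(a, j, P) for j in range(alpha)] for a in points]
    lam = [pow(a, alpha, P) for a in points]             # illustrative lambda_i
    if len(set(lam)) != n:
        raise ValueError("the lambda values must be pairwise distinct")
    psi = [row + [l * v % P for v in row] for row, l in zip(phi, lam)]
    return phi, lam, psi


def encode_full_rate(s1, s2, n):
    """Encode symmetric alpha x alpha matrices S1, S2; row i goes to node i."""
    alpha = len(s1)
    phi, lam, psi = build_matrices(n, alpha)
    m_h = s1 + s2                                        # S1 stacked on S2
    return mat_mul(psi, m_h), phi, lam
```

For example, with $\alpha = 2$ (so $d = 4$, $B_H = 6$) and $n = 6$, calling `encode_full_rate([[1, 2], [2, 3]], [[4, 5], [5, 6]], 6)` returns a $6 \times 2$ codeword matrix whose $i$th row equals $\phi_i S_1 + \lambda_i \phi_i S_2$.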
2) Regeneration:
Suppose node $z$ fails. The replacement node $z'$ will send regeneration requests to the remaining $n-1$ helper nodes. Upon receiving the regeneration request, helper node $i$ will calculate and send out the help symbol $p_i = ch_i \phi_z^T = \psi_i M_H \phi_z^T$, where $\phi_z$ is the $z$th row of $\Phi$. $z'$ will perform Algorithm 1 to regenerate the contents of the failed node $z$. For convenience, we define $\Psi_{i \to j} = [\psi_i^T, \psi_{i+1}^T, \cdots, \psi_j^T]^T$, where $\psi_t$ is the $t$th row of $\Psi$ ($i \le t \le j$), and $x^{(j)}$ is the vector containing the first $j$ symbols of $M_H \phi_z^T$.
Suppose $p'_i = p_i + e_i$ is the response from the $i$th helper node. If $p_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q \setminus \{0\}$. We can successfully regenerate the symbols in node $z$ when the number of errors in the received help symbols $p'_i$ from the $n-1$ helper nodes does not exceed $\lfloor (n-d-1)/2 \rfloor$, where $\lfloor\cdot\rfloor$ is the floor operation. Without loss of generality, we assume $0 \le i \le n-2$.
Algorithm 1. $z'$ regenerates symbols of the failed node $z$
Step 1: Decode $p'$ to $p_{cw}$, where $p' = [p'_0, p'_1, \cdots, p'_{n-2}]^T$ can be viewed as an MDS code with parameters $(n-1, d, n-d)$ since $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p'$.
Step 2: Solve $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p_{cw}$ and compute $ch_z = \phi_z S_1 + \lambda_z \phi_z S_2$ as described in [25].
Proposition 1.
For regeneration, the full rate code can correct errors from $\lfloor (n-d-1)/2 \rfloor$ malicious nodes, where $\lfloor\cdot\rfloor$ is the floor operation.
3) Reconstruction:
When the DC needs to reconstruct the original file, it will send reconstruction requests to $n$ storage nodes. Upon receiving the request, node $i$ will send out the symbol vector $c_i$ to the DC. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha} \setminus \{0\}$.
The DC will reconstruct the file as follows. Let $R' = [ch'^T_0, ch'^T_1, \cdots, ch'^T_{n-1}]^T$. We have
$$\Psi \begin{bmatrix} S'_1 \\ S'_2 \end{bmatrix} = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S'_1 \\ S'_2 \end{bmatrix} = R', \qquad \Phi S'_1 \Phi^T + \Lambda \Phi S'_2 \Phi^T = R' \Phi^T. \tag{6}$$
Let $C = \Phi S'_1 \Phi^T$, $D = \Phi S'_2 \Phi^T$, and $\widehat{R}' = R' \Phi^T$; then
$$C + \Lambda D = \widehat{R}'. \tag{7}$$
Since $C$ and $D$ are both symmetric, we can solve for the non-diagonal elements of $C$ and $D$ from
$$\begin{cases} C_{i,j} + \lambda_i \cdot D_{i,j} = \widehat{R}'_{i,j} \\ C_{i,j} + \lambda_j \cdot D_{i,j} = \widehat{R}'_{j,i}. \end{cases} \tag{8}$$
Because matrices $C$ and $D$ have the same structure, here we only focus on $C$ (corresponding to $S'_1$). It is straightforward to see that if node $i$ is malicious and there are errors in the $i$th row of $R'$, there will be errors in the $i$th row of $\widehat{R}'$. Furthermore, there will be errors in the $i$th row and $i$th column of $C$. Define $\widehat{S}'_1 = S'_1 \Phi^T$; then $\Phi \widehat{S}'_1 = C$. Here we can view each column of $C$ as an $(n-1, \alpha, n-\alpha)$ MDS code because $\Phi$ is a Vandermonde matrix. The length of the code is $n-1$ since the diagonal elements of $C$ are unknown. Suppose node $j$ is a legitimate node; we can decode the MDS code to recover the $j$th column of $C$ and locate the malicious nodes. Eventually $C$ can be recovered. So the DC can reconstruct $S_1$ using a method similar to [3], [25]. For $S_2$, the recovering process is similar.
Proposition 2.
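For reconstruction, the full rate code can correct errors from $\lfloor (n-\alpha-1)/2 \rfloor$ malicious nodes.
B. Fractional Rate Code
1) Encoding:
For the fractional rate code, we also have $\alpha_L = d/2$, $\beta_L = 1$ for one block of data with the size
$$B_L = \begin{cases} xd(1+xd)/2, & x \in (0, 0.5] \\ \alpha(\alpha+1)/2 + (x-0.5)d(1+(x-0.5)d)/2, & x \in (0.5, 1], \end{cases} \tag{9}$$
where $x$ is the match factor of the rate-matched MSR code. It is easy to see that the fractional rate code becomes the full rate code when $x = 1$. The data $m = [m_1, m_2, \ldots, m_{B_L}] \in (\mathbb{F}_q)^{B_L}$ will be processed as follows.
When $x \le 0.5$, the data will be arranged into a symmetric matrix $S_1$ of size $\alpha \times \alpha$, whose upper-left $xd \times xd$ block holds the data and whose remaining entries are zero:
$$S_1 = \begin{bmatrix} m_1 & m_2 & \cdots & m_{xd} & 0 & \cdots & 0 \\ m_2 & m_{xd+1} & \cdots & m_{2xd-1} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ m_{xd} & m_{2xd-1} & \cdots & m_{B_L} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{10}$$
The codeword $CL$ is defined as
$$CL = [\Phi \;\; \Lambda\Phi] \begin{bmatrix} S_1 \\ \mathbf{0} \end{bmatrix} = \Psi M_L, \tag{11}$$
where $\mathbf{0}$ is the $\alpha \times \alpha$ zero matrix and $\Phi$, $\Lambda$, $\Psi$ are the same as for the full rate code.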
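The layout of the data matrix for $x \le 0.5$ can be sketched as follows. This is an illustrative helper only: the function name, the zero-based Python indexing, and the assumption that $xd$ is an integer are ours. It fills the upper triangle of the $xd \times xd$ block row by row and mirrors it, which reproduces the index pattern $m_1, \ldots, m_{xd}$ in the first row and $m_2, m_{xd+1}, \ldots, m_{2xd-1}$ in the second.

```python
def fractional_s1(data, alpha, xd):
    """Arrange B_L = xd(xd+1)/2 symbols into an alpha x alpha symmetric matrix.

    Only the upper-left xd x xd block is populated; every other entry is 0.
    """
    if not 0 < xd <= alpha:
        raise ValueError("this layout needs 0 < xd <= alpha (x <= 0.5)")
    if len(data) != xd * (xd + 1) // 2:
        raise ValueError("expected exactly xd(xd+1)/2 data symbols")
    s = [[0] * alpha for _ in range(alpha)]
    symbols = iter(data)
    for r in range(xd):
        for c in range(r, xd):            # upper triangle, row by row
            s[r][c] = s[c][r] = next(symbols)
    return s
```

With `alpha = 4`, `xd = 3` and data `[1, 2, 3, 4, 5, 6]`, the result is `[[1, 2, 3, 0], [2, 4, 5, 0], [3, 5, 6, 0], [0, 0, 0, 0]]`.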
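When $x > 0.5$, the first $\alpha(\alpha+1)/2$ data symbols will be arranged into an $\alpha \times \alpha$ symmetric matrix $S_1$. The rest of the data $m_{\alpha(\alpha+1)/2+1}, \ldots, m_{B_L}$ will be arranged into another $\alpha \times \alpha$ symmetric matrix $S_2$. Writing $a = \alpha(\alpha+1)/2$ and $w = (x-0.5)d$ for brevity, so that by (9) the remaining $w(1+w)/2$ symbols fill a symmetric $w \times w$ block, we have
$$S_2 = \begin{bmatrix} m_{a+1} & \cdots & m_{a+w} & 0 & \cdots & 0 \\ m_{a+2} & \cdots & m_{a+2w-1} & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ m_{a+w} & \cdots & m_{B_L} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{12}$$
The codeword $CL$ is defined in the same way as in equation (4), with the same parameters $\Phi$, $\Lambda$ and $\Psi$.
Then each row $cl_i$ ($0 \le i < n$) of the codeword matrix $CL$ will be stored in storage node $i$, where the encoding vector $\psi_i$ is the $i$th row of $\Psi$.
Proposition 3.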
The fractional rate code can achieve the MSR point in equation (2) since it is encoded under the product-matrix MSR code framework in [25].
2) Regeneration:
The regeneration for the fractional rate code is the same as the regeneration for the full rate code described in Section IV-A2, with only a minor difference. If we define $x^{(j)}$ as the vector containing the first $j$ symbols of $M_L \phi_z^T$, there will be only $xd$ nonzero elements in the vector. According to $\Psi_{0 \to (n-2)} \cdot x^{(n-1)} = p'$, the received symbol vector $p'$ for the fractional rate code in Step 1 of Algorithm 1 can be viewed as an $(n-1, xd, n-xd)$ MDS code. Since $x < 1$, we can detect and correct more errors in data regeneration using the fractional rate code than using the full rate code.
Proposition 4.
For regeneration, the fractional rate code can correct errors from $\lfloor (n-xd-1)/2 \rfloor$ malicious nodes.
3) Reconstruction:
The reconstruction for the fractional rate code is similar to that for the full rate code described in Section IV-A3. Let $R' = [cl'^T_0, cl'^T_1, \cdots, cl'^T_{n-1}]^T$.
When the match factor $x > 0.5$, reconstruction for the fractional rate code is the same as that for the full rate code.
When $x \le 0.5$, equation (6) can be written as
$$\Phi S'_1 = R'. \tag{13}$$
So we can view each column of $R'$ as an $(n, xd, n-xd+1)$ MDS code. After decoding $R'$ to $R_{cw}$, we can recover the data matrix $S_1$ by solving the equation $\Phi S_1 = R_{cw}$. Meanwhile, if the $i$th rows of $R'$ and $R_{cw}$ are different, we can mark node $i$ as corrupted.
Proposition 5.
For reconstruction, when the match factor $x > 0.5$, the fractional rate code can correct errors from $\lfloor (n-\alpha-1)/2 \rfloor$ malicious nodes. When the match factor $x \le 0.5$, the fractional rate code can correct errors from $\lfloor (n-xd)/2 \rfloor$ malicious nodes.
V. 2-LAYER RATE-MATCHED REGENERATING CODE
In this section, we present our first optimization of the rate-matched MSR code: the 2-layer rate-matched MSR code. In the code design, we utilize two layers of the MSR code: the fractional rate code for one layer and the full rate code for the other. The purpose of the fractional rate code is to correct the erroneous symbols sent by malicious nodes and locate the corresponding malicious nodes. Then we can treat the errors in the received symbols as erasures when regenerating with the full rate code. However, the rates of the two codes must match to achieve an optimal performance. Here we mainly focus on the rate-matching for data regeneration. We will see in the later analysis that the performance of data reconstruction can also be improved with this design criterion.
We will first fix the error correction capabilities of the full rate code and the fractional rate code. Then we will derive the optimal rate matching criteria to optimize the data storage efficiency under the fixed error correction capability.
A. Rate Matching
From the analysis above, we know that during regeneration, the fractional rate code can correct up to $\lfloor (n-xd-1)/2 \rfloor$ errors, which is more than the $\lfloor (n-d-1)/2 \rfloor$ errors that the full rate code can correct. In the 2-layer rate-matched MSR code design, our goal is to match the fractional rate code with the full rate code. The main task for the fractional rate code is to detect and correct errors, while the main task for the full rate code is to maintain the storage efficiency. So if the fractional rate code can locate all the malicious nodes, the full rate code can simply treat the symbols received from these malicious nodes as erasures, which requires the minimum redundancy for the full rate code. The full rate code can correct up to $n-d-1$ erasures. Thus we have the following optimal rate-matching equation:
$$\lfloor (n-xd-1)/2 \rfloor = n-d-1, \tag{14}$$
from which we can derive the match factor $x$.
B. Encoding
To encode a file with size $B_F$ using the 2-layer rate-matched MSR code, the file will first be divided into $\theta_H$ blocks of data with the size $B_H$ and $\theta_L$ blocks of data with the size $B_L$, where the parameters should satisfy
$$B_F = \theta_H B_H + \theta_L B_L. \tag{15}$$
Then the $\theta_H$ blocks of data will be encoded into code matrices $CH_1, \ldots, CH_{\theta_H}$ using the full rate code and the $\theta_L$ blocks of data will be encoded into code matrices $CL_1, \ldots, CL_{\theta_L}$ using the fractional rate code. To prevent the malicious nodes from corrupting the fractional rate code only, the secure server will randomly concatenate all the matrices together to form the final $n \times \alpha(\theta_H + \theta_L)$ codeword matrix:
$$CM = [\mathrm{Perm}(CH_1, \ldots, CH_{\theta_H}, CL_1, \ldots, CL_{\theta_L})], \tag{16}$$
where $\mathrm{Perm}(\cdot)$ is the random matrix permutation operation. The secure server will also record the order of the permutation for future code regeneration and reconstruction. Then each row $c_i = [\mathrm{Perm}(ch_{1,i}, \ldots, ch_{\theta_H,i}, cl_{1,i}, \ldots, cl_{\theta_L,i})]$ ($0 \le i \le n-1$) of the codeword matrix $CM$ will be stored in storage node $i$, where $ch_{j,i}$ is the $i$th row of $CH_j$ ($1 \le j \le \theta_H$), and $cl_{j,i}$ is the $i$th row of $CL_j$ ($1 \le j \le \theta_L$). The encoding vector $\psi_i$ for storage node $i$ is the $i$th row of $\Psi$ in equation (4). Therefore, we have the following theorem.
Theorem 1.
The encoding of the 2-layer rate-matched MSR code can achieve the MSR point in equation (2) since both the full rate code and the fractional rate code are MSR codes.
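The random concatenation in (16) and the recorded order can be illustrated with the sketch below. It is not the authors' implementation: the function names, the `("H", j)` / `("L", j)` labels, the zero-based block indices, and the use of Python's `random.SystemRandom` as the randomness source are all assumptions. Each code matrix is given as a list of $n$ rows of $\alpha$ symbols.

```python
import random


def permute_blocks(ch_blocks, cl_blocks, rng=None):
    """Concatenate code matrices in a random order; return (CM, order).

    `order` lists, position by position, which block ("H" or "L" layer and
    its index) occupies that slot; the secure server keeps it secret.
    """
    rng = rng or random.SystemRandom()
    labelled = [("H", j, blk) for j, blk in enumerate(ch_blocks)]
    labelled += [("L", j, blk) for j, blk in enumerate(cl_blocks)]
    rng.shuffle(labelled)
    n = len(labelled[0][2])
    cm = [[sym for _, _, blk in labelled for sym in blk[i]] for i in range(n)]
    order = [(layer, j) for layer, j, _ in labelled]
    return cm, order


def split_row(c_i, order, alpha):
    """Use the recorded order to cut a node's row back into per-block pieces."""
    pieces = {"H": {}, "L": {}}
    for pos, (layer, j) in enumerate(order):
        pieces[layer][j] = c_i[pos * alpha:(pos + 1) * alpha]
    return pieces
```

A party that holds `order` (the replacement node during regeneration, or the DC) can call `split_row` on each received row to find which symbols belong to the fractional rate code and which to the full rate code; without `order`, the positions of the two layers are not revealed by the layout itself.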
C. Regeneration
Suppose node $z$ fails. The secure server will initialize a replacement node $z'$ with the order information of the fractional rate code and the full rate code in the 2-layer rate-matched MSR code. Then the replacement node $z'$ will send regeneration requests to the remaining $n-1$ helper nodes. Upon receiving the regeneration request, helper node $i$ will calculate and send out the help symbol $p_i = c_i \phi_z^T$. $z'$ will perform Algorithm 2 to regenerate the contents of the failed node $z$. After the regeneration is finished, $z'$ will erase the order information. So even if $z'$ is compromised later, the adversary will not get the permutation order of the fractional rate code and the full rate code.
Algorithm 2. $z'$ regenerates symbols of the failed node $z$ for the 2-layer rate-matched MSR code
Step 1: According to the order information, regenerate all the symbols related to the $\theta_L$ data blocks encoded by the fractional rate code, using Algorithm 1. If errors are detected in the symbols sent by node $i$, it will be marked as a malicious node.
Step 2: Regenerate all the symbols related to the $\theta_H$ data blocks encoded by the full rate code, using Algorithm 1. During the regeneration, all the symbols sent from nodes marked as malicious nodes will be replaced by erasures $\otimes$.
It is easy to see that Algorithm 2 can correct errors and locate malicious nodes using the fractional rate code while achieving high storage efficiency using the full rate code. We summarize the result as the following theorem.
Theorem 2.
For regeneration, the 2-layer rate-matched MSR code can correct errors from $\lfloor (n-xd-1)/2 \rfloor$ malicious nodes.
D. Parameters Optimization
We have the following design requirements for a given distributed storage system applying the 2-layer rate-matched MSR code:
• The maximum number of malicious nodes $M$ that the system can detect and locate using the fractional rate code. We have
$$\lfloor (n-xd-1)/2 \rfloor = M. \tag{17}$$
• The probability $P_{det}$ that the system can detect all the malicious nodes. The detection will be successful if each malicious node modifies at least one help symbol corresponding to the fractional rate code and sends it to the replacement node. Suppose the malicious nodes modify each help symbol to be sent to the replacement node with probability $P$; we have
$$(1 - (1-P)^{\theta_L})^M \ge P_{det}. \tag{18}$$
So there is a trade-off between $\theta_L$ and $\theta_H$, the number of data blocks encoded by the fractional rate code and the number of data blocks encoded by the full rate code. If too many blocks are encoded with the full rate code, we may not meet the detection probability requirement $P_{det}$. If too many blocks are encoded with the fractional rate code, the redundancy may be too high.
The storage efficiency is defined as the ratio between the actual size of data to be stored and the total storage space needed by the encoded data:
$$\delta_S = \frac{\theta_H B_H + \theta_L B_L}{(\theta_H + \theta_L) n \alpha} = \frac{B_F}{(\theta_H + \theta_L) n \alpha}. \tag{19}$$
Thus we can calculate the optimized parameters $x$, $d$, $\theta_H$, $\theta_L$ by maximizing equation (19) under the constraints defined by equations (14), (15), (17), (18). $d$ and $x$ can be determined by equations (14) and (17):
$$d = n - M - 1, \tag{20}$$
$$x = (n - 2M - 1)/(n - M - 1). \tag{21}$$
Since $B_F$ is constant, maximizing $\delta_S$ is equivalent to minimizing $\theta_H + \theta_L$. So we can rewrite the optimization problem as follows:
$$\text{Minimize } \theta_H + \theta_L, \text{ subject to (15) and (18)}. \tag{22}$$
This is a simple linear programming problem.
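One way to see how problem (22) can be solved, offered as a short sketch under the assumptions $0 < P < 1$, $0 < P_{det} < 1$ and $B_L < B_H$ (that is, $x < 1$) rather than as the authors' own derivation, is to eliminate $\theta_H$ with (15) and then isolate $\theta_L$ in (18):

```latex
\begin{align*}
\theta_H + \theta_L
  &= \frac{B_F - \theta_L B_L}{B_H} + \theta_L
   = \frac{B_F}{B_H} + \theta_L \left( 1 - \frac{B_L}{B_H} \right), \\
\left( 1 - (1-P)^{\theta_L} \right)^{M} \ge P_{det}
  &\iff (1-P)^{\theta_L} \le 1 - P_{det}^{1/M}
   \iff \theta_L \ge \log_{(1-P)} \left( 1 - P_{det}^{1/M} \right).
\end{align*}
```

The objective grows with $\theta_L$ because $B_L < B_H$, so the minimum is reached at the smallest $\theta_L$ that constraint (18) allows. The last inequality changes direction because the base $1-P$ of the logarithm is smaller than 1.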
Here we show the optimization results directly:
$$\theta_L = \log_{(1-P)}(1 - P_{det}^{1/M}), \tag{23}$$
$$\theta_H = (B_F - \theta_L B_L)/B_H. \tag{24}$$
In this paper we assume that we are storing large files, which means $B_F > \theta_L B_L$. So an optimal solution for the 2-layer rate-matched MSR code can always be found. We have the following theorem:
Theorem 3.
When the number of blocks of the fractional rate code $\theta_L$ equals $\log_{(1-P)}(1 - P_{det}^{1/M})$ and the number of blocks of the full rate code $\theta_H$ equals $(B_F - \theta_L B_L)/B_H$, the 2-layer rate-matched MSR code can achieve the optimal storage efficiency.
E. Reconstruction
When the DC needs to reconstruct the original file, it will send reconstruction requests to $n$ storage nodes. Upon receiving the request, node $i$ will send out the symbol vector $c_i$. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha(\theta_L + \theta_H)} \setminus \{0\}$. Since the DC has the permutation information of the fractional rate code and the full rate code, similar to the regeneration of the 2-layer rate-matched MSR code, the DC will perform the reconstruction using Algorithm 3.
Algorithm 3. DC reconstructs the original file for the 2-layer rate-matched MSR code
Step 1: According to the order information, reconstruct each of the $\theta_L$ data blocks encoded by the fractional rate code and locate the malicious nodes.
Step 2: Reconstruct each of the data blocks encoded by the full rate code. During the reconstruction, all the symbols sent from malicious nodes will be replaced by erasures $\otimes$.
In Section V-D, we optimized the parameters for the data regeneration, considering the trade-off between the successful malicious node detection probability and the storage efficiency. For data reconstruction, we have the following theorem:
Theorem 4 (Optimized Parameters). When the number of blocks of the fractional rate code $\theta_L$ equals $\log_{(1-P)}(1 - P_{det}^{1/M})$ and the number of blocks of the full rate code $\theta_H$ equals $(B_F - \theta_L B_L)/B_H$, the 2-layer rate-matched MSR code can guarantee that the same constraints as for data regeneration (equations (17), (18)) are satisfied for the data reconstruction.
Proof: The maximum number of malicious nodes that can be detected for the data reconstruction is no smaller than $M$. If $x > 0.5$, the number is $\lfloor (n-\alpha-1)/2 \rfloor$, and since $\alpha = d/2 < xd$ we have $\lfloor (n-\alpha-1)/2 \rfloor \ge \lfloor (n-xd-1)/2 \rfloor = M$. If $x \le 0.5$, the number is $\lfloor (n-xd)/2 \rfloor$, and we have $\lfloor (n-xd)/2 \rfloor \ge \lfloor (n-xd-1)/2 \rfloor = M$.
The successful malicious node detection probability for the data reconstruction is larger than $P_{det}$: the probability is $(1 - (1-P)^{\alpha\theta_L})^M$, so we have $(1 - (1-P)^{\alpha\theta_L})^M > (1 - (1-P)^{\theta_L})^M \ge P_{det}$.
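The parameter choice of equations (20), (21), (23), (24) and the two inequalities used in the proof can be checked numerically. The sketch below is illustrative only: the function names are ours, block counts are rounded up to integers (an assumption, since the closed forms give real values), and $xd$ is assumed to be an integer. The example values $n = 30$, $M = 11$, $B_F = 14000$ (in the same units as $B_H$ and $B_L$) and $P = 0.2$ are likewise assumptions chosen for the illustration.

```python
import math


def two_layer_parameters(n, M, P, P_det, B_F):
    """Evaluate (20), (21), (9), (23), (24); block counts are rounded up."""
    d = n - M - 1                                   # eq. (20)
    xd = n - 2 * M - 1                              # x * d, from eq. (21)
    if xd <= 0 or d % 2:
        raise ValueError("need n > 2M + 1 and an even d (alpha = d/2)")
    alpha = d // 2
    B_H = (alpha + 1) * alpha
    if 2 * xd <= d:                                 # x <= 0.5 branch of (9)
        B_L = xd * (1 + xd) // 2
    else:                                           # x > 0.5 branch of (9)
        w = xd - alpha                              # (x - 0.5) * d
        B_L = alpha * (alpha + 1) // 2 + w * (1 + w) // 2
    theta_L = math.ceil(math.log(1 - P_det ** (1 / M)) / math.log(1 - P))
    theta_H = math.ceil((B_F - theta_L * B_L) / B_H)
    return {"d": d, "xd": xd, "alpha": alpha, "B_H": B_H, "B_L": B_L,
            "theta_L": theta_L, "theta_H": theta_H}


def theorem4_holds(n, M, P, P_det, prm):
    """Check the capability and probability bounds from the proof."""
    d, xd, alpha = prm["d"], prm["xd"], prm["alpha"]
    capability = (n - alpha - 1) // 2 if 2 * xd > d else (n - xd) // 2
    p_regen = (1 - (1 - P) ** prm["theta_L"]) ** M
    p_recon = (1 - (1 - P) ** (alpha * prm["theta_L"])) ** M
    return capability >= M and p_recon >= p_regen >= P_det
```

For the example values with $P_{det} = 0.99$, the function gives $d = 18$, $xd = 7$, $B_H = 90$, $B_L = 28$, $\theta_L = 32$ and $\theta_H = 146$, and `theorem4_holds` returns `True`.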
Although the rate-matching equation (14) does not apply to the data reconstruction, the reconstruction strategy in Algorithm 3 can still benefit from the different rates of the two codes. When $x \le 0.5$, the fractional rate code can detect and correct $\lfloor (n-xd)/2 \rfloor$ malicious nodes, which is more than the $\lfloor (n-d/2-1)/2 \rfloor$ malicious nodes that the full rate code can detect. When $x > 0.5$, the full rate code and the fractional rate code can detect and correct the same number of malicious nodes: $\lfloor (n-\alpha-1)/2 \rfloor$.
From the analysis above we can see that the same optimized parameters, which are obtained for the data regeneration, can maintain the optimized trade-off between the malicious node detection and storage efficiency for the data reconstruction.
F. Performance Evaluation
From the analysis above, we know that for a distributed storage system with n storage nodesout of which at most M nodes are malicious, the 2-layer rate-matched MSR code can guaranteedetection and correction of the malicious nodes during the data regeneration and reconstructionwith the probability at least P det .For a distributed storage system with n = 30 , M = 11 and P = 0 . , suppose we have a filewith the size B F = 14000 M symbols to be stored in the system. The number of the fractionalrate code blocks θ L and the number of the full rate code blocks θ H for different detectionprobabilities P det are shown in Fig. 1. From the figure we can see that the number of fractionalrate code blocks will increase when the detection probability becomes larger. Accordingly, thenumber of full rate code blocks will decrease.For the universally resilient MSR code constructed in [2], the efficiency of the code with thesame regeneration performance as the 2-layer rate-matched MSR code is defined as δ (cid:48) S = α (cid:48) ( α (cid:48) + 1) α (cid:48) n = α (cid:48) + 1 n = xd/ n . (25)In Fig. 2 we will show the efficiency ratios η = δ S /δ (cid:48) S between the 2-layer rate-matched MSRcode and the universally resilient MSR code under different detection probabilities P det . Fromthe figure we can see that the 2-layer rate-matched MSR code has higher efficiency than theuniversally resilient MSR code. When the successful malicious nodes detection probability is . , the efficiency of the 2-layer rate-matched MSR code is about higher. November 4, 2015 DRAFT6 det nu m be r o f da t a b l o cks L H (0.99,146)(0.99,32)(0.999,42)(0.9999,53)(0.99999,63)(0.999,143)(0.9999,140)(0.99999,136) (0.999999,133)(0.999999,73) Fig. 1. The number of fractional/full rate code blocks for different P det VI. m -L AYER R ATE - MATCHED REGENERATING C ODE
In this section, we will show our second optimization of the rate-matched MSR code: the $m$-layer rate-matched MSR code. In the code design, we extend the design concept of the 2-layer rate-matched MSR code. Instead of encoding the data using two MSR codes with different match factors, we utilize $m$ layers of full rate MSR codes with different parameters $d$, written as $d_i$ for layer $L_i$, $1 \le i \le m$, which satisfy

$$d_i \le d_j, \quad \forall\, 1 \le i \le j \le m. \tag{26}$$

The data will be divided into $m$ parts and each part will be encoded by a distinct full rate MSR code. According to the analysis above, the code with a lower code rate has better error correction capability.

The codewords will be decoded layer by layer in the order from layer $L_1$ to layer $L_m$. That is, the codewords encoded by the full rate MSR code with a lower $d$ will be decoded prior to those encoded by the full rate MSR code with a higher $d$. If errors are found by the full rate MSR code with a lower $d$, the corresponding nodes are marked as malicious. The symbols sent from these nodes are then treated as erasures in the subsequent decoding of the full rate MSR codes with higher $d$'s. The purpose of this arrangement is to correct as many erroneous symbols sent by malicious nodes as possible, and to locate the corresponding malicious nodes, using the full rate MSR code with a lower rate. However, the rates of the $m$ full rate MSR codes must match to achieve optimal performance. Here we mainly focus on the rate-matching for data regeneration. We will see in the later analysis that the performance of data reconstruction can also be improved with this design criterion.

Fig. 2. Efficiency ratios between the 2-layer rate-matched MSR code and the normal error correction MSR code for different $P_{det}$. (Horizontal axis: $P_{det}$; vertical axis: efficiency ratio. Labeled points: (0.99, 1.95), (0.999, 1.87), (0.9999, 1.80), (0.99999, 1.74), (0.999999, 1.68).)

The main idea of this optimization is to optimize the overall error correction capability by matching the code rates of the different full rate MSR codes.

A. Rate Matching and Parameters Optimization
According to Section IV-A2, the full rate MSR code $CH_i$ for layer $L_i$ can be viewed as an $(n-1, d_i, n-d_i)$ MDS code for $1 \le i \le m$. During the optimization, we set the summation of the $d$'s of all the layers to a constant $d_0$:

$$\sum_{i=1}^{m} d_i = d_0. \tag{27}$$

Here we will show the optimization through an illustrative example first. Then we will present the general result.
1) Optimization for $m = 3$: There are three layers of full rate MSR codes for $m = 3$: $CH_1$, $CH_2$ and $CH_3$.

The first layer code $CH_1$ can correct $t_1$ errors:

$$t_1 = \lfloor (n - d_1 - 1)/2 \rfloor = (n - d_1 - 1 - \varepsilon_1)/2, \tag{28}$$

where $\varepsilon_1 = 0$ or $1$ depending on whether $n - d_1 - 1$ is even or odd.

By regarding the symbols from the $t_1$ nodes where errors are found by $CH_1$ as erasures, the second layer code $CH_2$ can correct $t_2$ errors:

$$t_2 = \lfloor (n - d_2 - 1 - t_1)/2 \rfloor + t_1 = (n - d_2 - 1 - t_1 - \varepsilon_2)/2 + t_1 = (2(n - d_2) + n - d_1 - 2\varepsilon_2 - \varepsilon_1 - 3)/4, \tag{29}$$

where $\varepsilon_2 = 0$ or $1$, with the restriction that $n - d_2 - 1 \ge t_1$, which can be written as:

$$-d_1 + 2d_2 \le n + \varepsilon_1 - 1. \tag{30}$$

The third layer code $CH_3$ also treats the symbols from the $t_2$ nodes as erasures. $CH_3$ can correct $t_3$ errors:

$$t_3 = \lfloor (n - d_3 - 1 - t_2)/2 \rfloor + t_2 = (n - d_3 - 1 - t_2 - \varepsilon_3)/2 + t_2 = (4(n - d_3) + 2(n - d_2) + n - d_1 - 4\varepsilon_3 - 2\varepsilon_2 - \varepsilon_1 - 7)/8, \tag{31}$$

where $\varepsilon_3 = 0$ or $1$, with the restriction that $n - d_3 - 1 \ge t_2$, which can be written as:

$$-d_1 - 2d_2 + 4d_3 \le n + \varepsilon_1 + 2\varepsilon_2 - 1. \tag{32}$$

According to the analysis above, the $d$'s of the three layers satisfy:

$$d_1 - d_2 \le 0, \tag{33}$$

$$d_2 - d_3 \le 0. \tag{34}$$

And we can rewrite equation (27) as:

$$d_1 + d_2 + d_3 \le d_0, \tag{35}$$

$$-d_1 - d_2 - d_3 \le -d_0. \tag{36}$$

To maximize the error correction capability of the $m$-layer rate-matched MSR code for $m = 3$, we have to maximize $t_3$, the number of errors that the third layer code $CH_3$ can correct, since $t_3$ includes all the malicious nodes from which errors are found by the codes of the first two layers. With all the constraints listed above, the optimization problem can be written as:

Maximize $t_3$ in (31), subject to (30), (32), (33), (34), (35), (36). (37)

Now we have changed this optimization problem into a typical linear programming problem. This linear programming problem has a feasible solution. We solve it using the SIMPLEX algorithm [34]. When $d_1 = d_2 = d_3 = \mathrm{Round}(d_0/3) = \tilde{d}$, the $m$-layer rate-matched MSR code can correct errors from at most

$$\tilde{t}_3 = (7n - 7\tilde{d} - 4\varepsilon_3 - 2\varepsilon_2 - \varepsilon_1 - 7)/8 \ge (7n - 7\tilde{d} - 14)/8 \quad \text{(worst case)} \tag{38}$$

malicious nodes, where $\mathrm{Round}(\cdot)$ is the rounding operation.
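The layered recursion behind (28), (29) and (31) can be checked numerically. The sketch below is an illustration only: it replaces the SIMPLEX step with an exhaustive search over integer splits, and the function names `layered_capacity` and `best_split` are hypothetical helpers, not part of the original construction. It assumes the recursion $t_i = \lfloor (n - d_i - 1 - t_{i-1})/2 \rfloor + t_{i-1}$ with $t_0 = 0$ and the restriction $n - d_i - 1 \ge t_{i-1}$.

```python
from itertools import product


def layered_capacity(n, ds):
    """Cumulative number of correctable malicious nodes after all layers.

    Applies t_i = floor((n - d_i - 1 - t_{i-1}) / 2) + t_{i-1} with t_0 = 0.
    Returns None when some layer violates n - d_i - 1 >= t_{i-1}.
    """
    t = 0
    for d in ds:
        if n - d - 1 < t:
            return None
        t = (n - d - 1 - t) // 2 + t
    return t


def best_split(n, d0, m=3):
    """Search all non-decreasing integer splits d_1 <= ... <= d_m of d0."""
    best_t, best_ds = -1, None
    for ds in product(range(1, d0 + 1), repeat=m):
        # Keep only splits that sum to d0 and respect the ordering d_i <= d_j.
        if sum(ds) != d0 or any(a > b for a, b in zip(ds, ds[1:])):
            continue
        t = layered_capacity(n, ds)
        if t is not None and t > best_t:
            best_t, best_ds = t, ds
    return best_t, best_ds


if __name__ == "__main__":
    print(layered_capacity(30, (10, 10, 10)))  # equal split, n = 30, d0 = 30
    print(best_split(30, 30)[0])               # best value over all splits
```

For $n = 30$ and a total of 30, the equal split (10, 10, 10) reaches the best value found by the search. Because of the floor operations, some unequal integer splits such as (9, 9, 12) tie with it, which does not contradict the linear programming result.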
2) Evaluation of the Optimization for $m = 3$: Similar to the storage efficiency $\delta_S$ defined in Section V, here we can define the error correction efficiency $\delta_C$ of the $m$-layer rate-matched MSR code as the ratio between the maximum number of malicious nodes that can be found and the total number of storage nodes in the network:

$$\delta_C = (7n - 7\tilde{d} - 14)/(8n). \tag{39}$$

The universally resilient MSR code with the same code rate can be viewed as an $(n-1, \tilde{d}, n-\tilde{d})$ MDS code which can correct errors from at most $(n - \tilde{d} - 1)/2$ malicious nodes (best case). So the error correction efficiency $\delta'_C$ is

$$\delta'_C = (n - \tilde{d} - 1)/(2n). \tag{40}$$

The comparison of the error correction capability between the $m$-layer rate-matched MSR code for $m = 3$ and the universally resilient MSR code is shown in Fig. 3. In this comparison, we set the number of storage nodes in the network to $n = 30$. From the figure we can see that the $m$-layer rate-matched MSR code for $m = 3$ improves the error correction efficiency relative to the universally resilient MSR code.

Fig. 3. Comparison of the error correction capability between the $m$-layer rate-matched MSR code for $m = 3$ and the universally resilient MSR code. (Horizontal axis: $d$, from 20 to 60; vertical axis: error correction efficiency, from 0.2 to 0.6; curves: $\delta_C$ and $\delta'_C$.)
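As a rough numerical illustration of this comparison, the sketch below evaluates the two efficiencies, reading (39) as $\delta_C = (7n - 7\tilde{d} - 14)/(8n)$ and (40) as $\delta'_C = (n - \tilde{d} - 1)/(2n)$. The function names and the sample value $\tilde{d} = 10$ are assumptions chosen for illustration; they are not taken from the original experiments.

```python
def delta_c_three_layer(n, d_tilde):
    """Worst-case error correction efficiency of the 3-layer code, as in (39)."""
    return (7 * n - 7 * d_tilde - 14) / (8 * n)


def delta_c_universal(n, d_tilde):
    """Best-case efficiency of the universally resilient code, as in (40)."""
    return (n - d_tilde - 1) / (2 * n)


def improvement(n, d_tilde):
    """Relative gain of the 3-layer code over the universally resilient code."""
    return delta_c_three_layer(n, d_tilde) / delta_c_universal(n, d_tilde) - 1


if __name__ == "__main__":
    n = 30
    for d_tilde in (7, 10, 15, 20):
        print(d_tilde,
              round(delta_c_three_layer(n, d_tilde), 3),
              round(delta_c_universal(n, d_tilde), 3),
              round(improvement(n, d_tilde), 3))
```

Note that the comparison is conservative for the layered code, since it sets the worst case of the 3-layer code against the best case of the universally resilient code.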
3) General Optimization Result: For the general $m$-layer rate-matched MSR code, the optimization process is similar.

The first layer code $CH_1$ can correct $t_1$ errors as in equation (28). By regarding the symbols from the $t_{i-1}$ nodes where errors are found by $CH_{i-1}$ as erasures, the $i$th layer code can correct $t_i$ errors for $2 \le i \le m$:

$$t_i = \lfloor (n - d_i - 1 - t_{i-1})/2 \rfloor + t_{i-1} = (n - d_i - 1 - t_{i-1} - \varepsilon_i)/2 + t_{i-1} = \Big( \sum_{j=1}^{i} 2^{j-1}(n - d_j) - \sum_{j=1}^{i} 2^{j-1}\varepsilon_j - 2^i + 1 \Big) / 2^i, \tag{41}$$

where $\varepsilon_i = 0$ or $1$, with the restriction that $n - d_i - 1 \ge t_{i-1}$, which can be written as:

$$-\sum_{j=1}^{i-1} 2^{j-1} d_j + 2^{i-1} d_i \le n + \sum_{j=1}^{i-1} 2^{j-1}\varepsilon_j - 1. \tag{42}$$

Similarly, the parameter $d$ of the $i$th layer for $2 \le i \le m$ must satisfy

$$d_{i-1} - d_i \le 0. \tag{43}$$

And equation (27) can be written as:

$$\sum_{j=1}^{m} d_j \le d_0, \tag{44}$$

$$-\sum_{j=1}^{m} d_j \le -d_0. \tag{45}$$

We can maximize the error correction capability of the $m$-layer rate-matched MSR code by maximizing $t_m$. With all the constraints listed above, the optimization problem can be written as:

Maximize $t_i$ for $i = m$ in (41), subject to (42) and (43) for $2 \le i \le m$, (44), (45). (46)

After verifying that this linear programming problem has a feasible solution, we can use the SIMPLEX algorithm to solve it. The optimization result can be summarized as follows:

Theorem 5. For the regeneration of the $m$-layer rate-matched MSR code, when

$$d_i = \mathrm{Round}(d_0/m) = \tilde{d} \quad \text{for } 1 \le i \le m, \tag{47}$$

it can correct errors from at most

$$\tilde{t}_m = \Big( (2^m - 1)(n - \tilde{d}) - \sum_{j=1}^{m} 2^{j-1}\varepsilon_j - 2^m + 1 \Big) / 2^m \ge \big( (2^m - 1)(n - \tilde{d}) - 2^{m+1} + 2 \big) / 2^m \quad \text{(worst case)} \tag{48}$$

malicious nodes. The error correction efficiency for the $m$-layer rate-matched MSR code is

$$\delta_C = \big( (2^m - 1)(n - \tilde{d}) - 2^{m+1} + 2 \big) / (2^m n). \tag{49}$$

This is a monotonically increasing function of $m$, so we have:

Corollary 1. The error correction efficiency of the $m$-layer rate-matched MSR code increases with $m$, the number of layers.
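The monotonicity in the corollary can be made explicit. Reading the worst-case efficiency as $\delta_C = ((2^m - 1)(n - \tilde{d}) - 2^{m+1} + 2)/(2^m n)$, the numerator factors as $(2^m - 1)(n - \tilde{d} - 2)$, so $\delta_C = (n - \tilde{d} - 2)(1 - 2^{-m})/n$, which increases with $m$ whenever $n - \tilde{d} > 2$ and approaches $(n - \tilde{d} - 2)/n$. The sketch below checks this factorization numerically; the function names and the sample parameters are assumptions for illustration.

```python
def t_worst(n, d_tilde, m):
    """Worst-case number of correctable malicious nodes with m equal layers."""
    return ((2**m - 1) * (n - d_tilde) - 2**(m + 1) + 2) / 2**m


def delta_c(n, d_tilde, m):
    """Worst-case error correction efficiency with m equal layers."""
    return t_worst(n, d_tilde, m) / n


def delta_c_factored(n, d_tilde, m):
    """Equivalent factored form (n - d - 2)(1 - 2^-m) / n."""
    return (n - d_tilde - 2) * (1 - 2.0 ** (-m)) / n


if __name__ == "__main__":
    n, d_tilde = 30, 10
    for m in range(1, 9):
        print(m, round(delta_c(n, d_tilde, m), 4),
              round(delta_c_factored(n, d_tilde, m), 4))
```

The factored form also shows that each additional layer halves the remaining gap to the limiting value.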
Remark 1. During the optimization, we set the code rate of the rate-matched MSR code to a constant value and maximize the error correction capability. To optimize the rate-matched MSR code, we can also set the error correction capability $t_i$ for $i = m$ in (41) to a constant value

$$t_m = t \tag{50}$$

and maximize the code rate. The problem can be written as:

Maximize $\sum_{j=1}^{m} d_j$ subject to (42) and (43) for $2 \le i \le m$, (50). (51)

The optimization result is the same as that of (46): when all the $d_i$'s for $1 \le i \le m$ are the same, the code rate is maximized. In this case $d_i$, $1 \le i \le m$, satisfies:

$$d_i \ge n - \frac{2^m t + 2^{m+1} - 2}{2^m - 1} \quad \text{(worst case)}. \tag{52}$$
4) Evaluation of the Optimization: Although at the beginning of this section we propose to decode the code with a lower rate first in the $m$-layer rate-matched MSR code, equation (55) shows that we can get the optimized error correction capability when all the rates of the codes in the $m$-layer code are equal. However, this result is not in conflict with our assumption in equation (26), since (26) permits equality between the $d_i$'s.

a) Comparison with the Hermitian code based MSR code in [3]: The Hermitian code based MSR code (H-MSR code) in [3] has better error correction capability than the universally resilient MSR code. However, because the structure of the underlying Hermitian code is predetermined, the error correction capability might not be optimal. Fig. 4 shows the maximum number of malicious nodes from which the errors can be corrected by the H-MSR code. Here we set the parameter $q$ of the Hermitian code [35] from 4 to 16 with a step of 2. In the figure, we also plot the performance of the $m$-layer rate-matched MSR code with the same code rates as the H-MSR code. The comparison demonstrates that the rate-matched MSR code has better error correction capability than the H-MSR code. Moreover, the rate-matched code is easier to understand and has more flexibility than the H-MSR code.

Fig. 4. Comparison of error correction capability between the $m$-layer rate-matched MSR code and the H-MSR code. (Vertical axis: maximum number of malicious nodes from which the errors can be corrected; curves: Rate-matched MSR, H-MSR, Normal Error Correction MSR.)

b) Number of layers and error correction efficiency: Since we have seen the advantage of the rate-matched MSR code over the universally resilient MSR code in Section VI-A2, here we will mainly discuss how the number of layers affects the error correction efficiency. The error correction efficiency of the $m$-layer rate-matched MSR code is shown in Fig. 5, where we set $n = 30$ and $d_0 = 50$. We also plot the error correction efficiency $\delta'_C$ of the universally resilient MSR code with the same code rates for comparison. From the figure we can see that when $n$ and $d_0$ are fixed, the optimal error correction efficiency increases with the number of layers $m$, as stated in Corollary 1.

c) Optimized storage capacity: Moreover, the optimization condition in equation (55) also leads to maximum storage capacity besides the optimal error correction capability. We have the following theorem:
Theorem 6.
The $m$-layer rate-matched MSR code can achieve the maximum storage capacity if the parameters $d$ of all the layers are the same, under the constraint in equation (27).

Proof: The code of the $i$th layer can store one block of data with the size $B_i = \alpha_i(\alpha_i + 1) = (d_i/2)(d_i/2 + 1)$. So the $m$-layer code can store data with the size $B = \sum_{i=1}^{m} (d_i/2)(d_i/2 + 1)$. Our goal here is to maximize $B$ under the constraint in equation (27).

We can use Lagrange multipliers to find the point of maximum $B$. Let

$$\Lambda_L(d_1, \ldots, d_m, \lambda) = \sum_{i=1}^{m} (d_i/2)(d_i/2 + 1) - \lambda \Big( \sum_{i=1}^{m} d_i - d_0 \Big). \tag{53}$$

We can find the maximum value of $B$ by setting the partial derivatives of this equation to zero:

$$\frac{\partial \Lambda_L}{\partial d_i} = \frac{d_i + 1}{2} - \lambda = 0, \quad \forall\, 1 \le i \le m. \tag{54}$$

Equation (54) gives $d_i = 2\lambda - 1$ for every layer, so here we can see that when the parameters $d$ of all the layers are the same, we can get the maximum storage capacity $B$. This maximization condition coincides with the optimization condition for achieving the goal of this section: optimizing the overall error correction capability of the rate-matched MSR code.

Fig. 5. The optimal error correction efficiency of the $m$-layer rate-matched MSR code under different $m$. (Vertical axis: error correction efficiency; curves: $\delta_C$ and $\delta'_C$.)

B. Practical Consideration of the Optimization
So far, we implicitly presume that there is only one data block of the size $B_i = \alpha_i(\alpha_i + 1)$ for each layer $i$. In practical distributed storage, it is the parameter $d_i$ that is fixed instead of $d_0$, the summation of the $d_i$'s. However, as long as we use $m$ layers of MSR codes with the same parameter $d = \tilde{d}$, we will still get the optimal solution for $d_0 = m\tilde{d}$. In fact, the $m$-layer rate-matched MSR code here becomes a single full rate MSR code with parameter $d = \tilde{d}$ and $m$ data blocks. Based on the dependent decoding idea we describe at the beginning of Section VI, we can achieve the optimal performance.

Fig. 6. The optimal error correction efficiency for different $m$. (Horizontal axis: $m$; vertical axis: error correction efficiency; curves: $\tilde{d} = 5$ and $\tilde{d} = 10$.)

So when the file size $B_F$ is larger than one data block size $\tilde{B}$ of the single full rate MSR code with parameter $d = \tilde{d}$, we will divide the file into $\lceil B_F/\tilde{B} \rceil$ data blocks and encode them separately. If we decode these data blocks dependently, we can get the optimal error correction efficiency.
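The block count in this paragraph follows directly from the block size $\tilde{B} = \alpha(\alpha + 1)$ with $\alpha = \tilde{d}/2$. A minimal sketch is given below, assuming an even $\tilde{d}$ and file sizes measured in symbols; the helper names `block_size` and `num_blocks` and the sample values are hypothetical.

```python
def block_size(d_tilde):
    """Size of one data block, alpha * (alpha + 1) with alpha = d_tilde / 2."""
    if d_tilde % 2 != 0:
        raise ValueError("this sketch assumes an even d_tilde so alpha is an integer")
    alpha = d_tilde // 2
    return alpha * (alpha + 1)


def num_blocks(file_size, d_tilde):
    """Number of data blocks needed for the file: ceil(file_size / block size)."""
    b = block_size(d_tilde)
    return -(-file_size // b)  # ceiling division on non-negative integers


if __name__ == "__main__":
    print(block_size(10))        # alpha = 5, so one block holds 5 * 6 symbols
    print(num_blocks(1000, 10))  # blocks needed for a 1000-symbol file
```

Integer ceiling division is used instead of floating-point `math.ceil` so that very large file sizes are handled exactly.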
1) Evaluation of the Optimal Error Correction Efficiency: In the practical case, $\tilde{d}$ in equation (49) is fixed. So here we study the relationship between the number of dependently decoded data blocks $m$ and the error correction efficiency $\delta_C$, which is shown in Fig. 6. We set $n = 30$ and $\tilde{d} = 5, 10$. From the figure we can see that although $\delta_C$ becomes higher as the number of dependently decoded data blocks $m$ increases, the improvement quickly becomes negligible as $m$ grows. By (49), $\delta_C = (n - \tilde{d} - 2)(1 - 2^{-m})/n$, so when $m = 7$ the efficiency has already reached $1 - 2^{-7} \approx 99.2\%$ of the upper bound of $\delta_C$.

On the other hand, there exist parallel algorithms for fast MDS code decoding [36]. We can decode blocks of MDS codewords in parallel in a pipeline fashion to accelerate the overall decoding speed. The more blocks of codewords we decode in parallel, the faster we will finish the whole decoding process. For large files that can be divided into a large number of data blocks ($\theta$ blocks), we can get a trade-off between the optimal error correction efficiency and the decoding speed by setting the number of dependently decoded data blocks $m$ and the number of data blocks decoded in parallel $\rho$ under the constraint $\theta = m\rho$.

C. Encoding
From the analysis above we know that encoding a file of size $B_F$ using the optimal $m$-layer rate-matched MSR code amounts to encoding the file using a full rate MSR code with predetermined parameter $d = 2\alpha = \tilde{d}$. First the file is divided into $\theta$ blocks of data with size $\tilde{B}$, where $\theta = \lceil B_F/\tilde{B} \rceil$. Then the $\theta$ blocks of data are encoded into code matrices $CH_1, \ldots, CH_\theta$, which form the final $n \times \alpha\theta$ codeword matrix $CM = [CH_1, \ldots, CH_\theta]$. Each row $c_i = [ch_{1,i}, \ldots, ch_{\theta,i}]$, $0 \le i \le n-1$, of the codeword matrix $CM$ is stored in storage node $i$, where $ch_{j,i}$ is the $i$th row of $CH_j$, $1 \le j \le \theta$. The encoding vector $\psi_i$ for storage node $i$ is the $i$th row of $\Psi$ in equation (4).

Theorem 7. The encoding of the $m$-layer rate-matched MSR code can achieve the MSR point in equation (2) since both the full rate code and the fractional code are MSR codes.

D. Regeneration
Suppose node $z$ fails. The replacement node $z'$ will send regeneration requests to the remaining $n-1$ helper nodes. Upon receiving the regeneration request, helper node $i$ will calculate and send out the help symbol $p_i = c_i \phi_z^T$.

As we discuss above, combining dependent decoding and parallel decoding can achieve a trade-off between optimal error correction efficiency and decoding speed. Although all $\theta$ blocks of data are encoded with the same MSR code, $z'$ will place the received help symbols into a two-dimensional lattice of size $m \times \rho$ as shown in Fig. 7. In each grid of the lattice there are $n-1$ help symbols corresponding to one data block, received from the $n-1$ helper nodes. We can view each row of the lattice as related to a layer of an $m$-layer rate-matched MSR code with $\rho$ blocks of data, which will be decoded in parallel. We also view each column of the lattice as related to $m$ layers of an $m$-layer rate-matched MSR code with one block of data in each layer, which will be decoded dependently. $z'$ will perform Algorithm 4 to regenerate the contents of the failed node $z$.

Fig. 7. Lattice of received help symbols for regeneration.
Layer 1: data block 1, data block 2, ..., data block $\rho$ (parallel decode the row)
Layer 2: data block $\rho+1$, data block $\rho+2$, ..., data block $2\rho$ (parallel decode the row)
...
Layer $m$: data block $(m-1)\rho+1$, data block $(m-1)\rho+2$, ..., data block $m\rho$ (parallel decode the row)
Each column is decoded dependently, from Layer 1 to Layer $m$.
Note: In each grid $i$ there are $n-1$ help symbols received from the $n-1$ helper nodes, corresponding to data block $i$.

Algorithm 4. $z'$ regenerates symbols of the failed node $z$ for the $m$-layer rate-matched MSR code
Arrange the received help symbols according to Fig. 7. Repeat the following steps from Layer 1 to Layer $m$:
Step 1: For a certain grid, if errors are detected in the symbols sent by node $i$ in previous layers of the same column, replace the symbol sent from node $i$ by an erasure $\bigotimes$.
Step 2: Regenerate in parallel all the symbols related to the $\rho$ data blocks using an algorithm similar to Algorithm 1, with only one difference: decode all the $\rho$ MDS codes in Step 1 of Algorithm 1 in parallel.

The error correction capability of the regeneration is described in Theorem 5.
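The effect of Step 1 on a single column of the lattice can be sketched with a simple counting model. The code below does not perform real MDS decoding: it only assumes that an $(n-1, d, n-d)$ MDS code can handle $e$ errors and $s$ erasures whenever $2e + s \le n - d - 1$, and that every node whose error is corrected in one layer is flagged for the following layers. The function name `decode_column` and the sample corruption pattern are hypothetical.

```python
def decode_column(n, d, corrupt_by_layer):
    """Counting model of dependent decoding down one column of the lattice.

    corrupt_by_layer[k] is the set of helper-node ids that send a wrong
    symbol in layer k + 1. Returns (failed_layer, flagged), where
    failed_layer is None if every layer could be decoded.
    """
    flagged = set()
    for layer, corrupt in enumerate(corrupt_by_layer, start=1):
        erasures = len(flagged)          # Step 1: flagged nodes become erasures
        errors = len(corrupt - flagged)  # errors from nodes not yet located
        if 2 * errors + erasures > n - d - 1:
            return layer, flagged        # beyond the MDS decoding radius
        flagged |= corrupt               # Step 2 succeeded: locate new nodes
    return None, flagged


if __name__ == "__main__":
    layers = [set(range(9)), set(range(14)), set(range(16))]
    print(decode_column(30, 10, layers))            # layered decoding
    print(decode_column(30, 10, [set(range(16))]))  # same nodes, single layer
```

With $n = 30$ and $d = 10$, the layered schedule in this model tolerates 9, then 14, then 16 misbehaving nodes, while a single independent layer facing all 16 at once exceeds the decoding radius.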
E. Reconstruction
When DC needs to reconstruct the original file, it will send reconstruction requests to the $n$ storage nodes. Upon receiving the request, node $i$ will send out the symbol vector $c_i$. Suppose $c'_i = c_i + e_i$ is the response from the $i$th storage node. If $c_i$ has been modified by the malicious node $i$, we have $e_i \in \mathbb{F}_q^{\alpha\theta} \setminus \{0\}$. The strategy of combining dependent decoding and parallel decoding for reconstruction is similar to that for regeneration. DC will place the received symbols into a two-dimensional lattice of size $m \times \rho$. The only difference is that in each grid of the lattice there are $n$ symbol vectors $ch'_{j,0}, \ldots, ch'_{j,n-1}$ corresponding to data block $j$, received from the $n$ storage nodes. DC will perform the reconstruction using Algorithm 5.

Algorithm 5. DC reconstructs the original file for the $m$-layer rate-matched MSR code
Arrange the received symbols similarly to Fig. 7. Here we place the received codeword matrix $CH'_j$ into grid $j$ instead of the help symbols received from the $n-1$ helper nodes. Repeat the following steps from Layer 1 to Layer $m$:
Step 1: For a certain grid, if errors are detected in the symbols sent by node $i$ in previous layers of the same column, replace the symbols sent from node $i$ by erasures $\bigotimes$.
Step 2: Reconstruct in parallel all the symbols of the $\rho$ data blocks using an algorithm similar to that of Section IV-A3, with only one difference: decode all the MDS codes in Section IV-A3 in parallel.

For data reconstruction, we have the following theorem:

Theorem 8 (Optimized Parameters). For the reconstruction of the $m$-layer rate-matched MSR code, when

$$d_i = \mathrm{Round}(d_0/m) = \tilde{d} \quad \text{for } 1 \le i \le m, \tag{55}$$

the number of malicious nodes from which the errors can be corrected is maximized.

Proof: From Section VI-A we know that for regeneration of an optimal $m$-layer rate-matched MSR code, the parameters $d$ of all the layers are the same, which implies that the parameters $\alpha$ of all the layers are also the same. The optimization of regeneration is derived based on the decoding of $(n-1, d, n-d)$ MDS codes, and in reconstruction we have to decode $(n-1, \alpha, n-\alpha)$ MDS codes. Therefore, if the parameters $\alpha$ of all the layers are the same, we can achieve the same optimization results for reconstruction.

VII. CONCLUSION
In this paper, we develop two rate-matched regenerating codes for malicious node detection and correction in hostile networks: the 2-layer rate-matched regenerating code and the $m$-layer rate-matched regenerating code. We propose the encoding, regeneration and reconstruction algorithms for both codes. For the 2-layer rate-matched code, we optimize the parameters for the data regeneration, considering the trade-off between the malicious node detection probability and the storage efficiency. Theoretical analysis shows that the code can successfully detect and correct malicious nodes using the optimized parameters. Our analysis also shows that the code has higher storage efficiency compared to the universally resilient regenerating code (Fig. 2 reports efficiency ratios from 1.95 at detection probability 0.99 to 1.68 at detection probability 0.999999). Then we extend the 2-layer code to the $m$-layer code and optimize the overall error correction efficiency by matching the code rate of each layer's regenerating code. Theoretical analysis shows that the optimized parameters also achieve the maximum storage capacity under the same constraint. Furthermore, analysis shows that, compared to the universally resilient regenerating code, our code improves the error correction efficiency.

REFERENCES

[1] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, pp. 4539–4551, 2010.
[2] K. Rashmi, N. Shah, K. Ramchandran, and P. Kumar, “Regenerating codes for errors and erasures in distributed storage,” in International Symposium on Information Theory (ISIT) 2012, pp. 1202–1206, 2012.
[3] J. Li, T. Li, and J. Ren, “Beyond the mds bound in distributed cloud storage,” in INFOCOM, 2014 Proceedings IEEE, pp. 307–315, April 2014.
[4] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. Weatherspoon, and J. Kubiatowicz, “Maintenance-free global data storage,” IEEE Internet Computing, vol. 5, pp. 40–49, 2001.
[5] R. Bhagwan, K. Tati, Y.-C. Cheng, S. Savage, and G. M. Voelker, “Total recall: System support for automated availability management,” in Proc. Symp. Netw. Syst. Design Implementation, pp. 337–350, 2004.
[6] D. Cullina, A. G. Dimakis, and T. Ho, “Searching for minimum storage regenerating codes,” Available: arXiv:0910.2245, 2009.
[7] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran, “Explicit codes minimizing repair bandwidth for distributed storage,” in Information Theory Workshop (ITW), 2010 IEEE, pp. 1–5, 2010.
[8] C. Suh and K. Ramchandran, “Exact-repair mds codes for distributed storage using interference alignment,” in , pp. 161–165, 2010.
[9] Y. Wu, “A construction of systematic mds codes with minimum repair bandwidth,” IEEE Transactions on Information Theory, vol. 57, no. 6, pp. 3738–3741, 2011.
[10] D. Papailiopoulos, J. Luo, A. Dimakis, C. Huang, and J. Li, “Simple regenerating codes: Network coding for cloud storage,” in INFOCOM, 2012 Proceedings IEEE, pp. 2801–2805, 2012.
[11] S. El Rouayheb and K. Ramchandran, “Fractional repetition codes for repair in distributed storage systems,” in , pp. 1510–1517, 2010.
[12] I. Tamo, Z. Wang, and J. Bruck, “Mds array codes with optimal rebuilding,” in , pp. 1240–1244, 2011.
[13] V. R. Cadambe, C. Huang, S. A. Jafar, and J. Li, “Optimal repair of mds codes in distributed storage via subspace interference alignment,” Available: arXiv:1106.1250, 2011.
[14] D. Papailiopoulos, A. Dimakis, and V. Cadambe, “Repair optimal erasure codes through hadamard designs,” IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 3021–3037, 2013.
[15] N. Shah, K. V. Rashmi, and P. Kumar, “A flexible class of regenerating codes for distributed storage,” in , pp. 1943–1947, 2010.
[16] K. Shum and Y. Hu, “Existence of minimum-repair-bandwidth cooperative regenerating codes,” in , pp. 1–6, 2011.
[17] A. Wang and Z. Zhang, “Exact cooperative regenerating codes with minimum-repair-bandwidth for distributed storage,” in INFOCOM, 2013 Proceedings IEEE, pp. 400–404, 2013.
[18] H. Hou, K. W. Shum, M. Chen, and H. Li, “Basic regenerating code: Binary addition and shift for exact repair,” in , pp. 1621–1625, 2013.
[19] Y.-L. Chen, G.-M. Li, C.-T. Tsai, S.-M. Yuan, and H.-T. Chiao, “Regenerating code based p2p storage scheme with caching,” in ICCIT ’09. Fourth International Conference on Computer Sciences and Convergence Information Technology, 2009, pp. 927–932, 2009.
[20] Y. Wu, A. G. Dimakis, and K. Ramchandran, “Deterministic regenerating codes for distributed storage,” in , 2007.
[21] A. Duminuco and E. Biersack, “A practical study of regenerating codes for peer-to-peer backup systems,” in ICDCS ’09. 29th IEEE International Conference on Distributed Computing Systems, 2009, pp. 376–384, June 2009.
[22] K. Shum, “Cooperative regenerating codes for distributed storage systems,” in , pp. 1–5, 2011.
[23] Y. Wu and A. G. Dimakis, “Reducing repair traffic for erasure coding-based storage via interference alignment,” in IEEE International Symposium on Information Theory, 2009. ISIT 2009., pp. 2276–2280, 2009.
[24] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran, “Interference alignment in regenerating codes for distributed storage: Necessity and code constructions,” IEEE Transactions on Information Theory, vol. 58, pp. 2134–2158, 2012.
[25] K. Rashmi, N. Shah, and P. Kumar, “Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-matrix construction,” IEEE Transactions on Information Theory, vol. 57, pp. 5227–5239, 2011.
[26] F. Oggier and A. Datta, “Byzantine fault tolerance of regenerating codes,” in , pp. 112–121, 2011.
[27] S. Pawar, S. El Rouayheb, and K. Ramchandran, “Securing dynamic distributed storage systems against eavesdropping and adversarial attacks,” IEEE Transactions on Information Theory, vol. 57, pp. 6734–6753, 2011.
[28] Y. Han, R. Zheng, and W. H. Mow, “Exact regenerating codes for byzantine fault tolerance in distributed storage,” in Proceedings IEEE INFOCOM, pp. 2498–2506, 2012.
[29] H. Chen and P. Lee, “Enabling data integrity protection in regenerating-coding-based cloud storage,” in , pp. 51–60, 2012.
[30] C. Cachin and S. Tessaro, “Optimal resilience for erasure-coded byzantine distributed storage,” in DSN 2006. International Conference on Dependable Systems and Networks, 2006, pp. 115–124, 2006.
[31] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie, “Lazy verification in fault-tolerant distributed storage systems,” in SRDS 2005. 24th IEEE Symposium on Reliable Distributed Systems, 2005, pp. 179–190, 2005.
[32] N. B. Shah, K. V. Rashmi, K. Ramchandran, and P. V. Kumar, “Privacy-preserving and secure distributed storage codes,” ∼nihar/publications/privacy security.pdf/.
[33] J. Li, T. Li, and J. Ren, “Secure regenerating code,” in IEEE GLOBECOM 2014, pp. 770–774, 2014.
[34] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 3rd ed., 2009.
[35] J. Ren, “On the structure of hermitian codes and decoding for burst errors,” IEEE Transactions on Information Theory, vol. 50, pp. 2850–2854, 2004.
[36] D. Dabiri and I. Blake, “Fast parallel algorithms for decoding reed-solomon codes based on remainder polynomials,” IEEE Transactions on Information Theory, vol. 41, pp. 873–885, Jul 1995.