Rebuilding for Array Codes in Distributed Storage Systems

Abstract

In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.


arXiv [cs.IT], Sep 2010

Zhiying Wang

Electrical Engineering Department, California Institute of Technology, Pasadena, CA 91125. Email: [email protected]

Alexandros G. Dimakis

Electrical Engineering Department, University of Southern California, Los Angeles, CA 90089-2560. Email: [email protected]

Jehoshua Bruck

Electrical Engineering Department, California Institute of Technology, Pasadena, CA 91125. Email: [email protected]

I. INTRODUCTION

Coding techniques for storage systems have been used widely to protect data against errors or erasures in CDs, DVDs, Blu-ray Discs, and SSDs. Assume the data in a storage system is divided into packets of equal size. An $(n, k)$ block code takes $k$ information packets and encodes them into a total of $n$ packets of the same size. Among coding schemes, maximum distance separable (MDS) codes offer maximal reliability for a given redundancy: any $k$ packets are sufficient to retrieve all the information. Reed-Solomon (RS) codes [1] are the best-known MDS codes and are used widely in storage and communication applications. Another class of MDS codes is MDS array codes, for example EVENODD [2] and its extension [3], B-code [4], X-code [5], RDP [6], and STAR code [7]. In an array code, each packet consists of a column of elements (one or more binary bits), and the parities are computed by XORing some information bits. These codes have lower computational complexity than RS codes because encoding and decoding involve only XOR operations.

Distributed storage systems involving storage nodes connected over networks have recently attracted a lot of attention. MDS codes can be used for erasure protection in distributed storage systems where encoded information is stored in a distributed manner. If no more than $n - k$ storage nodes are lost, then all the information can still be recovered from the surviving packets. Suppose one packet is erased and, instead of retrieving all $k$ packets of information, we are only interested in repairing the lost packet. What is the smallest amount of transmission needed (called the repair bandwidth)? If we transmit $k$ packets from the other nodes to the erased one, then by the MDS property we can certainly repair this node. But can we transmit fewer than $k$ packets? More generally, if no more than $n - k$ nodes are erased, what is the repair bandwidth? This repair problem was first raised in [8], and was further studied in several works (e.g., [9]-[14]). A recent survey of this problem can be found in [15]. In [8], a cut-set lower bound for the repair bandwidth is derived, and in [11][12][13] this lower bound is matched for exact repair by code constructions for $k = 2, 3, n - 1$ and $k \le n/2$. All of these constructions, however, require large finite fields. Very recently it was established that the cut-set bound of [8] is achievable for all values of $k$ and $n$ [13][14]. However, the proof is theoretical and is based on very large finite fields. Hence, it does not provide the basis for constructing practical codes with small finite fields and high rate.

In this paper we take a different route: rather than trying to construct MDS codes that are easily repairable, we try to find ways to repair existing codes, and we focus specifically on the families of MDS array codes. A related and independent work can be found in [16], where single-disk recovery for RDP code was studied; the recovery method and repair bandwidth there are similar to our result. In addition, [16] discussed balancing disk I/O reads in the recovery. Our work discusses the recovery of a single or double disk failure for EVENODD, X-code, STAR, and RDP code.

If the whole data object stored has size $M$ bits, repairing a single erasure naively would require communicating (and reading) $M$ bits from surviving storage nodes. Here we show that a single failed systematic node can be rebuilt after communicating only $\frac{3}{4}M + O(M^{1/2})$ bits. Note that the cut-set lower bound [8] scales like $\frac{1}{2}M + O(M^{1/2})$, so it remains open whether the repair communication for EVENODD codes can be further reduced. Interestingly, our repair scheme also requires significantly fewer disk I/O reads compared to naively reading the whole data object.

The rest of this paper is organized as follows. In Section II, we define the EVENODD code and the repair problem.
Then the repair of one lost node is presented in Section III for EVENODD ($k = n - 2$) and in Section IV for the extended EVENODD ($k < n - 2$). In Section V, we consider the case with two erased nodes and $k = n - 3$. Finally, conclusions are given in Section VI.

II. DEFINITIONS

An $R \times n$ array code contains $R$ rows and $n$ columns (or packets). Each element in the array can be a single bit or a block of bits. We call an element a block. In an $(n, k)$ array code, $k$ information columns, or systematic columns, are encoded into $n$ columns. The total amount of information is $M = Rk$ blocks.

An EVENODD code [2] is a binary MDS array code that can correct up to 2 column erasures. For a prime number $p \ge 3$, the code contains $R = p - 1$ rows and $n = p + 2$ columns, where the first $k = p$ columns are information and the last two are parity. The information is $M = (p - 1)p$ blocks.

We will write an EVENODD code as:

$$
\begin{array}{cccccc}
a_{1,1} & a_{1,2} & \cdots & a_{1,p} & b_{1,0} & b_{1,1} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,p} & b_{2,0} & b_{2,1} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
a_{p-1,1} & a_{p-1,2} & \cdots & a_{p-1,p} & b_{p-1,0} & b_{p-1,1}
\end{array}
$$

We define an imaginary row $a_{p,j} = 0$, for all $j = 1, 2, \ldots, p$, where $0$ is a block of zeros. The slope 0 or horizontal parity is defined as

$$ b_{i,0} = \sum_{j=1}^{p} a_{i,j} \tag{1} $$

for $i = 1, \ldots, p - 1$. The addition here is bit-by-bit XOR of two blocks. A parity block of slope $v$, $-p < v < p$ and $v \ne 0$, is defined as

$$ b_{i,v} = \sum_{j=1}^{p} a_{\langle i + v(1-j) \rangle, j} + S_v \tag{2} $$

where $S_v = a_{p,1} + a_{p-v,2} + \cdots + a_{\langle p-(p-1)v \rangle, p} = \sum_{j=1}^{p} a_{\langle v(1-j) \rangle, j}$ and $\langle x \rangle = (x - 1) \bmod p + 1$. Sometimes we omit the "$\langle\,\rangle$" notation. When $v = 1$, we call it the slope 1, or diagonal parity. In EVENODD, parity columns are of slopes 0 and 1.

A similar code is RDP [6], where $R = p - 1$, $n = p + 1$, and $k = p - 1$, for a prime number $p$. The diagonal parity sums up both the corresponding information blocks and one horizontal parity block. Another related code is X-code [5], where the parity blocks are of slope -1 and 1, and are placed as two additional rows instead of two parity columns.

The code in [3] extended EVENODD to more than 2 columns of parity. This code has $n = p + r$, $k = p$, and $R = p - 1$. The information columns are the same as in EVENODD, but $r$ parity columns of slopes $0, 1, \ldots, r - 1$ are used. It is shown in [3] that such a code is MDS when $r \le 3$, and conditions for a code to be MDS are derived for $r \le 8$. STAR code [7] is an MDS array code with $k = p$, $R = p - 1$, $n = p + 3$, and parity columns of slope 0, 1, and -1.

A parity group $B_{i,v}$ of slope $v$ contains a parity block $b_{i,v}$ and the information blocks in the sum in equations (1) and (2), $i = 1, 2, \ldots, p - 1$. $S_v$ is considered as a single information block. If $v = 0$, it is a horizontal parity group, and if $v = 1$, we call it a diagonal parity group.

By (1), each horizontal parity group $B_{i,0}$ contains $a_{i, \langle k-i+1 \rangle} \in B_{k,1}$, for all $k = 1, 2, \ldots, p - 1$. So we say $B_{i,0}$ crosses with $B_{k,1}$, for all $k = 1, 2, \ldots, p - 1$. Conversely, each diagonal parity group $B_{i,1}$ contains $a_{k, \langle i-k+1 \rangle} \in B_{k,0}$, for all $k = 1, 2, \ldots, p - 1$. Therefore, $B_{i,1}$ crosses with $B_{k,0}$ for all $k = 1, 2, \ldots, p - 1$. The shared block of two parity groups is called the crossing. Generally, two parity groups $B_{i,v}$ and $B_{k,u}$ cross, for $v \ne u$, $1 \le i, k \le p - 1$. If they cross at a block $a_{p,j} = 0$ of the imaginary row, we call it a zero crossing. A zero crossing does not really exist since the $p$-th row is imaginary.
A zero crossing occurs if and only if $u, v \ne 0$ and

$$ \left\langle \frac{i}{v} \right\rangle = \left\langle \frac{k}{u} \right\rangle \tag{3} $$

where the division is taken modulo $p$. Moreover, each information block belongs to only one parity group of slope $v$.

Suppose the $n$ packets are stored in $n$ different nodes in a connected network. Each storage node contains exactly one packet (or one column). Assume $n - d$ nodes are erased, $d \ge k$. Suppose we recover the nodes successively. For any specified erased node, how many blocks from the other storage nodes are needed to recover it? A node can send either a single block or a linear combination of several of its blocks, both of which are counted as one block of transmission. The total number of blocks transmitted to recover the specified node is called the repair bandwidth $\gamma$. The repair problem for distributed storage systems asks what the smallest $\gamma$ is, for fixed $M, d, k$. In [8], a cut-set lower bound is derived (and is achieved only when each node transmits the same number of blocks):

$$ \gamma^* = \frac{M d}{k (d - k + 1)} \tag{4} $$

In this paper, we use MDS array codes as distributed storage codes. We will give repair methods and compute the corresponding bandwidth $\gamma$.

Example 1.

Consider the EVENODD code with $p = 3$. Set $a_{1,3} = a_{2,3} = 0$ for all codewords, so that the code contains only 2 columns of information. The resulting code is a $(4, 2)$ MDS code, called shortened EVENODD (see Figure 1). It can be verified that if any node is erased, then sending 1 block from each of the other nodes is sufficient to recover it, which matches the bound (4). Figure 1 shows how to recover the first or the fourth column. Notice that a sum block is sent in some cases. For instance, to recover the first column, the sum $b_{1,1} + b_{2,1}$ is sent from the fourth column.

Fig. 1. Repair of a $(4, 2)$ EVENODD code if the first column (top graph) or the fourth column (bottom graph) is erased. In both cases, three blocks are transmitted.
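As an illustration of definitions (1) and (2), the following Python sketch encodes an EVENODD array and checks the MDS property by brute force for $p = 3$. The function names `wrap`, `evenodd_encode`, and `is_mds`, the list-of-rows data layout, and the use of integers as blocks are assumptions made for this sketch; they are not taken from the paper.

```python
from itertools import combinations, product


def wrap(x, p):
    """The paper's <x> = (x - 1) mod p + 1, which maps any integer into 1..p."""
    return (x - 1) % p + 1


def evenodd_encode(info, p):
    """Encode a (p-1) x p array of blocks (ints, with XOR as the addition).

    info[i-1][j-1] holds a_{i,j}. Returns (b0, b1), the slope-0 and
    slope-1 parity columns, following equations (1) and (2).
    """
    def a(i, j):
        # Row p is the imaginary all-zero row.
        return 0 if i == p else info[i - 1][j - 1]

    # S_1 is the sum over j of a_{<1-j>, j}.
    s1 = 0
    for j in range(1, p + 1):
        s1 ^= a(wrap(1 - j, p), j)

    b0, b1 = [], []
    for i in range(1, p):
        horizontal, diagonal = 0, s1
        for j in range(1, p + 1):
            horizontal ^= a(i, j)
            diagonal ^= a(wrap(i + 1 - j, p), j)
        b0.append(horizontal)
        b1.append(diagonal)
    return b0, b1


def is_mds(p):
    """Brute-force check with 1-bit blocks: any p surviving columns
    must determine the codeword uniquely."""
    rows = p - 1
    codewords = []
    for bits in product((0, 1), repeat=rows * p):
        info = [list(bits[r * p:(r + 1) * p]) for r in range(rows)]
        b0, b1 = evenodd_encode(info, p)
        columns = [tuple(row[c] for row in info) for c in range(p)]
        columns += [tuple(b0), tuple(b1)]
        codewords.append(columns)
    for kept in combinations(range(p + 2), p):
        views = {tuple(cw[c] for c in kept) for cw in codewords}
        if len(views) != len(codewords):
            return False
    return True


if __name__ == "__main__":
    print(evenodd_encode([[1, 0, 0], [0, 1, 0]], 3))
    print(is_mds(3))
```

The exhaustive check enumerates all $2^{(p-1)p}$ messages, so it is only practical for very small $p$; a rank computation over GF(2) would be the natural replacement for larger arrays.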

In this paper, shortening of a code is not considered, and we focus on the recovery of systematic nodes, given that 1 or 2 systematic nodes are erased. We send no linear combinations of data except the sum $\sum_{i=1}^{p-1} b_{i,v}$ from the parity node of slope $v$, for all $v$ defined in an array code. In addition, we assume that each node can transmit a different number of blocks.

III. REPAIR FOR CODES WITH 2 PARITY NODES

First, let us consider the repair problem of losing one systematic node, $n - d = 1$, and $n - k = 2$. We will use EVENODD to explain the repair method; the recovery is very similar if RDP or X-code is considered.

By the symmetry of the code, we assume that the first column is missing. Each block in the first column must be recovered through either the horizontal or the diagonal parity group including this block. Suppose we use $x$ horizontal parity groups and $p - 1 - x$ diagonal parity groups to recover the column, $0 \le x \le p - 1$. These parity groups include all blocks of the first column exactly once.

Notice that $S_1 = \sum_{i=1}^{p-1} b_{i,0} + \sum_{i=1}^{p-1} b_{i,1}$, so we can send $\sum_{i=1}^{p-1} b_{i,0}$ from the $(p+1)$-th node and $\sum_{i=1}^{p-1} b_{i,1}$ from the $(p+2)$-th node, and recover $S_1$ with 2 blocks of transmission. For the discussion below, assume $S_1$ is known.

For each horizontal parity group $B_{i,0}$, we send $b_{i,0}$ and $a_{i,j}$, $j = 2, 3, \ldots, p$. So we need $p$ blocks. For each diagonal parity group $B_{i,1}$, as $S_1$ is known, we send $b_{i,1}$ and $a_{j, \langle i-j+1 \rangle}$, $j = 1, 2, \ldots, i-1, i+1, \ldots, p-1$, which is $p - 1$ blocks in total.

If two parity groups cross at one block, there is no need to send this block twice. As shown in Section II, any horizontal and any diagonal parity group cross at a block, and each block can be the crossing of two groups at most once. There are $x(p - 1 - x)$ crossings. The total number of blocks sent is

$$
\begin{aligned}
\gamma &= \underbrace{xp}_{\text{horizontal}} + \underbrace{(p-1-x)(p-1)}_{\text{diagonal}} + \underbrace{2}_{S_1} - \underbrace{x(p-1-x)}_{\text{crossings}} \\
&= (p-1)p + 2 - (x+1)(p-1-x) \qquad (5) \\
&\ge (p-1)p + 2 - \frac{p^2-1}{4} = \frac{3p^2 - 4p + 9}{4}
\end{aligned}
$$

Equality holds when $x = (p-1)/2$ or $x = (p-3)/2$, where $x$ is an integer.

This result states that we only need to send about $3/4$ of the total amount of information. The slopes of the chosen parity groups do not matter as long as half are horizontal and half are diagonal.
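The count in (5) can be checked by listing the transmitted blocks explicitly. The sketch below builds the set of blocks sent for a given choice of horizontal rows and compares its size with the closed form; the names `repair_blocks`, `gamma_formula`, and `best_gamma`, and the tuple labels used for blocks, are assumptions of this sketch rather than notation from the paper.

```python
def wrap(x, p):
    """The paper's <x> = (x - 1) mod p + 1."""
    return (x - 1) % p + 1


def repair_blocks(p, horizontal_rows):
    """Distinct blocks sent to rebuild column 1 of an EVENODD array.

    Rows in `horizontal_rows` are rebuilt through B_{i,0}; every other
    row i in 1..p-1 is rebuilt through the diagonal group B_{i,1}.
    """
    # The two parity sums that together reveal S_1.
    sent = {("sum_b", 0), ("sum_b", 1)}
    for i in range(1, p):
        if i in horizontal_rows:
            sent.add(("b", i, 0))
            for j in range(2, p + 1):
                sent.add(("a", i, j))
        else:
            sent.add(("b", i, 1))
            for j in range(2, p + 1):
                row = wrap(i + 1 - j, p)
                if row != p:  # blocks of the imaginary zero row are not sent
                    sent.add(("a", row, j))
    return sent


def gamma_formula(p, x):
    """Closed form (5) for x horizontal and p-1-x diagonal groups."""
    return (p - 1) * p + 2 - (x + 1) * (p - 1 - x)


def best_gamma(p):
    """Minimum of (5) over all integer x in 0..p-1."""
    return min(gamma_formula(p, x) for x in range(p))


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        for x in range(p):
            rows = set(range(1, x + 1))
            assert len(repair_blocks(p, rows)) == gamma_formula(p, x)
        print(p, best_gamma(p), (3 * p * p - 4 * p + 9) // 4)
```

Because crossings are removed by the set itself, the agreement with (5) does not depend on which rows are assigned to horizontal groups, only on how many.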
Moreover, similarrepair bandwidth can be achieved using RDP or X-code.For RDP code, the repair bandwidth is p − which was also derived independently in [16]. For X-code, the repair bandwidth is at most p − p + 54 The derivation for RDP is the following. For RDPcode, the ﬁrst p − columns are information. The p -thcolumn is the horizontal parity. The ( p + 1) -th column isthe slope 1 diagonal parity (including the p -th column).The diagonal starting at a p, = 0 is not included in anydiagonal parities. Suppose the ﬁrst column is erased.Each horizontal or diagonal parity group will require p − blocks of transmission. Every horizontal paritygroup crosses with every diagonal parity group. Suppose ( p − / horizontal parity groups and ( p − / diagonalparity groups are transmitted. Then the total transmissionis γ = ( p − p − | {z } p − parity groups − p − p − | {z } crossings = 3( p − This result is also derived independently in [16].The derivation for X-code is as follows. For X-code,the ( p − -th row is the parity of slope -1, excludingthe p -th row. And the p -th row is the parity of slope 1,excluding the ( p − -th row. Suppose the ﬁrst columnis erased. First notice that for each parity group, p − blocks need to be transmitted. To recover the parity block a p − , , one has to transmit the slope -1 parity groupstarting at a p − , . To recover the parity block a p, , theslope 1 parity group starting at a p, must be transmitted.But it should be noted that by the construction of X-code, this slope 1 parity group essentially is the diagonalstarting at a p − , , except for the ﬁrst element a p, . Zerocrossings happen between two parity groups of slopes -1and 1, starting at a i, and a j, , if < i + j > = p − or < i + j > = p ach slope 1 parity group has no more than 2 zerocrossings with the slope -1 parity groups.Suppose we choose arbitrarily ( p − / slope 1parity groups and ( p − / slope -1 parity groups forthe information blocks in the ﬁrst column. 
Then notconsidering the parity group containing a p, , the numberof slope 1 and slope -1 parity groups are both ( p − / .Excluding zero crossings, each slope 1 parity groupcrosses with at least ( p − / − p − / slope -1 parity groups. The total transmission is γ ≤ p ( p − | {z } p parity groups − p − p − | {z } crossings = 3 p − p + 54 Also, equation (5) is optimal in some conditions:

Theorem 2.

The transmission bandwidth in (5) is optimal to recover a systematic node for EVENODD if no linear combinations are sent except $\sum_{i=1}^{p-1} b_{i,v}$, for $v = 0, 1$.

Proof: To recover a systematic node, say, the first node, parity blocks $b_{i,v}$, $i = 1, 2, \ldots, p - 1$, must be sent, where $v$ can be 0 or 1 for each $i$. This is because $a_{i,1}$ is only included in $b_{i,0}$ or $b_{i,1}$. Besides, given $b_{i,v}$, the whole parity group $B_{i,v}$ must be sent to recover the lost block. Therefore, our strategy of choosing $x$ horizontal parity groups and $p - 1 - x$ diagonal parity groups has the most efficient transmission. Finally, since (5) is minimized over all possible $x$, it is optimal.

The lower bound by (4) is

$$ \frac{M d}{(d - k + 1) k} = \frac{M (n - 1)}{(n - k) k} = \frac{p (p - 1)(p + 1)}{2p} = \frac{p^2 - 1}{2} $$

where $d = n - 1$, $n = p + 2$, $k = p$, and $M = p(p - 1)$. It should be noted that (4) assumes that each node sends the same number of blocks, but our method does not.

Example 3.

Consider the EVENODD code with $p = 5$ in Figure 2. For $1 \le i \le 4$, the code has information blocks $a_{i,j}$, $1 \le j \le 5$, and parity blocks $b_{i,v}$, $v = 0, 1$. Suppose the first column is lost. Then by (5), we can choose parity groups $B_{1,0}, B_{2,0}, B_{3,1}, B_{4,1}$. The blocks sent are $\sum_{i=1}^{p-1} b_{i,0}$, $\sum_{i=1}^{p-1} b_{i,1}$, $b_{1,0}, b_{2,0}, b_{3,1}, b_{4,1}$ from the parity nodes and $a_{1,2}, a_{1,3}, a_{1,4}, a_{1,5}, a_{2,2}, a_{2,3}, a_{2,4}, a_{2,5}, a_{3,2}, a_{4,5}$ from the systematic nodes. Altogether, we send 16 blocks, the number specified by (5). We can see that $a_{1,3}$ is the crossing of $B_{1,0}$ and $B_{3,1}$. Similarly, $a_{1,4}, a_{2,2}, a_{2,3}$ are crossings and are only sent once for two parity groups.
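The repair of the first column for $p = 5$ can also be simulated end to end. The sketch below assumes horizontal groups for rows 1 and 2 and diagonal groups for rows 3 and 4 (one choice with $x = 2$), uses random 8-bit integers as blocks, and records every block that leaves a surviving node. The helper names (`encode`, `repair_first_column`, `cut_set_bound`) and the dictionary layout are assumptions of this sketch, not part of the paper.

```python
import random
from fractions import Fraction

P = 5  # prime parameter of the EVENODD code


def wrap(x):
    """The paper's <x> = (x - 1) mod p + 1."""
    return (x - 1) % P + 1


def encode(a):
    """a[(i, j)] are information blocks; returns parity blocks b[(i, v)]."""
    def get(i, j):
        return 0 if i == P else a[(i, j)]  # imaginary zero row p

    s1 = 0
    for j in range(1, P + 1):
        s1 ^= get(wrap(1 - j), j)
    b = {}
    for i in range(1, P):
        horizontal, diagonal = 0, s1
        for j in range(1, P + 1):
            horizontal ^= get(i, j)
            diagonal ^= get(wrap(i + 1 - j), j)
        b[(i, 0)], b[(i, 1)] = horizontal, diagonal
    return b


def repair_first_column(a, b, horizontal_rows):
    """Rebuild column 1 and return (rebuilt, sent), where `sent` is the
    set of distinct blocks transmitted by the surviving nodes."""
    sent = {("sum_b", 0), ("sum_b", 1)}
    sum0 = sum1 = 0
    for i in range(1, P):  # each parity node XORs its own column locally
        sum0 ^= b[(i, 0)]
        sum1 ^= b[(i, 1)]
    s1 = sum0 ^ sum1  # S_1 obtained from the two transmitted sums

    def fetch_a(i, j):
        sent.add(("a", i, j))
        return a[(i, j)]

    rebuilt = {}
    for i in range(1, P):
        if i in horizontal_rows:
            sent.add(("b", i, 0))
            value = b[(i, 0)]
            for j in range(2, P + 1):
                value ^= fetch_a(i, j)
        else:
            sent.add(("b", i, 1))
            value = b[(i, 1)] ^ s1
            for j in range(2, P + 1):
                row = wrap(i + 1 - j)
                if row != P:  # zero blocks are never sent
                    value ^= fetch_a(row, j)
        rebuilt[i] = value
    return rebuilt, sent


def cut_set_bound(m, d, k):
    """Lower bound (4): M d / (k (d - k + 1))."""
    return Fraction(m * d, k * (d - k + 1))


random.seed(0)
a = {(i, j): random.getrandbits(8) for i in range(1, P) for j in range(1, P + 1)}
b = encode(a)
rebuilt, sent = repair_first_column(a, b, horizontal_rows={1, 2})
ok = all(rebuilt[i] == a[(i, 1)] for i in range(1, P))
bound = cut_set_bound(P * (P - 1), P + 1, P)
print(ok, len(sent), bound)
```

Under these assumptions the run should report a correct rebuild with 16 transmitted blocks, the value given by (5), against a cut-set bound of 12 from (4).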

Fig. 2. Repair of an EVENODD code with $p = 5$. The first column is erased, shown in the box. 14 blocks are transmitted, shown by the blocks on the horizontal or diagonal lines. Each line (with wrap around) is a parity group. 2 blocks in summation form, $\sum_{i=1}^{p-1} b_{i,0}$ and $\sum_{i=1}^{p-1} b_{i,1}$, are also needed but are not shown in the graph.

IV. r PARITY NODES AND ONE ERASED NODE

Next we discuss the repair of array codes with r columns of parity, r ≥ . And we consider the recoveryin the case of one missing systematic column. In thissection, we are going to use the extended EVENODDcode [3], i.e. codes with parity columns of slopes , , . . . , r − . Similar results can be derived for STARcode. Suppose the ﬁrst column is erased without loss ofgenerality.Let us ﬁrst assume r = 3 , so the parity columnshave slopes , , . The repair strategy is: sending paritygroups B n + v,v for v = 0 , , and ≤ n + v ≤ p − .Let A = ⌊ ( p − / ⌋ . Notice that ≤ n ≤ A andeach slope has no more than ⌈ ( p − / ⌉ but no lessthan ⌊ ( p − / ⌋ = A parity groups.Since there are three different slopes, there are cross-ings between slope 0 and 1, slope 1 and 2, and slope2 and 0. For any two parity groups B i, and B k, , < k − i > = 1 , so (3) does not hold. Hence no zerocrossing exists for the chosen parity groups. Hence,every crossing corresponds to one block of saving intransmission. However, the total number of crossings isnot equal to the sum of crossings between every twoparity groups with different slopes. Three parity groupswith slopes 0, 1, and 2 may share a common block, whichshould be subtracted from the sum.Notice that the parity group B i,v contains the block a i − vy,y +1 . The modulo function “ <> ” is omitted inthe subscripts. For three transmitted parity groups B n, , B m +1 , , B l +2 , , if there is a common blockin column y + 1 , then it is in row n ≡ m + 1 − y ≡ l + 2 − y ( mod p ) . To solve this, we get y ≡ m − n ) + 1 ≡ l − m ) + 1 ( mod p ) , or m − n ≡ l − m ( mod p ) . Notice ≤ n, m, l < p/ , so − p/ < m − n, l − m < p/ . Therefore, m − n = l − m without modulo p . Thus l − n must be an even number.For ﬁxed n , either n ≤ m ≤ l ≤ A , and there areno more than ( A − n ) / solutions for ( m, l ) ; or ≤ l < m < n , and the number of ( m, l ) is no morethan n/ . Hence, the number of ( n, m, l ) is no more than P An =1 (( A − n ) / n/

2) = A / A .The total number of blocks in the p − chosen paritygroups is less than p ( p − . There are no less than A parity groups of slope v , for all ≤ v ≤ , thereforeor ≤ u < v ≤ , parity groups with slopes u and v have no less than A crossings. Hence the total numberof blocks sent in order to recover one column is: γ < p ( p − | {z } p − parity groups − (cid:18) (cid:19) A | {z } crossings + A + 2 A | {z } common + 3 |{z} P p − i =1 b i,v < p + 179 p − (6)where ( p − / < A ≤ ( p − / . The above estimationis an upper bound because there may be better ways toassign the slopes of each parity group. Thus, we need tosend no more than M/ blocks if r = 3 .By abuse of notation, we write B m,v = { a ,j : j = 2 , . . . , p } as the setof blocks (including the imaginary p -th row)in the parity group except S v and a m, . Let M v ⊆ { , , . . . , q − } , ≤ v ≤ r − , bedisjoint sets such that ∪ r − v =0 M v = { , , . . . , q − } . Let B M v ,v = ∪ m ∈ M v B m,v . For given M v , deﬁne a function f as f ( v , v , . . . , v k ) = |{ m ∈ M v , . . . , m k ∈ M v k :( m − m ) / ( v − v ) ≡ ( m − m ) / ( v − v ) ≡ . . . ( m k − m k − ) / ( v k − v k − ) mod p }| , for k ≥ , and ≤ v < v < · · · < v k ≤ r − . Then we have thefollowing theorem: Theorem 4.

FortheextendedEVENODDwith r ≥ ,therepairbandwidthforoneerasedsystematicnodeis γ < p ( p −

1) + p + r − X ≤ v
Suppose the ﬁrst column is missing andwe transmit the parity groups B m,v , m ∈ M v for v = 0 , , . . . , r − . Since the union of M v covers { , , . . . , q − } , all the blocks in the ﬁrst column can berecovered. The repair bandwidth is the cardinality of theunion of B M v ,v plus the number of zero crossings andthe summation blocks P p − i =1 b i,v . The number of zerocrossings is no more than the size of the imaginary row, p . The number of the summation blocks is r .By inclusion–exclusion principle, the cardinality of theunion of B M v ,v is X ≤ v ≤ r − | B M v ,v | − X ≤ v ,j , j = 2 , . . . , p , the intersection of more than two parity groups B m ,v , . . . , B m k ,v k isequivalent to the solutions of m − v y ≡ m − v y ≡ · · · ≡ m k − v k y mod p where y + 1 is the column index of the intersection. Or, y ≡ m − m v − v ≡ · · · ≡ m k − m k − v k − v k − mod p Therefore, | B M v ,v ∩ B M v ,v ∩ . . . B M vk ,v k | = f ( v , v , . . . , v k ) And (7) follows.We can see that (6) is a special case of (7), with M v = { n + v : 1 ≤ n + v ≤ p − } , for v = 0 , , . For r = 4 , , we can derive similar bounds by deﬁning M v .Choose M v = { rn + v : 1 ≤ rn + v ≤ p − } (8)for v = 0 , , . . . , r − . Let A = ⌊ ( p − /r ⌋ . And for ≤ v < v < v ≤ r − , f ( v , v , v ) becomes thenumber of ( n , n , n ) , ≤ rn i + v i ≤ p − , such that ( n − n )( v − v ) ≡ ( n − n )( v − v ) mod p Since − p/r < n − n , n − n < p/r , and ( v − v ) +( v − v ) < r , the above equation becomes ( n − n )( v − v ) = ( n − n )( v − v ) without modulo p . Therefore, n − n = ( n − n ) + ( n − n )= c · lcm ( v − v , v − v ) (cid:18) v − v + 1 v − v (cid:19) = c v − v gcd ( v − v , v − v ) where c is an integer constant, lcm is the least commonmultiplier and gcd is the greatest common divisor. Andfor ﬁxed n , the number of solutions for ( n , n ) is nomore than A − n ) gcd ( v − v , v − v ) / ( v − v ) ,when n ≤ n ≤ n ≤ A ; and no more than n gcd ( v − v , v − v ) / ( v − v ) , when ≤ n < n < n . 
Thenumber of ( n , n , n ) is f ( v , v , v ) < X n A − n + n ) gcd ( v − v , v − v ) v − v = A (cid:18) A gcd ( v − v , v − v ) v − v (cid:19) Similarly, for four parity groups, f ( v , v , v , v ) > A (cid:18) A + 2) gcd ( v − v , v − v , v − v ) v − v (cid:19) For ﬁve parity groups, f ( v , v , v , v , v ) < A + A gcd ( v − v , v − v , v − v , v − v ) v − v hen r = 4 , equation (7) becomes γ < p ( p −

1) + p + 4 − X ≤ v A (1 + ( A + 2) / And the repair bandwidth is γ ≈ p − (cid:18) (cid:19) ( p +(2 ×

12 +2 × p −

13 ( p = 724 p where the terms of lower orders are omitted.When r = 5 , we can use (7) again and get γ ≈ p +( − (cid:18) (cid:19) + 42 + 43 + 24 − −

34 + 14 )( p = 5375 p where the terms of lower orders are omitted.It should be noted that the number of common blocksaffects the bandwidth a lot. If we consider only the ﬁrst 4terms in (7), any assignment of M v with equal sizes willresult in a lower bound of γ > ( r + 1) p / (2 r ) ≈ p / ,when r is large. But due to the common blocks, the true γ values for r = 4 , using (8) has only slight improvementcompared to the case of r = 3 .The lower bound (4) is Mdk ( d − k +1) = p ( p − p + r − pr ≈ p ( p + r − r . When r = 3 , this bound is about p / .V. 3 P ARITY N ODES AND

RASED N ODES

Up to now, we have considered the recovery problemgiven that one column is erased. Next, let us assumethat two information columns are erased and we need torecover them successively. So we ﬁrst recover one of theerased nodes, and then the other one. The ﬁrst recoveryis discussed in this section, and the second recovery wasalready discussed in the previous sections. Suppose wehave 3 columns of parity with slopes -1, 0, and 1, whichis in fact the STAR code in [7]. Again, the argumentscan be applied to extended EVENODD in a similar way.Without loss of generality, assume the ﬁrst and ( x +1) -thcolumns are missing, ≤ x ≤ p − .Let B i, , B i, , and B i, − be i -th parity group of slopes0, 1, and -1, respectively, i = 1 , , . . . , p − . Thefollowing are p − / parity groups that repair the ﬁrstcolumn: B , − , B x, , B x, , B x, − , B x, , B x, , . . . ,B ( p − x, − , B ( p − x, , B ( p − x, . For each parity blockabove, the corresponding recovered blocks are: a x, x ,a x, , a x, , a x, x , a x, , a x, , . . . , a ( p − x, x ,a ( p − x, , a ( p − x, . An example of p = 5 , x = 1 isshown in Figure 3.Rearrange the columns in the following order:Columns , x, x, . . . , p − x (every index is computed modulo p ). We can see that the chosenparity groups B jx, , j = x, x, . . . , ( p − x contain theblocks in Rows Z = { x, x, . . . , ( p − x } . B jx, con-tains blocks a jx, , a ( j − x, x , . . . , a ( j − p +1) x, p − x ,for j = 2 , , . . . , p − . And similarly B jx, − con-tains blocks a jx, , a ( j +1) x, x , . . . , a ( j + p − x, p − x ,for j = 0 , , . . . , p − .Now notice that the blocks included in the aboveparity groups have the (1 + x ) -th column as the verticalsymmetry axis. That is, the row indices of the blocksneeded in Columns and x are the same; those ofColumns p − x and x are the same; ...; thoseof Columns p + 3) x/ and p + 1) x/ are thesame. For example, the second column in Figure 3 is thesymmetry axis. Thus, we only need to consider Columns x, x, . . . 
, p + 1) x/ .For columns ix , where i is even and ≤ i ≤ ( p +1) / , parity groups { B x, , B x, , . . . , B ( p − x, } in-clude the blocks in Rows X = { x, x, . . . , ( p − − i ) x } .And parity groups { B , − , B x, − , . . . , B ( p − x, − } in-clude the blocks in Rows Y = { ix, ( i + 2) x, . . . , ( p − x } . Since ≤ i ≤ ( p +1) / , we have i ≤ ( p − − i )+2 ,and X ∪ Y = { x, x, . . . , ( p − x } . Hence X ∪ Y ∪ Z = { , , . . . , p − } . Thus every block in Column ix needs to be sent, for even i .Similarly, for Columns ix , where i is oddand ≤ i ≤ ( p + 1) / , parity groups { B x, , B x, , . . . , B ( p − x, } include the blocks inRows X = { ( p − i + 2) x, ( p − i + 4) x, . . . , ( p − x } .Parity groups { B , − , B x, − , . . . , B ( p − x, − } includethe blocks in Rows Y = { x, x, . . . , ( i − x } . Since ≤ i ≤ ( p + 1) / , we have i − < p − i + 2 , and X ∪ Y = { x, x, . . . , ( i − x, ( p − i + 2) x, ( p − i +4) x, . . . , ( p − x } . Therefore, the rows not included in X or Y or Z are W = { ( i − x, ( i + 1) x, . . . , ( p − i ) x } and | W | = ( p + 3) / − i . The total saving in blocktransmissions for all the columns is: X i odd, ≤ i ≤ ( p +1) / ( p + 32 − i ) = ( ( p − , p +12 odd ( p +1)( p − , p +12 evenThe above argument can be summarized in the follow-ing theorem. Theorem 5.

When two systematic nodes are erased in aSTAR code,thereexistastrategythattransmitabout / of all the information blocks, and about / of all theparityblockssoastorecoveronenode.The repair bandwidth γ in the above theorem isabout p / . Comparing it to the lower bound (4), Mdk ( d − k +1) = p ( p − p +1)2 p ≈ p , we see a gap of p in total transmission.VI. C ONCLUSIONS

We presented an efficient way to repair one lost node in EVENODD codes and two lost nodes in STAR codes. Our achievable schemes outperform the naive method of rebuilding by reconstructing all the data. For EVENODD codes, a bandwidth of roughly $3M/4$ is sufficient to repair an erased systematic node. Moreover, if no linear combinations of bits are transmitted, the proposed repair method has optimal repair bandwidth with the sole exception of the sum of the parity nodes. Since array codes only operate on binary symbols, and our repair method involves no linear combination of content within a node except in the parity nodes, the proposed construction is computationally simple and also requires smaller disk I/O to read data during repairs.

Fig. 3. The recovery strategy for the first column in STAR code when the first and second columns are missing. $p = 5$, $x = 1$.

There are several open problems on using array codes for distributed storage. Although our scheme does not achieve the information theoretic cut-set bound, it is not clear if that bound is achievable for fixed code structures or limited field sizes. If we allow linear combinations of bits within each node, the optimal repair remains unknown. Our simulations indicate that shortening of EVENODD (using fewer than $p$ columns of information) further reduces the repair bandwidth, but proper shortening rules and repair methods need to be developed. Repairing other families of array codes or Reed-Solomon codes would also be of substantial practical interest.

REFERENCES

[1] I. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the SIAM, 8(2):300–304, 1960.
[2] M. Blaum, J. Brady, J. Bruck, and J. Menon. EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. on Computers, 44(2):192–202, 1995.
[3] M. Blaum, J. Bruck, and A. Vardy. MDS array codes with independent parity symbols. IEEE Trans. on Information Theory, 42(2):529–542, 1996.
[4] L. Xu, V. Bohossian, J. Bruck, and D. G. Wagner. Low-density MDS codes and factors of complete graphs. IEEE Trans. on Information Theory, 45(6):1817–1826, 1999.
[5] L. Xu and J. Bruck. X-Code: MDS array codes with optimal encoding. IEEE Trans. on Information Theory, 45(1):272–275, 1999.
[6] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. Row-diagonal parity for double disk failure correction. In Proc. of the 3rd USENIX Symposium on File and Storage Technologies (FAST '04), pages 1–14, 2004.
[7] C. Huang and L. Xu. STAR: an efficient coding scheme for correcting triple storage node failures. IEEE Trans. on Computers, 57(7):889–901, 2008.
[8] A. G. Dimakis, P. G. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Trans. on Information Theory, to appear.
[9] Y. Wu, A. G. Dimakis, and K. Ramchandran. Deterministic regenerating codes for distributed storage. In Allerton Conference on Control, Computing, and Communication, 2007.
[10] Y. Wu. Existence and construction of capacity-achieving network codes for distributed storage. In Proc. IEEE ISIT, 2009.
[11] Y. Wu and A. G. Dimakis. Reducing repair traffic for erasure coding-based storage via interference alignment. In Proc. IEEE ISIT, 2009.
[12] D. Cullina, A. G. Dimakis, and T. Ho. Searching for minimum storage regenerating codes. In Allerton Conference on Control, Computing, and Communication, 2009.
[13] C. Suh and K. Ramchandran. Exact regeneration codes for distributed storage repair using interference alignment. In Proc. IEEE ISIT, 2010.
[14] V. R. Cadambe, S. A. Jafar, and H. Maleki. Distributed data storage with minimum storage regenerating codes - exact and functional repair are asymptotically equally efficient. http://arxiv.org/pdf/1004.4299.
[15] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh. A survey on network codes for distributed storage. http://arxiv.org/pdf/1004.4438.
[16] L. Xiang, Y. Xu, J. C.S. Lui, and Q. Chang. Optimal recovery of single disk failure in RDP code storage systems.


Submitted on 16 Sep 2010 Updated
