Rack-Aware Cooperative Regenerating Codes

Abstract

In distributed storage systems, cooperative regenerating codes trade off storage for repair bandwidth in the case of multiple node failures. In rack-aware distributed storage systems, there is no cost associated with transferring symbols within a rack. Hence, the repair bandwidth only takes into account cross-rack transfer. Rack-aware regenerating codes for the case of single node failures have been studied and their repair bandwidth tradeoff characterized. In this paper, we consider the framework of rack-aware cooperative regenerating codes for the case of multiple node failures where the node failures are uniformly distributed among a certain number of racks. We characterize the storage-repair bandwidth tradeoff and derive the minimum storage and minimum repair bandwidth points of the tradeoff. We also provide constructions of minimum bandwidth rack-aware cooperative regenerating codes for all parameters.



Shreya Gupta, V. Lalitha

SPCRC, International Institute of Information Technology Hyderabad

I. INTRODUCTION

Traditional erasure coding techniques, though storage-efficient, incur a large repair bandwidth when handling node failures, where repair bandwidth is the total amount of data downloaded to repair a failed node. Regenerating codes [1] are a class of codes designed to efficiently trade off storage efficiency for repair bandwidth. In the following, we describe two variants of the regenerating-code setting which are important in the context of the current paper. The first handles multiple node failures; such codes are known as cooperative regenerating codes. The second accounts for a rack-based topology of nodes; such codes are known as rack-aware regenerating codes.

A. Cooperative Regenerating Codes

Cooperative regenerating codes are defined using the parameters $(n, k, d, e, \alpha, \beta_1, \beta_2)$. The total number of nodes is denoted by $n$, and $k$ denotes the number of nodes required for the data collection process: any $k$ out of $n$ nodes are sufficient to reconstruct the whole file. Each node stores $\alpha$ symbols. The original data file of size $B$ is encoded into $n\alpha$ symbols and stored on the $n$ nodes. The number of simultaneous failures is denoted by $e$. In case of failures, cooperative regenerating codes perform the repair in two rounds. In the first round, every new (or replacement) node downloads $\beta_1$ symbols from each of any $d$ non-failed nodes, where $k \le d \le n - e$. In the second round, every replacement node shares its information with all the other $e - 1$ replacement nodes, so every replacement node receives $\beta_2$ symbols from each of the $e - 1$ other replacement nodes. This gives the repair bandwidth per replacement node as $\gamma = d\beta_1 + (e - 1)\beta_2$.

For fixed values of $(n, k, d, e)$, there exists a bound on the file size $B$, and for a fixed value of $B$ it gives a storage ($\alpha$)-bandwidth ($\gamma$) tradeoff. The corner points of the tradeoff, where $\alpha$ is minimum, known as the Minimum Storage Cooperative Regeneration (MSCR) point, and where $\gamma$ is minimum, known as the Minimum Bandwidth Cooperative Regeneration (MBCR) point, are given by:

$$(\alpha_{MSCR}, \gamma_{MSCR}) = \left( \frac{B}{k}, \frac{B(d + e - 1)}{k(d + e - k)} \right) \tag{1}$$

$$\alpha_{MBCR} = \gamma_{MBCR} = \frac{B(2d + e - 1)}{k(2d + e - k)} \tag{2}$$

Cooperative regenerating codes reduce to regenerating codes by setting $e = 1$. The MSCR and MBCR points become the minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) points, respectively, when $e = 1$. Cooperative regenerating codes have been studied in [2], [3]. Explicit MBCR codes have been constructed in [4], [5]. Constructions of MSCR codes have been presented in [5], [6].

There is another setting of regenerating codes for the case of multiple node failures, where a central node downloads all the information from the helper nodes and reconstructs all the failed nodes. This setting is known as centralized repair and has been studied in [7].

B. Rack-aware Regenerating Codes (RRCs)

A rack-aware regenerating code (RRC) is denoted by the parameters $(n, k, d, r, \alpha, \beta)$. In an RRC, a data file of size $B$ is encoded into $n\alpha$ symbols and stored on $n$ nodes in $r$ racks, where $r$ divides $n$, so that each rack consists of $n/r$ nodes. The data collector reconstructs the whole file by downloading $\alpha$ symbols from each of any $k$ out of $n$ nodes. In case of a failure of a node in rack $h$, the new node downloads $\beta$ symbols from each of any $d$ helper racks (or relayers) other than rack $h$, where $\lfloor kr/n \rfloor \le d < r$, and $\alpha$ symbols from each of the other nodes within rack $h$ to reconstruct the failed node. Each rack contains a relayer node which can access the contents of the other nodes within the rack. The cross-rack repair bandwidth in an RRC to reconstruct the failed node is given by $\gamma = d\beta$.

For fixed parameters $(n, k, d, r)$ and $B$, there exists a storage ($\alpha$)-bandwidth ($d\beta$) tradeoff. The point on the tradeoff curve where the storage per node $\alpha$ is minimum is called the Minimum Storage Rack-aware Regeneration (MSRR) point, and the point where the cross-rack repair bandwidth is minimum is called the Minimum Bandwidth Rack-aware Regeneration (MBRR) point. With $m = \lfloor kr/n \rfloor$, the parameters of the MSRR and MBRR points are given below:

$$(\alpha_{MSRR}, \gamma_{MSRR}) = \left( \frac{B}{k}, \frac{Bd}{k(d - m + 1)} \right), \tag{3}$$

$$\alpha_{MBRR} = \gamma_{MBRR} = \frac{Bd}{(k - m)d + m\left(d - \frac{m - 1}{2}\right)}. \tag{4}$$

The storage-repair bandwidth tradeoff and the constructions of rack-aware regenerating codes have been investigated in [8]. A related version of the storage-repair bandwidth tradeoff has been studied in the context of clustered distributed storage systems in [9]. The tradeoff for the case of multiple node failures where all the node failures are in a single cluster has been dealt with in [10].

C. Our Contributions

In this paper, we consider the framework of rack-aware cooperative regenerating codes for the case of multiple node failures where the node failures are uniformly distributed among a certain number of racks. We characterize the storage-repair bandwidth tradeoff and derive the minimum storage and minimum repair bandwidth points of the tradeoff. We also provide constructions of minimum bandwidth rack-aware cooperative regenerating codes for all parameters. The system model and the tradeoff are introduced in Section II. The construction of minimum bandwidth rack-aware cooperative regenerating codes is presented in Section III.

II. SYSTEM MODEL

The problem of multi-node repair for rack-aware regenerating codes is characterized by the parameters $(B, n, k, d, r, e, f, \alpha, \beta_1, \beta_2)$. We consider a distributed storage system consisting of $n$ nodes, equally divided into $r$ racks, which stores a file of size $B$. In this paper we assume that $r$ divides $n$, so each rack contains $n/r$ nodes. We define $X_{h,i}$ as the $i$-th node in rack $h$, where $i = 1, 2, \ldots, n/r$ and $h = 1, 2, \ldots, r$. Each node stores $\alpha$ symbols. A data file of size $B$ is encoded into $n\alpha$ symbols on the $n$ nodes. Each rack has a distinguished node called the relayer node, which can access the contents of the other nodes within the same rack. The system should satisfy the following two properties:

• Reconstruction property: any $k \le n$ nodes out of the $n$ nodes should be able to reconstruct the whole file.
• Regeneration property: We consider the failure of $e$ nodes in any $f$ racks such that $f$ divides $e$ and each of the $f$ racks has $e/f$ failed nodes. We consider this case for simplicity, so that the cross-rack bandwidth is uniform. There are $d$ helper racks (or relayers), where $m \le d \le r - f$ and $m = \lfloor kr/n \rfloor$. We use cooperative repair across racks and centralized repair within the racks. There are two rounds of repair. In the first round, each of the $f$ racks with failed nodes downloads $\beta_1$ symbols from each of the $d$ helper racks. In the second round, all $f$ racks share information with each other, so every rack with failed nodes downloads $\beta_2$ symbols from each of the other racks with failed nodes. Thus, the cross-rack repair bandwidth for one rack in case of $e$ node failures is $\gamma = d\beta_1 + (f - 1)\beta_2$.

The aim of this paper is to characterize the tradeoff between the storage per node $\alpha$ and the repair bandwidth $\gamma$. In this paper, we derive the tradeoff for functional repair. We call an encoding scheme which satisfies the above requirements with parameters $(n, k, d, r, e, f, \alpha, \beta_1, \beta_2)$ a rack-aware cooperative storage system (RCSS).

A. Information Flow Graphs

The storage system described in the previous section can be represented by information flow graphs (IFGs). Our IFG contains a vertex $S$, which represents the original data file, and a vertex $T$, which represents the data collector. We have vertices $Out_{h,i}$ for the $i$-th node in rack $h$, where $h = 1, 2, \ldots, r$ and $i = 1, 2, \ldots, n/r$. Edges from $S$ to $Out_{h,i}$ have capacity $\alpha$. The first node ($X_{h,1}$) in rack $h$, for $h = 1, 2, \ldots, r$, is considered the relayer node and can access the data from the other nodes within the same rack. For each $h = 1, 2, \ldots, r$ and $i = 2, 3, \ldots, n/r$, there is an edge of infinite capacity from $Out_{h,i}$ to $Out_{h,1}$.

Fig. 1. Information Flow Graph of Rack-aware Cooperative Storage System. Red nodes denote the failed nodes.

We consider $e$ nodes failing in $f$ racks, with $e/f$ failures in each of the $f$ racks. We define a stage as the recovery of the $e$ nodes in $f$ racks that failed in the previous stage, followed by the simultaneous failure of another $e$ nodes in $f$ racks. At $s = 0$, the first set of $e$ nodes fails. For $s = 1, 2, 3, \ldots$, let $F_s$ be the set of $f$ racks which have nodes that failed in stage $s - 1$ and are regenerated in stage $s$. The set $F_s \subseteq \{1, 2, \ldots, r\}$ and $|F_s| = f$. $F_s$ is also known as a repair group, that is, a set of racks with failed nodes that get reconstructed simultaneously in stage $s$. For each rack $h$ in $F_s$, we construct a vertex $virt_h$, a vertex $mid_h$, and vertices corresponding to all the nodes within that rack, denoted by $Out'_{h,i}$ for $i = 1, 2, \ldots, n/r$. For the nodes in rack $h$ which have not failed in stage $s - 1$, there is an edge of infinite capacity from $Out_{h,i}$ to $Out'_{h,i}$. To reconstruct the failed nodes in rack $h$, vertex $virt_h$ has $d$ incoming edges with capacity $\beta_1$ from the $d$ helper racks and $(n/r - e/f)$ incoming edges with capacity $\alpha$ from the nodes within rack $h$ which have not failed in stage $s - 1$. Vertex $virt_h$ is connected to vertex $mid_h$ by an edge of infinite capacity. For $h, h' \in F_s$, $h \ne h'$, we join $virt_h$ to $mid_{h'}$ by an edge of capacity $\beta_2$. Together, $virt_h$ and $mid_h$ act in the information flow graph as $cent_h$, the node where centralized reconstruction happens for the failed nodes within rack $h$. There are directed edges of capacity $\alpha$ from $mid_h$ to $Out'_{h,i}$ for every node $i$ of rack $h$ that failed in stage $s - 1$. The download at $virt_h$, where the replacement rack downloads $\beta_1$ symbols from each of $d$ other racks and $\alpha$ symbols from each of the remaining nodes within the same rack, corresponds to the first round of repair, and the download at $mid_h$, where each replacement rack downloads $\beta_2$ symbols from every other replacement rack, corresponds to the second round of repair. We also have an edge of infinite capacity from each $Out'_{h,j}$ to $Out'_{h,1}$ for $j = 2, \ldots, n/r$.
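
As a rough illustration of the stage construction described above, the following Python sketch builds the edge list of an IFG with a single repair stage and checks the total finite capacity entering $virt_h$ and $mid_h$. The names `build_ifg` and `finite_inflow`, the vertex labels, and the choice of failed nodes (the last $e/f$ nodes of each rack in $F_s$) are assumptions made for illustration and do not come from the paper. Attaching the first-round edges to the relayer vertex $Out_{h',1}$ of each helper rack is likewise one possible reading of "incoming edges from the $d$ helper racks".

```python
from math import inf


def build_ifg(n, r, e, f, alpha, beta1, beta2, failed_racks, helper_racks):
    """Return {(tail, head): capacity} for an IFG with one repair stage.

    Hypothetical helper: the last e/f nodes of every rack in failed_racks
    are assumed to be the ones that failed in the previous stage.
    """
    assert n % r == 0 and e % f == 0
    assert len(failed_racks) == f
    assert not set(failed_racks) & set(helper_racks)
    per_rack, fail_per_rack = n // r, e // f
    edges = {}

    # Base graph: S -> Out_{h,i} with capacity alpha, and infinite-capacity
    # edges from every non-relayer node to the relayer Out_{h,1}.
    for h in range(1, r + 1):
        for i in range(1, per_rack + 1):
            edges[("S", ("Out", h, i))] = alpha
            if i >= 2:
                edges[(("Out", h, i), ("Out", h, 1))] = inf

    # Stage gadget for every rack h in the repair group F_s.
    for h in failed_racks:
        virt, mid = ("virt", h), ("mid", h)
        failed = range(per_rack - fail_per_rack + 1, per_rack + 1)  # assumption
        for hp in helper_racks:  # first round: d edges of capacity beta1
            edges[(("Out", hp, 1), virt)] = beta1
        for i in range(1, per_rack + 1):
            new = ("Out'", h, i)
            if i in failed:
                edges[(mid, new)] = alpha            # regenerated node
            else:
                edges[(("Out", h, i), new)] = inf    # surviving node carries over
                edges[(("Out", h, i), virt)] = alpha  # and helps within the rack
            if i >= 2:
                edges[(new, ("Out'", h, 1))] = inf   # new relayer access
        edges[(virt, mid)] = inf
        for hp in failed_racks:  # second round: beta2 to every other failed rack
            if hp != h:
                edges[(virt, ("mid", hp))] = beta2
    return edges


def finite_inflow(edges, vertex):
    """Sum of the finite capacities of the edges entering `vertex`."""
    return sum(c for (_, v), c in edges.items() if v == vertex and c != inf)
```

For example, with `n=12, r=4, e=2, f=2, alpha=5, beta1=2, beta2=1`, failed racks `[1, 2]` and helper racks `[3, 4]`, the finite inflow of `("virt", 1)` is $d\beta_1 + (n/r - e/f)\alpha = 14$ and that of `("mid", 1)` is $(f - 1)\beta_2 = 1$, which are the two terms that add up to the cut value used in the analysis of the IFG.
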
The data collector $T$ is connected to any $k$ out of the $n$ existing nodes by edges of infinite capacity.

Given parameters $n, k, d, r, e, f, \alpha, \beta_1, \beta_2$, there can be many IFGs, depending on the failure pattern. The set of all such IFGs is denoted by $\mathcal{G}(n, k, d, r, e, f, \alpha, \beta_1, \beta_2)$. Given an IFG $G \in \mathcal{G}$, there can be many different data collectors, each connecting to some $k$ out of the $n$ data nodes. We denote the set of all data collectors corresponding to IFG $G$ by $DC(G)$. For an IFG $G \in \mathcal{G}$ with source vertex $S$ and a data collector $T \in DC(G)$, an $(S, T)$-cut is a subset of edges of $G$ whose removal partitions the vertices of $G$ such that $S$ and $T$ are disconnected; its capacity is the sum of the capacities of these edges. The smallest capacity of an $(S, T)$-cut in a given IFG $G$ is denoted by $mincut_G(S, T)$. According to the max-flow bound in network theory, the supported file size is upper bounded by $mincut_G(S, T)$ minimized over all data collectors $T \in DC(G)$ and IFGs $G \in \mathcal{G}$.

The following lemma is very similar to the one in [8] and is discussed here for the sake of completeness.

Lemma II.1. If a relayer node in a rack is connected to the data collector $T$ and not all of the remaining $n/r - 1$ nodes of that rack are connected to $T$, then the capacity of the $(S, T)$-cut is not minimum.

Proof. Suppose a relayer $X_{h,1}$, for some $h \in \{1, 2, \ldots, r\}$, is connected to the data collector $T$. All the incoming edges of $T$ have infinite capacity. Assuming that rack $h$ does not have any failed nodes, we consider only the incoming edges of $Out_{h,i}$ for $i = 1, 2, \ldots, n/r$, each with capacity $\alpha$, which in total contribute $\alpha \frac{n}{r}$ to the cut. So the relayer node $X_{h,1}$ contributes $\alpha \frac{n}{r}$ to the cut in case of no node failure within the rack. On the other hand, if rack $h$ has nodes that failed and were then reconstructed, and the relayer node $X'_{h,1}$ is connected to $T$, then we can have a cut of capacity $\alpha \frac{n}{r}$, considering the incoming edges of $Out'_{h,i}$ and $Out_{h,i}$, or a cut of capacity $d\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - 1)\beta_2$, where $d\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha$ comes from the incoming edges of $virt_h$ and $(f - 1)\beta_2$ from the incoming edges of $mid_h$. So a failed relayer node contributes $\min\left(\alpha \frac{n}{r}, d\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - 1)\beta_2\right)$ to the cut. In both cases, failed or non-failed relayer, if a relayer is connected to $T$, then the other nodes within the rack make no additional contribution to the cut, whether or not they are connected to $T$. So, to minimize the capacity of the cut, if a relayer is connected to $T$, then all the other nodes within the rack should also be connected to $T$.

Theorem II.2. For fixed parameters $n, k, d, r, e, f, \alpha, \beta_1, \beta_2$, where $m \le d \le r - f$ and $f \mid m$, if there exists an RCSS with file size $B$, then it satisfies

$$B \le k\alpha + \sum_{i=1}^{g} u_i \min\left(0, \Big(d - \sum_{j=1}^{i-1} u_j\Big)\beta_1 - \frac{e}{f}\alpha + (f - u_i)\beta_2\right) \tag{5}$$

for all $u = [u_1, u_2, \ldots, u_g]$ with $1 \le u_i \le f$, $g \in \mathbb{N}$, and $\sum_{i=1}^{g} u_i = m$.

Proof. We consider $g$ stages (or repair groups of size $f$), and $u_i$ denotes the number of racks contacted in stage (or from repair group) $i$ for the data collection process. Let $I$ denote the set of nodes which contribute to the data collection process. We want to find the min-cut to get the upper bound on the file size. To partition the IFG $G$ into sets $U$ and $\bar{U}$, we do not consider edges with infinite capacity. So $S \in U$ and either $Out'_{h,i} \in \bar{U}$ or $Out_{h,i} \in \bar{U}$ for $(h, i) \in I$. IFGs are directed acyclic graphs (DAGs), and every DAG has a topological sorting. Since all the racks in a repair group have nodes which are reconstructed simultaneously, all the nodes in these racks are adjacent to each other in the topological sorting. If we consider a topological sorting of the nodes connected to the data collector, then nodes in a rack in the $i$-th repair group do not have incoming edges from nodes in a rack in the $j$-th repair group for $j > i$, where $i, j \in [g]$.

From Lemma II.1, a relayer node contributes $\min\left(\frac{n}{r}\alpha, d\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - 1)\beta_2\right)$ to the cut. Including a relayer node means including the whole rack in the data collection process. We take $u_i$ racks from the $i$-th repair group for data collection. For the reconstruction of a rack in the $i$-th repair group, at most $\sum_{j=1}^{i-1} u_j$ edges come from racks which were reconstructed in previous stages and are already included in the data collection process. Thus, the $i$-th repair group contributes at least $u_i \min\left(\frac{n}{r}\alpha, (d - \sum_{j=1}^{i-1} u_j)\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - u_i)\beta_2\right)$ to the cut. Finally, adding the contributions from the different repair groups, we get the contributions for $m\frac{n}{r}$ nodes, as $\sum_{i=1}^{g} u_i = m$, and each of the remaining $k - m\frac{n}{r}$ nodes contributes $\alpha$ to the cut. Writing each minimum as $\frac{n}{r}\alpha + \min(0, \cdot)$, the $\alpha$ terms add up to $m\frac{n}{r}\alpha + (k - m\frac{n}{r})\alpha = k\alpha$, which gives the right-hand side of (5). This gives the min-cut for any IFG $G(n, k, d, r, e, f, \alpha, \beta_1, \beta_2)$ and thus the upper bound on the file size.

Next, we show that there exists an IFG whose min-cut is equal to the right-hand side of (5). We consider racks $1, 2, \ldots, m$ participating in the data collection process: the first $u_1$ racks are taken for data collection in the first stage (considering that they are reconstructed in that stage), the next $u_2$ racks in the next stage, and so on, until the last $u_g$ racks in the $g$-th stage. Considering that each of these racks has failed nodes, the $i$-th group contributes $u_i \min\left(\alpha\frac{n}{r}, (d - \sum_{j=1}^{i-1} u_j)\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - u_i)\beta_2\right)$, and the remaining $k - m\frac{n}{r}$ nodes from the $(m + 1)$-th rack contribute $\alpha$ each to the cut. Summing all the values gives the min-cut, which is the right-hand side of (5).

B. Optimal Tradeoffs

We would like to find the points where the storage cost $\alpha$ is minimized and where the repair bandwidth $\gamma$ is minimized under the constraints of (5).

MSRCR Point: Minimum Storage Rack-aware Cooperative Regeneration codes are optimal codes which provide the lowest possible storage cost $\alpha$ while minimizing the repair bandwidth $\gamma$:

$$\alpha = \frac{B}{k}, \quad \beta_1 = \frac{B}{k} \cdot \frac{e/f}{d - m + f}, \quad \beta_2 = \frac{B}{k} \cdot \frac{e/f}{d - m + f}.$$

These values are determined in two steps. In the first step, we consider two particular cuts to find the minimum values of the parameters $\alpha$, $\beta_1$ and $\beta_2$ which ensure that the max flow is at least equal to the file size $B$; this proves the optimality of the solution, provided it is correct. In the second step, the correctness of these parameters is proved by showing that they are sufficient for all possible cuts.

Proof of MSRCR Optimality: First we minimize $\alpha$, and then for the minimum value of $\alpha$ we minimize the repair bandwidth $\gamma$. It is clear from (5) that $\alpha = \frac{B}{k}$ is the minimum value of $\alpha$. Now we consider two particular cases of repairs, $u = [1, 1, \ldots, 1]$ and $u = [f, f, \ldots, f]$, to minimize the repair bandwidth.

Case 1: When $u_i = f$ for all $i \in \{1, 2, \ldots, g\}$, we require that $0 \le (d - \sum_{j=1}^{i-1} f)\beta_1 - \frac{e}{f}\alpha$ for all $i \in \{1, 2, \ldots, \frac{m}{f}\}$, which leads to $\beta_1 \ge \frac{B}{k} \cdot \frac{e/f}{d - m + f}$.

Case 2: When $u_i = 1$ for all $i \in \{1, 2, \ldots, g\}$, we want $0 \le (d - \sum_{j=1}^{i-1} 1)\beta_1 - \frac{e}{f}\alpha + (f - 1)\beta_2$ for all $i \in \{1, 2, \ldots, m\}$, which results in

$$\beta_2 \ge \frac{1}{f - 1}\left(\frac{e}{f}\alpha - (d - m + 1)\beta_1\right). \tag{6}$$

Substituting the minimum value of $\beta_2$ from above into the expression for the cross-rack repair bandwidth for one rack, $\gamma = d\beta_1 + (f - 1)\beta_2$, we have $\gamma$ in terms of $\beta_1$ as follows:

$$\gamma = \frac{e}{f} \cdot \frac{B}{k} + (m - 1)\beta_1.$$

This shows that the repair cost increases linearly with $\beta_1$, so to minimize $\gamma$ we need to minimize $\beta_1$. We know from Case 1 that the minimum value is $\beta_1 = \frac{B}{k} \cdot \frac{e/f}{d - m + f}$. We can get the corresponding value of $\beta_2$ from (6), and we observe that $\beta_1 = \beta_2$ for MSRCR codes.

MBRCR Point: Minimum Bandwidth Rack-aware Cooperative Regeneration codes are optimal codes which provide the lowest possible repair bandwidth $\gamma$ while minimizing the storage cost $\alpha$. The parameters $\alpha$, $\beta_1$ and $\beta_2$ for the MBRCR point will be derived to be:

$$\alpha = \frac{f}{e}\gamma, \quad \beta_1 = \frac{B}{k\frac{f}{e}\left(d + \frac{f - 1}{2}\right) + \frac{m - m^2}{2}}, \quad \beta_2 = \frac{1}{2} \cdot \frac{B}{k\frac{f}{e}\left(d + \frac{f - 1}{2}\right) + \frac{m - m^2}{2}}.$$

Proof of MBRCR Optimality: For MBRCR codes, we want to minimize $\gamma$ before $\alpha$. From the upper bound on the file size, we can say that $\frac{n}{r}\alpha \ge d\beta_1 + (\frac{n}{r} - \frac{e}{f})\alpha + (f - 1)\beta_2$, which implies that $\alpha \ge \frac{f}{e}\gamma$. Let us take two particular cases of repairs, $u = [1, 1, \ldots, 1]$ and $u = [f, f, \ldots, f]$, to minimize the repair bandwidth.

Case 1: When $u_i = f$ for all $i \in \{1, 2, \ldots, \frac{m}{f}\}$, then $B \le k\alpha + \sum_{i=1}^{m/f} f\left((d - \sum_{j=1}^{i-1} f)\beta_1 - \frac{e}{f}\alpha\right)$, which leads to

$$\beta_1 \ge \frac{B - (k - m\frac{e}{f})\alpha}{m\left(d - \frac{m - f}{2}\right)}.$$

Case 2: When $u_i = 1$ for all $i \in \{1, 2, \ldots, m\}$, we have $B \le k\alpha + \sum_{i=1}^{m}\left((d - \sum_{j=1}^{i-1} 1)\beta_1 - \frac{e}{f}\alpha + (f - 1)\beta_2\right)$, which leads to

$$\beta_2 \ge \frac{1}{f - 1} \cdot \frac{B - (k - m\frac{e}{f})\alpha - m\beta_1\left(d - \frac{m - 1}{2}\right)}{m}. \tag{7}$$

Substituting the minimum value of $\beta_2$ from above into the expression for the cross-rack repair bandwidth for one rack, $\gamma = d\beta_1 + (f - 1)\beta_2$, and using $\alpha = \frac{f}{e}\gamma$, we have $\gamma$ in terms of $\beta_1$ as follows:

$$\gamma = \frac{1}{1 + \frac{f}{me}\left(k - m\frac{e}{f}\right)}\left(\frac{B}{m} + \frac{m - 1}{2}\beta_1\right).$$

The above equation shows that $\gamma$ grows linearly with $\beta_1$. Hence, in order to minimize $\gamma$, we need to minimize $\beta_1$, and the minimum value of $\beta_1$ is:

$$\beta_1 = \frac{B - (k - m\frac{e}{f})\alpha}{m\left(d - \frac{m - f}{2}\right)}. \tag{8}$$

Substituting this value of $\beta_1$ in (7), we get $\beta_1 = 2\beta_2$. Substituting $\alpha = \gamma\frac{f}{e}$ and $\gamma = d\beta_1 + (f - 1)\beta_2$ in (8), we get

$$\beta_1 = \frac{B}{k\frac{f}{e}\left(d + \frac{f - 1}{2}\right) + \frac{m - m^2}{2}}.$$

Correctness of the MBRCR and MSRCR points can be proved by showing that these values ensure that enough information flows through every cut in every case. The proof of correctness for MBRCR and MSRCR follows along the lines of the proof of correctness for MBCR and MSCR in [2]. The proofs of correctness have been omitted because of lack of space.
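
The corner points above can be checked numerically against the bound (5). The sketch below, written with exact rational arithmetic, enumerates every admissible vector $u$, evaluates the right-hand side of (5), and confirms for one parameter set that both the MSRCR and the MBRCR parameters support a file of size exactly $B$. The function names (`compositions`, `file_size_bound`, `msrcr_point`, `mbrcr_point`) and the sample parameters are illustrative assumptions, not part of the paper, and a check on a single parameter set is of course not a proof of correctness.

```python
from fractions import Fraction as Fr


def compositions(m, f):
    """Yield every tuple u with 1 <= u_i <= f and sum(u) == m."""
    if m == 0:
        yield ()
        return
    for first in range(1, min(f, m) + 1):
        for rest in compositions(m - first, f):
            yield (first,) + rest


def file_size_bound(k, d, e, f, m, alpha, beta1, beta2):
    """Right-hand side of (5), minimized over all admissible u."""
    best = None
    for u in compositions(m, f):
        total, used = k * alpha, 0  # `used` is sum_{j<i} u_j
        for ui in u:
            slack = (d - used) * beta1 - Fr(e, f) * alpha + (f - ui) * beta2
            total += ui * min(0, slack)
            used += ui
        best = total if best is None else min(best, total)
    return best


def msrcr_point(B, k, d, e, f, m):
    """(alpha, beta1, beta2) at the minimum-storage point."""
    alpha = Fr(B, k)
    beta = alpha * Fr(e, f) / (d - m + f)
    return alpha, beta, beta


def mbrcr_point(B, k, d, e, f, m):
    """(alpha, beta1, beta2) at the minimum-bandwidth point."""
    beta1 = Fr(B) / (k * Fr(f, e) * (d + Fr(f - 1, 2)) + Fr(m - m * m, 2))
    beta2 = beta1 / 2
    gamma = d * beta1 + (f - 1) * beta2
    return Fr(f, e) * gamma, beta1, beta2


if __name__ == "__main__":
    # Sample parameters (assumed): n = 12, r = 6, k = 8, so m = floor(k*r/n) = 4.
    B, k, d, e, f, m = 48, 8, 4, 4, 2, 4
    for name, point in (("MSRCR", msrcr_point), ("MBRCR", mbrcr_point)):
        alpha, b1, b2 = point(B, k, d, e, f, m)
        print(name, alpha, b1, b2, file_size_bound(k, d, e, f, m, alpha, b1, b2))
```

For this parameter set the MSRCR point is $(\alpha, \beta_1, \beta_2) = (6, 6, 6)$ and the MBRCR point is $(9, 4, 2)$, and in both cases the bound evaluates to $48 = B$. Calling `mbrcr_point` with $e = f$ and $m = k$ (for instance `mbrcr_point(32, 4, 5, 2, 2, 4)`) gives $\alpha = \gamma = 11$, which matches the value obtained from (2) for the same $B$, $k$, $d$, $e$.
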

Remark 1. The MSRCR and MBRCR points with $n = r$, $e = f$ coincide with the MSCR and MBCR points given in (1) and (2). Similarly, the MSRCR and MBRCR points with $e = f = 1$ coincide with the MSRR and MBRR points given in (3) and (4).

III. CONSTRUCTION OF MBRCR CODES FOR ALL PARAMETERS

In this section, we present our construction of MBRCR codes for all parameters $(B, n, k, d, r, e, f)$. The idea of the product-matrix construction was introduced in [11] in the context of regenerating codes. We generalize the MBRR construction given in [8] to the case of multiple erasures using the product-matrix construction of MBCR codes given in [5].

Construction III.1. The MBRCR construction has the parameters $\beta_1 = 2\frac{e}{f}$, $\beta_2 = \frac{e}{f}$, $\alpha = 2d + f - 1$ and

$$B = k(2d + f - 1) + \frac{e}{f}(m - m^2) \tag{9}$$

$$\phantom{B} = \left(k - m\frac{e}{f}\right)(2d + f - 1) + \frac{e}{f}\big(m(2d + f - m)\big),$$

where $m = \lfloor kr/n \rfloor$. We describe the MBRCR code construction in the following steps.

Step-1 Generation of global coded symbols: Let $[s_1, s_2, \ldots, s_B]$ denote the $B$ message symbols over $\mathbb{F}_q$. Encode these message symbols using the generator matrix $G$ of an MDS code, where $G$ is of size $B \times \left[(n - r\frac{e}{f})(2d + f - 1) + \frac{e}{f}(m(2d + f - m))\right]$.

Step-2 Filling racks partially with global coded symbols: Divide $(n - r\frac{e}{f})(2d + f - 1)$ of the global coded symbols into $r$ groups of $(\frac{n}{r} - \frac{e}{f})(2d + f - 1)$ symbols each. These symbols are then used to fill the last $\frac{n}{r} - \frac{e}{f}$ nodes in each rack.

Step-3 Generating MBCR code symbols: Consider the remaining $\frac{e}{f}(m(2d + f - m))$ global coded symbols, which were not used in the step above. We generate MBCR coded symbols corresponding to $\frac{e}{f}$ MBCR codes as follows. Consider a message matrix $M_i$, $1 \le i \le \frac{e}{f}$, formed of $m(2d + f - m)$ global coded symbols as follows:

$$M_i \triangleq \begin{bmatrix} A_i & B_i \\ C_i & 0 \end{bmatrix} \tag{10}$$

where $A_i$ is a matrix of size $m \times m$, $B_i$ is a matrix of size $m \times (d + f - m)$ and $C_i$ is a matrix of size $(d - m) \times m$. Let $U$ and $V$ denote matrices of sizes $d \times r$ and $(d + f) \times r$, respectively:

$$U = \begin{bmatrix} U^{(1)}_{m \times r} \\ U^{(2)}_{(d - m) \times r} \end{bmatrix}$$

such that any $m \times m$ submatrix of $U^{(1)}$ is full rank and any $d \times d$ submatrix of $U$ is full rank, and

$$V = \begin{bmatrix} V^{(1)}_{m \times r} \\ V^{(2)}_{(d + f - m) \times r} \end{bmatrix}$$

such that any $m \times m$ submatrix of $V^{(1)}$ is full rank and any $(d + f) \times (d + f)$ submatrix of $V$ is full rank. Let $u_l$ denote the $l$-th column of $U$ and $v_l$ the $l$-th column of $V$. In the $i$-th node of the $l$-th rack ($1 \le i \le \frac{e}{f}$, $1 \le l \le r$), $\alpha = 2d + f - 1$ symbols are calculated based on the following set of symbols: $M_i v_l$ as well as $M_i^T u_l$. There is a linear dependence relation among these $2d + f$ code symbols, given by $u_l^T M_i v_l = v_l^T M_i^T u_l$.

Step-4 Addition of local parities to the MBCR code symbols: Let $c_l = [c_{1,l}, c_{2,l}, \ldots, c_{\frac{n}{r} - \frac{e}{f}, l}]$ denote the vector of global code symbols stored in the last $\frac{n}{r} - \frac{e}{f}$ nodes of rack $l$. Let $P_{i,l}$ denote a matrix of size $(2d + f) \times (\frac{n}{r} - \frac{e}{f})(2d + f - 1)$, where the last row is filled with zeros. Consider the following set of $2d + f$ symbols obtained by adding local parities to the MBCR code symbols generated above:

$$\begin{bmatrix} M_i v_l \\ M_i^T u_l \end{bmatrix} + P_{i,l} c_l.$$

The first $2d + f - 1$ symbols of the above vector are stored in the $i$-th node in the $l$-th rack. Also, the matrices $P_{i,l}$ are required to satisfy the property that $c_{1,l}, c_{2,l}, \ldots, c_{\frac{n}{r} - \frac{e}{f}, l}, P_{1,l} c_l, \ldots, P_{\frac{e}{f}, l} c_l$ are such that any $\frac{n}{r} - \frac{e}{f}$ of them are sufficient to recover the rest (also known as the vector-MDS property).

File Recovery: It is easy to see that if the $k$ nodes are picked from the last $\frac{n}{r} - \frac{e}{f}$ nodes of each rack, then, since the global coded symbols are formed based on an MDS code, we can recover all the $B$ message symbols. If the $k$ nodes are picked partly from the first $\frac{e}{f}$ nodes in each rack and the rest from the last $\frac{n}{r} - \frac{e}{f}$ nodes, it can be argued (using an argument similar to that in the proof of Theorem 8 in [8]) that by picking the elements of $U$, $V$ and $\{P_{i,l}\}$ from a sufficiently large field, we can ensure that the encoding matrix corresponding to the symbols in these $k$ nodes is full rank (rank $B$). Hence we can recover at least $B$ global coded symbols and, as a result, we can also recover the $B$ message symbols by the property of the MDS code. The details are omitted due to lack of space.

Node Repair:

Each of the $d$ helper racks sends $2\frac{e}{f}$ symbols, $u_l^T M_i v_j$ and $v_l^T M_i^T u_j$ for $1 \le i \le \frac{e}{f}$, where $j$ indexes the helper rack and $l$ the failed rack. Based on these, $M_i v_l$ can be recovered at each of the failed nodes. In the second round of repair, $u_t^T M_i v_l$ is sent from rack $l$ to rack $t$, where $l$ and $t$ are both failed racks. Thus, the failed racks can completely recover the symbols corresponding to $\begin{bmatrix} M_i v_l \\ M_i^T u_l \end{bmatrix}$. Hence, the MBCR code symbols can be generated at every failed node. Assume that $\ell_1$ of the first set of $\frac{e}{f}$ nodes and $\ell_2$ of the last set of nodes fail in a particular rack such that $\ell_1 + \ell_2 = \frac{e}{f}$. Then the local parities corresponding to the $\frac{e}{f} - \ell_1$ surviving nodes of the first set can be recovered first by subtracting the MBCR code symbols from them. In that case, a total of $\frac{e}{f} - \ell_1 + (\frac{n}{r} - \frac{e}{f} - \ell_2) = \frac{n}{r} - \frac{e}{f}$ node contents (global coded symbols + local parities) are available. These can be used to recover the remaining $\frac{e}{f}$ global coded symbols + local parities (by the vector-MDS property). Hence, the nodes can be repaired.

REFERENCES

[1] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[2] A. Kermarrec, N. Le Scouarnec, and G. Straub, “Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes,” in Proc. Int. Symp. Networking Coding (NetCod), pp. 1–6, IEEE, 2011.
[3] K. W. Shum and Y. Hu, “Cooperative Regenerating Codes,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7229–7258, 2013.
[4] A. Wang and Z. Zhang, “Exact cooperative regenerating codes with minimum-repair-bandwidth for distributed storage,” in Proc. IEEE INFOCOM, pp. 400–404, IEEE, 2013.
[5] K. W. Shum and J. Chen, “Cooperative repair of multiple node failures in distributed storage systems,” IJICoT, vol. 3, no. 4, pp. 299–323, 2016.
[6] M. Ye and A. Barg, “Cooperative Repair: Constructions of Optimal MDS Codes for All Admissible Parameters,” IEEE Trans. Inf. Theory, vol. 65, no. 3, pp. 1639–1656, 2019.
[7] M. Zorgui and Z. Wang, “Centralized multi-node repair regenerating codes,” IEEE Trans. Inf. Theory, vol. 65, no. 7, pp. 4180–4206, 2019.
[8] H. Hou, P. P. Lee, K. W. Shum, and Y. Hu, “Rack-aware regenerating codes for data centers,” IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 4730–4745, 2019.
[9] N. Prakash, V. Abdrashitov, and M. Médard, “The storage versus repair-bandwidth trade-off for clustered storage systems,” IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5783–5805, 2018.
[10] V. Abdrashitov, N. Prakash, and M. Médard, “The storage vs repair bandwidth trade-off for multiple failures in clustered storage networks,” in IEEE Information Theory Workshop (ITW), pp. 46–50, IEEE, 2017.
[11] K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,”