The Discrepancy Attack on Polyshard-ed Blockchains
Nastaran Abadi Khooshemehr
Sharif University of Technology
Email: [email protected]

Mohammad Ali Maddah-Ali
Sharif University of Technology
Email: [email protected]
Abstract
Sharding, i.e., splitting the miners or validators to form and run several subchains in parallel, is known as one of the main solutions to the scalability problem of blockchains. The drawback is that as the number of miners maintaining each subchain becomes small, each subchain becomes vulnerable to security attacks. To solve this problem, a framework named Polyshard has been proposed, in which each validator verifies a coded combination of the blocks introduced by different subchains, thus helping to protect the security of all subchains. In this paper, we introduce an attack on Polyshard, called the discrepancy attack, in which malicious nodes control a few subchains and disperse different blocks to different nodes. We show that this attack undermines the security of Polyshard and is undetectable in its current setting.
I. INTRODUCTION
Blockchain is known as a disruptive technology that would eliminate the necessity of trusting centralized entities that run, and sometimes abuse, the financial and information networks. For this promise of blockchain systems to be fulfilled, the fundamental problems regarding the underlying technology should be addressed. In that sense, there has been a remarkable effort in the community to solve the blockchain trilemma. This trilemma highlights the challenge of simultaneously achieving three main requirements of a blockchain network: (1) Scalability, i.e., the throughput of a blockchain with $N$ nodes should scale with $N$; (2) Security, i.e., the blockchain network should function properly even if a constant fraction of the nodes is controlled by adversaries; (3) Decentralization, i.e., the storage, communication, and computation resources needed by each node remain constant as $N$ increases. For a survey on approaches to this problem, refer to [1].

One of the promising approaches to the blockchain scalability problem is sharding. In sharding, the system exploits the gain of parallel processing by splitting the nodes into several shards, where each shard forms and maintains a subchain, in parallel with the other shards. The drawback is that as the number of nodes maintaining a subchain becomes small, the subchain becomes vulnerable to security attacks. Recently, in [2], a scheme named Polyshard has been proposed that claims to achieve linear scaling in throughput and security threshold. In addition, the storage and computation resources per node remain constant. This would be a remarkable step toward outlining a solution to the blockchain trilemma. The main idea of Polyshard is incorporating coded computing techniques in the context of blockchain.

Coded computing is based on running the desired computation tasks on some linear combinations of the inputs, rather than on each input individually [3]–[9]. This helps the system detect and correct the results of adversarial nodes, ignore the results of stragglers, and even protect the input data against curious nodes. In particular, in [8], Lagrange coded computing (LCC) was introduced to compute a general polynomial function of several inputs on a cluster of servers, some of which are adversaries. In Polyshard, Lagrange coded computing is used to run the verification function, which checks the validity of the blocks, on some linear combinations of the blocks produced by the subchains. It is claimed that this entangles the security of all shards together and improves the security of the blockchain, without increasing the computation and storage cost at each node. Blockchain systems can benefit from coding in other ways as well. For example, [10], [11], [12] use coding to reduce storage in nodes, [13] uses coding to ease the bootstrap process of new nodes, and [14], [15] use coding to tackle the availability problem in blockchains.

In this paper, we introduce a fundamental attack on Polyshard that undermines its security. The attack is based on an adversarial behavior in the system that is unobserved in [2]. In essence, the adversarial nodes can take control of a few shards and transmit inconsistent blocks to different nodes. The heavy load of communication does not allow nodes to resolve the inconsistencies. These inconsistent versions make the set of equations used for decoding inconsistent. We prove that this inconsistency cannot be resolved by linear decoding, unless the number of nodes $N$ grows linearly with $v^{\beta}$, where $\beta$ is the number of shards compromised by adversaries and $v$ is the number of inconsistent versions that a compromised shard can produce. This prevents the system from tolerating $O(N)$ adversaries. The general discrepancy attack, in which some malicious nodes distribute inconsistent data, can happen in many systems and is not limited to blockchains.
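To give a rough numerical sense of this scaling, the short script below (our own illustration; the parameters are arbitrary) tabulates the recovery-threshold lower bound that we prove in Theorem 1 of Section IV, $N^* \geq v^{\beta'}(d-1)(K-1) + v\beta' + K - \beta' + 2\beta$:

```python
# Lower bound on the recovery threshold N* from Theorem 1 (Section IV):
# N* >= v^beta' * (d-1)(K-1) + v*beta' + K - beta' + 2*beta
def threshold(v, beta_p, d, K, beta):
    return v**beta_p * (d - 1) * (K - 1) + v * beta_p + K - beta_p + 2 * beta

# Even with only v = 2 versions per compromised producer, the bound grows
# exponentially in the number beta' of compromised producers:
for beta_p in (1, 2, 4, 8):
    print(beta_p, threshold(v=2, beta_p=beta_p, d=2, K=10, beta=beta_p))
# 1 31
# 2 52
# 4 166
# 8 2338
```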
We have studied the fundamental problem of distributed encoding in [16]. In the following, we first summarize Polyshard in Section II. Then, we explain the attack in Section III, and analyze it in Section IV.

II. SUMMARY OF POLYSHARD [2]

In this section, we review Polyshard [2], and adopt the notation of [2]. Each shard $k \in [K]$ has a subchain, denoted by $Y_k^{t-1} = (Y_k(1), \ldots, Y_k(t-1))$, right before epoch $t \in \mathbb{N}$, where $Y_k(t) \in \mathbb{U}$ denotes the block accepted to shard $k$ at epoch $t$, and $\mathbb{U}$ is a vector space. It is assumed that in each epoch, a new block $X_k(t) \in \mathbb{U}$ is proposed to each shard $k$. In the next section, we will question this assumption, but for now, let us accept it. The new blocks should be verified by a verification function $f^t: \mathbb{U}^t \to \mathbb{V}$ with $X_k(t)$ and $Y_k^{t-1}$ as inputs, where $\mathbb{V}$ is a vector space. This function depends on the consensus algorithm of the blockchain, but can be expressed as a multivariate polynomial in general.

In order to verify $X_k(t)$, $k \in [K]$, we need $h_k^t = f^t(X_k(t), Y_k^{t-1})$, so that we can compute $e_k^t = \mathbb{1}(h_k^t \in \mathbb{W})$, and then the verified block $Y_k(t) = e_k^t X_k(t)$, where $\mathbb{W} \subseteq \mathbb{V}$ denotes the set of function outputs that affirm $X_k(t)$. In other words, we need to calculate $f^t(X_1(t), Y_1^{t-1}), \ldots, f^t(X_K(t), Y_K^{t-1})$ to verify the incoming blocks.

In Polyshard, each shard $k \in [K]$ and each node $n \in [N]$ are associated with distinct constant values $\omega_k \in \mathbb{F}$ and $\alpha_n \in \mathbb{F}$, respectively. There are two global Lagrange polynomials for each epoch $m \in \mathbb{N}$,

$$p_m(z) = \sum_{k=1}^{K} Y_k(m) \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}, \qquad (1)$$

$$q_m(z) = \sum_{k=1}^{K} X_k(m) \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}. \qquad (2)$$

These polynomials are such that $p_m(\omega_k) = Y_k(m)$ and $q_m(\omega_k) = X_k(m)$, for all $k \in [K]$. Thus, $f^t(X_k(t), Y_k^{t-1})$, $k \in [K]$, are equivalent to $f^t(q_t(\omega_k), p_1(\omega_k), \ldots, p_{t-1}(\omega_k))$, $k \in [K]$.

At the beginning of epoch $t$, a node $n \in [N]$ has the coded chain $\tilde{Y}_n^{t-1} = (\tilde{Y}_n(1), \ldots, \tilde{Y}_n(t-1))$ in its storage, where

$$\tilde{Y}_n(m) = p_m(\alpha_n) = \sum_{k=1}^{K} Y_k(m) \prod_{j \neq k} \frac{\alpha_n - \omega_j}{\omega_k - \omega_j}, \quad m \in [t-1]. \qquad (3)$$

This coded chain is the outcome of executing Polyshard in the previous epochs. All nodes in the network receive the newly proposed blocks $X_1(t), \ldots, X_K(t)$. Each node $n \in [N]$ calculates the coded block $\tilde{X}_n(t) = q_t(\alpha_n)$, and then verifies the coded block against the stored coded chain, i.e., calculates $g_n^t := f^t(\tilde{X}_n(t), \tilde{Y}_n^{t-1})$, and broadcasts the result in the network. After this step, all nodes have $g_1^t, \ldots, g_N^t$, though the values from the adversarial nodes may be arbitrary or empty. Since $g_j^t = f^t(q_t(\alpha_j), p_1(\alpha_j), \ldots, p_{t-1}(\alpha_j))$, $j \in [N]$, and each node $n$ needs $f^t(q_t(\omega_k), p_1(\omega_k), \ldots, p_{t-1}(\omega_k))$ for all $k \in [K]$, node $n$ should determine the polynomial $f^t(q_t(z), p_1(z), \ldots, p_{t-1}(z))$ using $g_1^t, \ldots, g_N^t$, and then evaluate it at $\omega_1, \ldots, \omega_K$ to find $f^t(X_1(t), Y_1^{t-1}), \ldots, f^t(X_K(t), Y_K^{t-1})$.

The degree of $f^t(q_t(z), p_1(z), \ldots, p_{t-1}(z))$ is $d(K-1)$, where $d$ is the degree of $f^t$, so there are $d(K-1)+1$ unknowns to be found. Reed–Solomon decoding allows a maximum of $\lfloor \frac{N - (d(K-1)+1)}{2} \rfloor$ incorrect values among $g_1^t, \ldots, g_N^t$. If a $\mu$ fraction of the nodes is controlled by the adversary (i.e., $\beta = \mu N$), it suffices to have $\mu N \leq \lfloor \frac{N - (d(K-1)+1)}{2} \rfloor$, which means $K = O(N)$.

Polyshard is indeed based on the Lagrange coded computing of [8]. In LCC, there are $N$ workers, and a master that wants $f(X_1), \ldots, f(X_K)$. Worker $i \in [N]$ is provided with $\tilde{X}_i$, which is an encoded version of $X_1, \ldots, X_K$. The encoding is done with a Lagrange polynomial, exactly as in Polyshard. Workers apply $f$ to their encoded data and send the results to the master. The master can recover $f(X_1), \ldots, f(X_K)$ using Reed–Solomon decoding.

III. THE DISCREPANCY ATTACK
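Before describing the attack, the following minimal sketch (our own illustration, not from [2]) may help fix ideas. It runs one epoch of the honest Polyshard pipeline of Section II with a toy verification polynomial $f(x) = x^2$ (so $d = 2$), no chain history, and exact rational arithmetic in place of a finite field:

```python
from fractions import Fraction

K, N, d = 3, 8, 2             # shards, nodes, degree of the toy f
f = lambda x: x * x           # stand-in verification polynomial (d = 2)
omega = [Fraction(k) for k in range(1, K + 1)]    # shard points omega_k
alpha = [Fraction(10 + n) for n in range(N)]      # node points alpha_n
X = [Fraction(b) for b in (5, 7, 11)]             # proposed blocks X_k(t)

def lagrange_eval(points, values, z):
    """Evaluate the interpolating polynomial of (points, values) at z."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(points, values)):
        term = yi
        for j, xj in enumerate(points):
            if j != i:
                term *= (z - xj) / (xi - xj)
        total += term
    return total

# Each node n encodes the broadcast blocks: X~_n(t) = q_t(alpha_n), cf. (2),
# and broadcasts g_n = f(X~_n(t)), an evaluation of f(q_t(z)) at alpha_n.
coded = [lagrange_eval(omega, X, a) for a in alpha]
g = [f(c) for c in coded]

# f(q_t(z)) has degree d(K-1), so d(K-1)+1 honest results determine it;
# evaluating it at omega_k recovers f(X_k(t)) for every shard k.
need = d * (K - 1) + 1
recovered = [lagrange_eval(alpha[:need], g[:need], w) for w in omega]
print([int(r) for r in recovered])  # [25, 49, 121] = [f(5), f(7), f(11)]
```

Any $d(K-1)+1$ consistent evaluations determine $f(q_t(z))$; Reed–Solomon decoding exploits the remaining $N - (d(K-1)+1)$ evaluations as redundancy against erroneous broadcasts.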
In this section, we explain the discrepancy attack on Polyshard and show that it undermines both the security and the scalability of Polyshard. The attack stems from the assumption that all nodes in the network receive the same new blocks $X_1(t), \ldots, X_K(t)$ at the beginning of each epoch $t \in \mathbb{N}$. As mentioned in the previous section, Polyshard assumes that new blocks are somehow proposed to the shards. But in fact, the new block for a shard is produced by the nodes in that shard, and not through an external process. Note that an adaptive adversary can decide to control an arbitrary subset of the nodes up to a certain size. When a certain number of nodes in shard $k \in [K]$ are under the control of the adversary, they can produce more than one new block $X_k(t)$, say $X_k^{(1)}(t), \ldots, X_k^{(v)}(t)$, for some $v \in \mathbb{N}$, and send them to different nodes in the network. Note that the adversaries do not broadcast the produced blocks; instead, they use separate links to deliver the new blocks to different nodes, so that the nodes do not know what other nodes have received and remain unaware of this attack. By doing so, the adversary can cause a sort of discrepancy in the network while remaining undetected by the nodes.

In Polyshard, the only considered adversarial behaviour is broadcasting incorrect $g_n^t$, which is subdued in the process of decoding. But an adversary need not wait until that step, and can violate the protocol in the very first step by sending inconsistent data to different nodes. In the following, we investigate the effects of the discrepancy attack.

Suppose that only one shard is controlled by the adversary, and the other $K-1$ shards are honest (contain honest nodes). We show that even one adversarial shard can have detrimental effects. Let the adversarial shard be the first shard, producing $v \in \mathbb{N}$ blocks $X_1^{(1)}(t), \ldots, X_1^{(v)}(t)$ for epoch $t$. Some or all of these blocks may contradict the history of the first shard. We assume that the adversary attacks at epoch $t$ for the first time. The honest shards produce and broadcast $X_2(t), \ldots, X_K(t)$. We define the polynomials

$$q_m^{(i)}(z) = X_1^{(i)}(m) \prod_{j \neq 1} \frac{z - \omega_j}{\omega_1 - \omega_j} + X_2(m) \prod_{j \neq 2} \frac{z - \omega_j}{\omega_2 - \omega_j} + \cdots + X_K(m) \prod_{j \neq K} \frac{z - \omega_j}{\omega_K - \omega_j}, \quad i \in [v], \qquad (4)$$

for epoch $m \in \mathbb{N}$. Suppose that node $n \in [N]$ receives $X_1^{(\nu_n)}$, where $\nu_n \in [v]$. Following Polyshard, node $n$ first calculates $\tilde{X}_n(t) = q_t^{(\nu_n)}(\alpha_n)$ (not knowing $\nu_n$, or whether any attack is going on), then calculates $g_n^t = f^t(\tilde{X}_n(t), \tilde{Y}_n^{t-1}) = f^t(q_t^{(\nu_n)}(\alpha_n), p_1(\alpha_n), \ldots, p_{t-1}(\alpha_n))$, and then broadcasts $g_n^t$. The adversarial nodes may broadcast arbitrary values, or may broadcast nothing.

After the broadcast, all nodes have $g_1^t, \ldots, g_N^t$, but unlike the previous section, these are not points on a single polynomial. In other words, there are $v$ different polynomials, $f^t(q_t^{(1)}(z), p_1(z), \ldots, p_{t-1}(z)), \ldots, f^t(q_t^{(v)}(z), p_1(z), \ldots, p_{t-1}(z))$, and we have $N$ evaluations of these polynomials at $\alpha_1, \ldots, \alpha_N$. Therefore, the Reed–Solomon decoding in Polyshard fails, and cannot find any of the above polynomials.

To realize the severity of the discrepancy attack, let us consider a simple solution that aims to make nodes aware of the attack, so that honest nodes can set the blocks of the adversarial nodes aside and continue with their own blocks. Assume that all nodes broadcast their $K$ received blocks, so that each node sees what other nodes have received and gets notified of any inconsistency. This entails an overall communication load of $O(N^2 K)$, but $K = O(N)$, and a communication load of $O(N^3)$ is unbearable.

The discrepancy attack can harm the underlying LCC engine of Polyshard if distributed encoding is deployed in it. In LCC, the encoded data at the workers is assumed to be the outcome of a centralized encoding entity, which receives the raw data, encodes it, and distributes the encoded data among the workers. If the encoding is done in a distributed manner, e.g., the workers receive the raw data and perform the encoding themselves, as in Polyshard, LCC becomes prone to the discrepancy attack. We have studied the problem of distributed encoding in [16].

IV. ANALYSIS OF THE DISCREPANCY ATTACK
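The failure mode described in Section III can be checked numerically. In the following self-contained sketch (again our own illustration: toy $f(x) = x^2$, no history, rational arithmetic instead of a finite field), adversarial shard 1 delivers $v = 2$ versions of its block to the two halves of the network, so the broadcast values $g_n$ come from two different degree-$d(K-1)$ polynomials, and no single polynomial fits them all:

```python
from fractions import Fraction

K, N, d = 3, 8, 2
f = lambda x: x * x
omega = [Fraction(k) for k in range(1, K + 1)]
alpha = [Fraction(10 + n) for n in range(N)]

def lagrange_eval(points, values, z):
    """Evaluate the interpolating polynomial of (points, values) at z."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(points, values)):
        term = yi
        for j, xj in enumerate(points):
            if j != i:
                term *= (z - xj) / (xi - xj)
        total += term
    return total

# Shard 1 is adversarial and injects v = 2 inconsistent versions of its
# block; shards 2 and 3 are honest and broadcast a single block each.
versions = [Fraction(5), Fraction(6)]        # X_1^(1), X_1^(2)
honest = [Fraction(7), Fraction(11)]         # X_2, X_3

def g_of(node, x1):
    """g_n = f(q^(i)(alpha_n)) when node n received version x1 of X_1."""
    return f(lagrange_eval(omega, [x1] + honest, alpha[node]))

# Nodes 0..3 receive version 1, nodes 4..7 receive version 2: the broadcast
# g values are now evaluations of two different degree-d(K-1) polynomials.
g = [g_of(n, versions[0]) for n in range(4)] + \
    [g_of(n, versions[1]) for n in range(4, 8)]

# Interpolating from the first d(K-1)+1 = 5 nodes necessarily mixes the two
# groups, and the result fails to match the remaining evaluations:
need = d * (K - 1) + 1
poly_at = lambda z: lagrange_eval(alpha[:need], g[:need], z)
mismatches = [n for n in range(need, N) if poly_at(alpha[n]) != g[n]]
print(mismatches)  # [5, 6, 7]: broadcasts the interpolant cannot explain
```

Without knowing which version each node received, a decoder cannot even tell which subset of broadcasts is mutually consistent; Theorem 1 below quantifies the resulting cost.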
In this section, we study the more general problem of Lagrange coded computing with distributed encoding, with Polyshard as a special case. There are $N$ nodes, $\beta$ of which are adversarial. We denote the set of all nodes by $\mathcal{N} = \{1, \ldots, N\}$, and the sets of adversarial and honest nodes by $\mathcal{A}$ and $\mathcal{H}$ respectively, where $\mathcal{N} = \mathcal{A} \cup \mathcal{H}$. The first $K$ nodes in $\mathcal{N}$, i.e., the set $\mathcal{K} = \{1, \ldots, K\}$, are message producers, where $\mathcal{K} = \mathcal{A}_K \cup \mathcal{H}_K$, and $\mathcal{A}_K = \mathcal{A} \cap \mathcal{K}$, $\mathcal{H}_K = \mathcal{H} \cap \mathcal{K}$. Let the number of adversarial message producers be $\beta'$, i.e., $|\mathcal{A}_K| = \beta'$. A node $k \in \mathcal{H}_K$ sends $X_k \in \mathbb{V}$ to all the other nodes, while a node $k \in \mathcal{A}_K$ sends $X_k^{(\nu_{k,n})} \in \mathbb{V}$, $\nu_{k,n} \in [v]$, to node $n \neq k$, where $\mathbb{V}$ is a vector space over $\mathbb{F}$. Each adversarial node can inject at most $v \in \mathbb{N}$ different messages into the network, i.e., $|\{X_k^{(\nu_{k,n})}, n \in \mathcal{N}\}| \leq v$ for all $k \in \mathcal{A}_K$. Indeed, we consider a (constant) maximum on the number of different messages that an adversarial node can inject into the system. This assumption is necessary, because too many versions from even one adversary can destroy the system; it is also in accordance with practical systems. Honest nodes do not know which nodes are adversarial, nor how the adversarial nodes send their messages. Each node $n \in \mathcal{N}$ does the following, in order:

1) It calculates a coded version of the received data using the Lagrange polynomial,

$$q^{(V)}(z) := \sum_{k \in \mathcal{A}_K} X_k^{(V_k)} \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j} + \sum_{k \in \mathcal{H}_K} X_k \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}, \qquad (5)$$

where $V = (V_k, k \in \mathcal{A}_K) \in [v]^{\beta'}$ is a $\beta'$-tuple of elements of $[v]$, denoting the versions of the received adversarial messages. Then, node $n$ evaluates $q^{(V)}(z)$ at $\alpha_n \in \mathbb{F}$, i.e., $\tilde{X}_n = q^{(V)}(\alpha_n)$.

2) It calculates $y_n = f(\tilde{X}_n)$.

3) It broadcasts $y_n$. (It is worth noting that the communication load of Polyshard for each node is $O(N)$, which troubles the decentralization property.)

The honest nodes should be able to recover $f(X_k)$, $k \in \mathcal{H}_K$, after they receive a sufficient number of $y$'s. We define the recovery threshold $N^*$ such that for any set $\mathcal{N}^* \subseteq \mathcal{N}$, $|\mathcal{N}^*| = N^*$, the values $f(X_k)$, $k \in \mathcal{H}_K$, can be decoded from $\{y_n, n \in \mathcal{N}^*\}$.

The following result shows that recovering $f(X_k)$, $k \in \mathcal{H}_K$, is not possible if $\beta = O(N)$. This proves the validity of the discrepancy attack.

Theorem 1.
Consider a system with $N$ worker nodes, $\beta$ of which are adversaries, and $K$ of which are message producers. When using Lagrange coded computing and linear decoding, the threshold for recovering $f$ of the messages of the honest producers is

$$N^* \geq v^{\beta'}(d-1)(K-1) + v\beta' + K - \beta' + 2\beta,$$

where $d$ is the degree of $f$, $\beta'$ is the number of adversarial message producers, and $v$ is the number of versions available to each of such nodes.

The definition of the recovery threshold implies that $N \geq N^*$ is necessary; otherwise $f(X_k)$, $k \in \mathcal{H}_K$, cannot be decoded.

Remark 1.
In order to use this result for Polyshard, suppose that a shard is controlled by adversaries if a $\gamma \in [0, 1]$ fraction of its nodes is controlled by adversaries. In sharding, there are $N$ nodes and $K$ shards, so there are $\frac{N}{K}$ nodes in each shard. Consequently, adversaries can take over a maximum of $\beta' = \frac{\beta K}{\gamma N}$ shards. The adversarial shards are equivalent to the adversarial message producers in the general model. If we set $\beta = cN$ for a constant $c$, we have $\beta' = \frac{Kc}{\gamma}$. In order to achieve scalability, Polyshard sets $K = O(N)$, but this also sets $\beta' = O(N)$. Theorem 1 states that the system fails in such a case. Therefore, Polyshard cannot be secure and linearly scale with $N$ at the same time.

Remark 2.
Setting $v = 1$ means that no effective adversary exists, so $N^*$ should reduce to the recovery threshold of normal LCC, i.e., $d(K-1) + 1 + 2\beta$. The term $2\beta$ is for correcting the errors of the adversaries. The formula in the theorem is consistent with this.

Remark 3.
The first summation in (5), which contains the messages of the adversaries, can have $v^{\beta'}$ different combinations, counting $v$ cases for each $X_k^{(\nu)}$, $k \in \mathcal{A}_K$. Thus, there are $v^{\beta'}$ different polynomials $q^{(V)}(z)$. If $N \geq v^{\beta'}(d(K-1)+1) + 2\beta$, and we know which polynomial each node has evaluated, we can pick out at least $d(K-1)+1$ consistent evaluations of one polynomial, using which we can decode $f(X_k)$, $k \in \mathcal{H}_K$. Therefore, $v^{\beta'}(d(K-1)+1) + 2\beta$ is an upper bound on $N^*$, given the mentioned knowledge of the adversarial behaviour.

Before proving the theorem, let us examine some polynomials first. We can rewrite (5) as

$$q^{(V)}(z) = \sum_{i=0}^{K-1} L_i\big(X_r^{(V_r)}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K\big) z^i, \qquad (6)$$

where $L_0, \ldots, L_{K-1}: \mathbb{F}^K \to \mathbb{F}$ are linear maps. Then,

$$f\big(q^{(V)}(z)\big) = \sum_{i=0}^{d(K-1)} u_i\big(X_r^{(V_r)}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K\big) z^i, \qquad (7)$$

where $u_0, \ldots, u_{d(K-1)}: \mathbb{F}^K \to \mathbb{F}$ are polynomials of degree $d$ in $X_r^{(V_r)}$, $r \in \mathcal{A}_K$, and $X_s$, $s \in \mathcal{H}_K$. As a result, for $V' = (V'_k, k \in \mathcal{A}_K) \in [v]^{\beta'}$, $V' \neq V$, we have, in general, $u_i(X_r^{(V_r)}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K) \neq u_i(X_r^{(V'_r)}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K)$ for all $i \in \{0, \ldots, d(K-1)\}$. To make things clear, consider the following example.

Example 1.
Assume $K = N = 3$, $\beta = 2$, $v = 2$, $\mathcal{A} = \{1, 2\}$, $\mathcal{H} = \{3\}$, $(\omega_1, \omega_2, \omega_3) = (1, 2, 3)$, and $f(x) = x^2$. Then,

$$q^{(i_1, i_2)}(z) = \frac{(z-2)(z-3)}{2} X_1^{(i_1)} - (z-1)(z-3) X_2^{(i_2)} + \frac{(z-1)(z-2)}{2} X_3 = z^2\Big(\tfrac{1}{2}X_1^{(i_1)} - X_2^{(i_2)} + \tfrac{1}{2}X_3\Big) + z\Big(-\tfrac{5}{2}X_1^{(i_1)} + 4X_2^{(i_2)} - \tfrac{3}{2}X_3\Big) + 3X_1^{(i_1)} - 3X_2^{(i_2)} + X_3, \quad i_1, i_2 \in \{1, 2\}. \qquad (8)$$

It is easy to confirm that the coefficients of $z^4$ in $f(q^{(1,1)}(z))$, $f(q^{(1,2)}(z))$, $f(q^{(2,1)}(z))$, and $f(q^{(2,2)}(z))$ are distinct. The same is true for the coefficients of each of $z^3$, $z^2$, $z$, and the constant term.

Now we present the proof of Theorem 1.

Proof. We choose an arbitrary set $\hat{\mathcal{N}} \subseteq \mathcal{N}$ of size $\hat{N} = v^{\beta'}(d-1)(K-1) + v\beta' + K - \beta' + 2\beta - 1$ that includes all the $\beta$ adversarial nodes, and show that under some adversarial behaviours, $f(X_k)$, $k \in \mathcal{H}_K$, cannot be decoded, even if we know the adversarial behaviour, i.e., what each node has received from each adversary. The adversaries in $\hat{\mathcal{N}}$ can cause errors by sending arbitrary values instead of $y_n$, $n \in \mathcal{A}$. It is a well-known fact that we can handle these erroneous $y$'s at the cost of removing $2\beta$ equations when decoding. Thus, in the following, we assume $\hat{N} = v^{\beta'}(d-1)(K-1) + v\beta' + K - \beta' - 1$, and that there is no erroneous value among the $y$'s. In the first step, we partition $\hat{\mathcal{N}} = \mathcal{N}_1 \cup \cdots \cup \mathcal{N}_{v^{\beta'}}$, where $\mathcal{N}_i$, $i \in [v^{\beta'}]$, denotes the set of nodes that all receive the same versions of messages $V_i \in [v]^{\beta'}$ from $\mathcal{A}_K$. The goal is to find $f(X_k) := Z_k$, $k \in \mathcal{H}_K$. All the equations we have are as follows:

$$f(q^{(V_i)}(\alpha_n)) = y_n, \quad n \in \mathcal{N}_i, \; i \in [v^{\beta'}], \qquad (9)$$
$$f(q^{(V_i)}(\omega_k)) = Z_k, \quad k \in \mathcal{H}_K, \; i \in [v^{\beta'}], \qquad (10)$$
$$f(q^{(V_i)}(\omega_k)) = f(q^{(V_j)}(\omega_k)), \quad k \in \mathcal{M}(V_i, V_j), \; i, j \in [v^{\beta'}], \qquad (11)$$

where $\mathcal{M}(V_i, V_j) = \{r \in \mathcal{A}_K : V_{i,r} = V_{j,r}\}$. In other words, $\mathcal{M}(V_i, V_j)$ contains the indexes where $V_i$ and $V_j$ are equal.

Since we want to find $Z_k$, $k \in \mathcal{H}_K$, using linear decoding, the next step is converting the above equations into matrix form. For that, we define $\mathrm{Van}^D_{\mathcal{S}}$ as an $|\mathcal{S}| \times (D+1)$ Vandermonde matrix whose rows consist of the powers of the elements of the set $\mathcal{S}$, from degree $0$ to $D$. For example,

$$\mathrm{Van}^2_{\{\alpha_1, \alpha_2\}} = \begin{bmatrix} 1 & \alpha_1 & \alpha_1^2 \\ 1 & \alpha_2 & \alpha_2^2 \end{bmatrix}.$$

We put the unknowns in (9) in the vector $\mathbf{X}$,

$$\mathbf{X} := \begin{bmatrix} u_{d(K-1)}(X_r^{(V_{1,r})}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K) \\ \vdots \\ u_0(X_r^{(V_{1,r})}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K) \\ \vdots \\ u_{d(K-1)}(X_r^{(V_{v^{\beta'},r})}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K) \\ \vdots \\ u_0(X_r^{(V_{v^{\beta'},r})}, r \in \mathcal{A}_K, X_s, s \in \mathcal{H}_K) \end{bmatrix}, \qquad (12)$$

and the remaining unknowns in (10) in $\mathbf{Z} := [Z_k, k \in \mathcal{H}_K]$. The length of $\mathbf{X}$ is $v^{\beta'}(d(K-1)+1)$. The matrix equivalent of (9) is

$$A\mathbf{X} = \begin{bmatrix} y_n, n \in \mathcal{N}_1 \\ \vdots \\ y_n, n \in \mathcal{N}_{v^{\beta'}} \end{bmatrix}, \qquad (13)$$

where

$$A := \begin{bmatrix} \mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_1\}} & & \\ & \ddots & \\ & & \mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_{v^{\beta'}}\}} \end{bmatrix}. \qquad (14)$$

The size of $A$ is $\hat{N} \times v^{\beta'}(d(K-1)+1)$. The matrix equivalent of (10) is

$$\begin{bmatrix} B & \mathbf{0} \\ \begin{matrix} \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & \mathbf{0} & \cdots & \mathbf{0} \end{matrix} & -I_{K-\beta'} \end{bmatrix} \begin{bmatrix} \mathbf{X} \\ \mathbf{Z} \end{bmatrix} = \mathbf{0}, \qquad (15)$$

where

$$B := \begin{bmatrix} \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & -\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & & \\ \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & & \ddots & \\ \vdots & & & -\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} \end{bmatrix}. \qquad (16)$$

The size of $B$ is $(v^{\beta'}-1)(K - \beta') \times v^{\beta'}(d(K-1)+1)$. The matrix equivalent of (11) is

$$C\mathbf{X} = \mathbf{0}, \qquad (17)$$

where

$$C := \begin{bmatrix} \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_1, V_2)\}} & -\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_1, V_2)\}} & & \\ \vdots & & \ddots & \\ \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_1, V_{v^{\beta'}})\}} & & & -\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_1, V_{v^{\beta'}})\}} \end{bmatrix}. \qquad (18)$$

The size of $C$ is $(v^{\beta'} - v)\beta' \times v^{\beta'}(d(K-1)+1)$. In order to count the number of rows in $C$, we can count the number of rows that involve $\omega_k$, for each $k \in \mathcal{A}_K$; we leave it to the reader to confirm that this number is $v^{\beta'} - v$. Combining (13), (15), and (17),

$$D \begin{bmatrix} \mathbf{X} \\ \mathbf{Z} \end{bmatrix} = \begin{bmatrix} y_n, n \in \mathcal{N}_1 \\ \vdots \\ y_n, n \in \mathcal{N}_{v^{\beta'}} \\ \mathbf{0} \end{bmatrix}, \qquad (19)$$

where

$$D := \begin{bmatrix} A & \mathbf{0} \\ B & \mathbf{0} \\ C & \mathbf{0} \\ \begin{matrix} \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & \mathbf{0} & \cdots & \mathbf{0} \end{matrix} & -I_{K-\beta'} \end{bmatrix}. \qquad (20)$$

The condition for the unique determination of $\mathbf{Z}$ is

$$\mathrm{rank}(D) = \mathrm{rank}\begin{bmatrix} A \\ B \\ C \\ \begin{matrix} \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} & \mathbf{0} & \cdots & \mathbf{0} \end{matrix} \end{bmatrix} + \mathrm{rank}\left(-I_{K-\beta'}\right). \qquad (21)$$

In other words, if a linear combination of the columns of $D$ is zero, the coefficients of its last $K - \beta'$ columns must be zero. Therefore, suppose that a linear combination of the columns of $D$ with coefficients $\lambda^{(1)} = [\lambda^{(1)}_{d(K-1)}, \ldots, \lambda^{(1)}_1, \lambda^{(1)}_0]^T, \ldots, \lambda^{(v^{\beta'})} = [\lambda^{(v^{\beta'})}_{d(K-1)}, \ldots, \lambda^{(v^{\beta'})}_1, \lambda^{(v^{\beta'})}_0]^T$, $\zeta = [\zeta_1, \ldots, \zeta_{K-\beta'}]^T$, is zero. We can break this into the following equations:

$$\mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_1\}} \lambda^{(1)} = \mathbf{0}, \;\ldots,\; \mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_{v^{\beta'}}\}} \lambda^{(v^{\beta'})} = \mathbf{0}, \qquad (22)$$

$$\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_i, V_j)\}} \lambda^{(i)} = \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{M}(V_i, V_j)\}} \lambda^{(j)}, \quad i, j \in [v^{\beta'}], \qquad (23)$$

$$\mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} \lambda^{(1)} = \cdots = \mathrm{Van}^{d(K-1)}_{\{\omega_k, k \in \mathcal{H}_K\}} \lambda^{(v^{\beta'})} = \zeta. \qquad (24)$$

We want the above equations to have a non-zero solution for $\zeta$, meaning that $\mathbf{Z}$ cannot be uniquely decoded. If there exists $i \in [v^{\beta'}]$ such that $N_i = |\mathcal{N}_i| \geq d(K-1)+1$, then $\mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_i\}} \lambda^{(i)} = \mathbf{0}$ results in $\lambda^{(i)} = \mathbf{0}$, substituting which in (24) gives $\zeta = \mathbf{0}$. Therefore, we suppose that the adversarial behaviour is such that $N_1, \ldots, N_{v^{\beta'}} < d(K-1)+1$. Then $\mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_i\}}$ is a fat, full-row-rank matrix, $i \in [v^{\beta'}]$. Consequently, the equation $\mathrm{Van}^{d(K-1)}_{\{\alpha_n, n \in \mathcal{N}_i\}} \lambda^{(i)} = \mathbf{0}$ has a solution with $d(K-1)+1-N_i$ free variables. There are

$$\sum_{i \in [v^{\beta'}]} \big( d(K-1)+1-N_i \big) = v^{\beta'}(d(K-1)+1) - \hat{N} = (v^{\beta'}-1)(K-\beta') + (v^{\beta'}-v)\beta' + 1$$

free variables from $\lambda^{(1)}, \ldots, \lambda^{(v^{\beta'})}$ in total. By substituting them in (23), we can reduce the number of free variables to $(v^{\beta'}-1)(K-\beta') + 1$. Adding the $K-\beta'$ variables of $\zeta$ to what we have, and substituting those in (24), gives a homogeneous system of $v^{\beta'}(K-\beta')$ equations in $v^{\beta'}(K-\beta') + 1$ variables. Since that system is underdetermined, there exists a non-zero solution for $\zeta$. This completes the proof.

It is worth noting that if we know the adversarial behaviour when decoding, i.e., we know what each node has received from each adversarial node, the bound given in Theorem 1 becomes tight. In other words, that number of equations would be enough for recovering $f(X_k)$, $k \in \mathcal{H}_K$.

REFERENCES

[1] Q. Zhou, H. Huang, Z. Zheng, and J. Bian, “Solutions to scalability of blockchain: A survey,”
IEEE Access, vol. 8, pp. 16440–16455, 2020.
[2] S. Li, M. Yu, C.-S. Yang, A. S. Avestimehr, S. Kannan, and P. Viswanath, “Polyshard: Coded sharding achieves linearly scaling efficiency and security simultaneously,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 249–261, 2020.
[3] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
[4] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomial codes: An optimal design for high-dimensional coded matrix multiplication,” arXiv preprint arXiv:1705.10464, 2017.
[5] M. Fahim, H. Jeong, F. Haddadpour, S. Dutta, V. Cadambe, and P. Grover, “On the optimal recovery threshold of coded matrix multiplication,” in Proceedings of the 55th Annual Allerton Conference on Communication, Control, and Computing, 2018, pp. 1264–1270.
[6] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding,” IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1920–1933, 2020.
[7] T. Jahani-Nezhad and M. A. Maddah-Ali, “Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication,” arXiv preprint arXiv:1812.10460, 2018.
[8] Q. Yu, S. Li, N. Raviv, S. M. M. Kalan, M. Soltanolkotabi, and S. A. Avestimehr, “Lagrange coded computing: Optimal design for resiliency, security, and privacy,” in The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019, pp. 1215–1225.
[9] H. A. Nodehi and M. A. Maddah-Ali, “Secure coded multi-party computation for massive matrix operations,” IEEE Transactions on Information Theory, 2021.
[10] S. Kadhe, J. Chung, and K. Ramchandran, “SeF: A secure fountain architecture for slashing storage costs in blockchains,” arXiv preprint arXiv:1906.12140, 2019.
[11] H. Wu, A. Ashikhmin, X. Wang, C. Li, S. Yang, and L. Zhang, “Distributed error correction coding scheme for low storage blockchain systems,” IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7054–7071, 2020.
[12] D. Perard, J. Lacan, Y. Bachy, and J. Detchart, “Erasure code-based low storage blockchain node,” IEEE, 2018, pp. 1622–1627.
[13] R. Pal, “Fountain coding for bootstrapping of the blockchain,” IEEE, 2020, pp. 1–5.
[14] M. Al-Bassam, A. Sonnino, and V. Buterin, “Fraud and data availability proofs: Maximising light client security and scaling blockchains with dishonest majorities,” arXiv preprint arXiv:1809.09044, 2018.
[15] M. Yu, S. Sahraei, S. Li, S. Avestimehr, S. Kannan, and P. Viswanath, “Coded Merkle tree: Solving data availability attacks in blockchains,” in International Conference on Financial Cryptography and Data Security. Springer, 2020, pp. 114–134.
[16] N. Abadi Khooshemehr and M. A. Maddah-Ali, “Fundamental limits of distributed encoding,” arXiv e-prints.