[PDF] Network Size Estimation in Small-World Networks under Byzantine Faults

Abstract

Full PDF

aa r X i v : . [ c s . D C ] F e b Network Size Estimation in Small-World Networks under ByzantineFaults

Soumyottam Chatterjee ∗ Gopal Pandurangan † Peter Robinson ‡ February 19, 2021

Abstract

We study the fundamental problem of counting the number of nodes in a sparse network(of unknown size) under the presence of a large number of Byzantine nodes. We assume thefull information model where the Byzantine nodes have complete knowledge about the entirestate of the network at every round (including random choices made by all the nodes), haveunbounded computational power, and can deviate arbitrarily from the protocol. Essentiallyall known algorithms for fundamental Byzantine problems (e.g., agreement, leader election,sampling) studied in the literature assume the knowledge (or at least an estimate) of the sizeof the network. In particular, all known algorithms for fundamental Byzantine problems thatcan tolerate a large number of Byzantine nodes in bounded-degree networks assume a sparseexpander network with nodes having knowledge of the network size. It is non-trivial to designalgorithms for Byzantine problems that work without knowledge of the network size, especiallyin bounded-degree (expander) networks where the local views of all nodes are (essentially) thesame and limited, and Byzantine nodes can quite easily fake the presence/absence of non-existingnodes. To design truly local algorithms that do not rely on any global knowledge (includingnetwork size), estimating the size of the network under Byzantine nodes is an important ﬁrststep.Our main contribution is a randomized distributed algorithm that estimates the size of anetwork under the presence of a large number of Byzantine nodes. In particular, our algorithmestimates the size of a sparse, “small-world”, expander network with up to O ( n − δ ) Byzantinenodes, where n is the (unknown) network size and δ can be be any arbitrarily small (but ﬁxed)positive constant. Our algorithm outputs a (ﬁxed) constant factor estimate of log( n ) with highprobability; the correct estimate of the network size will be known to a large fraction ((1 − ǫ )-fraction, for any ﬁxed positive constant ǫ ) of the honest nodes. Our algorithm is fully distributed,lightweight, and simple to implement, runs in O (log n ) rounds, and requires nodes to send andreceive messages of only small-sized messages per round; any node’s local computation cost perround is also small. ∗ Department of Computer Science, University of Houston, Houston, TX 77204, USA. Email: [email protected] . † Department of Computer Science, University of Houston, Houston, TX 77204, USA. Email: [email protected] . Research supported, in part, by NSF grant CCF-1527867. ‡ Department of Computing & Software, McMaster University, Hamilton, Ontario L8S 4L7, Canada. Email: [email protected] . Introduction

Motivated by the need for robust and secure distributed computation in large-scale (sparse) net-works such as peer-to-peer (P2P) and overlay networks, we study the fundamental

Byzantine count-ing problem in networks, where the goal is to count (or estimate) the number of nodes in a networkthat can contain a large number of Byzantine nodes that can exhibit malicious behaviour.The Byzantine counting problem is challenging because the goal is to guarantee that most ofthe honest (i.e., non-Byzantine) nodes obtain a good estimate of the network size despite thepresence of a large number of Byzantine nodes (which have full information about all the nodesand can behave arbitrarily or maliciously) in the network. Byzantine counting is related to, yetdiﬀerent, compared with other fundamental problems in distributed computing, namely, Byzantineagreement and

Byzantine leader election . Similar to the latter two problems, it involves solvinga global problem under the presence of Byzantine nodes. However, it is a diﬀerent problem,since protocols for Byzantine agreement or leader election do not necessarily yield a protocol forByzantine counting. In a sense, the Byzantine counting problem can be considered to be morefundamental than Byzantine agreement and leader election, since many existing algorithms forthese two problems (discussed below and in Section 1.3) assume knowledge of the number of nodesin the network n ; some algorithms require at least a reasonably good estimate of n , typically aconstant factor estimate of log n . Indeed, one of the main motivations for this paper is to designdistributed protocols that can work with little or no global knowledge, including the network size.In this sense, an eﬃcient protocol for the Byzantine counting problem can serve as a pre-processingstep for protocols for Byzantine agreement, leader election and other problems that either requireor assume knowledge of an estimate of n [4].Byzantine agreement and leader election have been at the forefront of distributed computingresearch for several decades. The work of Dwork et. al. [11] and Upfal [34] studied the Byzantineagreement problem in bounded-degree expander networks under the condition of almost-everywhere agreement, where almost all (honest) processors need to reach agreement as opposed to all nodesagreeing as required in the standard Byzantine agreement problem. Dwork et. al. [11] showedhow one can achieve almost-everywhere agreement under up to Θ( n log n ) of Byzantine nodes ina bounded-degree expander network ( n is the network size). Subsequently, Upfal [34] gave animproved protocol that can tolerate up to a linear number of faults in a bounded degree expander of suﬃciently large spectral gap (in fact, on Ramanujan graphs , which have the asymptoticallylargest spectral graph possible [17] — our protocol in this paper also works on a similar type ofexpander). These algorithms required O (log n ) rounds and polynomial (in n ) number of messages;however, the local computation required by each processor is exponential. Both of the abovealgorithms require knowledge of the global topology (including the knowledge of n ) , since at thestart, nodes need to have this information hardcoded. The work of King et. al. [24] was theﬁrst to study scalable (polylogarithmic communication and number of rounds, and polylogarithmiccomputation per processor) algorithms for Byzantine leader election and agreement. Similar toDwork et al.’s and Upfal’s algorithm, the nodes require hardcoded information on the networktopology — which is also an expander graph — to begin with, including the network size. Wenote that expansion property is crucially exploited in all the above works to achieve Byzantineagreement and leader election. Furthermore, the expander networks assumed in Dwork et al andUpfal works are bounded-degree (essentially, regular) graphs, where without prior knowledge it isdiﬃcult for nodes to have a knowledge of the network size.The works of [6], [2], and [3] studied stable agreement, Byzantine agreement, and Byzantine In sparse, bounded-degree networks, an adversary can always isolate some number of honest nodes; hence “almost-everywhere” knowledge is the best one can hope for in such networks (cf. [11]). all nodes are assumed to have knowledge of n . It was notclear how to estimate n without additional information under presence of Byzantine nodes in such(essentially, regular and constant degree expander) networks. In fact, the works of [4, 3] raisedthe question of designing protocols in expander networks that work when the network size is notknown and may even change over time, with the goal of obtaining a protocol that works whennodes have strictly local knowledge. This requires devising a distributed protocol that can measureglobal network parameters such as size, diameter, average degree, etc. under Byzantine nodes insparse networks, especially in sparse expander networks. We introduce and study the problem of Byzantine Counting. Our goal is to design a distributedalgorithm that guarantees, despite a large number of Byzantine nodes, that almost all honest nodesknow a good estimate of the network size in a bounded degree, “small world” network. We are notaware of any prior work that studies

Byzantine counting in the setting addressed here.Before stating our result, we brieﬂy describe the key ingredients of our network model (we referto Section 2.1 for the full details). We assume a sparse network that has constant bounded degree(essentially regular) and has high expansion as well as large clustering coeﬃcient . In other words, itis a “small-world” network. Expander graphs have been used extensively as candidates to solve theByzantine agreement and related problems in bounded degree graphs (e.g., as discussed earlier, see[11, 20, 21, 23, 34]); the expander property proves crucial in tolerating a large number of Byzantinenodes. The high expansion of such graphs have been exploited in previous works as well, mostnotably by Upfal [34] to solve the Byzantine agreement (with knowledge of n ). For the Byzantinecounting problem, which seems harder, however, expansion by itself does not seem to be suﬃcient;our protocol also exploits the high clustering coeﬃcient of the network crucially (cf. Section 1.2).We assume that up to O ( n − δ ) nodes can be Byzantine , where δ > full information model where theByzantine nodes (who have unbounded computational power) are adaptive , in the sense that theyknow the entire states of all nodes at the beginning of every round (including the messages sent bythem), including the random choices made by the nodes up to and including the current round aswell as future rounds (in other words, they are omniscient ). However, we note that the Byzantinenodes can communicate only using the edges of the network, i.e., they can send messages directlyonly to their neighbors.In our network model, where nodes have constant bounded degree, most nodes, with highprobability, see (essentially) the same local topological structure even for a reasonably large neigh-borhood radius — cf. Section A, and hence nodes do not have any a priori local information thatcan help them estimate the network size. In this setting, Byzantine nodes can easily fake thepresence/absence of nodes — thus trying to foil the estimate of the honest nodes.Our main contribution is a distributed algorithm (cf. Section 3) that estimates the size of thenetwork, even under the presence of a large number of Byzantine nodes. In particular, our algorithmestimates the size of a sparse (constant degree) “small-world” network with up to O ( n − δ ) (for anysmall positive constant δ ) Byzantine nodes, where n is the (unknown) network size. Our algorithmoutputs a (ﬁxed) constant factor estimate of log n with high probability; the correct estimate ofthe network size will be known to (1 − ǫ )-fraction (where ǫ > “With high probability (whp)” refers to a probability ≥ − n − c , for some constant c > Our algorithm is the ﬁrst known, decentralized Byzantine counting algorithm that can toleratea large amount of Byzantine nodes. It is fully-distributed, localized (does not require any globaltopological knowledge), lightweight, runs in O (log n ) rounds, and requires nodes to send andreceive “small-sized messages” only. Any node’s computation cost per round is also logarithmic.The given algorithm is a basic ingredient that can be used for the design of eﬃcient distributedalgorithms resilient against Byzantine failures, where the knowledge of the network size (a globalparameter) may not be known a priori. It can serve as a building block for implementing other non-trivial distributed computation tasks in Byzantine networks such as agreement and leader electionwhere the network size (or its estimate) is not known a priori.

The main technical challenge that we have to overcome is designing and analyzing distributedalgorithms under the presence of Byzantine nodes in networks where (honest) nodes only have localknowledge, i.e., knowledge of their immediate neighborhood. It is possible to solve the countingproblem exactly in networks without Byzantine nodes by simply building a spanning tree andconverge-casting the nodes’ counts to the root, which in turn can compute the total number of nodesin the network. A more robust and alternate way that works also in the case of anonymous networksis the technique of support estimation [6, 4] which uses exponential distribution (or alternately onecan use a geometric distribution, see e.g., [25]) to estimate accurately the network size as describedbelow.Consider the following simple protocol for estimating the network size that uses the geometricdistribution. Each node u ﬂips an unbiased coin until the outcome is heads ; let X u denote therandom variable that denotes the number of times that u needs to ﬂip its coin. Then, nodesexchange their respective values of X u whereas each node only forwards the highest value of X u (once) that it has seen so far. We observe that X u is geometrically distributed and denote itsglobal maximum by ¯ X . For any u , Pr( X u ≥ n ) = ( ) n , and by taking a union bound,Pr( ¯ X ≥ n ) ≤ n . Furthermore, Pr( ¯ X < log n ) = (1 − ( ) log n ) n ≤ e −√ n . It follows thateach node forwards at most O (log n ) distinct values (w.h.p.). After O ( D ) rounds (where D is thenetwork diameter), each node knows the value of ¯ X , and sets that as its estimate of log n . Due tothe above bounds on ¯ X it follows that (w.h.p.), it is a constant factor estimate of log n . The supportestimation algorithm [6, 4] which uses the exponential distribution works in a similar manner.The geometric distribution protocol fails when even just one Byzantine node is present. Byzan-tine nodes can fake the maximum value or can stop the correct maximum value from spreadingand hence can violate any desired approximation guarantee. Hence a new protocol is needed whendealing with Byzantine nodes.Prior localized techniques that have been used successfully for solving other problems such asByzantine agreement and leader election such as random walks and majority agreement (e.g., [2, 3])do not imply eﬃcient (i.e., fast algorithms that uses small message sizes) algorithms for Byzantinecounting. For instance, random walk-based techniques crucially exploit a uniform sampling oftokens (generated by nodes) after Θ(mixing time) number of steps. However, the main diﬃculty inthis approach is that the mixing time is unknown (since the network size is unknown) — and hence We call ǫ the error parameter — by changing its value (please refer to Line ?? in the pseudocode in Algorithm1), we (the algorithm designer) can control exactly how large a fraction of the honest nodes would estimate log n correctly (i.e., get a constant-factor approximation of log n ). Theorem 1, which is the main result of this paper, tellsus that at most ǫ -fraction of the honest nodes would fail to get a constant factor approximation of log n . A “small-sized message” is one that contains a constant number of IDs and O (log n ) additional bits.

3t is unclear a priori how many random walk steps the tokens should take. Similar approachesbased on the return time of random walks fail due to long random walks having a high chance ofencountering a Byzantine node. One can also use Birthday paradox ideas to try to estimate n (e.g.,these have been tried in an non-Byzantine setting [14]); these also fail in the Byzantine case.We note that one can possibly solve Byzantine counting if one can solve Byzantine leaderelection; however, all known algorithms for Byzantine leader election (or agreement) assume apriori knowledge (or at least a good estimate) of the network size . Hence we require a new protocolthat solves Byzantine counting from “scratch.” In our random network model, where most nodes,with high probability, see (essentially) the same local topological structure (and constant degree)even for a reasonably large neighborhood radius (cf. Section A), it is diﬃcult for nodes to breaksymmetry or gain a priori knowledge of n . Another approach is to try to estimate the diameter of the network, which, being Θ(log n ) forsparse expanders, can be used to deduce an approximation of the network size. Assuming thatthere exists a leader in the network, one way to do this is for the leader to initiate the ﬂoodingof a message and it can be shown that a large fraction of nodes (say a (1 − ǫ )-fraction, for somesmall ǫ >

0) can estimate the diameter by recording the time when they see the ﬁrst token, sincewe assume a synchronous network. However, this method fails since it is not clear, how to breaksymmetry initially by choosing a leader — this by itself appears to be a hard problem in theByzantine setting without knowledge of n .We now give a high-level intuition behind our protocol. The main idea is based on using thegeometric distribution, but there are several technical obstacles that we need to tackle (cf. Section3). The algorithm operates in phases. In phase i , each honest node estimates the number of nodesat distance i (in particular, whether there are any nodes at all) by observing the maximum (ornear-maximum) value, generated according to the geometric distribution, at distance i ; this valuecan be propagated by ﬂooding for exactly i steps. We only allow certain values to propagate inphase i ; this avoids congestion and hence our algorithm works using only small message sizes. As i increases, i.e., when it becomes a log n , for some small constant 0 < a <

1, this provides a constantfactor estimate of log n . Up to a distance of i = a log n , most nodes (i.e., n − o ( n ) nodes) do notsee any values from Byzantine nodes, since most nodes are a distance at least a log n from anyByzantine node — this is due to the property of the expander graph. However, as i increases, theByzantine nodes can introduce fake values and hence can fool most of the nodes into believingthat the network is much larger than it actually is. To overcome this, the protocol exploits the small-world property of the network, i.e., nodes have high clustering coeﬃcients — which impliesthat a node’s neighbors are well-connected among themselves. Each (honest) node checks with itsneighbors to see if the value sent by the Byzantine node is consistent among the neighbor set; ifnot, this (high) value is discarded.There are some complications in implementing this idea, since Byzantine nodes can lie aboutthe identity of neighboring nodes; our protocol exploits the fact that the network is a union ofexpander and small-world network to overcome this. We refer to Section 3.3 for more details. Informally, the idea is as follows. If one can elect a honest leader, then it can initiate ﬂooding by sending amessage to the entire network; any other node can set an estimate of log n as the round number when it sees themessage for the ﬁrst time. It can be shown that in a sparse expander, n − o ( n ) nodes will have a constant factorestimate of log n . We point out that with constant probability, in our network model, due to the property of the d -regular randomgraph, an expected constant number of nodes might have multi-edges — this can potentially be used to break ties;however, this fails to work with constant probability. In any case, such symmetry breaking will fail in symmetricregular graphs. .3 Other Related Works There have been several works on estimating the size of the network, see e.g., the works of [14, 18,27, 33, 32], but all these works do not work under the presence of Byzantine adversaries. Therehave been some work on using network coding for designing byzantine protocols (see e.g., [19]);but these protocols have polynomial message sizes and are highly ineﬃcient for problems such ascounting, where the output size is small. There are also some works on topology discovery problemsunder Byzantine setting (e.g., [29]), but these do not solve the counting problem.Several recent works deal with Byzantine agreement, Byzantine leader election, and fault-tolerant protocols in dynamic networks. We refer to [15, 6, 2, 1, 3] and the references thereinfor details on these works. These works crucially assume the knowledge of the network size (or atleast an estimate of it) and don’t work if the network size is not known.There have been signiﬁcant work in designing peer-to-peer networks that are provably robust toa large number of Byzantine faults [12, 16, 28, 31]. These focus only on (robustly) enabling storingand retrieving data items. The problem of achieving almost-everywhere agreement among nodes inP2P networks (modeled as expander graphs) is considered by King et al. in [24] in the context ofthe leader election problem; essentially, [24] is a sparse (expander) network implementation of thefull information protocol of [23]. In another recent work [22], the authors use a spectral techniqueto “blacklist” malicious nodes leading to faster and more eﬃcient Byzantine agreement. The workof [15] presents a solution for maintaining a clustering of the network, where each cluster containsmore than two-thirds honest nodes with high probability in a setting where the size of the networkcan vary polynomially over time. All the above works assume an exact knowledge of or some goodestimate of the network size and do not solve the Byzantine counting problem.The work of [8] shows how to implement uniform sampling in a peer-to-peer system under thepresence of Byzantine nodes where each node maintains a local “view” of the active nodes. We pointout that the choice of the view size and the sample list size of Θ( n ) necessary for withstandingadversarial attacks requires the nodes to have a priori knowledge of a polynomial estimate of thenetwork size. [18] considers a dynamically changing network without Byzantine nodes where nodescan join and leave over time and provides a local distributed protocol that achieves a polynomialestimate of the network size. In [35], the authors present a gossip-based algorithm for computingaggregate values in large dynamic networks (but without the presence of Byzantine failures), whichcan be used to obtain an estimate of the network size. The work of [9] focuses on the consensusproblem under crash failures and assumes knowledge of log n , where n is the network size. The distributed computing model:

We consider a synchronous network represented by a graph G whose nodes execute a distributed algorithm and whose edges represent connectivity in the net-work. The computation proceeds in synchronous rounds, i.e., we assume that nodes run at thesame processing speed (and have access to a synchronized clock) and any message that is sent bysome node u to its neighbors in some round r ≥ r . Byzantine nodes:

Among the n nodes ( n or its estimate is not known to the nodes initially),up to B ( n ) can be Byzantine and deviate arbitrarily from the given protocol. Throughout thispaper, we assume that B ( n ) = O ( n − δ ) (where n is the unknown network size), for any d < δ ≤ u is honest if u is not a Byzantine node and use Honest to denote the set of5onest nodes in the network. Byzantine nodes are “adaptive”, in the sense that they have completeknowledge of the entire states of all nodes at the beginning of every round (including random choicesmade by all the nodes), and thus can take the current state of the computation into account whendetermining their next action (they also can know the future random choices of honest nodes).The Byzantine nodes have unbounded computational power, and can deviate arbitrarily from theprotocol. This setting is commonly referred to as the full information model . We assume that theByzantine nodes are randomly distributed in the network.

Distinct IDs:

We assume that nodes (including Byzantine) have distinct

IDs and they cannotlie about their ID while communicating with a neighbor. Note that the n distinct IDs (where n isthe unknown network size) are assumed to be chosen from a large space (not known a priori to thenodes). Note that this precludes (most) nodes from estimating log n by potentially looking at thelength of their IDs. Network Topology:

Let G = ( V, E ) be the graph representing the network. We take G to bethe union of two other graphs H and L (both deﬁned below). That is, V ( G ) = V ( H ) = V ( L ) = V ,say, and E ( G ) = E ( H ) ∪ E ( L ). We take H to be a sparse, random d -regular graph that isconstructed by the union of d (assume d ≥ n nodes. We call this random graph model the H ( n, d ) random graph model . It is known that sucha random graph is an expander with high probability. The H ( n, d ) random, regular graph model isa well-studied and popular random graph model (see e.g., [37]). In particular, the H ( n, d ) randomgraph model has been used as a model for Peer-to-Peer networks and self-healing networks [26, 30]. E ( L ) is deﬁned as follows. For u, v ∈ V , ( u, v ) ∈ E ( L ) if and only if dist ( u, v ) ≤ k in H , where k = ⌈ d ⌉ is a positive integer. In other words, each node has direct connections (via edges of L )to nodes that are within distance k . Note that adding the edges of L makes H a “small-world” network, i.e., for each node v in G , the neighbors of v within distance k in H are connected to eachother (thus the clustering coeﬃcient is increased in G compared to H ). The small-world propertycomplements the expander property of the d -regular random graph, since the clustering coeﬃcientof the random regular graph is small. We exploit both properties crucially in the protocol. Largerthe degree d , larger will be k , and large will be the robustness to Byzantine nodes, i.e., up to O ( n − δ ) Byzantine nodes can be tolerated where d < δ ≤ n ) degree and hence notconstant bounded degree, unlike our model.It is important to note that nodes in G do not know a priori which edges are in H and whichare in L . However, as shown in Lemma 3, most (honest) nodes can distinguish between the twotypes of edges using a simple protocol.We point out although we assume a speciﬁc type of network model described above — which,intuitively, is the worst case (most diﬃcult) scenario for the algorithm designer due to its (essen-tially) identical local topological structure — our results can be extended to apply to potentiallyany (sparse) graph that has high expansion and high clustering coeﬃcient (e.g., one can presumablytake any bounded-degree expander rather than a d -regular graph as H ). Problem and Goal:

Our goal is to design a distributed protocol to estimate the numberof nodes in G , even under the presence of a large number of Byzantine nodes. The problem isnon-trivial, since each node has a local view and knowledge that is independent of the network We give more details on the model and analyze its properties in the appendix (Section A). n ) rounds, and use only “small-sized” messages. A “small-sized message” is one that contains aconstant number of IDs and O (log n ) additional bits.We now present the formal deﬁnition of the Byzantine counting problem. Since we assume a sparse (constant bounded degree) network and a large number of Byzantine nodes, it is diﬃcult foran algorithm where every honest node eventually knows the exact estimate of n . This motivatesus to consider the following “approximate, almost everywhere” variant of counting: Deﬁnition 1 (Byzantine Counting) . Suppose that there are B ( n ) Byzantine nodes in the network.We say that an algorithm A solves Byzantine Counting in T rounds if, in any run of A :1. all honest nodes terminate in T rounds,2. all except B ( n ) + ǫn honest nodes (for any arbitrarily small constant ǫ > ) have a constantfactor estimate of log n (i.e., if L is the estimate, then c log n ≤ L ≤ c log n , for some ﬁxedpositive constants c and c ), where n is the actual network size. Deﬁnition 2.

For any two nodes u and v in V , the distance between them (in G ) is deﬁned as dist G ( u, v ) def = the length of a shortest path between u and v in G . Similarly, dist H ( u, v ) def = the lengthof a shortest path between u and v in H .Remark . For any node v ∈ V ( G ), we follow the convention that dist G ( v, v ) = dist H ( v, v ) = 0. Deﬁnition 3.

For any node u and any set V ′ ⊂ V ( G ) = V , the distance between u and V ′ (in G )is deﬁned as dist G ( u, V ′ ) def = min { dist G ( u, v ) | v ∈ V ′ } . Deﬁnition 4.

For any two subsets V ′ and V ′′ of V ( G ) = V , the distance between V ′ and V ′′ (in G ) is deﬁned as dist G ( V ′ , V ′′ ) def = min { dist G ( u, v ) | u ∈ V ′ , v ∈ V ′′ } .Remark . In all our notations, the subscript G or H denotes the underlying graph. We will,however, for the most part, talk about H . Thus to obtain notational simplicity, we will omit thesubscript H from now on. If at any point, we need to talk about G instead, we will explicitlymention the subscript G . For example, dist ( u, v ) will denote the length of a shortest path between u and v in H , whereas dist G ( u, v ) will be used to denote the length of a shortest path between u and v in G . And so on. Deﬁnition 5.

For any v ∈ V ( H ) and any positive integer r , B ( v, r ) is deﬁned as the set of nodeswithin the ball of radius r from v (including at the boundary), i.e, B ( v, r ) def = { w ∈ V ( H ) | ≤ dist ( v, w ) ≤ r } . Deﬁnition 6.

For any v ∈ V ( H ) and any positive integer r , Bd ( v, r ) is deﬁned as the set of nodesat distance r from v (i.e., at the boundary), i.e, Bd ( v, r ) def = { w ∈ V ( H ) | dist ( v, w ) = r } . Next we introduce the “locally tree-like” property of an H ( n, d ) random graph: i.e., for mostnodes w , the subgraph induced by B ( w, r ) up to a certain radius r looks “like a tree”. This isstated more precisely as follows. 7 eﬁnition 7. Let G be an H ( n, d ) random graph and w be any node in G . Consider the subgraphinduced by B ( w, r ) for r = log n

10 log d . Let u be any node in Bd ( w, j ) , ≤ j < r . u is said to be“typical” if u has only one neighbor in Bd ( w, j − and ( d − -neighbors in Bd ( w, j + 1) ; otherwiseit is called “atypical”. Deﬁnition 8.

We call a node w “locally tree-like” if no node in B ( w, r ) is atypical. In other words, w is “locally tree-like” if the subgraph induced by B ( w, r ) is a ( d − -ary tree. It can be shown using properties of the H ( n, d ) random graph model and standard concentrationbounds (cf. Section A) that most nodes in G are locally tree-like. Lemma 1.

In an H ( n, d ) random graph, with high probability, at least n − O ( n . ) nodes are locallytree-like. For the proof of this lemma, as well as for further details about the H ( n, d ) random graphmodel, please refer to Section A. Observation . In a d -regular graph, for any vertex v , the number of vertices that are within a τ -distance of v is bounded by ( d − τ +1 , i.e., | B ( v, τ ) | < ( d − τ +1 .Since any two vertices that are within τ -distance of each other in G is within kτ distance ofeach other in H (which is a d -regular graph), we have that Observation . In the graph G , for any vertex v , the number of vertices that are within a τ -distanceof v is bounded by ( d − kτ +1 , i.e., | B G ( v, τ ) | < ( d − kτ +1 . Deﬁnition 9.

We categorize the nodes in V into the following distinct categories. Unlike our usualconvention, the distances referred to in this deﬁnition refer to the respective distances in G (not in H , as is usual).1. Byzantine nodes:

The set of Byzantine nodes is denoted by

Byz .2.

Honest nodes:

The set of honest nodes is deﬁned to be

Honest def = V \ Byz .3.

Locally tree-like nodes:

Please refer to Deﬁnition 8. That is, the set of the locally tree-likenodes is deﬁned as

LTL def = { v ∈ V | v is locally tree-like } .4. Non-locally-tree-like nodes:

The set of the non-locally-tree-like nodes is deﬁned as

NLT def = V \ LTL .5.

Unsafe nodes:

The set of nodes that have one or more

NLT nodes within a distance of a log n , where a def = δ k log ( d − . If we denote the set of unsafe nodes by Unsafe , then

Unsafe def = { v ∈ V | dist G ( v, NLT ) ≤ a log n } .6. Safe nodes:

Nodes that are not unsafe. In other words, the set of nodes that have no

NLT nodes within a distance of a log n . If we denote the set of safe nodes by Safe , then

Safe def = { v ∈ V | dist G ( v, NLT ) > a log n } .7. Bad nodes:

The set of bad nodes is deﬁned to be

Bad def = Byz ∪ NLT . . Byzantine-Unsafe nodes:

The set of nodes that have one or more bad nodes within adistance of a log n , where a def = δ k log ( d − . If we denote the set of Byzantine-unsafe nodes by BUS , then

BUS def = { v ∈ V | dist G ( v, Bad ) ≤ a log n } .9. Byzantine-Safe nodes:

Nodes that are not Byzantine-unsafe. In other words, the set ofnodes that have no bad nodes within a distance of a log n . If we denote the set of Byzantine-safe nodes by Byz-safe , then

Byz-safe def = { v ∈ V | dist G ( v, Bad ) > a log n } . Lemma 2.

The various node sets deﬁned in Deﬁnition 9 have the following sizes, respectively.1. | Byz | = n − δ .2. | Honest | = n − n − δ .3. | LTL | ≥ n − O ( n . ) .4. | NLT | ≤ O ( n . ) .5. | Unsafe | ≤ O ( n . δ ) = o ( n ) .6. | Safe | ≥ n − O ( n . δ ) = n − o ( n ) .7. | Bad | ≤ n − δ + n . ≤ n − δ (assuming δ ≤ . ).8. | BUS | ≤ d − n − δ = o ( n ) .9. | Byz-safe | ≥ n − d − n − δ = n − o ( n ) .Proof.

1. By deﬁnition.2. By deﬁnition.3. By Lemma 1.4. By Lemma 1.5. By Deﬁnition 9, Observation 2, and Lemma 1.6. By Deﬁnition 9, Observation 2, and Lemma 1.7. By deﬁnition.8. Follows from Observation 2 and the deﬁnition of

BUS .9. By deﬁnition.

Deﬁnition 10.

We call u a child of w with respect to v (or w the parent of u , with respect to v )if u is a child of w in the BFS tree rooted at v . Similarly, we call u and w siblings with respect to v if they are siblings in the BFS tree rooted at v . We note that, as is our usual custom, this BFStree is in the graph H and not in G .

9t is important to note that nodes in G do not know a priori which edges are in H and which arein L . However, the following lemma assures us that most (honest) nodes can distinguish betweenthe two types of edges using a simple protocol. Lemma 3.

For any honest node v , if v has no Byzantine neighbor in G (that is, no Byzantineneighbor in its k -distance neighborhood in H ), then v can faithfully reconstruct the topology of its k -distance neighborhood in H from the information it is provided by its G -neighbors.Proof. For any x ∈ V ( G ), let N G ( x ) denote the set of G -neighbors of x . Let w and u be two G -neighbors of v . Then we observe that • w is a child of u with respect to v if and only if N G ( w ) ∩ N G ( v ) ⊂ N G ( u ) ∩ N G ( v ). • u is a child of w with respect to v if and only if N G ( u ) ∩ N G ( v ) ⊂ N G ( w ) ∩ N G ( v ). • u and v are siblings if u ∈ N G ( w ) and w ∈ N G ( u ) but neither of them is a child of the other. Remark . Since d and k are constants, the list of neighbors is still O (1), and hence can be exchangedin a constant number of rounds (using small sized messages). For the sake of exposition, we ﬁrst describe the algorithm and analyze its behavior free from theinﬂuence of any Byzantine nodes; in other words, we will assume that all nodes (including Byzantinenodes) honestly execute the protocol without malicious behavior. We will discuss any maliciouseﬀects the Byzantine nodes may have in Section 3.3 and describe how to modify the algorithm (andanalysis) to counter the Bzyantine nodes.

Phases and subphases:

This is a distributed algorithm that runs in phases . In the i th phase, thealgorithm works with the current estimation of log n , which is i . We reserve the letter i exclusivelyto denote the phase that the algorithm is presently in. For i ≥

1, the i th phase consists of severalruns (repetitions) of the same random experiment (the random experiment is described in the nextfew paragraphs; also see Lines 10 through 18 of the pseudocode in Algorithm 1). We call one suchrun a subphase of the i th phase. We would usually index the subphases by j , i.e., we will veryfrequently use the phrase “in the j th subphase of the i th phase” in our description and analysis ofthe algorithm. We note that in a synchronized network the value of i and j are known to all nodes.The i th phase consists of exactly α i subphases (repetitions), where α i def = ⌈ log ( ǫ )+ i +1 − log d ( i −

2) log ( d − ⌉ . Wecall ǫ the error parameter — by changing its value (please refer to Line ?? in the pseudocode inAlgorithm 1), we (the algorithm designer) can control exactly how large a fraction of the honestnodes would estimate log n correctly (i.e., get a constant-factor approximation of log n ). Theorem1, which is the main result of this paper, tells us that at most ǫ -fraction of the honest nodes would fail to get a constant factor approximation of log n . Basic idea (see also Section 1.2) : In one random experiment, i.e., in the j th subphase of the i th phase, say, every node sends out some tokens (these contain some information) that propagatethrough the network (by ﬂooding) for some (pre-determined) number of steps (rounds), at the end10f which every node takes stock of the tokens it has received over the intermediate rounds. Color of a token:

Every token circulating in the network has a color (deﬁned next), which ispassed down to a token from its generating node. Every node v tosses an unbiased coin until it getsits ﬁrst head (see Line 10 of the pseudocode in Algorithm 1). If a node v gets its ﬁrst head at the r th trial, we call r to be the color of v (see Line 11 of the pseudocode in Algorithm 1). Thus the color ofa node is always a positive integer, which may be (but is not necessarily) diﬀerent for diﬀerent nodes. Estimating log n : When i is much smaller than log n , most nodes will receive their respectivehighest colored tokens in the last round. In contrast, when i is of the same order as log n , mostnodes will have received their respective highest colored tokens much before the last round. Thisprovides a node with a way to determine when its estimate of log n , which is i , has reached closeto the actual value of log n . Algorithm 1

The basic counting algorithm (in the absence of Byzantine nodes). Code for node v . Ask all the neighbors for their respective adjacency lists and distinguish between the edges of H and L from that information. for i ← , , . . . do ⊲ i denotes the phase node v is in F lagT erminate ← if d ( d − i − ≤ ǫ then α i ← ⌈ log ( ǫ )+ i +1log d +( i −

2) log ( d − − ⌉ ⊲ < ǫ < else α i ← i +1log ( ǫ ) end if for j ← , , . . . , iα i do ⊲ Phase i consists of α i subphases; the subphases are indexed by j v tosses a fair coin until the outcome is heads in the r -th trial, for some r ≥ c v,i ← r Flood the color c v,i , along the edges of H only, for exactly i steps. ⊲ This is possible byvirtue of Lemma 3. for time t = 1 , , . . . , i do In each round t , mark and store the highest color received. Let’s call it k t end for if k i > k t , ∀ ≤ t log ( d ( d − i − ) − log log ( d ( d − i − ) then F lagT erminate ← end if end for if F lagT erminate = 1 then

Decide i and terminate all for-loops. ⊲ v accepts i as the estimate of log n else Continue to the next phase i + 1. end if end for .2 Analysis of the algorithm (assuming Byzantine nodes behave honestly) In this section we show that the algorithm gives a ( ba )-factor approximation of log n with highprobability, where a def = δ k log ( d − and b def = hd ) , where h is the edge-expansion of H . Note that0 < a < b <

1. We recall that n − δ is the number of Byzantine nodes in the network G , and d isthe uniform degree of H . ( H is a subset of G . For the exact deﬁnition of H , please refer to Section2.1.) Observation . b log n ≥ D ( H ), where D ( H ) is the diameter of H . High-level overview of the proof

We break our analysis up into two diﬀerent stages of thealgorithm. We show that the following statements hold with high probability.1. For i < a log n , at least (1 − ǫ )-fraction of the good nodes do not accept i to be the rightestimate of log n , and they continue with the algorithm. The rest of the nodes — i.e., atmost ǫ -fraction of the good nodes — even though they have stopped generating tokens, stillcontinue to forward tokens generated by other nodes. 0 < ǫ ≤ i = b log n , all but o ( n ) of the remaining active nodes accept i to be the estimate of log n and they stop producing tokens. They however continue to forward tokens generated by other(if any) nodes.We cannot say which way a node will decide when a log n ≤ i < b log n . The above twostatements are, however, suﬃcient to give us an approximation factor of ba = k log ( d − δ log (1+ hd ) . i is small: In particular, when i < a log n For the sake of the analysis in this subsection only, we will consider only safe nodes , i.e., onlythose nodes in the set

Safe . Let us ﬁrst take note of a few properties of the geometric distributionthough; these will be useful later.

Observation . For any node v and any positive integer r ,1. P r [ c v = r ] = r .2. P r [ c v ≥ r ] = r − .3. P r [ c v < r ] = 1 − P r [ c v ≥ r ] = 1 − r − .4. P r [ c v ≤ r ] = 1 − P r [ c v ≥ r + 1] = 1 − r .5. P r [ c v > r ] = 1 − P r [ c v ≤ r ] = r .For any non-empty V ′ ⊂ V ( G ), c max V ′ is deﬁned as c max V ′ def = { c v | v ∈ V ′ } . Suppose | V ′ | = n ′ . Observation . For any positive integer j ,1. P r [ c max V ′ < r ] = ( P r [ c v < r ]) n ′ = (1 − r − ) n ′ .2. P r [ c max V ′ ≥ r ] = 1 − P r [ c max V ′ < r ] = 1 − (1 − r − ) n ′ .3. P r [ c max V ′ ≤ r ] = P r [ c max V ′ < r + 1] = (1 − r ) n ′ .12. P r [ c max V ′ > r ] = P r [ c max V ′ ≥ r + 1] = 1 − (1 − r ) n ′ .5. P r [ c max V ′ = r ] = P r [ c max V ′ ≥ r ] − P r [ c max V ′ > r ] = (1 − r ) n ′ − (1 − r − ) n ′ . Lemma 4.

P r [ c max V ′ > n ′ ] ≤ n ′ .Proof. P r [ c max V ′ > n ′ ] = 1 − (1 − n ′ ) n ′ = 1 − (1 − n ′ ) n ′ ≤ − (1 − n ′ n ′ ) = 1 − (1 − n ′ ) = 1 n ′ . Lemma 5.

P r [ c max V ′ ≤ log n ′ − log log n ′ ] < n ′ .Proof. P r [ c max V ′ ≤ log n ′ − log log n ′ ] = (1 − log n ′ − log log n ′ ) n ′ = (1 − log n ′ n ′ ) n ′ ≤ exp( − (log n ′ ) .n ′ n ′ ) = exp( − log n ′ ) < n ′ .For any node v and any non-negative integer r , let us denote the set B ( v, r ) \ { v } by B ∗ ( v, r ).We recall that from the “locally tree-like property” (cf. Deﬁnition 8 and Lemma 1), for any safenode v , | B ( v, r ) | = 1 + d · P rj =1 ( d − j − = ⇒ | B ∗ ( v, r ) | = d P rj =1 ( d − j − = d ( d − r d − , and | Bd ( v, r ) | = d ( d − r − .For any positive integer r , let l r def = log d + r log ( d − l r = l r − + log ( d − Lemma 6. log ( | B ∗ ( v, r ) | ) = l r − log ( d − and log ( | Bd ( v, r ) | ) = l r − log ( d − . Lemma 7.

P r [ c max B ∗ ( v,r ) > l r − log ( d − ≤ d − d ( d − r .Proof. Follows from Lemma 4 and Lemma 6.

Lemma 8.

P r [ c max Bd ( v,r ) ≤ l r − log ( d − − log ( l r − log ( d − < d ( d − r − .Proof. Follows from Lemma 5 and Lemma 6.Next we show that the probability that a safe node decides to stop (when i < a log n ) is boundedby a constant (any arbitrarily small, but ﬁxed constant). Lemma 9.

P r [ a safe node v makes a wrong decision in the i th phase ] < ǫ i +1 .

13e will use a series of other, smaller results to show the above. One subtle issue to keep in mindis that since the “failure probability” for a safe node is not v , say, in phase i , wehave to take into consideration the fact that there may be some nodes in the i -hop neighborhood of v , i.e., in B ( v, i ), that made a wrong decision in some previous phase j , j < i , and is thus inactive in phase i .We show this by induction on i , where i is the phase-number. We note that in the very ﬁrstphase, all the nodes are active, thus there is no need to consider inactive nodes. This helps us provethe basis of the induction. Next we assume that for any i < log n , the probability that a safe node v went inactive in some previous phase i ′ , i ′ < i , is at most ǫ i ′ +1 , where ǫ is the error parameter.This forms the inductive hypothesis. Assuming this, we go on to show that the failure probabilityfor a safe node in the i th phase is less than ǫ i +1 .We defer the detailed, formal proof to the appendix — please refer to Section B. Translating the constant probability of error into a “low” probability of error.

Lemma9 promises us that any individual node has a small probability of error when i < a log n . So theexpected number of nodes to make an error is also small. We, however, want to show a highprobability bound on the number of nodes that make a mistake.In order to show that, we proceed along the usual way of formulating an indicator randomvariable and then computing the expectation of the sum of the individual indicator random variablesby using the principle of linearity of expectation. We show the high probability bound by usingthe method of bounded diﬀerences (Azuma’s Inequality, more speciﬁcally).Now to the formal description.Let Y vi be an indicator random variable which is 1 if and only if v decides i to be a correctestimate of log n . Lemma 9 shows that P r [ Y vi = 1] < ǫ i +1 . Now let Y i = X v ∈ V Y vi .That is, Y i denotes the number of nodes that decide wrongly in the i th phase. We recall once againthat here we are interested only in the case where i < a log n . Then Y i cannot be too large, i.e.,not too many nodes can decide wrongly in one phase. Lemma 10.

P r [ Y i > nǫ i ] < n if i < a log n .Proof. E [ Y i ] = E [ X v ∈ V Y vi ] = X v ∈ V E [ Y vi ] [by linearity of expectation]= X v ∈ V P r [ Y vi = 1] [since Y vi is an indicator random variable] < X v ∈ V ǫ i +1 = nǫ i +1 Two vertices v and w are independent if their i -distance neighborhoods do not intersect, i.e.,if the distance between them is greater than 2 i . In other words, v going defective can aﬀect only14hose vertices that are within a distance of 2 i to v . The number of vertices that are within a 2 i distance of v is at most ( d − (2 i +1) . By the Azuma-Hoeﬀding Inequality [10], P r [ Y i − E [ Y i ] ≥ nǫ i +1 ] ≤ exp( − ( nǫ i +1 ) n · ( d − i +1) ) = exp( − nǫ i +3 . ( d − i +2 )= exp( − nǫ k ), say, where k = 2 i + 3 + (4 i + 2) log ( d − i < a log n = δ log n

10 log ( d − , k < δ log n d −

1) + 3 + 2 log ( d −

1) + 2 δ log n δ log n d −

1) (2 log ( d −

1) + 1) + 2 log ( d −

1) + 3 < log n − log log n − − ǫ )[assuming log n > d − n + 2 log ( d −

1) + 3 + 2 log ( ǫ ) + 2)(5 − δ ) log ( d − − δ ,which is true for large enough values of n ]Thus P r [ Y i − E [ Y i ] ≥ nǫ i +1 ] ≤ exp( − nǫ k ) ≤ exp( − nǫ log n − log log n − − ǫ ) )= exp( − nǫ n n · ǫ ) = exp( − n ) < n But again E [ Y i ] < nǫ i +1 . Hence P r [ Y i > nǫ i ] ≤ P r [ Y i − E [ Y i ] ≥ nǫ i +1 ] < n .Now this is true for one particular phase i . Summing over all the phases (recall that we areconcerned here only with the case i < a log n ), we get that the fraction of nodes that make a wrongdecision cannot be more than X i (1 − X i (1 − n ).Thus we have Lemma 11.

For Algorithm 1, the following holds with probability > − n : While ≤ i < a log n ,at most ǫ -fraction of the nodes decide wrongly, i.e., decide i to be a correct estimate of log n (where ǫ is any arbitrarily small but ﬁxed positive constant).Proof. Follows from Lemma 10 and Lemma 2. 15 .2.2 When i = Θ(log n ) : In particular, when i = b log n Here we show that the following statement holds with probability at least 1 − n : If a node v isstill active at the beginning of this phase, by the end of this phase, it accepts the current value of i , i.e., b log n , to be a correct estimate of log n and terminates. Lemma 12.

The following holds with probability at least − n : In all the iα i subphases of phase i (where i = b log n ), it is always the case that c max V ≤ n − , where c max V def = { c v | v ∈ V } , i.e.,the highest color generated in the network.Proof. From Observation 4, for any particular node w , P r [ c w > n −

1] = n − = n . Takingthe union bound over all w ∈ V ( G ), P r [ c max V > n − ≤ n (1)This is for one subphase of the i th phase. Since there are iα i subphases in the i th phase,we take the union bound over all the subphases and get that with probability at least 1 − iα i n , c max V ≤ n − all the iα i subphases. But iα i n = Θ(log n ) n < n . Thus with probabilityat least 1 − n , c max V ≤ n − all the iα i subphases. Lemma 13.

The following holds with probability at least − n for Algorithm 1: If a node v isstill active at the beginning of phase i (when i = b log n ), by the end of this phase, it accepts thecurrent value of i , i.e., b log n , to be a correct estimate of log n and terminates.Proof. We recall that in order for an honest node v to continue after this phase, the followingcriterion must be satisﬁed at least once in the iα i subphases of the i th phase (Please see Line 17 ofthe pseudocode): k i > log d + ( i −

1) log ( d − − log (log d + ( i −

1) log ( d − k i is the highest color that v receives after i rounds, i.e., at the end of the j th subphase ofthe i th phase. Substituting i = b log n = n log (1+ hd ) > n , we get that in order for an honestnode v to continue after this phase, the following criterion must be satisﬁed at least once in the iα i subphases of the i th phase: k i > log d + ( i −

1) log ( d − − log (log d + ( i −

1) log ( d − > log d + (4 log n −

1) log ( d − − log (log d + (4 log n −

1) log ( d − > . (log d + (4 log n −

1) log ( d − > (4 log n − . log ( d − ≥ n −

1, assuming log ( d − ≥

2, or equivalently, d ≥ − n , no node generates a color > n −

1. Therefore v will not receive any such color ( > n −

1) either. So in all the iα i subphases of phase i (where i = b log n ), k i will always be ≤ n − v will not continue after this phase. We next discuss the modiﬁcations made to the Basic Counting Protocol (Algorithm 1) to counterthe eﬀect of the Byzantine nodes — this gives us the Byzantine Counting Protocol (Algorithm 2).16 .3.1 Description of the modiﬁcations in the algorithm

1. At the very beginning (that is, even before phase 1 starts), every honest node v asks itsneighbors in G for their own IDs and the IDs of their respective neighbors. We observe thatthis takes a constant number of rounds. From that neighborhood information of its neighbors, v tries to reconstruct the topology of its k -distance neighborhood in H . Lemma 3 tells usthat this is possible when there are no Byzantine nodes.When there are Byzantine nodes, however, they can try to provide false neighborhood datato v . The algorithm dictates that v shuts itself down (that is, goes into crash failure ) if v receives inconsistent or conﬂicting data from two or more of its neighbors. Please refer toLine 2 of the pseudocode in Algorithm 2.2. For every color that v receives from a neighbor w , say, v checks (via the lattice edges , i.e., theedges of L ) with all the nodes in B ( w, k −

1) (this ball B is deﬁned with respect to H ) toverify that w indeed received that color via a legitimate path (up to a distance of k −

1) fromits ( k − H . Please refer to Line 15 of the pseudocode in Algorithm 2.We note a minor detail here: For colors received within the ﬁrst t time-steps (in any j th subphase of any phase i ), when 1 ≤ t ≤ k −

1, an honest node v checks with nodes in thesmaller ball B ( w, t ) (instead of B ( w, k − v ) the Byzantine nodes cannot fool v intobelieving the existence of a k -length chain, composed purely of Byzantine nodes, in its k -distance neighborhood in H . Thus it ensures that a Byzantine node is not able to push anyarbitrary color into the network without raising a ﬂag.17 .3.2 The Pseudocode for the Byzantine counting algorithm Lines 2 and 15 respectively indicate the changes from the previous algorithm (please refer to Al-gorithm 1). These lines are shown in boldface. Suppose that a node sends a message with somecolor c . We say that c is a legitimate color if it was generated by an honest node. Note that somenodes might be forwarding colors generated by Byzantine nodes. Algorithm 2

The Byzantine counting algorithm. Code for an honest node v . Ask all the neighbors (in G ) for their respective adjacency lists and distinguish between theedges of H and L from that information. If v gets conﬂicting or contradictory information from two or more of its neighborsin G , v shuts down, i.e., v goes into crash failure . for i ← , , . . . do ⊲ i denotes the phase node v is in F lagT erminate ← if d ( d − i − ≤ ǫ then α i ← ⌈ log ( ǫ )+ i +1log d +( i −

2) log ( d − − ⌉ ⊲ < ǫ < else α i ← i +1log ( ǫ ) end if for j ← , , . . . , iα i do v tosses a fair coin until the outcome is heads in the r -th trial, for some r ≥ c v,i ← r Flood the color c v,i , along the edges of H only, for exactly i steps. for time t = 1 , , . . . , i do In each round t , for every received color c , if v got c from its neighbor (in H ) w , say, v checks with the ( k − -distance neighbors (in H ) of w to verify that c is a legitimate color. In each round t , mark and store the highest color received. Let’s call it k t end for if k i > k t , ∀ ≤ t log d + ( i −

1) log ( d − − log (log d + ( i −

1) log ( d − then F lagT erminate ← end if end for if F lagT erminate = 1 then

Decide i and terminate all for-loops. ⊲ v accepts i as the estimate of log n else Continue to the next phase i + 1. end if end for Let

Crashed be the set of honest nodes that shut themselves down at the very beginning of thealgorithm (please see Line 2 of the pseudocode in Algorithm 2). Let

Core be the largest connectedcomponent in H induced by Uncrashed , where

Uncrashed def = Honest \ Crashed .18 emma 14 (See [5]) . Core has size at least n − o ( n ) . Moreover, Core is an expander with edge-expansion at least γ , where γ > is a constant.Proof. Follows from Lemma 3 in [5].

Observation . In the graph H , with high probability, there is no chain of length ≥ k composed ofByzantine nodes only. Proof.

We have that k = ⌈ d ⌉ and δ > d , implying kδ >

1. We assume that kδ = 1 + δ ′ for a ﬁxedpositive constant δ ′ .The number of possible k -length chains is upper-bounded by n · d k − . We recall that theByzantine nodes are randomly distributed in the network. Therefore, for any one such chain, theprobability that it is composed purely of Byzantine nodes is ( n − δ n ) k = n − kδ . By union bound, theprobability that there is at least one chain made only of Byantine nodes is upper-bounded by n · d k − · n − kδ = n · d k − · n − (1+ δ ′ ) [since kδ = 1 + δ ′ ]= d k − n δ ′ , which is low probability for a ﬁxed positive constant k . Lemma 15.

The following statement holds with high probability: ∀ v ∈ Honest , the Byzantinenodes cannot make v believe that there is a chain of length ≥ k composed entirely of Byzantinenodes, without shutting v down.Proof. Consider an honest node v . If v has no Byzantine neighbors in G , then v gets true neighbor-hood information from all its neghbors in G , and is thus able to accurately reconstruct the exacttopology of its k -distance neighborhood in H (please refer to Observation 3). Since H does nothave any k -length chain of Byzantine nodes (please see Observation 6), v ’s reconstruction will havenone either.So suppose v has one or more Byzantine neighbors in G . Let C be the ﬁnal k -length chainwhose existence the Byzantine nodes are trying to “trick” v into believing. Since, in truth, C hasat most k − dummy node b , say,which the Byzantine nodes will try to insert into C (that is, to make it look as such in v ’s eyes).Thus, b , the (fake) parent of b in the chain C , must report to v that it has b as a child.While doing so, however, b will need to suppress the existence of a (real) child u (which may ormay not be Byzantine) because b will need to maintain its degree d in H (in v ’s eyes).But as u is directly connected to v in G , the Byzantine nodes cannot disrupt the communicationbetween u and v . Since u knows b to be its neighbor in H (we recall that b , even though Byzantine,cannot lie about its ID to u ), the algorithm would dictate that u let it be known to v (regardlessof whether or not u is Byzantine).Therefore, v will have two conﬂicting pieces of information: it will hear from b that b and u are not neighbors in H , and v will hear the exact opposite from u . Thus, as per the algorithm, v will go into crash failure , i.e., will shut itself down (see Line 2 of the pseudocode in Algorithm2. In this section we show that the algorithm gives a ( ba )-factor approximation of log n with highprobability, where a def = δ k log ( d − and b def = γd ) , where γ is the edge-expansion of Core . Note19 w w w i x j b u b w d Figure 1: C = ( w i , x j , . . . , b , b ) is the k -length chain the Byzantine nodes are trying to tamperwith. In reality, b is not a child of b (even though b is directly connected to v in the graph G ).So b must hide the existence of a real child u , say, in order to concoct the existence of the fake,Byzantine child b .that 0 < a < b <

1. We recall that n − δ is the number of Byzantine nodes in the network G , and d is the uniform degree of H . ( H is a subset of G . For the exact deﬁnition of H , please refer toSection 2.1.) Observation . b log n ≥ D ( Core ), where D ( Core ) is the diameter of

Core . High-level overview of the proof

We break our analysis up into two diﬀerent stages of thealgorithm. We show that the following statements hold with high probability.1. Then for i < a log n , at least (1 − ǫ )-fraction of the good nodes do not accept i to be theright estimate of log n , and they continue with the algorithm. The rest of the nodes — i.e.,at most ǫ -fraction of the good nodes — even though they have stopped generating tokens,still continue to forward tokens generated by other nodes. 0 < ǫ ≤ i = b log n , all but o ( n ) of the remaining active nodes accept i to be the estimate of log n and they stop producing tokens. They however continue to forward tokens generated by other(if any) nodes.We cannot say which way a node will decide when a log n ≤ i < b log n . The above twostatements are, however, suﬃcient to give us an approximation factor of ba = k log ( d − δ log (1+ γd ) . Thisgives us our main result of the paper: Theorem 1.

Algorithm 2, with high probability, solves the Byzantine counting problem with up to O ( n − δ ) (randomly distributed) Byzantine nodes (where δ > is a small ﬁxed constant that dependson d ) and runs in Θ(log n ) rounds with the guarantee that all but an ǫ -fraction of the nodes in thenetwork (for any arbitrary small positive constant ǫ ) have a constant factor approximation of log n ,where n is the number of nodes in the network. The proof of the above Theorem is shown in the following Sections.20 .4.3 When i is small: In particular, when i < a log n For the sake of the analysis in this subsection only, we will consider only

Byzantine-safe nodes , i.e.,only those nodes in the set

Byz-safe .We note that while i < a log n , no token generated by a Byzantine node reaches a Byzantine-safenode (by the very deﬁnition of a Byzantine-safe node, as deﬁned in Deﬁnition 9). Thus for any v ∈ Byz-safe ⊂ Safe , the exact same analysis of Section 3.2.1 remains valid. That is, we have thesame result, i.e., Lemma 11, as in the Byzantine-free setting. i = Θ(log n ) : In particular, when i = b log n We showed in Section 3.2.2 that the following statement holds with probability at least 1 − n : Ifan honest node v is still active at the beginning of this phase, by the end of this phase, it acceptsthe current value of i , i.e., b log n , to be a correct estimate of log n and terminates.But the aforementioned analysis in Section 3.2.2 takes into account tokens generated by thehonest nodes only. The Byzantine nodes can generate arbitratily high colors in any subphase ofany phase. But we argue that they too are restricted by the structure of the network G . Inparticular, we argue that a Byzantine node can push a high-colored token (that is, a token withcolor > log ( d ( d − i − ) − log log ( d ( d − i − ) in phase i ) into the network only at the beginningof a subphase, and not at some arbitrary point in the middle of a subphase. More speciﬁcally weshow (by exploiting the structure of the network, i.e., of the graph G ): Lemma 16.

The following statement holds with high probability: If a core node receives a high-colored token (generated by some Byzantine node) in round t in some subphase j , ≤ j ≤ α i , then ≤ t ≤ k − .Proof. Suppose not. Suppose that there is at least one core node that receives a high-colored tokenin some subphase j , where 1 ≤ j ≤ α i . Let t ≥ k be the earliest time-instant in that subphasewhen a core node receives a high-colored token. That is, there is some core node v that receives ahigh-colored token from a neighbor b , say, in the t th round. Now b has to pretend that it receivedthe high-colored token from somebody else (because b is not allowed to generate a token itself inthe middle of a subphase). Since v has edges (the edges in L ) to all the nodes in B ( b, k − v cancontact all those nodes directly and check the veracity of b ’s claim. Since t ≥ k , and since thereare no Byzantine chains of length ≥ k (Please see Observation 6 and Lemma 15), there will be atleast one honest node on any chain in B ( b, k −

1) who would testify against b . Lemma 17.

For v ∈ Core , if v is still active at the beginning of phase i (when i = b log n ), thenwith high probability, by the end of this phase, it accepts the current value of i , i.e., b log n , to be acorrect estimate of log n and terminates.Proof. Lemma 16 says that the Byzantine nodes would not be able to push tokens into

Core after the ﬁrst ( k − Core within the ﬁrst ( k − j of the i th phase, where i = b log n .But once even one core node receives a high color — Core being an expander (Please refer toLemma 14) — that high color will start propagating through the network by means of ﬂooding andwill therefore reach every uncrashed node v within D ( Core ) rounds, where D ( Core ) is the diameterof

Core . By Observation 7, this means that the highest color introduced by the Byzantine nodeswill reach every core node v within ( b log n + k − v , v will receive no higher color in round i than what it hasalready received before. This will violate the criterion for continuing, in particular, the variable F lagT erminate will not be assigned the value 0 (Please see Line 19 of the pseudocode in Algorithm2. Therefore v will accept the current value of i , which is b log n , to be a correct estimate of log n and will terminate.Thus we have Lemma 18. If i = b log n , after the i th phase, the following statement holds with high probability:All but o ( n ) of the nodes that were active at the beginning of this phase accept i to be the correctestimate of log n .Proof. Follows from Lemma 17 and Lemma 14.Lemma 18 together with Lemma 11 give us Theorem 1, which is the main result of this paper.

In this paper, we take a step towards designing localized, secure, robust, and scalable algorithmsfor large-scale networks. We presented a fast (running in O (log n ) rounds) and lightweight (onlysimple local computations per node per round) distributed protocol for the fundamental Byzantinecounting problem tolerating O ( n − δ ) (for any constant δ >

0) Byzantine nodes while using onlysmall-sized communication messages per round. Our work leaves many questions open.A key open problem is to show a lower bound that is essentially tight with respect to the amountof Byzantine nodes that can be tolerated, or show an algorithm that can tolerate signiﬁcantly moreByzantine nodes. Our protocol works only when the Byzantine nodes are randomly distributed; itwill be good to remove this assumption and design a protocol that works under Byzantine nodesthat are adversarially distributed. Another interesting question is whether one can improve theapproximation factor of the estimate of log n to 1 ± o (1).22 eferences [1] John Augustine, Anisur Rahaman Molla, Ehab Morsy, Gopal Pandurangan, Peter Robinson,and Eli Upfal. Storage and search in dynamic peer-to-peer networks. In Proceedings of theTwenty-ﬁfth Annual ACM Symposium on Parallelism in Algorithms and Architectures , SPAA’13, pages 53–62, New York, NY, USA, 2013. ACM.[2] John Augustine, Gopal Pandurangan, and Peter Robinson. Fast byzantine agreement in dy-namic networks. In

Proceedings of the 2013 ACM Symposium on Principles of DistributedComputing , PODC ’13, pages 74–83, New York, NY, USA, 2013. ACM.[3] John Augustine, Gopal Pandurangan, and Peter Robinson. Fast byzantine leader electionin dynamic networks. In Yoram Moses, editor,

Distributed Computing: 29th InternationalSymposium, DISC 2015, Tokyo, Japan, October 7-9, 2015, Proceedings , pages 276–291, Berlin,Heidelberg, 2015. Springer Berlin Heidelberg.[4] John Augustine, Gopal Pandurangan, and Peter Robinson. Distributed algorithmic founda-tions of dynamic networks.

SIGACT News , 47(1):69–98, March 2016.[5] John Augustine, Gopal Pandurangan, Peter Robinson, Scott T. Roche, and Eli Upfal. Enablingrobust and eﬃcient distributed computation in dynamic peer-to-peer networks. In

IEEE 56thAnnual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA,17-20 October, 2015 , pages 350–369, 2015.[6] John Augustine, Gopal Pandurangan, Peter Robinson, and Eli Upfal. Towards robust andeﬃcient computation in dynamic peer-to-peer networks. In

Proceedings of the Twenty-thirdAnnual ACM-SIAM Symposium on Discrete Algorithms , SODA ’12, pages 551–569, Philadel-phia, PA, USA, 2012. Society for Industrial and Applied Mathematics.[7] Marc Barth´el´emy and Lu´ıs A. Nunes Amaral. Small-world networks: Evidence for a crossoverpicture.

Physical Review Letters , 82:3180–3183, April 1999.[8] Edward Bortnikov, Maxim Gurevich, Idit Keidar, Gabriel Kliot, and Alexander Shraer.Brahms: Byzantine resilient random membership sampling.

Computer Networks , 53(13):2340– 2359, 2009. Preliminary version in PODC 2008.[9] Bogdan S. Chlebus and Dariusz R. Kowalski. Locally scalable randomized consensus for syn-chronous crash failures. In

Proceedings of the Twenty-ﬁrst Annual Symposium on Parallelismin Algorithms and Architectures , SPAA ’09, pages 290–299, New York, NY, USA, 2009. ACM.[10] Devdatt Dubhashi and Alessandro Panconesi.

Concentration of Measure for the Analysis ofRandomized Algorithms . Cambridge University Press, New York, NY, USA, 1st edition, 2009.[11] Cynthia Dwork, David Peleg, Nicholas Pippenger, and Eli Upfal. Fault tolerance in networksof bounded degree.

SIAM Journal on Computing , 17(5):975–988, 1988.[12] Amos Fiat and Jared Saia. Censorship resistant peer-to-peer content addressable networks. In

Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms , SODA’02, pages 94–103, Philadelphia, PA, USA, 2002. Society for Industrial and Applied Mathe-matics.[13] Joel Friedman. On the second eigenvalue and random walks in random d-regular graphs.

Combinatorica , 11(4):331–362, 1991. 2314] A. J. Ganesh, A. M. Kermarrec, E. Le Merrer, and L. Massouli´e. Peer counting and samplingin overlay networks based on random walks.

Distributed Computing , 20(4):267–278, 2007.[15] Rachid Guerraoui, Florian Huc, and Anne-Marie Kermarrec. Highly dynamic distributedcomputing with byzantine failures. In

Proceedings of the 2013 ACM Symposium on Principlesof Distributed Computing , PODC ’13, pages 176–183, New York, NY, USA, 2013. ACM.[16] Kirsten Hildrum and John Kubiatowicz.

Asymptotically Eﬃcient Approaches to Fault-Tolerance in Peer-to-Peer Networks , pages 321–336. Springer Berlin Heidelberg, Berlin, Hei-delberg, 2003.[17] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications.

Bulletin of the American Mathematical Society , 43(4):439–561, 2006.[18] Keren Horowitz and Dahlia Malkhi. Estimating network size from local information.

Infor-mation Processing Letters , 88(5):237–243, 2003.[19] Sidharth Jaggi, Michael Langberg, Sachin Katti, Tracey Ho, Dina Katabi, Muriel M´edard,and Michelle Eﬀros. Resilient network coding in the presence of byzantine adversaries.

IEEETrans. Information Theory , 54(6):2596–2603, 2008.[20] Bruce M. Kapron, David Kempe, Valerie King, Jared Saia, and Vishal Sanwalani. Fast asyn-chronous byzantine agreement and leader election with full information.

ACM Transactionson Algorithms , 6(4):68:1–68:28, September 2010.[21] Valerie King and Jared Saia. Breaking the O ( n ) bit barrier: Scalable byzantine agreementwith an adaptive adversary. Journal of the ACM , 58(4):18:1–18:24, July 2011.[22] Valerie King and Jared Saia. Faster agreement via a spectral method for detecting maliciousbehavior. In

Proceedings of the Twenty-ﬁfth Annual ACM-SIAM Symposium on Discrete Al-gorithms , SODA ’14, pages 785–800, Philadelphia, PA, USA, 2014. Society for Industrial andApplied Mathematics.[23] Valerie King, Jared Saia, Vishal Sanwalani, and Erik Vee. Scalable leader election. In

Pro-ceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm , SODA ’06,pages 990–999, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.[24] Valerie King, Jared Saia, Vishal Sanwalani, and Erik Vee. Towards secure and scalable com-putation in peer-to-peer networks. In

Proceedings of the 47th Annual IEEE Symposium onFoundations of Computer Science , FOCS ’06, pages 87–98, Washington, DC, USA, 2006.IEEE Computer Society.[25] Shay Kutten, Gopal Pandurangan, David Peleg, Peter Robinson, and Amitabh Trehan. Onthe complexity of universal leader election.

Journal of the ACM , 62(1):7:1–7:27, March 2015.[26] Ching Law and Kai-Yeung Siu. Distributed construction of random expander networks. In

Proceedings IEEE INFOCOM 2003, The 22nd Annual Joint Conference of the IEEE Computerand Communications Societies, San Franciso, CA, USA, March 30 - April 3, 2003 , pages2133–2143, 2003.[27] Giuseppe Antonio Di Luna, Roberto Baldoni, Silvia Bonomi, and Ioannis Chatzigiannakis.Counting in anonymous dynamic networks under worst-case adversary. In

Proceedings of the

014 IEEE 34th International Conference on Distributed Computing Systems , ICDCS ’14,pages 338–347, Washington, DC, USA, 2014. IEEE Computer Society.[28] Moni Naor and Udi Wieder.

A Simple Fault Tolerant Distributed Hash Table , pages 88–97.Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.[29] Mikhail Nesterenko and S´ebastien Tixeuil.

Discovering Network Topology in the Presence ofByzantine Faults , pages 212–226. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.[30] Gopal Pandurangan and Amitabh Trehan. Xheal: a localized self-healing algorithm usingexpanders.

Distributed Computing , 27(1):39–54, February 2014.[31] Christian Scheideler. How to spread adversarial nodes?: Rotate! In

Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing , STOC ’05, pages 704–713, NewYork, NY, USA, 2005. ACM.[32] Tallat M. Shafaat, Ali Ghodsi, and Seif Haridi.

A Practical Approach to Network Size Esti-mation for Structured Overlays , pages 71–83. Springer Berlin Heidelberg, Berlin, Heidelberg,2008.[33] H˚akan Terelius, Damiano Varagnolo, and Karl Henrik Johansson. Distributed size estimationof dynamic anonymous networks. In

Proceedings of the 51th IEEE Conference on Decisionand Control, CDC 2012, December 10-13, 2012, Maui, HI, USA , pages 5221–5227, 2012.[34] Eli Upfal. Tolerating a linear number of faults in networks of bounded degree.

Informationand Computation , 115(2):312 – 320, 1994.[35] Ruud van de Bovenkamp, Fernando A. Kuipers, and Piet Van Mieghem. Gossip-based countingin dynamic networks. In

NETWORKING 2012 - 11th International IFIP TC 6 NetworkingConference, Prague, Czech Republic, May 21-25, 2012, Proceedings, Part II , pages 404–417,2012.[36] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ‘small-world’ networks.

Nature , 393:440–442, June 1998.[37] Nicholas Wormald.

Models of random regular graphs , pages 239–298. London MathematicalSociety Lecture Note Series. Cambridge University Press, 1999.25 H ( n, d ) random regular graph: deﬁnitions and properties In this section, we formally deﬁne the d -regular random graph model that we are assuming andalso state and prove some crucial properties that we will use in the analysis. A.1 Deﬁnitions

We assume a random regular graph that is constructed by the union of d random permutationsas described below. Call such a random graph model, the H ( n, d ) model (or simply H-graphs ).This model was also used by Law and Siu [26] to model Peer-to-Peer networks. A random graphin this model can be constructed by picking d (assume d is even) Hamilton cycles independentlyand uniformly at random among all possible Hamilton cycles on the set of n vertices, and takingthe union of these Hamilton cycles. This construction yields a random d -regular graph (henceforthcalled as a H ( n, d ) graph) that can be shown to be an expander with high probability (cf. Lemma19). Note that a H ( n, d ) graph is d -regular multigraph whose set of edges is composed of the d Hamilton cycles. Friedman’s [13] result below (rephrased here for our purposes) shows that a H ( n, d ) graph is an expander (in fact, a Ramanujan Expander , i.e., the second smallest eigenvaluefor these random graphs is close to the best possible) with high probability.

Lemma 19 ([13, 26]) . A random n -node, d -regular H ( n, d ) -graph (say, for d ≥ ) is an expanderwith high probability. A.2 Properties

We next show some basic properties of the H ( n, d ) random graph which are needed in the analysis.We show some bounds on the sizes of B ( w, r ) and Bd ( w, r ). Lemma 20. [(1)]1. | Bd ( w, r ) | ≤ ( d − | Bd ( w, r − | .2. W.h.p. | Bd ( w, r ) | ≥ ( d − − o (1)) | Bd ( w, r − | , for < r < log n d .3. For some constant c and c ′ , c ′ ( d − r ≤ | B ( w, r ) | ≤ c ( d − r , w.h.p.4. | B ( w, r ) | = Θ( | Bd ( w, r ) | ) .Proof. Since the degree of each node is d , (1) follows. From (1) it is easy to show the upper boundon | B ( w, r ) | in (3).We next show (2).We ﬁrst bound the expected number of neighbours that a node u ∈ Bd ( w, r − B ( w, r − u in B ( w, r ) is ( d − n −| B ( w,r − | ) n ≤ ( d − − √ nn ), since | B ( w, r − | < d log n d = √ n . Hence the expected number of nodes in Bd ( w, r ) is | Bd ( w, r − | ( d − − √ nn ). The high probability bound can be obtained via a Chernoﬀ bound(one can consider the choices made by individual nodes as essentially independent if one regards the“sampling without replacement” due to the permutations. This can be done if one pretends thatthe sample is from a set of size n −√ n (instead of n ). This will not make a diﬀerence asymptotically.The lower bound of (3) follows from (2) and (4) follows from (1), (2), and (3).26ext we establish the “locally tree-like” property of an H ( n, d ) random graph: i.e., for mostnodes w , the subgraph induced by B ( w, r ) up to a certain radius r looks “like a tree”. This isstated more precisely as follows. Deﬁnition 11.

Let G be an H ( n, d ) random graph and w be any node in G . Consider the subgraphinduced by B ( w, r ) for r = log n

We call a node w “locally tree-like” if no node in B ( w, r ) is atypical. In otherwords, w is “locally tree-like” if the subgraph induced by B ( w, r ) is a ( d − -ary tree. The following lemma shows that most nodes in G are locally tree-like. Lemma 21.

In an H ( n, d ) random graph, with high probability, at least n − O ( n . ) nodes arelocally tree-like.Proof. Consider a node w ∈ V . We upper bound the probability that a node in B ( w, r ), where r = log n

10 log d , is atypical. For any 1 ≤ j < r ,Pr( u ∈ B ( w, j ) is atypical) ≤ ( d − · | B ( w,j ) | n = O ( n . ),using the bound that | B ( w, j ) | ≤ d r (the above upper bounds the probability that u has more thanone neighbor in B ( w, j ), in which case it is atypical). Hence the probability that there is somenode u that is atypical in B ( w, r ) is O ( n . n . ) = O ( n . ). Hence the probability that node w is notlocally tree-like is at most O ( n . ).Let the indicator random variable X w indicate the event that node w is locally tree-like. Letrandom variable X = P w ∈ V X w denote the number of nodes that are locally tree-like. By linearityof expectation, using the above probability bound, it follows that the expected number of nodes in G that are not locally tree-like is at most O ( n . ); in other words, E [ X ] ≥ n − O ( n . ).To show concentration of X , we use Azuma’s inequality ([10], Theorem 5 .

3) as follows. Changingthe value of X w aﬀects only the nodes within radius r ′ = 2 r = n

10 log d , i.e., at most n . nodes andhence aﬀects E [ X ] by at most n . . Thus, we havePr( | X − E [ X ] | > n . ) ≤ − n n × n ) = 2exp( − n ).Hence, with high probability, at least n − O ( n . ) nodes are locally tree-like.We now show a property that will be useful in our analysis; this follows immediately from thedeﬁnition of locally-tree like and the regularity of the graph. Corollary 1.

Let G be an H ( n, d ) random graph and consider a node w in G . Assume that w islocally tree-like, i.e., the subgraph induced by B ( w, r ) , where r = log n

10 log d is a tree. For every neighbor u of w , the respective subtrees rooted at u (in the subgraph induced by B ( w, r ) ) are isomorphic; inparticular each is a ( d − -ary tree. Proof of Lemma 9

We recall that phase i consists of α i subphases, and the subphases are indexed by j . Deﬁnition 13.

Let

F ailure ( i, j ) be the event that in the j th subphase of the i th phase, ∃ t < i suchthat k t ≥ k i . That is, F ailure ( i, j ) is the event that in the j th subphase of the i th phase, the node v receives the maximum color in some round t < i . In the same vein, we deﬁne

Deﬁnition 14.

F ailure ( i ) def = T α i j =1 F ailure ( i, j ) ,Observation . We observe that the variable

F lagT erminate in the pseudocode (please refer toAlgorithm 1) remains 1 after all the α i subphases if the event F ailure ( i ) occurs (Please see Line17 of Algorithm 1). In other words, a node v accepts i as the estimate of log n (and thus makes awrong decision) if the event F ailure ( i ) occurs. HencePr[a safe node v makes a wrong decision in the i th phase] ≤ Pr[

F ailure ( i )]. Observation . Pr[

F ailure (1)] = 0.

Induction Hypothesis.

Let i ′ be a positive integer such that 1 ≤ i ′ < i . ThenPr[a safe node v makes a wrong decision in the ( i ′ ) th phase] < ǫ i ′ +1 ,where ǫ is the error parameter. Remark . Observation 9 serves as the basis of induction.

Lemma 22.

Let E i,j, be the event that k t > l i − − log ( d − for some < t l i − − log ( d − E i,j, ]= Pr[ c max B ∗ ( v,i − > l i − − log ( d − ≤ d − d ( d − i − (by Lemma 7) Lemma 23.

Let E i,j, be the event that k i ≤ l i − log ( d − − log ( l i − log ( d − . Then Pr[ E i,j, ] < ǫ + d ( d − i − .Proof. E i,j, occurs if and only if c max Bd ( v,i ) ≤ l i − log ( d − − log ( l i − log ( d − v max ∈ Bd ( v, i ) be the node that generates (or any one of the nodes that generate) thecolor c max Bd ( v,i ) . Let E bad i,v max be the event that v max went inactive (i.e., took a wrong decision) in somephase i ′ < i . Then Pr[ E bad i,v max ]= i − X i ′ =1 Pr[ v max went inactive in phase i ′ ] < i − X i ′ =1 ǫ i ′ +1 (by the induction hypothesis) < ∞ X i ′ =1 ǫ i ′ +1 = ǫ E bad i,v max ] < ǫ v max is still active in the current phase, i.e., v max did not go inactive in some previous phase i ′ < i , then in order to calculate Pr[ E i,j, ], it is enough to consider Bd ( v, i ) in its entirety alongwith the properties of the geometric distribution. That is,Pr[ E i,j, | ( E bad i,v max ) c ]= Pr[ c max Bd ( v,i ) ≤ l i − log ( d − − log ( l i − log ( d − < d ( d − i − (by Lemma 8)That is, Pr[ E i,j, | ( E bad i,v max ) c ] < d ( d − i − (3)Combining Equations 2 and 3, we get thatPr[ E i,j, ] ≤ Pr[ E bad i,v max ] + Pr[ E i,j, | ( E bad i,v max ) c ] (thanks to Fact ?? ) < ǫ d ( d − i − . Lemma 24.

Let

Success ( i, j ) be the event that1. the maximum color received by node v until the ( i − th round of the j th subphase of the i th phase is strictly less than the maximum color received by node v in the i th round of the samesubphase of the same phase. That is, in terms of the pseudocode (please refer to Line 14 andLine 16 of the pseudocode), k t < k i , ∀ t l i − log ( d − − log ( l i − log ( d − . hen Pr[

Success ( i, j )] > − ( d ( d − i − + ǫ ) .Proof. One of the ways

Success ( i, j ) can happen is if k t ≤ l i − − log ( d − ∀ t l i − log ( d − − log ( l i − log ( d − Success ( i, j ) ⊃ ( E ci,j, ∩ E ci,j, )= ⇒ Pr[

Success ( i, j )] ≥ Pr[ E ci,j, ∩ E ci,j, ]= Pr[( E i,j, ∪ E i,j, ) c ] (by De Morgan’s Theorem)= 1 − Pr[ E i,j, ∪ E i,j, ] ≥ − Pr[ E i,j, ] − Pr[ E i,j, ](since, by the union bound, Pr[ E i,j, ∪ E i,j, ] ≤ Pr[ E i,j, ] + Pr[ E i,j, ]) > − d − d ( d − i − − d ( d − i − − ǫ − ( 1 d ( d − i − + ǫ Lemma 25.

Pr[

F ailure ( i, j )] < d ( d − i − + ǫ .Proof. We observe that

F ailure ( i, j ) = ( Success ( i, j )) c and the result immediately follows fromLemma 24. Lemma 26.

Pr[ a safe node v makes a wrong decision in the i th phase ] < ǫ i +1 .Proof. Pr[

F ailure ( i )] = α i Y j =1 Pr[

F ailure ( i, j )][since the subphases are independent from each other] < α i Y j =1 d ( d − i − [since Pr[ F ailure ( i, j )] < d ( d − i − from Lemma 25]= ( 1 d ( d − i − ) α i If we set α i def = ⌈ log ( ǫ )+ i +1 − log d ( i −

2) log ( d − ⌉ , then α i ≥ log ( ǫ ) + i + 1 − log d ( i −

2) log ( d − ⇒ ( 1 d ( d − i − ) α i ≤ ǫ i +1 = ⇒ Pr[

F ailure ( i )] < ( 1 d ( d − i − ) α i ≤ ǫ i +1 Thanks to Observation 8,Pr[a safe node v makes a wrong decision in the i th phase] ≤ Pr[

F ailure ( i )] < ǫ i +1+1