A Thin Self-Stabilizing Asynchronous Unison Algorithm with Applications to Fault Tolerant Biological Networks
Yuval Emek, Technion — Israel Institute of Technology. [email protected]
Eyal Keren, Technion — Israel Institute of Technology. [email protected]
Abstract
Introduced by Emek and Wattenhofer (PODC 2013), the stone age (SA) model provides an abstraction for network algorithms distributed over randomized finite state machines. This model, designed to resemble the dynamics of biological processes in cellular networks, assumes a weak communication scheme that is built upon the nodes' ability to sense their vicinity in an asynchronous manner. Recent works demonstrate that the weak computation and communication capabilities of the SA model suffice for efficient solutions to some core tasks in distributed computing, but they do so under the (somewhat less realistic) assumption of fault free computations. In this paper, we initiate the study of self-stabilizing SA algorithms that are guaranteed to recover from any combination of transient faults. Specifically, we develop efficient self-stabilizing SA algorithms for the leader election and maximal independent set tasks in bounded diameter graphs subject to an asynchronous scheduler. These algorithms rely on a novel efficient self-stabilizing asynchronous unison (AU) algorithm, "thin" in terms of its state space: the number of states used by the AU algorithm is linear in the graph's diameter bound, irrespective of the number of nodes.

1 Introduction
A fundamental dogma in distributed computing is that a distributed algorithm cannot be deployed in a real system unless it can cope with faults. When it comes to recovering from transient faults, the agreed upon concept for fault tolerance is self-stabilization. Introduced in the seminal paper of Dijkstra [Dij74], an algorithm is self-stabilizing if it is guaranteed to converge to a correct output from any (possibly faulty) initial configuration [Dol00, ADDP19].

Similarly to distributed man-made digital systems, self-stabilization is also crucial to the survival of biological distributed systems. Indeed, these systems typically lack a central component that can determine the initial system configuration in a coordinated manner and, more often than not, they are exposed to environmental conditions that may lead to transient faults. On the other hand, biological distributed systems are usually inferior to man-made distributed systems in terms of the computation and communication capabilities of their components, thus calling for a different model of distributed network algorithms.

Aiming to capture distributed processes in biological cellular networks, Emek and Wattenhofer [EW13] introduced the stone age (SA) model that provides an abstraction for distributed algorithms in a network of randomized finite state machines that communicate with their network neighbors using a fixed message alphabet based on a weak communication scheme. Since then, the power and limitations of distributed SA algorithms have been studied in several papers. In particular, it has been established that some of the most fundamental tasks in the field of distributed graph algorithms can be solved, efficiently, in this restricted model [EW13, AEK18a, AEK18b, EU20]. However, for the most part, the existing literature on the SA model focuses on fault free networks and little is known about self-stabilizing distributed algorithms operating under this model.
In the current paper, we strive to change this situation: Focusing on graphs of bounded diameter, we design efficient self-stabilizing SA algorithms for leader election and maximal independent set — two of the most fundamental and extensively studied tasks in the theory of distributed computing. A key technical component in the algorithms we design is a self-stabilizing synchronizer for SA algorithms in graphs of bounded diameter. This synchronizer relies on a novel anonymous size-uniform self-stabilizing algorithm for the asynchronous unison task [CFG92, AKM+93] that operates with a number of states linear in the graph's diameter bound $D$. To the best of our knowledge, this is the first self-stabilizing asynchronous unison algorithm for graphs of general topology whose state space is expressed solely as a function of $D$, independently of the number $n$ of nodes.

The decision to focus on bounded diameter graphs is motivated by regarding this graph family as a natural extension of complete graphs. Indeed, environmental obstacles may disconnect (permanently or temporarily) some links in an otherwise fully connected network, thus increasing its diameter beyond 1, but hopefully not to the extent of exceeding a certain fixed upper bound. Moreover, the algorithmic study of multiple access channels often focuses on fully connected networks, viewed as a single broadcast link that includes all nodes. As the SA model offers a (weak form) of

(In [EU20], Emek and Uitto study the SA model in networks that undergo dynamic topology changes, including node deletion that may be seen as (permanent) crash failures.)
1.1 Model

The computational model used in this paper is a simplified version of the stone age (SA) model of Emek and Wattenhofer [EW13]. This model captures anonymous size-uniform distributed algorithms with bounded memory nodes that exchange information by means of an asynchronous variant of the set-broadcast communication scheme (cf. [HJK+]) without sender collision detection (cf. [AAB+]). Given a distributed task $T$ defined over a set $O$ of output values, an algorithm $\Pi$ for $T$ is encoded by the 4-tuple $\Pi = \langle Q, Q_O, \omega, \delta \rangle$, where
• $Q$ is a set of states;
• $Q_O \subseteq Q$ is a set of output states;
• $\omega : Q_O \to O$ is a surjective function that maps each output state to an output value; and
• $\delta : Q \times \{0, 1\}^Q \to 2^Q$ is a state transition function (to be explained soon).

We would eventually require that the state space of $\Pi$, namely, the size $|Q|$ of the state set, is fixed, and in particular independent of the graph on which $\Pi$ runs, as defined in [EW13]. To facilitate the discussion though, let us relax this requirement for the time being.

Consider a finite connected undirected graph $G = (V, E)$. A configuration of $G$ is a function $C : V \to Q$ that determines the state $C(v) \in Q$ of node $v$ for each $v \in V$. We say that a node $v \in V$ senses state $q \in Q$ under $C$ if there exists some (at least one) node $u \in N^+(v)$ such that $C(u) = q$. The signal of $v$ under $C$ is the binary vector $S_v^C \in \{0, 1\}^Q$ defined so that $S_v^C(q) = 1$ if and only if $v$ senses state $q \in Q$; in other words, the signal of node $v$ allows $v$ to determine for each state $q \in Q$ whether $q$ appears in its (inclusive) neighborhood, but it does not allow $v$ to count the number of such appearances, nor does it allow $v$ to identify the neighbors residing in state $q$.

The execution of $\Pi$ progresses in discrete steps, where step $t \in \mathbb{Z}_{\geq 0}$ spans the time interval $[t, t+1)$. Let $C_t : V \to Q$ be the configuration of $G$ at time $t$ and let $S_v^t = S_v^{C_t}$ denote the signal of node $v \in V$ under $C_t$.
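To make the sensing mechanism concrete, the following sketch (ours, not from the paper; all identifiers are hypothetical) computes the signal of a node from a configuration:

```python
# Hypothetical sketch: computing the signal S_v^C of a node v under a
# configuration C. The signal records, per state q in Q, only whether q
# appears in the inclusive neighborhood N+(v): no counts, no identities.
def signal(v, config, neighbors, states):
    """Binary vector over Q: entry for q is 1 iff some u in N+(v) has C(u) = q."""
    sensed = {config[u] for u in neighbors[v] | {v}}  # states present in N+(v)
    return tuple(1 if q in sensed else 0 for q in states)

# Example: a path a - b - c over the state set Q = ("x", "y").
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
config = {"a": "x", "b": "x", "c": "y"}
```

Note that `signal("b", ...)` cannot distinguish whether state "x" is held by one node of $N^+(b)$ or by several, matching the model's inability to count appearances.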
We consider an asynchronous schedule defined by means of a sequence of node activations (cf. a distributed fair daemon [DT11]). Formally, a malicious adversary, who knows $\Pi$ but is oblivious to the nodes' coin tosses, determines the initial configuration $C_0$ and a subset $A_t \subseteq V$ of nodes to be activated at time $t$ for each $t \in \mathbb{Z}_{\geq 0}$. If node $v \in V$ is not activated at time $t$, then $C_{t+1}(v) = C_t(v)$. Otherwise ($v \in A_t$), the state of $v$ is updated in step $t$ from $C_t(v)$ to $C_{t+1}(v)$, picked uniformly at random from $\delta(C_t(v), S_v^t)$. We emphasize that all nodes $v \in V$ obey the same state transition function $\delta$.

Fix some schedule $\{A_t\}_{t \geq 0}$. The adversary is required to prevent "node starvation" in the sense that each node must be activated infinitely often. Given a time $t \in \mathbb{Z}_{\geq 0}$, let $\varrho(t)$ be the earliest time such that for every node $v \in V$, there exists a time $t \leq t' < \varrho(t)$ with $v \in A_{t'}$. This allows us to introduce the round operator $\varrho^i(t)$, defined by setting $\varrho^0(t) = t$ and $\varrho^i(t) = \varrho(\varrho^{i-1}(t))$ for $i = 1, 2, \ldots$. Denote $R(i) = \varrho^i(0)$ for $i = 0, 1, \ldots$, and observe that if $R(i) \leq t < R(i+1)$, then $R(i+1) \leq \varrho(t) < R(i+2)$.

(Throughout this paper, we denote the neighborhood of a node $v$ in $G$ by $N(v) = \{u \in V \mid (u, v) \in E\}$ and the inclusive neighborhood of $v$ in $G$ by $N^+(v) = N(v) \cup \{v\}$.)

A configuration $C : V \to Q$ is said to be an output configuration if $C(v) \in Q_O$ for every $v \in V$, in which case, we regard $\omega(C(v))$ as the output of node $v$ under $C$ and refer to $\omega \circ C$ as the output vector of $C$.
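The round operator can be made concrete with a short sketch (ours; the list representation of the schedule is hypothetical):

```python
# Hypothetical sketch: the round operator rho(t) and R(i) = rho^i(0) for a
# schedule given as a list of activation sets A_t (every node must appear
# infinitely often; here the list is long enough for the queries we make).
def rho(t, schedule, nodes):
    """Earliest time s such that every node is activated in [t, s)."""
    pending, s = set(nodes), t
    while pending:
        pending -= schedule[s]
        s += 1
    return s

def R(i, schedule, nodes):
    """R(i) = rho^i(0): the end of the i-th asynchronous round."""
    t = 0
    for _ in range(i):
        t = rho(t, schedule, nodes)
    return t

# Example: nodes a, b, c with the periodic schedule {a}, {b,c}, {a,c}, ...
nodes = {"a", "b", "c"}
schedule = [{"a"}, {"b", "c"}, {"a", "c"}] * 4
```

Under a synchronous schedule ($A_t = V$ for all $t$) this sketch indeed yields $R(i) = i$, matching the definition below.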
We say that the execution of $\Pi$ on $G$ has stabilized by time $t \in \mathbb{Z}_{\geq 0}$ if (1) $C_{t'}$ is an output configuration for every $t' \geq t$; and (2) the output vector sequence $\{\omega \circ C_{t'}\}_{t' \geq t}$ satisfies the requirements of the distributed task $T$ for which $\Pi$ is defined (the requirements of the distributed tasks studied in the current paper are presented in Sec. 1.2).

The algorithm is self-stabilizing if for any choice of initial configuration $C_0$ and schedule $\{A_t\}_{t \geq 0}$, the probability that $\Pi$ has stabilized by time $R(i)$ goes to 1 as $i \to \infty$. We refer to the smallest $i$ for which the execution has stabilized by time $R(i)$ as the stabilization time of this execution. The stabilization time of a randomized (self-stabilizing) algorithm on a given graph is a random variable and one typically aims towards bounding it in expectation and whp.

The schedule $\{A_t\}_{t \geq 0}$ is said to be synchronous if $A_t = V$ for all $t \in \mathbb{Z}_{\geq 0}$, which means that $R(i) = i$ for $i = 0, 1, \ldots$. A (self-stabilizing) algorithm whose correctness and stabilization time guarantees hold under the assumption of a synchronous schedule is called a synchronous algorithm. We sometimes emphasize that an algorithm does not rely on this assumption by referring to it as an asynchronous algorithm.

1.2 Tasks

In this paper, we focus on three classic (and extensively studied) distributed tasks, defined over a finite connected undirected graph $G = (V, E)$. In the first task, called asynchronous unison (AU) [CFG92] (a.k.a. distributed pulse [AKM+93]), each node $v \in V$ outputs a clock value taken from an (additive) cyclic group $K$. The task is then defined by the following two conditions: The safety condition requires that if two neighboring nodes output clock values $\kappa \in K$ and $\kappa' \in K$, then $\kappa' \in \{\kappa - 1, \kappa, \kappa + 1\}$, where the $+1$ and $-1$ refer to the operations of the cyclic group $K$.
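The safety condition can be checked mechanically; the sketch below (ours, with clock values in $\mathbb{Z}_m$ as a concrete stand-in for the cyclic group $K$) verifies it over an edge list:

```python
# Hypothetical sketch: checking AU safety for clocks in the cyclic group
# Z_m. An edge is safe iff its endpoint clocks differ by at most 1 mod m.
def au_safe(clock, edges, m):
    """True iff every edge's clock difference lies in {-1, 0, +1} mod m."""
    return all((clock[u] - clock[v]) % m in (0, 1, m - 1) for u, v in edges)

# Example: a triangle with clocks in Z_8; note that 7 and 0 are adjacent.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
safe = au_safe({"a": 7, "b": 0, "c": 7}, edges, 8)    # wrap-around is fine
unsafe = au_safe({"a": 2, "b": 4, "c": 3}, edges, 8)  # 2 vs 4 violates safety
```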
The liveness condition requires that for every (post stabilization) time $t$ and for every $i \in \mathbb{Z}_{> 0}$, each node updates its clock value at least $i$ times during the time interval $[t, \varrho^{\operatorname{diam}(G) + i}(t))$, where $\operatorname{diam}(G)$ denotes the diameter of $G$; these updates are performed by and only by applying the $+1$ operation of $K$.

The other two distributed tasks considered in this paper are leader election (LE) and maximal independent set (MIS). Both tasks are defined over a binary set $O = \{0, 1\}$ of output values and are static in the sense that once the algorithm has stabilized, its output vector remains fixed. In LE, it is required that exactly one node in $V$ outputs 1; in MIS, it is required that the set $U \subseteq V$ of nodes that output 1 is independent, i.e., $(U \times U) \cap E = \emptyset$, whereas any proper superset of $U$ is not independent. We note that LE and MIS correspond to global and local mutual exclusion, respectively; the two notions coincide when $G$ is the complete graph.

(In the context of a randomized algorithm running on an $n$-node graph, we say that event $A$ occurs with high probability, abbreviated whp, if $P(A) \geq 1 - n^{-c}$ for an arbitrarily large constant $c$.)

1.3 Our Contribution

In what follows, we refer to the class of graphs whose diameter is up-bounded by $D$ as $D$-bounded diameter. Our first result comes in the form of developing a new self-stabilizing AU algorithm.

Theorem 1.1.
The class of $D$-bounded diameter graphs admits a deterministic self-stabilizing AU algorithm that operates with state space $O(D)$ and stabilizes in time $O(D)$.

To the best of our knowledge, the algorithm promised in Thm. 1.1 is the first self-stabilizing AU algorithm for general graphs $G = (V, E)$ with state space linear in the diameter bound $D$, irrespective of any other graph parameter including $n = |V|$. This remains true even when considering algorithms designed to work under much stronger computational models (see Sec. 5 for further discussion). Moreover, to the best of our knowledge, this is also the first anonymous size-uniform self-stabilizing algorithm for the AU task whose stabilization time is expressed solely as a (polynomial) function of $D$, again, irrespective of $n$. Expressing the guarantees of AU algorithms with respect to $D$ is advocated given the central role that the diameter of $G$ plays in the liveness condition of the AU task.

There is a well known reduction from the problem of network synchronization (a.k.a. synchronizer [Awe85]) to AU under computational models that support unicast communication (see, e.g., [AKM+93]); Sec. 4 is dedicated to a SA variant of this reduction, which yields the following corollary.

Corollary 1.2.
Suppose that a distributed task $T$ admits a synchronous self-stabilizing algorithm that, on $D$-bounded diameter $n$-node graphs, operates with state space $g(D)$ and stabilizes in time at most $f(n, D)$ in expectation and whp. Then, $T$ admits an asynchronous self-stabilizing algorithm that, on $D$-bounded diameter $n$-node graphs, operates with state space $O(D \cdot (g(D))^2)$ and stabilizes in time at most $f(n, D) + O(D)$ in expectation and whp.

Next, we turn our attention to LE and MIS and develop efficient self-stabilizing asynchronous algorithms for these tasks by combining Corollary 1.2 with the following two theorems.
Theorem 1.3. There exists a synchronous self-stabilizing LE algorithm that, on $D$-bounded diameter $n$-node graphs, operates with state space $O(D)$ and stabilizes in time $O(D \cdot \log n)$ in expectation and whp.

Theorem 1.4. There exists a synchronous self-stabilizing MIS algorithm that, on $D$-bounded diameter $n$-node graphs, operates with state space $O(D)$ and stabilizes in time $O((D + \log n) \log n)$ in expectation and whp.

We emphasize that when the diameter bound $D$ is regarded as a fixed parameter, the state space of our algorithms reduces to a constant, as required in the SA model [EW13]. In this case, the asymptotic stabilization time bounds in Thm. 1.1, 1.3, and 1.4 should be interpreted as $O(1)$, $O(\log n)$, and $O(\log^2 n)$, respectively.

1.4 Paper's Outline

The remainder of this paper is organized as follows. In Sec. 2, we develop our self-stabilizing AU algorithm and establish Thm. 1.1. The self-stabilizing synchronous LE and MIS algorithms promised in Thm. 1.3 and 1.4, respectively, are presented in Sec. 3. Sec. 4 is dedicated to a SA variant of the well known reduction from self-stabilizing network synchronization to the AU task, establishing Corollary 1.2. We conclude with additional related literature and a discussion of the place of our work within the scope of the existing ones; this is done in Sec. 5.
2 The Self-Stabilizing AU Algorithm

In this section, we establish Thm. 1.1 by introducing a deterministic self-stabilizing algorithm called AlgAU for the AU task on $D$-bounded diameter graphs, whose state space and stabilization time are bounded by $O(D)$ and $O(D)$, respectively. The algorithm is presented in Sec. 2.2 and analyzed in Sec. 2.3. Before diving into the technical parts, Sec. 2.1 provides a short overview of AlgAU's design principles and how they compare with existing constructions.
2.1 Overview

Most existing efficient constructions of self-stabilizing AU algorithms with bounded state space rely on some sort of a reset mechanism. This mechanism is invoked upon detecting an illegal configuration that usually means a "clock discrepancy", namely, graph neighbors whose states are associated with non-adjacent clock values of the cyclic group $K$. The reset mechanism is designed so that it brings the system back to a legal configuration, from which a fault free execution can proceed. It turns out though that designing a self-stabilizing AU algorithm with state space $O(D)$ based on a reset mechanism is more difficult than what one may have expected, as demonstrated by the failed attempt presented in Appendix A.

Discouraged by this failed attempt, we followed a different approach and designed our self-stabilizing AU algorithm without a reset mechanism. Rather, we augment the $|K|$ output states with (approximately) $|K|$ "faulty states", each one of them forming a short detour over the cyclic structure of $K$; refer to Figure 1 for the state diagram of AlgAU, where the output states and the faulty states are marked by integers with (wide) bars and hats, respectively. Upon detecting a clock discrepancy, a node residing in an output state $s$ moves to the faulty state associated with $s$ and stays there until certain conditions are satisfied and the node may complete the faulty detour and return to a nearby output state (though, not to the original state $s$). This mechanism is designed so that clock discrepancies are resolved in a "meet in the middle" fashion.

The conditions that determine when a faulty node may return to an output state and the conditions for moving to a faulty state when sensing a faulty neighbor without being directly involved in a clock discrepancy are the key to the stabilization guarantees of AlgAU. In particular, the algorithm takes a relatively cautious approach for switching between output and faulty states that, as it turns out, allows us to avoid "vicious cycles" and ultimately bound the stabilization time as a function of $|K| = O(D)$.

2.2 The Algorithm

The design of AlgAU relies upon the following definitions.
Definition (turns, able, faulty). Fix $k = 3D + 2$. The states of AlgAU, referred to hereafter as turns, are partitioned into a set $T = \{\bar{\ell} \mid \ell \in \mathbb{Z},\ 1 \leq |\ell| \leq k\}$ of able turns and a set $\widehat{T} = \{\hat{\ell} \mid \ell \in \mathbb{Z},\ 2 \leq |\ell| \leq k\}$ of faulty turns. A node residing in an able (resp., faulty) turn is said to be able (resp., faulty).

Definition (levels). Throughout Sec. 2, we refer to the integers $\ell \in \mathbb{Z}$, $1 \leq |\ell| \leq k$, as levels and define the level of turn $\bar{\ell} \in T$ (resp., $\hat{\ell} \in \widehat{T}$) to be $\ell$. We denote the level of (the turn of) a node $v \in V$ at time $t \in \mathbb{Z}_{\geq 0}$ by $\lambda_v^t$ and the set of levels sensed by $v$ at time $t$ by $\Lambda_v^t = \{\lambda_u^t \mid u \in N^+(v)\}$. For a level $\ell$, let $L_t(\ell) = \{v \in V \mid \lambda_v^t = \ell\}$ be the set of nodes whose level at time $t$ is $\ell$. This notation is extended to level subsets $B$, defining $L_t(B) = \bigcup_{\ell \in B} L_t(\ell)$.

Definition (forward operator, adjacent). For a level $\ell$, let
$$\varphi(\ell) = \begin{cases} 1, & \ell = -1 \\ -k, & \ell = k \\ \ell + 1, & \text{otherwise}. \end{cases}$$
Based on that, we define the forward operator $\varphi^j(\ell)$, $j = 1, 2, \ldots$, by setting $\varphi^1(\ell) = \varphi(\ell)$ and $\varphi^{j+1}(\ell) = \varphi(\varphi^j(\ell))$. Observing that the forward operator is bijective for each $j$, we extend it to negative superscripts by setting $\varphi^{-j}(\ell) = \ell'$ if and only if $\varphi^{+j}(\ell') = \ell$. Levels $\ell$ and $\ell'$ are said to be adjacent if either (1) $\ell = \ell'$; (2) $\ell = \varphi^{+1}(\ell')$; or (3) $\ell = \varphi^{-1}(\ell')$.

Definition (outwards operator, outwards, inwards). Given a level $\ell$ and an integer parameter $-|\ell| < j \leq k - |\ell|$, the outwards operator $\psi^j(\ell)$ returns the unique level $\ell'$ that satisfies (1) $\operatorname{sign}(\ell') = \operatorname{sign}(\ell)$; and (2) $|\ell'| = |\ell| + j$. This means in particular that if $j$ is positive, then $|\ell'| > |\ell|$, and if $j$ is negative, then $|\ell'| < |\ell|$. If $\ell' = \psi^j(\ell)$ for a positive (resp., negative) $j$, then we refer to level $\ell'$ as being $|j|$ units outwards (resp., inwards) of $\ell$.

Let $\Psi^{>}(\ell) = \{\psi^j(\ell) \mid 0 < j \leq k - |\ell|\}$ and let $\Psi^{\geq}(\ell) = \Psi^{>}(\ell) \cup \{\ell\}$ and $\Psi^{\gg}(\ell) = \Psi^{>}(\ell) - \{\psi^{+1}(\ell)\}$. Likewise, let $\Psi^{<}(\ell) = \{\psi^j(\ell) \mid -|\ell| < j < 0\}$ and let $\Psi^{\leq}(\ell) = \Psi^{<}(\ell) \cup \{\ell\}$ and $\Psi^{\ll}(\ell) = \Psi^{<}(\ell) - \{\psi^{-1}(\ell)\}$.

Definition (protected, good). An edge $e = (u, v) \in E$ is said to be protected at time $t \in \mathbb{Z}_{\geq 0}$ if levels $\lambda_v^t$ and $\lambda_u^t$ are adjacent. A node $v \in V$ is said to be protected at time $t$ if all its incident edges are protected. Let $V_p^t \subseteq V$ and $E_p^t \subseteq E$ denote the set of nodes and edges, respectively, that are protected at time $t$. A protected node that does not sense any faulty turn is said to be good. The graph $G$ is said to be protected (resp., good) at time $t$ if all its nodes are protected (resp., good).

We are now ready to complete the description of AlgAU. The $2k$ levels are identified with the AU clock values, associating $\varphi(\ell)$ with the $+1$ operation of the corresponding cyclic group.
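The level machinery just defined can be made concrete with a short sketch (ours; $k$ is a small example value rather than $3D + 2$):

```python
# Hypothetical sketch: levels are the integers l with 1 <= |l| <= K (zero
# is skipped). phi steps forward along the 2K-cycle; psi moves outwards
# (away from +-1) or inwards (towards +-1) without changing sign.
K = 5  # small example value of k (the paper fixes k = 3D + 2)

def phi(l):
    """Forward operator: phi(-1) = 1, phi(K) = -K, otherwise l + 1."""
    return 1 if l == -1 else (-K if l == K else l + 1)

def adjacent(a, b):
    """Levels are adjacent iff equal or one forward step apart."""
    return a == b or a == phi(b) or b == phi(a)

def psi(l, j):
    """Outwards operator psi^j: same sign as l, |result| = |l| + j."""
    assert -abs(l) < j <= K - abs(l)
    return (1 if l > 0 else -1) * (abs(l) + j)

def psi_gt(l):   # Psi^>(l): levels strictly outwards of l
    return {psi(l, j) for j in range(1, K - abs(l) + 1)}

def psi_ggt(l):  # Psi^>>(l): strictly outwards, excluding psi(l, +1)
    return psi_gt(l) - {psi(l, 1)}
```

Note that applying `phi` exactly $2K$ times returns a level to itself, reflecting the identification of the $2k$ levels with the cyclic clock values.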
Moreover, we identify the output state set of AlgAU with the set $T$ of able turns and regard the faulty turns as the remaining (non-output) states.

For the state transition function of AlgAU, consider a node $v \in V$ residing in a turn $\nu \in T \cup \widehat{T}$ at time $t$ and suppose that $v$ is activated at time $t$. Node $v$ remains in turn $\nu$ during step $t$ unless certain conditions on $\nu$ are satisfied, in which case, $v$ performs a state transition that belongs to one of the following three types (refer to Table 1 for a summary and to Figure 1 for an illustration):
• Suppose that $v$'s turn at time $t$ is $\nu = \bar{\ell} \in T$, $1 \leq |\ell| \leq k$. Node $v$ performs a type able-able (AA) transition in step $t$ and updates its turn to $\bar{\ell'} \in T$, where $\ell' = \varphi^{+1}(\ell)$, if and only if (1) $v$ is good at time $t$; and (2) $\Lambda_v^t \subseteq \{\ell, \ell'\}$.
• Suppose that $v$'s turn at time $t$ is $\nu = \bar{\ell} \in T$, $2 \leq |\ell| \leq k$. Node $v$ performs a type able-faulty (AF) transition in step $t$ and updates its turn to $\hat{\ell} \in \widehat{T}$ if and only if at least one of the following two conditions is satisfied: (1) $v$ is not protected at time $t$; or (2) $v$ senses turn $\hat{\ell'}$ at time $t$, where $\ell' = \psi^{-1}(\ell)$.
• Suppose that $v$'s turn at time $t$ is $\nu = \hat{\ell} \in \widehat{T}$, $2 \leq |\ell| \leq k$. Node $v$ performs a type faulty-able (FA) transition in step $t$ and updates its turn to $\bar{\ell'} \in T$, where $\ell' = \psi^{-1}(\ell)$ is the level one unit inwards of $\ell$, if and only if $v$ does not sense any level in $\Psi^{>}(\ell)$.

2.3 Analysis

In this section, we establish the correctness and stabilization time guarantees of AlgAU. First, in Sec. 2.3.1, we present (and prove) certain fundamental invariants and general observations regarding the operation of AlgAU. This allows us to prove in Sec. 2.3.2 that in the context of AlgAU, stabilization corresponds to reaching a good graph. Following that, we focus on proving that the graph is guaranteed to become good by time $O(R(D))$. This is done in three stages, presented in Sec. 2.3.3, 2.3.4, and 2.3.5.

The following additional two definitions play a central role in the analysis of AlgAU.

Definition (out-protected, $\ell$-out-protected). We say that a node $v \in V$ is out-protected at time $t \in \mathbb{Z}_{\geq 0}$ if $\Lambda_v^t \cap \Psi^{\gg}(\lambda_v^t) = \emptyset$. In other words, $v$ is out-protected at time $t$ if any edge $(u, v) \in E - E_p^t$ satisfies either (1) $\operatorname{sign}(\lambda_u^t) \neq \operatorname{sign}(\lambda_v^t)$; or (2) $\lambda_u^t \in \Psi^{\ll}(\lambda_v^t)$. Notice that the nodes in level $\ell \in \{-k, -k+1, k-1, k\}$ are always (vacuously) out-protected. Let $V_{op}^t \subseteq V$ denote the set of nodes that are out-protected at time $t \in \mathbb{Z}_{\geq 0}$.

The graph $G$ is said to be out-protected at time $t \in \mathbb{Z}_{\geq 0}$ if $V = V_{op}^t$. Given a level $\ell$, the graph is said to be $\ell$-out-protected at time $t$ if $L_t(\Psi^{\geq}(\ell)) \subseteq V_{op}^t$. Notice that the graph is out-protected if and only if it is both $1$-out-protected and $(-1)$-out-protected; in particular, if the graph is out-protected and $(u, v) \notin E_p^t$, then $\operatorname{sign}(\lambda_u^t) \neq \operatorname{sign}(\lambda_v^t)$.

Definition (distance). The distance between levels $\ell$ and $\ell'$, denoted by $\operatorname{dist}(\ell, \ell')$, is defined by the recurrence
$$\operatorname{dist}(\ell, \ell') = \begin{cases} 0, & \ell = \ell' \\ 1 + \min\{\operatorname{dist}(\ell, \varphi^{-1}(\ell')),\ \operatorname{dist}(\ell, \varphi^{+1}(\ell'))\}, & \ell \neq \ell'; \end{cases}$$
notice that this is indeed a distance function in the sense that it is symmetric and obeys the triangle inequality.

We are now ready to state the fundamental properties of AlgAU, cast in Obs. 2.1–2.9.
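As a concrete companion to the rules of Sec. 2.2, the following sketch (entirely ours, with turns modeled as `(level, faulty)` pairs and a small example $k$) implements one activation of a node:

```python
# Hypothetical sketch (ours): one activation of a node under AlgAU. A turn
# is a pair (level, faulty); `sensed` holds the turns of N+(v), v included.
K = 5  # small example value of k (the paper fixes k = 3D + 2)

def phi(l):  # forward operator on levels
    return 1 if l == -1 else (-K if l == K else l + 1)

def psi(l, j):  # outwards operator
    return (1 if l > 0 else -1) * (abs(l) + j)

def adjacent(a, b):
    return a == b or a == phi(b) or b == phi(a)

def step(turn, sensed):
    """Return the next turn of v given the set of sensed turns."""
    level, faulty = turn
    levels = {l for l, _ in sensed}
    protected = all(adjacent(level, l) for l in levels)
    if faulty:
        # Type FA: return one unit inwards once no sensed level lies
        # strictly outwards of v's own level.
        outwards = {psi(level, j) for j in range(1, K - abs(level) + 1)}
        return (psi(level, -1), False) if not (levels & outwards) else turn
    good = protected and not any(f for _, f in sensed)
    # Type AA: good, and the sensed levels sit on {level, phi(level)}.
    if good and levels <= {level, phi(level)}:
        return (phi(level), False)
    # Type AF: unprotected, or a faulty neighbor one unit inwards.
    if abs(level) >= 2 and (not protected or (psi(level, -1), True) in sensed):
        return (level, True)
    return turn
```

The sketch is a literal transcription of the three conditions; it is not meant to capture the adversarial scheduling, only the per-activation rule.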
Observation 2.1. If an edge $e = (u, v) \in E_p^t$ and $\{\lambda_u^t, \lambda_v^t\} \neq \{-k, k\}$, then $e \in E_p^{t+1}$.

Proof. Consider first the case that $\lambda_u^t = \lambda_v^t = \ell$. If $\ell < 0$, then $\{\lambda_u^{t+1}, \lambda_v^{t+1}\} \subseteq \{\ell, \varphi^{+1}(\ell)\}$, thus $e$ remains protected at time $t + 1$. If $\ell > 0$, then it may be the case that the level of one of the two nodes, say $u$, decreases in step $t$ due to a type FA transition so that $\lambda_u^{t+1} = \varphi^{-1}(\ell)$. But this means that $v$ is not good at time $t$ (it has at least one faulty neighbor), hence it cannot experience a type AA transition, implying that $\lambda_v^{t+1} \in \{\ell, \varphi^{-1}(\ell)\}$. Therefore, $e$ remains protected at time $t + 1$ also in this case.

Assume now that $\lambda_u^t = \ell$ and $\lambda_v^t = \varphi^{+1}(\ell)$ for a level $\ell \neq k$. Notice that $v$ cannot experience a type AA transition in step $t$ as $\ell = \varphi^{-1}(\lambda_v^t) \in \Lambda_v^t$. On the other hand, $u$ can experience a type FA transition only if $\ell < 0$, in which case $\lambda_u^{t+1} = \varphi^{+1}(\ell)$. Therefore, $\{\lambda_u^{t+1}, \lambda_v^{t+1}\} \subseteq \{\ell, \varphi^{+1}(\ell)\}$ and $e$ remains protected at time $t + 1$.

Observation 2.2.
If a node $v \in V_p^t$ and $\lambda_v^t \notin \{-k, k\}$, then $v \in V_p^{t+1}$.

Proof. Follows directly from Obs. 2.1.

Observation 2.3. If a node $v \in V_{op}^t$, then $v \in V_{op}^{t+1}$.

Proof. Follows from Obs. 2.1 by recalling that $L_t(\ell) \subseteq V_{op}^t$ for every $\ell \in \{-k, -k+1, k-1, k\}$.

Observation 2.4. For a node $v \in V$, if $\lambda_v^{t+1} \neq \lambda_v^t$, then $v \in V_{op}^{t+1}$.

Proof. Follows from Obs. 2.3 as node $v$ cannot change its level in step $t$ unless it is out-protected at time $t$.

Observation 2.5. If an edge $(u, v) \in E - E_p^t$ with $\lambda_u^t < \lambda_v^t$, then $\lambda_u^t \leq \lambda_u^{t+1} < \lambda_v^{t+1} \leq \lambda_v^t$.

Proof. Follows by recalling that a node that is not protected at time $t$ cannot experience a type AA transition in step $t$ and that it can experience a type FA transition in step $t$ only if it does not sense any level (strictly) outwards of its own.

Observation 2.6. If $G$ is $\ell$-out-protected at time $t$, then $G$ remains $\ell$-out-protected at time $t + 1$.

Proof. Follows from Obs. 2.3 and 2.4.

(To distinguish the level distance function from the distance function of the graph $G$, we denote the latter by $\operatorname{dist}_G(\cdot, \cdot)$.)
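To see the flavor of these invariants in action, one can simulate a synchronous activation of every node on a small good configuration and check that protectedness is preserved, in the spirit of Obs. 2.1 and 2.2 (sketch entirely ours; since all nodes are able and good here, only type AA transitions can fire):

```python
# Hypothetical sketch (ours): a synchronous round of AlgAU restricted to
# good configurations (only type AA transitions apply), on a small cycle.
K = 5  # small example value of k (the paper fixes k = 3D + 2)

def phi(l):
    return 1 if l == -1 else (-K if l == K else l + 1)

def adjacent(a, b):
    return a == b or a == phi(b) or b == phi(a)

def aa_step(level, sensed_levels):
    """Type AA only: advance iff the sensed levels sit on {level, phi(level)}."""
    return phi(level) if sensed_levels <= {level, phi(level)} else level

def sync_round(levels, neighbors):
    return {
        v: aa_step(l, {levels[u] for u in neighbors[v] | {v}})
        for v, l in levels.items()
    }

def protected(levels, neighbors):
    return all(adjacent(levels[v], levels[u])
               for v in levels for u in neighbors[v])

# Example: a 4-cycle whose levels straddle one clock unit.
neighbors = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
levels = {0: 2, 1: 3, 2: 2, 3: 3}
```

In this example the laggards at level 2 catch up in one round and the whole cycle then advances in lockstep, which is the behavior Lem. 2.10 and 2.11 formalize below.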
Observation 2.7. Consider a path $P$ of length $d$ between nodes $u \in V$ and $v \in V$ in $G$. If $E(P) \subseteq E_p^t$, then $\operatorname{dist}(\lambda_u^t, \lambda_v^t) \leq d$.

Proof. By induction on $d$. The assertion clearly holds if $d = 0$, which implies that $u = v$. Consider a $(u, v)$-path $P$ of length $d > 0$ and let $v'$ be the node that precedes $v$ in $P$. By applying the inductive hypothesis to the $(u, v')$-prefix of $P$, we conclude that $\operatorname{dist}(\lambda_u^t, \lambda_{v'}^t) \leq d - 1$. As $(v', v) \in E_p^t$, we conclude that $\operatorname{dist}(\lambda_u^t, \lambda_v^t) \leq \operatorname{dist}(\lambda_u^t, \lambda_{v'}^t) + 1 \leq d$, thus establishing the assertion.

Observation 2.8. If $V_p^t = V$, then there exists a level $\ell$ and an integer $0 \leq d \leq D$ such that $V \subseteq L_t(\{\varphi^{+j}(\ell) \mid 0 \leq j \leq d\})$.

Proof. Follows by applying Obs. 2.7 to the shortest paths in the graph $G$, whose lengths are at most $D$.

Observation 2.9. Consider a path $P$ of length $d$ emerging from a node $v \in V$ and assume that $E(P) \subseteq E_p^t$ (resp., $V(P) \subseteq V_p^t$). Fix some time $t' \geq t$ and assume that $|\lambda_v^s| < k - d$ for every $t \leq s \leq t'$. Then, $E(P) \subseteq E_p^{t'}$ (resp., $V(P) \subseteq V_p^{t'}$).

Proof. Fix a time $s$. Obs. 2.7 ensures that if $E(P) \subseteq E_p^s$ (resp., $V(P) \subseteq V_p^s$) and $|\lambda_v^s| < k - d$, then $|\lambda_u^s| < k$ for every node $u$ in $P$. This implies that $E(P) \subseteq E_p^{s+1}$ (resp., $V(P) \subseteq V_p^{s+1}$) due to Obs. 2.1 (resp., Obs. 2.2). The assertion is now established by induction on $s = t, t+1, \ldots, t'-1$.

In this section, we show that the stabilization of AlgAU is reduced to reaching a good graph.

Lemma 2.10. If $G$ is good at time $t$, then $G$ remains good at time $t + 1$.

Proof. If all nodes are good at time $t$, then the only possible state transitions in step $t$ are of type AA. Observing that an edge $(u, v)$ with $\lambda_u^t = k$ and $\lambda_v^t = -k$ does not become non-protected via type AA transitions, we conclude by Obs. 2.1 that $E_p^{t+1} = E$ and hence, $V_p^{t+1} = V$. Since a type AA transition does not change the turn of a node from able to faulty, it follows that all nodes remain able at time $t + 1$, hence all nodes are good at time $t + 1$.

Lemma 2.11.
Assume that $G$ is good at time $t$. For $i = 0, 1, \ldots$, each node $v \in V$ experiences at least $i$ type AA transitions during the time interval $[t, \varrho^{D+i}(t))$.

Proof. Lem. 2.10 ensures that all nodes remain good, and in particular protected, from time $t$ onwards. For $i = 0, 1, \ldots$, let $\tau(i) = \varrho^i(t)$ and let $\ell_{\min}(i)$ and $d(i)$ be the level $\ell$ and integer $d$ promised in Obs. 2.8 when applied to time $\tau(i)$. Since all nodes are good throughout the time interval $I = [\tau(i), \tau(i+1))$, it follows that every node $v \in L_{\tau(i)}(\ell_{\min}(i))$ experiences at least one type AA transition during $I$ (in particular, $v$ experiences a type AA transition upon its first activation during $I$), hence $\ell_{\min}(i+1) > \ell_{\min}(i)$. The assertion follows by Obs. 2.8, ensuring that $d(0) \leq D$.

Lemma 2.12. Assume that $G$ is $\ell$-out-protected, $2 \leq |\ell| \leq k$, at time $t$. If the turn of a node $v \in V$ at time $t$ is $\hat{\ell}$, then $v$ experiences a type FA transition before time $\varrho^{2(k - |\ell|) + 1}(t)$.

Proof. Obs. 2.3 ensures that $v \in V_{op}^{t'}$ for every $t' \geq t$. For $i = 0, 1, \ldots$, let $\tau(i) = \varrho^i(t)$. We prove by induction on $k - |\ell|$ that $v$ experiences a type FA transition before time $\tau(2(k - |\ell|) + 1)$, thus establishing the assertion. For the induction's base, notice that if the turn of node $v$ at time $t$ is $\hat{k}$ (resp., $\widehat{-k}$), then $v$ is guaranteed to experience a type FA transition, moving to turn $\overline{k-1}$ (resp., $\overline{-k+1}$), upon its next activation and in particular before time $\varrho(t) = \tau(1)$.

Assume that $2 \leq |\ell| \leq k - 1$. If $v$ is in turn $\hat{\ell}$ when a neighbor $u$ of $v$ in turn $\overline{\psi^{+1}(\ell)}$ is activated, then $u$ experiences a type AF transition, moving to turn $\widehat{\psi^{+1}(\ell)}$. Moreover, as long as $v$ is faulty, no neighbor of $v$ can move from level $\ell$ to level $\psi^{+1}(\ell)$. Since $v$ has no neighbors in levels belonging to $\Psi^{\gg}(\ell)$ (recall that $v$ is out-protected), it follows that as long as $v$ does not experience a type FA transition, no neighbor of $v$ can move to level $\psi^{+1}(\ell)$ from another level and thus, no neighbor of $v$ can move to turn $\overline{\psi^{+1}(\ell)}$ from another turn. Therefore, it is guaranteed that at time $\tau(1)$, all neighbors $u$ of $v$ whose level satisfies $\lambda_u^{\tau(1)} = \psi^{+1}(\ell)$ are faulty. By the inductive hypothesis, these nodes $u$ experience a type FA transition, moving to turn $\bar{\ell}$, before time $\tau(1 + 2(k - |\ell| - 1) + 1) = \tau(2(k - |\ell|))$. In the subsequent activation of $v$, which occurs before time $\varrho(\tau(2(k - |\ell|))) = \tau(2(k - |\ell|) + 1)$, $v$ experiences a type FA transition, thus establishing the assertion.

Lemma 2.13.
Consider an edge $(u, v) \in E - E^t_p$ with $\lambda^t_u < \lambda^t_v$. If $G$ is $\ell$-out-protected at time $t$ for $\ell \in \{\lambda^t_u, \lambda^t_v\}$, then there exists a time $t < t^* \leq \varrho^{2(k-|\ell|)+2}(t)$ such that (1) $\lambda^{t^*}_u \geq \lambda^t_u$; (2) $\lambda^{t^*}_v \leq \lambda^t_v$; and (3) at least one of the inequalities in (1) and (2) is strict.

Proof. By Obs. 2.5, it is sufficient to prove that at least one of the two nodes $u$ and $v$ changes its level before time $\varrho^{2(k-|\ell|)+2}(t)$. Assume that the graph is $\ell$-out-protected at time $t$ for $\ell = \lambda^t_v$; the proof for the case that $\ell = \lambda^t_u$ is analogous. Let $t \leq t_1 < \varrho(t)$ be the first time following $t$ at which $v$ is activated and based on that, define the time $t_1 \leq t_2 \leq \varrho(t)$ as follows: if $v$ is in turn $\widehat{\ell}$ at time $t_1$, then set $t_2 = t_1$; otherwise ($v$ is in turn $\ell$ at time $t_1$), set $t_2 = t_1 + 1$ and notice that $v$ experiences a type AF transition in step $t_1$ (due to the non-protected edge $(v, v')$) unless $v'$ changes its level beforehand. In both cases, we know that $v$ is in turn $\widehat{\ell}$ at time $t_2$. Since Obs. 2.6 guarantees that the graph is $\ell$-out-protected at time $t_2$, we can apply Lem. 2.12 to $v$, concluding that $v$ experiences a type FA transition, and in particular changes its level, before time $\varrho^{2(k-|\ell|)+1}(t_2) \leq \varrho^{2(k-|\ell|)+2}(t)$, thus establishing the assertion.

Lemma 2.14.
Fix a level $1 \leq |\ell| \leq k - 2$ and assume that $G$ is $\psi^{+1}(\ell)$-out-protected at time $t$. Then, $G$ is $\ell$-out-protected at time $\varrho^{(k-|\ell|)(k-|\ell|-1)}(t)$.

Proof. For $i = 0, 1, \ldots$, let $\tau(i) = \varrho^i(t)$ and fix $t^* = \tau((k - |\ell|)(k - |\ell| - 1))$. It suffices to prove that $\bigcap_{t \leq t' \leq t^*} L^{t'}(\ell) \subseteq V^{t^*}_{op}$. To this end, consider a node $v \in \bigcap_{t \leq t' \leq t^*} L^{t'}(\ell)$ and notice that by Obs. 2.3, if $v$ is out-protected at any time $t \leq t' \leq t^*$, then it remains out-protected subsequently and in particular at time $t^*$. Moreover, Obs. 2.1 ensures that any neighbor of $v$ whose level at time $t$ belongs to $\Psi_{\leq}(\ell) \cup \{\psi^{+1}(\ell)\}$ cannot move to a level in $\Psi_{\gg}(\ell)$ as long as $v$ is in level $\ell$. So, it remains to consider a neighbor $u \in N(v)$ of $v$ with $\lambda^t_u \in \Psi_{\gg}(\ell)$ and show that the level of $u$ moves inwards and becomes adjacent to $\ell$ by time $t^*$; indeed, Obs. 2.1 ensures that once $u$ reaches a level adjacent to $\ell$, it cannot move back to a level in $\Psi_{\gg}(\ell)$ unless $v$ leaves level $\ell$. To this end, we define
$$f(\ell^*) = \sum_{j=|\ell|+2}^{|\ell^*|} (2(k - j) + 2)$$
and prove that if $\lambda^t_u = \ell^* \in \Psi_{\gg}(\ell)$, then $u$ reaches level $\psi^{+1}(\ell)$ by time $\tau(f(\ell^*))$. The assertion is established by observing that $f(k) = (k - |\ell|)(k - |\ell| - 1)$.

Since $G$ is $\psi^{+1}(\ell)$-out-protected at time $t$, Obs. 2.6 guarantees that $G$ is $\psi^{+1}(\ell)$-out-protected at all times subsequent to $t$ and hence, also $\ell'$-out-protected for every $\ell' \in \Psi_{>}(\ell)$. Therefore, we can repeatedly apply Lem. 2.13 to edge $(u, v)$ and conclude by induction on $\ell'$ that $u$ moves from level $\ell' \in \Psi_{\gg}(\ell)$, $|\ell'| \leq |\ell^*|$, to level $\psi^{-1}(\ell')$ by time $\tau\left(\sum_{j=|\ell'|}^{|\ell^*|} (2(k - j) + 2)\right)$. The proof is then completed by plugging $\ell' = \psi^{+2}(\ell)$.

Since the graph $G$ is $\ell$-out-protected for $\ell \in \{-k, -k+1, k-1, k\}$ already at time 0 and since being 1-out-protected and $(-1)$-out-protected implies that $G$ is out-protected, Lem. 2.14 yields the following corollary.

Corollary 2.15.
There exists a time $T_0 \leq R(O(k^3))$ such that $G$ is out-protected at all times $t \geq T_0$.

In what follows, we take $T_0$ to be the time promised in Corollary 2.15 and consider the execution from time $T_0$ onwards.

Definition (justifiably faulty, unjustifiably faulty, justified). A node $v \in V$ whose turn at time $t$ is $\widehat{\ell}$, $2 \leq |\ell| \leq k$, is said to be justifiably faulty if either (1) $v \notin V^t_p$; or (2) $v$ admits a neighbor whose turn at time $t$ is $\widehat{\psi^{-1}(\ell)}$. A faulty node that is not justifiably faulty is said to be unjustifiably faulty. We say that the graph $G$ is justified if it does not admit any unjustifiably faulty node.

A key feature of AlgAU is that nodes do not become unjustifiably faulty once the graph is out-protected.
Lemma 2.16.
If a node $v \in V$ is not unjustifiably faulty at time $t \geq T_0$, then $v$ is not unjustifiably faulty at time $t + 1$.

Proof. Assume that node $v$ is either (1) able at time $t$ and experiences a type AF transition in step $t$; or (2) justifiably faulty at time $t$ (and remains faulty at time $t + 1$). In both cases, we know that $v$ admits a neighboring node $u \in N(v)$ that satisfies at least one of the following two conditions: (i) $\lambda^t_u$ is not adjacent to $\lambda^t_v$; or (ii) $\lambda^t_u = \psi^{-1}(\lambda^t_v)$ and $u$ is faulty at time $t$.

Assuming that condition (i) holds, we know that $\mathrm{sign}(\lambda^t_u) \neq \mathrm{sign}(\lambda^t_v)$ as $G$ is out-protected at time $t$. Since $v$ is faulty at time $t + 1$, it follows that $\lambda^{t+1}_v = \lambda^t_v$ with $|\lambda^{t+1}_v| \geq 2$. Thus, $\mathrm{sign}(\lambda^{t+1}_u) = \mathrm{sign}(\lambda^t_u)$ and edge $(u, v)$ remains non-protected at time $t + 1$. Assuming that condition (ii) holds, node $u$ cannot experience a type FA transition in step $t$ as $\lambda^t_v = \psi^{+1}(\lambda^t_u) \in \Lambda^t_u$, thus it remains faulty at time $t + 1$. Therefore, we conclude that $v$ is justifiably faulty at time $t + 1$.

Corollary 2.17 is now derived by combining Corollary 2.15 and Lem. 2.16, recalling that Lem. 2.12 ensures that if the graph is out-protected at time $T_0$, then any (justifiably or) unjustifiably faulty node experiences a type FA transition, and in particular stops being unjustifiably faulty, before time $\varrho^{O(k)}(T_0) \leq R(O(k^3))$.

Corollary 2.17.
There exists a time $T_0 \leq T_1 \leq R(O(k^3))$ such that $G$ is justified at all times $t \geq T_1$.

In what follows, we take $T_1$ to be the time promised in Corollary 2.17 and consider the execution from time $T_1$ onwards. To complete the analysis, we shall need the following additional definition.

Definition (grounded). A path $P$ of length at most $D$ in $G$ is said to be grounded at time $t$ if (1) $V(P) \subseteq V^t_p$; and (2) $P$ has an endpoint $u$ satisfying $\lambda^t_u \in \{-1, 1\}$. A node $v \in V$ is said to be grounded at time $t$ if it belongs to a grounded path.

Lemma 2.18. If $G$ is protected at time $t \geq T_1$, then $G$ is good at time $t$.

Proof. Assume by contradiction that the graph admits faulty nodes at time $t$ and among these nodes, let $v \in V$ be a node that minimizes $|\lambda^t_v|$. Since $t \geq T_1$, Corollary 2.17 ensures that $v$ is justifiably faulty at time $t$. The assumption that $G$ is protected implies that $v$ admits a neighbor $u \in N(v)$ whose turn at time $t$ is $\widehat{\psi^{-1}(\lambda^t_v)}$, in contradiction to the choice of $v$.

Owing to Lem. 2.18, our goal in the remainder of this section is to prove that it does not take too long after time $T_1$ for all nodes to become protected. The next three lemmas are pivotal in the journey towards achieving this goal.

Lemma 2.19.
If a node $v \in V$ satisfies $v \notin V^t_p$ at some time $t \geq T_1$, then there exists a time $t \leq t' \leq \varrho^{k(k-1)}(t)$ such that $v \in V^{t'}_p$ with $\lambda^{t'}_v \in \{-1, 1\}$.

Proof. Since the graph is out-protected at all times after $T_1 \geq T_0$, it follows that if edge $(v, v') \in E - E^t_p$, then (1) $\mathrm{sign}(\lambda^t_v) \neq \mathrm{sign}(\lambda^t_{v'})$; and (2) $\mathrm{dist}(\lambda^t_v, \lambda^t_{v'}) \geq 2$. Obs. 2.5 and Lem. 2.13 guarantee that the levels of $v$ and $v'$ move inwards until they meet with $\{\lambda^{t'}_v, \lambda^{t'}_{v'}\} = \{-1, 1\}$ at some time $t \leq t' \leq \varrho^z(t)$ for $z = \sum_{j=2}^{k} (2(k - j) + 2) = k(k-1)$, for every edge $(v, v') \in E - E^t_p$.

Lemma 2.20.
Consider a node $v \in V$ and assume that there exist times $T_1 \leq t < t'$ such that (i) $\lambda^t_v = 1$; and (ii) $\lambda^{t'}_v = 2D + 2$. Then $G$ is protected at time $t'$.

Proof. By Lem. 2.10 and 2.18, if all nodes are protected at some time after time $T_1$, then all nodes remain (good and hence) protected indefinitely. Therefore, we establish the assertion by proving the following claim and plugging $d = D$: Assume that there exist levels $1 \leq \ell < \ell' \leq 2D + 2$ with $\ell' - \ell = 2d + 1$ such that (I) $v$ moves in step $t$ from level $\lambda^t_v = \ell$ to level $\lambda^{t+1}_v = \ell + 1$; (II) $v$ moves in step $t' - 1$ from level $\lambda^{t'-1}_v = \ell' - 1$ to level $\lambda^{t'}_v = \ell'$; and (III) $\ell < \lambda^s_v < \ell'$ for all $t < s < t'$. Then all nodes at distance at most $d$ from $v$ are protected at time $t'$.

Node $v$ can move from level $\ell$ to level $\ell + 1$ in step $t$ only if it experiences a type AA transition, which requires $v$ to be protected at time $t$. By Obs. 2.2, $v$ remains protected throughout the time interval $[t, t']$.

We prove that all other nodes in $B(v, d) = \{u \in V \mid \mathrm{dist}_G(u, v) \leq d\}$ are protected at time $t'$ by induction on $d$. The assertion holds trivially for $d = 0$ as $B(v, 0) = \{v\}$. Assume that the assertion holds for $d - 1 \geq 0$ and consider a node $u \in B(v, d)$. Let $P$ be a shortest $(v, u)$-path in $G$ and let $w$ be the node succeeding $v$ along $P$. Since $v$ experiences $2d + 1$ type AA transitions while moving from level $\ell$ to level $\ell'$ during the time interval $[t, t']$, there must exist times $t < t_w \leq t'_w < t'$ such that (I) $w$ moves in step $t_w$ from level $\lambda^{t_w}_w = \ell + 1$ to level $\lambda^{t_w + 1}_w = \ell + 2$; (II) $w$ moves in step $t'_w - 1$ from level $\lambda^{t'_w - 1}_w = \ell' - 2$ to level $\lambda^{t'_w}_w = \ell' - 1$; and (III) $\ell + 1 < \lambda^s_w < \ell' - 1$ for all $t_w < s < t'_w$.

By the inductive hypothesis (applied to $w$), all nodes in $B(w, d - 1)$, and in particular the nodes on the $(w, u)$-suffix of $P$, are protected at time $t'_w$, hence all nodes in $P$ are protected at time $t'_w$, $t < t'_w < t'$. Recalling that $1 \leq \ell < \lambda^s_v \leq \ell' \leq 2D + 2$ for all $t'_w \leq s \leq t'$, we employ Obs. 2.9 to conclude that all nodes in $P$ are protected at time $t'$, thus establishing the assertion.

Lemma 2.21.
If a node $v \in V$ is grounded at time $t \geq T_1$, then $v \in V^{t'}_p$ for all $t' \geq t$.

Proof. The fact that node $v$ is grounded at time $t$ means in particular that $v \in V^t_p$, so assume by contradiction that $v \notin V^{t'}_p$ for a time $t' > t$. Consider the path $P$ of length at most $D$ due to which $v$ is grounded at time $t$ and let $u$ be the endpoint of $P$ that satisfies $\lambda^t_u \in \{-1, 1\}$. Since $V(P) \subseteq V^t_p$, we can apply Obs. 2.9 to $P$ and $u$, concluding that there exists a time $t < s \leq t'$ such that $|\lambda^s_u| \geq k - D$. Since $u$ moves from level $\lambda^t_u \in \{-1, 1\}$ to level $\lambda^s_u$ satisfying $|\lambda^s_u| \geq k - D = 2D + 2$ during the time interval $[t, s)$, it follows that there exist times $t \leq r < r' \leq s$ such that $u$ moves from level $\lambda^r_u = 1$ up to level $\lambda^{r'}_u = 2D + 2$ during the time interval $[r, r')$. Employing Lem. 2.20, we conclude that $G$ is protected from time $r'$ onwards, which contradicts the assumption that $v \notin V^{t'}_p$ as $t' \geq s \geq r'$.

We are now ready to prove the following lemma that, when combined with Lem. 2.10, 2.11, and 2.18, establishes Thm. 1.1 as $k = O(D)$.

Lemma 2.22.
There exists a time $T_1 \leq T_2 \leq R(O(k^3))$ such that $G$ is protected at time $T_2$.

Proof. Fix a node $v \in V$. In the context of this proof, we say that $v$ is post-grounded at time $t$ if $v$ was grounded at some time $T_1 \leq t' \leq t$. By Lem. 2.21, it suffices to prove that $v$ becomes post-grounded by time $R(O(k^3))$. In fact, since graph $G$ is out-protected after time $T_1 \geq T_0$ and since in an out-protected graph, a non-protected node becomes protected if and only if it becomes grounded, it follows that $G$ becomes protected exactly when all its nodes become post-grounded.

For $t \geq T_1$, let $G^t_p = (V, E^t_p)$. Assuming that $G$ is still not protected at time $t$ (i.e., that $V^t_p \subsetneq V$), let $x_t$ be a node $x \in V - V^t_p$ that minimizes $\mathrm{dist}_{G^t_p}(v, x)$, and among those, a node that minimizes $|\lambda^t_x|$ (breaking the remaining ties in an arbitrary consistent manner). Notice that although we cannot bound the diameter of $G^t_p$, the choice of $x_t$ implies that $d_t = \mathrm{dist}_{G^t_p}(v, x_t) \leq D$. Let $P_t$ be a $(v, x_t)$-path in $G^t_p$ that realizes $d_t$.

The choice of $x_t$ and $P_t$ ensures that $x_t \notin V^t_p$ and that $V(P_t) - \{x_t\} \subseteq V^t_p$. If a node $u \in V(P_t) - \{x_t\}$ becomes non-protected in step $t$, then $d_{t+1} \leq \mathrm{dist}_{G^t_p}(v, u) < \mathrm{dist}_{G^t_p}(v, x_t) = d_t$. Moreover, if $x_t$ remains non-protected at time $t + 1$, then $d_{t+1} \leq d_t$. The more interesting case occurs when $V(P_t) - \{x_t\} \subseteq V^{t+1}_p$ and $x_t$ also becomes protected in step $t$, which means that all nodes in $P_t$ are protected at time $t + 1$. Recalling that the graph is out-protected after time $T_1 \geq T_0$, we know that $\lambda^{t+1}_{x_t} \in \{-1, 1\}$, hence $P_t$ is grounded at time $t + 1$ and $v$ is post-grounded from time $t + 1$ onwards.

To complete the proof, let $\tau(i) = \varrho^i(T_1)$ for $i = 0, 1, \ldots$ and notice that Lem. 2.19 guarantees that if $G$ is still not protected at time $\tau(i)$, then $x_{\tau(i)}$ becomes protected before time $\tau(i + O(k^2))$. Therefore, if node $v$ is still not post-grounded at time $\tau(i)$, then either (1) $v$ is post-grounded at time $\tau(i + O(k^2))$; or (2) $d_{\tau(i + O(k^2))} < d_{\tau(i)}$. As $0 \leq d_t \leq D$ for all $t \geq T_1$, we conclude that node $v$ must become post-grounded by time $\tau(O(D \cdot k^2)) = \tau(O(k^3))$.

3 Algorithms for LE and MIS
In this section, we present the synchronous algorithms promised in Thm. 1.3 and 1.4. Specifically, our MIS algorithm, denoted by AlgMIS, is developed in Sec. 3.1, and our LE algorithm, denoted by AlgLE, is developed in Sec. 3.2.

A common key ingredient in the design of AlgMIS and AlgLE is a (synchronous) module denoted by Restart. This module is invoked upon detecting an illegal configuration and, as its name implies, resets all other modules, allowing the algorithm a "fresh start" from a uniform initial configuration, that is, a configuration in which all nodes share the same initial state $q^*$, chosen by the algorithm designer. Module Restart consists of $O(D)$ states, among them two designated states denoted by Restart-entry and Restart-exit: a node enters Restart by moving from a non-Restart state to Restart-entry; a node exits Restart by moving from Restart-exit to the initial state $q^*$. The main guarantee of Restart is cast in the following theorem.
Theorem 3.1.
If some node is in a Restart state at time $t$, then there exists a time $t \leq t' \leq t + O(D)$ such that all nodes exit Restart, concurrently, in step $t'$.

A module that satisfies the promise of Thm. 3.1 is developed by Boulinier et al. [BPV05]. Due to some differences between the computational model used in the current paper and the one used in [BPV05], we provide a standalone implementation (and analysis) of module Restart in Sec. 3.3, relying on algorithmic principles similar to those used by Boulinier et al.
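To give a feel for how a concurrent exit can be engineered, the following toy synchronous simulation (our own illustration of a simplified min-based counter scheme in the spirit of Boulinier et al., not the construction of Sec. 3.3; the entry rule and the threshold $2D + 1$ are our assumptions) shows a restart wave on a path graph in which all nodes leave the restart state in the same round:

```python
def simulate_restart(adj, D, first_entrant):
    """Toy synchronous simulation of a min-based restart wave.

    A node outside the restart state holds counter None; it enters the
    restart state (counter 0) one round after sensing a restart neighbor.
    A restart node updates its counter to 1 + min over the restart nodes
    in its closed neighborhood and exits once the counter reaches 2*D + 1.
    Returns, per node, the round in which it exits.
    """
    n = len(adj)
    c = [None] * n
    c[first_entrant] = 0
    exit_round = [None] * n
    for rnd in range(1, 10 * D + 10):
        new_c = list(c)
        for v in range(n):
            nbhd = [u for u in adj[v] + [v] if c[u] is not None]
            if c[v] is None:
                if nbhd:                 # entry wave: join with counter 0
                    new_c[v] = 0
            else:                        # min-based counter update
                new_c[v] = 1 + min(c[u] for u in nbhd)
        c = new_c
        for v in range(n):
            if c[v] is not None and c[v] >= 2 * D + 1:
                exit_round[v] = rnd      # threshold reached: exit restart
                c[v] = None
        if all(r is not None for r in exit_round):
            break
    return exit_round

# path graph on 5 nodes (diameter D = 4), entry at one endpoint
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
rounds = simulate_restart(adj, 4, 0)
assert len(set(rounds)) == 1             # all nodes exit concurrently
```

The entry wave reaches every node within $D$ rounds; once the last node has entered, the min-based updates equalize the counters within another $D$ rounds, after which they advance in lockstep, so the threshold is crossed by all nodes in the same round, mirroring the concurrent-exit-within-$O(D)$-rounds guarantee of Thm. 3.1.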
3.1 AlgMIS
For clarity of the exposition, the MIS algorithm AlgMIS is presented in a procedural style; converting it to a randomized state machine with $O(D)$ states is straightforward. The algorithm is designed assuming that the execution starts concurrently at all nodes; this assumption is plausible due to Thm. 3.1 and given the algorithm's fault detection guarantees (described in the sequel). Throughout, we say that a node $v \in V$ is decided if $v$ resides in an output state; otherwise, we say that $v$ is undecided. An edge is said to be decided if at least one of its endpoints is decided, and undecided if both its endpoints are undecided. Recall that in the context of the MIS problem, the output value of a decided node $v$ is 1 (resp., 0) if $v$ is included in (resp., excluded from) the constructed MIS; we subsequently denote by IN (resp., OUT) the set of decided nodes with output 1 (resp., 0).

The algorithm consists of three modules, denoted by RandPhase, DetectMIS, and Compete; all nodes participate in RandPhase, whereas DetectMIS involves only the decided nodes and Compete involves only the undecided nodes. Module RandPhase runs indefinitely and divides the execution into phases so that for each phase $\pi$, (1) all nodes start (and finish) $\pi$ concurrently; and (2) the length (in rounds) of $\pi$ is $D + O(\log n)$ in expectation and whp.

The role of DetectMIS is to detect local faults among the decided nodes, namely, two neighboring IN nodes or an OUT node with no neighboring IN node. The module runs indefinitely (over the decided nodes) and is designed so that a local fault is detected in each round (independently) with a positive constant probability, which means that no local fault remains undetected for more than $O(\log n)$ rounds whp. Upon detecting a local fault, DetectMIS invokes module Restart and the execution of AlgMIS starts from scratch once Restart is exited.

Module Compete is invoked from scratch in each phase, governing the competition of the undecided nodes over the "privilege" to be included in the constructed MIS. Taking $U \subseteq V$ to be the set of undecided nodes at the beginning of a phase $\pi$ and taking $G(U)$ to denote the subgraph induced on $G$ by $U$, module Compete assigns (implicitly) a random variable $Z(u) \in \mathbb{Z}_{\geq 0}$ to each node $u \in U$ so that the following three properties are satisfied:
(1) $\mathbb{P}\left(\bigwedge_{w \in W} [Z(u) > Z(w)]\right) \geq \Omega\left(\frac{1}{|W|+1}\right)$ for every node subset $W \subseteq U - \{u\}$;
(2) if $Z(u) > Z(w)$ for all nodes $w \in N_{G(U)}(u)$, then $u$ joins IN; and
(3) $u$ joins OUT during $\pi$ if and only if node $v$ joins IN for some $v \in N_{G(U)}(u)$ whp.

It is well known (see, e.g., [ABI86, MRSZ11, EW13]) that properties (1)–(3) ensure that a positive constant fraction of the undecided edges become decided during $\pi$ with a positive constant probability. Using standard probabilistic arguments, we deduce that all edges become decided within $O(\log n)$ phases in expectation and whp, thus, by applying properties (1) and (2) to the nodes of degree $\deg_{G(U)}(v) = 0$, all nodes become decided within $O(\log n)$ phases in expectation and whp. Thm. 1.4 follows, again, by standard probabilistic arguments. We now turn to present the implementation of the three modules.

3.1.1 Implementing Module RandPhase
RandPhase divides the execution into phases. Each phase consists ofa (random) prefix of length X and a (deterministic) suffix of length D + 2, where X is a randomvariable that satisfies (1) X ≤ O (log n ) in expectation and whp; and (2) X ≥ c log n whp for aconstant c > π concurrently, then all nodes finish π (and start the next phase) concurrently after D + 2 + X rounds (this guarantee holds with probability 1).To implement RandPhase , each node v ∈ V maintains two variables, denoted by v. flag ∈ { , } and v. step ∈ { , , . . . , D + 2 } ; the former variable controls the length of the phase’s randomprefix, whereas the latter is used to ensure that all nodes finish the phase concurrently, exactly D + 2 rounds after the random prefix is over (for all nodes).To this end, when a phase begins, v sets v. step ← v. flag ←
1. As long as v. flag = 1,node v tosses a (biased) coin and resets v. flag ← < p <
1, where p = p ( c )is a constant determined by c . Once v. flag = 0, the actions of v become deterministic: Let v. step min = min { u. step : u ∈ N + ( v ) } . If v. step min < D + 2, then v sets v. step ← step min + 1;otherwise ( v. step min = D + 2), the phase ends and a new phase begins. On top of these rules, if,at any stage of the execution, v senses a node u ∈ N + ( v ) for which | u. step − v. step | >
1, then v invokes module Restart .To analyze
RandPhase , consider a phase π that starts concurrently for all nodes and let X v be16he number of rounds in which node v ∈ V kept v. flag = 1 since π began, observing that X v is aGeom( p ) random variable. Since the random variables X v , v ∈ V , are independent, we can applyObs. 3.23.2, established by standard probabilistic arguments, to conclude that the random variable X = max v ∈ V X v satisfies (1) X ≤ O (log n ) in expectation and whp; and (2) X ≥ c log n whp,where the relation between c and p is derived from Obs. 3.23.2. Observation 3.2.
Fix some constant < p ≤ / and let Y , . . . , Y n be n independent and iden-tically distributed Geom( p ) random variables. Then, the random variable Y = max i ∈ [ n ] Y i satisfies(1) Y ≤ O (log n ) in expectation and whp; and (2) Y ≥ c log n whp for any constant c < ln(2) / (2 p ) . To complete the analysis of
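Obs. 3.2 can be checked numerically from the exact distribution of the maximum: for $n$ iid Geom($p$) variables, $\mathbb{P}(Y < m) = (1 - (1-p)^{m-1})^n$. The sketch below (our own numeric illustration, not part of the algorithm; the values $p = 1/4$, $c = 1 < \ln(2)/(2p)$ and the large constant $C = 16$ are arbitrary choices) evaluates both tails for growing $n$:

```python
import math

def prob_max_below(m, n, p):
    # P(max of n iid Geom(p) < m), using P(Y_i >= m) = (1 - p)**(m - 1)
    return (1.0 - (1.0 - p) ** (m - 1)) ** n

def prob_max_atleast(m, n, p):
    # numerically stable 1 - (1 - q)**n for the tiny tail q = P(Y_i >= m)
    q = (1.0 - p) ** (m - 1)
    return -math.expm1(n * math.log1p(-q))

p, c_low, C_high = 0.25, 1.0, 16.0
lo, hi = [], []
for n in (2 ** 6, 2 ** 10, 2 ** 14):
    lo.append(prob_max_below(int(c_low * math.log2(n)), n, p))     # P(Y < c log n)
    hi.append(prob_max_atleast(int(C_high * math.log2(n)), n, p))  # P(Y >= C log n)

assert lo[0] > lo[1] > lo[2] > 0.0   # Y >= c log n holds whp as n grows
assert hi[0] > hi[1] > hi[2] > 0.0   # Y <= C log n holds whp as n grows
```

Both failure probabilities vanish polynomially in $n$, matching claims (1) and (2) of the observation.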
RandPhase , we introduce the following notation and terminology.Given a node v ∈ V , let v. step t and v. step t min denote the values of v. step and v. step min , respec-tively, at time t . An edge e = { u, v } ∈ E is said to be valid at time t , if | u. step t − v. step t | ≤
1. Let v max be a node v ∈ V that realizes X v = X . We can now establish the following two observations. Observation 3.3.
If all edges are valid at time t , then | u. step t − v. step t | ≤ dist G ( u, v ) for everytwo nodes u, v ∈ V .Proof. Follows by a trivial induction on dist G ( u, v ). Observation 3.4.
As long as v max . step = 0 , all edges are valid and v. step ≤ D for all nodes v ∈ V .Proof. The assertion clearly holds when the phase begins and v. step = 0 for all nodes v ∈ V .Obs. 3.33.3 ensures that if all edges are valid at time t and v max . step t = 0, then max v ∈ V v. step t ≤ D .The assertion follows as RandPhase can invalidate a valid edge { u, v } only if u. step = v. step = D + 2 > D .Based on Obs. 3.33.3 and 3.43.4, we can prove the following key lemma. Lemma 3.5.
Suppose that node v max resets v max . flag ← in round t . Then, the following threeconditions are satisfied for every ≤ d ≤ D :(1) all edges are valid at time t + d ;(2) v. step t + d ≥ d for every node v ∈ V ; and(3) v. step t + d ≤ max { d, dist G ( v max , v ) } for every node v ∈ V .Proof. By induction on d = 0 , , . . . , D . The base case holds by Obs. 3.43.4 as v max . step t = 0, soassume that the assertion holds for d − t + d . The inductivehypothesis ensures that all edges are valid at time t − d − v ∈ V v. step t − d − ≤ D ,hence all edges remain valid at time t + d , establishing condition (1).To show that condition (2) holds, consider some node v ∈ V . The inductive hypothesis ensuresthat d − ≤ v. step t + d − ≤ D , hence v. step t + d = v. step t + d − + 1 ≥ d , establishing condition (2).For condition (3), consider some node v ∈ V and assume first that dist G ( v max , v ) ≤ d − v. step t + d − = d −
1, hence v. step t + d − = d − v. step is incremented in round t + d − v. step t + d − = d − v. step t + d = d =max { d, dist G ( v max , v ) } . Now, consider the case that dist G ( v max , v ) = d and let u be the node thatprecedes v along a shortest ( v max , v )-path in G . Since dist G ( v max , u ) = d −
1, we know that u. step is incremented in round t + d − u. step t + d − = d − u. step t + d = d . This implies that v. step t + d − ≤ d −
1, thus v. step t + d ≤ d = max { d, dist G ( v max , v ) } .We can now prove by a secondary induction on δ = d, d + 1 , . . . , D that v. step t + d ≤ dist G ( v max , v ) = max { d, dist G ( v max , v ) } for every node v ∈ V with dist G ( v max , v ) = δ , thus es-tablishing condition (3). The base case ( δ = d ) of the secondary induction has already beanestablished, so assume that it holds for δ and consider a node v ∈ V with dist G ( v max , v ) = δ + 1.Let u be the node that precedes v along a shortest ( v max , v )-path in G . Since dist G ( v max , u ) = δ ,we can apply the secondary inductive hypothesis, concluding that u. step t + d ≤ dist G ( v max , u ). Asedge { u, v } is valid at time t + d , we conclude by Obs. 3.33.3 that v. step t + d ≤ u. step t + d + 1 ≤ dist G ( v max , u ) + 1 = dist G ( v max , v ), establishing the step of the secondary induction.By plugging d = D into Lem. 3.53.5, we obtain the following corollary. Corollary 3.6.
Suppose that node v max resets v max . flag ← in round t . Then, all nodes v ∈ V (1) set v. step ← D + 1 concurrently in round t + D ;(2) set v. step ← D + 2 concurrently in round t + D + 1 ; and(3) start the next phase concurrently in round t + D + 2 . Compete . Consider the execution of module
Compete in a phase π and let U ⊆ V be the set of nodes thatare still undecided at the beginning of π . The implementation of Compete is based on a binaryvariable, denoted by v. candidate ∈ { , } , that each node v ∈ U maintains, indicating that v isstill a candidate to join IN during π . When π begins, v sets v. candidate ←
1; then, v proceedsby participating in a sequence of random trials that continues as long as v. candidate = 1 and v. step ≤ D (recall that v. step is the variable that controls the deterministic suffix of module RandPhase ). Each trial consists of two rounds: in the first round, v tosses a fair coin, denoted by C v ∈ r { , } ; in the second round, v computes the indicator I C = (cid:87) u ∈ N + G ( U ) ( v ) : u. candidate =1 C u . If C v = 0 and I C = 1, then v resets v. candidate ←
0; otherwise, v. candidate remains 1.If v. candidate is still 1 when v. step is incremented to v. step ← D + 1, then v joins IN . This issensed in the subsequent round by v ’s undecided neighbors that join OUT in response. Notice thatby Corollary 3.63.6, all nodes increment the step variables concurrently to D + 1 and then to D + 2,hence nodes may join IN and OUT only during the penultimate and ultimate rounds, respectively,of phase π .We now turn to analyze Compete during phase π . Assume for the sake of the analysis that anode v ∈ U keeps on participating in the trials in a “vacuous” manner, tossing the C v coins invain, even after v. candidate ←
0, until v. step ← D + 1; this has no influence on the nodes that18ruly participate in the trials as the trials’ outcome is not influenced by any node u ∈ U with u. candidate = 0.Recall that the guarantees of RandPhase ensure that at least c log n rounds have elapsed inphase π whp before node v max resets v max . flag ←
0, where c is an arbitrarily large constant;condition hereafter on this event. Moreover, v max starts to increment variable v max . step only after v max . flag ←
0. Therefore, when a node v ∈ U sets v. step ← D + 1, we know that at least c log n rounds have already elapsed in phase π which means that the undecided nodes participate in atleast (cid:98) c / (cid:99) log n trials during π .Let v. candidate i and C iv denote the values of variable v. candidate and of coin C v , respectively,at the beginning of trial i = 1 , . . . , (cid:98) c / (cid:99) log n for each node v ∈ U . By definition, a node u ∈ U joins OUT only if there exists a node v ∈ N G ( U ) ( u ) that joins IN in the previous round (this holdsdeterministically). Moreover, given two neighboring nodes u, v ∈ U and a trial 1 ≤ i ≤ (cid:98) c / (cid:99) log n ,if C iu (cid:54) = C iv , then either u. candidate i +1 = 0 or v. candidate i +1 = 0. This means that if both u and v join IN during π , then C iu = C iv for every 1 ≤ i ≤ (cid:98) c / (cid:99) log n ; the probability for thisevent is up-bounded by 2 −(cid:98) c / (cid:99) log n = 1 /n (cid:98) c / (cid:99) . Recalling that c is an arbitrarily large constant,we conclude, by the union bound, that if a node u ∈ U joins IN during π , then all its undecidedneighbors join OUT whp.For a node v ∈ U , let Z ( v ) be the random variable that takes on the largest 0 ≤ z ≤ (cid:98) c / (cid:99) log n such that (cid:86) zi =1 C v = 1. Given a node subset W ⊆ U , let Z max ( W ) = max { Z ( v ) : v ∈ W } and let Z ( W ) = { v ∈ W | Z ( v ) = Z max ( W ) } . To complete the analysis, we fix a node u ∈ U and provethat (P1) P ( Z ( W ∪ { u } ) = { u } ) ≥ Ω (cid:16) | W | +1 (cid:17) for every node subset W ⊆ U − { u } ; and (P2) if Z ( N + G ( U ) ( u )) = { u } , then u is certain to join IN .To see that property (P1) holds, notice that the random variables Z ( v ), v ∈ U , are identicallydistributed, hence P ( Z ( W ∪ { u } ) = { u } | |Z ( W ∪ { u } ) | = 1) = | W | +1 for every W ⊆ U − { u } .Since these are also independent Geom(1 /
2) random variables truncated at Z ( v ) ≤ (cid:98) c / (cid:99) log n ,and since any (non-truncated) Geom(1 /
2) random variable does not exceed (cid:98) c / (cid:99) log n whp, weknow that P ( |Z ( W (cid:48) ) | = 1) ≥ Ω(1) for every W (cid:48) ⊆ U (see, e.g., [EW13EW13]). Therefore, P ( Z ( W ∪ { u } ) = { u } ) = P ( Z ( W ∪ { u } ) = { u } | |Z ( W ∪ { u } ) | = 1) · P ( |Z ( W ∪ { u } ) | = 1) ≥ Ω (cid:18) | W | + 1 (cid:19) . It remains to establish property (P2), showing that if Z ( N + G ( U ) ( u )) = { u } , then u joins IN .To this end, we say that u is a sole survivor for trial 0 ≤ i ≤ (cid:98) c / (cid:99) log n if u. candidate i =1 and v. candidate i = 0 for all nodes v ∈ N G ( U ) ( u ). Notice that if u. candidate i = 1 and u. candidate i +1 = 0, then there must exist a node v ∈ N G ( U ) ( u ) such that v. candidate i = 1(and C iv = 1), which means that u is not a sole survivor for trial i . Therefore, if u is a sole sur-vivor for trial i , then u is guaranteed to be a sole survivor for all remaining trials and to join IN subsequently. The analysis is completed as the event Z ( N + G ( U ) ( u )) = { u } implies that u is a solesurvivor for trial Z ( u ) = Z max ( N + G ( U ) ( u )). 19 .1.3 Implementing Module DetectMIS . The implementation of module
DetectMIS is rather straightforward: In every round, each IN node v ∈ V picks a temporary (not necessarily unique) identifier uniformly at random from [ k ] for aconstant k ≥
2. An
OUT node u ∈ V with no neighboring IN node is detected as u does not senseany temporary identifier in its (inclusive) neighborhood (this happens with probability 1). An IN node v with a neighboring IN node is detected when v senses a temporary identifier different fromits own, an event that occurs with probability at least 1 − /k . AlgLE
The LE algorithm
AlgLE share a few design features with
AlgMIS that are presented in this sectionindependently for the sake of completeness. For clarity of the exposition,
AlgLE is presented ina procedural style; converting it to a randomized state machine with O ( D ) states is straightfor-ward. The algorithm is designed assuming that the execution starts concurrently at all nodes;this assumption is plausible due to Thm. 3.13.1 and given the algorithm’s fault detection guarantees(described in the sequel). Algorithm AlgLE progresses in synchronous epochs , where every epochlasts for D rounds. Each node maintains the round number within the current epoch and invokes Restart if an inconsistency with one of its neighbors regarding this round number is detected.The execution of Algorithm
AlgLE starts with a computation stage , followed by a verificationstage . The computation stage is guaranteed to elect exactly one leader whp; it runs for O (log n )epochs in expectation and whp. The verification stage starts once the computation stage halts andcontinues indefinitely thereafter. Its role is to verify that the configuration is correct (i.e., the graphincludes exactly one leader). During the verification stage, a faulty configuration is detected in eachepoch (independently) with a positive constant probability, in which case, Restart is invoked andthe execution of
AlgLE starts from scratch once
Restart is exited. Recalling that the executionof
Restart takes O ( D ) rounds, one concludes by standard probabilistic arguments that AlgLE stabilizes within O ( D log n ) rounds in expectation and whp, thus establishing Thm. 1.31.3. During the computation stage, algorithm
AlgLE runs two modules, denoted by RandCount and Elect. Module RandCount implements a "randomized counter" that signals the nodes when X epochs have elapsed since the beginning of the computation stage, where X is a random variable that satisfies (1) X ≤ O(log n) in expectation and whp; and (2) X ≥ c log n whp for a constant c > 0 that can be made arbitrarily large. Once this signal is received, the nodes halt the computation stage (and start the verification stage).

To implement module RandCount, each node v ∈ V maintains a binary variable, denoted by v.flag ∈ {0, 1}, that is set initially to v.flag ← 1. At the beginning of each epoch, if v.flag is still 1, then v tosses a (biased) coin and resets v.flag ← 0 with probability 1 − p (keeping v.flag = 1 with probability p) for some 0 < p < 1, where p = p(c) is a constant determined by c. The D rounds of the epoch are now employed to allow (all) the nodes to compute the indicator I_flag = ⋁_{u ∈ V} u.flag. If I_flag = 0, then the computation stage is halted. The correctness of RandCount follows from Obs. 3.2.

The role of module
Elect, which runs in parallel to RandCount, is to elect exactly one leader whp. The implementation of Elect is based on a binary variable, denoted by v.candidate ∈ {0, 1}, maintained by each node v ∈ V, that indicates whether v is still a candidate to be elected as a leader. Initially, v sets v.candidate ← 1. At the beginning of each epoch, if v.candidate is still 1, then v tosses a fair coin, denoted by C_v ∈_R {0, 1}. The D rounds of the epoch are then employed to allow v (and all other nodes) to compute the indicator I_C = ⋁_{u ∈ V : u.candidate = 1} C_u. If C_v = 0 and I_C = 1, then v resets v.candidate ← 0; otherwise v.candidate remains 1. If v.candidate is still 1 when the computation stage comes to a halt (recall that this event is determined by module RandCount), then v marks itself as a leader.

To see that module Elect is correct, let v.candidate^i and C^i_v denote the values of variable v.candidate and of coin C_v, respectively, at the beginning of epoch i for each node v ∈ V. Notice that if v.candidate^i = 1 and v.candidate^{i+1} = 0, then there must exist a node u ∈ V such that u.candidate^i = 1 and C^i_u = 1, which implies that u.candidate^{i+1} = 1. Therefore, at least one node v ∈ V survives as a candidate with v.candidate = 1 at the end of each epoch.

Recall that the computation stage, and hence also module Elect, lasts for at least c log n epochs whp, where c is an arbitrarily large constant; condition hereafter on this event. Given two nodes u, v ∈ V, the probability that C^i_u = C^i_v for every i = 1, ..., c log n is upper bounded by 2^{−c log n} = 1/n^c. Observing that if C^i_u ≠ C^i_v, then either u.candidate^{i+1} = 0 or v.candidate^{i+1} = 0, and recalling that c is an arbitrarily large constant, we conclude, by the union bound, that no two nodes survive as candidates when Elect halts whp, thus satisfying the promise of the computation stage.
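The interplay between RandCount and Elect can be illustrated with a small centralized simulation (ours, not part of the paper): each loop iteration is one epoch, the D rounds that disseminate the OR-indicators I_flag and I_C are collapsed into a direct OR over all nodes, and the parameter values n = 64 and p = 0.9 are arbitrary.

```python
import random

def computation_stage(n, p=0.9, rng=random):
    """Centralized sketch of AlgLE's computation stage: each iteration is one
    epoch; the D-round dissemination of I_flag and I_C is collapsed."""
    flag = [1] * n   # RandCount variables v.flag
    cand = [1] * n   # Elect variables v.candidate
    epochs = 0
    while True:
        epochs += 1
        # RandCount: a flag that is still 1 survives the epoch with probability p.
        flag = [f and rng.random() < p for f in flag]
        # Elect: every surviving candidate tosses a fair coin C_v.
        C = [cand[v] and rng.random() < 0.5 for v in range(n)]
        if any(C):  # I_C = 1: candidates that tossed 0 drop out
            cand = [C[v] for v in range(n)]
        if not any(flag):  # I_flag = 0: RandCount signals the halt
            return epochs, sum(cand)

random.seed(0)
epochs, leaders = computation_stage(64)
assert leaders >= 1  # at least one candidate survives every epoch
```

With p = p(c) close enough to 1, the counter whp outlives the O(log n) epochs that Elect needs to thin the candidate set down to a single leader.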
During the verification stage, algorithm AlgLE runs a module denoted by DetectLE. This module is designed to detect configurations that include zero leaders and configurations that include at least two leaders; the former task is performed deterministically (and thus succeeds with probability 1), whereas the latter relies on a (simple) probabilistic tool and succeeds with probability at least p, where 0 < p < 1 is a constant.

Module DetectLE is implemented as follows. If a node v ∈ V is marked as a leader, then at the beginning of each epoch, v picks a temporary (not necessarily unique) identifier id_v uniformly at random from [k], where k is a positive constant integer. The D rounds of the epoch are then employed to verify that there is exactly one temporary identifier in the graph (in the current epoch). To this end, each node u ∈ V encodes, in its state, the first temporary identifier j ∈ [k] that u encounters during the epoch (either by picking j as u's own temporary identifier or by sensing j in its neighbors' states) and invokes module Restart if it encounters any temporary identifier j′ ∈ [k] − {j}; if u does not encounter any temporary identifier until the end of the epoch, then it also invokes Restart. This ensures that (1) if no node is marked as a leader, then all nodes invoke Restart (deterministically); and (2) if two (or more) nodes are marked as leaders, then Restart is invoked by some nodes with probability at least 1 − 1/k. The promise of the verification stage follows as k can be made arbitrarily large.

Restart
In this section, we implement module Restart and establish Thm. 3.1. The module consists of 2D + 1 states denoted by σ(0), σ(1), ..., σ(2D), where states σ(0) and σ(2D) play the role of Restart-entry and Restart-exit, respectively. For a node v ∈ V, we subsequently denote the state in which v resides at time t by q_t(v) and the set of states sensed by v at time t by S_t(v) = {q_t(u) | u ∈ N^+(v)}; we also denote the set of all node states by Q_t = {q_t(v) | v ∈ V}. The implementation of module Restart at node v obeys the following three rules:

• If S_t(v) ∩ {σ(i) | 0 ≤ i ≤ 2D} ≠ ∅ and S_t(v) ⊈ {σ(i) | 0 ≤ i ≤ 2D}, then q_{t+1}(v) ← σ(0).

• If S_t(v) ⊆ {σ(i) | 0 ≤ i ≤ 2D} and S_t(v) ≠ {σ(2D)}, then q_{t+1}(v) ← σ(i_min + 1), where i_min = min{i : σ(i) ∈ S_t(v)}.

• If S_t(v) = {σ(2D)}, then q_{t+1}(v) ← q*.

We now turn to establish Thm. 3.1, starting with the following two observations.
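First, as a sanity check, the three rules of module Restart can be exercised in a toy synchronous simulation (ours, not the paper's pseudocode; the path graph, the choice D = 5, the adversarial initial states, and the encoding of the exit state q* as the string 'q*' are all illustrative assumptions):

```python
def restart_step(states, adj, D):
    """One synchronous round of module Restart; sigma(i) is encoded as the
    integer i (0 <= i <= 2D) and the exit state q* as the string 'q*'."""
    new = list(states)
    for v, nbrs in enumerate(adj):
        sensed = {states[u] for u in nbrs} | {states[v]}  # inclusive neighborhood
        sigma = {s for s in sensed if isinstance(s, int)}
        if sigma and sigma != sensed:
            new[v] = 0               # rule 1: mixed view -> enter at sigma(0)
        elif sigma == {2 * D}:
            new[v] = 'q*'            # rule 3: uniform view sigma(2D) -> exit
        elif sigma:
            new[v] = min(sigma) + 1  # rule 2: advance past the minimum
    return new

# Path on 6 nodes (diameter 5), adversarial initial sigma-states.
D = 5
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
states = [0, 7, 3, 2 * D, 1, 4]
for t in range(4 * D):
    states = restart_step(states, adj, D)
    if all(s == 'q*' for s in states):
        break
assert all(s == 'q*' for s in states)  # all nodes exit...
assert t + 1 <= 3 * D                  # ...concurrently, within 3D rounds
```

The exit is concurrent because, by the time the minimum index reaches 2D, the states have already collapsed to a single σ(i) (cf. Lem. 3.11).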
Observation 3.7. If Q_t ∩ {σ(i) | 0 ≤ i ≤ 2D} ≠ ∅ and Q_t ⊈ {σ(i) | 0 ≤ i ≤ 2D}, then there exists a node v ∈ V that enters Restart in round t so that q_{t+1}(v) = σ(0).

Observation 3.8. If Q_t ⊆ {σ(i) | 0 ≤ i ≤ 2D} and Q_t ≠ {σ(2D)}, then min{i : σ(i) ∈ Q_{t+1}} = min{i : σ(i) ∈ Q_t} + 1.

By combining Obs. 3.7 and 3.8, we conclude that if Q_t ∩ {σ(i) | 0 ≤ i ≤ 2D} ≠ ∅, then there exists a time t ≤ t′ ≤ t + O(D) such that either (1) all nodes exit Restart, concurrently, at time t′; or (2) σ(0) ∈ Q_{t′}. Therefore, to establish Thm. 3.1, it suffices to prove that if σ(0) ∈ Q_t, then there exists a time t ≤ t′ ≤ t + O(D) such that all nodes exit Restart, concurrently, at time t′. This is done based on the following three lemmas.

Lemma 3.9.
Consider a node v ∈ V and suppose that q_t(v) = σ(0). Then, {q_{t+d}(u) | dist_G(u, v) ≤ d} ⊆ {σ(j) | 0 ≤ j ≤ d} for every 0 ≤ d ≤ D.

Proof. By induction on d = 0, 1, ..., D. The assertion holds trivially for d = 0, so assume that the assertion holds for d − 1 and consider a node u ∈ V whose distance from v is dist_G(u, v) = d. Let u′ be the node that precedes u along a shortest (v, u)-path in G. Since dist_G(v, u′) = d − 1, it follows by the inductive hypothesis that q_{t+d−1}(u′) ∈ {σ(j) | 0 ≤ j ≤ d − 1}. As q_{t+d−1}(u′) ∈ S_{t+d−1}(u), we conclude by the design of Restart that q_{t+d}(u) ∈ {σ(j) | 0 ≤ j ≤ d}, thus establishing the assertion.

Lemma 3.10. Assume that Q_t ⊆ {σ(j) | 0 ≤ j ≤ D} and let j_min = min{j : σ(j) ∈ Q_t}. Then, Q_{t+h} ⊆ {σ(i) | j_min + h ≤ i ≤ D + h} for every 0 ≤ h ≤ D.

Proof. By induction on h = 0, 1, ..., D. The assertion holds trivially for h = 0, so assume that the assertion holds for h − 1 and consider a node v ∈ V. The inductive hypothesis guarantees that S_{t+h−1}(v) ⊆ {σ(i) | j_min + h − 1 ≤ i ≤ D + h − 1}. The assertion follows by the design of Restart ensuring that q_{t+h}(v) = σ(i_min + 1), where i_min = min{i : σ(i) ∈ S_{t+h−1}(v)}.

Lemma 3.11.
Assume that Q_t ⊆ {σ(j) | 0 ≤ j ≤ D}. Let j_min = min{j : σ(j) ∈ Q_t} and let v_min be a node with q_t(v_min) = σ(j_min). Then, {q_{t+d}(v) | dist_G(v_min, v) ≤ d} = {σ(j_min + d)} for every 0 ≤ d ≤ D.

Proof. By induction on d = 0, 1, ..., D. The assertion holds trivially for d = 0, so assume that the assertion holds for d − 1 and consider a node v ∈ V whose distance from v_min is dist_G(v_min, v) = d. Let v′ be the node that precedes v along a shortest (v_min, v)-path in G. Since dist_G(v_min, v′) = d − 1, it follows by the inductive hypothesis that q_{t+d−1}(v′) = σ(j_min + d − 1). By Lem. 3.10, S_{t+d−1}(v) ⊆ {σ(i) | j_min + d − 1 ≤ i ≤ D + d − 1} and D + d − 1 < 2D, hence min{i : σ(i) ∈ S_{t+d−1}(v)} = j_min + d − 1 and q_{t+d}(v) = σ(j_min + d); the same argument applies to every node v with dist_G(v_min, v) < d since, by the inductive hypothesis, q_{t+d−1}(v) = σ(j_min + d − 1). This establishes the assertion.

We are now ready to complete the proof of Thm. 3.1. Consider a node v ∈ V that satisfies q_t(v) = σ(0). By employing Lem. 3.9 with d = D, we deduce that Q_{t+D} ⊆ {σ(j) | 0 ≤ j ≤ D}. Therefore, we can employ Lem. 3.11 with d = D to conclude that there exists an index D ≤ i ≤ 2D such that Q_{t+2D} = {σ(i)}. From time t + 2D onwards, all nodes "progress in synchrony" until time t + 2D + (2D − i) = t + 4D − i at which we get Q_{t+4D−i} = {σ(2D)}. Thus, all nodes exit Restart, concurrently, in round t + 4D − i ≤ t + 3D.

Consider a distributed task T, restricted to D-bounded diameter graphs, and let Π = ⟨Q, Q_O, ω, δ⟩ be a synchronous self-stabilizing algorithm for T whose stabilization time on n-node instances is bounded by f(n, D) in expectation and whp. Our goal in this section is to lift the synchronous schedule assumption, thus establishing Corollary 1.2. Specifically, we employ the self-stabilizing AU algorithm AlgAU promised in Thm.
1.1, combined with the ideas behind the non-self-stabilizing SA transformer of [EW13] (see also [AEK18b]), to develop a synchronizer that converts Π into a self-stabilizing algorithm Π* = ⟨Q*, Q*_O, ω*, δ*⟩ for T with state space |Q*| ≤ O(D · |Q|) whose stabilization time on n-node instances is bounded by f(n, D) + O(D) in expectation and whp for any (arbitrarily asynchronous) schedule.

Let K be the cyclic group corresponding to the AU clock values. Let T and T_K be the state set and output state set, respectively, of AlgAU. The state set Q* of Π* is defined to be the Cartesian product Q* = Q × Q × T. We also define Q*_O = Q_O × Q × T_K and, for each output Π*-state s = (q, q′, ν) ∈ Q*_O, define ω*(s) = ω(q).

Consider a node v ∈ V residing in a state s = (q, q′, ν) ∈ Q* of Π*. The state transition function δ* of Π* is designed so that Π* simulates the operation of AlgAU, encoding AlgAU's current state in the third coordinate of s. Once AlgAU has stabilized, Π* uses the first two coordinates of s to simulate the operation of Π every time AlgAU advances its clock value, interpreting q and q′ as v's current and previous Π-states, respectively.

More formally, suppose that node v is activated at time t and that AlgAU advances its clock value by changing its output state from ν ∈ T_K to ν′ ∈ T_K in step t. When this happens, node v moves from Π*-state s = (q, q′, ν) ∈ Q* to Π*-state s′ = (p, q, ν′) ∈ Q*, where the Π-state p is determined according to the following mechanism: Let S^t_{v,Π} ∈ {0, 1}^Q be the simulated Π-signal of v at time t, defined by setting S^t_{v,Π}(r) = 1, r ∈ Q, if and only if v senses at time t at least one Π*-state of the form (r, ·, ν) or (·, r, ν′). The Π-state p is then determined by applying the state transition function of Π to q and S^t_{v,Π}, that is, p is picked uniformly at random from δ(q, S^t_{v,Π}).

The algorithmic model considered in the current paper is a restricted version of the SA model introduced by Emek and Wattenhofer [EW13] and studied subsequently by Afek et al. [AEK18a, AEK18b] and Emek and Uitto [EU20]. Specifically, the communication scheme in the latter model relies on asynchronous message passing, thus enhancing the power of the adversarial scheduler by allowing it to determine not only the node activation pattern, but also the time delay of each transmitted message. Whether our algorithmic results can be modified to work with such a (stronger) scheduler is left as an open question.
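The per-activation mechanism of the synchronizer Π* can be sketched as follows (an illustrative sketch of ours: the triple encoding, the toy transition function `delta`, and the omission of the mod-K wraparound are simplifying assumptions, and the step is assumed to be one in which the simulated AlgAU clock ticks):

```python
import random

def pi_star_step(state, sensed, delta, rng=random):
    """One activation of a node in Pi*, in a step where its simulated AlgAU
    clock ticks from nu to nu + 1 (mod-K wraparound omitted). `state` and
    the elements of `sensed` are triples (q, q_prev, nu)."""
    q, q_prev, nu = state
    nu_next = nu + 1
    # Simulated Pi-signal: Pi-state r is sensed iff some neighbor exposes r
    # as its current state at clock nu, or as its previous state at nu + 1.
    signal = set()
    for p, p_prev, mu in sensed:
        if mu == nu:
            signal.add(p)
        elif mu == nu_next:
            signal.add(p_prev)
    # Apply Pi's randomized transition function; q becomes the previous state.
    p_new = rng.choice(sorted(delta(q, signal)))
    return (p_new, q, nu_next)

# Toy Pi over states {0, 1}: a node adopts 1 as soon as it senses a 1.
delta = lambda q, S: {1} if 1 in S else {q}
u = (1, 0, 3)                       # neighbor already holding Pi-state 1
v = (0, 0, 3)
v = pi_star_step(v, [u, v], delta)  # inclusive sensing: v sees u and itself
assert v == (1, 0, 4)
```

Keeping the previous Π-state q′ alongside q is what lets a neighbor that has already ticked to clock ν + 1 still expose the Π-state it held at clock ν.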
The reader is referred to [AEK18a, AEK18b] for a discussion of various other aspects of the SA model and its variants.

The communication scheme of the SA model is closely related to the beeping model [CK10, FW10], where in every (synchronous) round, each node either listens or beeps, and a listening node receives a binary signal indicating whether at least one of its neighbors beeps in that round (see also the set-broadcast (SB) communication model of [HJK+15]). Existing beeping algorithms for tasks such as MIS [AAB+11, MRSZ11, SJX13] assume that the nodes know (some function of) n and that this parameter cannot be modified by the adversary. (In [SJX13], the knowledge of n is implicit and is only required for bounding the initial values in the nodes' registers.) In contrast, our algorithmic model is inherently size-uniform as the nodes cannot even encode (any function of) n in their internal memory.

A beeping algorithm that is more closely related to the computational limitations of our model is that of Gilbert and Newport [GN15] for LE in complete graphs. This algorithm is implemented by nodes with constant size internal memory, hence it can be viewed as a SA algorithm with a single message type. In fact, one of the techniques used in the current paper for implementing a probabilistic counter resembles a technique used also in [GN15]. Notice though that the algorithm of [GN15] is not only restricted to complete graphs, but also requires a synchronous schedule and cannot cope with transient faults; in this regard, it is less robust than our LE algorithm.

The AU task was introduced by Couvreur et al. [CFG92] as a fundamental primitive for asynchronous systems. Shortly after, Awerbuch et al. [AKM+93] observed that this task captures the essence of constructing a self-stabilizing synchronizer and developed a self-stabilizing anonymous size-uniform AU algorithm that stabilizes in O(D) time, albeit with an unbounded state space. By incorporating a reset module into their algorithm, Awerbuch et al. obtained a self-stabilizing AU algorithm with a bounded state space and the same asymptotic stabilization time; however, the reset module requires unique node IDs and/or the knowledge of n (or an approximation thereof), which means in particular that its state space is Ω(log n); it also relies on unicast communication.

Since then, the AU task has been extensively investigated in different computational models and for a variety of graph classes [BPV04, BPV05, BPV06, DP12, DJ19]. For general graphs, Boulinier et al. [BPV04] developed a self-stabilizing AU algorithm that can be implemented under a set-broadcast communication model (very similar to the communication model used in the current paper). The state space and stabilization time bounds of their algorithm are linear in the length C_G of the maximal cycle of the shortest maximum cycle basis and the length T_G of the longest chordless cycle of the underlying graph G. Notice that the performance of the AU algorithm of Boulinier et al. cannot be directly compared to that of our AU algorithm: on the one hand, there are graphs of linear diameter in which C_G + T_G = O(1); on the other hand, there are graphs of constant diameter in which C_G + T_G = Ω(n).

References

[AAB+
11] Yehuda Afek, Noga Alon, Ziv Bar-Joseph, Alejandro Cornejo, Bernhard Haeupler, and Fabian Kuhn. Beeping a maximal independent set. In David Peleg, editor, Distributed Computing - 25th International Symposium, DISC 2011, Rome, Italy, September 20-22, 2011. Proceedings, volume 6950 of Lecture Notes in Computer Science, pages 32-50. Springer, 2011.

[ABI86] Noga Alon, László Babai, and Alon Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567-583, 1986.

[ADDP19] Karine Altisen, Stéphane Devismes, Swan Dubois, and Franck Petit. Introduction to Distributed Self-Stabilizing Algorithms. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers, 2019.

[AEK18a] Yehuda Afek, Yuval Emek, and Noa Kolikant. Selecting a leader in a network of finite state machines. In Ulrich Schmid and Josef Widder, editors, 32nd International Symposium on Distributed Computing, DISC 2018, volume 121 of LIPIcs, pages 4:1-4:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.

[AEK18b] Yehuda Afek, Yuval Emek, and Noa Kolikant. The synergy of finite state machines. In Jiannong Cao, Faith Ellen, Luis Rodrigues, and Bernardo Ferreira, editors, 22nd International Conference on Principles of Distributed Systems, OPODIS 2018, volume 125 of LIPIcs, pages 22:1-22:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.

[AKM+93] Baruch Awerbuch, Shay Kutten, Yishay Mansour, Boaz Patt-Shamir, and George Varghese. Time optimal self-stabilizing synchronization. In S. Rao Kosaraju, David S. Johnson, and Alok Aggarwal, editors, Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA, pages 652-661. ACM, 1993.

[Awe85] Baruch Awerbuch. Complexity of network synchronization. J. ACM, 32(4):804-823, 1985.

[BPV04] Christian Boulinier, Franck Petit, and Vincent Villain. When graph theory helps self-stabilization. In Soma Chaudhuri and Shay Kutten, editors, Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, PODC 2004, St. John's, Newfoundland, Canada, July 25-28, 2004, pages 150-159. ACM, 2004.

[BPV05] Christian Boulinier, Franck Petit, and Vincent Villain. Synchronous vs. asynchronous unison. In Ted Herman and Sébastien Tixeuil, editors, Self-Stabilizing Systems, 7th International Symposium, SSS 2005, Barcelona, Spain, October 26-27, 2005, Proceedings, volume 3764 of Lecture Notes in Computer Science, pages 18-32. Springer, 2005.

[BPV06] Christian Boulinier, Franck Petit, and Vincent Villain. Toward a time-optimal odd phase clock unison in trees. In Ajoy Kumar Datta and Maria Gradinariu, editors, Stabilization, Safety, and Security of Distributed Systems, 8th International Symposium, SSS 2006, Dallas, TX, USA, November 17-19, 2006, Proceedings, volume 4280 of Lecture Notes in Computer Science, pages 137-151. Springer, 2006.

[CFG92] Jean-Michel Couvreur, Nissim Francez, and Mohamed G. Gouda. Asynchronous unison (extended abstract). In Proceedings of the 12th International Conference on Distributed Computing Systems, Yokohama, Japan, June 9-12, 1992, pages 486-493. IEEE Computer Society, 1992.

[CK10] Alejandro Cornejo and Fabian Kuhn. Deploying wireless networks with beeps. In Nancy A. Lynch and Alexander A. Shvartsman, editors, Distributed Computing, 24th International Symposium, DISC 2010, Cambridge, MA, USA, September 13-15, 2010. Proceedings, volume 6343 of Lecture Notes in Computer Science, pages 148-162. Springer, 2010.

[Dij74] Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM, 17(11):643-644, 1974.

[DJ19] Stéphane Devismes and Colette Johnen. Self-stabilizing distributed cooperative reset. In 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, pages 379-389. IEEE, 2019.

[Dol00] Shlomi Dolev. Self-Stabilization. MIT Press, 2000.

[DP12] Stéphane Devismes and Franck Petit. On efficiency of unison. In Lélia Blin and Yann Busnel, editors, 4th Workshop on Theoretical Aspects of Dynamic Distributed Systems, TADDS 2012, pages 20-25. ACM, 2012.

[DT11] Swan Dubois and Sébastien Tixeuil. A taxonomy of daemons in self-stabilization. CoRR, abs/1110.0334, 2011.

[EU20] Yuval Emek and Jara Uitto. Dynamic networks of finite state machines. Theor. Comput. Sci., 810:58-71, 2020.

[EW13] Yuval Emek and Roger Wattenhofer. Stone age distributed computing. In Panagiota Fatourou and Gadi Taubenfeld, editors, ACM Symposium on Principles of Distributed Computing, PODC '13, Montreal, QC, Canada, July 22-24, 2013, pages 137-146. ACM, 2013.

[FW10] Roland Flury and Roger Wattenhofer. Slotted programming for sensor networks. In Tarek F. Abdelzaher, Thiemo Voigt, and Adam Wolisz, editors, Proceedings of the 9th International Conference on Information Processing in Sensor Networks, IPSN 2010, April 12-16, 2010, Stockholm, Sweden, pages 24-34. ACM, 2010.

[GN15] Seth Gilbert and Calvin C. Newport. The computational power of beeps. In Yoram Moses, editor, Distributed Computing - 29th International Symposium, DISC 2015, Tokyo, Japan, October 7-9, 2015, Proceedings, volume 9363 of Lecture Notes in Computer Science, pages 31-46. Springer, 2015.

[HJK+15] Lauri Hella, Matti Järvisalo, Antti Kuusisto, Juhana Laurinharju, Tuomo Lempiäinen, Kerkko Luosto, Jukka Suomela, and Jonni Virtema. Weak models of distributed computing, with connections to modal logic. Distributed Comput., 28(1):31-53, 2015.

[MRSZ11] Yves Métivier, John Michael Robson, Nasser Saheb-Djahromi, and Akka Zemmari. An optimal bit complexity randomized distributed MIS algorithm. Distributed Comput., 23(5-6):331-340, 2011.

[SJX13] Alex Scott, Peter Jeavons, and Lei Xu. Feedback from nature: an optimal distributed algorithm for maximal independent set selection. In Panagiota Fatourou and Gadi Taubenfeld, editors, ACM Symposium on Principles of Distributed Computing, PODC '13, Montreal, QC, Canada, July 22-24, 2013, pages 147-156. ACM, 2013.
APPENDIX

A A Failed Attempt
In this section, we present a failed attempt to design a self-stabilizing AU algorithm based on the design feature of restarting the algorithm when a fault is detected. Specifically, the algorithm consists of two components: the main component is responsible for the liveness condition, controlling the execution when no faults occur; the second component is a reset mechanism, responsible for restarting the execution from a fault free initial configuration when a fault is detected.

Given a constant c > 1, let T = {ℓ | 0 ≤ ℓ ≤ cD} be the set of turns of the main component and let R = {R_i | 0 ≤ i ≤ cD} be the set of reset turns. For a node v ∈ V, let θ^t_v be the turn of v at time t and let Θ^t_v = {θ^t_u | u ∈ N^+(v)} be the set of turns that v senses at time t. The protocol has three types of state transitions, presented from the perspective of a node v ∈ V.

State transition of type (ST1).
The first type of state transitions is equivalent to the type AA transitions of AlgAU. Suppose that v is activated at time t and that θ^t_v = ℓ ∈ T, and let ℓ′ = ℓ + 1 mod 2D. Then, v performs a type (ST1) transition if Θ^t_v ⊆ {ℓ, ℓ′}. This type of state transition updates the turn of v from θ^t_v = ℓ to θ^{t+1}_v = ℓ′.

State transition of type (ST2). The second type of state transitions is applied when v senses a fault. Specifically, suppose that v is activated at time t and that θ^t_v = ℓ ∈ T, and let ℓ′ = ℓ + 1 mod (cD + 1) and ℓ″ = ℓ − 1 mod (cD + 1). Then, (1) if ℓ ≠ 0, then v performs a type (ST2) transition if Θ^t_v ⊈ {ℓ, ℓ′, ℓ″}; and (2) if ℓ = 0, then v performs a type (ST2) transition if Θ^t_v ⊈ {ℓ, ℓ′, ℓ″, R_cD}. This type of state transition updates the turn of v from θ^t_v = ℓ to θ^{t+1}_v = R_0.

State transition of type (ST3). The third type of state transitions is responsible for the progress of the reset mechanism. Suppose that v is activated at time t and that θ^t_v = R_i. Then, v performs a type (ST3) transition if either (1) i ≠ cD and Θ^t_v ⊆ {R_j | i ≤ j ≤ cD}; or (2) i = cD and Θ^t_v ⊆ {R_cD, 0}. This type of state transition updates the turn of v from θ^t_v = R_i to (1) θ^{t+1}_v = R_{i+1} if i ≠ cD; (2) θ^{t+1}_v = 0 if i = cD.

Counter Example
Consider the configuration depicted in Figure 2(a), where D = 2, and assume that c = 2 (the example can be easily adapted to other choices of the constant c). Suppose that node v_{t−1} is activated in step t for t = 1, ..., 8. Notice that (1) nodes v_0 and v_1 do not change their turns; (2) node v_2 performs a type (ST2) transition; and (3) node v_i performs a type (ST3) transition for 3 ≤ i ≤ 7.

FIGURES AND TABLES
Table 1: The transition types of AlgAU in step t.

Type | Pre-transition turn | Post-transition turn | Condition
AA   | ℓ, 1 ≤ |ℓ| ≤ k      | φ_{+1}(ℓ)           | v is good and Λ^t_v ⊆ {ℓ, φ_{+1}(ℓ)}
AF   | ℓ, 2 ≤ |ℓ| ≤ k      | ℓ̂                   | v ∉ V^t_p or v senses turn ψ̂_{−1}(ℓ)
FA   | ℓ̂, 2 ≤ |ℓ| ≤ k      | ψ_{−1}(ℓ)           | Λ^t_v ∩ Ψ_{>}(ℓ) = ∅

Figure 1: The turns of AlgAU and their transition diagram. The type AA transitions, type AF transitions, and type FA transitions are depicted by the solid (black) arrows, dashed (red) arrows, and dotted (blue) arrows, respectively.

Figure 2: The configurations (a) and (b) of nodes v_0, ..., v_7 used in the counter example.