Improved Deterministic Network Decomposition
Mohsen Ghaffari
ETH Zurich
Christoph Grunau
Václav Rozhoň
Abstract
Network decomposition is a central tool in distributed graph algorithms. We present two improvements on the state of the art for network decomposition, which thus lead to improvements in the (deterministic and randomized) complexity of several well-studied graph problems.

- We provide a deterministic distributed network decomposition algorithm with O(log^5 n) round complexity, using O(log n)-bit messages. This improves on the O(log^7 n)-round algorithm of Rozhoň and Ghaffari [STOC'20], which used large messages, and their O(log^8 n)-round algorithm with O(log n)-bit messages. This directly leads to similar improvements for a wide range of deterministic and randomized distributed algorithms whose solution relies on network decomposition, including the general distributed derandomization of Ghaffari, Harris, and Kuhn [FOCS'18].

- One drawback of the algorithm of Rozhoň and Ghaffari, in the CONGEST model, was its dependence on the length of the identifiers. Because of this, for instance, the algorithm could not be used in the shattering framework in the CONGEST model. Thus, the state-of-the-art randomized complexity of several problems in this model remained with an additive 2^{O(√(log log n))} term, which was a clear leftover of the older network decomposition complexity [Panconesi and Srinivasan STOC'92]. We present a modified version that remedies this, constructing a decomposition whose quality does not depend on the identifiers, and thus improves the randomized round complexity for various problems.

1 Introduction and Related Work
Network decomposition is a central tool in distributed graph algorithms that was first introduced in the seminal work of Awerbuch, Goldberg, Luby, and Plotkin [AGLP89]. Currently, the complexity of a wide range of deterministic and randomized distributed algorithms for various local graph problems rests on the complexity of network decomposition. In this work, we present (quantitative and qualitative) improvements on the state-of-the-art network decomposition algorithm.
Distributed Model: We work with the standard synchronous message-passing model of distributed algorithms on networks. The network is abstracted as an n-node graph and there is one processor on each node of the graph. Per round, each processor/node can send one message to each neighbor. If the message size is unbounded, this is known as the LOCAL model [Lin87]. If the message size is bounded to some B bits, this is known as the CONGEST model; the typical assumption then is that B = Θ(log n). Initially, nodes do not know the topology of the network G, except for potentially some estimates of basic global parameters such as the number of nodes n (accurate up to a polynomial factor). When discussing deterministic algorithms, we assume that each node has a unique b-bit identifier, and again the most typical case is to assume b = Θ(log n). At the end of the algorithm, each node should know its own part of the output, e.g., its own color when coloring the vertices. The main measure of interest is the round complexity of the algorithm, i.e., the number of rounds until all nodes have finished their computation.

Network Decomposition: A (C, D) network decomposition of a graph G = (V, E) is a partition of the vertices into disjoint clusters such that each cluster has diameter at most O(D) and where the clusters are colored with O(C) colors in a way that adjacent clusters have different colors. A small subtlety is in the definition of the term "diameter", according to which we can categorize decompositions into two types: (A) in a strong-diameter decomposition, any two vertices of a cluster have distance O(D) in the subgraph induced by that cluster; (B) in a weak-diameter decomposition, any two vertices of a cluster have distance O(D) in the base graph G. For any n-node graph, there is an (O(log n), O(log n)) network decomposition, and this can be computed sequentially via a simple ball carving algorithm [AP90, LS93]. Network decomposition is immediately useful for distributed algorithms. As a simple example, given a (C, D) network decomposition (even with weak diameter), we can compute a maximal independent set (MIS) of the graph in O(CD) rounds in the LOCAL model, by simulating the corresponding sequential greedy algorithm, as follows: We process the colors one by one. Per color, each cluster gathers the topology of the cluster and its immediate neighborhood at the center of the cluster, in O(D) rounds, and decides which vertices of the cluster can be added to the MIS. See [GKM17] for a more general explanation of how one can transform a certain class of sequential algorithms (formally, in the SLOCAL model) to distributed algorithms in the LOCAL model, using network decomposition, with an O(CD) overhead in locality. See also Section 1.4 for other related work.

Deterministic Algorithms: Awerbuch et al. [AGLP89] gave an algorithm that deterministically computes a (
C, D) strong-diameter network decomposition in T rounds (even in the CONGEST model), where C = D = T = 2^{O(√(log n · log log n))}. Panconesi and Srinivasan [PS92] provided a variant of this deterministic algorithm (for the LOCAL model) that improved the bounds to C = D = T = 2^{O(√(log n))}. However, this 2^{O(√(log n))} bound remained the state-of-the-art complexity for network decomposition for over 25 years. It also remained the state-of-the-art deterministic complexity for a long list of other fundamental graph problems whose deterministic solutions relied on network decomposition, including maximal independent set, ∆ + 1 coloring, Lovász Local Lemma, etc., and which were known to admit poly(log n)-round randomized algorithms. This significant gap between randomized and deterministic algorithms was a central open problem in distributed graph algorithms; see, e.g., the open problems chapter of the 2013 book by Barenboim and Elkin [BE13]. Surprisingly, it was also (a provable) bottleneck in the complexity of many randomized algorithms, as shown by Chang, Kopelowitz, and Pettie [CKP16]. See Section 1.4 for some other related work.

Recently, Rozhoň and Ghaffari [RG20] presented the first deterministic decomposition algorithm with poly-logarithmic parameters and complexity. Concretely, they obtained an (O(log n), O(log^3 n)) weak-diameter decomposition in O(log^7 n) rounds of the LOCAL model or O(log^8 n) rounds of the CONGEST model. They also explained how this leads to a strong-diameter decomposition with similar poly-logarithmic parameters in the LOCAL model. These results led to the first poly(log n)-round deterministic distributed algorithms for a wide range of local graph problems, as well as significant improvements for many randomized algorithms (in the shattering framework; see, e.g., [BEPS16, Gha16, CLP18, CFG+]). Our contributions provide improvements on the result of Rozhoň and Ghaffari [RG20], in two essentially orthogonal directions:
Direction 1 – Faster Decomposition, and Applications: Our first contribution is to present a faster algorithm that also computes a qualitatively better network decomposition:
Theorem 1.1 (Informal Version of Theorem 2.1). There is a deterministic distributed algorithm, in the CONGEST model, that computes an (O(log n), O(log^2 n)) network decomposition in O(log^5 n) rounds.

This should be contrasted with the (O(log n), O(log^3 n)) network decomposition of Rozhoň and Ghaffari [RG20] that had an O(log^7 n) round complexity in the LOCAL model and an O(log^8 n) round complexity in the CONGEST model. As in their work, in the LOCAL model, one can turn this into a strong-diameter (O(log n), O(log^2 n)) network decomposition, in O(log^5 n) rounds.

Our faster O(log^5 n)-round algorithm immediately leads to a similar round complexity improvement for all the applications of deterministic network decomposition. As concrete examples, we show how we can deterministically solve the maximal independent set and ∆ + 1 coloring problems in O(log^5 n) rounds of the CONGEST model. These algorithms improve on the O(log^7 n)-round LOCAL-model algorithms of Rozhoň and Ghaffari [RG20], as well as the O(log^8 n)-round CONGEST-model algorithms of Censor-Hillel et al. [CHPS17, RG20] for MIS and of Bamberger et al. [BKM20] for coloring. We comment that these improvements, besides the new network decomposition, also use some other ideas for pipelining information in the CONGEST model to save an additional factor of log n.

Corollary 1.2. There is a deterministic distributed algorithm, in the CONGEST model, that computes a maximal independent set in O(log^5 n) rounds.

Corollary 1.3. There is a deterministic distributed algorithm, in the CONGEST model, that computes a ∆ + 1 coloring, where ∆ is an upper bound on the maximum degree, in O(log^5 n) rounds.

Direction 2 – Identifier-Independent Decomposition, with Application: One drawback of the construction of Rozhoň and Ghaffari [RG20] in the
CONGEST model was that the quality of the obtained network decomposition depends on the length of the identifiers provided. For instance, in a network with b-bit identifiers (where thus b-bit messages are permitted), their algorithm computes an (O(log n), O(b^2 log n)) decomposition, in poly(b, log n) rounds. This bad dependency on the length of the identifiers becomes a bottleneck in some applications: in particular, it was not possible to use their algorithm in the shattering framework for randomized algorithms with small messages, and the best known algorithms in the CONGEST model remained with a 2^{O(√(log log n))} term in the round complexity, which was a clear remnant of the old 2^{O(√(log n))} round complexity of deterministic network decomposition [PS92, Gha19]. We present a variant of their algorithm that computes an (O(log n), O(log^3 n)) decomposition in O(log^8 n + log^7 n · log* b) rounds in the setting with b-bit identifiers and using b-bit messages. This is achieved by replacing the reliance of the construction's invariant on the bits of the identifiers by some semi-balanced 2-coloring of the clusters, which is computed in the course of the construction.

Furthermore, we show that this second improvement is compatible with the first, in the sense that we can put the two ideas together and get a faster algorithm that constructs an identifier-independent network decomposition. In particular, we get an algorithm that computes an (O(log n), O(log^2 n)) decomposition in O(log^5 n + log^4 n · log* b) rounds in the setting with b-bit identifiers and using b-bit messages.

Theorem 1.4 (Informal Version of Theorem 4.1). There is a deterministic distributed algorithm, in the CONGEST model, that computes an (O(log n), O(log^2 n)) network decomposition in O(log^5 n + log^4 n · log* b) rounds in the setting with b-bit identifiers and using b-bit messages.

This leads to improvements for randomized algorithms in the CONGEST model, in the shattering framework. For instance, for MIS, we get this result:
Corollary 1.5. There is a randomized distributed algorithm that computes a maximal independent set in O(log ∆ · log log n + log^5 log n) rounds of the CONGEST model, with high probability.
In contrast, the previous best algorithm had complexity O(log ∆ · log log n) + 2^{O(√(log log n))} [GP19]. We get a similar result for ∆ + 1 coloring.

Corollary 1.6. There is a randomized distributed algorithm, in the CONGEST model, that computes a ∆ + 1 coloring in any n-node graph with maximum degree at most ∆ in O(log ∆ + log^5 log n) rounds, with high probability.

1.4 Related Work

Here, we discuss some of the other related work that was not mentioned before.
Usages of Decompositions: Network decomposition has been a central algorithmic tool in distributed graph algorithms since the work of Awerbuch et al. [AGLP89]. The work of [GKM17, GHK18] generalized this much further: (1) [GKM17] showed that one can use algorithms for (C, D) decomposition to transform any sequential local algorithm (formally, in the SLOCAL model defined by [GKM17]) to the LOCAL model with only a slow down proportional to CD, when using a (C, D) decomposition algorithm. (2) The work of [GHK18] showed how, using the former together with the method of conditional expectation, one can derandomize any T-round randomized LOCAL-model algorithm, for any problem whose solution can be checked deterministically in R rounds, to a deterministic LOCAL-model algorithm with round complexity O(CD(R + T)) plus the time necessary to construct the network decomposition. Because of this, and the recent network decomposition algorithm of Rozhoň and Ghaffari [RG20], there is now a general efficient derandomization theorem for the LOCAL model, which states that any poly(log n)-round randomized LOCAL-model algorithm for any locally checkable problem can be transformed to a deterministic LOCAL-model algorithm for the same problem, with only a poly(log n)-round slow down. With our improved network decomposition, the slow down is now improved to O(log^5 n).

Decomposition Construction, Randomized Algorithms: Linial and Saks [LS93] gave a randomized algorithm that computes an (O(log n), O(log n)) weak-diameter network decomposition in O(log^2 n) rounds of the CONGEST model, with high probability. Elkin and Neiman [EN16] presented a randomized algorithm that computes an (O(log n), O(log n)) strong-diameter network decomposition in O(log^2 n) rounds of the CONGEST model, with high probability.
Decomposition Construction, Other Deterministic Results: Let us also mention some other deterministic results on constructing decompositions. As discussed before, the classic deterministic algorithm of Panconesi and Srinivasan provided a (C, D) decomposition in T rounds of the LOCAL model where C = D = T = 2^{O(√(log n))}. Awerbuch et al. [ABCP96] showed that, in the LOCAL model, one can turn this into an (O(log n), O(log n)) decomposition in 2^{O(√(log n))} rounds. Ghaffari [Gha19] gave a network decomposition algorithm matching the C = D = T = 2^{O(√(log n))} bounds of Panconesi and Srinivasan in the CONGEST model. Ghaffari and Portmann [GP19] gave an extension of this to power graphs G^k: in k · 2^{O(√(log n))} rounds of the CONGEST model, their algorithm creates clusters colored with 2^{O(√(log n))} colors, such that clusters of the same color have distance at least k, and each cluster has diameter at most k · 2^{O(√(log n))} in the graph G. They also discussed the applications of this power-graph decomposition for various problems including MIS, spanners, dominating set approximation, and neighborhood covers. The bounds were improved considerably in the work of Rozhoň and Ghaffari [RG20]: in k · poly(log n) rounds of the CONGEST model, their algorithm creates clusters colored with O(log n) colors, such that clusters of the same color have distance at least k, and each cluster has diameter O(k log n) in the graph G.

2 Faster Network Decomposition

In this section we state our first technical contribution, a faster network decomposition algorithm.
Theorem 2.1. Let G be a graph on n nodes where each node has a unique b = O(log n)-bit identifier. There is a deterministic distributed algorithm that computes a network decomposition of G with O(log n) colors and weak-diameter O(log^2 n), in O(log^5 n) rounds of the CONGEST model with Θ(log n)-sized messages. Moreover, for each cluster C of vertices in the output network decomposition, we have a Steiner tree T_C with radius O(log^2 n) in G, for which the set of terminal nodes is equal to C. Each vertex of G is in O(log n) Steiner trees of any given color out of the O(log n) color classes.

Our improvement of the decomposition result of [RG20] comes from an improvement of their ball carving algorithm. That is, we get a faster O(log^4 n)-round algorithm that clusters at least half of the yet unclustered vertices into non-adjacent clusters, each cluster having a weak diameter of O(log^2 n). We remark that there is a randomized ball carving algorithm that, in O(log n) rounds of the CONGEST model, clusters at least half of the vertices into non-adjacent clusters with O(log n) weak-diameter [LS93], and one can also achieve the same with strong diameter [EN16]. These directly lead to (O(log n), O(log n)) weak- and strong-diameter decompositions in these two papers [LS93, EN16], in O(log^2 n) rounds of the CONGEST model.
Theorem 2.2 (Ball carving algorithm). Consider an arbitrary n-node graph G where each node has a unique b = O(log n)-bit identifier, together with a subset S ⊆ V of living vertices. There is a deterministic distributed algorithm that in O(log^4 n) rounds of the CONGEST model finds a subset S' ⊆ S of living vertices, where |S'| ≥ |S|/2, such that the subgraph G[S'] induced by S' is partitioned into non-adjacent disjoint clusters, each of weak-diameter O(log^2 n) in G. Moreover, for each cluster C of vertices, we have a Steiner tree T_C with radius O(log^2 n) in G for which the set of terminal nodes is equal to C. Each vertex in G is in O(log n) Steiner trees.
Theorem 2.1 is obtained by log n applications of Theorem 2.2, starting from S = V. For each iteration j ∈ [1, log n], the set S' consists of exactly the nodes of color j in the network decomposition, and we continue to the next iteration by setting S ← S \ S'.

The rest of this section describes the distributed ball carving algorithm that proves Theorem 2.2. Our algorithm builds on the algorithm of Rozhoň and Ghaffari [RG20]. Thus, before proving Theorem 2.2, we start by reviewing their algorithm. Afterwards, we discuss where their algorithm has room for improvement and how our algorithm makes use of that.
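The peeling loop above can be sketched in a few lines of centralized Python; here `ball_carving` is an abstract stand-in for the (distributed) routine of Theorem 2.2, and the function and variable names are ours:

```python
def decompose(nodes, ball_carving):
    """Assign each node a color class by repeatedly carving out at least half
    of the remaining living vertices; ball_carving(S) must return a subset S'
    of S with |S'| >= |S|/2 (the clustered vertices of Theorem 2.2)."""
    S, color = set(nodes), {}
    j = 0
    while S:
        j += 1                        # at most floor(log2 n) + 1 iterations
        clustered = ball_carving(S)
        for v in clustered:
            color[v] = j              # clusters carved in iteration j get color j
        S -= clustered                # only yet-unclustered vertices continue
    return color
```

With a toy carving routine that clusters exactly half (rounded up) of S, eight nodes receive a color within four iterations, matching the logarithmic bound on the number of color classes.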
A Recap of the Ball Carving Algorithm of Rozhoň and Ghaffari: The ball carving algorithm of Rozhoň and Ghaffari [RG20] that produces the clusters of one color class runs in O(log^6 n) rounds of the LOCAL model and the weak-diameter of each cluster is bounded by O(log^3 n). Theorem 2.2 improves these two bounds to O(log^4 n) and O(log^2 n), respectively. In the original algorithm, at each point in time, a node in S is either living or dead. Once a node is dead, it remains dead. Each living vertex is part of some cluster at every point in time, where each cluster is simply some set of vertices that changes over time. At the beginning of the algorithm, each node forms a singleton cluster and the ID of that cluster is simply the b-bit identifier of the node. Throughout the algorithm, new nodes might join a given cluster, whereas other nodes might leave the cluster in order to join different clusters or because they got killed. The ID of the cluster does not change throughout the algorithm. A cluster might also cease to exist if all of its nodes either got killed or decided to join a different cluster. After the algorithm terminates, at least half of the vertices in S are still alive. Moreover, each cluster is the union of one or more connected components in the graph induced by all the alive vertices. That is, there are no two neighboring nodes that are contained in different clusters.

The algorithm consists of b phases. The following is a crucial invariant of the algorithm: at the end of the i-th phase, two neighboring clusters have the lowest i bits of their ID in common. To preserve this invariant at the end of the i-th phase, given that it holds at the end of the (i − 1)-th phase, the clusters are split into two groups based on their i-th bit. That is, if the i-th bit of the identifier is equal to 1, we refer to a cluster as a blue cluster and otherwise, that is if the i-th bit is equal to 0, we refer to a cluster as a red cluster.
During the i-th phase, blue clusters can only grow, whereas red clusters can only shrink. At the end of the i-th phase, no blue cluster is neighboring with a red cluster. This suffices to preserve the invariant. Each phase consists of multiple steps. In each step, each node contained in a red cluster simply remains in the red cluster if it is not neighboring with any node in a blue cluster. Otherwise, the node in the red cluster proposes to join an arbitrary neighboring blue cluster. Thus, each blue cluster receives a certain number of proposals from neighboring nodes in red clusters. If the total number of proposals is at least a 1/(2b)-fraction of the size of the blue cluster, all the proposing nodes join the blue cluster. Otherwise, the blue cluster decides to kill all proposing nodes and thus the blue cluster is no longer neighboring with any red cluster. The total number of killed vertices in each of the b phases is at most a 1/(2b)-fraction of the total number of nodes in S. Hence, throughout all of the b phases, at most half of the vertices get killed. Moreover, each time a blue cluster does not kill all the proposing red nodes, its size increases by a (1 + 1/(2b))-factor. Thus, after j such steps, the size of the blue cluster is at least (1 + 1/(2b))^j. As the size of each cluster is trivially bounded by n, each blue cluster can grow for at most O(b log n) steps and hence all blue clusters get separated from neighboring red clusters in at most O(b log n) steps. Hence, each of the b = O(log n) phases consists of O(b log n) = O(log^2 n) steps. As the weak diameter of each cluster grows by at most 2 in each step, this directly implies that the weak diameter of each cluster is bounded by O(b · b log n) = O(log^3 n). Every single step can be implemented in O(log^3 n) rounds of the LOCAL model, resulting in an overall round complexity of O(log^6 n) in the LOCAL model.
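The per-step dynamics of one [RG20] phase can be sketched as follows; this is a centralized illustration, and the dictionaries as well as the arbitrary choice of a blue neighbor are simplified stand-ins for the distributed state, not the paper's exact data structures:

```python
def rg20_step(cluster_id, adj, node_cluster, i, b):
    """One step of phase i: red nodes (i-th ID bit 0) adjacent to blue clusters
    (i-th ID bit 1) propose; a blue cluster accepts if the proposals are at
    least a 1/(2b)-fraction of its size, and otherwise kills the proposers."""
    def blue(c):
        return (cluster_id[c] >> (i - 1)) & 1 == 1

    proposals = {}                                  # blue cluster -> proposers
    for v, c in list(node_cluster.items()):
        if not blue(c):
            targets = [node_cluster[u] for u in adj[v]
                       if u in node_cluster and blue(node_cluster[u])]
            if targets:                             # arbitrary blue neighbor
                proposals.setdefault(targets[0], []).append(v)

    killed = set()
    for c, nodes in proposals.items():
        size = sum(1 for d in node_cluster.values() if d == c)
        if len(nodes) >= size / (2 * b):            # enough proposals: grow
            for v in nodes:
                node_cluster[v] = c
        else:                                       # too few: kill proposers
            killed.update(nodes)
            for v in nodes:
                del node_cluster[v]
    return killed
```

On a single edge between a red singleton and a blue singleton, the red node's proposal clears the 1/(2b) threshold and it joins the blue cluster; against a much larger blue cluster, a lone proposer is killed instead.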
Improved Version: Next, we discuss, on an intuitive level, our improved algorithm in comparison with the original algorithm of Rozhoň and Ghaffari [RG20]. Let us start with their algorithm and simply reduce the number of steps in each phase of the algorithm from O(b log n) down to O(b). What would be the issue? The problem is that then, at the end of the phase, there might still be blue clusters neighboring red clusters. However, each such blue cluster would have grown by a (1 + 1/(2b))-factor for all of the O(b) steps in the phase, resulting in a constant-factor increase of the cluster size. In some sense, this can also be seen as progress, as a constant-factor growth can happen at most O(log n) times, at least if we assume that a cluster never shrinks (which it can). Alas, even assuming shrinking does not happen, the crucial invariant that after the i-th phase, the IDs of two neighboring clusters agree on the i least significant bits does not hold anymore.

We need, hence, a refined invariant. First, at each point in time, a given cluster C is in some level lev(C) that ranges from 0 to b. The level measures the progress of a cluster in disconnecting itself from the other clusters; importantly, it is an individual measure for each cluster, whereas in the previous algorithm, this progress was measured for all clusters globally, by enforcing that at the end of the i-th phase, all clusters agree on the i least significant bits of their identifier. Our new invariant, whose full statement is deferred to Section 2.4, implies that the identifiers of two neighboring clusters C and C' agree in the min(lev(C), lev(C')) least significant bits. For the purpose of this explanatory section, we call this property the level invariant.

Note that if, at the end of the algorithm, each cluster is in level b, there are no two neighboring clusters, as desired. Furthermore, we would recover the old invariant if we assumed that the level of each cluster increases by exactly one in each phase. But not every cluster's level will increase in each phase. Instead, in a given phase, the level of a cluster either increases by one or some other progress property happens: the cluster significantly "grows" in terms of the number of vertices that joined the cluster.

Growing Rule and Preserving the New Invariant: We now describe our new algorithm in more detail: it has O(b + log n) phases, each consisting of O(b + log n) steps. In each step, some vertices are proposing to join new clusters, according to the following rule. Recall that in the previous algorithm [RG20], in phase i, vertices of clusters with the i-th bit equal to 0 were proposing to join neighboring clusters with the i-th bit equal to 1. Similarly, in our algorithm, vertices contained in some cluster C that are neighboring with a cluster C' having the same level as C are proposing to join C' if the (lev(C) + 1)-th bit of the identifier of C is 0, while the respective bit in the identifier of C' is 1. However, there is one more rule: if a vertex of C neighbors with a cluster having a strictly smaller level than C, it prefers to propose to one such neighboring cluster C' having the smallest level among all such neighboring clusters. As in the previous algorithm, if a sufficient number of nodes propose to a cluster, it decides to accept all proposals, while if there are not enough proposals, it kills the proposing vertices, "stalls" until the end of the phase, and at the end of the phase increases its level.

The rule that a smaller-level cluster C is "eating" its higher-level neighbor C' is there to enforce our level invariant: we know that the two clusters C and C', with C having a strictly smaller level, agree on their lev(C) least significant bits. This invariant can fail once C decides to increment its level.
Hence, to justify going to the next level, C also deletes the boundary with all higher-level neighboring clusters. The level invariant follows from this new rule. The formal proof (of a more general invariant) is postponed to Section 2.4.

Bounding the Number of Growing Steps: A crucial step in the analysis of the previous algorithm [RG20] is to argue that in each phase, each cluster can grow for at most O(b log n) steps by a multiplicative factor of 1 + Θ(1/b); otherwise, the cluster necessarily contains all the vertices of the graph. In our case, the picture is more complicated, as each cluster is eating the boundary vertices of its higher-level neighbors, while it is simultaneously eaten by its lower-level neighboring clusters. The rule that a cluster C grows if the number of newly joined vertices is large with respect to the current number of vertices in C does not work anymore.

To remedy this problem, in our algorithm, each cluster C possesses a certain number of tokens at every point in time. Initially, each cluster has a single token. During the course of the algorithm, C obtains one token for every node that joins it. However, C does not lose a token when a node leaves the cluster. Instead, C only loses tokens if it decides to kill all nodes proposing to it. In that case, C pays a certain number of tokens (to be described later) for every node it kills.

Each cluster decides to accept all proposals if the number of proposing nodes is an Ω(1/(b + log n))-fraction of its current number of tokens. Otherwise, the cluster kills all the proposing nodes. The parameters are set in such a way that the following holds: whenever a cluster is growing during the whole phase, the number of tokens it possesses at least doubles. On the other hand, if a cluster advances to the next level during a phase, then the number of its tokens remains at least half of what it was before (cf. Invariant 1 in Section 2.2).
Notice that unlike this number of tokens, the size of the cluster can drop arbitrarily. Either way, each cluster progresses during each phase, either in terms of the number of tokens it possesses or by advancing to the next level. The final ingredient is that each node can create at most O(b + log n) tokens by joining new clusters. This will be proven later on. It implies that all clusters finish, i.e., are in the highest level, after O(b + log n) phases (cf. Proposition 2.9). If that were not the case, then the total number of tokens an unfinished cluster would possess would exceed the total number of tokens that could possibly be created throughout the algorithm, a contradiction. Moreover, one can also show that at most half of the vertices get killed during the algorithm (cf. Proposition 2.7).

In this section we explain our algorithm for Theorem 2.2. Its analysis follows in Sections 2.3 and 2.4.
Construction outline: The construction has 2(b + log n) = O(log n) phases. Each phase has 28(b + log n) = O(log n) steps. Initially, all nodes of G are living; during the construction, some living nodes die. Each living node is part of exactly one cluster. Initially, there is one cluster C_v for each vertex v ∈ V(G) and we define the identifier id(C_v) of C_v as the unique identifier of v; we use id_i(C) to denote the i-th least significant bit of id(C). From now on, we talk only about identifiers of clusters and do not think of vertices as having identifiers, though they will still use them for simple symmetry-breaking tasks. Also, at the beginning, the Steiner tree T_{C_v} of a cluster C_v contains just one node, namely v itself, as a terminal node. Clusters will grow or shrink during the iterations, while their Steiner trees collecting their vertices can only grow. When a cluster does not contain any nodes, it does not participate in the algorithm any more.

Parameters of each cluster: Each cluster C keeps two other parameters besides its identifier id(C) to make its decisions: its number of tokens t(C) and its level lev(C). The number of tokens can change in each step – more precisely, it is incremented by one whenever a new vertex joins C, while it does not decrease when a vertex leaves C. The number of tokens only decreases when C kills proposing vertices. We denote by t_i(C) the number of tokens of C at the beginning of the i-th phase and set t_1(C) = 1. Each cluster starts in level 0. The level of each cluster does not change within a phase i and can only increment by one between two phases; it is bounded by b. We denote with lev_i(C) the level of C during phase i. Moreover, for the purpose of the analysis, we keep track of the potential Φ(C) of a cluster C, defined as Φ_i(C) = 3(i − lev_i(C)) + id_{lev_i(C)+1}(C). The potential of each cluster stays the same within a phase.
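As a centralized illustration of the per-cluster state just introduced (the class and its method names are ours; the step count 28(b + log n), the acceptance threshold, the kill price of 14(b + log n) tokens per deleted vertex, and the potential follow the construction outline and the step description that follows):

```python
import math

class Cluster:
    """Bookkeeping of a single cluster; a sketch, not the distributed algorithm."""

    def __init__(self, ident, b, n):
        self.ident = ident                      # the cluster's b-bit identifier
        self.steps = 28 * (b + math.ceil(math.log2(n)))   # steps per phase
        self.tokens = 1                         # t_1(C) = 1
        self.level = 0                          # lev_1(C) = 0

    def idbit(self, j):
        return (self.ident >> (j - 1)) & 1      # id_j(C), j-th least significant bit

    def potential(self, i):
        # Phi_i(C) = 3(i - lev_i(C)) + id_{lev_i(C)+1}(C), as defined above
        return 3 * (i - self.level) + self.idbit(self.level + 1)

    def handle_proposals(self, p):
        """Accept p > 0 proposing nodes, or kill them and stall for the phase."""
        if p >= self.tokens / self.steps:
            self.tokens += p                    # one token per joining node
            return "accept"
        self.tokens -= p * (self.steps // 2)    # pay 14(b + log n) per killed node
        return "kill"                           # the level rises at the phase's end
```

Note that a kill can cost at most half of the current tokens, since the number of proposers is below tokens/(28(b + log n)) and each costs 14(b + log n) tokens; this is exactly the halving used in Invariant 1.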
Description of a step: In each step, first, each node v of each cluster C checks whether it is adjacent to a cluster C' such that lev(C') < lev(C). If so, then v proposes to an arbitrary neighboring cluster C' among the neighbors with the smallest level lev(C'), and if there is a choice, it prefers to join clusters with id_{lev(C')+1}(C') = 1. Otherwise, if there is a neighboring cluster C' with lev(C') = lev(C) and id_{lev(C')+1}(C') = 1, while id_{lev(C)+1}(C) = 0, then v proposes to an arbitrary such cluster.

Second, each cluster C collects the number of proposals it received. Once the cluster has collected the number of proposals, it does the following. If there are p proposing nodes, then they join C if and only if p ≥ t(C)/(28(b + log n)). The denominator is equal to the number of steps. If C accepts these proposals, then C receives p new tokens, one from each newly joined node. On the other hand, if C does not accept the proposals, as their number is not sufficiently large, then C decides to kill all those proposing nodes. These nodes are then removed from G. Cluster C pays p · 14(b + log n) tokens for this, i.e., it pays 14(b + log n) tokens for every vertex that it deletes. These tokens are forever gone. Then the cluster does not participate in growing anymore until the end of the phase, and throughout that time we call that cluster stalling. The cluster tells its neighboring nodes that it is stalling so that they do not propose to it. At the end of the phase, each stalling cluster increments its level by one.

If the cluster is in level b, it will not grow anymore during the whole algorithm, and we say that it has finished. Other neighboring clusters can still eat its vertices (by this we mean that vertices of the finished clusters may still propose to join other clusters). Whenever a node u joins a cluster C via a vertex v ∈ C, we add u to the Steiner tree T_C as a new terminal node and connect it via an edge uv.
Whenever a node u ∈ C is deleted or eaten by a different cluster, it stays in the Steiner tree T_C, but it is changed to a non-terminal node.

Construction invariants: The construction preserves the following two invariants, as we formally prove in the next subsection.

1. Invariant 1: At the beginning of each phase i, we have t_i(C) ≥ 2^{i − 2·lev_i(C) − 1}, unless C is finished.

2. Invariant 2: Whenever a node u changes its cluster during some step in phase i, say it goes from C to C′, it is the case that Φ_i(C′) > Φ_i(C). Whenever we go to the next phase, the potential Φ(C) of each cluster does not decrease, i.e., Φ_{i+1}(C) ≥ Φ_i(C).

In this subsection, we prove Invariants 1 and 2 and that they imply that our algorithm outputs clusters of O(log^2 n) weak-diameter, while deleting at most 1/2 of the nodes of S.

Proposition 2.3.
Invariant 1 is satisfied. That is, at the beginning of phase i, the current number of tokens t_i(C) satisfies t_i(C) ≥ 2^{i − 2·lev_i(C) − 1}, unless cluster C is finished.

Proof. At the beginning of phase 1, we have lev_1(C) = 0 and t_1(C) = 1, hence Invariant 1 is satisfied. Now fix a phase i and a cluster C that is not finished at the end of the i-th phase. If the cluster decided to go to the next level during this phase, we have at the beginning of phase i + 1 that lev_{i+1}(C) = lev_i(C) + 1 and, moreover, for the number of tokens, we have

t_{i+1}(C) ≥ t_i(C) − (t_i(C)/(28(b + log n))) · (14(b + log n)) = t_i(C)/2,

because a given cluster can delete its boundary at most once in a given phase. Hence, by induction,

t_{i+1}(C) ≥ t_i(C)/2 ≥ 2^{i − 2·lev_i(C) − 1}/2 = 2^{i − 2·(lev_{i+1}(C) − 1) − 1}/2 = 2^{(i+1) − 2·lev_{i+1}(C) − 1}.

Otherwise, we know that lev_{i+1}(C) = lev_i(C) and C was growing for all of the 28(b + log n) steps of phase i. Hence, the number of tokens at the beginning of phase i + 1 satisfies

t_{i+1}(C) ≥ (1 + 1/(28(b + log n)))^{28(b + log n)} · t_i(C) ≥ 2 · t_i(C).

This implies by the induction hypothesis that t_{i+1}(C) ≥ 2 · t_i(C) ≥ 2 · 2^{i − 2·lev_i(C) − 1} = 2^{(i+1) − 2·lev_{i+1}(C) − 1}.

Proposition 2.4.
Invariant 2 is satisfied. That is, whenever node u changes its cluster during some step, say goes from C to C′, it is the case that Φ_i(C′) > Φ_i(C). Moreover, whenever we go to the next phase, we have Φ_{i+1}(C) ≥ Φ_i(C).

Proof. If u goes from cluster C to some cluster C′, then it is either because lev_i(C′) < lev_i(C), or because lev_i(C′) = lev_i(C) and id_{lev_i(C)+1}(C) = 0 while id_{lev_i(C′)+1}(C′) = 1. In the first case,

Φ_i(C′) = 3i − 2·lev_i(C′) + id_{lev_i(C′)+1}(C′) ≥ 3i − 2·(lev_i(C) − 1) ≥ 3i − 2·lev_i(C) + 2 > Φ_i(C),

since Φ_i(C) ≤ 3i − 2·lev_i(C) + 1. In the second case,

Φ_i(C′) = 3i − 2·lev_i(C′) + id_{lev_i(C′)+1}(C′) = 3i − 2·lev_i(C) + 1 > 3i − 2·lev_i(C) + id_{lev_i(C)+1}(C) = Φ_i(C).

Whenever we go from phase i to phase i + 1, we have, using lev_{i+1}(C) ≤ lev_i(C) + 1,

Φ_{i+1}(C) = 3(i + 1) − 2·lev_{i+1}(C) + id_{lev_{i+1}(C)+1}(C) ≥ 3i + 3 − 2·(lev_i(C) + 1) = 3i − 2·lev_i(C) + 1 ≥ 3i − 2·lev_i(C) + id_{lev_i(C)+1}(C) = Φ_i(C).

Proposition 2.5.
Each node can change its cluster at most 6(b + log n) + 1 times.

Proof. At the beginning of phase 1 of the algorithm, each node u in a cluster C has Φ_1(C) ≥ 0. On the other hand, during any phase i, if u ∈ C, then

Φ_i(C) = 3i − 2·lev_i(C) + id_{lev_i(C)+1}(C) ≤ 3i + 1.

Since the number of phases is equal to 2(b + log n), we have Φ_i(C) ≤ 6(b + log n) + 1. Then, due to Invariant 2 (Proposition 2.4), this means that u changed its cluster at most 6(b + log n) + 1 times: whenever it changed its cluster, it went from C to C′ such that C′ satisfies Φ_i(C′) > Φ_i(C), the potential is an integer, and when a new phase starts, we have Φ_{i+1}(C) ≥ Φ_i(C) for all clusters C.

Proposition 2.6.
The total number of tokens generated by nodes throughout the algorithm is at most 7|S|(b + log n).

Proof. Each node generates a token at the very beginning of the algorithm and then it generates one token whenever it changes its cluster. By Proposition 2.5, each node can generate at most 6(b + log n) + 1 tokens by changing a cluster. Hence, the total number of tokens generated is at most |S|(6(b + log n) + 2) ≤ 7|S|(b + log n).

Proposition 2.7.
In the end, the number of deleted vertices is at most |S|/2.

Proof. Whenever a node is deleted from S, we permanently set aside 14(b + log n) tokens. Hence, by Proposition 2.6, the total number of nodes deleted is at most

7|S|(b + log n) / (14(b + log n)) = |S|/2.

Proposition 2.8.
Per step, the diameter of every Steiner tree T_C grows additively by at most 2. Hence, at the end of the algorithm, the diameter of each graph T_C and, therefore, the weak-diameter of each C, is bounded by O(log^2 n). Moreover, each vertex of G is in at most O(log n) different Steiner trees T_C.

Proof. In one step of a phase, we increase the Steiner tree T_C only by adding new leaves to it (though the fact that each vertex is added to T_C at most once, and hence T_C is a tree, is proved only in Proposition 2.11). We have O(log n) phases and each phase has O(log n) steps, hence the diameter of each T_C is bounded by O(log^2 n) in the end. The last part follows from the fact that whenever a vertex u is added to a new Steiner tree, u changes its cluster. This can happen at most 6(b + log n) + 1 = O(log n) times, by Proposition 2.5.

Proposition 2.9.
At the end of phase i_last = 2(b + log n), the level of each cluster C is equal to lev_{i_last}(C) = b, i.e., C is finished.

Proof. The first part follows from Invariant 1 (Proposition 2.3) as follows. Unless C is finished, Invariant 1 maintains that t_i(C) ≥ 2^{i − 2·lev_i(C) − 1}. This means that if C is still not finished at the end of the phase i_last = 2(b + log n), then we would have t_{i_last}(C) ≥ 2^{i_last − 2·lev_{i_last}(C) − 1} ≥ 2^{2(b + log n) − 2b − 1} = n^2/2, a contradiction with Proposition 2.6.

In this subsection, we show that the final clustering produced by the algorithm described in Section 2.2 satisfies that there are no two neighboring clusters. This is stated as the following proposition.
Proposition 2.10.
At the end of the algorithm, the resulting clusters are nonadjacent.
That is, once the algorithm terminates, there does not exist an edge with both endpoints beingalive and contained in different clusters. We also prove the following fact.
Proposition 2.11.
Each vertex v is added at most once to each T_C; hence, the graphs T_C are trees.

To that end, we define an invariant that holds throughout the execution of the algorithm and which implies the properties stated above. To define the invariant, we consider a fixed 4(b + log n)-ary rooted tree (i.e., the branching factor is twice the number of phases) of depth b called the transcript tree T, where the root is defined to have depth 0. Throughout the course of the algorithm, we map each non-empty cluster to one of the nodes in the tree T by a mapping π. At the beginning, each cluster simply maps to the root of T. A cluster only changes the node it maps to when its level is increased, using the following rule. If a cluster C advances from level lev(C) to level lev(C) + 1 between phases i and i + 1, it is remapped to the (2i + id_{lev(C)+1}(C))-th child of the node it previously mapped to. Notice that for each non-root node of T, there is only one phase when new clusters can be mapped to it (if the node is the (2i)-th or (2i + 1)-th child, it is phase i). From that time on, unless the node is a leaf node of T, the clusters are gradually reassigned to its children or completely deleted from T if they become empty. Notice that the current level of each cluster is equal to the depth of the node that this cluster currently maps to. Finally, our construction satisfies the following two properties:

Observation 2.12.
The identifiers of all clusters that map to a given node at depth d agree on the d least significant bits. Proposition 2.13.
Suppose that C is a stalling cluster. Then it does not neighbor any cluster of higher level, and if id_{lev_i(C)+1}(C) = 1, it does not neighbor any cluster C′ of the same level with id_{lev_i(C′)+1}(C′) = 0.

Proof. Whenever a cluster C deletes its boundary and starts stalling, each neighboring node u that considered proposing to C, but did not, either proposed to a cluster of level strictly smaller than lev(C), or it proposed to a cluster C′ of the same level, but then id_{lev(C′)+1}(C′) ≥ id_{lev(C)+1}(C). Then, u is either deleted, or it joins C′. So, a cluster C that starts stalling can be neighboring another cluster C′, but then the level of C′ is either strictly smaller, or it is the same but id_{lev(C′)+1}(C′) ≥ id_{lev(C)+1}(C).

In the following steps, a node of C can be eaten by one of the neighboring clusters, but this does not create new neighbors of C; alternatively, a connection with a different cluster C′′ is created by that cluster eating a node of some neighboring cluster C′. However, C′′ is either of smaller level than C′, or it is of the same level but with id_{lev(C′′)+1}(C′′) ≥ id_{lev(C′)+1}(C′). Hence, this new connection is still allowed.

We now prove that the algorithm described in Section 2.2 satisfies the following crucial invariant throughout the course of the algorithm. Fig. 1 might help to obtain a better intuition.

Proposition 2.14.
Whenever two clusters C and C′ are neighboring, then either π(C) is an ancestor of π(C′), i.e., C is mapped to a node that lies on the unique path between the node C′ maps to and the root in T, or π(C′) is an ancestor of π(C).

Proof. We prove Proposition 2.14 by induction on the number of executed steps of the algorithm. We prove that it stays satisfied after every step of the algorithm, and also between any two phases, when stalling clusters go to the next level. We note that the property to prove holds at the beginning of the algorithm, since all the clusters are mapped to the root node of T.

Next, fix a step j of some phase i and assume that the property to prove is satisfied right at the beginning of the step. We now consider some arbitrary edge {u, v}, where both u and v have not been deleted. In order to prove that the invariant holds after step j, it suffices to show that after step j, nodes u and v are not contained in two different clusters such that neither of the two clusters is an ancestor of the other cluster. (Try saying it three times in a row.)

Figure 1: A possible change in the cluster mapping between the beginning of phase i and the beginning of phase i + 1 of the algorithm, with focus on one node of the transcript tree T at depth d. The clusters mapped to this node are colored blue if their (d + 1)-th identifier bit is equal to 1 and red otherwise (i.e., red clusters are proposing to blue clusters), and two clusters are connected by an edge in the figure if they are neighboring. During the phase, some clusters eat their neighbors (meaning that the neighbors' vertices propose to them) and remain mapped to the same node, some delete their boundary and are reassigned to a node of T at depth d + 1 at the end of the phase, and a cluster all of whose vertices leave during the phase is dissolved and no longer mapped to T.

First, suppose that during step j, node u ∈ C_u or node v ∈ C_v, respectively, proposes to some cluster C′_u or C′_v, respectively. By the induction hypothesis, π(C′_u) is an ancestor of π(C_u) (possibly, π(C′_u) = π(C_u)), and similarly we have that π(C′_v) is an ancestor of π(C_v). By the induction hypothesis, we also know that either π(C_u) is an ancestor of π(C_v), or the other way around. Putting these facts together, we get that either π(C′_u) is an ancestor of π(C′_v), or the other way around, as desired.

Second, we show that the property stays satisfied between two phases i and i + 1. We again consider an arbitrary edge {u, v} with u ∈ C_u and v ∈ C_v. If neither C_u nor C_v stalled, there is nothing to prove. If both C_u and C_v stalled, by Proposition 2.13, we have lev_i(C_u) = lev_i(C_v) = lev and id_{lev+1}(C_u) = id_{lev+1}(C_v). By the induction hypothesis, π(C_u) = π(C_v); hence both C_u and C_v are remapped to the same node of the transcript tree T between the two phases. If C_u stalled but C_v did not, by Proposition 2.13 and the induction hypothesis, π(C_v) is an ancestor of π(C_u). Hence, after remapping C_u to one of the children of the node it previously mapped to, the induction hypothesis is still satisfied.

Now, we are ready to prove Proposition 2.10 and Proposition 2.11.

Proof of Proposition 2.10.
By Proposition 2.9, at the end of the algorithm, all resulting clusters are in level b. Hence, by Proposition 2.14, two adjacent clusters would need to map to the same node of T at depth b. However, by Observation 2.12, the two clusters would then agree on the b least significant bits of their identifiers, which is a contradiction with the uniqueness of the identifiers.
Fix some T_C and a vertex u that was added to C at some point during the algorithm. Suppose u leaves C and joins some cluster C′. We prove that u cannot join C in the future. First, suppose C′ is currently at a strictly smaller level than C. Then we claim u cannot join a cluster from the subtree of π(C), and C in particular, anymore. This is because clusters cannot be remapped to π(C) anymore, and clusters from the subtree of π(C) do not have any connections to other clusters besides clusters on the path from π(C) to the root, by Proposition 2.14. But vertices in those clusters never propose to clusters in the subtree of π(C), since they have a smaller level. Similarly, if u leaves C and joins a cluster C′ that is currently at the same level d, by Proposition 2.14 we have π(C) = π(C′) and id_{d+1}(C) = 0 while id_{d+1}(C′) = 1. Whenever u is later eaten by a cluster with strictly smaller level than d, or C goes to the next level, we argue as in the previous case. Otherwise, after C′ deletes its boundary to C and starts stalling, C′ cannot become adjacent to C during this phase, and this holds also during the next phases, since, by induction, C can eat only vertices from other branches of the subtree of π(C) than the branch of π(C′), and clusters in those branches are not adjacent to π(C′) by Proposition 2.14. Hence, each vertex is added to T_C at most once and T_C is a tree.

We are now ready to wrap up the analysis of our distributed ball carving algorithm and present the proof of Theorem 2.2.
Proof of Theorem 2.2.
The total number of deleted nodes is at most |S|/2 by Proposition 2.7, the weak-diameter of each cluster is O(log^2 n), and each edge is in at most O(log n) Steiner trees by Proposition 2.8.

Finally, we bound the running time. In the LOCAL model, it is bounded by O(log^4 n), since the algorithm has O(log n) phases, each having O(log n) steps, and each step can be implemented in a number of rounds proportional to the weak-diameter of each cluster, which is bounded by O(log^2 n).

In the CONGEST model, we first verify that an O(log^5 n) upper bound holds, because each step can be implemented in O(log^3 n) rounds as follows. First, every step starts by nodes proposing to join a neighboring cluster, provided there is a suitable one. This step is implemented in two CONGEST model rounds. Second, each root of the Steiner tree T_C needs to collect how many nodes are proposing to the cluster. Since each edge is contained in O(log n) Steiner trees and the diameter of each Steiner tree is O(log^2 n), this can be done in O(log^3 n) rounds. Finally, the cluster C needs to decide whether it will grow or not, and this information is then broadcast via T_C to all proposing nodes. This can again be done in O(log^3 n) rounds.

Using Corollary 5.3 from Section 5, we can speed up the aggregation of the summation and the broadcast in every cluster so that it runs in parallel for all the clusters in O(log^2 n) rounds. This recovers the same round complexity of O(log^4 n) for the CONGEST model, matching that of the LOCAL model.
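The per-step implementation (convergecast the proposal count up the Steiner tree, then broadcast the grow/stall decision back down) can be sketched with a centralized stand-in; `children`, `proposals`, and `root` are hypothetical encodings of a single tree T_C, not the paper's notation.

```python
# Centralized stand-in for one step on one Steiner tree T_C: convergecast the
# number of proposing nodes to the root, then broadcast the decision down.
def one_step(children, proposals, root, tokens, denom):
    def count(v):  # convergecast: sum proposal counts bottom-up
        return proposals.get(v, 0) + sum(count(c) for c in children.get(v, []))
    p = count(root)
    decision = "grow" if p >= tokens / denom else "stall"
    def broadcast(v):  # broadcast: every tree node learns the decision
        out = {v: decision}
        for c in children.get(v, []):
            out.update(broadcast(c))
        return out
    return p, broadcast(root)

p, dec = one_step({0: [1, 2], 1: [3]}, {2: 1, 3: 4}, 0, tokens=10, denom=5)
assert p == 5 and set(dec.values()) == {"grow"} and len(dec) == 4
```

In the actual CONGEST implementation, these two passes are what Corollary 5.3 pipelines across all overlapping Steiner trees simultaneously.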
As two prominent examples of applications, below we mention how we obtain O(log^5 n)-round deterministic CONGEST model algorithms for the maximal independent set and ∆ + 1 coloring problems. These improve on the O(log^7 n)-round LOCAL model and O(log^8 n)-round CONGEST model algorithms of Rozhoň and Ghaffari [RG20]. We note that similar polynomial improvements happen for all other applications of network decomposition, many of which are discussed in [RG20].
Corollary 2.15.
There is a deterministic distributed algorithm, in the
CONGEST model, that computes a maximal independent set in O(log^5 n) rounds.

Proof Sketch. We process the color classes of the network decomposition one by one. When processing clusters of color i, first, we remove each vertex that is adjacent to a node that is already in the MIS. Then, for each cluster, we run the deterministic MIS algorithm of Censor-Hillel et al. [CHPS17], which computes an MIS in O(D log^2 n) rounds in any n-node graph of diameter D. Since each cluster has weak-diameter O(log^2 n), running this algorithm in one cluster would be doable in O(log^4 n) rounds. Running the algorithm for different clusters needs more care, as their Steiner trees are not edge-disjoint. The MIS algorithm of Censor-Hillel et al. [CHPS17] is based on derandomizing the O(log n)-round algorithm of [Gha16]. They observe that each round needs only pairwise independence, which thus means only O(log n) bits of randomness. Then, these bits are fixed one by one, using the method of conditional expectation. To perform this, the key step is to determine how to fix each single bit (conditioned on the bits fixed so far). For that, each node needs to compute (a certain pessimistic estimator of) the probability of it being in the MIS or neighboring an MIS node, under the two possibilities for the single randomness bit that we are examining. This is done via 1 round of communication with the neighbors in the MIS problem, and that part we can easily do in our setting, as the nodes of different clusters are disjoint (even though their Steiner trees are not). Then, the algorithm of Censor-Hillel et al. [CHPS17] aggregates the sum of these probability estimators, using a convergecast on the global BFS tree of the network, with depth D, in O(D) rounds. To perform this part, we make each cluster use its Steiner tree. These Steiner trees are not disjoint, but fortunately, each vertex is in at most O(log n) Steiner trees.
Hence, we can apply the pipelining of Corollary 5.3, which allows us to aggregate the summations for different clusters in parallel, in O(log^2 n + log n) = O(log^2 n) rounds. Once these sums are gathered at the center, it can be decided how to fix this one bit of the randomness of this round of [Gha16], and we can proceed to the next bit. There are O(log n) rounds and we need to fix O(log n) bits for each. Hence, overall, the round complexity of computing an MIS for each cluster of one color class, all at the same time, is O(log^4 n) rounds of the CONGEST model. This is the complexity for one color class of the decomposition. Since the decomposition has O(log n) colors, the overall complexity of solving MIS, given the network decomposition, is O(log^5 n). When put together with the O(log^5 n) round complexity needed for computing the decomposition via Theorem 2.1, we have a deterministic MIS algorithm that runs in O(log^5 n) rounds.
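The derandomization pattern described above can be illustrated with a toy example. This is not the actual estimator of [CHPS17]; it is a sketch of the same method under simplified assumptions: pairwise-independent bits generated from a short seed, seed bits fixed one by one via conditional expectation (evaluated here by brute force; distributively, the needed sums are what gets convergecast), and a bichromatic-edge count standing in for the pessimistic estimator.

```python
# Toy sketch of "fix O(log n) seed bits by conditional expectation".
def pairwise_bits(seed: int, n: int) -> list:
    # x_v = <seed, v+1> over GF(2): a standard pairwise-independent family.
    return [bin(seed & (v + 1)).count("1") & 1 for v in range(n)]

def objective(x, edges):
    # Stand-in objective: number of bichromatic edges under the coloring x.
    return sum(x[u] != x[v] for u, v in edges)

def fix_seed(n, edges, k):
    fixed = 0
    for j in range(k):  # fix seed bit j, conditioned on bits 0..j-1
        def cond_exp(bit):
            seeds = [fixed | (bit << j) | (rest << (j + 1))
                     for rest in range(1 << (k - j - 1))]
            return sum(objective(pairwise_bits(s, n), edges) for s in seeds) / len(seeds)
        if cond_exp(1) >= cond_exp(0):  # keep the better branch
            fixed |= 1 << j
    return fixed

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
k = 3  # seed length, playing the role of O(log n)
avg = sum(objective(pairwise_bits(s, 4), edges) for s in range(1 << k)) / (1 << k)
seed = fix_seed(4, edges, k)
assert objective(pairwise_bits(seed, 4), edges) >= avg  # greedy never loses
```

The invariant is that the conditional expectation never decreases as bits are fixed, so the final deterministic seed is at least as good as the random average.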
There is a deterministic distributed algorithm, in the
CONGEST model, that computes a ∆ + 1 coloring, where ∆ is an upper bound on the maximum degree, in O(log^5 n) rounds.

Proof. The proof is similar to the MIS result, with only one exception: when solving the problem in each cluster, instead of the
CONGEST -model MIS algorithm of Censor-Hillel et al. [CHPS17],we apply the
CONGEST -model list-coloring algorithm of Bamberger et al. [BKM20].
In this section, we explain how one can obtain a much milder dependence of the round complexity on the length b of the identifiers (and the bit capacity of each edge). Specifically, the round complexity poly(b · log n) is improved to (log* b) · poly(log n). Note that in the LOCAL model, i.e., without constraints on the capacity of edges, this is a direct implication of distance coloring (cf. Remark 2.10 in [RG20]).

In standard, deterministic, applications, we have b = Θ(log n), so we do not get an improvement over the previous formulation of the algorithm. However, in the shattering framework, we have N = poly(log n) and b = Θ(log n), so we get an improved complexity from poly(log n) down to poly(log log n).

In this section, the idea of our improvement is explained by modifying the algorithm of [RG20] explained in Section 2.1. The complexity of their algorithm is O(b^4 log^3 n) and we show how to change it to O(log^7 n + (log* b) · log^5 n). In Section 4, we improve the round complexity of the algorithm from Theorem 2.1 from poly(b + log n) to O(log^5 n + (log* b) · log^4 n).

Lemma 3.1.
Consider a graph G = (V, E) that has no isolated vertices and where each node has a b-bit identifier. There is a deterministic distributed algorithm in the CONGEST model that, in O(log* b) rounds, colors the vertices of V blue or red such that each color has at most (3/4)|V| vertices.

Proof. Let each node v choose one of its edges in G arbitrarily, and indicate this as an outgoing edge from v. Let H be the spanning subgraph of G defined by the set of all chosen edges. Call a vertex u heavy if its in-degree in H is at least 10, and light otherwise. Since H has at most |V| outgoing edges, there are at most |V|/10 heavy vertices. Let H′ be the subgraph of H induced by light vertices. We handle vertices of H′ in two categories of isolated and non-isolated vertices.

(A) Light vertices that are isolated in H′ must have their chosen outgoing edge connect to a heavy vertex. These outgoing edges define stars, at most one centered on each heavy vertex. Each heavy vertex computes a coloring of itself and all the isolated light vertices that point to it, such that the numbers of the two colors in the star differ by at most 1. This way, we have a discrepancy (i.e., the absolute difference in the number of nodes of the two colors) of at most 1 in each star, and thus overall a discrepancy of at most |V|/10.

(B) Non-isolated vertices of H′ form a graph with minimum degree at least 1 and maximum degree at most 11. Compute a maximal independent set S of (H′)^2 (that is, the graph on the vertices of H′ where we connect two of them if their distance is at most 2 in H′) in O(log* b) rounds, using Linial's classical algorithm [Lin87]. Then, each node of H′ that is not in S chooses the closest node in S as its cluster center. Since we have a maximal independent set of (H′)^2, each node has a cluster center within distance 3 in H′. Moreover, each cluster has at least two vertices, i.e., the cluster center and all of its neighbors, which is at least one neighbor. Each node in S computes a coloring of the vertices of its own cluster, in a manner that the numbers of the two colors in the cluster differ by at most one. We have no cluster with a single vertex. Each cluster with 2 vertices has no discrepancy and each cluster with 3 or more vertices has discrepancy at most 1. This means the discrepancy in the coloring of H′ is at most |V|/3. Summing up, the total discrepancy is at most |V|(1/3 + 1/10) = 13|V|/30. Therefore, each color has at least (17/60)|V| > |V|/4 vertices.

Lemma 3.2.
Consider a cluster graph where no cluster is isolated, and each cluster has a unique b-bit identifier. Moreover, each cluster C has a Steiner tree T_C of diameter R, such that each node is in at most O(log n) of these Steiner trees. There is a deterministic distributed algorithm in the CONGEST model with O(b)-bit messages that, in O((R + log n) · log* b) rounds, colors the clusters blue or red such that each color has at most a 3/4 fraction of the clusters.

Proof. We follow an approach similar to Lemma 3.1, but we have to deal with two issues: (1) nodes are replaced with clusters of weak-diameter R, (2) the Steiner trees of the clusters are not disjoint, and each node can be in up to O(log n) Steiner trees.

Selecting An Outgoing Edge Per Cluster: First, we select one outgoing edge for each cluster, in the cluster graph. For that, any two neighbors exchange their cluster identifier, in one round. Then, any node w in a cluster C that is neighboring some node w′ in another cluster C′ creates a proposed outgoing edge ⟨C′.ID, w.ID, w′.ID⟩. We then convergecast the minimum of these proposals to the root of the cluster C. We do this for all the clusters at the same time, in O(R + log n) rounds, using the pipelining of Corollary 5.3. At the end, the center of C knows the winning proposal ⟨C′.ID, w.ID, w′.ID⟩ that connects it to some other cluster C′. In this case, the outgoing edge in the cluster graph is C → C′, and we consider the edge w → w′ as the physical embodiment of this outgoing edge. By performing a broadcast in each cluster, and all clusters at the same time, we can inform all nodes of the cluster of the selected single outgoing edge, in O(R + log n) rounds, using the pipelining of Corollary 5.3. In particular, node w learns that its edge {w, w′} is selected as the outgoing edge w → w′ of its cluster. It can also inform w′ about this, in one additional round.
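As a centralized stand-in for this selection rule (the real algorithm convergecasts the minimum over T_C; the data below is hypothetical):

```python
# Selecting a cluster's outgoing edge: the lexicographically minimum proposal
# (C'.ID, w.ID, w'.ID) wins the convergecast, so ties are broken consistently
# and the neighboring cluster with the smallest identifier is preferred.
def select_outgoing(proposals):
    return min(proposals)

props = [(7, 4, 9), (3, 6, 2), (3, 5, 8)]
assert select_outgoing(props) == (3, 5, 8)
```

Lexicographic comparison of the triples is what makes a distributed min-convergecast of these proposals well-defined.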
Identifying Light and Heavy Clusters: We call a cluster heavy if it has at least 10 incoming edges, and light otherwise. Our next task is to inform each cluster whether it is heavy or light. For each cluster C′, each node w′ ∈ C′ that has an incoming edge w′ ← w from another cluster starts a message describing this edge as ⟨C.ID, w′.ID, w.ID⟩. We then convergecast all of these incoming edge messages in each cluster, or at most 10 of them if there are more. This can be done for all clusters at the same time in O(R + log n) rounds, using the pipelining of Corollary 5.3. At the end, each cluster center knows whether it has at least 10 incoming edges, i.e., whether it is heavy or not. Moreover, every light cluster knows all of its incoming edges. Using one broadcast per cluster, by Corollary 5.3, we can also inform all nodes of the cluster whether the cluster is heavy or light, and about all the incoming edges if it is light, in O(R + log n) additional rounds.

Coloring Non-Isolated Light Clusters: Consider all the incoming and outgoing edges as undirected edges, and consider the subgraph H′ made of light clusters that have at least one such edge. By repeating the above communication scheme, we can identify all such clusters and in fact implement one round of the CONGEST model on the graph H′, in O(R + log n) rounds of communication on the base graph. At this point, it is easy to follow the steps of Lemma 3.1 to color the light clusters of H′: we compute an MIS of (H′)^2, in O((R + log n) log* b) rounds, and then each MIS cluster C has to determine the red/blue colors of itself and its neighboring clusters. It does so in a way that the discrepancy between the number of red and blue colors that C gives out is at most 1.

Coloring Heavy Clusters, and their Incoming Isolated Light Clusters: What is left is coloring each heavy cluster C, as well as all the light clusters that are isolated in H′ and whose selected outgoing edge therefore goes to a heavy cluster.
Each heavy cluster C does this on its own, for itself and all such light clusters that have an outgoing edge to C. First, we initiate a token (carrying O(1) bits) at the physical embodiment of every such incoming edge. We also start one token at the root of the heavy cluster. Then, we convergecast these tokens on the Steiner tree of C, in a synchronized manner from depth R to the root. That is, we start with nodes of depth R; they send their tokens to nodes of depth R − 1, and so on. This takes O(R + log n) rounds, by allocating O(1) bits of the messages of each round to each of the Steiner trees that includes the edge. Notice that this is possible as we have b = Ω(log n)-bit messages and each node is in at most O(log n) trees. Now, for each Steiner tree, every time that a node v on this Steiner tree receives some tokens from its children, node v pairs the tokens up with each other in pairs of two, except for leaving at most one token not paired if their number is odd. Tokens that are paired are sent backward along the same tree, from v to the physical incoming edge that initiated the token. In each pair, one token carries color red and the other carries color blue. If the number of tokens that v had received was odd, then it forwards the one remaining unpaired token to its parent in the Steiner tree, in the next round. If a token is left unpaired at the root, we color it arbitrarily. After performing this for 2R rounds, all tokens are paired up, with the exception of at most one token in the case their number is odd. Moreover, they have arrived back at the incoming endpoint of the physical incoming edge. Then, using one additional round, we can send the color to the other endpoint of the physical incoming edge, and using another convergecast in each cluster, we can inform the center of each light cluster (that had no neighbor in H′) of the color that it received in this scheme, in O(R + log n) rounds, for all clusters at the same time, using the pipelining of Corollary 5.3.
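The pairing convergecast above has a simple centralized skeleton: each tree node resolves pairs locally and only a parity travels upward, so at most one token ends up unpaired. A sketch with hypothetical tree encodings:

```python
# Sketch of the token-pairing convergecast: each Steiner-tree node pairs the
# tokens coming from below (one red + one blue per pair) and forwards at most
# one leftover token to its parent; only the global parity can stay unpaired.
def unpaired_after_convergecast(children, tokens, root):
    def up(v):
        cnt = tokens.get(v, 0) + sum(up(c) for c in children.get(v, []))
        return cnt % 2  # pairs resolve locally; the parity travels upward
    return up(root)

tree = {0: [1, 2], 1: [3, 4]}
toks = {0: 1, 2: 1, 3: 1, 4: 1}   # one token at the root plus three incoming edges
assert unpaired_after_convergecast(tree, toks, 0) == sum(toks.values()) % 2
```

This mirrors why the discrepancy contributed by each heavy cluster's star is at most 1: every pair contributes one red and one blue.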
This concludes the description of the procedure that implements thebalanced coloring algorithm of Lemma 3.1 on the clusters, in O (( R + log n ) log ∗ b ) rounds. Remark 3.3.
Any deterministic
LOCAL -model algorithm for balanced coloring needs
Ω(log* n) rounds, even on a cycle.

Proof. Suppose for the sake of contradiction that there is a deterministic algorithm A that on any n-node cycle with O(log n)-bit identifiers, in T ≤ (log* n)/100 rounds, computes a balanced coloring, such that at most a 3/4 fraction of the nodes receives each color. Consider n separate n-node cycles, where the i-th one has identifiers in [(i − 1) · n + 1, i · n]. By Linial's well-known lower bound [Lin87], we know that on each cycle, there is a configuration of the identifiers such that algorithm A, when run on that cycle with those identifiers, colors some consecutive set of at least H ≥ (log* n)/5 vertices monochromatically. Otherwise, we could transform A into a 4-coloring algorithm, by processing each consecutive monochromatic path in time at most H and computing a 2-coloring of its vertices. This would result in a 4-coloring of the cycle in H + T ≤ (log* n)/4 rounds, contradicting Linial's Ω(log* n) lower bound for coloring a cycle with a constant number of colors. Hence, for each cycle, we can fix a path of H consecutive vertices that is colored monochromatically. We call these monochromatic paths. Now, we have n monochromatic paths, one for each cycle, and thus at least n/2 of them are monochromatic in the same color, say blue; we use n/H ≪ n/2 of these blue paths and glue them together into a single cycle on n vertices, with identifiers from {1, . . . , n^2}. If we run A on this new cycle, its running time is at most (log* n)/100, so only the vertices within distance (log* n)/100 of an endpoint of one of the paths may notice that they are not in their original cycle. Hence, at most (log* n)/50 nodes change their output per path, that is, a total of at most (n/H) · (log* n)/50 ≤ n/10 nodes. Hence, we have at most n/10 red nodes. Therefore, on a certain n-node cycle with identifiers from {1, . . . , n^2}, algorithm A fails to compute a coloring where each color has at most a 3/4 fraction of the nodes. This contradicts A having round complexity T ≤ (log* n)/100 and proves the Ω(log* n) lower bound. A similar lower bound holds for any other constant balance requirement.

Next, we show how to incorporate Lemma 3.1 in the algorithm of Rozhoň and Ghaffari [RG20]. This implies the following theorem, which provides a decomposition that, compared to the original algorithm of Rozhoň and Ghaffari, has a much better dependency on the number of bits in the identifiers.
Theorem 3.4.
Consider an arbitrary graph G on n nodes where each node has a unique b-bit identifier, where b = Ω(log n). There is a deterministic distributed algorithm that computes a network decomposition of G with O(log n) colors and weak-diameter O(log^3 n) in O(log^7 n + (log* b) · log^5 n) rounds of the CONGEST model, using O(b)-bit messages. Moreover, for each cluster C of vertices, we have a Steiner tree T_C with radius O(log^3 n) in G, for which the set of terminal nodes is equal to C. Each vertex of G is in O(log n) Steiner trees of any given color out of the O(log n) color classes.

Proof. We show how to incorporate Lemma 3.1 in the algorithm of Rozhoň and Ghaffari [RG20]. Note that their algorithm was explained in Section 2.1. Recall that in the i-th phase of their algorithm, each cluster is given a color based on the i-th bit of its identifier. After the phase, clusters of different colors are disconnected and will never be connected again.

Now in each phase i, instead of coloring based on the i-th bit, we invoke the cluster version of the balanced coloring (Lemma 3.2) to get a coloring such that in each connected component of clusters consisting of at least two clusters, at most a 3/4 fraction of the clusters receives each color. Hence, after log_{4/3} n phases, at the end of the algorithm, each connected component of clusters contains only one cluster.

The dependence on the number of bits in the algorithm of Rozhoň and Ghaffari comes from the fact that we need b phases. In particular, their algorithm needs b phases, each with O(b log n) steps, and as such, it computes a network decomposition with O(log n) colors and weak-diameter O(b^2 log n) in O(b^4 log^3 n) rounds.

Using the balanced coloring scheme, we can now replace b by log_{4/3} n = O(log n), and thus get a network decomposition with O(log n) colors and weak-diameter O(log^3 n) in O(log^7 n) rounds, modulo that we also need to spend O((log^3 n) · log* b) additional rounds in each phase to compute the coloring, using Lemma 3.2. Hence, the overall round complexity of the algorithm is O(log^7 n + (log^5 n) · log* b) rounds.

Remark 3.5.
The above Theorem 3.4 shows that in order to construct a network decomposition in the CONGEST model, we do not need to assume O(log n)-bit unique identifiers; instead, it suffices to have a port numbering of the edges and access to an oracle that colors a locally constructed graph of constant degree with constantly many colors. We now present two corollaries of the above statements that are later improved in Section 4.
Corollary 3.6.
There is a randomized distributed algorithm that computes a maximal independent set in O(log ∆ · log log n + log log n) rounds of the CONGEST model, with high probability.

Proof.
First, we run the randomized MIS algorithm of Ghaffari [Gha16] for O(log ∆) rounds. As proven in [Gha16, Lemma 4.2], this algorithm computes an independent set S such that, after removing all nodes of S and those that have a neighbor in S from the graph, we are left with "small" connected components, with high probability. Here, "small" means that (A) each component has at most O(∆ log n) nodes, and (B) any 5-independent set in each component (a set where any two nodes have distance at least 5) has size at most O(log n).

At this point, we run the CONGEST-model randomized ruling set algorithm of Ghaffari [Gha19, Lemma 2.2], which computes a (6, O(log log n)) ruling set of each component in O(log log n) rounds, with high probability. That is, for each component C, we get a ruling set T such that (I) any two vertices of the ruling set have distance at least 6 from each other, and (II) each node v in the component knows the closest node of T to itself (ties broken arbitrarily), and that node is within distance O(log log n). This induces a clustering of the component, i.e., a partitioning of all vertices into disjoint clusters, each with radius O(log log n): there is one cluster for each node v ∈ T, and it includes all nodes u in the component for which v is the closest node in T to u.

Now, we run the network decomposition algorithm of Theorem 3.4 on the cluster graph where each virtual vertex is a cluster of diameter O(log log n) around u ∈ T. This runs in O(log log n) rounds; the additional slowdown of O(log log n) comes from the fact that each vertex of the cluster graph is actually a cluster of strong diameter O(log log n). The fact that the whole construction still works is verified in Remark 4.2. We get a partition of the cluster graph into vertex-disjoint clusters, each with weak-diameter O(log log n).
In the original graph, this means clusters of weak-diameter O(log log n), colored with O(log log n) colors, such that adjacent clusters have different colors. We now process the color classes of the network decomposition one by one, and compute the MIS for each of them separately. When we process a color, each cluster of that color works independently, as follows: we first remove nodes of the cluster that already have a neighbor in the MIS. Then, we run O(log n) independent instances of the MIS algorithm of Ghaffari [Gha16], each for R = O(log ∆ + log log n) rounds, on this cluster. We note that since this algorithm works with single-bit messages, we can run O(log n) independent instances of it in parallel in the CONGEST model, with no round complexity overhead. The analysis of this algorithm [Gha16, Theorem 4.2] shows that in each run, each node is either in the computed MIS or has a neighbor in it, with probability at least 1 − 2^{−Θ(R)}. Since the cluster, and even the entire component, has at most N = O(∆ log n) nodes, each run succeeds in computing a correct MIS with probability at least 1 − N · 2^{−Θ(R)} = 1 − 1/poly(∆ log n). Then, we locally check each run to see if it produced a correct MIS, again using one-bit messages, so that all runs can be checked in parallel. Finally, we aggregate over a breadth-first search tree of the cluster whether each run was successful or not, again using a single-bit indicator for each run. Since we have O(log n) runs, at least one is successful, with high probability. Since the diameter of the cluster is O(log log n), we can aggregate these indicators in O(log log n) additional rounds. We pick one successful run, add the computed MIS to the overall independent set, and we can then proceed to the next color of the decomposition.
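The repeat-and-verify pattern just described can be sketched in a few lines. The following is a toy centralized simulation: Luby-style random marking stands in for the truncated algorithm of [Gha16], and the graph, round budget, and number of attempts are illustrative assumptions rather than the parameters of the actual proof.

```python
import random

def truncated_luby_mis(adj, rounds, rng):
    """One truncated run of Luby-style MIS (a stand-in for [Gha16]):
    it may fail to decide all nodes within the given round budget."""
    undecided = set(adj)
    mis = set()
    for _ in range(rounds):
        marks = {v: rng.random() for v in undecided}
        # local minima among undecided nodes join the independent set
        joiners = {v for v in undecided
                   if all(marks[v] < marks[u] for u in adj[v] if u in undecided)}
        mis |= joiners
        undecided -= joiners
        undecided -= {u for v in joiners for u in adj[v]}
        if not undecided:
            break
    return mis, undecided

def is_correct_mis(adj, mis):
    """Local check: independence plus maximality."""
    independent = all(u not in mis or v not in mis for u in adj for v in adj[u])
    maximal = all(v in mis or any(u in mis for u in adj[v]) for v in adj)
    return independent and maximal

rng = random.Random(0)
n = 30                      # toy "cluster": a small random graph
adj = {v: set() for v in range(n)}
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < 0.2:
            adj[u].add(v); adj[v].add(u)

# run several independent truncated instances; keep the first verified one
for attempt in range(20):
    mis, undecided = truncated_luby_mis(adj, rounds=10, rng=rng)
    if not undecided and is_correct_mis(adj, mis):
        break
assert is_correct_mis(adj, mis)
```

Each attempt either decides every node within the round budget or is discarded after the local check; running many independent attempts and aggregating a one-bit success indicator per attempt is exactly what the BFS-tree aggregation in the proof detects.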
Processing each color takes O(log ∆ + log log n) rounds. Since we have O(log log n) colors in the decomposition, the round complexity of computing the MIS atop the given decomposition is O(log ∆ · log log n + log log n). We also spent O(log log n) rounds to compute the decomposition, which makes the overall round complexity O(log ∆ · log log n + log log n).

We get a similar result for ∆ + 1 coloring:

Corollary 3.7.
There is a randomized distributed algorithm, in the
CONGEST model, that computes a ∆ + 1 coloring in any n-node graph with maximum degree at most ∆ in O(log ∆ + log log n) rounds, with high probability.

Proof sketch. The proof follows in a similar manner as the proof of Corollary 3.6, by incorporating the network decomposition that uses balanced coloring into the CONGEST-model shattering-based coloring algorithm of [Gha19, Theorem 1.3].
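Both corollaries ultimately rest on the balance guarantee of Lemma 3.1: since at most a 3/4 fraction of the clusters in a component carry either color, the number of clusters per connected component shrinks geometrically with the phases. The following toy numeric check of the resulting phase bound uses a worst-case update rule that is a simplification of the actual merging step, not the algorithm itself:

```python
import math

def phases_to_singleton(n: int) -> int:
    """Worst-case number of phases until a component of n clusters is
    reduced to a single cluster, if each phase keeps at most a 3/4
    fraction of the clusters (the balanced-coloring guarantee)."""
    k, phases = n, 0
    while k > 1:
        k = (3 * k) // 4   # at most a 3/4 fraction survives the phase
        phases += 1
    return phases

# the number of phases stays within the 1 + log_{4/3} n bound
for n in [2, 10, 1000, 10**6]:
    assert phases_to_singleton(n) <= 1 + math.ceil(math.log(n, 4 / 3))
```

In contrast, coloring by identifier bits needs one phase per bit, i.e., b = Ω(log n) phases regardless of how quickly components could otherwise shrink; the balanced coloring removes exactly this dependence on b.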
In this section, we show how to put the two improvements of Section 2 and Section 3 together. The main result is that a network decomposition can be constructed with a round complexity of O(log n + log n · log* b). Here, we prove a formal version of Theorem 1.4.
Theorem 4.1.
Consider an arbitrary graph G on n nodes where each node has a unique b-bit identifier, where b = Ω(log n). There is a deterministic distributed algorithm that computes a network decomposition of G with O(log n) colors and weak-diameter O(log n), in O(log n + (log* b) log n) rounds of the CONGEST model with b-bit messages.

Moreover, for each cluster C of vertices, we have a Steiner tree T_C with radius O(log n) in G, for which the set of terminal nodes is equal to C. Each vertex of G is in at most O(log n) Steiner trees of each color.

Proof.
We explain how to adapt the algorithm from the proof of Theorem 2.1 by using the balanced coloring from Lemma 3.1.

We describe what needs to be changed in the description of the algorithm from Section 2.2. First, the number b is not defined as the number of bits, but as b = 1 + log_{4/3} n. At the beginning of each phase i, clusters in each level d run the algorithm from Proposition 4.3 to compute a partial red and blue coloring of clusters; this has one exception, namely clusters that are in the same level as they were during the previous phase and which were, hence, already considered by the partial coloring of Proposition 4.3. These clusters already ran this algorithm for their current level during some previous phase, and they retain their color from that previous run (if they were colored). The parameter h of Proposition 4.3 is set such that (2(b + log n)) · (28(b + log n)) ≤ h log n/2.

The computed color of a cluster C plays the same role in this phase as the bit ℓ_{lev(C)+1} plays in the original algorithm, i.e., if u and v are neighboring nodes such that clusters C_u and C_v have the same level, u will consider proposing to v to join C_v if C_u is colored red and C_v is colored blue.

As we will shortly see, although Proposition 4.3 only outputs a partial coloring, it guarantees that uncolored clusters will not neighbor a cluster in the same level at any point in time during the current phase, so the fact that not all clusters are colored does not matter for the description of the algorithm.

We now describe how to adapt the analysis of Theorem 2.2 to the new algorithm.
First, the analysis from Section 2.3, i.e., the proof of the facts that we delete at most a small fraction of the vertices, that the resulting clusters have weak-diameter O(log n), and that they have an accompanying Steiner tree of diameter O(log n) such that each vertex is in O(log n) Steiner trees, stays the same.

The round complexity of the network decomposition construction is O(log n + (log* b) · log n). The first term comes from the analysis of Theorem 2.1, while the second term comes from the fact that in each of the O(log n) phases to construct one of the O(log n) colors of the resulting decomposition, we need to construct a balanced coloring via Proposition 4.3, with R = O(log n).

What remains to be argued is that the resulting clusters are non-adjacent and their Steiner trees are correctly formed; i.e., we conclude by showing how to adapt the proofs of Proposition 2.10 and Proposition 2.11 from Section 2.4. We slightly change the definition of the transcript tree T: each non-leaf vertex of T does not have just 4(b + log n) children, i.e., two times the number of phases, but 6(b + log n), i.e., three times the number of phases. After phase i, when a cluster C, mapped to a node π(C) of T of depth lev(C), decides to go to the next level, we assign it to the 3i-th, (3i + 1)-th, or (3i + 2)-th child of π(C), based on whether the cluster C was assigned a color and, if so, which color was assigned to it.

The proof of Proposition 2.14 from Section 2.4 works after the following slight change: We now observe that if a cluster C is left uncolored by Proposition 4.3 (we call such a cluster isolated), then it will not meet a different cluster of the same level during the rest of the algorithm (this also shows the algorithm is correctly defined).
This is because, as we are proving Proposition 2.14 by induction, clusters mapped to the subtree of π(C) in the transcript tree T can, by induction, only eat vertices from clusters in that particular subtree during the next (2(b + log n)) · (28(b + log n)) ≤ h log n/2 rounds. Hence, C can never be adjacent to a cluster on the same level, throughout the whole algorithm. Moreover, once an isolated cluster C goes to the next level, it will not eat vertices of other clusters anymore, as it is connected only to clusters of strictly smaller level. Hence, in future rounds, vertices of isolated clusters can only propose to lower-level clusters and join them or be deleted; whenever a lower-level cluster neighboring C decides to go to the next level, it deletes its boundary with C and does not neighbor it anymore. This means that Proposition 2.14 holds also in the new algorithm. Similarly, the proof of Proposition 2.11 readily generalizes.

Finally, we observe that due to the balanced property of the coloring of Proposition 4.3, whenever new clusters are mapped to some node r in T, which happens only once during some phase i of the algorithm, unless all these clusters are isolated, their number is at most a 3/4 fraction of the number of clusters mapped to r at the beginning of phase i. Hence, after 1 + log_{4/3} n phases, all resulting clusters are isolated and, by Proposition 2.9, of level b. Hence, there are no edges between the final clusters, as needed.

For the shattering applications in the CONGEST model in Section 4.3, we will need the fact that the above Theorem 4.1 generalizes to the following, more restrictive, setting.
Remark 4.2.
The above proof of Theorem 4.1 works even if each node u of the graph G is, in the communication graph, simulated by a tree of strong-diameter R. The round complexity then changes to O(R(log n + (log* b) log n)).

Proof. We need to check that both the main network decomposition algorithm from Theorem 4.1 and the balanced coloring from Proposition 4.3 generalize to this more restrictive setting, where each virtual node of G is a tree of diameter R in the underlying communication graph. In the case of Proposition 4.3, we observe that expanding each node u of the Steiner tree T_{C_u} of diameter O(log n), into its underlying tree, makes T_{C_u} a tree of diameter O(R · log n). Similarly, we can set the parameter h in Proposition 4.3 to be R times bigger than its value in Theorem 4.1. This implies that the resulting coloring will have the desired properties, while its round complexity increases by at most a factor of R.

Second, we verify that the network decomposition algorithm generalizes to this setting. Whenever a cluster C collects some information (e.g., the number of proposing vertices) through its Steiner tree, we expand each virtual vertex in it to its corresponding tree and send the information in the new, expanded Steiner tree of diameter O(R · log n). Since each virtual node is a part of O(log n) Steiner trees, each edge in the expanded Steiner trees is a part of O(log n) expanded Steiner trees. This means that gathering information through clusters is done with an additional multiplicative increase of R in the round complexity. Similarly, whenever a virtual node proposes to a cluster, it can decide whom to propose to in O(R) rounds by gathering information from the leaves of its communication tree. Hence, the final round complexity is multiplied by a factor of R, which concludes the proof.

Proposition 4.3.
Consider a network G with b-bit identifiers, where b = Ω(log n), and O(b)-bit message sizes. Suppose that the vertices are partitioned into clusters of weak-diameter R. In particular, for each cluster, we are also given a Steiner tree of depth R, such that each vertex is in O(log n) of these Steiner trees. Furthermore, suppose that each cluster C has a level lev(C) ∈ [1, O(log n)]. There is an algorithm that, in O((R + h log n) · log* b) rounds, returns a partial coloring of the clusters with the following guarantees:

Let U_i be the set of vertices in clusters of level i and define U_{i+} = ∪_{j≥i} U_j. We define a cluster graph G_i for each level i, where vertices are clusters of level i and two clusters are connected iff their distance in the subgraph of G induced by U_{i+} is at most h log n, for a given value h ≥ 1. In the output partial coloring, each cluster which is contained in a connected component with at least two level-i clusters is colored red or blue such that at most 3/4 of the level-i clusters are blue and similarly at most 3/4 of them are red. Clusters that are alone in their connected component of G_i are left uncolored.

Proof. We construct the coloring in parallel for each of the O(log n) levels. First, for each level i, we construct in parallel an extended cluster of C, denoted by Č, as follows. We run a simultaneous BFS in U_{i+} starting from all nodes that are contained in some level-i cluster. Each level-i BFS only uses a single bit in each b = Ω(log n)-bit message that can be sent across each edge. Each node of each level-i cluster starts by sending a one-bit token through the one-bit channel to each of its neighbors in U_{i+}. In general, we are allowed to forward this level-i token only among nodes of U_{i+}. Each node v ∈ U_{i+}, upon receiving one (or more) level-i BFS tokens, remembers the first node w it receives a token from as its parent in the BFS tree, breaking ties arbitrarily.
Moreover, in the next round, v forwards this token to its own neighbors in U_{i+}. We repeat this for h log n iterations. At the end, each node in U_{i+} that can be reached from a level-i cluster via h log n hops in G[U_{i+}] is reached and belongs to one level-i BFS. Each level-i cluster now has one (potentially singleton) tree T_u attached to each of its vertices u, which contains all nodes of U_{i+} that were reached by the token initiated in u.

We define the extended cluster Č of C as the union of all trees T_u over u ∈ C, and the Steiner tree T_Č as the union of the Steiner tree T_C together with the trees T_u for u ∈ C. Note that the above construction adds each node to only O(log n) extended clusters (at most one for each level), hence it is still the case that each vertex is in O(log n) Steiner trees. Thus, using Corollary 5.3, each cluster can broadcast its label to all its vertices in parallel in O(R + h log n) rounds.

We can now apply the algorithm from Lemma 3.2 to each level-i cluster graph G_i, defined such that the nodes are extended clusters of level i and connections are between adjacent clusters. Whenever we collect or broadcast information in a cluster during that algorithm, it can be done for all clusters of all levels in parallel in O(R + h log n) rounds by Corollary 5.3, due to the fact that the total number of Steiner trees overlapping at any vertex is O(log n). Whenever we use a particular edge connecting two Steiner trees, it can be used by O(log n) runs for each level at the same time, hence instead of one CONGEST round we need O(log n) rounds. This complexity is, however, dominated by the complexity of broadcasting on Steiner trees, so in total, the round complexity is bounded by O((R + h log n) log* b), as needed.

Remark 4.2 has the following two corollaries that were mentioned in Section 1.3.
Corollary 4.4.
There is a randomized distributed algorithm that computes a maximal independent set in O(log ∆ · log log n + log log n) rounds of the CONGEST model, with high probability.

Proof.
The proof is the same as that of Corollary 3.6, with only one exception: the O(log log n) round complexity of building the network decomposition is now replaced with an O(log log n) round complexity, thanks to the faster decomposition provided by Theorem 4.1, which needs O(R · log log n) rounds, where R = O(log log n) is the diameter of each cluster formed after the construction of the ruling set.

Similarly, we get the following improvement for the round complexity of ∆ + 1 coloring.

Corollary 4.5.
There is a randomized distributed algorithm, in the
CONGEST model, that computes a ∆ + 1 coloring in any n-node graph with maximum degree at most ∆ in O(log ∆ + log log n) rounds, with high probability.

In this section, we explain how to use pipelining to speed up broadcasting and information aggregation in our setting with overlapping broadcast trees. Our end result is Corollary 5.3, which we rely on whenever we want to optimize the round complexity of our algorithms in the
CONGEST model.

Recall that we face the following problem in several
CONGEST algorithms in this paper. We have a collection of rooted trees T_C such that the depth of each tree is R = O(log n) and each edge of the underlying graph G is present in up to O(log n) trees. We now want to solve one of the following two problems:

1. Broadcast: The root of T_C wants to send an m = O(log n)-bit message to all nodes in T_C; this is useful, e.g., when a cluster root tells the vertices in it whether the cluster grows in this step or not.

2. Summation: Each node u ∈ T starts with a nonnegative m = O(log n)-bit number x_u. At the end, the root of T knows the value of (Σ_{u∈T} x_u) mod 2^{O(m)}; this is useful, e.g., when a cluster collects how many nodes are proposing to it.

For the applications in Sections 3 and 4, we also need to quickly solve the following two operations:

1. Convergecast: We have O(1) special nodes u ∈ T, where each special node starts with a separate message. At the end, the root of T knows all messages.

2. Minimum: Each node u ∈ T starts with a nonnegative number x_u. At the end, the root of T should know the value of min_{u∈T} x_u.

To deal with the overlap of Steiner trees, in case there are P trees using the same edge, we allocate only b′ = b/P bits (typically, b′ = Θ(1)) of the capacity of each edge to a given tree. We then show how to solve the four aforementioned operations on a single tree with b′-bit messages in time O(R + m/b′), where m is the length of the messages we are transmitting/aggregating.

Performing the broadcast and convergecast operations can be done by "pipelining" the messages [Pel00]. For example, to perform a broadcast of a message of length m > b′, the root splits the message into m/b′ chunks of length b′ and starts the broadcast of the i'th chunk in the i'th round. The subsequent broadcasts of different chunks do not interfere, so all of them finish in O(R + m/b′) rounds. Convergecast is handled similarly.
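The chunk-pipelining argument for broadcast can be sketched as follows. This is a toy simulation along a single root-to-leaf path (the depth and chunk count are illustrative assumptions), where in each round every node forwards to its child the next chunk the child is missing:

```python
def pipelined_broadcast(depth, chunks):
    """Simulate chunk-by-chunk broadcast down a path with `depth` edges.
    buffers[i] holds the chunks that the node at distance i from the
    root has received so far; returns the number of rounds used."""
    buffers = [list(chunks)] + [[] for _ in range(depth)]
    rounds = 0
    while len(buffers[-1]) < len(chunks):
        # process leaf-side first, so a chunk moves only one hop per round
        for i in range(depth, 0, -1):
            received = len(buffers[i])          # chunks the child already has
            if received < len(buffers[i - 1]):
                buffers[i].append(buffers[i - 1][received])
        rounds += 1
    return rounds

R, k = 7, 5            # depth R, message split into k chunks
assert pipelined_broadcast(R, list(range(k))) == R + k - 1
```

A chunk that enters the pipeline in round i is never blocked by later chunks, so the last of the m/b′ chunks arrives (m/b′ − 1) rounds after the first one, matching the O(R + m/b′) bound.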
To perform summation and to take minimums, each node needs to do a little more work, as explained in the following lemma.

Lemma 5.1.
Let T be a rooted tree with depth r. The tree is oriented towards its root, and each node knows its parent, as well as its own depth and the overall depth of T. Moreover, each node u has an m-bit number x_u. In one round of communication, each node can send a b-bit message, for some b ≤ m, to all its neighbors in T. There is a protocol such that, in O(r + m/b) rounds, we can perform the following operations:

1. Broadcast: The root of T sends an m-bit message to all nodes in T.

2. Convergecast: We have O(1) special nodes u ∈ T, where each special node starts with a separate m-bit message. At the end, the root of T knows all messages.

3. Minimum: Each node u ∈ T starts with a nonnegative m-bit number x_u. At the end, the root of T knows the value of min_{u∈T} x_u.

4. Summation: Each node u ∈ T starts with a nonnegative m-bit number x_u. At the end, the root of T knows the value of (Σ_{u∈T} x_u) mod 2^{O(m)}.

Proof. For simplicity, we prove only the case m = b, as the generalization to b ≤ m is direct. The Broadcast and Convergecast operations were already sketched above.

The summation algorithm works as follows: each node u at depth d is sleeping except in rounds r − d to r − d + m. The node u starts with a value x_u that will change over time. In every round r − d ≤ i ≤ r − d + m, the node u sends the value b_u of the (i + d − r)'th least significant bit of x_u to its parent (this only applies if it has a parent and if i > r − d), and from each of its children v, the node u receives the corresponding value b_v equal to the (i + (d + 1) − r)'th least significant bit of x_v. Then, u updates the value of x_u as follows:

x_u ← x_u − b_u · 2^{i + d − r − 1} + Σ_{v child of u} b_v · 2^{i + d − r}.

Note that after this one-round update, the total sum Σ_{u∈T} x_u did not change. On the other hand, we can easily see by induction that after round i, each non-root node u at depth d has the i + d − r least significant bits of x_u set to zero.
Hence, after r + O(m) rounds, for each u except the root, we have (x_u mod 2^{O(m)}) = 0 and, hence, the root has the value (Σ_{u∈T} x_u) mod 2^{O(m)}, i.e., the final sum.

The case when addition is replaced by taking the minimum (or maximum) is handled similarly, but the nodes start sending the information from the most significant bit. More concretely, the algorithm aggregating min_{u∈T} x_u works as follows: each node u at depth d is sleeping except in rounds r − d to r − d + m. The node u starts with a value x_u and, moreover, it has a bit variable b_u that at the beginning of round i contains the (i + d − r)'th most significant bit of min_{w∈T(u)} x_w (here, T(u) denotes the subtree of T rooted at u). The node u also maintains a possibly empty subset S_u ⊆ {u} ∪ {v : v child of u} such that each child v of u is contained in S_u if and only if, at the beginning of round i, the i + d − r most significant bits of min_{w∈T(v)} x_w are equal to those of min_{w∈T(u)} x_w. Similarly, u ∈ S_u if and only if the i + d − r most significant bits of x_u are equal to those of min_{w∈T(u)} x_w. Initially, we set S_u = {u} ∪ {v : v child of u}, and the variable b_u is first set in round r − d.

In every round r − d ≤ i ≤ r − d + m, the node u sends the value of the bit b_u to its parent (if it has a parent and if i > r − d), and from each of its children v, the node u receives the corresponding value b_v. To update b_u for the next round, the node u considers all values b_v where v ∈ S_u, and the (i + (d + 1) − r)'th most significant bit of x_u if u ∈ S_u. If at least one of those bits is equal to 0, b_u is set to 0, and we remove from S_u all children v with b_v = 1, as well as u if the (i + (d + 1) − r)'th most significant bit of x_u is 1. Otherwise, the value of b_u is set to 1 and S_u is left the same.

The correctness of the algorithm follows from the following induction argument.
During round i, the node u learns, as the values b_v received from its children, the (i + (d + 1) − r)'th most significant bit of min_{w∈T(v)} x_w for every v ∈ S_u, i.e., for every child v such that min_{w∈T(v)} x_w and min_{w∈T(u)} x_w agree on the i + d − r most significant bits. The node u then correctly updates b_u as the (i + (d + 1) − r)'th most significant bit of min_{w∈T(u)} x_w and accordingly updates the set S_u afterwards. Hence, after r + m rounds, the root node knows all m bits of the value min_{u∈T} x_u, as needed.

Remark 5.2.
In general, we only use the property that the respective operation ◦ (such as + or min(·, ·)) is associative and that, if p_i(x) denotes the rightmost (respectively, leftmost) i bits of x, then p_i(x_1 ◦ · · · ◦ x_k) can be computed from p_i(x_1), . . . , p_i(x_k). For example, multiplication also has this property.

The above Lemma 5.1 is used via the following corollary.
Corollary 5.3.
Let G be a communication graph on n vertices. Suppose that each vertex of G is part of some cluster C such that each such cluster has a rooted Steiner tree T_C of diameter at most R and each node of G is contained in at most P such trees. Then, in O(P + R) rounds of the CONGEST model with b-bit messages, for b ≥ P, we can perform the following operations for all clusters in parallel:

1. Broadcast: The root of T_C sends a b-bit message to all nodes in C.

2. Convergecast: We have O(1) special nodes u ∈ C, where each special node starts with a separate b-bit message. At the end, the root of T_C knows all messages.

3. Minimum: Each node u ∈ C starts with a nonnegative b-bit number x_u. At the end, the root of T_C knows the value of min_{u∈C} x_u.

4. Summation: Each node u ∈ C starts with a nonnegative b-bit number x_u. At the end, the root of T_C knows the value of (Σ_{u∈C} x_u) mod 2^{O(b)}.

Proof. Each edge allocates ⌊b/P⌋ bits to each Steiner tree that is using it. Then, for each Steiner tree T_C in parallel, we use Lemma 5.1 to perform the given operation in O(R + b/(b/P)) = O(R + P) rounds.

Acknowledgment
This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 853109).
References

[ABCP96] Baruch Awerbuch, Bonnie Berger, Lenore Cowen, and David Peleg. Fast network decompositions and covers.
J. of Parallel and Distributed Computing, 39(2):105–114, 1996.

[AGLP89] Baruch Awerbuch, Andrew V. Goldberg, Michael Luby, and Serge A. Plotkin. Network decomposition and locality in distributed computation. In
Proc. 30th IEEE Symp. on Foundations of Computer Science (FOCS), pages 364–369, 1989.

[AP90] Baruch Awerbuch and David Peleg. Sparse partitions. In
Proc. 31st IEEE Symp. on Foundations of Computer Science (FOCS), pages 503–513, 1990.

[BE13] Leonid Barenboim and Michael Elkin.
Distributed Graph Coloring: Fundamentals and Recent Developments. Morgan & Claypool Publishers, 2013.

[BEPS16] Leonid Barenboim, Michael Elkin, Seth Pettie, and Johannes Schneider. The locality of distributed symmetry breaking.
Journal of the ACM, 63:20:1–20:45, 2016.

[BKM20] Philipp Bamberger, Fabian Kuhn, and Yannic Maus. Efficient deterministic distributed coloring with small bandwidth. In
Proc. Principles of Distributed Computing (PODC), pages to appear, arXiv:1912.02814, 2020.

[CFG+19] Yi-Jun Chang, Manuela Fischer, Mohsen Ghaffari, Jara Uitto, and Yufan Zheng. The complexity of (∆+1) coloring in congested clique, massively parallel computation, and centralized local computation. In
Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, pages 471–480. ACM, 2019.

[CHPS17] Keren Censor-Hillel, Merav Parter, and Gregory Schwartzman. Derandomizing local distributed algorithms under bandwidth restrictions. In Proc. Int. Symp. on Distributed Computing (DISC). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.

[CKP16] Y.-J. Chang, T. Kopelowitz, and S. Pettie. An exponential separation between randomized and deterministic complexity in the LOCAL model. In
Proc. 57th IEEE Symp. on Foundations of Computer Science (FOCS), 2016.

[CLP18] Yi-Jun Chang, Wenzheng Li, and Seth Pettie. An optimal distributed (∆ + 1)-coloring algorithm? In
Proc. 50th ACM Symp. on Theory of Computing (STOC), 2018.

[EN16] Michael Elkin and Ofer Neiman. Distributed strong diameter network decomposition. In
Proc. 35th ACM Symp. on Principles of Distributed Computing (PODC), pages 211–216, 2016.

[Gha16] Mohsen Ghaffari. An improved distributed algorithm for maximal independent set. In
Proc. ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 270–277, 2016.

[Gha19] Mohsen Ghaffari. Distributed maximal independent set using small messages. In
Proc. ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 805–820, 2019.

[GHK18] Mohsen Ghaffari, David Harris, and Fabian Kuhn. On derandomizing local distributed algorithms. In
Proc. Foundations of Computer Science (FOCS), pages 662–673, 2018.

[GKM17] Mohsen Ghaffari, Fabian Kuhn, and Yannic Maus. On the complexity of local distributed graph problems. In
Proc. 49th ACM Symp. on Theory of Computing (STOC), pages 784–797, 2017.

[GP19] Mohsen Ghaffari and Julian Portmann. Improved network decompositions using small messages with applications on MIS, neighborhood covers, and beyond. In Proc. Int. Symp. on Distributed Computing (DISC). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

[Lin87] Nathan Linial. Distributive graph algorithms – global solutions from local data. In
Proc. 28th IEEE Symp. on Foundations of Computer Science (FOCS), pages 331–335, 1987.

[LS93] Nati Linial and Michael Saks. Low diameter graph decompositions.
Combinatorica, 13(4):441–454, 1993.

[Pel00] David Peleg.
Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.

[PS92] Alessandro Panconesi and Aravind Srinivasan. Improved distributed algorithms for coloring and network decomposition problems. In
Proc. 24th ACM Symp. on Theory of Computing (STOC), pages 581–592, 1992.

[RG20] Václav Rozhoň and Mohsen Ghaffari. Polylogarithmic-time deterministic network decomposition and distributed derandomization. In Proc. 52nd ACM Symp. on Theory of Computing (STOC), 2020.