Round- and Message-Optimal Distributed Graph Algorithms∗

Bernhard Haeupler†, D. Ellis Hershkowitz†, David Wajc†
Carnegie Mellon University, {haeupler,dhershko,dwajc}@cs.cmu.edu
Abstract
Distributed graph algorithms that separately optimize for either the number of rounds used or the total number of messages sent have been studied extensively. However, algorithms simultaneously efficient with respect to both measures have been elusive. For example, only very recently was it shown that for Minimum Spanning Tree (MST), an optimal message and round complexity is achievable (up to polylog terms) by a single algorithm in the CONGEST model of communication.

In this paper we provide algorithms that are simultaneously round- and message-optimal for a number of well-studied distributed optimization problems. Our main result is such a distributed algorithm for the fundamental primitive of computing simple functions over each part of a graph partition. From this algorithm we derive round- and message-optimal algorithms for multiple problems, including MST, Approximate Min-Cut and Approximate Single Source Shortest Paths, among others. On general graphs all of our algorithms achieve worst-case optimal Õ(D + √n) round complexity and Õ(m) message complexity. Furthermore, our algorithms require an optimal Õ(D) rounds and Õ(n) messages on planar, genus-bounded, treewidth-bounded and pathwidth-bounded graphs.

∗ An extended abstract of this work will appear in PODC 2018.
† Supported in part by NSF grants CCF-1527110, CCF-1618280 and NSF CAREER award CCF-1750808.

Throughout this paper, n, m and D denote the number of nodes, the number of edges and the graph diameter, respectively. In addition, we use tilde notation, Õ, Ω̃ and Θ̃, to suppress polylogarithmic terms in n.

Introduction
Over the years, a great deal of research has focused on characterizing the optimal runtime for distributed graph algorithms in the CONGEST model of communication. Fundamental problems that have been studied include Shortest Paths [10, 22, 23, 28, 29, 32], MST [13, 25, 26, 37], Min-Cut [16, 33], and Max Flow [17]. Runtime is measured by the number of synchronous rounds of communication, and for these problems Θ̃(D + √n) rounds are known to be necessary and sufficient [5, 7, 16, 37].

Another common performance metric optimized for in the CONGEST model is the total number of messages sent. For MST, an Ω̃(m) lower bound is known [2]. However, for several decades the only MST algorithms known to match this message lower bound had sub-optimal round complexity [1, 2, 3, 9, 11, 12]. The question of whether algorithms attaining both optimal round and message complexity exist has been a long-standing open problem. For instance, Peleg and Rubinovich [37] asked whether this might be achievable for MST. In a recent breakthrough work, Pandurangan et al. [35] answered this question in the affirmative, providing a randomized MST algorithm with simultaneously optimal round and message complexities (up to polylog terms). Shortly thereafter Elkin [8] provided the same result without randomization. However, simultaneously round- and message-optimal algorithms for other fundamental problems have remained elusive.
In this paper we advance the study of simultaneously round- and message-optimal distributed algorithms. In particular, we provide such algorithms for multiple distributed graph problems. Underlying these contributions is our main result – a round- and message-optimal algorithm for a fundamental distributed problem, which we refer to as Part-Wise Aggregation (or PA for short). We elaborate on some applications of this algorithm in Section 1.2, as well as Appendix A. Informally, Part-Wise Aggregation is the problem of computing the result of a function applied to each part of a graph partition. Formally, the problem is as follows.
Definition 1.1 (Part-Wise Aggregation). The input to Part-Wise Aggregation (PA) is:
1. a graph G = (V, E);
2. a partition (P_i)_{i=1}^N of V, where each P_i induces a connected subgraph of G. Each node v ∈ P_i knows an O(log n)-bit value associated with it, val(v), and which of its neighbors are in P_i;
3. a function f that takes as input two O(log n)-bit inputs, outputs an O(log n)-bit output and is commutative and associative.

The problem is solved if for every P_i every v ∈ P_i knows its part's aggregate value f(P_i) := f(val(v_1), f(val(v_2), ...)), where P_i = {v_1, v_2, ...}.

(Footnote: Strictly speaking, the Ω̃(m) message lower bound for MST only holds if the algorithm is (1) deterministic, (2) in the KT0 model, or (3) "comparison-based". Our deterministic algorithm satisfies (1), (2) and (3). If these conditions are not met, it is possible to solve MST using Õ(n) messages – beating the Ω̃(m) bound for sufficiently dense graphs. For more see Mashreghi and King [30].)

The performance of our algorithm is determined by the quality of the shortcuts that the input graph admits. Shortcuts, as well as the parameters which determine their quality – termed the block parameter, b, and congestion, c – are formally defined in Section 2.2. For now, we note only that every graph admits a shortcut with b = 1 and c = √n. Our main result is as follows.

Theorem 1.2.
There exists an algorithm that solves Part-Wise Aggregation on a graph G admitting a tree-restricted shortcut with congestion c and block parameter b, w.h.p. in Õ(bD + c) rounds and Õ(m) messages, and deterministically in Õ(b(D + c)) rounds and Õ(m) messages.

The power of Part-Wise Aggregation – and by extension Theorem 1.2 – is that numerous distributed primitives can be cast as instances of this problem. For example, it is not hard to see that electing a leader, computing the number of nodes in each tree of a forest, or having every part of a graph partition agree on a minimum value are all instances of this problem. Consequently, many previous algorithms rely on subroutines which are implementable using Part-Wise Aggregation [5, 6, 12, 13, 14, 15, 16, 17, 18, 26, 33]. Perhaps unsurprisingly then, using our new PA algorithm as a subroutine in some of these previous works' algorithms, we obtain round- and message-optimal solutions to numerous problems.

In the following three corollaries we highlight three such applications of our algorithm: round- and message-optimal algorithms for MST, Approximate Min-Cut and Approximate SSSP. We give proofs of these corollaries and also discuss further applications of our PA algorithm and our subroutines in Appendix A. For a flavor of these corollaries' proofs, we note that Borůvka's MST algorithm [34] can be implemented easily using O(log n) applications of Part-Wise Aggregation, implying Corollary 1.3. Corollaries 1.4 and 1.5 are obtained by using our PA algorithm in the algorithms of Ghaffari and Haeupler [15] and Haeupler and Li [18], respectively.

The input to all three problems consists of an undirected weighted graph, with edge weights in [1, poly(n)]. Initially, every node knows the weight associated with each of its incident edges. Since every graph admits a shortcut with b = 1 and c = √n, our algorithms simultaneously achieve message complexities of Õ(m) and runtimes of essentially worst-case optimal Õ(D + √n).

MST.
MST is solved when every node knows which of its incident edges are in the MST.
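For a flavor of how PA yields MST: each phase of Borůvka's algorithm asks every current component (a connected part) for its minimum-weight outgoing edge, which is exactly a part-wise minimum aggregation. Below is a centralized sketch of this reduction (our own illustration, not the distributed implementation; it assumes distinct edge weights, and the union-find bookkeeping stands in for the distributed merging):

```python
def boruvka_mst(n, edges):
    """Centralized sketch of Boruvka's algorithm. Each of the O(log n) phases
    computes, for every current component (a 'part'), the minimum-weight
    outgoing edge -- a part-wise min-aggregation -- and then merges along
    those edges. Assumes distinct edge weights."""
    comp = list(range(n))                  # comp[v] = v's component pointer

    def find(v):                           # union-find with path halving
        while comp[v] != v:
            comp[v] = comp[comp[v]]
            v = comp[v]
        return v

    mst = set()
    while True:
        # "Part-wise aggregation": min outgoing edge per component.
        best = {}
        for u, v, w in edges:
            cu, cv = find(u), find(v)
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in best or w < best[c][2]:
                    best[c] = (u, v, w)
        if not best:                       # no outgoing edges: done
            break
        for u, v, w in best.values():      # merge along the chosen edges
            cu, cv = find(u), find(v)
            if cu != cv:
                mst.add((u, v, w))
                comp[cu] = cv
    return mst

mst = boruvka_mst(4, [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 4)])
print(sum(w for _, _, w in mst))  # → 6
```

Each `while` iteration halves (at least) the number of components, giving the O(log n) PA applications mentioned above.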
Corollary 1.3.
Given a graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can solve MST w.h.p. in Õ(bD + c) rounds and Õ(m) messages, and deterministically in Õ(b(D + c)) rounds with Õ(m) messages.

Approximate Min-Cut.
Min-cut is (1 + ε)-approximated when every node knows whether or not it belongs to a set S ⊂ V such that the size of the cut given by (S, V \ S) is at most (1 + ε)λ, where λ is the size of the minimum cut of G with the prescribed weights.

Corollary 1.4.
For any ε > 0 and graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can (1 + ε)-approximate min-cut w.h.p. in Õ(bD + c) · poly(1/ε) rounds and Õ(m) · poly(1/ε) messages.

(Footnote: Throughout the paper, by w.h.p. we mean with probability 1 − 1/poly(n).)

Approximate SSSP. An instance of α-approximate single source shortest paths (SSSP) consists of an undirected weighted graph G as above and a source node s, which knows that it is the source node. For v ∈ V we denote by d(s, v) the shortest path length between s and v in G, and L = max_{u,v} d(u, v). The problem is solved once every node v knows d_v such that d(s, v) ≤ d_v ≤ α · d(s, v).

Corollary 1.5.
For any β = O(1/poly log n), given a graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can L^{O(log log n)/log(1/β)}-approximate SSSP w.h.p. in Õ((bD + c)/β) rounds and Õ(m/β) messages.

The value of β determines a tradeoff between the quality of the SSSP approximation and the round and message complexity of our algorithm. Taking β = log^{−Θ(1/ε)} n, Corollary 1.5 yields an O(L^ε)-approximation algorithm using Õ(bD + c) rounds and Õ(m) messages.

There are a few salient points worth noting regarding our results, on which we elaborate below.
Round- and Message-Optimality of our Algorithms.
As all graphs admit a tree-restricted shortcut with block parameter b = 1 and congestion c = √n, our algorithms all terminate within Õ(D + √n) rounds, which is optimal for all our applications of our PA algorithm, by [5]. As for message complexity, our Õ(m) bound is tight for MST in the KT0 model by [2]. For the other problems an Ω(n) message lower bound is trivial; for sparse graphs, then, our message complexity bound is tight for these problems. Finally, we note that our proof of Corollary 1.3 relies on solving Part-Wise Aggregation O(log n) times to solve MST, which implies that our algorithms for PA are both round- and message-optimal (again, up to polylog terms).

Beyond Worst-Case Optimality.
As stated above, every graph admits tree-restricted shortcuts with block parameter b = 1 and congestion c = √n, which implies an Õ(D + √n) bound for our algorithms' round complexity on general graphs. However, as observed in prior work, a number of graph families of interest – planar, genus-bounded, bounded-treewidth and bounded-pathwidth graphs – admit shortcuts with better parameters [15, 19, 20]. As a result, our algorithms have a round complexity of only Õ(D) times the relevant parameter of interest (e.g., genus, treewidth or pathwidth). Provided these parameters are constant or even polylogarithmic, our algorithms run in Õ(D) rounds. Another recent result [21] implies that our algorithms run in Õ(D) time on minor-free graphs. We elaborate on our results for all the above graphs in Appendix C. We also note that our algorithms need not know the optimal values of block parameter and congestion, as a simple doubling trick can be used to approximate the best values (see [19]). In particular, our algorithms perform as well as the parameters of the best shortcut that the input graph admits.

Future Applications of This Work.
Non-trivial shortcuts likely exist for graph families beyond those mentioned above. As such, demonstrating even better runtimes for our algorithms on many networks may be achieved in the future by simply proving the existence of efficient shortcuts on said networks. Moreover, given the pervasiveness of PA in distributed graph algorithms, the applications of our PA algorithm we present are likely non-exhaustive. We are hopeful that our PA algorithm will find applications in deriving round- and message-optimal bounds for even more problems.
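Before moving to the preliminaries, it may help to keep a concrete reference point for the problem at the heart of all our results: the output PA demands can be stated as a short centralized fold. The sketch below (ours, for intuition only) is what the distributed algorithms of this paper compute within the stated round and message budgets:

```python
from functools import reduce

def part_wise_aggregate(parts, val, f):
    """Centralized statement of the PA output: every node of part P_i must
    learn f(P_i), the fold of the commutative, associative f over the values
    of P_i's nodes (so the fold order is immaterial)."""
    result = {}
    for part in parts:
        aggregate = reduce(f, (val[v] for v in part))
        for v in part:          # distributedly, every node learns this value
            result[v] = aggregate
    return result

# Example: every part agrees on its minimum value, a classic PA instance.
val = {0: 7, 1: 3, 2: 9, 3: 5, 4: 2}
print(part_wise_aggregate([[0, 1, 2], [3, 4]], val, min))
# → {0: 3, 1: 3, 2: 3, 3: 2, 4: 2}
```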
Preliminaries
Before moving on to our formal results, we explicitly state the model of communication we consider and then review relevant concepts from previous work on low-congestion shortcuts.
Throughout this paper we work in the classic CONGEST model of communication [36]. In this model, the network is modeled as a graph G = (V, E) of diameter D with n = |V| nodes and m = |E| edges. Communication is conducted over discrete, synchronous rounds. During each round each node can send an O(log n)-bit message along each of its incident edges. Every node has an arbitrary and unique ID of O(log n) bits, at first known only to itself (this is the KT0 model of Awerbuch et al. [2]).

Low-congestion shortcuts were originally introduced by Ghaffari and Haeupler [15] to solve PA. These shortcuts allow high-diameter parts to communicate efficiently, by using edges outside of parts; this effectively decreases the diameter of the parts. Ghaffari and Haeupler [15] showed how, given a simple low-congestion shortcut, PA can be solved in an optimal number of rounds – i.e., Õ(D + √n) – w.h.p. Formally, a low-congestion shortcut is defined as follows.

Definition 2.1 (Low-Congestion Shortcuts [15]). Let G = (V, E) be a graph and (P_i)_{i=1}^N be a partition of G's vertex set. H = (H_1, ..., H_N), where H_i ⊆ E, is a c-congestion shortcut with dilation d with respect to (P_i)_{i=1}^N if it satisfies:
1. Each edge e ∈ E belongs to at most c of the H_i.
2. The diameter of (P_i ∪ V(H_i), E[P_i] ∪ H_i) for any i is at most d.

Ghaffari and Haeupler [15] also showed how to compute near-optimal Õ(D)-congestion and Õ(D)-dilation shortcuts for planar graphs, given an embedding of such a graph. This allowed them to obtain Õ(D)-round MST algorithms for planar graphs, among other results. However, it was not until the work of Haeupler, Izumi, and Zuzic [19] that it was demonstrated that shortcuts could be efficiently computed in general. This work showed that high quality instances of a certain type of shortcut – tree-restricted shortcuts – can be efficiently approximated. These types of shortcuts are defined as follows.

Definition 2.2 (Tree-Restricted Shortcuts [19]).
Let G = (V, E) be a graph and (P_i)_{i=1}^N be a partition of G's vertex set. A shortcut H = (H_i)_{i=1}^N is a T-restricted shortcut with respect to (P_i)_{i=1}^N if there exists a rooted spanning tree T of G with H_i ⊆ E[T] for all i ∈ [N].

Since a rooted BFS tree has minimal depth, and the Õ(D)-round, Õ(m)-message deterministic leader election algorithm of Kutten et al. [27] allows us to compute a BFS tree in the same bounds, throughout this paper T will be a rooted BFS tree. The same work that introduced tree-restricted shortcuts also introduced a convenient alternative to dilation, termed block parameter.

(Footnote: Here V(H_i) denotes all endpoints of edges in H_i, and E[P_i] denotes the edges of G with both endpoints in P_i.)

Definition 2.3 (Block Parameter [19]). Let H = (H_i)_{i=1}^N be a T-restricted shortcut on the graph G = (V, E) with respect to parts (P_i)_{i=1}^N. For any part P_i, we call the connected components of (P_i ∪ V(H_i), H_i) the blocks of P_i, and the number of blocks of P_i its block parameter. The block parameter of H, b, is the maximum block parameter of any part P_i.

As shown in [19], if T is a depth-D tree, the dilation of a T-restricted shortcut with block parameter b is at most O(bD). As such, block parameter is a convenient alternative to dilation. See Figure 1 for an example of a T-restricted shortcut.

Figure 1: An example of a T-restricted shortcut on 4 parts. Each e ∈ E[T] is labeled with {i : e ∈ H_i}. Edges are directed towards the root of T. Here c = 3 and b = 2.

In this section we outline our general algorithmic approach. We begin by demonstrating the message sub-optimality of previous shortcut algorithms for Part-Wise Aggregation on a particular example. We then give a workaround for this example and sketch how we develop this workaround into a full-fledged algorithm.
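Both conditions of Definition 2.1 can be checked mechanically, which may help ground the definitions above. The following is a sketch of our own (a centralized checker, not part of the paper's algorithms; edges are represented as frozensets of their two endpoints):

```python
from collections import deque

def check_shortcut(E, parts, H, c, d):
    """Verify Definition 2.1 for a candidate shortcut: (1) every edge lies in
    at most c of the H_i; (2) for each i, the augmented graph
    (P_i ∪ V(H_i), E[P_i] ∪ H_i) is connected with diameter at most d."""
    count = {}
    for Hi in H:                          # condition 1: congestion
        for e in Hi:
            count[e] = count.get(e, 0) + 1
    if any(cnt > c for cnt in count.values()):
        return False
    for Pi, Hi in zip(parts, H):          # condition 2: dilation, via BFS
        nodes = set(Pi) | {v for e in Hi for v in e}
        edges = {e for e in E if all(v in Pi for v in e)} | set(Hi)
        adj = {v: [] for v in nodes}
        for e in edges:
            u, v = tuple(e)
            adj[u].append(v)
            adj[v].append(u)
        for s in nodes:                   # BFS from every node of the part
            dist, queue = {s: 0}, deque([s])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
            if len(dist) < len(nodes) or max(dist.values()) > d:
                return False
    return True

# Path 0-1-2-3, parts {0,1} and {2,3}; both parts borrow the middle edge
# (1,2) as a shortcut edge, so that edge has congestion 2.
E = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
H = [{frozenset((1, 2))}, {frozenset((1, 2))}]
print(check_shortcut(E, [{0, 1}, {2, 3}], H, 2, 2))  # → True
```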
Several prior round-optimal randomized algorithms for PA used tree-restricted shortcuts [19, 20]. To solve PA, these algorithms repeatedly aggregate within blocks. To aggregate within a block, every node in the block transmits its value up the block (along the tree's edges); when values from the same part arrive at a node in the block, they are aggregated by applying f and then forwarded up the block as a single value. By the end of this process, the root of the block has computed f of the block and can broadcast the result back down. This approach can be implemented using an optimal Õ(D + √n) rounds.

Unfortunately, there exist PA instances for which the above approach requires ω(m) messages. For example, consider the D × (n − 1)/D grid graph with an additional node, r, neighboring all of the top row's nodes. Suppose each row is its own part, and all the column edges are shortcut edges, forming a single block rooted at r. See Figure 2a. Aggregating within this block requires Ω(nD) messages: a message cannot be combined with other messages in its part until it has at least reached r, and so each node is responsible for sending a unique message to r along a path whose length is D/2 on average. This approach therefore requires Ω(nD) messages, which is sub-optimal for any D = ω(1), since m = O(n) for this network.

Figure 2: A bad example for prior shortcut algorithms, and a workaround. (a) The Ω(nD) message example; parts are indicated by dashed rectangles and tree edges are directed towards the root. (b) A workaround; here sub-parts are indicated by (smaller) dotted rectangles.

A Workaround.
We can improve the poor message complexity of aggregating within blocks on this particular network as follows. Partition each of the D parts into sub-parts, each with O(D) connected nodes; we have O(n/D) sub-parts in total. See Figure 2b. First, sub-parts aggregate: the right-most node in each sub-part broadcasts its value left, and every other node broadcasts left the aggregation of its own value and what it receives from its neighbor to the right. The left-most node of a sub-part then uses the block's edges to transmit the sub-part's aggregate value to r, which then computes the aggregate value for each part. Symmetrically to the above procedure, r then broadcasts to every node the aggregate value for its part.

Aggregating within each sub-part requires O(n) messages, as it requires each node to broadcast at most once. Moreover, there are O(n/D) sub-parts, each responsible for broadcasting up and down the block once, and so using the shortcut requires O(n/D) · O(D) = O(n) messages. Therefore, for this network, our workaround requires an optimal O(m) = O(n) messages.

The workaround of the previous subsection is heavily tailored to the particular example of Figure 2a. Moreover, it requires that nodes know significantly more about the network topology than we allow. However, the above example and workaround motivate and highlight some of the notable strategies of our algorithm for Part-Wise Aggregation.
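The two tallies above can be made concrete. This is our own back-of-the-envelope accounting (not a network simulation), contrasting the naive single-block aggregation with the sub-part workaround on the D × (n/D) grid:

```python
def naive_block_aggregation_messages(n, D):
    """Single-block approach: each of the ~n nodes forwards a distinct
    message up a column of length up to D before it can be combined at r,
    for Theta(n * D) messages in total."""
    return n * D

def subpart_workaround_messages(n, D):
    """Workaround: every node broadcasts left at most once (O(n) messages),
    then only the ~n/D sub-part representatives use the block, at O(D)
    messages each -- O(n) in total, matching m = O(n) on this network."""
    return n + (n // D) * D

n, D = 1 << 12, 1 << 4   # illustrative sizes: n = 4096 nodes, D = 16
ratio = naive_block_aggregation_messages(n, D) // subpart_workaround_messages(n, D)
print(ratio)  # → 8, i.e. a Theta(D) factor of savings
```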
Sub-Part Divisions.
As illustrated in the example, having all nodes use a shortcut in order to send their private information to their part leader rapidly exhausts our Õ(m) message budget. To solve this issue, we refine the partition of our network into what we call a sub-part division. In a sub-part division, each part P_i containing more than D nodes is partitioned into Õ(|P_i|/D) sub-parts, each with a spanning tree rooted at a designated node termed the representative of the sub-part. In the preceding example the representatives are the left-most nodes of each sub-part. Each sub-part uses its spanning tree to aggregate towards its representative, who then alone is allowed to use shortcut edges to forward the result toward the part leader. This decreases the number of nodes that use the shortcut from O(n) to Õ(n/D), thereby reducing the message complexity of aggregating within a block from O(nD) to Õ(n). Applying this observation and some straightforward random sampling ideas to previous work on low-congestion shortcuts to solve PA almost immediately implies our message-efficient randomized solutions to PA.

Message-Efficient (and Deterministic) Shortcut Construction.
If our algorithms are to use shortcuts as we did in the preceding example, they must construct them message-efficiently; i.e., with Õ(m) messages. No previous shortcut construction algorithm achieves low message complexity. We show that not only do sub-part divisions allow us to use shortcuts message-efficiently, but they also allow us to construct shortcuts message-efficiently. In particular, we give both randomized and deterministic message-efficient shortcut construction algorithms. The latter is the first round-optimal deterministic shortcut construction algorithm and is based on a divide-and-conquer strategy that uses heavy path decompositions [39]. Though the general structure of our deterministic shortcut construction algorithm is similar to that used in previous low-congestion shortcut work – nodes try to greedily claim the shortcut edges they get to use – the techniques used to deterministically implement this structure are entirely novel with respect to past work in low-congestion shortcuts.

Star Joinings.
To use sub-part divisions as above, we must demonstrate how to compute them within our bounds. To do so, we begin with every node in its own sub-part and repeatedly merge sub-parts until the resulting sub-parts are sufficiently large. However, it is not clear how many sub-parts can be efficiently merged together at once, as the obtained sub-parts can have arbitrarily large diameter, rendering communication within a sub-part infeasible. We overcome this issue by always forcing sub-parts to merge in a star-like fashion; this limits the diameter of the new sub-part, enabling the new sub-part to adopt the representative of the center of the star. We call this behavior star joinings. As we show, enforcing this behavior is easily accomplished with random coin flips. We also accomplish the same behavior deterministically, but with significantly more technical overhead, drawing on the coloring algorithm of Cole and Vishkin [4].
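The coin-flip idea behind randomized star joinings can be sketched as follows (our own simplified illustration, not the paper's exact procedure): each sub-part flips a coin, and every "tail" with a neighboring "head" joins one such head. Since heads never join anyone, every merge is a star centered at a head, whose representative the merged sub-part adopts.

```python
import random

def star_joining_round(subparts, adj, rng):
    """One round of randomized star joinings (a simplified sketch). `adj`
    maps each sub-part to its neighboring sub-parts. Returns a labeling
    that maps every sub-part to the center of its (possibly singleton)
    star; centers are exactly the sub-parts that keep their own label."""
    is_head = {s: rng.random() < 0.5 for s in subparts}   # coin flip
    center = {}
    for s in subparts:
        if not is_head[s]:
            heads = [t for t in adj[s] if is_head[t]]
            if heads:
                center[s] = rng.choice(heads)             # join some head
    return {s: center.get(s, s) for s in subparts}

# Sub-parts arranged on a path; whatever the coin flips, the merge pattern
# is a union of stars: following a label twice never moves any further.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = star_joining_round(list(adj), adj, random.Random(0))
assert all(labels[labels[s]] == labels[s] for s in labels)
```

In expectation a constant fraction of sub-parts merge per round, which is why a logarithmic number of such rounds suffices to grow sub-parts in the randomized construction.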
In this section we show how to solve PA, given shortcuts and a sub-part division. The subroutines necessary to compute shortcuts and sub-part divisions randomly and deterministically within our time and message bounds are given in Section 5 and Section 6, respectively. Those subroutines, together with our algorithm for PA given a sub-part division and shortcuts, imply our main result, Theorem 1.2.

For our purposes it is convenient to assume that in our PA instance each part P_i also has a leader l_i ∈ P_i, where every v ∈ P_i knows the ID of l_i. As we show in Appendix B, we can dispense with this assumption at the cost of logarithmic overhead in round and message complexity. As we ignore multiplicative polylogarithmic terms, for the remainder of the paper we assume that a leader for each part is always known in our PA instances.

One of the crucial ingredients we will rely on to solve PA instances as above is sub-part divisions.

Definition 4.1 (Sub-part division). Given a partition (P_i)_{i=1}^N of V, a sub-part division is a partition of every part P_i into Õ(|P_i|/D) sub-parts S_1, ..., S_{k_i}. Each sub-part S_j also has a spanning tree of diameter O(D) rooted at a node r ∈ S_j, termed the sub-part's representative.

Figure 3: A division of a part, P_i, incident to blocks b_1, b_2 and b_3, into 4 sub-parts. Sub-part representatives: stars. Solid colored lines: edges in the tree of each sub-part, according to the color of the representative. Dashed lines: edges in E between sub-parts.

We note that sub-parts are not necessarily related to blocks in any way; e.g., a single sub-part might span multiple blocks, and blocks need not contain sub-part representatives. See Figure 3 for an illustration of how sub-parts and blocks might interact.

The second ingredient we rely on is tree-restricted shortcuts, along which we will route (some of) our messages. To do so, we must first restate an algorithm of Haeupler, Izumi, and Zuzic [19], which we refer to as BlockRoute, that convergecasts/broadcasts within shortcut blocks. As convergecast and broadcast are symmetric, we only discuss convergecast.
Lemma 4.2. ([19, Lemma 2]) Let T be a tree of depth D. Given a family of subtrees such that any edge of T is contained in at most c subtrees, there is a deterministic algorithm that can perform a convergecast/broadcast on all of the subtrees in O(D + c) CONGEST rounds.

Specifically, for convergecasts, if multiple messages are scheduled over the same edge, the algorithm forwards the packet with the smallest depth of the subtree root, breaking ties with the smallest ID of the subtree.
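To see where the message counts used later come from: in an aggregating convergecast, values from distinct sources merge where their root paths meet, so each tree edge on the union of those paths forwards one (possibly combined) value. A small sketch of this counting (our own illustration; `parent` maps each node to its parent, and the root to None):

```python
def convergecast_message_count(parent, sources):
    """Count messages of an aggregating convergecast on a rooted tree: one
    message per edge of the union of the sources' root paths, hence at most
    |sources| * depth messages in total."""
    used = set()
    for s in sources:
        v = s
        while parent[v] is not None:
            used.add((v, parent[v]))     # this edge forwards one value
            v = parent[v]
    return len(used)

# A path 3 -> 2 -> 1 -> 0 (root): the two sources share their upper edges,
# so their values merge at node 2 and travel on as a single message.
parent = {0: None, 1: 0, 2: 1, 3: 2}
print(convergecast_message_count(parent, [3, 2]))  # → 3, not 3 + 2 = 5
```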
One observation we make about this algorithm, and which will prove crucial since we only allow representatives to use shortcuts, is the following.
Observation 4.3.
Let S be the set of nodes with a value to be convergecasted in the algorithm described in Lemma 4.2. Then the number of messages used by the algorithm is O(|S| · D).

Solving PA and Verifying the Block Parameter

We now show how, given a sub-part division and a T-restricted shortcut, we can round- and message-efficiently solve PA with and without randomization. Our method is given by Algorithm 1 (which contains both our deterministic and randomized algorithms), and works as follows. First, each leader l_i of part P_i broadcasts an arbitrary message m_i to all nodes in P_i. Then, symmetrically to how m_i was broadcast, each l_i computes f(P_i) and then broadcasts f(P_i) to all nodes of P_i. The most technically involved aspect of our algorithm is how l_i broadcasts m_i to all nodes in P_i. If |P_i| is smaller than D, broadcast can be trivially performed along the spanning tree of the single sub-part of P_i in O(D) rounds with O(|P_i|) messages. However, if |P_i| is larger than D, we use shortcuts, as follows.

For our deterministic algorithm, we repeat the following b times: every representative in a block which received the message m_i spreads m_i to other representatives in its block using BlockRoute along the shortcut. Next, representatives with m_i spread m_i to nodes in their sub-part. Lastly, nodes with m_i spread m_i to neighboring nodes in adjacent sub-parts. Crucially, only our representatives use shortcuts, thereby limiting our message complexity, by Observation 4.3. We illustrate the broadcast of m_i in Figure 4.

Our randomized algorithm works similarly, with the following modification: each part leader independently delays itself – and subsequently, its entire part – before sending its first message at the beginning of the algorithm, by a delay chosen uniformly in the range [c] (here c is the shortcut's congestion). This limits the number of parts which would use any given edge during any round to O(log n) w.h.p.
As only one message can be sent along an edge, we execute BlockRoute as before, but rather than break ties as in Lemma 4.2, we simply spend O(log n) rounds between each "meta-round", to allow each node to forward all of its O(log n) messages. This broadcast within blocks requires O(D log n) CONGEST rounds.

The following lemma states the performance of our algorithms.

Lemma 4.4.
Given a sub-part division and a T-restricted shortcut with congestion c and block parameter b, Algorithm 1 uses Õ(m) messages to solve PA either w.h.p. in Õ(bD + c) rounds or deterministically in Õ(b(D + c)) rounds.

Proof. We first prove our round complexities. We start by proving the stated round complexity for broadcasting m_i. Any part of fewer than D nodes clearly only requires O(D) rounds. For any part of more than D nodes, we argue that each of the b iterations requires only Õ(D + c) rounds, or O(D log n) if a random delay of U(c) is added. Routing m_i from l_i to r(l_i) requires O(D) rounds. Running BlockRoute only requires O(D + c) rounds, by Lemma 4.2. Moreover, if a random delay is added, a Chernoff and union bound show that w.h.p. an edge never has more than O(log n) distinct parts' aggregate messages that should be transmitted along it. By allowing each node to send up to O(log n) parts' aggregate messages in each meta-round, BlockRoute requires O(D log n) rounds, and therefore this approach requires Õ(bD + c) rounds overall. Next, broadcasting m_i within any sub-part requires Õ(D) rounds, as sub-parts are of diameter Õ(D). Broadcasting m_i to adjacent sub-parts requires only a single round. Lastly, computing f(P_i) and broadcasting f(P_i) symmetrically require Õ(b(D + c)) rounds.

We now prove a message complexity of Õ(m). We start by proving this message complexity for broadcasting m_i. Message complexity is trivial if the part is of fewer than D nodes. Next, consider parts of more than D nodes. Notice that nodes in a given sub-part only send messages in those rounds where the sub-part is active. Moreover, once a sub-part becomes inactive, it never again becomes active. Routing m_i from l_i to r(l_i) requires O(D) messages. Moreover, each of the Õ(|P_i|/D) sub-parts in part P_i uses BlockRoute at most once, using O(D) messages per sub-part by Observation 4.3; as a result this step uses Õ(|P_i|) messages for part P_i, and so Õ(n) messages in total. Broadcasting within all sub-parts requires O(n) messages, since each sub-part only does so once and has a spanning tree with which to do so. Broadcasting across sub-parts uses each edge at most twice, and so uses O(m) messages. Lastly, computing f(P_i) and broadcasting f(P_i) symmetrically require O(m) messages.

Correctness of broadcasting m_i is trivial if |P_i| < D. Moreover, if |P_i| ≥ D, a simple argument by induction over blocks shows that b iterations suffice for parts of more than D nodes. Correctness of computing f(P_i) and broadcasting f(P_i) symmetrically follow.

Algorithm 1 PA given shortcut and sub-part division.
Input: PA instance; sub-part division; T-restricted shortcut
Output: solves PA
Notation: S(v) ⊆ V := v ∈ V's sub-part;
Notation: r(v) ∈ V := v ∈ V's representative;
Notation: S(U) := ⋃_{u∈U} S(u) for U ⊆ V;
Notation: R(U) := {r(u) | u ∈ U} for U ⊆ V;
Notation: R_i := R(P_i);
Notation: B_i(v) ⊆ V := the block of part P_i containing v ∈ V
1: for part P_i do
2:   if |P_i| < D then
3:     Broadcast m_i from l_i to all of P_i along P_i's spanning tree.
4:   else
5:     if randomized algorithm then
6:       Delay part P_i by (independent) ∼ U(c);
7:       Blow up subsequent calls to BlockRoute by O(log n).
8:     Route m_i from l_i to r(l_i) using BlockRoute.
9:     A ← {r(l_i)}, I ← {}.  ▷ Initialize sets of "active"/"inactive" representatives.
10:    for b iterations do
11:      Run BlockRoute on A to send m_i to all nodes in ⋃_{r∈A} B_i(r) ∩ R_i.
12:      A ← A ∪ ⋃_{r∈A} B_i(r) ∩ R_i.
13:      for all r ∈ A do
14:        Broadcast m_i from r to S(r) along S(r)'s representing tree.
15:      Broadcast m_i over edges in E that exit sub-parts in S(A).
16:      for all vertices v ∉ S(A) ∪ S(I) do
17:        if v received a message in line 15 then
18:          v routes m_i to r(v).
19:      I ← I ∪ A.
20:      A ← representatives that received a message in line 18.
21: Symmetrically to lines 1-20, compute f(P_i) at l_i.
22: Symmetrically to lines 1-20, broadcast the result of f(P_i) from l_i to all nodes in P_i.

Figure 4: Nodes with m_i (yellow circles) and (in)active representatives at the end of each broadcast iteration, for a part with 3 blocks b_1, b_2 and b_3. The leader l_i is indicated by a black square, while sub-part representatives are indicated by stars; solid lines and dotted black lines correspond to intra- and inter-sub-part edges.

Because our PA algorithm is essentially the same algorithm we use to verify that our shortcuts have good block parameter, we now describe this second algorithm. We verify the block parameter of a fixed part P_i as follows. Run Algorithm 1 to broadcast an arbitrary message m_i. If our block parameter is sufficiently small, then every node will receive m_i and assume it as such.
Moreover, if our block parameter is too large but Algorithm 1 succeeds, we can still use Algorithm 1 to inform all nodes in P_i of P_i's block parameter, symmetrically to how m_i was broadcast. However, if Algorithm 1 fails (i.e., some node does not receive m_i), then we must somehow inform all nodes that the block parameter is too large. We do so by having each node that does not receive m_i inform its neighbors in P_i that it did not receive m_i. There must be some such neighbor in P_i that did receive m_i. By one additional call to Algorithm 1, this neighbor can inform all nodes that did receive m_i that the block parameter is, in fact, too large. This algorithm gives the following lemma.

Algorithm 2
Block parameter verification.
Input: partition of V, (P_i)_{i=1}^N, where v ∈ P_i knows leader l_i;
Input: sub-part division;
Input: T-restricted shortcut;
Input: desired block parameter b
Output: for every P_i, each v ∈ P_i learns whether P_i has block parameter b in the input shortcut
for each part P_i do
    Run Algorithm 1 to broadcast an arbitrary message m_i from l_i.
    for each v ∈ P_i that did not receive m_i do
        v broadcasts m̄_i to its neighbors in P_i.
    Run Algorithm 1 to broadcast whether a node that received m_i also received m̄_i.
for every i and v ∈ P_i do
    if v did not receive m_i or received m̄_i then
        v decides the block parameter of P_i exceeds b.
    else
        Run Algorithm 1 to compute the block number of P_i.

Lemma 4.5.
Given parts (P_i)_{i=1}^N, a sub-part division, a c-congestion T-restricted shortcut H, and a desired block parameter b, one can deterministically (resp., w.h.p.) inform every node whether its part's block parameter in H exceeds b in Õ(b(D + c)) (resp., Õ(bD + c)) rounds with Õ(m) messages.

Proof. Round and message complexities follow trivially from Lemma 4.4 and Lemma 4.2. We now argue correctness. If a node does not receive m_i when Algorithm 1 is first run, then the block parameter of P_i is certainly larger than b. When this occurs, all nodes will either be told by l_i that the block parameter is larger than b, or they will not receive m_i, implicitly informing them that the block parameter of P_i is larger than b. If all nodes receive m_i, then l_i clearly distributes to all nodes in P_i the number of blocks incident to P_i, and so the block number of P_i is correctly determined to be above or below b, as desired.

In this section we outline how we construct sub-part divisions and shortcuts round- and message-optimally using randomization.
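To fix intuition before the constructions, the sampling-and-claiming pattern underlying the randomized sub-part division (Algorithm 3 below) can be sketched centrally in a few lines of Python. This is a toy, sequential rendition with a hypothetical helper `subpart_division`; the actual algorithm instead runs D synchronous claiming rounds in the network.

```python
import math
import random

def subpart_division(adj, nodes, D, reps=None, rng=random):
    """Toy sketch: sample representatives with probability ~ (log n)/D,
    then let each claim an unclaimed ball of radius D via multi-source BFS.
    Returns a map node -> representative (None if unclaimed)."""
    n = len(nodes)
    if reps is None:  # the random sampling step of the construction
        p = min(1.0, math.log(max(n, 2)) / D)
        reps = [v for v in nodes if rng.random() < p]
    owner = {v: None for v in nodes}
    frontier = list(reps)
    for r in reps:
        owner[r] = r
    for _ in range(D):  # D parallel rounds of ball growing
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if owner[v] is None:  # claim only unclaimed nodes
                    owner[v] = owner[u]
                    nxt.append(v)
        frontier = nxt
    return owner
```

On a 10-node path with representatives fixed at nodes 0 and 5 and D = 3, node 9 stays unclaimed; this is exactly the failure event that the Chernoff-bound analysis rules out w.h.p. when representatives are actually sampled.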
We first show how a sub-part division can be computed with randomization, by randomly sampling sub-part representatives in Algorithm 3. In particular, for large parts (|P_i| ≥ D), every node decides to be a representative with probability min{1, (log n)/D}, and representatives then claim balls of radius D around them as their sub-parts. Algorithm 3's properties are given below.

Lemma 5.1.
Algorithm 3 computes a sub-part division of a part with a known leader w.h.p. in O(D) rounds with O(m) messages.

Algorithm 3 Randomized sub-part division.
Input: partition of V given by (P_i)_{i=1}^N, where v ∈ P_i knows leader l_i
Output: sub-part division
for each part P_i do
    if |P_i| ≤ D then
        Let P_i have one sub-part with representative l_i.
        Compute its sub-part spanning tree by an O(D)-round BFS restricted to P_i, starting at l_i.
    else
        for v ∈ P_i do
            With probability min{1, (log n)/D}, node v is a representative and sends its ID to its neighbors in P_i.
        for O(D) rounds do
            v broadcasts the first representative ID it hears to its neighbors in P_i, once.
        v's sub-part parent is the neighbor from which it first heard a representative ID.
        v determines for which of its neighbors it is a parent.

Proof.
Runtime and message complexity are trivial. Correctness is trivial for parts of fewer than D nodes, so consider parts of more than D nodes. By construction, each claimed sub-part has diameter O(D). It remains to show that every node has a representative and that there are Õ(|P_i|/D) sub-parts in P_i. Fix a node v and consider the ball of radius D around v. Since P_i has at least D nodes, this ball is of size at least D, and so a Chernoff bound shows that w.h.p. Θ(log n) nodes in this ball will elect themselves representatives, meaning v will have a representative. A union bound over all v shows this holds for every node. Moreover, the expected number of representatives in part P_i is |P_i| log n / D, and so a Chernoff bound shows that w.h.p. there are Õ(|P_i|/D) sub-parts in P_i. A union bound shows this holds for all parts w.h.p.

We now show in Algorithm 4 how we message-efficiently construct a T-restricted shortcut with randomization. We rely on the CoreFast shortcut construction algorithm of Haeupler et al. [19]. In
CoreFast, a sub-sampled set of vertices broadcasts up T, attempting to "claim" edges; edges with too many vertices trying to claim them are discarded. To control the message complexity, we only have the Õ(n/D) sub-part representatives attempt to claim edges. The correctness and runtime of Algorithm 4 are given by Lemma 5.2.

Lemma 5.2.
Given a partition of V, (P_i)_{i=1}^N, where v ∈ P_i knows leader l_i, a sub-part division, a spanning tree T and the existence of a T-restricted shortcut with congestion c and block parameter b, Algorithm 4 computes a T-restricted shortcut with congestion at most Õ(c) and block parameter at most b in Õ(bD + c) rounds with O(n) messages w.h.p.

Proof. We first argue runtime and message complexity. Haeupler, Izumi, and Zuzic [19, Lemma 4] show that
CoreFast takes O(D log n + c) rounds. However, in this algorithm every node potentially sends a message up T once, leading to super-linear message complexity. By amending CoreFast so that only the Õ(n/D) sub-part representatives send a message up T once, as we do, it is easy to see that the algorithm uses only Õ(n) messages in total. Lastly, Lemma 4.5 shows that block parameter verification with randomization uses only Õ(bD + c) rounds and Õ(m) messages.

Algorithm 4 Randomized shortcut construction.
Input: partition of V, (P_i)_{i=1}^N, where v ∈ P_i knows leader l_i;
Input: BFS tree T;
Input: sub-part division
Output: T-restricted shortcut with congestion Õ(c) and block parameter < b
Set all P_i active.
for O(log n) iterations do
    Run the CoreFast [19] shortcut construction algorithm on representatives in active parts.
    Run Algorithm 2 to compute whether the block parameters of parts exceed 3b.
    Set every P_i whose block parameter on the CoreFast result does not exceed 3b inactive.
    Let every newly inactive P_i use the shortcut edges assigned to it by CoreFast.

We now argue correctness. Haeupler, Izumi, and Zuzic [19, Lemma 4] show that each time
CoreFast is run, it computes a T-restricted shortcut with block parameter at most 3b for at least half of the nodes and congestion at most 8c. It is easy to see that only having sub-part representatives participate in CoreFast does not affect correctness, and so we conclude that after O(log n) iterations every P_i has been rendered inactive. By construction every P_i has block parameter < b, and since the congestion of any edge increases by at most 8c in any iteration of Algorithm 4, the total congestion of our returned shortcut is Õ(c).

In this section we show how to construct sub-part divisions and shortcuts deterministically.
Our algorithm for constructing sub-part divisions repeatedly merges together sub-parts until they are of sufficient size. However, if sub-parts are allowed to merge arbitrarily, the resulting sub-parts may have prohibitively large diameter. The diameter of the resulting sub-parts can be limited by forcing sub-parts to always join in a star-like fashion. As such, we begin by providing a deterministic algorithm to enable such behavior. We term this behavior a star joining.

Definition 6.1 (Star joining). Let (P_i)_{i=1}^N partition V. We say a star joining is computed over parts (P_i)_{i=1}^N if the following holds: a constant fraction of the parts P_i are designated as receivers, and the other parts P_i are designated as joiners. For every joiner part P_i, all v ∈ P_i know some (common) edge with one endpoint in P_i and the other endpoint in some receiver part P_j.

We now show how a star joining can be computed deterministically, given a deterministic PA solution. We use as a sub-routine the 3-coloring algorithm of Cole and Vishkin [4]. Roughly, the Cole and Vishkin [4] algorithm works as follows. Every node begins with its ID as its color, meaning there are initially n colors. Next, every node updates its color based on its neighbors' colors, logarithmically reducing the number of possible colors. This is then repeated log* n times. For more, see Cole and Vishkin [4]. The properties of this algorithm are as follows.

Lemma 6.2. ([4, Corollary 2.1]) An oriented n-vertex graph with maximum out-degree of one can be 3-colored in O(log* n) rounds with O(m log* n) messages.

Algorithm 5 Deterministic star joining.
Input: (P_i)_{i=1}^N s.t. v ∈ P_i knows edge e_i exiting P_i and leader l_i;
Input: PA algorithm A
Output: a star joining on (P_i)_{i=1}^N
J, R ← ∅. (Initialize joiners and receivers.)
G ← ((P_i)_{i=1}^N, {e_i}_{i=1}^N)
R ← R ∪ {P_i : δ⁻_G(P_i) ≥ 2}, by running A.
J ← J ∪ {P_i : P_i ∉ R and e_i = (P_i, P_j) s.t. P_j ∈ R}, by running A.
G ← G \ (R ∪ J).
Run the 3-coloring algorithm of Cole and Vishkin [4] on G.
for color k = 1, 2, 3 do
    R ← R ∪ {P_i : P_i colored k}, by running A.
    J ← J ∪ {P_i : P_i ∉ R and e_i = (P_i, P_j) s.t. P_j ∈ R}, by running A.

We give our algorithm for deterministically computing star joinings in Algorithm 5, which works as follows. Take the super-graph whose nodes are parts and whose edges are the chosen (directed) edges. First, designate parts with at least two incoming edges receivers, and all parts with an outgoing edge into one such part joiners. These parts constitute all trees in our super-graph, and so we next remove them from the super-graph, leaving only (directed) paths and (directed) cycles. On the remaining paths and cycles, simulate the Cole-Vishkin algorithm to compute a 3-coloring of the remaining nodes in the super-graph. For colors k = 1, 2, 3, designate k-colored parts receivers and their neighbors joiners, and remove these parts from this process. The properties of our deterministic star joining algorithm are given by the following lemma.

Lemma 6.3.
Let (P_i)_{i=1}^N partition V, and suppose every v ∈ P_i knows some edge e_i ∈ E exiting P_i. If algorithm A solves PA over (P_i)_{i=1}^N, then Algorithm 5 computes a star joining over (P_i)_{i=1}^N with O(log* n) calls to A.

Proof. We begin by proving correctness. Line 4 yields stars of joiners centered around receivers. Moreover, the union of all nodes designated in Line 4 forms a forest with trees of internal degree at least 2. Therefore, the number of internal (marked) super-nodes (and therefore the number of stars) in Line 4 is at most one half of the super-nodes of the tree.

Now consider the result of Line 8. As no super-node in G has in-degree at least two at this point in the algorithm, the super-graph considered in Line 8 consists of directed cycles and paths. Thus, each time we remove a P_i from the super-graph we remove at most three super-nodes from the graph, and P_i gets to merge with its neighbor. It follows that a constant fraction of these super-nodes are merged. Combining the first and second stages, we find that the super-nodes are combined into stars, where the number of obtained nodes is less than 2/3 of the original nodes. Therefore the above algorithm computes a star joining.

We now argue that our algorithm requires O(log* n) runs of A. This clearly holds for all sub-routines of our algorithm except for Line 6. In particular, we must argue how the Cole-Vishkin algorithm can be efficiently simulated on our super-graph using O(log* n) runs of A. We repeat the following O(log* n) times. Let l_i be the known leader of P_i. Each P_i begins with the color given by l_i's ID. Next, the node in P_i incident to the edge chosen by P_i routes the color it received to l_i using A. Then, l_i performs the Cole-Vishkin computation and broadcasts P_i's new color to all nodes in P_i using A.
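A compact centralized sketch of the Cole-Vishkin routine being simulated may be helpful. This is our own illustrative rendition of deterministic coin tossing on graphs of out-degree at most one, not the paper's distributed simulation; all function and variable names are ours.

```python
def cole_vishkin(succ, ids):
    """3-color a graph of out-degree <= 1 (disjoint directed paths/cycles).
    succ[v] is v's unique out-neighbor (or absent); ids[v] is a unique ID.
    Phase 1 iterates the deterministic coin toss until <= 6 colors remain;
    phase 2 then removes colors 5, 4, 3 one at a time."""
    color = dict(ids)
    while max(color.values()) >= 6:
        new = {}
        for v, c in color.items():
            p = succ.get(v)
            if p is None:
                k, bit = 0, c & 1                      # path ends keep bit 0
            else:
                diff = c ^ color[p]
                k = (diff & -diff).bit_length() - 1    # lowest differing bit
                bit = (c >> k) & 1
            new[v] = 2 * k + bit
        color = new
    for bad in (5, 4, 3):                              # shift down to {0,1,2}
        new = {}
        for v, c in color.items():
            if c == bad:
                neigh = {color[u] for u in succ if succ.get(u) == v}
                if succ.get(v) is not None:
                    neigh.add(color[succ[v]])
                new[v] = min({0, 1, 2} - neigh)        # free color always exists
            else:
                new[v] = c
        color = new
    return color
```

On a 5-cycle the routine returns a proper coloring with colors in {0, 1, 2}; Algorithm 5 performs exactly this computation at the part leaders, shipping colors along the chosen edges via PA.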
We now use star joinings to deterministically compute sub-part divisions in Algorithm 6, as follows: start with each node in its own sub-part; compute star joinings and merge stars of joiners centered around receivers O(log n) times, fixing sub-parts once they have at least D nodes. Correctness and runtime of Algorithm 6 are given by the following lemma.

Algorithm 6
Deterministic sub-part division.
Input: partition of V, (P_i)_{i=1}^N
Output: a sub-part division
for each part P_i do
    I_i ← {{v} : v ∈ P_i}. (Initialize incomplete sub-parts.)
    C_i ← {}. (Initialize complete sub-parts.)
    for O(log n) iterations do
        for F_j ∈ I_i do
            if ∃ an edge (u, v) ∈ F_j × (⋃_{F ∈ I_i} F \ F_j) then
                e_j ← e for such an edge e = (u, v), using PA.
            else
                e_j ← e for some edge e = (u, v) ∈ F_j × (⋃_{F ∈ C_i} F), using PA.
        Run Algorithm 5 on the I_i sub-parts with edges {e_j} to compute a star joining.
        for each joiner F_j with edge e_j = (u, v) do
            F_j merges with the sub-part containing v, using PA; if that sub-part is in C_i, update C_i accordingly.
            u remembers v as its parent.
            F_j orients its tree edges towards v using PA.
        C'_i ← {F_j ∈ I_i : |F_j| ≥ D}, using PA.
        C_i ← C_i ∪ C'_i.
return the division given by {C_i}_{i=1}^N.

Lemma 6.4.
Given a partition (P_i)_{i=1}^N of V, Algorithm 6 computes a sub-part division of (P_i)_{i=1}^N in Õ(D) rounds with Õ(n) messages.

Proof. Round and message complexities are trivial, apart from the fact that we must show that PA can be solved within our bounds on incomplete sub-parts. However, notice that an incomplete sub-part has fewer than D nodes by definition, along with a spanning tree in which every node knows its parent; as such, aggregating within each incomplete sub-part is trivially achievable with O(D) rounds and O(m) messages.

We now argue correctness. Correctness for parts of fewer than D nodes is trivial. Consider parts of more than D nodes. Sub-parts continue to merge until they are complete and have at least D nodes, and so our division clearly has Õ(|P_i|/D) sub-parts. It remains to show that every complete sub-part's spanning tree has diameter Õ(D). When a complete sub-part results from two incomplete sub-parts joining, its spanning tree has diameter at most 2D. Call these nodes the core of the complete sub-part. When an incomplete sub-part F_j (which has a spanning tree of diameter at most D, since it has fewer than D nodes by definition) joins a complete sub-part, it necessarily joins by way of nodes in the core. Thus, any node in F_j is within 3D of any node in the core by way of the resulting sub-part's spanning tree. Similarly, any other subsequent incomplete sub-part that joins the complete sub-part will be within 4D of any node in F_j by way of the associated spanning tree. Thus, every complete sub-part has a spanning tree of diameter at most 4D.

Having shown how sub-part divisions can be computed in a deterministic fashion, we now turn to our deterministic shortcut construction. We rely on heavy path decompositions [39].
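The primitive in question is easy to state in code. The following sequential sketch (ours; the distributed version instead aggregates subtree sizes up T in O(D) rounds) computes the set of heavy edges of a rooted tree, per the definition given next:

```python
def heavy_path_decomposition(children, root):
    """Return the set of heavy edges of a rooted tree.
    children: dict node -> list of children. An edge (u, v) is heavy
    when v's subtree holds more than half the nodes of u's subtree;
    the decomposition is the set of all heavy edges."""
    # Iterative pre-order traversal, then process it in reverse so that
    # every node's subtree size is computed after its children's.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    size = {}
    for u in reversed(order):
        size[u] = 1 + sum(size[v] for v in children.get(u, []))
    return {(u, v) for u in children for v in children[u]
            if 2 * size[v] > size[u]}
```

Since each node has at most one heavy child, the heavy edges decompose into vertex-disjoint paths, and any leaf-to-root path crosses at most ⌊log n⌋ of them, which is the property the bottom-up shortcut construction exploits.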
Definition 6.5 (Heavy Path Decomposition [39]). Given a directed tree T, an edge (u, v) of T is heavy if the number of v's descendants is more than half the number of u's descendants; otherwise, the edge is light. A heavy path decomposition of T consists of all the heavy edges in T.

It is immediate from the definition that each leaf-to-root path in an n-node tree T intersects at most ⌊log n⌋ different paths of T's heavy path decomposition. Given a rooted tree T of depth D, a heavy path decomposition of T can be easily computed in O(D) rounds using O(n) messages.

Our deterministic shortcut construction algorithm, Algorithm 8, first computes a heavy path decomposition and then computes shortcuts on the obtained paths in a bottom-up order. Thus, we first provide a sub-routine, Algorithm 7, that computes shortcuts of congestion O(c log D) on a path P. Algorithm 7 assumes every node v begins with a set S(v) of part IDs that would like to use v's parent edge in the path. For simplicity, we assume the vertices of P are numbered by their height, v = 1, 2, 3, . . . (i.e., the source of the path is numbered 1, its parent is numbered 2, etc.). Algorithm 7 iteratively extends the paths used for shortcuts, repeatedly doubling them in length, unless too much congestion results. See Figure 5. This algorithm's properties are as follows.

Algorithm 7
Deterministic shortcut construction for paths.
Input: Path P ⊆ V;
Input: Mapping S : V → (P_j)_{j=1}^N;
Input: Desired congestion c
Output: Mapping S_f : V → (P_j)_{j=1}^N
for v ∈ V do
    Set S_0(v) ← S(v).
for i = 0, 1, 2, . . . , log D − 1 do
    for every node v ≡ 2^i (mod 2^{i+1}) do
        if |S_i(v)| ≥ 2c then
            Break (v, v + 1) and set S_i(v) ← ∅.
        else
            u ← v + 2^i
            if there are no broken edges between v and u then
                Transmit S_i(v) from v to u along P.
                Set S_{i+1}(u) ← S_i(u) ∪ S_i(v).
return S_f = S_{log D}.

[Figure 5 shows three panels, for i = 0, 1, 2, each marking the nodes v ≡ 2^i (mod 2^{i+1}).]

Figure 5: Illustration of Algorithm 7. The source of a colored edge gives the v that transmits S_i(v), and the sink gives the u that updates S_{i+1}(u) (assuming no edges are broken). Black edges are path edges.

Lemma 6.6.
Given a directed path P of length D, a desired congestion c, and S : V → (P_i)_{i=1}^N denoting for each vertex v which parts want to use v's parent edge in P, Algorithm 7 returns S_f : V → (P_i)_{i=1}^N s.t. for every v ∈ P it holds that |S_f(v)| = O(c log D), in O(c log D + D) rounds.

Proof. To bound the running time, observe that iteration i of the algorithm can be implemented in c + 2^i rounds. Summing over all iterations i = 0, 1, . . . , log D − 1, the bound on the number of rounds follows. To bound the congestion of the output shortcuts, we prove by induction that before the i-th iteration the congestion on any edge is at most 2ci. This clearly holds for i = 0. Assume as an inductive hypothesis that before iteration i all edges are used by at most 2ci parts. In iteration i, the only edges whose congestion is potentially increased are those edges exiting u (i.e., the edge (u, u + 1)) such that u ≡ 0 (mod 2^{i+1}). The congestion of this edge increases by |S_i(v)|, where v ≡ 2^i (mod 2^{i+1}). Applying our inductive hypothesis, we get that the total congestion on such an edge is at most |S_i(v)| + |S_i(u)| < 2c + 2ci = 2c(i + 1), implying the claimed bound on the congestion.

We now turn to describing the overall shortcut construction algorithm, Algorithm 8, and analyze the resulting block parameter there. We limit message complexity by only allowing sub-part representatives to send a message requesting that an edge be used in their part's shortcut. As we show, each bottom-up computation yields good shortcuts for a constant fraction of parts. Thus, after each bottom-up computation, we can use our block parameter verification algorithm (Lemma 4.5) to identify the parts for which our shortcut construction succeeded and freeze the shortcut edges of said parts. The correctness and runtime of Algorithm 8 are given by Lemma 6.7.

Lemma 6.7.
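A centralized sketch of Algorithm 7's doubling process may clarify the bookkeeping. This is our own simplified rendition: nodes are 0-indexed, the break threshold 2c matches the congestion induction above, and sets are updated in place (safe, since transmitters and receivers are disjoint within an iteration).

```python
import math

def path_shortcut(S, c):
    """Simulate Algorithm 7 on a path of len(S) nodes.
    S[v] is the set of part IDs wanting the edge leaving node v;
    each final set has size O(c log D) by Lemma 6.6."""
    n = len(S)
    S = [set(s) for s in S]
    broken = [False] * n                      # broken[v]: edge (v, v+1) cut
    for i in range(max(1, math.ceil(math.log2(n)))):
        step = 1 << i
        # transmitters: v = 2^i (mod 2^{i+1}), shifted to 0-indexing
        for v in range(step - 1, n, 2 * step):
            if len(S[v]) >= 2 * c:            # too congested: break the edge
                broken[v] = True
                S[v] = set()
            else:
                u = v + step
                if u < n and not any(broken[v:u]):
                    S[u] |= S[v]              # double the shortcut path
    return S
```

With a generous congestion budget, all requests accumulate at the path's sink; with c = 1, an overloaded node breaks its edge instead of forwarding.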
Given: a partition (P_i)_{i=1}^N where v ∈ P_i knows leader l_i ∈ P_i; a tree T of depth D which admits a T-restricted shortcut of congestion c and block parameter b; and a sub-part division, Algorithm 8 deterministically computes, in Õ(b(c + D)) rounds and Õ(m) messages, a shortcut with congestion O(c log n) and block parameter O(b).

Proof. We first prove the runtime and message complexity. As mentioned above, a heavy path decomposition of T can be computed in O(D) rounds using O(n) messages. We now bound the message and round complexity of each iteration. First note that we can inform every path whether it has a light edge in O(D) rounds with O(n) messages. Next, running Algorithm 7 O(log n) times (once on each heavy path) requires O(c log D log n + D log n) rounds and O(n log n) messages. Lastly,

Algorithm 8 Deterministic shortcut construction.
Input: partition of V, (P_i)_{i=1}^N, with leader l_i known by v ∈ P_i;
Input: sub-part division
Input: BFS tree T
Output: T-restricted shortcut with congestion Õ(c) and block parameter < b
Initially all P_i are active.
Compute a heavy path decomposition of T.
for j = 1, 2, . . . , O(log n) do
    for all v ∈ V do
        if v is a representative in an active part P_i then S_j(v) ← {l_i} else S_j(v) ← ∅.
    Set each heavy path with no incoming light edges active.
    for ⌊log n⌋ repetitions do
        Let S_f be the output of Algorithm 7 run on each active path.
        For each light edge (v, u) from an active path's sink node v, let S_j(u) ← S_j(u) ∪ S_f(v).
        Set all active paths inactive, and all heavy paths with source u as in Line 12 active.
    Set parts with block parameter at most 3b inactive (see Lemma 4.5).
return ⋃_j S_j(v) as v's shortcut edges

notice that informing every node in a path that the path is now active requires O(D) rounds using O(n) messages. Lastly, by Lemma 4.5, verifying block parameters is within our stated bounds. Summing over iterations, we conclude that the overall round and message complexities are Õ(b(c + D)) and Õ(m), respectively.

We now prove correctness. Notice that by Lemma 6.6 the number of parts assigned to an edge in any particular iteration is at most O(c log D), and so the overall congestion on any edge is at most O(c log D log n) = Õ(c).

We now analyze the block parameter. In particular, we argue that the number of active parts is at least halved in each iteration. Let A_j be the set of active parts in iteration j. Let U_j be the set of heavy edges used by H but broken in iteration j, and therefore not assigned to any parts by S_j. Each edge in U_j received at least 2c − c = c more requests by parts to use it than in H; thus each edge in U_j receives at least c requests from parts in A_j. However, each part in A_j can contribute at most b such additional requests to a broken edge, as each block can only send one additional request towards the tree's root. Consequently, we have |U_j| ≤ |A_j| · b / c. Next, we say an active part is bad in iteration j if more than 2b of its edges of H are broken in iteration j, and good in iteration j otherwise. Note that for a good part the number of blocks in the output shortcut is at most 3b = O(b). On the other hand, every broken heavy edge used in H is used at most c times in H.
We conclude that the number of bad active parts is at most |U_j| · c / (2b). Combining the upper and lower bounds on |U_j|, the number of bad active parts is at most |A_j| / 2. Thus, after O(log n) iterations all parts will be marked inactive, meaning the block parameter in the returned shortcut is at most 3b.

References

[1] Baruch Awerbuch. Optimal distributed algorithms for minimum weight spanning tree, counting, leader election, and related problems. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC), pages 230–240, 1987.

[2] Baruch Awerbuch, Oded Goldreich, Ronen Vainish, and David Peleg. A trade-off between information and communication in broadcast protocols.
Journal of the ACM (JACM), 37(2):238–256, 1990.

[3] F. Chin and H. F. Ting. An almost linear time and O(n log n + e) messages distributed algorithm for minimum-weight spanning trees. In Proceedings of the 26th Symposium on Foundations of Computer Science (FOCS), pages 257–266, 1985.

[4] Richard Cole and Uzi Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32–53, 1986.

[5] Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. Distributed verification and hardness of distributed approximation. SIAM Journal on Computing (SICOMP), 41(5):1235–1265, 2012.

[6] Michael Elkin. A faster distributed protocol for constructing a minimum spanning tree. Journal of Computer and System Sciences, 72(8):1282–1308, 2006.

[7] Michael Elkin. An unconditional lower bound on the time-approximation trade-off for the distributed minimum spanning tree problem. SIAM Journal on Computing (SICOMP), 36(2):433–456, 2006.

[8] Michael Elkin. A simple deterministic distributed MST algorithm, with near-optimal time and message complexities. Proceedings of the 36th ACM Symposium on Principles of Distributed Computing (PODC), 2017.

[9] Michalis Faloutsos and Mart Molle. A linear-time optimal-message distributed algorithm for minimum spanning trees. Distributed Computing, 17(2):151–170, 2004.

[10] Silvio Frischknecht, Stephan Holzer, and Roger Wattenhofer. Networks cannot compute their diameter in sublinear time. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1150–1162, 2012.

[11] Eli Gafni. Improvements in the time complexity of two message-optimal election algorithms. In Proceedings of the 4th ACM Symposium on Principles of Distributed Computing (PODC), pages 175–185, 1985.

[12] Robert G. Gallager, Pierre A. Humblet, and Philip M. Spira. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and Systems (TOPLAS), 5(1):66–77, 1983.

[13] Juan A. Garay, Shay Kutten, and David Peleg. A sublinear time distributed algorithm for minimum-weight spanning trees. SIAM Journal on Computing (SICOMP), 27(1):302–316, 1998.

[14] Mohsen Ghaffari. Near-optimal distributed approximation of minimum-weight connected dominating set. In Proceedings of the 41st International Colloquium on Automata, Languages and Programming (ICALP), pages 483–494, 2014.

[15] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks II: Low-congestion shortcuts, MST, and min-cut. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 202–219, 2016.

[16] Mohsen Ghaffari and Fabian Kuhn. Distributed minimum cut approximation. In Proceedings of the 27th International Symposium on Distributed Computing (DISC), pages 1–15, 2013.

[17] Mohsen Ghaffari, Andreas Karrenbauer, Fabian Kuhn, Christoph Lenzen, and Boaz Patt-Shamir. Near-optimal distributed maximum flow. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC), pages 81–90, 2015.

[18] Bernhard Haeupler and Jason Li. Beating O(√n + D) for distributed shortest path approximations via shortcuts. arXiv preprint arXiv:1802.03671, 2018.

[19] Bernhard Haeupler, Taisuke Izumi, and Goran Zuzic. Low-congestion shortcuts without embedding. In Proceedings of the 35th ACM Symposium on Principles of Distributed Computing (PODC), pages 451–460, 2016.

[20] Bernhard Haeupler, Taisuke Izumi, and Goran Zuzic. Near-optimal low-congestion shortcuts on bounded parameter graphs. In Proceedings of the 30th International Symposium on Distributed Computing (DISC), pages 158–172, 2016.

[21] Bernhard Haeupler, Jason Li, and Goran Zuzic. Minor excluded network families admit fast distributed algorithms. arXiv preprint arXiv:1801.06237, 2018.

[22] Stephan Holzer and Roger Wattenhofer. Optimal distributed all pairs shortest paths and applications. In Proceedings of the 31st ACM Symposium on Principles of Distributed Computing (PODC), pages 355–364, 2012.

[23] Taisuke Izumi and Roger Wattenhofer. Time lower bounds for distributed distance oracles. In International Conference on Principles of Distributed Systems (OPODIS), pages 60–75, 2014.

[24] David R. Karger. Random sampling in cut, flow, and network design problems. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), pages 648–657, 1994.

[25] Maleq Khan and Gopal Pandurangan. A fast distributed approximation algorithm for minimum spanning trees. In Proceedings of the 20th International Symposium on Distributed Computing (DISC), pages 355–369, 2006.

[26] Shay Kutten and David Peleg. Fast distributed construction of k-dominating sets and applications. In Proceedings of the 14th ACM Symposium on Principles of Distributed Computing (PODC), pages 238–251, 1995.

[27] Shay Kutten, Gopal Pandurangan, David Peleg, Peter Robinson, and Amitabh Trehan. On the complexity of universal leader election. Journal of the ACM (JACM), 62(1):7, 2015.

[28] Christoph Lenzen and Boaz Patt-Shamir. Fast partial distance estimation and applications. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC), pages 153–162, 2015.

[29] Christoph Lenzen and David Peleg. Efficient distributed source detection with limited bandwidth. In Proceedings of the 32nd ACM Symposium on Principles of Distributed Computing (PODC), pages 375–382, 2013.

[30] Ali Mashreghi and Valerie King. Time-communication trade-offs for minimum spanning tree construction. In Proceedings of the 18th International Conference on Distributed Computing and Networking (ICDCN), 2017.

[31] Gary L. Miller, Richard Peng, and Shen Chen Xu. Parallel graph decompositions using random shifts. In Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 196–203, 2013.

[32] Danupon Nanongkai. Distributed approximation algorithms for weighted shortest paths. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 565–573, 2014.

[33] Danupon Nanongkai and Hsin-Hao Su. Almost-tight distributed minimum cut algorithms. In Proceedings of the 28th International Symposium on Distributed Computing (DISC), pages 439–453, 2014.

[34] Jaroslav Nešetřil, Eva Milková, and Helena Nešetřilová. Otakar Borůvka on minimum spanning tree problem: translation of both the 1926 papers, comments, history. Discrete Mathematics, 233(1):3–36, 2001.

[35] Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. A time- and message-optimal distributed algorithm for minimum spanning trees. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing (STOC), pages 743–756, 2017.

[36] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.

[37] David Peleg and Vitaly Rubinovich. A near-tight lower bound on the time complexity of distributed minimum-weight spanning tree construction. SIAM Journal on Computing (SICOMP), 30(5):1427–1442, 2000.

[38] Lucia D. Penso and Valmir C. Barbosa. A distributed algorithm to find k-dominating sets. Discrete Applied Mathematics, 141(1-3):243–253, 2004.

[39] Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.

[40] Mikkel Thorup. Fully-dynamic min-cut. Combinatorica, 27(1):91–127, 2007.

[41] Ramakrishna Thurimella. Sub-linear distributed algorithms for sparse certificates and biconnected components. Journal of Algorithms, 23(1):160–179, 1997.

Appendix

A Applications of Our PA Algorithms
Here we outline the applications of our round- and message-optimal PA algorithms for multipledistributed graph problems. We start with a discussion of the problems referred to in Section 1.2,and then discuss additional applications of our PA algorithm to optimization as well as verificationproblems, in Appendix A.2.
A.1 Deferred Proofs of Section 1.2
Here we address how to apply our PA algorithm to solve MST, Approximate Min-Cut and Approximate SSSP, starting with a formal definition of these problems. We now restate the round and message complexities we obtain for these problems using our new PA algorithm, and discuss the algorithms used to achieve these bounds. As before, recall that since every graph admits a shortcut with b = 1 and c = √n, our algorithms simultaneously achieve worst-case optimal Õ(m) message complexity and worst-case optimal Õ(D + √n) round complexity.

Corollary 1.3.
Given a graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can solve MST w.h.p. in Õ(bD + c) rounds and Õ(m) messages, and deterministically in Õ(b(D + c)) rounds with Õ(m) messages.

Proof. Our algorithm uses Theorem 1.2 to simulate Borůvka's classic MST algorithm [34]. In Borůvka's algorithm every node initially belongs to its own part. Then, for O(log n) iterations, each part merges with the part it is connected to by its minimum-weight outbound edge. The problem of determining the minimum-weight outbound edge of a part is an example of Part-Wise Aggregation, which we solve with the algorithm of Theorem 1.2. Whenever a node v is incident to this edge, it remembers that the neighbor along that edge is now in the same part as v. Round and message complexities are trivial, and correctness follows from that of Borůvka's algorithm.

Corollary 1.4.
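As a cross-check of the Borůvka simulation in the proof of Corollary 1.3, its phase structure can be sketched sequentially. This is a toy rendition of ours: a union-find structure stands in for parts knowing their leaders, and the scan for each part's lightest outgoing edge plays the role of one Part-Wise Aggregation call. Distinct edge weights are assumed (ties broken lexicographically).

```python
def boruvka_mst(n, edges):
    """Total MST weight of a connected graph via Boruvka phases.
    edges: list of (weight, u, v) with nodes 0..n-1."""
    parent = list(range(n))

    def find(x):                          # part leader, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, parts = 0, n
    while parts > 1:
        best = {}                         # part -> lightest outgoing edge
        for w, u, v in edges:             # this scan stands in for one PA call
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in best or (w, u, v) < best[r]:
                        best[r] = (w, u, v)
        if not best:                      # graph is disconnected
            break
        for w, u, v in best.values():     # merge parts along chosen edges
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                parts -= 1
    return total
```

Each pass of the while loop corresponds to one Borůvka phase, so O(log n) phases suffice, matching the O(log n) PA invocations counted in the proof.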
Corollary 1.4. For any ε > 0 and graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can (1 + ε)-approximate min-cut w.h.p. in Õ(bD + c) · poly(1/ε) rounds and Õ(m) · poly(1/ε) messages.

Proof. Section 5.2 of Ghaffari and Haeupler [15] provides an algorithm for approximate min-cut based on shortcuts. The algorithm works roughly as follows: use sampling ideas from Karger [24] to downsample edge weights so that the min-cut is of size O(log n/ε); by Thorup [40], we can now solve MST O(log n) · poly(1/ε) times (using different weights each time) such that there is one edge e∗ in one of our MSTs T∗ such that the two connected components of T∗ \ e∗ define an approximately optimal min-cut; lastly, using a sketching approach this edge can be found by solving PA polylog(n) · poly(1/ε) times with high probability. See [15] for a proof of correctness. Our claimed round and message complexities are immediate given Corollary 1.3 and Theorem 1.2: the above algorithm requires downsampling edge weights (taking only O(1) rounds and O(m) messages), and by Corollary 1.3 and Theorem 1.2, the polylog(n) · poly(1/ε) instances of MST and PA can be solved in Õ(bD + c) · poly(1/ε) rounds and Õ(m) · poly(1/ε) messages.

Corollary 1.5. For any β = O(1/polylog n), given a graph G admitting a tree-restricted shortcut with congestion c and block parameter b, one can L^{O(log log n)/log(1/β)}-approximate SSSP w.h.p. in Õ((bD + c)/β) rounds and Õ(m/β) messages.

Proof. Haeupler and Li [18] provide a distributed algorithm with the stated round complexity and approximation factor based on a solution to PA. Roughly, the algorithm works as follows. We compute O(log n / β) low-diameter decompositions; in particular, as in Miller et al.
[31], every node starts a weighted BFS at a randomly chosen time and runs this weighted BFS for O(log n / β) rounds. During this weighted BFS, nodes claim as part of their ball any nodes they reach that have not yet been claimed or started their own weighted BFS. The weights used for the weighted BFS change in each round: any edge strictly inside a claimed ball is updated to have weight 0, and any edge incident to two claimed balls has its weight additively increased. Moreover, the increments used by the weighted BFS increase geometrically in each iteration, so that despite the increasing edge weights, the weighted BFS can still proceed efficiently. Lastly, the union of all BFS trees returned by every weighted BFS is returned as a tree T∗ that approximates distances in the graph; to inform nodes of their approximate distance from the source, the source can simply broadcast on T∗.

What makes this algorithm difficult to implement is that running a weighted BFS with edge weights set to 0 requires that the weighted BFS traverse components connected by weight-zero edges of potentially large diameter "in a single round". To overcome this issue, Haeupler and Li [18] use PA to efficiently traverse these connected components. For more details, see Haeupler and Li [18]. Relying on our algorithm for PA, and observing that these PA calls dominate the round and message complexity of the weighted BFS calls, our claimed round and message complexities follow.
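The random-start clustering at the heart of these decompositions can be sketched centrally as follows. This is a simplified, unweighted variant in the spirit of Miller et al. [31] (random exponential start delays, every node joins the search that reaches it first); the function name, parameters, and the centralized Dijkstra realization are illustrative assumptions, not the distributed routine of [18].

```python
import heapq
import random

def ldd_clusters(adj, beta, seed=0):
    """Partition the vertices of an unweighted graph into clusters.

    Each node v would start its own search after a random delay
    drawn from Exponential(beta); every node joins the search that
    reaches it first.  adj: dict mapping node -> iterable of neighbors.
    Returns a dict mapping each node to its cluster center."""
    rng = random.Random(seed)
    delay = {v: rng.expovariate(beta) for v in adj}
    start = max(delay.values())
    # Dijkstra from a virtual super-source: node v becomes active at
    # time start - delay[v], so nodes with larger delays start earlier.
    dist, center = {}, {}
    pq = [(start - delay[v], v, v) for v in adj]
    heapq.heapify(pq)
    while pq:
        d, v, c = heapq.heappop(pq)
        if v in dist:
            continue  # v was already claimed by an earlier search
        dist[v], center[v] = d, c
        for u in adj[v]:
            if u not in dist:
                heapq.heappush(pq, (d + 1, u, c))
    return center
```

Smaller β yields fewer, larger clusters; the weight-0 traversal issue discussed above arises precisely because a claimed ball must later act as a single super-node.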
A.2 Further Applications of Our PA Algorithms

Here we mention some further applications of PA, all of which prior work has shown to have PA as their round and communication bottleneck. For all of these problems, our new PA algorithm of Theorem 1.2 therefore yields Õ(D + √n)-round and Õ(m)-message algorithms.

Graph Verification Problems.
In [5], Das Sarma et al. provided an extensive list of lower bounds for optimization problems (many of which we referred to throughout this paper, as their lower bounds prove our algorithms' round complexity to be optimal). Das Sarma et al. further showed that their lower bounds carry over to verification problems. For these problems, the input is a graph G and a subgraph H of G, and an algorithm must verify whether this subgraph satisfies some property, such as whether H is a spanning tree or H is a cut (see [5] for more).

Das Sarma et al. also provided algorithms for this long list of verification problems, relying heavily on an optimal MST algorithm and the following connected component algorithm of Thurimella [41, Algorithm 5]. Thurimella's algorithm, given a graph G and subgraph H as above, outputs a label ℓ(v) for each vertex v ∈ V(G) such that ℓ(u) = ℓ(v) if and only if u and v are in the same connected component of H. The problem solved by Thurimella is easily cast as an instance of PA, by having each part elect a leader in a connected component of H (say, a node of minimum ID) and use the leader's ID as a label. Without repeating the arguments of Das Sarma et al. [5], we note that as MST and Thurimella's algorithm require Õ(D + √n) rounds (based on [26]), so do the algorithms of Das Sarma et al., and this is tight for these verification problems by their lower bounds. Our MST and PA algorithms show that for the long list of verification problems Das Sarma et al. considered, optimal round complexity does not preclude optimal message complexity, as we can attain both simultaneously. (We note that for bipartiteness verification, Das Sarma et al. relied on the algorithm of [41] also outputting a rooted spanning tree of each connected component of H with each vertex knowing its level in the tree. As our PA algorithm maintains such rooted spanning trees, it can also be used to solve this verification problem within the same bounds.)
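The labeling computed by Thurimella's algorithm is easy to specify centrally: label every vertex by the minimum ID in its component of H, which is exactly the per-part aggregate a PA instance computes. The sketch below is illustrative code for this specification, not the distributed implementation.

```python
from collections import deque

def component_labels(vertices, h_edges):
    """Label each vertex by the minimum vertex ID in its connected
    component of the subgraph H, so that label(u) == label(v) iff
    u and v lie in the same component of H."""
    adj = {v: [] for v in vertices}
    for u, v in h_edges:
        adj[u].append(v)
        adj[v].append(u)
    label = {}
    for start in sorted(vertices):  # visit in increasing ID order
        if start in label:
            continue
        # BFS: 'start' is the minimum ID in its component, i.e. the
        # leader whose ID the PA instance would broadcast to the part.
        queue = deque([start])
        label[start] = start
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in label:
                    label[u] = start
                    queue.append(u)
    return label
```

Vertices isolated in H (but present in G) simply label themselves, matching the singleton-part case of PA.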
Corollary A.1. All the graph verification problems considered in [5, Section 8] can be solved in an optimal Õ(D + √n) rounds and Õ(m) messages.

Approximation of Minimum-Weight Connected Dominating Set.
Another application of our PA algorithms follows from the work of Ghaffari [14]. In that work, Ghaffari shows that the algorithm of Thurimella [41], discussed above, can be used to compute an O(log n)-approximate minimum-weight connected dominating set (a set of nodes S such that all vertices in G are at distance at most one from a node in S). In particular, Ghaffari relied on the ability to extend Thurimella's algorithm to the case where nodes also have some value x(v) assigned to them, so that the label of nodes in a connected component can be equal to: (A) the list of the k = O(1) largest values x(v) in the component, or (B) the sum of the values x(v) in the component. Both these applications can be cast as instances of PA. Plugging our PA algorithm into Ghaffari's algorithm [14], we obtain the following.

Corollary A.2.
There exists an Õ(D + √n)-round, Õ(m)-message algorithm which computes an O(log n)-approximate minimum-weight connected dominating set.

k-dominating sets. Another distributed primitive used in multiple distributed graph algorithms is the problem of computing an O(n/k)-node k-dominating set; that is, a set of nodes S such that every node in G is at distance at most k from some node in S. This problem has found applications in distributed algorithms for MST [26] and (1 + ε)-approximate eccentricity computation [22]. For this problem, Õ(k)-round algorithms are known [26], including some with linear message complexity [38]. However, for large k, i.e. k ≫ max{D, √n}, no Õ(D + √n)-round, Õ(m)-message algorithm was known. Such an algorithm follows immediately from a simple generalization of our sub-part division algorithm, as follows. As in Algorithm 6, we repeatedly merge sub-parts, marking a sub-part as complete when it attains some size; unlike in Algorithm 6, this size threshold is chosen as a function of k. By arguments which are a simple generalization of Algorithm 6's analysis, the obtained sub-parts have diameter at most k, and each sub-part contains enough nodes that there are only O(n/k) sub-parts, whose leaders form a k-dominating set. The one delicate point to notice is that we can now solve Part-Wise Aggregation using our round- and message-optimal algorithm for PA (unlike in Algorithm 6, which is a sub-routine within our PA algorithm). In particular, even if k ≫ max{D, √n}, this can be done in Õ(D + √n) rounds (and Õ(m) messages). Therefore, as each of the O(log n) iterations of this algorithm can be implemented using O(log* n) calls to PA (by Lemma 6.3), together with some local computation, we obtain the following corollary of Theorem 1.2.

Corollary A.3.
For any integer k, there exists an Õ(D + √n)-round, Õ(m)-message algorithm which computes a k-dominating set of size O(n/k).

We believe that many more Õ(D + √n)-round bounds for distributed graph algorithms can be matched with Õ(m) messages for an even wider range of problems not discussed in this paper.
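To make the object being computed concrete, here is a small centralized construction of a k-dominating set of size O(n/k) via BFS layers. This is a classic folklore construction shown only for illustration; it is not the merging-based distributed algorithm described above, and the function name and parameters are assumptions.

```python
from collections import deque

def k_dominating_set(adj, root, k):
    """Return a k-dominating set of size at most n/(k+1) + 1 for a
    connected graph: keep the smallest residue class of BFS depths
    mod (k+1), plus the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in depth:
                depth[u] = depth[v] + 1
                queue.append(u)
    # Bucket vertices by depth mod (k+1); some bucket holds <= n/(k+1) nodes.
    buckets = [[] for _ in range(k + 1)]
    for v, d in depth.items():
        buckets[d % (k + 1)].append(v)
    best = min(buckets, key=len)
    # Every vertex has a tree ancestor in the chosen class within distance
    # k, except possibly shallow vertices, which are within k of the root.
    return set(best) | {root}
```

A vertex at depth d ≥ k sees k+1 consecutive ancestor depths, one of which lies in the chosen residue class; vertices at depth d < k are within distance d ≤ k of the root, which is added explicitly.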
B Dispensing with Known Leader Assumption

Throughout this paper we have assumed that parts always know a leader; that is, for every part P_i, every node v ∈ P_i knows the ID of some leader l_i ∈ P_i. We solved PA assuming that this holds. We now show that this assumption can be dispensed with. In particular, we demonstrate that an algorithm that solves PA under the assumption of a known leader for each part can be converted into one that makes no such assumption, with only logarithmic overhead in round and message complexities. The conversion is deterministic, and so it demonstrates that a known leader is required for neither our deterministic nor our randomized results.

Our algorithm is Algorithm 9 and works as follows: start with the singleton partition where every node is its own leader; repeatedly coarsen this partition O(log n) times, until it matches the input PA partition, by applying our PA solution that assumes a known leader to merge the stars given by a star joining. At each step of the coarsening we maintain the invariant that every part knows a leader, so in the end we need only solve PA with a known leader, which we can do by assumption.

Algorithm 9
PA without leaders.
Input: PA instance with parts (P_i)_{i=1}^N.
Input: PA algorithm A that assumes every part knows a leader.
Output: a solution to the input PA problem.

for all i ∈ V do
    Set P_i ← {i} and l_i ← i.    ▷ Each P_i maintains a leader l_i, initially set to i
for O(log n) rounds do
    for all parts P_i do
        Pick some e_i = (u, v) ∈ P_i × (V \ P_i) by running A on the current parts P_i.
    Compute a star joining with Algorithm 5 over the P_i s using the edges {e_i}.
    for all parts P_i which joined P_j in the star joining do
        Inform each v ∈ P_i that their leader is now l_j by running A.
        Merge P_i into P_j.
Run A on the PA instance consisting of the P_i s, each with a known leader l_i.

Lemma B.1.
Given a PA instance where no leaders are known, and a PA algorithm A that assumes leaders are known and uses R rounds and M messages, Algorithm 9 solves the PA instance with no leaders in Õ(R) rounds and Õ(M) messages.

Proof. We first prove the round and message complexities. Our algorithm runs A to solve PA with a known leader, and Algorithm 5 to compute a star joining, logarithmically many times; the latter consists of O(log* n) calls to A. Thus, the stated round and message complexities follow immediately. We now argue correctness. In each round, a constant fraction of the P_j s participating in the algorithm get to merge, by definition of a star joining, and so O(log n) repetitions suffice to coarsen every P_j to a P_i. Moreover, P_1, ..., P_N is valid input to A, since we maintain the invariant that every node in a P_j has an elected leader. At the end of this coarsening our PA instance has elected leaders, and so by the correctness of A our algorithm is correct.
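The leader-election invariant maintained by this coarsening can be sketched centrally as follows. The star joining and the runs of A are collapsed here into union-find merges that always keep the minimum ID as the known leader; this is an illustrative stand-in under that simplification, not Algorithm 9 itself, and all names are assumptions.

```python
def elect_leaders(n, edges, target):
    """Centralized sketch of Algorithm 9's coarsening: starting from
    singleton parts, merge parts that are adjacent inside the same
    target part, keeping the minimum ID as the known leader.  Assumes
    each target part induces a connected subgraph.
    target[v]: target part of node v.  Returns each node's leader."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    changed = True
    while changed:  # the distributed version needs only O(log n) rounds
        changed = False
        for u, v in edges:
            if target[u] != target[v]:
                continue  # only coarsen within target parts
            ru, rv = find(u), find(v)
            if ru != rv:
                # stand-in for a star joining plus leader broadcast:
                # the part with the larger leader adopts the smaller one.
                ru, rv = min(ru, rv), max(ru, rv)
                parent[rv] = ru
                changed = True
    return [find(v) for v in range(n)]
```

On termination, leader(u) == leader(v) exactly when u and v share a target part, which is the precondition the final run of A requires.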
C Our Results In Tabular Form

Throughout the paper we state our results in utmost generality, by giving our algorithms' running times in terms of the optimal block parameter b and congestion c. As stated in Theorem 1.2, its Corollaries 1.3, 1.4 and 1.5, and Appendix A, for PA and the wide range of application problems we consider, our deterministic algorithms terminate in Õ(b(D + c)) rounds and our randomized algorithms terminate in Õ(bD + c) rounds. To make these bounds more concrete, we review some known bounds on the parameters b and c in Table 1, and then state the implied running times of all our algorithms for the above problems in Table 2.

      General [15]   Planar [15]   Genus g [19]   Treewidth t [20]   Pathwidth p [20]   Minor Free [21]
  b   1              O(log D)      O(√g)          O(t)               p                  Õ(D)
  c   √n             Õ(D)          Õ(√g D)        Õ(t)               p                  Õ(D)

Table 1: Known bounds on the block parameter, b, and congestion, c.

                 General        Planar        Genus g        Treewidth t        Pathwidth p        Minor Free
  Deterministic  Õ(D + √n)      Õ(D)          Õ(gD)          Õ(tD + t²)         Õ(pD + p²)         Õ(D²)
  Randomized     Õ(D + √n)      Õ(D)          Õ(√g D)        Õ(tD)              Õ(pD)              Õ(D²)

Table 2: Summary of running times of our algorithms.

Reviewing Table 2, we note that for all problems considered, a matching worst-case round lower bound of Ω̃(D + √n) is given by Das Sarma et al. [5], while a trivial lower bound of Ω(D) holds for these problems on all graphs. Our algorithms match the worst-case bounds, and match the Ω(D) lower bound (up to polylog terms) for any constant genus, treewidth and pathwidth, all while requiring only Õ(m) messages. The exact optimal dependence on the parameters g, t and p remains an open question. The two exceptions to this rule are our L_ε-approximate SSSP and (1 + ε)-approximate Min-Cut algorithms, for which the (randomized) bounds hold as stated in the table only for fixed (or polylogarithmic) ε.