Distributed Symmetry-Breaking Algorithms for Congested Cliques



Leonid Barenboim, Victor Khazanov∗
February 21, 2018
arXiv [cs.DC]

Abstract

The Congested Clique is a distributed-computing model for single-hop networks with restricted bandwidth that has been very intensively studied recently. It models a network by an n-vertex graph in which any pair of vertices can communicate one with another by transmitting O(log n) bits in each round. Various problems have been studied in this setting, but for some of them the best-known results are those for general networks. For other problems, the results for Congested Cliques are better than on general networks, but still incur significant dependency on the number of vertices n. Hence the performance of these algorithms may become poor on large cliques, even though their diameter is just 1. In this paper we devise significantly improved algorithms for various symmetry-breaking problems, such as forests-decompositions, vertex-colorings, and maximal independent set.

We analyze the running time of our algorithms as a function of the arboricity a of a clique subgraph that is given as input. The arboricity is always smaller than the number of vertices n in the subgraph, and for many families of graphs it is significantly smaller. In particular, trees, planar graphs, graphs with constant genus, and many other graphs have bounded arboricity, but unbounded size. We obtain an O(a)-forest-decomposition algorithm with O(log a) time that improves the previously-known O(log n) time, O(a^{2+ε})-coloring in O(log* n) time that improves upon an O(log n)-time algorithm, O(a)-coloring in O(a^ε) time that improves upon several previous algorithms, and a maximal independent set algorithm with O(√a) time that improves at least quadratically upon the state-of-the-art for small and moderate values of a.

Those results are achieved using several techniques. First, we produce a forest decomposition with a helpful structure called H-partition within O(log a) rounds. In general graphs this structure requires Θ(log n) time, but in Congested Cliques we are able to compute it faster.
We employ this structure in conjunction with partitioning techniques that allow us to solve various symmetry-breaking problems efficiently.

∗ Open University of Israel. This research has been supported by ISF grant 724/15 and Open University of Israel research fund.

In the message-passing LOCAL model of distributed computing, a network is represented by an n-vertex graph G = (V, E). Each vertex has its own processing unit and memory of unrestricted size. In addition, each vertex has a unique identity number (ID) of size O(log n). Computation proceeds in synchronous rounds. In each round, vertices perform local computations and send messages to their neighbors. The running time in this model is the number of rounds required to complete a task; local computation is not counted towards the running time. Message size is not restricted. Therefore, this model is less suitable for networks that are constrained in message size as a result of limited channel bandwidth. To handle such networks, the CONGEST model is used, which is similar to the LOCAL model, except that each edge is only allowed to transmit O(log n) bits per round. An important type of CONGEST networks that has been intensively studied recently is the Congested Clique model. It represents single-hop networks with limited bandwidth. Although the diameter of such networks is 1, which would make any problem on such graphs trivial in the LOCAL model, in Congested Cliques various tasks become very challenging. Note that the Congested Clique is equivalent to a general n-vertex graph in which any pair of vertices (not necessarily neighbors) can exchange messages of size O(log n) in each round. Such a general graph corresponds to a subgraph of an n-clique: the subgraph constitutes the input, while the clique constitutes the communication infrastructure.

The study of the Minimum Spanning Tree problem (henceforth, MST) in the Congested Clique model was initiated by Lotker et al. [21]. They devised a deterministic O(log log n)-round algorithm that improved a straightforward O(log n) solution. Subsequently, randomized MST algorithms for Congested Cliques with O(log log log n) rounds [13, 22], O(log* n) rounds [10], and O(1) rounds [16] were devised. These algorithms, however, may fail with certain probabilities. Thus obtaining deterministic algorithms that never fail seems to be a more challenging task in this setting. Since the publication of the result of [21], many additional problems have been studied in the Congested Clique setting [5, 6, 8, 9, 14]. In particular, several symmetry-breaking problems were investigated. Solving such problems is very useful in networks in order to allocate resources, schedule tasks, perform load-balancing, and so on. Hegeman and Pemmaraju [14] obtained a randomized O(Δ)-coloring algorithm with O(1) rounds if the maximum degree Δ is at least Ω(log n), and O(log log n) time otherwise. We note that although in a clique it holds that Δ = n − 1, and an O(Δ)-coloring algorithm is trivial (by choosing unique vertex identifiers as colors), the problem is defined in a more general way. Specifically, we are given a clique Q = (V, E) and a subgraph G′ = (V, E′), E′ ⊆ E. The goal is computing a solution for G′ as a function of Δ = Δ(G′), rather than Δ(Q). In this case the O(Δ)-coloring problem becomes non-trivial. We are not aware of previously-known deterministic algorithms for coloring in the Congested Clique that outperform algorithms for general graphs, except an algorithm of [6] that is not applicable in general, but only if Δ = O(n^{1/3}); in this case its running time is O(log Δ).

Another symmetry-breaking problem that was studied in the Congested Clique is Maximal Independent Set (henceforth, MIS). The goal of this problem is to compute a subset of non-adjacent vertices that cannot be extended. Again, this problem is interesting in subgraphs of the Congested Clique, rather than in the Congested Clique as a whole. A deterministic algorithm for this problem with running time O(log Δ log n) was devised in [6]. If Δ = O(n^{1/3}), then the running time of the algorithm of [6] improves to O(log Δ). Ghaffari [9] devised a randomized MIS algorithm for the Congested Clique that requires Õ(log Δ/√(log n) + 1) ≤ Õ(√(log Δ)) rounds. Interestingly, when Δ is not restricted, all above-mentioned deterministic algorithms and most randomized ones have significant dependency on the clique size n. Obtaining a deterministic algorithm for these problems that does not depend on n is an important objective, since very large clique subgraphs may have some bounded parameters (e.g., bounded arboricity) that can be utilized in order to improve running time. In this paper we devise improved deterministic symmetry-breaking algorithms for the Congested Clique that have very loose dependency on n, or none at all.
Specifically, for clique subgraphs with arboricity a we obtain O(a)-coloring in O(a^ε) time (for an arbitrarily small constant ε > 0), O(a^{1+ε})-coloring in O(log a) time, O(a^{2+ε})-coloring in O(log* n) time, and Maximal Independent Set in O(√a) time. Here log* n is the number of times the log function has to be applied iteratively until we arrive at a number smaller than 2. That is, log* n = 0 if n < 2, and log* n = 1 + log*(log n) otherwise. The arboricity is the minimum number of forests into which the graph edges can be partitioned. It always holds that a(G′) ≤ Δ(G′), and often the arboricity of a graph is significantly smaller than its maximum degree. […] n. See the table below. Moreover, the log n factor is unavoidable when solving these problems in general graphs [2]. Our results demonstrate that in Congested Cliques much better solutions are possible. Our MIS algorithm outperforms the results of [5] when there is a large gap between a and Δ or between a and n. For example, trees, planar graphs, graphs of constant genus, and graphs that exclude any fixed minor all have arboricity a = O(1). On the other hand, their maximum degree Δ and size n are unbounded.

| Our Results (Deterministic) | Running Time | Previous Results (Deterministic and Randomized) | Running Time |
|---|---|---|---|
| Forest-Decomposition | O(log a) | Forest-Decomposition [2] | O(log n) |
| O(a^{2+ε})-coloring | O(log* n) | O(a^{2+ε})-coloring [2] | O(log n) |
| O(a^2)-coloring | O(log a) + log* n | O(a^2)-coloring [2] | O(log n) |
| O(a^{1+ε})-coloring | O(log a) | O(a^{1+ε})-coloring [3] | O(log a log n) |
| O(a)-coloring | O(a^ε) | O(a)-coloring [3] | O(min(a^ε log n, a^ε + log^{1+ε} n)) |
| MIS | O(√a) | MIS [2] | O(a + log n) |
| | | MIS [6] | O(log Δ log n) |
| | | MIS (rand.) [9] | Õ(√(log Δ)) |
| O(Δ)-coloring | O(a^ε) | O(Δ)-coloring (rand.) [14] | O(log log n) |

Our main technical tool is an O(a)-forests-decomposition algorithm that requires O(log a) rounds in the Congested Clique. This is in contrast to general graphs, where O(a)-forests-decomposition requires Θ(log n) rounds. Once we compute such a forests decomposition, each vertex knows its O(a) parents in the O(a) forests of the decomposition. We orient edges towards parents. The union of all edges that point towards parents constitutes the edge set E′ of the input. This is because every edge is an outgoing edge of one of its endpoints, and is therefore included in the union. Note also that the out-degree of each vertex is O(a). Then, within O(a) rounds each vertex can broadcast the information about all its outgoing edges to all other vertices in the graph. Indeed, each outgoing edge can be represented by O(log n) bits using the IDs of its endpoints. In round i, for i = 1, 2, ..., O(a), each vertex broadcasts to all vertices the information of its i-th outgoing edge. After O(a) rounds all vertices know all edges of E′ and are able to construct locally (in their internal memory) the input graph G′ = (V, E′).

Once vertices know the input graph, they can solve any computable problem (for unweighted graphs or graphs with weights consisting of O(log n) bits) locally. The vertices run the same deterministic algorithm locally, and obtain a consistent solution (the same in all vertices).
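A minimal sequential sketch of this argument in Python follows. It assumes the orientation is already available as a hypothetical dictionary `out_edges` that maps each vertex ID to the heads of its outgoing edges; because every message is a broadcast, a single shared edge set stands in for the copy that each vertex builds. The greedy MIS routine is only an illustrative stand-in for "any computable problem", chosen because it is deterministic and therefore yields the same answer at every vertex.

```python
from collections import defaultdict


def learn_graph_by_out_edges(out_edges):
    """Simulate the broadcast phase: in round i, every vertex announces its i-th outgoing edge.

    out_edges: dict mapping a vertex ID to the list of heads of its outgoing edges.
    Returns (rounds, edges): the number of rounds used (the maximum out-degree) and the
    undirected edge set that every vertex holds afterwards.
    """
    rounds = max((len(heads) for heads in out_edges.values()), default=0)
    edges = set()  # identical at every vertex, since every message is a broadcast
    for i in range(rounds):
        for v, heads in out_edges.items():
            if i < len(heads):
                # One O(log n)-bit message: the IDs of the two endpoints.
                edges.add(frozenset((v, heads[i])))
    return rounds, edges


def local_greedy_mis(vertices, edges):
    """A deterministic local computation: scanning vertices in ID order makes the
    result identical at every vertex that runs it on the same graph."""
    adj = defaultdict(set)
    for e in edges:
        u, w = tuple(e)
        adj[u].add(w)
        adj[w].add(u)
    mis = set()
    for v in sorted(vertices):
        if not (adj[v] & mis):
            mis.add(v)
    return mis


# Triangle {0, 1, 2} plus the pendant edge {0, 3}; every edge is oriented exactly once.
out_edges = {0: [1, 2], 1: [2], 2: [], 3: [0]}
rounds, edges = learn_graph_by_out_edges(out_edges)
print(rounds, len(edges), sorted(local_greedy_mis(out_edges, edges)))
```

The number of simulated rounds equals the maximum out-degree, which is O(a) when the orientation comes from an O(a)-forests-decomposition.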
Each vertex then deduces its part from the solution of the entire graph. This does not require any communication, and so the additional (distributed) running time for this computation is 0. Thus our results demonstrate that any computable problem can be solved in the Congested Clique in O(a) rounds deterministically. This is an alternative way of showing what follows from Lenzen's [19] routing scheme, since a graph with arboricity a has O(n · a) edges that can be announced within O(a) rounds of Lenzen's algorithm. But the additional structure of forests-decomposition that we obtain is useful for speeding up certain computations, as we discuss below. We note that although unrestricted local computation is allowed in this model, in this paper we do not abuse this ability, and devise algorithms whose local computations are reasonable (i.e., polynomial).

Since any computable problem can be solved in O(a) rounds, our next goal is obtaining algorithms with a better running time. We do so by partitioning the input into subgraphs of smaller arboricity. We note that vertex-disjoint subgraphs are Congested Cliques by themselves, and can be processed in parallel. For example, partitioning the input graph into O(a^{1−ε}) subgraphs of arboricity O(a^ε), and coloring the subgraphs in parallel with O(a^ε) colors each, using disjoint palettes, makes it possible to color the entire input graph with O(a^{1−ε}) · O(a^ε) = O(a) colors in O(a^ε) time rather than O(a). Partitioning also works for MIS, although this problem is more difficult to parallelize. (In the general CONGEST model the best algorithm in terms of a has running time O(a + log* n).) Nevertheless, using our new partitioning techniques we obtain an MIS algorithm with O(√a) time in the Congested Clique. We believe that this technique is of independent interest, and may be applicable more broadly.
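The disjoint-palette combination described above can be illustrated with a small Python sketch. It assumes the vertex partition (`groups`) is already given, colors each group greedily on its induced subgraph as a stand-in for whatever algorithm each sub-clique runs in parallel, and then shifts the colors of group k by k times a common palette size. Both function names are hypothetical.

```python
def color_with_disjoint_palettes(groups, adj):
    """groups: list of disjoint vertex lists covering V; adj: dict vertex -> set of neighbors.

    Each group is colored greedily on its induced subgraph (colors from 0 upwards), and the
    colors of group k are then shifted by k * palette so that different groups never share colors.
    """
    local = {}
    for group in groups:
        members = set(group)
        for v in sorted(group):
            # Only neighbors inside the same group matter for the local coloring.
            taken = {local[u] for u in adj[v] & members if u in local}
            c = 0
            while c in taken:
                c += 1
            local[v] = c
    palette = 1 + max(local.values(), default=-1)  # common palette size for all groups
    group_of = {v: k for k, group in enumerate(groups) for v in group}
    return {v: group_of[v] * palette + c for v, c in local.items()}


def is_proper(coloring, adj):
    return all(coloring[u] != coloring[v] for u in adj for v in adj[u])


# K4 split into two vertex-disjoint groups.
adj = {v: {u for u in range(4) if u != v} for v in range(4)}
coloring = color_with_disjoint_palettes([[0, 1], [2, 3]], adj)
print(coloring, is_proper(coloring, adj))
```

With O(a^{1−ε}) groups and palettes of size O(a^ε), this yields O(a) colors in total. A distributed implementation would fix the palette size in advance from a known bound, rather than computing it after the fact as this sequential sketch does.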
More generally, by quickly partitioning the input into subgraphs of small arboricity, we can solve any computable problem in these subgraphs in O(a^ε) time, rather than O(a). Given a method that efficiently combines these solutions, it would be possible to obtain a solution for the entire input significantly faster than O(a).

Lenzen [19] devised a communication scheme for the Congested Clique. Specifically, if each vertex is required to send O(n) messages of O(log n) bits each, and if each vertex needs to receive at most O(n) messages, then this communication can be performed within O(1) rounds in the Congested Clique. Algebraic methods for the Congested Clique were studied in [5, 8]. Symmetry-breaking problems were very intensively studied in general graphs, and many of these results apply to the Congested Clique. In particular, Goldberg, Plotkin, and Shannon [12] devised a (Δ + 1)-coloring algorithm with running time O(Δ log n). Goldberg and Plotkin [11] devised an O(Δ^2)-coloring algorithm with running time O(log* n) for constant values of Δ. Linial [20] extended this result to general values of Δ. Kuhn and Wattenhofer [18] obtained a (Δ + 1)-coloring algorithm with running time O(Δ log Δ + log* n). Barenboim and Elkin [3] devised an O(min(a^ε log n, a^ε + log^{1+ε} n))-time algorithm for O(a)-coloring, and an O(log a log n)-time algorithm for O(a^{1+ε})-coloring.

We provide some definitions and survey several known procedures that are needed for the algorithms we describe in the next sections. We relegate descriptions of known procedures to Appendix A. This includes H-partitions, Forests-Decomposition, Defective-coloring, O(a)-proper-coloring, and Lenzen's routing scheme in the Congested Clique. Readers who are familiar with these concepts may proceed directly to Section 3 after reading Section 2.1.

The k-vertex-coloring problem is defined as follows. Given a graph G = (V, E), find a proper coloring ϕ : V → [k], that is, a coloring with ϕ(v) ≠ ϕ(u) for all (u, v) ∈ E.
The out-degree of a vertex v in a directed graph is the number of edges incident to v that are oriented out of v. An orientation μ of (the edge set of) a graph is an assignment of direction to each edge (u, v) ∈ E, either towards u or towards v. Consider a graph G = (V, E) in which some of the edges are oriented. In our work we use the concept of partial orientations, which was employed by Barenboim and Elkin [3]. A partial orientation is allowed not to orient some edges of the graph. A partial orientation σ has deficit at most d, for some positive integer parameter d, if for every vertex v in the graph the number of edges incident to v that σ does not orient is no greater than d. Another important parameter of a partial orientation is its length l. This is the length of the longest path P in which all edges are oriented consistently by σ. (That is, each vertex in the path has out-degree and in-degree at most 1 in the path.) An H-partition (H_1, H_2, ..., H_ℓ) of G = (V, E) with degree A, for some parameter A, is a partition of V such that for any vertex in a set H_i, i ∈ [ℓ], the number of its neighbors in H_i ∪ H_{i+1} ∪ ... ∪ H_ℓ is at most A.

Forest-Decomposition-CC

In this section we describe our Forest-Decomposition algorithm for the Congested Clique. Our Forest-Decomposition algorithm starts with computing an H-partition. This computation is performed faster in Congested Cliques than in general graphs thanks to the following observation. Once the first O(log a) H-sets are computed (within O(log a) time), the subgraph induced by the remaining active vertices has at most O(n) edges. (We prove this in Lemma 3.1 below.) Consequently, all these vertices can learn this entire subgraph using Lenzen's algorithm within O(1) rounds. Then each vertex can locally compute the H-set it belongs to. This is in contrast to the algorithm for general graphs, where the running time is Θ(log n), even for graphs with O(n) edges.

First we provide a procedure that computes an H-partition within O(1) rounds on graphs whose edge set has size at most O(n). This procedure is based on Lenzen's routing scheme. The main idea of the procedure is that each vertex can transmit all edges incident to it to all other vertices in the graph. This is because the overall number of messages each vertex receives in this case is O(n). Indeed, each edge can be encoded as a message of size O(log n) that contains the IDs of the edge endpoints, and the number of messages is bounded by the number of edges in the graph. Since the number of messages sent by each vertex is also bounded by O(n), Lenzen's scheme allows all vertices to transmit all their edges to all other vertices within a constant number of rounds, as long as the number of edges is O(n). Once a vertex receives all the edges of the graph, it constructs the graph in its local memory. All vertices construct the same graph, and perform a local computation of the H-partition. This does not require any communication, and since all vertices hold the same graph, the resulting H-partition is consistent in all vertices. This completes the description of the procedure.
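The local step of the procedure, in which every vertex peels the gathered graph into H-sets, can be sketched in Python as below. This assumes a hypothetical adjacency dictionary `adj` and a value `a` that upper-bounds the arboricity; in each iteration, all active vertices with at most (2 + ε) · a active neighbors join the current H-set together, and the `start` parameter lets the numbering continue from an earlier phase. Running this identical deterministic code on identical input is what makes the partition consistent across vertices.

```python
def local_h_partition(adj, a, eps, start=1):
    """Return a dict vertex -> index of its H-set, numbering the sets from `start`.

    adj: dict vertex -> set of neighbors; a: an upper bound on the arboricity; eps > 0.
    """
    threshold = (2 + eps) * a
    active = set(adj)
    h_index = {}
    i = start
    while active:
        # Decisions in one iteration use the active set as it was at the start of the iteration.
        joining = {v for v in active if len(adj[v] & active) <= threshold}
        if not joining:
            raise ValueError("no vertex can join H_i; is a really an upper bound on the arboricity?")
        for v in joining:
            h_index[v] = i
        active -= joining
        i += 1
    return h_index


# A fan (arboricity 2): hub 0 adjacent to every vertex of the path 1-2-3-4-5-6.
adj = {0: set(range(1, 7))}
for v in range(1, 7):
    adj[v] = {0} | {u for u in (v - 1, v + 1) if 1 <= u <= 6}
h = local_h_partition(adj, a=2, eps=0.5)
print(sorted(h.items()))
```

With threshold (2 + 0.5) · 2 = 5, the path vertices join H_1 first and the hub, left with no active neighbors, joins H_2.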
Algorithm 1: H-partition of an input graph G with arboricity a and O(n) edges
procedure Sparse-Partition(G, a, ε)
  Each node u in G broadcasts its degree to every other node v in G
  Using Lenzen's scheme, send all information about all edges to all vertices of G
  Each vertex v ∈ V performs locally the following operations:
    Initially, all vertices of G are marked as active
    i = ⌈(2/ε) log a⌉ + 1
    while i ≤ ⌈(2/ε) log n⌉ do
      for each vertex u that is active and has at most (2 + ε) · a active neighbors do
        make u inactive
        add u to H_i
      i = i + 1

Next, we provide a general procedure to compute an H-partition in graphs with any number of edges in the Congested Clique model. The procedure is called Procedure H-Partition-CC. The computation is done by first reducing the number of edges to O(n) within O(log a) rounds, and then invoking Procedure Sparse-Partition on the remaining subgraph. The reduction phase (the while loop of Algorithm 2 below) operates similarly to Procedure Sparse-Partition, but the partition into H-sets is performed in a distributed manner, rather than locally, and the number of iterations is just O(log a), rather than O(log n). In the next lemmas we show that this is sufficient to reduce the number of edges to O(n).

Algorithm 2: Computing an H-partition of a general graph G with arboricity a in the Congested Clique model
procedure H-Partition-CC(a, ε)
  An algorithm for each vertex v ∈ V:
  i = 1
  while i ≤ ⌈(2/ε) · log a⌉ do
    if v is active and has at most (2 + ε) · a active neighbors then
      make v inactive
      add v to H_i
      send the messages "inactive" and "v joined H_i" to all the neighbors
    for each received "inactive" message do
      mark the sender neighbor as inactive
    end for
    i = i + 1
  end while
  H_i, H_{i+1}, ..., H_{O(log n)} = invoke Procedure Sparse-Partition on the subgraph induced by the remaining active vertices

Lemma 3.1.

After ⌈(2/ε) log a⌉ rounds (the while loop of Algorithm 2), the number of edges both of whose endpoints are still active is O(n).

Proof. Consider the i-th iteration. By Lemma A.4 in Appendix A, the graph G_i induced by the vertices that are still active after round i has at most (2/(2 + ε))^i · |V| vertices. Recall that a graph with arboricity a has no more than n · a edges, and that G_i, as a subgraph of G, has arboricity at most a. Hence the number of edges in G_i is at most (2/(2 + ε))^i · n · a. For 0 < ε ≤ 2 we have (1 + ε/2)^{2/ε} ≥ 2, so in round i = ⌈(2/ε) log a⌉ the graph G_i has at most (2/(2 + ε))^{⌈(2/ε) log a⌉} · n · a ≤ 2^{−log a} · n · a = n = O(n) edges.

The next lemma states the correctness of Algorithm 2, as well as its running time.

Lemma 3.2.

Algorithm 2 computes an H-partition in O(log a) rounds.

Proof. The correctness of Algorithm 2 (including its invocation of Algorithm 1) follows from the correctness of the H-partition algorithm of [2] in conjunction with Lenzen's routing scheme. Specifically, within O(log a) rounds the algorithm properly computes the H-sets H_1, H_2, ..., H_{O(log a)}, and within O(1) additional rounds the remaining subgraph, which has O(n) edges by Lemma 3.1, is learnt by all vertices using Lenzen's scheme, and all H-sets of this subgraph, up to H_{O(log n)}, are computed locally by each vertex. Thus, each vertex can deduce the index of its H-set within O(log a) rounds from the beginning of the algorithm.

We summarize the properties of Procedure H-Partition-CC in the following theorem:

Theorem 3.3.

Procedure H-Partition-CC invoked on a graph G with arboricity a(G) and a parameter ε, 0 < ε ≤ 2, computes an H-partition of size l = O(log n) with degree at most O(a). The running time of the procedure is O(log a).
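The two phases of Procedure H-Partition-CC can be mirrored in a sequential Python sketch. It uses a hypothetical adjacency-dictionary input, base-2 logarithms, and one round per distributed peeling iteration; the O(1) rounds of Lenzen's routing that deliver the remaining edges are not simulated. The function also returns the number of edges left after the first phase, so that the O(n) bound of Lemma 3.1 can be inspected on small examples.

```python
import math


def h_partition_cc(adj, a, eps):
    """Simulate H-Partition-CC on adj (dict vertex -> set of neighbors), 0 < eps <= 2.

    Returns (h_index, phase1_rounds, edges_left): the H-set index of every vertex, the number
    of distributed peeling rounds, and the number of edges among vertices still active after them.
    """
    threshold = (2 + eps) * a
    active = set(adj)
    h_index = {}

    def peel(i):
        # All active vertices with few active neighbors join H_i simultaneously.
        joining = {v for v in active if len(adj[v] & active) <= threshold}
        for v in joining:
            h_index[v] = i
        active.difference_update(joining)
        return joining

    # Phase 1 (distributed): ceil((2/eps) * log a) iterations, one round each.
    phase1_rounds = math.ceil((2 / eps) * math.log2(a)) if a > 1 else 0
    for i in range(1, phase1_rounds + 1):
        peel(i)
    edges_left = sum(len(adj[v] & active) for v in active) // 2

    # Phase 2 (local): every vertex peels the gathered remainder in the same way.
    i = phase1_rounds + 1
    while active:
        if not peel(i):
            raise ValueError("a is smaller than the arboricity of the graph")
        i += 1
    return h_index, phase1_rounds, edges_left


# Two hubs 0 and 1 joined by an edge, both adjacent to ten leaves 2..11 (arboricity 2).
adj = {0: {1, *range(2, 12)}, 1: {0, *range(2, 12)}}
for leaf in range(2, 12):
    adj[leaf] = {0, 1}
h, rounds, left = h_partition_cc(adj, a=2, eps=2.0)
print(rounds, left, h[0], h[2])
```

In this example a single distributed round suffices, after which only the hub-to-hub edge remains to be gathered and peeled locally.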

We next devise a forest-decomposition algorithm for the Congested Clique model, called Procedure Forest-Decomposition-CC. It accepts as input the parameters a and ε. In the first step, it invokes Procedure H-Partition-CC, which computes an H-partition with degree at most (2 + ε) · a. In the next step, it invokes a procedure called Procedure Orientation [3] as follows.

Procedure Orientation: For each edge e = (u, v), if the endpoints u, v are in different sets H_i, H_j, i ≠ j, then the edge is oriented towards the vertex in the set with a greater index. Otherwise, if i = j, the edge e is oriented towards the vertex with a greater ID among the two vertices u and v. The orientation μ produced by this step is acyclic. Each vertex has out-degree at most (2 + ε) · a. The correctness of the procedure follows from the correctness of Procedure Orientation from [2].

The last step of the algorithm is partitioning the edge set of the graph into forests as follows: each vertex is in charge of its outgoing edges, and it assigns each outgoing edge a distinct label from the set {1, 2, ..., ⌊(2 + ε) · a⌋}. Since μ is acyclic and each vertex has at most one outgoing edge with any given label, the edges labeled i form a forest F_i, in which the head of each edge is the parent of its tail. This completes the description of the algorithm. Its pseudocode and analysis are provided below.

Algorithm 3: Partitioning of the edge set of G into ⌊(2 + ε) · a⌋ forests in the Congested Clique model
procedure Forests-Decomposition-CC(a, ε)
  invoke Procedure H-Partition-CC(a, ε)
  μ = Orientation()
  assign a distinct label from the set [⌊(2 + ε) · a⌋] to each μ-outgoing edge of v

Lemma 3.4.

The time complexity of Procedure Forests-Decomposition-CC is O(log a).

Proof. Procedure H-Partition-CC takes O(log a) time, and steps (2) and (3) of Forests-Decomposition-CC require O(1) rounds each. Therefore, the overall time of Procedure Forests-Decomposition-CC is O(log a).

Theorem 3.5.

For a graph G with arboricity a = a(G), and a parameter ε, 0 < ε ≤ 2, in the Congested Clique, Procedure Forests-Decomposition-CC(a, ε) partitions the edge set of G into ⌊(2 + ε) · a⌋ forests in O(log a) rounds. Moreover, as a result of its execution each vertex v knows the label and the orientation of every edge (v, u) adjacent to v.

O(a) time in Congested Clique

In this section we describe how to solve any computable problem in O(a) time in the Congested Clique. We note that since any graph with arboricity a has O(a · n) edges, this is possible to achieve by directly applying O(a) rounds of Lenzen's scheme [19]. However, in this section we present an alternative solution that employs forest-decompositions. Given a forest-decomposition in which the number of parents (i.e., outgoing edges) of each vertex is bounded by O(a), we can solve any computable problem within this number of rounds. Specifically, once Procedure Forests-Decomposition-CC is invoked, it partitions the edge set of G into ⌊(2 + ε) · a⌋ forests in O(log a) rounds. As a result of its execution, each vertex v knows the label and the orientation of every edge (v, u) adjacent to v. An outgoing edge from a vertex v to a vertex u labeled with a label i means that u is the parent of v in a tree of the i-th forest F_i. Therefore, by transmitting the information about a distinct parent in each round, each vertex can inform all other vertices of the graph about all its parents. This requires O(a) rounds overall, one round per parent. Then, each vertex knows all parents of all vertices in the graph G, and this information is sufficient to construct the graph G locally. Indeed, for each edge e of the graph G, one of its endpoints is a parent of the other in some forest F_i, and thus this edge is announced to all vertices in round i. Within O(a) rounds, all edges are announced, and so the entire graph is known to all vertices.
Therefore, we can solve any computable problem on G locally (without any additional communication), by executing the same deterministic algorithm on the same graph that is known to all. This guarantees a consistent solution in all vertices. Thus, we obtain a general solution with O(a) time to any computable problem in the Congested Clique. (Note that this is true either if the input graph G is unweighted or if G has weights on edges that require O(log n) bits per edge. In the latter case, the information about weights can be transmitted together with the information about parents without affecting the running time bound O(a). Recall, however, that all our algorithms in this paper are for unweighted graphs.) Therefore, it would be more interesting to find algorithms faster than Θ(a) for various problems. We obtain such algorithms in the next sections.

O(a^2)-coloring in O(log a + log* n) time

Note that in the synchronous message-passing model of distributed computing a proper O(a^2)-coloring requires Θ(log n) time [20]. However, in the Congested Clique we can improve the running time and reach an even better result of O(log a + log* n).

In this section we employ Procedure Forests-Decomposition-CC to provide an efficient algorithm that colors the input graph G of arboricity a = a(G) with O(a^2) colors. The running time of the algorithm is O(log a + log* n). For computing an O(a^2)-coloring we will use Procedure Arb-Linial described in [2]. Procedure Arb-Linial accepts a graph G with arboricity a(G). Given an O(a)-forests-decomposition of G, the procedure computes a proper coloring ϕ of the graph using O(a^2) colors in O(log* n) running time. During the execution of this procedure, each vertex transmits at most O(log n) bits over each edge in each round.

Procedure Forest-Decomposition-CC has better running time than the respective procedure on general graphs, which allows us to compute a proper O(a^2)-coloring of the graph very quickly.
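Procedure Arb-Linial itself is specified in [2] and Appendix A and is not reproduced here. To illustrate why a vertex only needs the colors of its parents, the Python sketch below implements a single polynomial-based color-reduction round in the spirit of Linial's algorithm [20]: each vertex reads its old color as a low-degree polynomial over GF(q) and picks an evaluation point at which it differs from all of its at most A parents. This construction and all names in it are assumptions for illustration; the exact parameters and round structure used in [2] may differ.

```python
def next_prime(m):
    """Smallest prime >= m (trial division is enough for a sketch)."""
    m = max(m, 2)
    while any(m % p == 0 for p in range(2, int(m ** 0.5) + 1)):
        m += 1
    return m


def reduce_colors_against_parents(color, parents, num_colors, A):
    """One polynomial color-reduction round in the spirit of Linial's algorithm.

    color: dict vertex -> color in range(num_colors), with color[v] != color[u] for every parent u of v.
    parents: dict vertex -> list of at most A parents (heads of outgoing edges).
    Returns (new_color, q * q): a coloring with values in range(q * q) that again differs
    from the color of every parent.
    """
    # Pick degree d and prime q with q ** (d + 1) >= num_colors and q > A * d.
    d = 1
    q = next_prime(A * d + 1)
    while q ** (d + 1) < num_colors:
        d += 1
        q = next_prime(A * d + 1)

    def poly(c, x):
        # The base-q digits of c are the coefficients of a polynomial of degree <= d over GF(q).
        value, power = 0, 1
        for _ in range(d + 1):
            value = (value + (c % q) * power) % q
            c //= q
            power = (power * x) % q
        return value

    new_color = {}
    for v, c in color.items():
        # Distinct polynomials of degree <= d agree on at most d points, so at most A * d < q
        # values of x are ruled out by the parents and a suitable x always exists.
        for x in range(q):
            if all(poly(c, x) != poly(color[u], x) for u in parents[v]):
                new_color[v] = x * q + poly(c, x)
                break
        else:
            raise ValueError(f"vertex {v} has a parent with the same color")
    return new_color, q * q


# A path 0 -> 1 -> ... -> 9 oriented towards larger IDs (A = 1), initially colored by ID.
parents = {v: [v + 1] if v < 9 else [] for v in range(10)}
new_color, palette = reduce_colors_against_parents({v: v for v in range(10)}, parents, 10, 1)
print(palette, all(new_color[v] != new_color[u] for v in parents for u in parents[v]))
```

If two neighbors pick different evaluation points, their new colors differ in the first coordinate; if they pick the same point, the child chose it precisely so that the second coordinate differs. Since every edge joins a vertex and one of its parents, this is enough for properness. The round count and the final O(A^2) palette claimed for Procedure Arb-Linial are proved in [2], not by this sketch.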
We devise a procedure called Procedure Arb-Coloring-CC that works in the following way. The procedure starts by executing Procedure Forest-Decomposition-CC with the input parameter a = a(G). This invocation returns an H-partition of G of size l ≤ ⌈(2/ε) log n⌉ and degree at most A = (2 + ε) · a, together with the corresponding forest decomposition. Then, we invoke Procedure Arb-Linial on the forest-decomposition. Since the procedure requires each vertex to send only its current color to its neighbors (which is of size O(log n)), Procedure Arb-Linial can be invoked as-is in the Congested Clique. In our case we execute Procedure Arb-Linial with an input parameter A = (2 + ε) · a. In Procedure Arb-Linial each vertex considers only the colors of its parents in forests F_1, F_2, ..., F_A. By Lemma A.6 in Appendix A, the algorithm computes an O(((2 + ε) · a)^2) = O(a^2)-coloring. This completes the description of Procedure Arb-Coloring-CC. Its pseudocode and running time analysis are provided below.

Algorithm 4: O(a^2)-coloring in the Congested Clique
procedure Arb-Coloring-CC(a, ε)
  H = (H_1, H_2, ..., H_l) = invoke Procedure Forest-Decomposition-CC
  invoke Procedure Arb-Linial(H, A = (2 + ε) · a)

Theorem 5.1.

Procedure Arb-Coloring-CC computes a proper O(a^2)-coloring in the Congested Clique in O(log a + log* n) rounds.

Proof. The correctness of the procedure follows from the above discussion. The running time of step (1) is O(log a) rounds, by Lemma 3.4. Step (2), by Lemma A.10, requires O(log* n) rounds. Thus, the overall running time of the procedure is O(log a + log* n).

O(a^{2+ε})-coloring in O(log* n) time

In this section we show that the factor of log a can be eliminated from the running time of Theorem 5.1 at the expense of slightly increasing the number of colors to O(a^{2+ε}), for an arbitrarily small positive constant ε. To this end, we invoke Procedure H-Partition-CC with the second parameter set to a^ε, rather than ε. We show below that this way the running time of the forests-decomposition becomes just O(1). However, the number of forests produced is now O(a^{1+ε}), rather than O(a). Moreover, once Procedure Forest-Decomposition-CC terminates, we invoke Procedure Arb-Linial on the result of the forest decomposition to compute an O((a^{1+ε})^2)-coloring, that is, an O(a^{2+2ε})-coloring; replacing ε by ε/2 gives the bound stated in Theorem 6.3.

Lemma 6.1.

Invoking Procedure H-Partition-CC with the second parameter set to q = a^ε requires O(1) rounds.

Proof. In each round the number of active vertices is reduced by a factor of Θ(a^ε): the subgraph induced by the active vertices has average degree less than 2a, so at most a 2/(2 + a^ε) fraction of its vertices can have more than (2 + a^ε) · a active neighbors. For i = 1, 2, ..., the number of edges in the subgraph induced by the active vertices in round i is therefore at most O((a · n)/(a^ε)^i). Thus, after i = O(1/ε) rounds, the number of remaining edges will be O(n). Then we can employ Lenzen's scheme, broadcast these edges to all vertices within O(1) rounds, and compute the remaining H-sets locally. Therefore, the overall running time is O(1/ε) = O(1).

Lemma 6.2.

For graphs G with a(G) = a, and a parameter q = a^ε, for an arbitrarily small positive constant ε, Procedure Forest-Decomposition-CC partitions the edge set of G into A = O(a^{1+ε}) oriented forests in O(1) rounds in the Congested Clique.

Proof. By Lemma 6.1, Procedure H-Partition-CC executes in O(1) rounds, the second stage is an orientation that is computed in O(1) rounds, and assigning labels to outgoing edges is computed in O(1) rounds as well. Therefore, the overall running time of the procedure is O(1).

The next theorem follows directly from Lemmas 6.1 and 6.2.

Theorem 6.3.

For graphs $G$ with $a(G) = a$ and a parameter $q = a^{\varepsilon}$, for a positive constant $\varepsilon$, Procedure Arb-Coloring-CC computes an $O(a^{2+\varepsilon})$-coloring within $O(\log^* n)$ time in the Congested Clique.

$O(a^{1+\varepsilon})$-coloring in $O(\log a)$ time

In this section we devise an algorithm that produces an $O(a^{1+\varepsilon})$-coloring in $O(\log a + \log^* n)$ running time. We employ a combination of defective colorings and forest decompositions. Usually, when a vertex is required to select a color, it chooses a color different from the colors of all its neighbors, so neighboring vertices must select their colors in different rounds. In contrast, in a defective coloring a vertex may select a color that is already used by some of its neighbors, and neighbors may perform the selection in the same round. Therefore, the computation can be significantly more efficient. Moreover, defective colorings allow us to obtain helpful structures with appropriate properties, such as partial orientations with small deficit.

We start by presenting a procedure called Procedure Partial-Orientation-CC. It is based on a procedure from [2], but the current variant is adapted to Congested Cliques, and it is more efficient than the variant for general graphs. The procedure receives as input a graph $G$ and an integer $t > 0$. It computes an orientation with out-degree $\lfloor (2+\varepsilon) \cdot a \rfloor$ and deficit at most $\lfloor a/t \rfloor$. (Recall that the deficit is the maximum number of unoriented edges adjacent on the same vertex.)

Procedure Partial-Orientation-CC consists of three steps. First, an $H$-partition of the input graph $G$ is computed, that is, the vertex set of $G$ is partitioned into subsets $H_1, H_2, \ldots, H_l$ such that every vertex in $H_i$, $1 \le i \le l = O(\log n)$, has $O(a)$ neighbors in $\bigcup_{j=i}^{l} H_j$. In the next step, an $\lfloor a/t \rfloor$-defective $O(t)$-coloring is computed in each $G(H_i)$ in parallel, using [3]. The final step is the computation of an orientation as follows. Consider an edge $e = (u, v)$ with $u \in H_i$, $v \in H_j$ for some $1 \le i, j \le l$. If $i < j$, orient the edge towards $v$. If $j < i$, orient the edge towards $u$. Otherwise $i = j$. In this case the vertices $u$ and $v$ may have different colors or the same color. If the colors are different, orient the edge towards the vertex with the greater color. Otherwise, the edge remains unoriented. This completes the description of the procedure.

Algorithm 5

Computing a partial orientation with length $O(t \log n)$ and deficit $a/t$ in the Congested Clique

procedure Partial-Orientation-CC(G, t)
    H = (H_1, H_2, ..., H_l) ← invoke Procedure H-Partition-CC
    for each i = 1, ..., l in parallel do:
        compute an ⌊a/t⌋-defective O(t)-coloring of G(H_i)
    for each edge e = (u, v) in E in parallel do:
        if u and v belong to different H-sets then
            orient e towards the set with the greater index
        else if u and v have different colors then
            orient e towards the vertex with the greater color among u, v
        else
            leave e unoriented

Lemma 7.1.

For graphs $G$ with $a(G) = a$, a parameter $\varepsilon$ with $0 < \varepsilon \le 2$, and an integer $t > 0$, Procedure Partial-Orientation-CC produces an acyclic orientation of out-degree at most $\lfloor (2+\varepsilon) \cdot a \rfloor$.

Proof.

Consider a vertex $v \in H_i$. Each outgoing edge of $v$ is connected to a vertex in a set $H_j$ such that $j \ge i$. By Lemma A.1, $v$ has at most $\lfloor (2+\varepsilon) \cdot a \rfloor$ neighbors in $\bigcup_{j=i}^{l} H_j$. Thus, the out-degree of $v$ is at most $\lfloor (2+\varepsilon) \cdot a \rfloor$.

Lemma 7.2.

For graphs $G$ with $a(G) = a$, a parameter $\varepsilon$ with $0 < \varepsilon \le 2$, and an integer $t > 0$, Procedure Partial-Orientation-CC produces an acyclic orientation of length $O(t \cdot \log n)$.

Proof.

Consider a directed path $p'$ in $G(H_i)$. The length of $p'$ is smaller than the number of colors used in the defective coloring of $G(H_i)$, which is $O(t)$. (This is because each edge on the path is directed towards a greater color, and the number of colors in $H_i$ is $O(t)$.) Now consider a directed path $p$ in $G$ with respect to the orientation produced by Procedure Partial-Orientation-CC. The path $p$ contains at most $O(\log n)$ edges that cross between different $H$-sets. (This is because each edge that crosses between $H$-sets is directed towards a greater index, and the number of $H$-sets is $O(\log n)$.) Note that between any pair of consecutive crossing edges there are at most $O(t)$ edges whose endpoints both belong to the same $H$-set. Therefore, the length of the path $p$ is at most $O(t \cdot \log n)$.

Theorem 7.3.

The running time of Procedure Partial-Orientation-CC on a graph $G$ with $a(G) = a$, a parameter $\varepsilon$ with $0 < \varepsilon \le 2$, and an integer $t > 0$ is $O(\log a + \log^* n)$.

Proof.

The first step of Procedure Partial-Orientation-CC is Procedure H-Partition-CC, which requires $O(\log a)$ rounds. The second step computes defective colorings, which by Lemma A.7 requires $O(\log^* n)$ time. The orientation step requires only $O(1)$ rounds. Thus, the overall time is $O(\log a + \log^* n)$.

Partial orientations allow us to compute arbdefective colorings as follows. Each vertex waits for all neighbors on its outgoing edges (henceforth, parents) to select a color from a certain range $\{1, 2, \ldots, k\}$. Then the vertex selects a color that is used by the minimum number of parents. While this is not a proper coloring, it partitions the graph into subgraphs induced by color classes. These subgraphs have smaller arboricity and can be processed more efficiently. By repeating this several times, we obtain subgraphs with sufficiently small arboricity that can be colored directly. Then we combine all colorings efficiently to obtain a unified coloring of the input graph. This general scheme was developed in [3] for general graphs, but here we apply it more efficiently on Congested Cliques, using their special properties and the new techniques we devised for them.

Having defined Procedure Partial-Orientation-CC, we proceed to Procedure Simple-Arbdefective [3] to compute an $O(a/k)$-arbdefective $k$-coloring. In other words, it computes a vertex decomposition into $k$ subgraphs such that each subgraph has arboricity $O(a/k)$. (See Appendix A.) Note that the vertices without outgoing edges have nothing to wait for, and so they are colored in the first round.

Algorithm 6

Computing an $O(a/k)$-arbdefective $k$-coloring

procedure Simple-Arbdefective-CC(G, k)
    executed by each vertex v ∈ V:
    while v is not colored do
        if each parent u of v is colored then
            v selects a color from the range 1, 2, ..., k that is used by the minimum number of parents
            v sends the message "v is colored", together with its selected color, to all its neighbors
    end while

Now, we define our next procedure, called Procedure Arbdefective-Coloring-CC. The procedure receives as input a graph $G$ and two positive integer parameters $k$ and $t$. First, it invokes Procedure Partial-Orientation-CC on $G$ and $t$. After that, it uses the produced orientation and the parameter $k$ as input for Procedure Simple-Arbdefective-CC, which is activated as soon as Procedure Partial-Orientation-CC ends. Note that during the invocation of Procedure Partial-Orientation-CC an execution of Lenzen's scheme is performed, and so all vertices learn the subsets $\{H_j, H_{j+1}, \ldots, H_l\}$, $j = \Theta(\log a)$, $l = O(\log n)$, of the $H$-partition. We refer to $H_j, H_{j+1}, \ldots, H_l$ as a subpartition $H'$ of $H = \{H_1, H_2, \ldots, H_l\}$. Once the subpartition $H'$ becomes known to all vertices, Procedure Simple-Arbdefective-CC can be invoked on it locally, without any communication. Then any vertex that belongs to $H_i$, $i \ge j$, selects its color immediately, according to this computation. Vertices in $H_i$ with $i < j$ must select their colors by executing a distributed algorithm. This is done again using Procedure Simple-Arbdefective-CC, but since the number of remaining $H$-sets is just $O(\log a)$, this is done more efficiently than invoking it on the entire graph. This completes the description of the procedure. Its pseudocode is provided below.

Algorithm 7

Computing an arbdefective coloring with $k$ colors and arbdefect $O(a/t + a/k)$ in the Congested Clique

procedure Arbdefective-Coloring-CC(G, k, t)
    H = {H_1, H_2, ..., H_l}
    invoke Procedure Partial-Orientation-CC(G, t)
    let H' = {H_j, H_{j+1}, ..., H_l}, j = Θ(log a), be the sets that all vertices v ∈ V have learnt as a result of the invocation of Procedure Partial-Orientation-CC
    invoke Procedure Simple-Arbdefective-CC(G, k) locally on H'
    invoke Procedure Simple-Arbdefective-CC(G, k) in a distributed manner on H \ H' = {H_1, H_2, ..., H_{j-1}}

Lemma 7.4.

The running time of Procedure Arbdefective-Coloring-CC is $O(t \cdot \log a)$.

Proof.

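Procedure Partial-Orientation-CC requires $O(\log a)$ rounds. Then all vertices learn the sets $H' = \{H_j, H_{j+1}, \ldots, H_l\}$ and color them locally within $O(1)$ rounds. Consequently, any remaining oriented path of uncolored vertices belongs to $H \setminus H'$, and thus has length $O(t \log a)$. Indeed, such a path may consist of at most $O(\log a)$ edges that cross between $H$-sets of $H \setminus H'$, and at most $O(t)$ edges within the same $H$-set between pairs of consecutive crossing edges. Since in Procedure Simple-Arbdefective-CC each vertex is colored one round after its last parent is colored, the distributed invocation on $H \setminus H'$ terminates within a number of rounds bounded by the length of the longest such path, that is, $O(t \log a)$.

Lemma 7.5.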

Procedure Arbdefective-Coloring-CC invoked on a graph $G$ and two positive integer parameters $k$ and $t$ computes a $\lfloor a/t + (2+\varepsilon) \cdot a/k \rfloor$-arbdefective $k$-coloring in time $O(t \log a)$.

Proof. The number of outgoing edges of each vertex is at most $(2+\varepsilon) \cdot a$. (See Lemma A.1.) Consider a subgraph $G_i$ induced by the vertices of color $i \in \{1, 2, \ldots, k\}$. Since each vertex selected the color from $\{1, 2, \ldots, k\}$ that is used by the minimum number of its parents, by the pigeonhole principle it has at most $(2+\varepsilon) \cdot a/k$ outgoing edges in $G_i$. In addition, a vertex in $G_i$ may have at most $a/t$ unoriented edges adjacent on it in $G_i$, since the deficit is at most $a/t$. For the purpose of analysis we can add directions to all unoriented edges such that the orientation remains acyclic. This is done by a topological sorting of the vertices according to the directions of the originally oriented edges. Then each vertex in $G_i$ has out-degree at most $\lfloor a/t + (2+\varepsilon) \cdot a/k \rfloor$, all edges are oriented, and the orientation is acyclic. Since an acyclic orientation with out-degree at most $d$ can be split into $d$ forests (each vertex assigns distinct labels to its outgoing edges), the arboricity of $G_i$ is at most $\lfloor a/t + (2+\varepsilon) \cdot a/k \rfloor$ for all $i \in \{1, 2, \ldots, k\}$.

We will invoke Procedure Arbdefective-Coloring-CC with parameters $t = k = O(1)$, where $t$ has to be a sufficiently large constant. In this case it returns a $\lfloor (3+\varepsilon) \cdot a/t \rfloor$-arbdefective $t$-coloring in $O(t \log a)$ time. Such a $t$-coloring constitutes a decomposition of $G$ into $t$ subgraphs with arboricity at most $(3+\varepsilon) \cdot a/t$ in each of them. The invocations are performed by a procedure we define next, called Procedure Proper-Coloring-CC. The main idea is to partition an input graph $G$ into subgraphs $G_1, G_2, \ldots, G_k$ using Procedure Arbdefective-Coloring-CC in time $O(t \log a)$, and then to invoke Procedure Proper-Coloring-CC recursively on these subgraphs. Note that each vertex-induced subgraph of a Congested Clique is a Congested Clique by itself, and so it is possible to invoke Procedure Proper-Coloring-CC recursively.
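The coloring rule of Procedure Simple-Arbdefective-CC and the pigeonhole argument in the proof of Lemma 7.5 can be illustrated with a small sequential simulation. The sketch below is only an illustration under stated assumptions: it is centralized rather than distributed, it assumes the orientation is given as a hypothetical `parents` dictionary that is acyclic, and it breaks ties between equally rare colors by taking the smallest color (the procedure itself does not fix a tie-breaking rule). The function name `simple_arbdefective` is ours. A vertex is colored one round after its last parent, so the number of rounds equals the number of vertices on the longest oriented path.

```python
from collections import Counter
from graphlib import TopologicalSorter


def simple_arbdefective(parents, k):
    """Sequentially simulate the Simple-Arbdefective rule (illustrative sketch).

    parents: dict mapping each vertex to the list of its parents, i.e. the
             heads of its outgoing edges; the orientation is assumed acyclic.
    k:       number of colors; colors are 1..k.
    Returns (colour, round_of), where round_of[v] is the round in which v is
    colored: 1 if v has no parents, otherwise one more than its latest parent.
    """
    colour, round_of = {}, {}
    # static_order() yields every parent before its children and raises
    # graphlib.CycleError if the orientation contains a directed cycle.
    for v in TopologicalSorter(parents).static_order():
        ps = parents.get(v, [])
        used = Counter(colour[u] for u in ps)
        # the color used by the fewest parents; ties go to the smallest color
        colour[v] = min(range(1, k + 1), key=lambda c: used[c])
        round_of[v] = 1 + max((round_of[u] for u in ps), default=0)
    return colour, round_of


if __name__ == "__main__":
    # a hypothetical acyclic orientation on six vertices
    parents = {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [1, 2, 3], 5: [2, 3, 4]}
    colour, rounds = simple_arbdefective(parents, k=2)
    print(colour)  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2}
    print(max(rounds.values()))  # 6: the longest path 5 -> 4 -> 3 -> 2 -> 1 -> 0
```

In this example every vertex with $d$ parents shares its color with at most $\lfloor d/2 \rfloor$ of them, which is the pigeonhole bound used in the proof with $k = 2$.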
The number of recursion levels is going to be $O(\log a)$, and thus the overall running time is $O(\log a)$. Our ultimate goal is to partition an input graph $G$ by Procedure Arbdefective-Coloring-CC into $O(a^{1+\varepsilon})$ subgraphs $G_i$ with arboricity $a(G_i) = O(1)$. This is the termination condition of the recursion. In the bottom level of the recursion, when all subgraphs have constant arboricity, we invoke our general algorithm from Section 4 to color each subgraph with $O(1)$ colors in constant time. We apply this idea in the following Procedure Proper-Coloring-CC.

Algorithm 8

Proper Coloring in the Congested Clique

procedure Proper-Coloring-CC(G', α)
    p = a sufficiently large constant
    if α > p then
        α' = (3 + ε) · α / p    /* new upper bound on the arboricity of each subgraph */
        for each G_i ∈ G' in parallel do:
            G''_1, G''_2, ..., G''_p = Procedure Arbdefective-Coloring-CC(G_i, k = p, t = p)
            Proper-Coloring-CC({G''_1, G''_2, ..., G''_p}, α')
        end for
    else
        color each G_i ∈ G' using our general algorithm of Section 4 with O(α) distinct colors    /* O(α) = O(p) = O(1) */

The procedure receives as input a collection $G'$ of graphs (initially consisting of the input graph $G$) and an upper bound $\alpha$ on their arboricity. In each recursion level, Procedure Arbdefective-Coloring-CC is invoked on the graphs of $G'$, and each of them is decomposed into $p$ subgraphs, where each subgraph has arboricity at most $(3+\varepsilon) \cdot a(G')/p$. In each of the following recursion levels, Procedure Arbdefective-Coloring-CC is called in parallel on all subgraphs that were created at the previous level. As a result, a refinement of the decomposition is obtained, that is, each subgraph is partitioned further into $p$ subgraphs of yet lower arboricity. Consequently, after each level, the number of subgraphs of $G$ grows by a factor of $p$, but the arboricity of each subgraph decreases by a factor of $p/(3+\varepsilon)$. Hence, in level $i$ of the recursion, the product of the number of subgraphs and the arboricity of the subgraphs is $O((3+\varepsilon)^i \cdot a)$. Once the arboricity of each subgraph becomes at most $p$, the procedure terminates in a level denoted $r$, and returns an $O((3+\varepsilon)^r \cdot a)$-coloring of the entire graph. (This is because there are $O((3+\varepsilon)^r \cdot a)$ subgraphs at that stage, and each of them is colored with its own palette of $O(1)$ colors.) We next analyze the procedure.

Lemma 7.6.

At the end of level $i$ of the recursion, $i = 1, 2, \ldots$, any graph $G''_j$ that is produced in this level has arboricity at most $((3+\varepsilon)/p)^i \cdot a(G)$, where $a(G)$ is the arboricity of the original input graph $G$.

Proof. The proof is by induction on the number of levels. The base case is the first level. Then $G$ is partitioned into $p$ subgraphs produced by Procedure Arbdefective-Coloring-CC, with arboricity at most $(3+\varepsilon) \cdot a/p$ in each of them. For the inductive step, consider a level $i > 1$. By the induction hypothesis, each subgraph in $G'$ has arboricity at most $((3+\varepsilon)/p)^{i-1} \cdot a(G)$. During level $i$, Procedure Arbdefective-Coloring-CC is invoked on all subgraphs in $G'$. Consequently, the new subgraphs have arboricity at most $(3+\varepsilon) \cdot ((3+\varepsilon)/p)^{i-1} \cdot a(G)/p = ((3+\varepsilon)/p)^{i} \cdot a(G)$.

Lemma 7.7.

The recursion proceeds for at most $(\log a)/\log(p/(3+\varepsilon))$ levels.

Proof. In each level the parameter $\alpha$ is decreased by a multiplicative factor of $p/(3+\varepsilon)$, for a sufficiently large constant $p$. Therefore, the number of levels is at most $\log_{p/(3+\varepsilon)} a = (\log a)/\log(p/(3+\varepsilon))$.

Once the arboricity of each graph becomes $O(1)$, we color each subgraph properly using $O(1)$ distinct colors. Note that it is indeed possible to use a distinct palette for each subgraph, so each vertex deduces an appropriate color (i.e., different from the colors of other subgraphs and from the colors of its neighbors in the same subgraph) using the index of its subgraph and the indices of the subgraph collections it belongs to in the recursion tree. The following theorem analyzes the running time of the procedure.

Theorem 7.8.

The running time of Procedure Proper-Coloring-CC on a graph $G$ with arboricity $a(G) = a$ is $O(\log a)$. The procedure colors the input graph $G$ with $O(a^{1+\varepsilon})$ colors, for an arbitrarily small positive constant $\varepsilon$.

Proof. There are $O(\log a)$ recursion levels, and each level requires $O(t \log a) = O(\log a)$ rounds. The bottom level requires $O(1)$ time. Thus, overall, the running time is $O(\log a)$. The number of colors is $(3+\varepsilon)^{\log a/\log(p/(3+\varepsilon))} \cdot a(G) = a^{1 + \log(3+\varepsilon)/\log(p/(3+\varepsilon))}$, which is $O(a^{1+\varepsilon'})$ for an arbitrarily small constant $\varepsilon' > 0$, provided that the constant $p$ is sufficiently large.

$O(a)$-coloring in $O(a^{\varepsilon})$ time

Our goal in this section is to efficiently compute an $O(a)$-coloring of the graph $G$. In Procedure Proper-Coloring-CC we invoked Procedure Arbdefective-Coloring-CC on a graph $G$ with the input parameters $p = k = t = O(1)$. If we invoke Procedure Proper-Coloring-CC with different parameters, $p = k = t = a^{\varepsilon}$, for an arbitrarily small constant $\varepsilon > 0$, we obtain the following result.

Theorem 8.1.

Invoking Procedure Proper-Coloring-CC on a graph $G$ with arboricity $a$ with the parameter $p = \lceil a^{\varepsilon/3} \rceil$ produces a proper $O(a)$-coloring of $G$ within $O(a^{\varepsilon})$ time.

Proof. During the execution of Procedure Proper-Coloring-CC, the number of recursion levels is $O(3/\varepsilon)$, i.e., a constant. In each level, the number of colors increases just by a constant factor as well. In each level Procedure Partial-Orientation-CC is executed, which requires $O(t \log a) = O(a^{\varepsilon/3} \log a) = O(a^{\varepsilon})$ rounds. The bottom recursion level requires $O(a^{\varepsilon/3})$ time and produces an $O(a^{\varepsilon/3})$-coloring in each of the $O(a^{1-\varepsilon/3})$ subgraphs of this stage, using our algorithm from Section 4. Hence, the total running time is $O(a^{\varepsilon})$.

Our MIS algorithm works in the following way. We invoke Procedure Proper-Coloring-CC with $p = a^{1/8}$ instead of a constant. Moreover, we perform recursive calls as long as $\alpha > p^4$, rather than $\alpha > p$. Consequently, there are just four recursion levels, each of which requires $O(t \log a) = O(a^{1/8} \log a)$ time. At the bottom level of the recursion, each subgraph has arboricity $O(\sqrt{a})$ and there are $q = O(\sqrt{a})$ such subgraphs. Denote these subgraphs $G_1, G_2, \ldots, G_q$. Once Procedure Proper-Coloring-CC terminates, we perform the following loop consisting of $q$ iterations. For each $i = 1, 2, \ldots, q$, we compute an MIS locally in $G_i$. This is indeed possible, since at this stage all vertices of the subgraph $G_i$ have already learnt it during the execution of Procedure Proper-Coloring-CC. Once vertices join the MIS, they send their neighbors a message telling them not to join. Each vertex that receives a message from a neighbor in the MIS broadcasts to all vertices in the graph that it is outside the MIS. Once these messages are received, each vertex deletes the vertices that have neighbors in the MIS from $G_{i+1}, G_{i+2}, \ldots, G_q$ in its local memory. This completes the description of an iteration. Once all iterations complete, we have an MIS of the entire graph.
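The iteration loop described above can be mimicked sequentially to check the invariant used later in the correctness proof. This is an illustrative sketch under assumptions, not the distributed procedure: the decomposition $G_1, \ldots, G_q$ is simply supplied as a list of vertex groups (the decomposition step is not modeled), the "local" MIS of the surviving part of each $G_i$ is computed greedily in list order (one possible choice among many), and the broadcasts are modeled by a shared `removed` set. The names `mis_cc_sketch` and `is_mis` are hypothetical.

```python
def mis_cc_sketch(adj, groups):
    """Sequential sketch of the iteration loop of the MIS algorithm.

    adj:    dict mapping each vertex to the set of its neighbours.
    groups: list of vertex lists G_1, ..., G_q that partition the vertex set.
    """
    mis = set()
    removed = set()  # vertices announced to have a neighbour in the MIS
    for group in groups:
        # vertices of G_i that were not deleted in earlier iterations
        alive = [v for v in group if v not in removed]
        # "local" MIS of the surviving part of G_i, computed greedily in list order
        local = set()
        for v in alive:
            if not adj[v] & local:
                local.add(v)
        mis |= local
        # neighbours of the new MIS vertices announce that they are outside the MIS;
        # every vertex deletes them from G_{i+1}, ..., G_q in its local memory
        for v in local:
            removed |= adj[v]
    return mis


def is_mis(adj, s):
    """Check that s is independent and dominating, i.e., a maximal independent set."""
    independent = all(not adj[v] & s for v in s)
    dominating = all(v in s or adj[v] & s for v in adj)
    return independent and dominating


if __name__ == "__main__":
    cycle = {v: {(v - 1) % 7, (v + 1) % 7} for v in range(7)}
    m = mis_cc_sketch(cycle, [[0, 3, 5], [1, 2, 4, 6]])
    print(sorted(m), is_mis(cycle, m))  # [0, 3, 5] True
```

Because a vertex survives into a later group only if none of its neighbors is already in the MIS, independence across groups holds automatically, and every deleted vertex is dominated, which mirrors the induction in the proof of correctness.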
The pseudocode of the algorithm is provided below. Next, we analyze its correctness and running time.

Algorithm 9

MIS in the Congested Clique

1: procedure MIS-CC(G', α)
2:     initially, M = ∅
3:     compute a decomposition into q = O(√α) subgraphs G_1, G_2, ..., G_q of arboricity O(√α)
4:     each vertex in each G_i, i = 1, 2, ..., q, learns the subgraph G_i using our general algorithm from Section 4
5:     for i = 1, 2, ..., q do
6:         compute an MIS of G_i locally and add its vertices to M
7:         each vertex of G_i that is in M broadcasts this information to all vertices
8:         each vertex that has a neighbor in M broadcasts this information to all vertices
9:         each vertex removes from its local memory the vertices of G_{i+1}, G_{i+2}, ..., G_q that have neighbors in M
10:    end for
11:    return M

Theorem 9.1.

Procedure MIS-CC computes an MIS of the input graph.

Proof.

We prove that for $i = 1, 2, \ldots, q$, after iteration $i$, the set $M$ is an MIS of the subgraph of $G'$ induced by the vertices of $G_1, G_2, \ldots, G_i$. The proof is by induction on $i$.

Base ($i = 1$): After the first iteration an MIS of $G_1$ is computed and added to $M$.

Step: At the beginning of iteration $i$, by the induction hypothesis, $M = M_{i-1}$ is an MIS of the subgraph induced by the vertices of $G_1, G_2, \ldots, G_{i-1}$. In iterations $1, 2, \ldots, i-1$, all neighbors of $M$ announced this to all other vertices, and as a result were removed from $G_i$ in the local memories of the processors. Consequently, the MIS that is computed in iteration $i$ in line 6 of the procedure has no neighbors in the set $M_{i-1}$ produced at the end of iteration $i-1$. Denote the MIS computed in iteration $i$, line 6, by $M'$. It follows that $M_{i-1} \cup M'$ is an independent set. Moreover, any vertex in $G_1, G_2, \ldots, G_{i-1}$ is at distance at most 1 from some vertex in $M_{i-1}$, by the induction hypothesis. Any vertex in $G_i$ is at distance 1 from some vertex in $M_{i-1}$ (if it was removed from $G_i$), or at distance at most 1 from some vertex in $M'$ (if it remained in $G_i$). Thus the set computed after $i$ iterations, $M_i = M_{i-1} \cup M'$, is an MIS of the subgraph induced by the vertices of $G_1, G_2, \ldots, G_i$.

Theorem 9.2.

The running time of Procedure MIS-CC is $O(\sqrt{a})$.

Proof. The decomposition in line 3 is obtained using Procedure Proper-Coloring-CC, invoked with $p = a^{1/8}$ instead of a constant, for 4 recursion levels. Consequently, in the bottom level, the arboricity of each subgraph is $O(\sqrt{a})$. Hence, the running time of line 3 is $O(t \log a + \sqrt{a}) = O(\sqrt{a})$. In line 4, each of the subgraphs $G_1, G_2, \ldots, G_q$ is learnt by all its vertices. This is performed in parallel for $i = 1, 2, \ldots, q$, and requires $O(\sqrt{a})$ time, since the arboricity of each subgraph is $O(\sqrt{a})$. Each iteration of the loop in lines 5-10 requires $O(1)$ rounds. Indeed, the computation of the MIS is local and does not require any communication rounds. Broadcasting the information about vertices in the MIS requires 1 round. (Each vertex broadcasts a message of $O(\log n)$ bits containing its ID.) Broadcasting the information about vertices that have neighbors in the MIS also requires 1 round. Therefore, the running time of the $q$ iterations is $O(q) = O(\sqrt{a})$. This is also the running time of the entire algorithm.

References

[1] N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. of Algorithms, 7(4):567–583, 1986.
[2] L. Barenboim and M. Elkin. Sublogarithmic distributed MIS algorithm for sparse graphs using Nash-Williams decomposition. In Proc. of the 27th ACM Symp. on Principles of Distributed Computing, pages 25–34, 2008.
[3] L. Barenboim and M. Elkin. Deterministic Distributed Vertex Coloring in Polylogarithmic Time. J. ACM, 58(5):23, 2011.
[4] L. Barenboim, M. Elkin, and F. Kuhn. Distributed (Δ+1)-coloring in linear (in Δ) time. SIAM Journal on Computing, 43(1):72–95, 2014.
[5] K. Censor-Hillel, P. Kaski, J. Korhonen, C. Lenzen, A. Paz, and J. Suomela. Algebraic Methods in the Congested Clique. In Proc. of the 34th ACM Symp. on Principles of Distributed Computing, pages 143–152, 2015.
[6] K. Censor-Hillel, M. Parter, and G. Schwartzman. Derandomizing Local Distributed Algorithms under Bandwidth Restrictions. In Proc. of the 31st International Symposium on Distributed Computing, 2017.
[7] R. Cole and U. Vishkin. Deterministic Coin Tossing with Applications to Optimal Parallel List Ranking. Information and Control, 70(1):32–53, 1986.
[8] F. Le Gall. Further algebraic algorithms in the congested clique model and applications to graph-theoretic problems. In Proc. of the 30th International Symposium on Distributed Computing, pages 57–70, 2016.
[9] M. Ghaffari. Distributed MIS via All-to-All Communication. In Proc. of the 36th ACM Symp. on Principles of Distributed Computing, pages 141–149, 2017.
[10] M. Ghaffari and M. Parter. MST in Log-Star Rounds of Congested Clique. In Proc. of the 35th ACM Symp. on Principles of Distributed Computing, pages 19–28, 2016.
[11] A. Goldberg and S. Plotkin. Efficient parallel algorithms for (Δ+1)-coloring and maximal independent set problem. In Proc. of the 19th ACM Symp. on Theory of Computing, pages 315–324, 1987.
[12] A. Goldberg, S. Plotkin, and G. Shannon. Parallel symmetry-breaking in sparse graphs. SIAM Journal on Discrete Mathematics, 1(4):434–446, 1988.
[13] J. Hegeman, G. Pandurangan, S. Pemmaraju, V. Sardeshmukh, and M. Scquizzato. Toward Optimal Bounds in the Congested Clique: Graph Connectivity and MST. In Proc. of the 34th ACM Symp. on Principles of Distributed Computing, pages 91–100, 2015.
[14] J. Hegeman and S. Pemmaraju. Lessons from the Congested Clique applied to MapReduce. Theoretical Computer Science, 608:268–281, 2015.
[15] A. Israeli and A. Itai. A fast and simple randomized parallel algorithm for maximal matching. Information Processing Letters, 22(2):77–80, 1986.
[16] T. Jurdzinski and K. Nowicki. MST in O(1) Rounds of the Congested Clique. In Proc. of the 29th ACM-SIAM Symp. on Discrete Algorithms, pages 2620–2632, 2018.
[17] F. Kuhn. Weak graph colorings: distributed algorithms and applications. In Proc. of the 21st ACM Symp. on Parallel Algorithms and Architectures, pages 138–144, 2009.
[18] F. Kuhn and R. Wattenhofer. On the complexity of distributed graph coloring. In Proc. of the 25th ACM Symp. on Principles of Distributed Computing, pages 7–15, 2006.
[19] C. Lenzen. Optimal deterministic routing and sorting on the congested clique. In Proc. of the 32nd ACM Symp. on Principles of Distributed Computing, pages 42–50, 2013.
[20] N. Linial. Locality in distributed graph algorithms. SIAM Journal on Computing, 21(1):193–201, 1992.
[21] Z. Lotker, E. Pavlov, B. Patt-Shamir, and D. Peleg. MST construction in O(log log n) communication rounds. In Proc. of the ACM Symp. on Parallel Algorithms and Architectures, pages 94–100, 2003.
[22] S. Pemmaraju and V. Sardeshmukh. Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique. http://arxiv.org/abs/1412.2333

Appendix A Preliminaries - Basic Procedures

A.1 H-partition

The arboricity $a = a(G)$ is the minimum number $a$ of edge-disjoint forests $F_1, F_2, \ldots, F_a$ whose union covers the entire edge set $E$ of the graph $G = (V, E)$. Such a decomposition is called an $a$-forest-decomposition of $G$. The structure of $H$-partitions is useful for computing forest decompositions. A procedure for computing an $H$-partition, called Procedure Partition, was devised in [2]. This procedure accepts as input the arboricity of the graph and an arbitrarily small positive real constant $\varepsilon \le 2$. The parameter $\varepsilon$ determines the quality of the resulting $H$-partition: smaller values of $\varepsilon$ result in a better partition, but require more time. Procedure Partition computes an $H$-partition with degree at most $(2+\varepsilon) \cdot a$ (that is, every vertex in $H_i$ has at most $(2+\varepsilon) \cdot a$ neighbors in $H_i \cup H_{i+1} \cup \cdots \cup H_l$) and size $l = \lceil (2/\varepsilon) \log n \rceil$ within $l$ rounds. During the execution of Procedure Partition each vertex in $V$ is either active or inactive. Initially, all the vertices are active. For every $i = 1, 2, \ldots, l$, in the $i$-th round each active vertex with at most $(2+\varepsilon) \cdot a$ active neighbors joins the set $H_i$ and becomes inactive. The following results were proven in [2].

Lemma A.1. [2] For a graph $G$ with arboricity $a(G) = a$ and a parameter $\varepsilon$, $0 < \varepsilon \le 2$, Procedure Partition$(a, \varepsilon)$ computes an $H$-partition of size $l = \lceil (2/\varepsilon) \log n \rceil$ with degree at most $(2+\varepsilon) \cdot a$. The running time of the procedure is $O(\log n)$.

Lemma A.2. [2] For a graph $G$ with arboricity $a(G) = a$ and a parameter $\varepsilon$, $0 < \varepsilon \le 2$, $G$ has at least $\frac{\varepsilon}{2+\varepsilon} \cdot |V|$ vertices with degree $(2+\varepsilon) \cdot a$ or less.

Lemma A.3. [2] For any subgraph $G'$ of $G$, the arboricity of $G'$ is at most the arboricity of $G$.
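A centralized simulation of Procedure Partition may help make the peeling rule concrete. The sketch below assumes the graph is given as a dictionary of neighbor sets with comparable (e.g., integer) vertex identifiers, and that the supplied `a` is at least the arboricity. It also illustrates how an $H$-partition is commonly turned into a forest decomposition, following our reading of the approach of [2]: each edge is oriented towards the endpoint with the larger (H-index, ID) pair, and each vertex numbers its outgoing edges, so that the edges with the same number form a forest. All function names are hypothetical.

```python
def h_partition(adj, a, eps):
    """Centralized simulation of the peeling rule of Procedure Partition.

    adj: dict mapping each vertex to the set of its neighbours.
    a:   an upper bound on the arboricity of the graph (assumed, not checked).
    eps: the quality parameter, 0 < eps <= 2.
    Returns a dict mapping each vertex to the index i of the set H_i it joins.
    """
    threshold = (2 + eps) * a
    active, index, i = set(adj), {}, 0
    while active:
        i += 1
        # every active vertex with at most (2 + eps) * a active neighbours joins H_i
        joining = {v for v in active if len(adj[v] & active) <= threshold}
        if not joining:  # cannot happen when a >= arboricity (Lemma A.2)
            raise ValueError("a is smaller than the arboricity of the graph")
        for v in joining:
            index[v] = i
        active -= joining
    return index


def forest_decomposition(adj, index):
    """Orient each edge towards the endpoint with the larger (H-index, id) pair,
    then let every vertex number its outgoing edges 1, 2, ...; the edges that
    received the same number form one forest. Returns {label: [(tail, head), ...]}."""
    forests = {}
    for v in adj:
        heads = sorted(u for u in adj[v] if (index[u], u) > (index[v], v))
        for label, u in enumerate(heads, start=1):
            forests.setdefault(label, []).append((v, u))
    return forests


def is_forest(edges):
    """Return True if the undirected edge list contains no cycle (union-find)."""
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        root[ru] = rv
    return True


if __name__ == "__main__":
    # union of three forests (parents v - 1, v // 2, v // 3), so arboricity <= 3
    n = 50
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        for u in (v - 1, v // 2, v // 3):
            adj[v].add(u)
            adj[u].add(v)
    idx = h_partition(adj, a=3, eps=0.5)
    forests = forest_decomposition(adj, idx)
    print(max(idx.values()), len(forests), all(is_forest(f) for f in forests.values()))
```

Since the (H-index, ID) order is a total order, the orientation is acyclic, and each vertex has at most one outgoing edge per label, so every label class is a forest; the number of labels is bounded by the degree $(2+\varepsilon) \cdot a$ of the $H$-partition.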

Lemma A.4. [2] The $H$-partition $H = \{H_1, H_2, \ldots, H_l\}$, $l \le \lceil (2/\varepsilon) \log n \rceil$, has degree at most $A = (2+\varepsilon) \cdot a$.

A.2 Forests-Decomposition

Coloring oriented forests can be performed extremely efficiently in the distributed setting, both in terms of running time and in terms of the number of colors. For a wide range of graph families, it is possible to color oriented graphs significantly faster than general graphs, using a decomposition into forests. If a graph can be decomposed into a reasonably small number of oriented forests, then both the running time and the size of the employed color palette can be reduced. A $k$-forests-decomposition is a partition of the edge set of the graph into $k$ subsets, such that each subset forms a forest. Efficient distributed algorithms for computing $O(a)$-forests-decompositions have been devised recently in [2]. Several results from [2] are used in this work.

Lemma A.5. [2] (1) For any graph $G$, a proper $(\lfloor (2+\varepsilon) \cdot a \rfloor + 1)$-coloring of $G$ can be computed in $O(a \log n)$ time, for an arbitrarily small positive constant $\varepsilon$. (2) For any graph $G$, an $O(a)$-forest-decomposition can be computed in $O(\log n)$ time.

Another useful procedure is Procedure Arb-Linial [2], which is essentially a composition of Linial's [20] $O(\Delta^2)$-coloring algorithm with Procedure Forests-Decomposition [2]. The main difference between the coloring step of Procedure Arb-Linial and the original coloring algorithm of Linial is that in Procedure Arb-Linial each vertex considers only the colors of its parents in the forests $F_1, F_2, \ldots, F_A$, where $A \le \lfloor (2+\varepsilon) \cdot a \rfloor$, rather than the colors of all its neighbors.

Lemma A.6. [2] An $O(a^2)$-coloring can be computed in $O(\log^* n)$ time.

A.3 Defective coloring

An $m$-defective $p$-coloring of a graph $G$ is a coloring of the vertices of $G$ using $p$ colors, such that each vertex has at most $m$ neighbors colored by its color. Each color class in an $m$-defective coloring induces a graph of maximum degree at most $m$. It is known that for any positive integer parameter $p$, a $\lfloor \Delta/p \rfloor$-defective $O(p)$-coloring can be efficiently computed distributively [4].

Lemma A.7.
[4] A $\lfloor \Delta/p \rfloor$-defective $O(p)$-coloring can be computed in $O(\log^* n)$ time.

An $r$-arbdefective $k$-coloring is a coloring with $k$ colors such that all the vertices colored by the same color $i$, $1 \le i \le k$, induce a subgraph of arboricity at most $r$. Barenboim and Elkin [3] devised an efficient procedure for computing an arbdefective coloring, called Procedure Arbdefective-Coloring. The procedure receives as input a graph $G$ and two positive integer parameters $k$ and $t$. Barenboim and Elkin [3] also defined a procedure called Simple-Arbdefective, which works as follows. The procedure accepts as input an acyclic (possibly partial) orientation of the graph and a positive integer parameter $k$. During its execution, each vertex computes its color as follows. Each vertex waits for its parents to select their colors. Once the vertex receives a message from each of its parents containing their selections, it selects a color from the range $1, 2, \ldots, k$ that is used by the minimum number of parents. Then it sends its selection to all its neighbors. This completes the description of the procedure. It is used in a more sophisticated procedure called Arbdefective-Coloring, whose properties are summarized below.

Lemma A.8. [3] Procedure Arbdefective-Coloring invoked on a graph $G$ with arboricity $a$ and two positive integer parameters $k$ and $t$ computes a $\lfloor a/t + (2+\varepsilon) \cdot a/k \rfloor$-arbdefective $k$-coloring in time $O(t \log n)$.

A.4 O(a)-coloring

The $O(a)$-coloring algorithm of Barenboim and Elkin [3], called Procedure Legal-Coloring, works as follows. The procedure receives as input a graph $G$ and a positive integer parameter $p$. It proceeds in phases. In the first phase, Procedure Arbdefective-Coloring is invoked on the input graph $G$ with the parameters $k = p$ and $t = p$. Consequently, a decomposition into $p$ subgraphs is produced, in which each subgraph has arboricity $O(a/p)$. In each of the following phases, Procedure Arbdefective-Coloring is invoked in parallel on all subgraphs of the decomposition of the previous phase, and each subgraph is partitioned into $p$ subgraphs of smaller arboricity. Thus, after each phase, the number of subgraphs of $G$ grows by a factor of $p$, while the arboricity of each subgraph shrinks by a factor of $\Theta(p)$. Consequently, the product of the number of subgraphs and the arboricity of the subgraphs remains $O(a)$ after each phase. Once the arboricities of all subgraphs become small enough, all the subgraphs are colored quickly in parallel, resulting in a proper $O(a)$-coloring of the graph $G$.

Lemma A.9. [3] Invoking Procedure Legal-Coloring on a graph $G$ with arboricity $a$ with the parameter $p = \lceil a^{\mu} \rceil$, for a positive constant $\mu < 1$, produces a legal $O(a)$-coloring of $G$ within $O(a^{\mu} \cdot \log n)$ time.

A.5 Lenzen's routing algorithm

One of the important building blocks for algorithms in the Congested Clique model is Lenzen's routing algorithm [19]. This algorithm guarantees that if there is a component of an algorithm in which each node needs to send at most $O(n \log n)$ bits and receive at most $O(n \log n)$ bits, then $O(1)$ rounds are sufficient. This corresponds to each node sending and receiving $O(n)$ pieces of data of size $O(\log n)$ each. Intuitively, this is easy when each piece of information of a node has a distinct destination, since it can then be sent via a direct message. The main advantage of Lenzen's algorithm is that the source-destination pattern does not have to be uniform.

Lemma A.10. [19] The optimal deterministic routing algorithm of [19] provides a routing scheme such that if each node is the source of $O(n)$ messages and each node is the destination of $O(n)$ messages, then all of these messages can be routed from their sources to their destinations within $O(1)$ rounds.