Massively Parallel Correlation Clustering in Bounded Arboricity Graphs
Mélanie Cambus, Davin Choo, Havu Miikonen, and Jara Uitto
Aalto University, Finland; ETH Zürich, Switzerland
{melanie.cambus, havu.miikonen, jara.uitto}@aalto.fi, [email protected]
15 February 2021
Abstract
Identifying clusters of similar elements in a set is a common objective in data analysis. With the immense growth of data and physical limitations on single processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the vertices correspond to the elements and each edge is either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges.

Consider an input graph G on n vertices such that the positive edges induce a λ-arboric graph. Our main result is a 3-approximation (in expectation) algorithm that runs in O(log λ · log log n) MPC rounds in the sublinear memory regime. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA '18) on randomized greedy MIS and the PIVOT algorithm of Ailon, Charikar, and Newman (STOC '05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with worst case (1 + ε)-approximation guarantees in the special case of forests, where λ = 1.

1 Introduction

Clustering is a common unsupervised machine learning task. Graphs are a versatile abstraction of datasets, where common objectives for data analysis are to simplify the datasets, detect communities, and predict links [CSX12, CBGVZ13]. Here, we study the correlation clustering problem, which aims to group elements of a dataset according to their similarities. Consider the setting where we are given a complete signed graph G = (V, E = E^+ ∪ E^-) where edges are given positive (E^+) or negative (E^-) labels, signifying whether two points are similar or not. The task is to find a partitioning of the vertex set V into clusters C_1, C_2, ..., C_r, where r is not fixed by the problem statement but can be chosen freely by the algorithm. (This is in contrast to, for example, the classic k-means clustering, where k is a fixed constant.) If the endpoints of a positive edge belong to the same cluster, we say that the edge is a positive agreement, and a positive disagreement otherwise. Likewise, if the endpoints of a negative edge belong to the same cluster, we say that the edge is a negative disagreement, and a negative agreement otherwise. The goal of correlation clustering is to obtain a clustering that maximizes agreements or minimizes disagreements. In this work, we focus on the case of minimizing disagreements when the positive edges of the input graph induce a λ-arboric graph.

In the complete signed graph setting, one can perform cost-charging arguments via "bad triangles" to prove approximation guarantees. A set of 3 vertices {u, v, w} is a bad triangle if {u, v}, {v, w} ∈ E^+ and {u, w} ∈ E^-. The edges within any bad triangle induce at least one disagreement in any clustering, so one can lower bound the cost of any optimum clustering by the number of edge-disjoint bad triangles in the input graph. PIVOT is a well-known algorithm that provides a 3-approximation (in expectation) to the problem of minimizing
disagreements in the sequential setting by using a cost-charging argument on bad triangles [ACN08]. It works as follows: as long as the graph is non-empty, pick a vertex v uniformly at random and form a new cluster using v and its "positive neighbors" (i.e., those joined to v by a positive edge). One can view PIVOT as simulating greedy MIS with respect to a uniform-at-random permutation of vertices.

Many of the known distributed algorithms for the correlation clustering problem adapt the PIVOT algorithm. The basic building block is to fix a random permutation and to create the clusters by finding, in parallel, local minima according to the permutation.
The ParallelPIVOT, C4, and ClusterWild! algorithms [CDK14, PPO+15] all obtain constant approximations in O(log n · log ∆) synchronous rounds, where ∆ stands for the maximum positive degree. Meanwhile, with a tighter analysis of the randomized greedy MIS algorithm [FN18], one can obtain a 3-approximation in O(log n) rounds by directly simulating PIVOT. All of the above approximation guarantees hold in expectation. (Technically speaking, ParallelPIVOT does not compute a greedy MIS. Instead, it computes random independent sets in each phase and only uses the initial random ordering to perform tie-breaking: if a vertex u has more than one positive neighbor in the independent set, then u joins the cluster defined by the neighbor with the smallest assigned order.)
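For concreteness, the following is a minimal sequential Python sketch of PIVOT. The adjacency-map representation and function names are illustrative assumptions; the paper specifies PIVOT only at the level of the description above.

    import random

    def pivot(vertices, pos_nbrs, rng=random):
        """Sequential PIVOT [ACN08]: repeatedly pick a random unclustered
        vertex and cluster it with its unclustered positive neighbors.
        pos_nbrs[v] is the set of E+ neighbors of v; negative edges are
        implicit.  The chosen pivots are exactly the greedy MIS with
        respect to the random order in which vertices are drawn."""
        order = list(vertices)
        rng.shuffle(order)              # uniform-at-random permutation
        clustered = set()
        clustering = []
        for v in order:                 # process vertices in random order
            if v in clustered:
                continue                # absorbed by an earlier pivot
            cluster = {v} | (pos_nbrs[v] - clustered)
            clustered |= cluster
            clustering.append(cluster)
        return clustering

Note that the set of pivots produced is precisely a greedy MIS with respect to the shuffled order, matching the observation in the text.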
1.1 The Massively Parallel Computation model

We consider the Massively Parallel Computation (MPC) model [KSV10, BKS17], which serves as a theoretical abstraction of several popular massively parallel computation frameworks such as Dryad [IBY+07], MapReduce [DG08], Hadoop [Whi12], and Spark [ZCF+10]. In this model, there are M machines, each with memory size S, and we wish to solve a problem given an input of size N. In the context of correlation clustering, we may think of N = |E^+|, since the negative edges can be inferred from the missing positive edges. Typically, the local memory bound S is assumed to be significantly smaller than N. We focus on the strongly sublinear memory regime, where S = Õ(n^δ) for some constant δ < 1. Ideally, the total memory S · M is not much larger than N. In our case, the required total memory is Õ(n^{1+δ}), and for convenience, we assume that each vertex has one machine dedicated to its computations. In particular, at the beginning of the computation, each vertex and its edges are stored on a single machine.

The computation in the MPC model proceeds in synchronous rounds. In each round, each machine can perform arbitrary computation on the data that resides on it. (Although there is no hard constraint, all known MPC algorithms spend polynomial time on each machine in any given round.) Then, each machine communicates in an all-to-all fashion with all other machines, conditioned on sending and receiving messages of size at most O(S). This concludes the description of an MPC round. Since communication costs are typically the bottleneck, the metric for evaluating the efficiency of an MPC algorithm is the number of rounds required.

Remark.
To avoid unnecessary technical complications, we assume throughout the paper that the maximum (positive) degree of the input graph is such that the edges of a vertex fit on a single machine, i.e., ∆ ∈ O(S). This assumption can be lifted using the virtual communication tree technique described by Ghaffari and Uitto [GU19].

1.2 Our contributions

Our goal is to obtain efficient algorithms for correlation clustering in the sublinear memory regime of MPC (see Model 1) when given a complete signed graph G, with maximum positive degree ∆, where the set of positive edges E^+ induces a λ-arboric graph. Our main contributions are the following:

1. By combining known techniques, we show that one can compute a randomized greedy MIS, with respect to a uniform-at-random permutation of vertices, in O(log ∆ · log log n) MPC rounds (Theorem 3) using Õ(n^{1+δ}) global memory. We believe that this result is of independent interest beyond applications to correlation clustering. (A subset M ⊆ V is a maximal independent set (MIS) if (1) for any two vertices u, v ∈ M, u and v are not neighbors, and (2) for any vertex v ∈ V, either v ∈ M or v has a neighbor in M. Given a vertex ordering π : [n] → V, greedy MIS refers to the process of iterating through π(1), ..., π(n) and adding each vertex to M if it has no neighbor of smaller ordering.)

2. Our main result (Theorem 5) is that one can effectively ignore vertices of degrees larger than O(λ) when computing a correlation clustering. Then, the overall runtime and approximation guarantees are inherited from the choice of algorithm used to solve correlation clustering on the remaining bounded degree subgraph. (In some works, "bounded degree" is synonymous with "maximum degree O(1)"; here, we mean that the maximum degree is O(λ).)
3. Using our main result, we show how to obtain efficient correlation clustering algorithms for bounded arboricity graphs. By simulating PIVOT on a graph with maximum degree O(λ) via Theorem 3, we get

(i) A 3-approx. (in expectation) algorithm in O(log λ · log log n) MPC rounds using Õ(n^{1+δ}) global memory.

In the special case of forests (where λ = 1), we show that the optimum correlation clustering is equivalent to computing a maximum matching. Let 0 < ε ≤ 1 be a constant. Invoking known matching algorithms (one computing a maximum matching and two computing approximate matchings), and hiding 1/ε factors in O_ε(·), we obtain

(ii) An exact randomized algorithm that runs in Õ(log n) MPC rounds.
(iii) A (1 + ε)-approx. (worst case) deterministic algorithm that runs in O_ε(log log* n) MPC rounds.
(iv) A (1 + ε)-approx. (worst case) randomized algorithm that runs in O_ε(1) MPC rounds.

Finally, for low-arboricity graphs, the following result may be of interest:

(v) An O(λ^2)-approx. (worst case) deterministic algorithm that runs in O(1) MPC rounds.

For more details and an in-depth discussion of our techniques, see Section 2.

Paper organization. Before diving into formal details, we highlight in Section 2 the key ideas needed to obtain our results. We show how to efficiently compute a randomized greedy MIS in Section 3. Our structural result about correlation clustering in bounded arboricity graphs is presented in Section 4, and we combine this structural insight with known algorithms in Section 5 to yield efficient correlation clustering algorithms. Finally, we conclude with some open questions in Section 6.
1.3 Notation

In this work, we only deal with complete signed graphs, denoted by G = (V, E = E^+ ∪ E^-), where |V| = n, |E| = n(n − 1)/2, and E^+ and E^- denote the sets of positively and negatively labeled edges, respectively. For a vertex v, the sets N^+(v) ⊆ V and N^-(v) ⊆ V denote the vertices that are connected to v via positive and negative edges, respectively. The k-hop neighborhood of a vertex v is the set of vertices that have a path from v involving at most k positive edges.

A clustering C is a partition of the vertex set V. That is, C is a set of sets of vertices such that (i) A ∩ B = ∅ for any two sets A, B ∈ C and (ii) ∪_{A ∈ C} A = V. For a cluster C ⊆ V, N^+_C(v) = N^+(v) ∩ C is the set of neighbors of v that lie within cluster C. We write d^+_C(v) = |N^+(v) ∩ C| to denote the positive degree of v within C. If the endpoints of a positive edge do not belong to the same cluster, we say that the edge is a positive disagreement. Meanwhile, if the endpoints of a negative edge belong to the same cluster, we say that the edge is a negative disagreement. Given a clustering C, the cost of the clustering, cost(C), is defined as the total number of disagreements.

The arboricity λ_G of a graph G = (V, E) is defined as λ_G = max_{S ⊆ V} ⌈|E(S)| / (|S| − 1)⌉, where E(S) is the set of edges induced by S ⊆ V. We drop the subscript G when it is clear from context. A graph with arboricity λ is said to be λ-arboric. We denote the set {1, 2, ..., n} by [n]. We hide absolute constant multiplicative factors and multiplicative factors logarithmic in n using standard notations: O(·), Ω(·), and Õ(·). The notation log* n refers to the smallest integer t such that the t-iterated logarithm of n is at most 1, that is, log^{(t)} n ≤ 1; for all practical values of n, one may treat log* n ≤ 5. An event E on an n-vertex graph holds with high probability if it happens with probability at least 1 − n^{−c} for an arbitrary constant c > 1, where c may affect other constants (e.g., those hidden in the asymptotics).

We fix the parameters in our model of computation as follows.

Model 1 (Strongly sublinear MPC regime with n machines). Consider the MPC model. Each vertex is given access to a machine with memory size Õ(n^δ), for some constant 0 < δ < 1. The total global memory usage is Õ(n^{1+δ}).
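To pin down the cost function defined above, here is a small self-contained Python sketch that evaluates cost(C) given only the positive edges; negative edges are implicit, matching the convention that only E^+ needs to be stored. The representation (sets of frozensets) is an assumption made for illustration.

    from itertools import combinations

    def clustering_cost(pos_edges, clustering):
        """Total disagreements of a clustering on a complete signed graph.
        pos_edges is a set of frozensets {u, v} (the edges of E+); every
        other vertex pair is implicitly negative.  Cost = (positive edges
        cut between clusters) + (same-cluster pairs that are not in E+)."""
        cluster_of = {v: i for i, c in enumerate(clustering) for v in c}
        positive_cut = sum(1 for e in pos_edges
                           if len({cluster_of[v] for v in e}) == 2)
        negative_inside = sum(1 for c in clustering
                              for u, v in combinations(sorted(c), 2)
                              if frozenset((u, v)) not in pos_edges)
        return positive_cut + negative_inside

    # Example: E+ is the path 0-1-2 on vertex set {0, 1, 2, 3}.
    path = {frozenset(e) for e in [(0, 1), (1, 2)]}
    print(clustering_cost(path, [{0, 1}, {2}, {3}]))  # 1: edge {1,2} is cut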
1.4 Further related work

Correlation clustering on complete signed graphs was introduced by Bansal, Blum and Chawla [BBC04]. They showed that computing the optimal solution to correlation clustering is NP-complete, and explored two different optimization problems: maximizing agreements, or minimizing disagreements. While the optimum clusterings to both problems are the same (i.e., a clustering minimizes disagreements if and only if it maximizes agreements), the complexity landscapes of their approximate versions are wildly different.

Maximizing agreements is known to admit a polynomial time approximation scheme in complete graphs [BBC04]. Furthermore, Swamy [Swa04] gave a 0.7666-approximation on general weighted graphs via semidefinite programming.

On the other hand, for minimizing disagreements, the best known approximation ratio for complete graphs is 2.06, due to CMSY [CMSY15], via probabilistic rounding of a linear program (LP) solution. (For relevant prior work, we try our best to list all authors when there are three or fewer, and use their initials when there are more, e.g., CMSY, PPORRJ, BBDFHKU. While this avoids the use of et al. in citations in favor of an equal mention of all authors' surnames, we apologize for the slight unreadability.) This 2.06-approximation uses the same LP as the one proposed by Ailon, Charikar and Newman [ACN08] but performs the probabilistic rounding more carefully, nearly matching the integrality gap of 2 shown by Charikar, Guruswami and Wirth [CGW05]. In general weighted graphs, the current state of the art, due to DEFI [DEFI06], gives an O(log n)-approximation through an LP rounding scheme.

In a distributed setting, PPORRJ [PPO+15] presented two randomized algorithms (C4 and ClusterWild!) to address the correlation clustering problem in the case of complete graphs, aiming at better complexities than
KwikCluster. The C4 algorithm gives a 3-approximation in expectation, with a polylogarithmic number of rounds where the greedy MIS problem is solved in each round. The ClusterWild! algorithm gives up the independence property in order to speed up the process, resulting in a (3 + ε)-approximation. Both of those algorithms are proven to terminate after O((1/ε) · log n · log ∆) rounds with high probability. A third distributed algorithm for solving correlation clustering is given by Chierichetti, Dalvi and Kumar [CDK14] for the MapReduce model. Called ParallelPivot, it also gives a constant approximation in polylogarithmic time, without solving a greedy MIS in each round. Using a tighter analysis, Fischer and Noever [FN18] showed that randomized greedy MIS terminates in O(log n) rounds with high probability, which directly implies an O(log n) round simulation of PIVOT in various distributed computation models.

For our approach, the randomized greedy MIS plays a crucial role in terms of the approximation ratio. Faster algorithms are known for finding an MIS that may not satisfy the greedy property. For example, Ghaffari and Uitto [GU19] showed that there is an MIS algorithm running in O(√log ∆ · log log ∆ + √log log n) rounds. This algorithm was later adapted to bounded arboricity graphs, with a runtime of O(√log λ · log log λ + √log log n), by BBDFHKU [BBD+19]. For deterministic algorithms, an MIS can be computed in O(log ∆ + log log n) MPC rounds due to Czumaj, Davies and Parter [CDP20].

2 Technical overview

In this section, we highlight the key ideas needed to obtain our results in Section 1.2. Before doing so, let us begin by explaining some computational features of the MPC model, so as to set up the context needed to appreciate our algorithmic results. By exploiting these computational features together with a structural result on randomized greedy MIS by Fischer and Noever [FN18], we explain how to compute a randomized greedy MIS in O(log ∆ · log log n) MPC rounds. We conclude this section by explaining how to obtain our correlation clustering results by using our structural lemma, which lets us reduce the maximum degree of the input graph to O(λ).

2.1 Computational features of the MPC model

2.1.1 The LOCAL and CONGEST models

To better appreciate the computational features of MPC, we first describe the LOCAL and CONGEST models [Lin92, Pel00].

In the LOCAL model, all vertices are treated as individual computation nodes and are given a unique identifier, i.e., some binary string of length O(log n). Computation occurs in synchronous rounds where each vertex does the following: perform arbitrary local computations, then send messages (of unbounded size) to neighbors. As the LOCAL model does not impose any restrictions on computation or communication, every vertex can learn its entire k-hop neighborhood in k LOCAL rounds.

The CONGEST model is identical to the LOCAL model with an additional restriction: the size of the messages that can be sent or received per round can only be O(log n) bits across each edge. This means that CONGEST algorithms may no longer assume that they can learn the k-hop topology of every vertex in k CONGEST rounds.

Since the MPC model does not restrict computation within a machine, one can directly simulate any k-round LOCAL or CONGEST algorithm in O(k) MPC rounds, as long as each machine sends and receives messages of size at most O(S). This often allows us to directly invoke existing LOCAL and CONGEST algorithms in a black-box fashion.

2.1.2 Round compression

First introduced by CŁMMOSP [CŁM+19], round compression refers to compressing several rounds of a distributed algorithm into fewer MPC rounds: if an algorithm A only needs to know the k-hop neighborhood of a vertex to perform r steps, then these r steps can be compressed into a single MPC round once the k-hop neighborhood has been gathered.

2.1.3 Graph exponentiation

One way to speed up computation in an all-to-all communication setting (such as MPC) is the well-known graph exponentiation technique of Lenzen and Wattenhofer [LW10]. The idea is as follows: suppose each vertex is currently aware of its 2^{k−1}-hop neighborhood. Then, by sending this 2^{k−1}-hop topology to all of its current neighbors, each vertex learns its 2^k-hop neighborhood in one additional MPC round. In other words, every vertex can learn about its k-hop neighborhood in O(log k) MPC rounds, as long as the machine memory is large enough. See Fig. 1 for an illustration. This technique is motivated by the fact that once a vertex has gathered its k-hop neighborhood, it can execute any LOCAL algorithm that runs in k rounds in just a single MPC round.

Figure 1: Graph exponentiation: after round k, vertex u knows the graph topology within its 2^k-hop neighborhood.

2.1.4 Combining graph exponentiation with round compression

Suppose we wish to execute a k-round LOCAL algorithm but the memory of a single machine is too small to contain entire k-hop neighborhoods. To get around this, one can combine graph exponentiation with round compression:

1. All vertices collect the largest possible neighborhood using graph exponentiation (say only ℓ-hop, for some ℓ < k).
2. All vertices simulate ℓ steps of the LOCAL algorithm in a single MPC round using round compression.
3. All vertices update their neighbors about the status of their computation.
4. Repeat steps 2-3 for O(k/ℓ) phases.

This essentially creates a virtual communication graph where vertices are connected to their ℓ-hop neighborhoods. This allows a vertex to derive, in one round of MPC, all the messages that would reach it in the next ℓ rounds of message passing. Using one more MPC round and the fact that local computation is unbounded, a vertex u can inform all neighbors in the virtual graph about its current state in the simulated message-passing algorithm. See Fig. 2 for an illustration, and the code sketch that follows it.

Figure 2: In this example, we set ℓ = 2. After each vertex collects its 2-hop neighborhood, computation within each collected neighborhood can be performed in a single compressed MPC round. Observe that the vertices u and v were originally 8 hops apart; in the virtual communication graph, however, u and v can communicate in 2 MPC rounds through vertex w's collected neighborhood. We see that this virtual communication graph has a smaller effective diameter compared to the original input graph.
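The following Python sketch illustrates the doubling step behind graph exponentiation in a centralized way. It tracks only which vertex identifiers each vertex knows; the actual technique ships the induced topology and is subject to the memory caveats above, so this is a simplified sketch under those assumptions.

    import math

    def gather_balls(adj, radius):
        """Sketch of graph exponentiation [LW10]: every vertex starts with
        its 1-hop ball and, in each simulated MPC round, unions the balls
        of the vertices it currently knows, doubling the known radius."""
        ball = {v: {v} | set(adj[v]) for v in adj}
        rounds = 0 if radius <= 1 else math.ceil(math.log2(radius))
        for _ in range(rounds):        # O(log radius) simulated MPC rounds
            ball = {v: set().union(*(ball[u] for u in ball[v])) for v in adj}
        return ball

    # Example: on the path 0-1-2-3-4, vertex 0 learns a radius-4 ball in 2 rounds.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(sorted(gather_balls(path, 4)[0]))   # [0, 1, 2, 3, 4]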
Remark. In Section 2.1.4, we made the implicit assumption that the states of the vertices are small and hence can be communicated with small messages. In many algorithms (e.g., for solving MIS, matching, or coloring), including ours, the vertices maintain very small states. Hence, we omit discussion of individual message sizes in the scope of this paper.
2.1.5 Broadcast trees

Broadcast trees are a useful MPC data structure, introduced by Goodrich, Sitchinava and Zhang [GSZ11], that allows us to perform certain aggregation tasks in O(1/δ) MPC rounds, which is essentially O(1) for constant δ. Suppose we have O(N) global memory and S = O(n^δ) local memory, where N is the input size (note that N ∈ O(n^2) for graphs on n vertices). We build an S-ary virtual communication tree over the machines. That is, within one MPC round, a parent machine can send O(1) numbers to each of its S children machines, or collect one number from each of its S children machines. In O(log_S N) ⊆ O(1/δ) rounds, for all vertices v in parallel, one can:

• broadcast a message from v to all neighboring vertices in N(v);
• compute f(N(v)), the value of a distributive aggregate function f on the set of vertices N(v).

An example of such a function f is computing the sum/min/max of numbers that were originally distributed across all machines. We use broadcast trees in the MPC implementation of the algorithm described in Corollary 19.

2.2 Computing a randomized greedy MIS

The following result of Fischer and Noever [FN18] states that each vertex only needs the ordering of the vertices within its O(log n)-hop neighborhood in order to compute its own output status in a randomized greedy MIS run. (More specifically, they analyzed the "longest length of a dependency path" and showed that it is O(log n) with high probability, which implies Theorem 2.) We borrow some notation from Ghaffari and Nowicki [GN20, Lemma 3.5].

Theorem 2 (Fischer and Noever [FN18]). Given a uniform-at-random ordering of vertices, with high probability, the MIS status of any vertex is determined by the vertex orderings within its O(log n)-hop neighborhood.

Let π : [n] → V be a uniform-at-random ordering of vertices and G be a graph with maximum degree ∆. In Section 3, we show that one can compute greedy MIS (with respect to π) in O(log ∆ · log log n) MPC rounds.

Theorem 3 (Randomized greedy MIS (Informal)). Let G be a graph with maximum degree ∆. Then, randomized greedy MIS can be computed in O(log ∆ · log log n) MPC rounds.
The algorithm works in phases. In each phase, we process a prefix graph G_prefix defined by the vertices indexed by a prefix of π. By Theorem 2, it suffices for these vertices to collect their O(log n)-hop neighborhoods. (Note that we cannot directly apply graph exponentiation, as the O(log n)-hop neighborhood of a vertex could be larger than machine memory.) If we assign a machine to each vertex in G_prefix, we can combine graph exponentiation with round compression, as explained in Section 2.1.4, to compute the greedy MIS on G_prefix: each vertex gathers only its O(log n / log log n)-hop neighborhood using graph exponentiation, and we then simulate the greedy MIS algorithm in O(log log n) compressed rounds. For a sufficiently large prefix of π, we can show that processing G_prefix halves the maximum degree of the remaining graph, so only poly(log n) vertices remain after O(log ∆) phases. See Algorithm 1 and Algorithm 2 for pseudocode, and Fig. 3 for an illustration.
Algorithm 1: Greedy MIS in the sublinear memory regime of the MPC model

Input: Graph G = (V, E) with maximum degree ∆.
  Let π : [n] → V be an ordering of vertices chosen uniformly at random.
  for i = 0, 1, 2, ..., O(log ∆) do                        ▷ O(log ∆) phases, or until G is empty
    Let prefix size t_i = O(n log n / (∆/2^i)) and prefix offset o_i = Σ_{z=0}^{i−1} t_z.
    Let G_i be the graph induced by vertices π(o_i + 1), ..., π(o_i + t_i), with maximum degree ∆'.
    Process prefix graph G_i using Algorithm 2.            ▷ By Chernoff bounds, ∆' ∈ O(log n)
  end for
  Process any remaining vertices in G using an additional O(log log n) MPC rounds.
Algorithm 2: Greedy MIS on a prefix graph with n' ≤ n vertices in O(log log n) MPC rounds using n' machines

Input: Vertex ordering π, number of vertices n, prefix graph G_prefix on n' vertices with maximum degree ∆ ≤ poly(log n).
  Assign a machine to each vertex.
  Graph exponentiate and simulate greedy MIS (with respect to π) on G_prefix in O(log log n) MPC rounds.
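As a sanity check of the phase structure, the following sequential Python sketch processes random-order prefixes the way Algorithm 1 does. Since the phases merely partition the random order, the output coincides with plain greedy MIS on that order; the point is the prefix bookkeeping. The prefix-size constant 10 is an illustrative assumption standing in for the O(·) in t_i.

    import math, random

    def greedy_mis_in_phases(adj, rng=random):
        """Sequential simulation of Algorithm 1's phase structure."""
        n = len(adj)
        order = list(adj)
        rng.shuffle(order)                    # uniform-at-random ordering pi
        delta = max((len(adj[v]) for v in adj), default=0)
        mis, blocked, pos = set(), set(), 0
        while pos < n:
            # Prefix size ~ n log n / (Delta / 2^i), capped at the remainder.
            t = n - pos if delta <= 1 else min(
                n - pos, math.ceil(10 * n * math.log(n + 1) / delta))
            for v in order[pos:pos + t]:      # greedy MIS on the prefix graph
                if v not in blocked:          # no smaller-order neighbor in MIS
                    mis.add(v)
                    blocked.add(v)
                    blocked.update(adj[v])
            pos += t
            delta //= 2   # Lemma 11: residual max degree halves per phase (w.h.p.)
        return mis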
Remark. If the input graph already has maximum degree ∆ at most poly(log n), then one can directly apply Algorithm 2 to compute the greedy MIS in O(log log n) MPC rounds, without resorting to Algorithm 1.

2.3 Correlation clustering in bounded arboricity graphs

Our algorithmic results for correlation clustering derive from the following key structural lemma, which is proven by arguing that a local improvement to the clustering cost is possible whenever a large cluster exists.
Lemma 4 (Structural lemma for correlation clustering (Informal)). There exists an optimum correlation clustering where all clusters have size at most 4λ − 2.

This structural lemma allows us to perform cost-charging arguments against some optimum clustering with bounded cluster sizes. In particular, if a vertex has degree much larger than λ, then many of its incident edges incur disagreements. This insight yields the following algorithmic implication: we can effectively ignore high-degree vertices.

Figure 3: Illustration of Algorithm 1 given an initial graph G on n vertices with maximum degree ∆. Let i ∈ {0, ..., O(log ∆)} and define t_i = O(n log n / (∆/2^i)). For each i, with high probability, the induced prefix subgraph G_i has maximum degree poly(log n), so Algorithm 2 processes G_i in O(log log n) MPC rounds while using total global memory of Õ(n^{1+δ}). By our choice of t_i, Lemma 11 tells us that the remaining subgraph H_i has maximum degree ∆/2^{i+1}. We repeat this argument until the final subgraph H_final involves only poly(log n) vertices, which can be processed in another O(log log n) MPC rounds by one more call to Algorithm 2.

Theorem 5 (Algorithmic implication (Informal)). Let G be a graph where E^+ induces a λ-arboric graph. Form singleton clusters with the vertices of degree larger than Θ(λ/ε). Run an α-approximate algorithm A on the remaining subgraph. Then, the union of the clusters is a max{1 + ε, α}-approximation. The runtime and approximation guarantees of the overall algorithm follow from the guarantees of A (e.g., in expectation / worst case, deterministic / randomized).
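A minimal Python sketch of this reduction follows. The degree threshold mirrors the one used in Theorem 14 later; its exact constant is an assumption of this sketch, as is the adjacency-map interface.

    def cluster_with_degree_filter(vertices, pos_nbrs, lam, eps, base_algo):
        """Sketch of the reduction behind Theorem 5 / Algorithm 3: make every
        high-degree vertex a singleton cluster, then run any correlation
        clustering routine base_algo on the remaining bounded-degree
        subgraph.  Threshold 8 * (1 + eps) / eps * lam is assumed here to
        match the formal statement (Theorem 14)."""
        thresh = 8 * (1 + eps) / eps * lam
        high = {v for v in vertices if len(pos_nbrs[v]) > thresh}
        low = set(vertices) - high
        sub_nbrs = {v: pos_nbrs[v] & low for v in low}   # induced subgraph G'
        clustering = [{v} for v in high]                 # singletons for H
        clustering.extend(base_algo(low, sub_nbrs))      # run A on G'
        return clustering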
Observe that PIVOT essentially simulates a randomized greedy MIS with respect to a uniform-at-random ordering of vertices. By setting ε = 2 in Theorem 5 and ∆ = O(λ) in Theorem 3, we immediately obtain a 3-approximation (in expectation) algorithm for correlation clustering in O(log λ · log log n) MPC rounds. Note that we always have λ ≤ ∆ ≤ n, and that λ can be significantly smaller than ∆ and n in general. Many sparse graphs have λ ∈ O(1) while having unbounded maximum degrees, including planar graphs and bounded treewidth graphs. As such, for several classes of graphs, our result improves over directly simulating PIVOT in O(log n) rounds.

Corollary 6 (General algorithm (Informal)). Let G be a complete signed graph such that E^+ induces a λ-arboric graph. There exists an algorithm that, with high probability, produces a 3-approximation (in expectation) for correlation clustering of G in O(log λ · log log n) MPC rounds.

Remark (On converting "in expectation" to "with high probability"). Note that one can run O(log n) copies of Corollary 6 in parallel and output the best clustering. Applying this standard trick converts the "in expectation" guarantee to a "with high probability" guarantee with only a logarithmic factor increase in memory consumption.

In the case of forests (i.e., λ = 1), Lemma 4 states that the optimum correlation clustering cost corresponds to the number of positive edges minus the size of a maximum matching. Instead of computing a maximum matching, Lemma 7 tells us that computing an approximate matching (not necessarily maximal) suffices to obtain an α-approximation to the optimum correlation clustering. Note that maximal matchings are 2-approximations in this sense, so such a matching always exists.

Lemma 7 (Approximation via approximate matchings (Informal)). Let G be a complete signed graph such that E^+ induces a forest. Suppose that the maximum matching size on E^+ is |M*|. If M is a matching on E^+ such that α · |M| ≥ |M*|, for some 1 ≤ α ≤ 2, then clustering using M yields an α-approximation to the optimum correlation clustering of G.

There are known matching algorithms that are efficient on graphs of bounded maximum degree, which is the relevant setting after applying Theorem 5 with λ = 1. More specifically, we consider the following results.
• Using dynamic programming, BBDHM [BBD+18] compute a maximum matching (on trees) in Õ(log n) MPC rounds.

• In LOCAL, EMR [EMR15] deterministically solve (1 + ε)-approx. matching in O(∆^{O(1/ε)} + (1/ε^2) · log* n) rounds.

• In CONGEST, BCGS [BYCHGS17] give an O(2^{O(1/ε)} · (log ∆)/(log log ∆))-round randomized algorithm for (1 + ε)-approx. matching.

These approximation results are heavily based on the Hopcroft-Karp framework [HK73], where independent sets of augmenting paths are iteratively flipped. Since λ = 1 and ε is a constant, we have a subgraph of constant maximum degree after ignoring vertices with degrees larger than O(λ/ε). On this constant degree graph, each vertex only needs polylogarithmic memory when we perform graph exponentiation, satisfying the memory constraints of Model 1. Thus, applying these matching algorithms together with Theorem 5 and Lemma 7 yields the following result.

Corollary 8 (Forest algorithm (Informal)). Let G be a complete signed graph such that E^+ induces a forest, and let 0 < ε ≤ 1 be a constant. Hiding factors in 1/ε using O_ε(·), there exists:

1. An optimum randomized algorithm that runs in Õ(log n) MPC rounds.
2. A (1 + ε)-approximation (worst case) deterministic algorithm that runs in O_ε(log log* n) MPC rounds.
3. A (1 + ε)-approximation (worst case) randomized algorithm that runs in O_ε(1) MPC rounds.
Finally, we give a simple O(λ^2)-approximate (worst-case) algorithm in O(1) MPC rounds.

Corollary 9 (Simple algorithm (Informal)). Let G be a complete signed graph such that E^+ induces a λ-arboric graph. Then, there exists an O(λ^2)-approximation (worst case) deterministic algorithm that runs in O(1) MPC rounds.
The following algorithm can be implemented in O(1) MPC rounds using broadcast trees: connected components which are cliques form clusters, and all other vertices form individual singleton clusters. We now give an informal argument for the case where the input graph is a single connected component but not a clique. By Lemma 4, there are at least n/(4λ − 2) clusters in some optimum clustering, and so (since the component is connected) the optimal number of disagreements is at least n/(4λ − 2) − 1. Meanwhile, the singleton clustering incurs errors on all positive edges, i.e., at most λ · n of them, since E^+ induces a λ-arboric graph. Thus, the worst possible approximation ratio is ≈ λ^2.

3 Computing a randomized greedy MIS

In this section, we explain how to efficiently compute a randomized greedy MIS in the sublinear memory regime of the MPC model. As discussed in Section 2, we rely on the result of Fischer and Noever [FN18] and exploit computational features of the MPC model such as graph exponentiation and round compression.

We first prove two helper lemmata with respect to a uniform-at-random ordering of vertices π. Lemma 10 states that one can efficiently compute greedy MIS on graphs with polylogarithmic maximum degrees, while Lemma 11 bounds the maximum degree of the remaining subgraph after processing t ≤ n vertices. We compose these lemmata in Theorem 12 to obtain an efficient algorithm for computing randomized greedy MIS on bounded maximum degree graphs.
Remark. Similar statements to Lemma 10 and Lemma 11 were previously known (e.g., see [GGK+18, Section 3] and [ACG+15, Lemma 27]). In this section, we combine these lemmata with the result of Fischer and Noever [FN18] to obtain a new bound for randomized greedy MIS in the strongly sublinear memory regime of the MPC model.
Lemma 10. Consider Model 1 with input graph G = (V, E) and |V| = n. Let G_prefix be a subgraph on n' ≤ n vertices and π be a uniform-at-random ordering of the vertices in G_prefix. Suppose G_prefix has maximum degree ∆ at most poly(log n). Then, with high probability, one can simulate greedy MIS on G_prefix (with respect to π) in O(log log n) MPC rounds.

Proof. By Theorem 2, it suffices for any vertex to learn the ordering of the vertices within its O(log n)-hop neighborhood in G_prefix to determine whether it itself is in the MIS. However, due to machine memory constraints, vertices may not be able to directly store their full O(log n)-hop neighborhoods in a single machine. Instead, vertices will gather their O(log n / log log n)-hop neighborhoods via graph exponentiation, then simulate the greedy MIS algorithm in O(log log n) compressed rounds. The total runtime of this procedure is O(log(log n / log log n) + log log n) ⊆ O(log log n).

It remains to show that the O(log n / log log n)-hop neighborhood of any vertex fits in a single machine. Let us be more precise about the constant factors involved. Suppose that ∆ = D · log^k(n) and that the longest dependency chain in greedy MIS has length L · log n, for some constants D, L > 0 and k ≥ 1. For some constant C > 0, let R = C · L · (log n / log log n) denote the radius of the neighborhood that we want to collect into a single machine. Note that the parameters {D, L, k, δ} are given to us, and we only have control over the parameter C. If we pick C such that C · L · k · max{1, log D} < δ < 1, then

R · log ∆ = C · L · (log n / log log n) · (log D + k log log n) ∈ O(δ · log n)  ⟺  ∆^R ∈ O(n^δ).

Thus, with appropriate constant factors, the O(log n / log log n)-hop neighborhood of any vertex fits in a single machine.
Lemma 11. Let G be a graph on n vertices and π : [n] → V be a uniform-at-random ordering of vertices. For t ∈ [n], consider the subgraph H_t obtained after processing the vertices {π(1), ..., π(t)} via greedy MIS (with respect to π). Then, with high probability, the maximum degree in H_t is at most O(n log n / t).

Proof. For the sake of clarity, we prove the statement by setting the maximum degree bound in H_t to 10 · n log n / t. The constant 10 is arbitrary and can be adjusted based on how this lemma is invoked.

Let t' ∈ [t] be an arbitrary round and v an arbitrary vertex. Suppose v has degree d_{t'−1}(v) after processing the first (t' − 1) vertices defined by π(1), ..., π(t' − 1). (If t' = 1, then nothing has been processed yet and d_{t'−1}(v) = d_0(v) = d(v), where d(v) is the degree of v in the input graph G.) If d_{t'−1}(v) ≤ 10 · n log n / t, then the vertex v already satisfies the lemma, since vertex degrees never increase while processing π. Otherwise, d_{t'−1}(v) > 10 · n log n / t. We now proceed to upper bound the probability of vertex v remaining in the subgraph after processing π(t').

For vertex v to remain, neither v nor any of its neighbors may be chosen as π(t'). Since π is a uniform-at-random ordering of vertices, this happens with probability

1 − d_{t'−1}(v)/(n − t' + 1) ≤ 1 − d_{t'−1}(v)/n < 1 − (10 · n log n / t)/n = 1 − 10 log n / t.

Thus, the probability that vertex v remains in H_t, while having d_t(v) > 10 · n log n / t after processing π(t), is at most (1 − 10 log n / t)^t ≤ exp(−10 log n) = n^{−10}. The lemma follows by taking a union bound over all vertices.
Theorem 12. Consider Model 1. Let G be a graph with n vertices and maximum degree ∆, and let π : [n] → V be a uniform-at-random ordering of vertices. Then, with high probability, one can compute greedy MIS (with respect to π) in O(log ∆ · log log n) MPC rounds.

Proof. By using Algorithm 2 as a subroutine, Algorithm 1 runs in O(log ∆ · log log n) MPC rounds.

Analysis of Algorithm 1: There are O(log ∆) phases in Algorithm 1. For i ∈ {0, ..., O(log ∆)} and some appropriate constant factors, we set t_i = O(n log n / (∆/2^i)) and consider the induced prefix graph G_i. By Chernoff bounds, the maximum degree in G_i is O(log n) with high probability in n. So, we can apply Algorithm 2 to compute greedy MIS within G_i in O(log log n) MPC rounds. By Lemma 11, the maximum degree of the graph after processing G_i is halved, with high probability in n. So, after O(log ∆) phases, there are at most poly(log n) vertices left in the graph. Thus, the maximum degree is at most poly(log n), and we apply Algorithm 2 one last time. Finally, since ∆ ≤ n, we can apply a union bound over these O(log ∆) phases to upper bound the failure probability.

Analysis of Algorithm 2: Since the algorithm is invoked on a graph with maximum degree at most poly(log n), we can apply Lemma 10 to compute greedy MIS on the vertices in O(log log n) MPC rounds.
4 Structural properties for correlation clustering in low arboricity graphs

In this section, we prove our main result (Theorem 14) about correlation clustering by ignoring high-degree vertices. To do so, we first show a structural result about optimum correlation clusterings (Lemma 13): there exists an optimum clustering with bounded cluster sizes. This structural lemma also implies that, in the special case of forests (i.e., λ = 1), a maximum matching on E^+ yields an optimum correlation clustering of G (Corollary 15).

Lemma 13 (Structural lemma for correlation clustering). Let G be a complete signed graph such that the positive edges E^+ induce a λ-arboric graph. Then, there exists an optimum correlation clustering where all clusters have size at most 4λ − 2.

Proof. The proof involves performing local updates by repeatedly removing vertices from large clusters while arguing that the number of disagreements does not increase (it may not strictly decrease but may stay the same).

Consider an arbitrary clustering that has a cluster C of size |C| ≥ 4λ − 1. We will show that there exists some vertex v* ∈ C such that d^+_C(v*) ≤ 2λ − 1. Observe that removing v* to form its own singleton cluster creates d^+_C(v*) positive disagreements and removes (|C| − 1) − d^+_C(v*) negative disagreements. Since d^+_C(v*) ≤ 2λ − 1 ≤ (|C| + 1)/2 − 1 = (|C| − 1)/2, we see that this local update will not increase the number of disagreements. It remains to argue that v* exists.

Suppose, for a contradiction, that such a vertex v* does not exist in a cluster of size |C| ≥ 4λ − 1. Then, d^+_C(v) ≥ 2λ for all vertices v ∈ C. Summing over all vertices in C, we see that

|E^+(C)| = (1/2) · Σ_{v ∈ C} d^+_C(v) ≥ (1/2) · |C| · 2λ = |C| · λ ≥ |C| · |E^+(C)| / (|C| − 1) > |E^+(C)|,

where the second-to-last inequality follows from the definition of arboricity. This is a contradiction, and thus such a vertex v* exists. Repeating this argument (i.e., removing all vertices like v* to form their own singleton clusters), we can transform any optimum clustering into one with clusters of size at most 4λ − 2.

Theorem 14 (Algorithmic implication of Lemma 13). Let G be a complete signed graph such that the positive edges E^+ induce a λ-arboric graph. For ε > 0, let H = {v ∈ V : d(v) > (8(1 + ε)/ε) · λ} ⊆ V be the set of high-degree vertices, and let G' ⊆ G be the subgraph obtained by removing the high-degree vertices in H. Suppose A is an α-approximate correlation clustering algorithm and cost(OPT(G)) is the optimum correlation clustering cost. Then,

cost({{v} : v ∈ H} ∪ A(G')) ≤ max{1 + ε, α} · cost(OPT(G)),

where {{v} : v ∈ H} ∪ A(G') is the clustering obtained by combining the singleton clusters of the high-degree vertices with A's clustering of G'. See Algorithm 3 for pseudocode. Furthermore, if A is α-approximate only in expectation, then the above inequality holds only in expectation.

Proof. Denote the edges incident to high-degree vertices as marked (M), and unmarked (U) otherwise. We further split the marked edges into positive (M^+) and negative (M^−) marked edges. In other words, we partition the edge set E into M^+ ∪ M^− ∪ U. Instead of the usual handshaking lemma (Σ_{v ∈ V} d(v) = 2|E|), we have

|M^+| ≤ Σ_{v ∈ H} d^+(v) ≤ 2 · |M^+|,    (1)

because high-degree vertices may have low-degree neighbors, and marked edges may be counted twice in the sum. See Fig. 4 for an illustration.

Figure 4: High-degree vertices H are filled and only edges in M^+ are shown. Edges contributing to Σ_{v ∈ H} d^+(v) are highlighted in red. In the summation of Eq. (1), dashed edges are counted only once and solid edges are counted twice, hence |M^+| ≤ Σ_{v ∈ H} d^+(v) ≤ 2|M^+|.

Fix an optimum clustering OPT(G) of G where each cluster has size at most 4λ − 2.
Such a clustering exists by Lemma 13. Observe that cost(OPT(G)) = (disagreements in M^+) + (disagreements in M^−) + (disagreements in U) ≥ (disagreements in M^+) + 0 + (disagreements in U). We defer the proof of the following inequality and first use it to derive our result:

cost(OPT(G)) ≥ (1/(1 + ε)) · |M^+| + (disagreements in U).    (2)

Since ignoring the high-degree vertices in the clustering from OPT(G) yields a valid clustering for G', we see that cost(OPT(G')) ≤ (disagreements in U). Thus,

cost({{v} : v ∈ H} ∪ A(G')) = |M^+| + cost(A(G'))
  ≤ |M^+| + α · cost(OPT(G'))           since A is α-approximate
  ≤ |M^+| + α · (disagreements in U)    since cost(OPT(G')) ≤ (disagreements in U)
  ≤ max{1 + ε, α} · cost(OPT(G))        by Eq. (2).

If A is only α-approximate in expectation, the same argument yields the same conclusion, but only in expectation.

To prove Eq. (2), it suffices to show that (disagreements in M^+) ≥ (1/(1 + ε)) · |M^+|. Consider an arbitrary high-degree vertex v ∈ H. By the choice of OPT(G) and the definition of H, at most

4λ − 2 ≤ 4λ ≤ (ε / (2(1 + ε))) · d(v)    (3)

of the marked edges incident to v belong in the same cluster as v and will not cause any external positive disagreements. Let us call these edges good edges, since they do not incur any cost in cost(OPT(G)). In total, across all high-degree vertices, there are at most Σ_{v ∈ H} (ε / (2(1 + ε))) · d(v) ≤ (ε / (2(1 + ε))) · 2 · |M^+| = (ε / (1 + ε)) · |M^+| good edges, due to Eq. (3) and Eq. (1). In other words, (disagreements in M^+) ≥ (1 − ε/(1 + ε)) · |M^+| = (1/(1 + ε)) · |M^+|.
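Lemma 13 can be sanity-checked by brute force on tiny instances. The following self-contained Python sketch enumerates all clusterings of a small graph; the 5-vertex star at the end is an illustrative instance of our own choosing, not one from the paper.

    from itertools import combinations

    def cost(pos_edges, clustering):
        """Disagreements: positive edges cut plus non-edges inside clusters."""
        idx = {v: i for i, c in enumerate(clustering) for v in c}
        cut = sum(1 for u, v in pos_edges if idx[u] != idx[v])
        inside = sum(1 for c in clustering
                     for u, v in combinations(sorted(c), 2)
                     if (u, v) not in pos_edges and (v, u) not in pos_edges)
        return cut + inside

    def partitions(items):
        """All set partitions of items (feasible for ~8 or fewer elements)."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for part in partitions(rest):
            for i in range(len(part)):                  # put first in a block
                yield part[:i] + [part[i] | {first}] + part[i + 1:]
            yield part + [{first}]                      # or in its own block

    # Check Lemma 13 on a 5-vertex star (a 1-arboric graph): some optimum
    # clustering uses only clusters of size <= 4 * 1 - 2 = 2.
    star = [(0, i) for i in range(1, 5)]
    opt = min(cost(star, p) for p in partitions(list(range(5))))
    small = min(cost(star, p) for p in partitions(list(range(5)))
                if max(len(c) for c in p) <= 2)
    assert opt == small == 3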
Algorithm 3: Correlation clustering for low-arboricity graphs

Input: Graph G = (V, E = E^+ ∪ E^−) such that E^+ induces a λ-arboric graph, a constant ε > 0, and an α-approximate algorithm A.
  Let H = {v ∈ V : d(v) > (8(1 + ε)/ε) · λ} ⊆ V be the set of high-degree vertices.
  Let G' ⊆ G be the subgraph obtained by removing the high-degree vertices in H.
  Let A(G') be the clustering obtained by running A on the subgraph G' of bounded degree.
Return: Clustering {{v} : v ∈ H} ∪ A(G').

Corollary 15 (Maximum matchings yield optimum correlation clusterings in forests). Let G be a complete signed graph such that the positive edges E^+ induce a forest (i.e., λ = 1). Then, clustering using a maximum matching on E^+ yields an optimum cost correlation clustering.

Proof. By Lemma 13, with λ = 1, we know that there exists an optimum correlation clustering where clusters are of size 1 or 2. A size-2 cluster removes one disagreement if its vertices are joined by a positive edge. Hence, the total number of disagreements is minimized by maximizing the number of such size-2 clusters, which is the same as computing a maximum matching on the set of positive edges E^+.

5 Efficient correlation clustering algorithms

In this section, we describe how to use our main result (Theorem 14) to obtain efficient correlation clustering algorithms in the sublinear memory regime of the MPC model. Theorem 14 implies that we can focus on solving correlation clustering on graphs with maximum degree O(λ).

For general λ-arboric graphs, we simulate PIVOT by invoking Theorem 12 to obtain Corollary 16. For forests, Corollary 15 states that a maximum matching on E^+ yields an optimum correlation clustering. Then, Lemma 17 tells us that if one computes an approximate matching (not necessarily maximal) instead of a maximum matching, we still get a reasonable cost approximation to the optimum correlation clustering. By invoking existing matching algorithms, we show how to obtain three different correlation clustering algorithms (with different guarantees) in Corollary 18. Finally, Corollary 19 gives a deterministic constant-round algorithm that yields an O(λ^2)-approximation.
Corollary 16. Consider Model 1. Let G be a complete signed graph such that the positive edges E^+ induce a λ-arboric graph. With high probability, there exists an algorithm that produces a 3-approximation (in expectation) for correlation clustering of G in O(log λ · log log n) MPC rounds.

Proof. Run Algorithm 3 with ε = 2 and PIVOT as A. The approximation guarantee is due to the fact that PIVOT gives a 3-approximation in expectation, so max{1 + ε, α} = max{3, 3} = 3. Since ε = 2, the maximum degree in G' is 12λ. Set ∆ = 12λ in Theorem 12.
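Putting the earlier sketches together, Corollary 16's recipe can be exercised as follows; this builds on the hypothetical pivot and cluster_with_degree_filter functions sketched in Sections 1 and 2.

    # eps = 2 gives the 12 * lam positive-degree threshold from the proof.
    clustering = cluster_with_degree_filter(vertices, pos_nbrs, lam=3,
                                            eps=2, base_algo=pivot)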
Lemma 17. Let G be a complete signed graph such that the positive edges E^+ induce a forest. Suppose |M*| is the size of a maximum matching on E^+ and M is an approximate matching on E^+ where α · |M| ≥ |M*| for some 1 ≤ α ≤ 2. Then, clustering using M yields an α-approximation to the optimum correlation clustering of G.

Proof. Clustering based on any matching (i.e., forming clusters of size two for each matched pair of vertices and singleton clusters for unmatched vertices) incurs n − 1 − |M| disagreements. By Corollary 15, clustering with respect to a maximum matching yields a correlation clustering of optimum cost. If |M*| = |M|, then the approximation ratio is trivially 1. Henceforth, let |M| ≤ |M*| − 1. Observe that

(n − 1 − (1/α) · |M*|) / (n − 1 − |M*|) ≤ α  ⟺  |M*| · (1 + 1/α) ≤ n − 1.

Indeed,

|M*| · (1 + 1/α) ≤ |M*| + |M|     since α · |M| ≥ |M*|
               ≤ 2 · |M*| − 1    since |M| ≤ |M*| − 1
               ≤ n − 1           since |M*| ≤ n/2 for a matching on n vertices.

Thus, the approximation factor for using an approximate matching M is (n − 1 − |M|) / (n − 1 − |M*|) ≤ (n − 1 − (1/α) · |M*|) / (n − 1 − |M*|) ≤ α.
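A small Python sketch of this matching-based clustering, exercised on the tight instance described in the remark that follows:

    def matching_clustering(n, matching):
        """Cluster each matched pair together and everyone else as a
        singleton (the clustering used in Lemma 17).  On a forest E+,
        this incurs exactly |E+| - |M| disagreements."""
        matched = {v for e in matching for v in e}
        return ([set(e) for e in matching]
                + [{v} for v in range(n) if v not in matched])

    # The path 0-1-2-3, so |E+| = 3.  The maximum matching {01, 23} yields
    # the optimum cost 3 - 2 = 1, while the maximal matching {12} yields
    # cost 3 - 1 = 2: exactly a factor of alpha = 2.
    for m in ([(0, 1), (2, 3)], [(1, 2)]):
        print(matching_clustering(4, m))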
Remark. The approximation ratio of Lemma 17 tends to 1 as |M| tends to |M*|. The worst possible ratio is 2, and this approximation ratio is tight: consider a path of 4 vertices and 3 edges with maximum matching size |M*| = 2 and maximal matching size |M| = 1.
Corollary 18. Consider Model 1. Let G be a complete signed graph such that the positive edges E^+ induce a forest. Let 0 < ε ≤ 1 be a constant. Then, there exist the following algorithms for correlation clustering:

1. An optimum randomized algorithm that runs in Õ(log n) MPC rounds.
2. A (1 + ε)-approximation (worst case) deterministic algorithm that runs in O((1/ε) · (log(1/ε) + log log* n)) MPC rounds.
3. A (1 + ε)-approximation (worst case) randomized algorithm that runs in O(1/ε + log log(1/ε)) MPC rounds.

Proof. For the first algorithm, we use the algorithm of BBDHM [BBD+18] to compute a maximum matching in Õ(log n) MPC rounds, and cluster matched vertices together according to Corollary 15.

For the second algorithm, we apply Theorem 14 with λ = 1, α = 1 + ε, and A as the deterministic approximate matching algorithm of Even, Medina and Ron [EMR15] on the subgraph with maximum degree ∆ ∈ O(1/ε). Their algorithm runs in R ∈ O(∆^{O(1/ε)} + (1/ε^2) · log* n) LOCAL rounds. This can be sped up to O((1/ε) · (log(1/ε) + log log* n)) MPC rounds via graph exponentiation, since ∆ ∈ O(1/ε) and each R-hop neighborhood is of logarithmic size.

For the third algorithm, we apply Theorem 14 with λ = 1, α = 1 + ε, and A as the randomized approximate matching algorithm of BCGS [BYCHGS17] on the subgraph with maximum degree ∆ ∈ O(1/ε). Their algorithm runs in R ∈ O(2^{O(1/ε)} · log(1/ε) / log log(1/ε)) CONGEST rounds. This can be sped up to O(1/ε + log log(1/ε)) MPC rounds via graph exponentiation, since ∆ ∈ O(1/ε) and each R-hop neighborhood is of logarithmic size.

Remark. Let M* be some maximum matching and M be some approximate matching. The approximation ratio in [EMR15] is stated as |M| = (1 − ε') · |M*|, while we write (1 + ε) · |M| ≥ |M*|. This is only a constant factor difference: ε' ∈ Θ(ε).
Corollary 19. Consider Model 1. Let G be a complete signed graph such that the positive edges E^+ induce a λ-arboric graph. Then, there exists a deterministic algorithm that produces an O(λ^2)-approximation (worst case) for correlation clustering of G in O(1) MPC rounds.

Proof. Consider the following deterministic algorithm: each connected component (with respect to E^+) that is a clique forms a single cluster, and all remaining vertices form singleton clusters.
MPC implementation. Any clique in a λ-arboric graph involves at most 2λ vertices. Hence, ignoring vertices with degrees larger than 2λ − 1, one can test whether each connected component is a clique, and form the corresponding clusters, in O(1) MPC rounds using broadcast trees.
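A centralized Python sketch of this algorithm follows; in MPC, the component traversal and the clique test below are replaced by O(1)-round broadcast-tree aggregations, and the adjacency-map interface is an assumption made for illustration.

    def clique_component_clustering(vertices, pos_nbrs):
        """Corollary 19's algorithm: each connected component of E+ that
        is a clique becomes one cluster; everyone else is a singleton.
        pos_nbrs[v] is assumed to be the set of E+ neighbors of v."""
        seen, clustering = set(), []
        for s in vertices:
            if s in seen:
                continue
            comp, stack = {s}, [s]             # DFS over the component
            while stack:
                u = stack.pop()
                for w in pos_nbrs[u]:
                    if w not in comp:
                        comp.add(w)
                        stack.append(w)
            seen |= comp
            # comp is a clique iff every vertex sees all |comp| - 1 others
            if all(len(pos_nbrs[u] & comp) == len(comp) - 1 for u in comp):
                clustering.append(comp)
            else:
                clustering.extend({u} for u in comp)
        return clustering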
Approximation analysis. Fix an optimum clustering OPT(G) of G such that each cluster has size at most 4λ − 2. Such a clustering exists by Lemma 13. Note that the clusters in any optimum clustering lie within connected components (with respect to E^+); otherwise, one can strictly improve the cost by splitting up such clusters. By bounding the approximation ratio for an arbitrary connected component in the input graph G, we obtain a worst case approximation ratio.

Consider an arbitrary connected component H on n_0 vertices and m_0 positive edges. Since H is λ-arboric, m_0 ≤ λ · n_0. If H is a clique, then our algorithm incurs zero disagreements on it. Otherwise, our algorithm forms singleton clusters and incurs m_0 ≤ λ · n_0 disagreements. Since each cluster in OPT(G) has size at most 4λ − 2, there are at least n_0/(4λ − 2) clusters of OPT(G) involving vertices in H. Since H is a connected component, this means that OPT(G) incurs at least n_0/(4λ − 2) − 1 disagreements within H. Thus, the worst possible approximation ratio is

λ · n_0 / (n_0/(4λ − 2) − 1) ∈ O(λ^2).
Remark. The approximation analysis in Corollary 19 is tight (up to constant factors): consider the barbell graph where two cliques K_{2λ} (cliques on 2λ vertices) are joined by a single edge. The optimum clustering forms a cluster on each K_{2λ} and incurs one external disagreement. Meanwhile, forming singleton clusters incurs ≈ λ^2 positive disagreements.

6 Conclusion and open questions

In this work, we presented a structural result on correlation clustering of complete signed graphs such that the positive edges induce a bounded arboricity graph. Combining this with known algorithms, we obtained efficient algorithms in the sublinear memory regime of the MPC model. We also showed how to compute a randomized greedy MIS in O(log ∆ · log log n) MPC rounds. As intriguing directions for future work, we pose the following questions:
Question 1. Can a randomized greedy MIS be computed using linear total space, i.e., S · M ∈ Õ(|E^+|)?

In this work, we allocated a machine to each vertex, using a total global memory of Õ(n^{1+δ}), so as to focus on the algorithmic aspects of the problem without getting muddled in details. We hypothesize that it would be possible to reduce the global memory footprint using a more refined algorithm (e.g., using ideas from graph shattering [BEPS16]).
Question 2. Can a randomized greedy MIS be computed in O(log ∆ + log log n) or O(√log ∆ + log log n) MPC rounds?
This would imply a 3-approximate (in expectation) correlation clustering algorithm in the same number of MPC rounds. We posit that a better running time than O(log ∆ · log log n) should be possible. The informal intuition is as follows: Fischer and Noever's result [FN18] tells us that most vertices do not have long dependency chains in every phase, so pipelining arguments similar to the related work mentioned in Section 1.4 might work.
Question 3. Is there an efficient distributed algorithm to minimize disagreements with an approximation guarantee strictly better than 3 (in expectation), or with worst-case guarantees for general graphs?
For minimizing disagreements in complete signed graphs, known algorithms (see Section 1.4) with approximation guarantees strictly less than 3 (in expectation) are based on probabilistic rounding of LPs. Can one implement such LPs efficiently in a distributed setting, or design an algorithm that is amenable to a distributed implementation with provable guarantees strictly better than 3?
In this work, we gave algorithms with worst-case approximation guarantees when the graph induced by positive edges is a forest.
Can one design algorithms that give worst-case guarantees for general graphs?

Acknowledgements

This work was supported in part by the Academy of Finland, Grant 334238.
References

[ACG+15] Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation Clustering in Data Streams. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 2237-2246, 2015.

[ACN08] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating Inconsistent Information: Ranking and Clustering. Journal of the ACM (JACM), 55(5):1-27, 2008.

[BBC04] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation Clustering. Machine Learning, 56(1-3):89-113, 2004.

[BBD+18] MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, and Vahab Mirrokni. Brief Announcement: MapReduce Algorithms for Massive Trees. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), pages 162:1-162:4, 2018.

[BBD+19] Soheil Behnezhad, Sebastian Brandt, Mahsa Derakhshan, Manuela Fischer, MohammadTaghi Hajiaghayi, Richard M. Karp, and Jara Uitto. Massively Parallel Computation of Matching and MIS in Sparse Graphs. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC), pages 481-490, 2019.

[BEPS16] Leonid Barenboim, Michael Elkin, Seth Pettie, and Johannes Schneider. The Locality of Distributed Symmetry Breaking. Journal of the ACM (JACM), 63(3):1-45, 2016.

[BKS17] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication Steps for Parallel Query Processing. Journal of the ACM (JACM), 64(6):1-58, 2017.

[BYCHGS17] Reuven Bar-Yehuda, Keren Censor-Hillel, Mohsen Ghaffari, and Gregory Schwartzman. Distributed Approximation of Maximum Independent Set and Maximum Matching. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 165-174, 2017.

[CBGVZ13] Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, and Giovanni Zappella. A Correlation Clustering Approach to Link Classification in Signed Networks. Journal of Machine Learning Research (JMLR), 23:34.1-34.20, 2013.

[CDK14] Flavio Chierichetti, Nilesh Dalvi, and Ravi Kumar. Correlation Clustering in MapReduce. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 641-650, 2014.

[CDP20] Artur Czumaj, Peter Davies, and Merav Parter. Graph Sparsification for Derandomizing Massively Parallel Computation with Low Space. In Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 175-185, 2020.

[CGW05] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with Qualitative Information. Journal of Computer and System Sciences, 71(3):360-383, 2005.

[CŁM+19] Artur Czumaj, Jakub Łącki, Aleksander Mądry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. Round Compression for Parallel Matching Algorithms. SIAM Journal on Computing, 49(5), 2019.

[CMSY15] Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 219-228, 2015.

[CSX12] Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering Sparse Graphs. In Advances in Neural Information Processing Systems (NeurIPS), pages 2204-2212, 2012.

[DEFI06] Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation Clustering in General Weighted Graphs. Theoretical Computer Science, 361(2-3):172-187, 2006.

[DG08] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1):107-113, 2008.

[EMR15] Guy Even, Moti Medina, and Dana Ron. Distributed Maximum Matching in Bounded Degree Graphs. In Proceedings of the International Conference on Distributed Computing and Networking (ICDCN), 2015.

[FN18] Manuela Fischer and Andreas Noever. Tight Analysis of Parallel Randomized Greedy MIS. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2152-2160, 2018.

[GGK+18] Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrović, and Ronitt Rubinfeld. Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover. In Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), pages 129-138, 2018.

[GN20] Mohsen Ghaffari and Krzysztof Nowicki. Massively Parallel Algorithms for Minimum Cut. In Proceedings of the 39th Symposium on Principles of Distributed Computing (PODC), pages 119-128, 2020.

[GSZ11] Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, Searching, and Simulation in the MapReduce Framework. In International Symposium on Algorithms and Computation (ISAAC), pages 374-383, 2011.

[GU19] Mohsen Ghaffari and Jara Uitto. Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1636-1653, 2019.

[HK73] John E. Hopcroft and Richard M. Karp. An n^{5/2} Algorithm for Maximum Matchings in Bipartite Graphs. SIAM Journal on Computing, 2(4):225-231, 1973.

[IBY+07] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pages 59-72, 2007.

[KSV10] Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A Model of Computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 938-948, 2010.

[Lin92] Nathan Linial. Locality in Distributed Graph Algorithms. SIAM Journal on Computing, 21(1):193-201, 1992.

[LW10] Christoph Lenzen and Roger Wattenhofer. Brief Announcement: Exponential Speed-up of Local Algorithms Using Non-local Communication. In Proceedings of the 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), pages 295-296, 2010.

[Pel00] David Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM, 2000.

[PPO+15] Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. Parallel Correlation Clustering on Big Graphs. In Advances in Neural Information Processing Systems (NeurIPS), pages 82-90, 2015.

[Swa04] Chaitanya Swamy. Correlation Clustering: Maximizing Agreements via Semidefinite Programming. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 526-527, 2004.

[Whi12] Tom White. Hadoop: The Definitive Guide. O'Reilly Media, 2012.

[ZCF+10] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster Computing with Working Sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2010.