Work-Optimal Parallel Minimum Cuts for Non-Sparse Graphs

Abstract

We present the first work-optimal polylogarithmic-depth parallel algorithm for the minimum cut problem on non-sparse graphs. For m\geq n^{1+\epsilon} for any constant \epsilon>0, our algorithm requires O(m \log n) work and O(\log^3 n) depth and succeeds with high probability. Its work matches the best O(m \log n) runtime for sequential algorithms [MN STOC 2020, GMW SOSA 2021]. This improves the previous best work by Geissmann and Gianinazzi [SPAA 2018] by O(\log^3 n) factor, while matching the depth of their algorithm. To do this, we design a work-efficient approximation algorithm and parallelize the recent sequential algorithms [MN STOC 2020; GMW SOSA 2021] that exploit a connection between 2-respecting minimum cuts and 2-dimensional orthogonal range searching.


arXiv [cs.DS]

Andrés López-Martínez, Sagnik Mukhopadhyay, Danupon Nanongkai
KTH Royal Institute of Technology, Sweden

Introduction

Computing the minimum cut, or min-cut, is a fundamental graph problem. Given a weighted undirected graph G = (V, E), a cut is a set of edges whose removal disconnects G. The min-cut problem is to find the cut with minimum total edge weight. Throughout, we let n = |V| and m = |E|. Unless stated otherwise, all algorithms are randomized and succeed with high probability.

In the sequential setting, nearly-linear time algorithms have been known since the breakthrough work of Karger [Kar00]. His algorithm requires O(m (log^2 n) log(n^2/m) / log log n + n log^6 n) time. This bound has been recently improved to O(m (log^2 n) / log log n + n log^6 n) [GMW19, MN20]. By simplifying the framework of [MN20], this was improved to O(m (log n)/ǫ + n ǫ (log n)/ǫ + n log n) for any ǫ > 0. When m = n^{1+Ω(1)}, this bound is the best in the sequential setting. (Note that better bounds exist when the input graph is very sparse or is unweighted and simple [GMW19, HRW17, GNT20]. These cases are relevant to our results.)

When it comes to parallel algorithms, no algorithm with nearly-linear work and polylogarithmic depth was known until the recent result by Geissmann and Gianinazzi [GG18], where they obtain an algorithm with O(m log^4 n) work and O(log^3 n) depth. The work of this algorithm is higher than that of Karger’s sequential algorithm by an Ω(log n) factor, and it was left open in [GG18] whether a work-optimal algorithm with polylogarithmic depth exists.

Our results.

We present the ﬁrst work-optimal polylogarithmic-depth parallel algorithm for theminimum cut problem on non-sparse graphs. For any ǫ ≥ / log n , our algorithm requires O (log n )depth while its work is O (cid:18) m log nǫ + n ǫ log nǫ + n log n (cid:19) For non-sparse graphs ( m ≥ cn log n log log n for some large constant c ), the work of our algorithmmatches the best O ( m (log n ) /ǫ + n ǫ (log n ) /ǫ + n log n ) runtime for sequential algorithms[MN20, GMW20]. When m = n , the work of our algorithm can be simpliﬁed to O ( m log n )and improves the previous best work by Geissmann and Gianinazzi [GG18] by an Ω(log n ) factorwhile matching the depth of their algorithm. Remark:

Concurrently and independently from this paper, [AB21] recently achieved a parallelalgorithm with O (log n ) depth and O ( m log n ) work by parallelizing the sequential algorithm of[GMW19]. The work of this algorithm is smaller than ours for sparse graphs ( m = O ( n log n )).This is work-optimal when m = O ( n log n ) (but not when m = ω ( n log n )). Table 1 comparesour result with other results.To achieve our result, one challenge is to ﬁrst solve the problem approximately . This is crucialbecause all known nearly-linear time sequential algorithms require to compute the so-called skeleton and to compute the skeleton we need an O (1)-approximate value of the min-cut. While a linear-time(2+ ǫ )-approximation algorithm was known in the sequential setting [Mat93], no O (1)-approximationalgorithm that requires polylogarithmic depth and work less than the exact algorithm of [GG18] We say that an algorithm succeeds with high probability (w.h.p.) if it outputs a correct answer with probabilityat least 1 − /n c for an arbitrarily large constant c . For sparse input graphs, the best bound of O ( m log n ) is due to [GMW19] (improving from Karger’s O ( m log n )bound). For simple graphs, the best bounds are O ( m log n ) and O ( m + n log n ) [GNT20]. This is because m (log n ) /ǫ + n ǫ (log n ) /ǫ > n log n when m ≥ cn log n log log n . Otherwise, m (log n ) /ǫ ≤ n log n implies that ǫ ≥ c (log log n ) / log( n ). But then n ǫ (log n ) /ǫ ≥ n c log log( n ) / log( n ) log n > n log n forlarge enough constant c (note that ǫ < When m = n log ( n ) g ( n ) for some growing function g , we have m log nǫ + n ε log nε + n log n < m log n bysetting ǫ to c (log log( g ( n )) / log n for some constant c . 
Source    Work                      Remark
[GG18]    O(m log^4 n)              Old record
Here      O(m log n + n^{1+ǫ})      work-optimal on non-sparse graphs (for constant ǫ)
[AB21]    O(m log^2 n)              work-optimal on sparse graphs (independent)

Table 1: Bounds for randomized parallel algorithms computing the minimum cut with high probability. All algorithms require O(log^3 n) depth.

was known in the parallel setting. In this paper, we show an O(1)-approximation algorithm with O(m log n + n log n) work and O(log^3 n) depth. The algorithm can be modified to obtain a (1 + ε)-approximation of the min-cut (for any small constant ε) without any change in the performance guarantee. This algorithm might be of independent interest.

Another bottleneck in the previous parallel algorithm of [GG18] is solving the so-called two-respecting cut problem, where the randomized algorithm of [GG18] requires O(m log^3 n) work and O(log^2 n) depth. The work does not match the then-best time complexity of O(m log^2 n) in the sequential setting [Kar00]. In this paper, we obtain a work-optimal algorithm for this problem. Our algorithm is deterministic and requires O(m/ǫ + n ǫ (log n)/ǫ + n log n) work and O(log n) depth. Its work matches that of the sequential algorithm of [MN20, GMW20]. To do this, we parallelize the algorithm of [MN20] and its simplification in [GMW20], which exploit a connection between the 2-respecting min-cut problem and 2-dimensional orthogonal range searching.

Organization.

We review the necessary prerequisites in Section 2. In Section 3, we provide the parallel algorithm for approximating min-cut in a weighted graph. Finally, in Section 4 we design the parallel algorithm for computing the exact minimum cut in a weighted graph.

In this section, we introduce the model of computation and briefly state the main ideas of Karger’s [Kar00] and Mukhopadhyay-Nanongkai’s [MN20] min-cut algorithms. Then we review two important concepts (graph skeletons and connectivity certificates) that are extremely useful for our algorithms.

We use the work-depth model [SV82, Ble96] (sometimes called work-span model [CLRS09] or work-time framework [Já92]) to design and analyze the theoretical performance of our algorithms. The work of an algorithm is defined as the total number of operations used, similar to the time complexity in the sequential RAM model. The depth is the length of the longest sequence of dependent operations. We assume concurrent reads and writes are supported. By Brent’s scheduling theorem [Bre74], an algorithm with work W and depth D takes O(W/p + D) time when p processors are available. A parallel algorithm is work-optimal if it performs the same number of operations (up to a constant) as the best known sequential algorithm for the same problem.

The basic idea behind Karger’s randomized algorithm is to exploit a connection between minimum cuts and greedy packings of spanning trees.

Definition 2.1 ((Greedy) tree packing). Let G = (V, E) be a weighted graph. A tree packing S of G is a multiset of spanning trees of G, where each edge e ∈ E is loaded with the total number of trees containing e. We say that S = (T_1, ..., T_k) is a greedy tree packing if each T_i is a minimal spanning tree with respect to the loads induced by {T_1, ..., T_{i−1}} and no edge has load greater than the weight of that edge.

Such a packing S of G has the important property that the minimum cut in G 2-respects (that is, cuts at most two edges of) a constant fraction of the trees in S. Therefore, if we try out trees from the packing and, for each of them, find the minimum cut that 2-respects it, we will eventually stumble upon the minimum cut of G. To make this efficient, Karger uses random sampling techniques to construct a sparse subgraph H of G on which a greedy tree packing can be computed in less time. The schematic description of the algorithm is given below.

Algorithm 2.2 Schematic of Karger’s algorithm.
1. Find a sparse subgraph H of G that preserves the min-cut with arbitrary precision.
2. Find a tree packing in H of weight O(λ′) = O(log n) that w.h.p. contains a tree that 2-constrains the min-cut of H.
3. For each tree in the packing, find the smallest cut in G that 2-respects the tree.

We refer to steps 1 and 2 from the schematic as the tree packing step, and to the problem of finding a 2-respecting min-cut in step 3 as the cut-finding step.

We now briefly describe the algorithm of Mukhopadhyay and Nanongkai for serving the cut-finding step.

Let T be a spanning tree of the input graph G. The algorithm rests on the fact that the cut determined by tree edges e and f is unique, and consists of edges (u, v) ∈ G such that exactly one of e and f belongs to the uv-path in T. Let cut(e, f) denote the value of such a cut for any pair e, f ∈ T.

Assume for now that T is a path. In matrix terms, the minimum 2-respecting cut problem can be restated as finding the smallest element in the matrix M of dimension (n − 1) × (n − 1), where the (i, j)-th entry of M is determined by M[i, j] = cut(e_i, e_j), with e_i and e_j the i-th and the j-th edges of T, respectively. A key insight of Mukhopadhyay and Nanongkai is in observing that the matrix M satisfies a monotonicity property (see [MN20, Sec. 3.1] or [GMW20, Sec. 3.2] for more details). One can take advantage of this property to find the minimum entry of M more efficiently than by inspecting up to O(n^2) entries in the worst case.

This approach is generalized to handle any tree T by decomposing it into a set P of edge-disjoint paths [ST83], each of which satisfies a monotonicity property (when considering all other tree edges as collapsed). Observe that it can also happen that the two tree edges e and f with minimum value cut(e, f) each belong to a different path in P. Hence, the algorithm must also consider pairs of paths, of which there can be O(n^2) many. But another significant contribution of Mukhopadhyay and Nanongkai is in observing that one does not need to consider every pair of paths p, q ∈ P. They show that inspecting only a small subset of path pairs suffices to find the minimum 2-respecting cut of T in G.

On a very high level, the algorithm can be summarized into three steps as in the schematic below.

Algorithm 2.3 Schematic of MN’s algorithm.
1. Decompose T into a set of edge-disjoint paths P.
2. Compute the 2-respecting min-cut for single paths in P.
3. Compute the 2-respecting min-cut among path-pairs.

In Section 4.1 we formally argue that each step of the algorithm can be parallelized with optimal work and low depth.
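The characterization of cut(e, f) used above (a graph edge (u, v) is cut exactly when one of e and f lies on the uv-path in T) can be checked on small inputs with a brute-force sketch. This is only an illustration and not the algorithm of [MN20]: it enumerates every single tree edge and every pair, so it runs in polynomial rather than near-linear time. The `parent` map (root mapped to `None`), the convention of naming a tree edge by its child endpoint, and the function names are assumptions made for this example.

```python
from itertools import combinations


def tree_path_edges(parent, u, v):
    """Tree edges on the u-v path; each edge is named by its child endpoint."""
    ancestors_of_u = []
    x = u
    while x is not None:
        ancestors_of_u.append(x)
        x = parent[x]
    position = {node: i for i, node in enumerate(ancestors_of_u)}
    edges = set()
    x = v
    while x not in position:  # climb from v until reaching an ancestor of u
        edges.add(x)
        x = parent[x]
    edges.update(ancestors_of_u[:position[x]])  # x is the lowest common ancestor
    return edges


def min_two_respecting_cut(parent, graph_edges):
    """Brute force over all single tree edges and all pairs of tree edges."""
    tree_edges = [v for v, p in parent.items() if p is not None]
    paths = [(tree_path_edges(parent, u, v), w) for u, v, w in graph_edges]
    best = None
    for e in tree_edges:
        # A graph edge crosses the cut defined by e alone iff e lies on its tree path.
        value = sum(w for path, w in paths if e in path)
        if best is None or value < best[0]:
            best = (value, (e,))
    for e, f in combinations(tree_edges, 2):
        # cut(e, f): exactly one of e and f lies on the u-v tree path.
        value = sum(w for path, w in paths if (e in path) != (f in path))
        if value < best[0]:
            best = (value, (e, f))
    return best
```

For a triangle on vertices 0, 1, 2 with the path 0-1-2 as spanning tree and a heavy edge (0, 2), the cheapest cut isolates vertex 1 and therefore needs both tree edges, which is the case the pairwise search is meant to capture.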

In this section we review useful sparsification tools introduced by Karger [Kar94] and Nagamochi and Ibaraki [NI92b] that are used in both our exact and approximation min-cut algorithms.

Let G = (V, E) be an unweighted multigraph. A skeleton is defined by Karger [Kar99, Kar94] as a subgraph of G on the same set of vertices, obtained by placing each edge e ∈ G in the skeleton independently with probability p. If G is weighted, a skeleton can be constructed by placing each edge e ∈ G in the skeleton with weight drawn from the binomial distribution with probability p and number of trials equal to the weight w(e) in G.

An important result that rests at the core of our algorithms is the following:

Theorem 2.4 ([Kar99, Kar94]). Let G be a weighted graph with minimum cut λ and let p = 3(d + 2)(log n)/(ε^2 γλ), where ε ≤ 1, γ ≤ 1 and both ε and γ are Θ(1). Then, a skeleton H of G constructed with probability p satisfies the following properties with high probability:
1. The minimum cut in H has value λ′ within (1 ± ε) times its expected value pλ, which is λ′ = O(log n/ε^2).
2. The value of the minimum cut in G corresponds (under the same vertex partition) to a (1 ± ε) times minimum cut of H.

(Footnote to item 2: This statement hides the fact that the approximation is also scaled by the inverse of p. For more details see [Kar94, Lemma 6.3.2].)

In simpler terms, by sampling edges of G with probability p = O(log n/λ), we obtain a graph H where (i) the min-cut is O(log n), and (ii) the minimum cut of G corresponds to (under the same vertex partition) (1 ± ε)-times the minimum cut of H.

A related concept of importance is the sparse k-connectivity certificate. We use this in our algorithms to bound the total size of a graph. Unlike a skeleton, where all cuts are approximately preserved, the k-connectivity certificate preserves cuts of values less than k exactly, but cuts of a higher value are not preserved at all.

Definition 2.5. Given an unweighted graph G = (V, E), a sparse k-connectivity certificate is a subgraph H of G with the properties:
1. H has at most kn edges, and
2. H contains all edges crossing cuts of value k or less.

The definition extends to weighted graphs by replacing each edge of weight w with a set of w unweighted edges with the same endpoints. Hence, the bound in size becomes a bound on the total weight of the remaining edges.

There is one simple algorithm by Nagamochi and Ibaraki [NI92b, NI92a] (that is enough for our purposes), which computes a sparse k-connectivity certificate of a graph G as follows. Compute a spanning forest F_1 in G; then compute a spanning forest F_2 in G − F_1; and so on, continue computing spanning forests F_i in G − ∪_{j=1}^{i−1} F_j until F_k is computed. It is easy to see that the graph H_k = ∪_{i=1}^{k} F_i is k-connected and has O(kn) edges.

One can extend the algorithm to the parallel setting by using, e.g., the O(m + n) work and O(log n) depth randomized algorithm of Halperin and Zwick [HZ01] to find a spanning forest, from which the following bounds are obtained (see e.g. [LM20, Sec. 2.3]).

Theorem 2.6. Given an undirected weighted graph G = (V, E), a k-connectivity certificate can be found w.h.p. using O(k(m + n)) work and O(k log n) depth.

We now turn to approximating the min-cut; the goal is to prove Theorem 3.1 below.
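Returning to the Nagamochi and Ibaraki construction described above, the repeated peeling of spanning forests can be sketched sequentially as follows. This is a minimal sketch under stated assumptions: it uses a plain union-find scan in place of the parallel spanning-forest routine of [HZ01], it treats the input as an unweighted multigraph given as a list of vertex pairs, and the function names are hypothetical.

```python
def spanning_forest(num_vertices, edges, candidate_ids):
    """Ids of candidate edges that form a spanning forest (sequential union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for edge_id in candidate_ids:
        u, v = edges[edge_id]
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append(edge_id)
    return forest


def sparse_certificate(num_vertices, edges, k):
    """Union of up to k successively peeled spanning forests F_1, ..., F_k."""
    remaining = list(range(len(edges)))
    certificate_ids = []
    for _ in range(k):
        forest = spanning_forest(num_vertices, edges, remaining)
        if not forest:
            break  # no edges left to peel
        chosen = set(forest)
        certificate_ids.extend(forest)
        remaining = [i for i in remaining if i not in chosen]
    return [edges[i] for i in certificate_ids]
```

Each forest has at most n − 1 edges, so the returned edge list has at most k(n − 1) edges, in line with the size bound of Definition 2.5.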

Theorem 3.1. A (1 ± ε)-approximation of the minimum cut value of G can be computed with work O(m log n + n log n) and depth O(log^3 n).

For simplicity of the exposition, we prove this theorem when ε = 1/3. However, it is not hard to adapt the proof for any small constant ε. Throughout this section, we switch between two representations of G = (V, E): (i) as a weighted graph, and (ii) as an unweighted multigraph where we replace an edge e = (u, v) ∈ E with weight w(e) by w(e) many unweighted copies of e between the vertices u and v. Whenever we use the term copies of an edge, we mean the unweighted multigraph representation, and for referring to the original edges of the weighted graph G, we use the term weighted edges. By sampling a weighted edge e with probability p, we mean sampling each of the w(e) many unweighted copies of e with probability p independently. We use the following version of a concentration bound.

Lemma 3.2 (Concentration bound). Let X_1, ..., X_k be i.i.d. boolean random variables and X = Σ_i X_i. Let us denote E[X] = µ. Then for any ε ∈ (0, 1),

Pr[X ∉ (1 ± ε)µ] ≤ 2 · exp(−µ · ε^2/3).

Consider the following family of graphs generated from G by repeated sub-sampling of the unweighted copies of the edges of G.

Definition 3.3 (Sampled hierarchy). Let G = (V, E) be a weighted graph with total edge weight W, and let k be an integer such that 2^k = W. For any i ∈ {1, ..., k}, define G_i to be a subgraph of G_{i−1} where G_i includes every unweighted edge of G_{i−1} independently with probability 1/2. Also, define G_0 = G viewed as an unweighted multigraph. We denote the family {G_i}_{i ∈ {0, ..., k}} to be the sampled hierarchy of G.

For G with min-cut value λ, if we sample edges with probability p, then with probability 1 − o(1) we know that the resulting sampled graph will have min-cut value in the range pλ(1 ± ε) (given that pλ = Ω(log n)). We define the skeleton sampling probability in the following way:

Definition 3.4.

For a graph G = (V, E) with min-cut value λ, define the skeleton sampling probability p_s = 100 log n/λ.

Definition 3.5 (Skeleton layer). Given a sampled hierarchy, we denote the skeleton layer as the layer s such that 2^{−s} = p_s.

Note that, at this point, we do not know the value of λ of G, but we do know the following facts, which follow from a standard concentration argument (Lemma 3.2, setting ε = 1/4, ε = 1/5 and ε = 1/3, respectively).

Claim 3.6. If we sample the edges of G with probability p_s, then the value of the min-cut in G_s is in [75 log n, 125 log n] with high probability.

Claim 3.7. If we sample edges of G with probability at least 2p_s, then the value of the min-cut in the sampled graph is at least 160 log n. Similarly, if we sample with probability at most p_s/2, then the value of the min-cut in the sampled graph is at most 67 log n.

A word of caution regarding the proof of Claim 3.7: For really small sampling probability, we may need to use the additive form of the concentration bound, which says the following: For any ε > 0,

Pr[X > µ + ε] ≤ exp(−2ε^2/k),

where we use the same notation as that in Lemma 3.2. Hence, if we can compute the min-cut value in every G_i in the sampled hierarchy, we can find a (1 ± 1/4) approximation of the skeleton probability p_s and, thereby, a (1 ± 1/3) approximation of λ. In the rest of this section, we elaborate on how to find these min-cut values.

Of course, computing the minimum cut naïvely in each G_i is not work efficient. Towards designing an efficient algorithm, we first define the critical layer in the hierarchy for every weighted edge e.

Definition 3.8 (Critical layer). For a weighted edge e ∈ E(G) with weight w(e), define the critical layer t_e in the sampled hierarchy w.r.t. the edge e as the largest integer such that

w(e)/2^{t_e} ≥ 500 log n.

Given the notion of the critical layer, we modify the sampled hierarchy as described in Definition 3.9.
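To make Definition 3.8 concrete, the following sketch computes the critical layer of a single weighted edge. It rests on assumptions the text does not fix: logarithms are taken base 2, an edge lighter than the threshold is assigned layer 0, and the names `critical_layer` and `expected_copies` are hypothetical.

```python
import math


def critical_layer(weight, n, c=500):
    """Largest integer t >= 0 with weight / 2**t >= c * log2(n).

    Edges lighter than the threshold get layer 0 (an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log2(n) is positive")
    threshold = c * math.log2(n)
    layer = 0
    while weight / 2 ** (layer + 1) >= threshold:
        layer += 1
    return layer


def expected_copies(weight, layer):
    """Expected number of unweighted copies of an edge that survive to `layer`."""
    return weight / 2 ** layer
```

With n = 4 the threshold is 500 · 2 = 1000, so an edge of weight 8000 has critical layer 3, where 1000 copies are expected to survive.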

Definition 3.9 (Truncated hierarchy). Given a graph G = (V, E) and its corresponding sampled hierarchy {G_i}_i, we obtain the truncated hierarchy by removing all unweighted copies of any edge e other than what is already present in G_{t_e} from the layers G_i, i < t_e. We denote this hierarchy as {G^trunc_i}_i.

The intuition behind truncating the sampled hierarchy in this way is the following: In the graph G_s of the sampled hierarchy (where s is the skeleton layer), we know that the min-cut value is at most 125 log n (Claim 3.6) and, hence, the number of unweighted copies of any weighted edge e participating in the min-cut of G_s is at most 125 log n. Hence it should not be a problem to remove any extra unweighted copies of e from the hierarchy, as they are useless as far as the min-cut of G_s is concerned. We make this intuition concrete in the next section.

First, we bound the number of unweighted copies of any edge e available in the truncated hierarchy. Note that, because of our definition of the critical layer, the expected number of copies of an edge e in layer t_e is 500 log n. The following claim follows from a standard concentration argument (Lemma 3.2, by setting ε = 1/5).

Claim 3.10. The number of unweighted copies of any edge e in the critical layer t_e of the truncated hierarchy is in [400 log n, 600 log n] with high probability.

Note that we need a guarantee that the min-cut in G^trunc_s is well separated from the min-cut values of the G^trunc_i’s above and below the skeleton layer in order for us to find out where the skeleton layer is. The next three claims give us this guarantee.

Claim 3.11. For the skeleton layer s, the value of the min-cut in G^trunc_s is in [75 log n, 125 log n] with high probability.

Proof. This follows from the facts that (i) in the sampled hierarchy {G_i}_i, the value of the min-cut in G_s is in the same range (Claim 3.6), and (ii) for every edge e taking part in the min-cut, t_e < s. This is because, for any such edge e, the number of unweighted copies of e in the min-cut of G_s is at most 125 log n w.h.p. whereas, in the critical layer t_e, the number of such copies is at least 400 log n w.h.p.

A similar argument also proves the following claim.

Claim 3.12. For layers i > s, the value of the min-cut in G^trunc_i is at most 67 log n.

For layers numbered less than s, we have to be careful because we may have removed edges in the truncated hierarchy. Nevertheless, we can prove the following claim, which is enough for our purpose.

Claim 3.13. For layer i < s, the value of the min-cut in G^trunc_i is at least 160 log n.

Proof. An exactly similar argument for the layer s − 1 shows that the min-cut of G^trunc_{s−1} is at least 160 log n. The claim follows from the fact that the min-cut value cannot decrease, as G^trunc_i ⊆ G^trunc_{i−1} and the min-cut is a monotone function.

Algorithm 3.14 Truncated (& Exclusive) hierarchy computation
for every edge e in G do
    Compute the critical layer t_e.
for i = 0 to k do
    for all edges e in G with t_e = i do
        Sample binomially from B(w(e), 2^{−i}). Let the value of the random variable be X.
        Include X many unweighted copies of e in G^trunc_i.
    if i > 0 then
        Sample each edge of G^trunc_{i−1} with probability 1/2.
        Set Ĝ_{i−1} = Ĝ_{i−1} \ G^trunc_i.

Claim 3.15.

The truncated hierarchy can be computed (by Algorithm 3.14) with at most O(m log n) work and O(log n) depth.

Proof. We assume that the random variables X can be sampled from their corresponding binomial distribution in O(log n) work. (Footnote: This follows from [KS88], where the authors show that a random variable from B(p, N) can be sampled in O(Np + log N) work and similar depth with high probability. For our purpose, this is O(log n) as we sample only at the critical layer.) For every edge e, we need to sample unweighted copies binomially in layer t_e. This requires O(m log n) amount of work across all layers of the truncated hierarchy. In addition, the total number of edges across the hierarchy is at most O(m log n). This is because, for each edge e in G, at most O(log n) many unweighted copies are present across the hierarchy with high probability. Hence the total work is O(m log n). The depth of the computation is bounded by the depth of the hierarchy and the depth required to sample binomial random variables, which is O(log n).

Even after obtaining the truncated hierarchy, finding the min-cut in every G^trunc_i can turn out to be expensive. We get around this problem by constructing an O(log n)-cut certificate for each G^trunc_i and finding the min-cut on those certificates. This reduces the work to O(n poly log n), as each such certificate has only O(n poly log n) many edges.

Definition 3.16 (Exclusive hierarchy). Given a truncated hierarchy {G^trunc_i}_i, we define the exclusive hierarchy {Ĝ_i}_i as follows:
• Ĝ_k = G^trunc_k, and
• Ĝ_i = G^trunc_i \ G^trunc_{i+1},
where G \ H for two graphs G and H on the same set of vertices includes edges that are exclusively present in G and are not in H.

The exclusive hierarchy can be computed while computing the truncated hierarchy at no extra cost (see Algorithm 3.14). For certificate construction, the main idea is to come up with a certificate H_i for each Ĝ_i and use ∪_{j≥i} H_j as a certificate for G^trunc_i. The following algorithm does exactly that with strict budgeting on the number of times a weighted edge e participates in the certificate computation.

Algorithm 3.17

Certificate hierarchy computation
for every weighted edge e do
    Initialize count_e to 400 log n.
for i = k to 0 do
    Initialize sfcount = 0 and H_i = ∅.
    while sfcount ≤ 200 log n and Ĝ_i ≠ ∅ do
        Remove from Ĝ_i all copies of any edge e such that count_e = 0.
        Find a spanning forest F of Ĝ_i.
        For every edge e in Ĝ_i, set count_e = count_e − 1.
        Set Ĝ_i = Ĝ_i \ F, H_i = H_i ∪ F, and sfcount = sfcount + 1.

Algorithm 3.17 finds cut-certificates starting from Ĝ_k and moving upward in the exclusive hierarchy. In each iteration, it finds at most 200 log n many spanning forests, which are counted by the variable sfcount. Note that, for any weighted edge e, the associated value count_e decreases in each spanning forest F computation irrespective of whether any copy of e is included in F or not. The only way to stop the decrement of count_e in any iteration i is to include all unweighted copies of e present in Ĝ_i in the certificate H_i. This makes sure that each edge e participates (and is not necessarily included) in at most count_e many spanning forest computations.

Claim 3.18.

For any i, ∪_{j≥i} H_j is a 200 log n-cut-certificate for G_i.

Proof. Consider any i and let C be a cut of value at most 200 log n in G_i. We need to show that this cut is maintained in ∪_{j≥i} H_j.

Consider any weighted edge e ∈ C and its count_e at the i-th iteration. If count_e < 200 log n, then it means that ∪_{j>i} H_j already contains 200 log n unweighted edges from C. This can be argued in the following way: Consider any spanning forest F in ∪_{j>i} H_j during whose computation count_e decreased. Either F includes an unweighted copy of e, or F contains another unweighted edge crossing the cut C. In either case, F contains at least 1 edge from C. Hence, each decrement of count_e corresponds to one unweighted edge crossing C included in ∪_{j>i} H_j.

If, on the other hand, count_e ≥ 200 log n, then this edge is going to take part in all 200 log n spanning forest computations in the i-th iteration. Hence H_i will contain 200 log n edges from C.

We next compute the work and depth requirement for computing the certificate hierarchy.

Claim 3.19.

Given the hierarchy Ĝ_i, the certificates {H_i}_i can be computed in work O(m log n) and depth O(log^3 n) with high probability.

Proof. By the design of the algorithm, each edge takes part in at most 400 log n spanning forest computations. Hence the total work is bounded by 400 log n × m = O(m log n).

Calculating the depth is straightforward. Note that there are O(log n) many layers in the certificate hierarchy, each amounting to O(log n) many spanning forest computations. We know that the depth required for computing a spanning forest is O(log n) [HZ01]. Multiplying the three factors gives O(log^3 n) depth.

Claim. The total work needed to compute the min-cut on ∪_{j≥i} H_j for all i is at most O(n log n) and the depth required is O(log^3 n).

Proof. First note that the number of edges in ∪_{j≥i} H_j for any i is O(n poly log n). We can run either the algorithm designed by [GG18] or the algorithm designed in Section 4 to obtain the required work and depth. For the approximation of the min-cut (which is required in both of these algorithms), we can use the expected min-cut value in G_i. If we use the algorithm in Section 4 then, to compute O(log n) many min-cuts (one for each ∪_{j≥i} H_j), the work required is O(n log n) and the depth required is O(log^3 n), as we compute the O(log n) instances of min-cut in parallel. Using the algorithm of [GG18] gives a worse dependence on poly log n in terms of work, but the depth remains O(log^3 n).

In this section, we present our exact parallel minimum cut algorithm. Following Karger’s framework, the outline of the solution consists of (1) finding a tree packing and (2) for each tree in the packing, finding the corresponding 2-respecting min-cut.
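The two-part outline can be sketched as a map over the trees of the packing followed by a minimum. The sketch below is only a schematic under stated assumptions: `trees` stands for the output of the packing step and `two_respecting_min_cut` for the cut-finding step, both supplied by the caller as placeholders, and Python threads are used purely to show that the per-tree calls share no state.

```python
from concurrent.futures import ThreadPoolExecutor


def exact_min_cut(trees, two_respecting_min_cut, max_workers=4):
    """Smallest value returned by the cut-finding step over all trees of a packing.

    `trees` and `two_respecting_min_cut` are caller-supplied placeholders for
    the tree-packing step and the 2-respecting cut-finding step.
    """
    if not trees:
        raise ValueError("the packing must contain at least one tree")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The per-tree computations are independent, so they may run concurrently.
        values = list(pool.map(two_respecting_min_cut, trees))
    return min(values)
```

Any routine that maps one tree to the value of its minimum 2-respecting cut can be plugged in as the second argument.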
The tree computations in part (2) of the algorithm are independent of each other, hence they can be safely executed in parallel.

Let W_pack(m, n) and D_pack(m, n) be the work and depth needed to find an appropriate packing of O(log n) trees, and let W_rsp(m, n) and D_rsp(m, n) be the work and depth required to find a minimum 2-respecting cut of a spanning tree. The work and depth bounds of a parallel min-cut algorithm can be expressed as:

W_cut(m, n) = O(W_pack(m, n) + W_rsp(m, n) log n),   (1)
D_cut(m, n) = O(D_pack(m, n) + D_rsp(m, n)).   (2)

In the following, we give parallel algorithms for both the tree-packing and 2-respecting steps. For the former, we consider our improved approximation algorithm of Section 3 in the context of Karger’s original packing procedure [Kar00]. And for the latter, we rely on Mukhopadhyay and Nanongkai’s minimum 2-respecting cut algorithm as described in [GMW20]. Put together via equations (1) and (2), our parallel algorithms imply the following overall bounds.

Theorem 4.1. The minimum cut in a weighted graph can be computed w.h.p. using W_cut(m, n) = O(m log n + n log n) work and D_cut(m, n) = O(log^3 n) depth.

In the following, we first describe how to achieve this for general (weighted) graphs (Sections 4.1 and 4.2). For non-sparse input graphs, in Section 4.3 we show how to further improve the work bound to O(m log n + n^{1+ǫ}).

We begin with the parallelization of Mukhopadhyay and Nanongkai’s simplified

Theorem 4.2.

Given a spanning tree T of a graph G, the minimum cut that 2-respects T can be found w.h.p. using O(m log m + n log n) work and O(log n) depth.

We now describe how to implement each step of MN’s algorithm (recall the schematic from Section 2.3) in parallel and obtain the claimed work and depth complexities. In the following, we call the query that asks for the value cut(e, f) given tree edges e and f a cut query.

The algorithm begins by partitioning the tree T into a collection of edge-disjoint paths P with the following property:

Property 4.3 (Path Partition). Any root-to-leaf path in T intersects O(log n) paths in P.

Geissmann and Gianinazzi give a parallel algorithm [GG18] to compute such a decomposition (a so-called bough decomposition), which we can use as a black box in our algorithm.

Lemma 4.4 ([GG18, Lemma 7]). A tree with n vertices can be decomposed w.h.p. into a set of edge-disjoint paths P satisfying Property 4.3 using O(n log n) work and O(log n) depth.

Next, we will need a data structure for the tree decomposition which, on a node query, provides the set of paths in P which intersect the root-to-leaf path ending at that node. For this we give the following lemma.

Lemma 4.5.

Let P be a set of edge-disjoint paths obtained by bough decomposition of a tree T. Given a tree T of n nodes and root r, there is a data structure that can preprocess T with O(n log n) work and O(log n) depth, and supports the following operation using O(log n) work and depth:

• Root-paths(u): given a node u, return an array of O(log n) disjoint paths P′ ⊆ P, such that every path p ∈ P′ belongs to the same path from the root of T to node u.

Proof. We first show how to construct the data structure, and then analyze the query operation.

(Footnote: This result is originally Las Vegas, but by an application of Markov’s inequality, it can easily be converted into a Monte Carlo algorithm.)

Preprocessing. We start by computing in parallel the postorder numbering of vertices in T via the Eulerian circuit technique [Já92]. This produces at each node u the value post(u) containing its rank in a postorder traversal of T. Next, we decompose tree T into boughs, or paths, as in Lemma 4.4. This produces an array of arrays A, where the element A[i] consists of the sequence of edges representing a bough in T, arbitrarily indexed by i. For each e ∈ A[i], we create the value bough(e) = i to let e know which bough it belongs to. This can be done trivially in parallel for every edge and every bough. The final step in the preprocessing is to sort the edges e = (u, p(u)) in each bough array A[i] with respect to post(p(u)) in descending order, such that edges at shallower levels in T appear first in the array. Note that A[i][0] will contain the edge closest to the root in bough array i.

Root-paths(u). Given a node u, we can find the desired boughs simply by walking up the tree from u towards the root r, and keeping track of the boughs seen. Let e = (u, p(u)), where p(u) denotes the parent of u. From the quantities defined during the construction, this can be done efficiently as follows:
1. Initialize i = bough(e).
2. Set ê = A[i][0] and append i to the result list L.
3. If ê is distinct from the root r, repeat step 2 with i = bough(p(ê)).
4. Return L.

Analysis.

For preprocessing, the step with the highest cost is the tree decomposition of Lemma 4.4, which uses O(n log n) work and O(log n) depth. All other steps are within the same work and depth bounds. The query algorithm performs only O(1) work for each bough it finds on the way up from edge e to the root of the tree. From Lemma 4.4, there can be at most O(log n) such paths, and the result follows.

After decomposing the tree, the edges e, f ∈ T that minimize cut(e, f) either lie within the same path p ∈ P, or in two distinct paths p, q ∈ P. We now explain how to solve both cases efficiently in parallel.

Let p be a path in P of length ℓ, and let M_p be the (ℓ − 1) × (ℓ − 1) matrix defined by M_p[i, j] = cut(e_i, e_j), with e_i and e_j the i-th and j-th edges of p. One key contribution of Mukhopadhyay and Nanongkai is in observing that the matrix M_p is a Partial Monge matrix. That is, for any i ≠ j, it holds that M_p[i, j] − M_p[i, j + 1] ≥ M_p[i + 1, j] − M_p[i + 1, j + 1]. To find the minimum entry in the matrix, MN give a divide-and-conquer algorithm that requires the computation of only O(ℓ log ℓ) many cut queries. This algorithm can be shown to have a simple parallel implementation with optimal O(ℓ log ℓ · w_c(m)) work and O(d_c(m) · log ℓ + log ℓ) depth, where w_c(m) and d_c(m) are the work and depth required to compute a single cut query (see [LM20] for details).

We can instead use an algorithm by Aggarwal et al. [AKPS90, Thm. 2.3] that requires inspecting only O(ℓ log ℓ) many entries of a Partial Monge matrix using O(log ℓ) depth. In Lemma 4.9 below, we show that a single cut query can be computed with optimal w_c(m) = O(log n) work and d_c(m) = O(log n) depth. Thus, we can find the minimum cut determined by two edges of p with O(ℓ log ℓ · log n) work and O(log ℓ · log n) depth. Since paths in P are edge-disjoint, doing this for every path p ∈ P in parallel adds up to the following bounds.

Figure 1: Example graph and spanning tree illustrating the concept of interest. Tree edges are represented by solid lines, and non-tree edges are represented by dashed lines. The graph is unweighted and the tree is rooted at r. Observe that edge e is cross-interested in f, f is cross-interested in e, and e′ is down-interested in f.

Lemma 4.6.

Finding the minimum 2-respecting cut among all paths p ∈ P can be done in parallel with O(n log n) work and O(log n) depth.

Now we look at the case when the tree edges e, f ∈ T that minimize cut(e, f) belong to different paths p, q ∈ P of combined length ℓ.

First observe that collapsing the tree edges e′ ∉ p ∪ q produces a residual graph with p ∪ q as its spanning tree, without changing the cut values determined by one edge being in p and another in q. We could then run an algorithm similar to that of the previous section and find the smallest 2-respecting cut which respects one edge in p and another in q. Doing this for every possible path pair in P would find the tree edges e and f that minimize cut(e, f), but it is not efficient, as the number of possible path pairs is O(n^2). Mukhopadhyay and Nanongkai solve this by showing that one must only inspect a small subset of interested path pairs. (See Figure 1 for an illustration of the notion of interest.)

Definition 4.7 (Interest [MN20, GMW20]). Let T_e denote the sub-tree of T rooted at the lower endpoint (furthest from the root) of e, and let w(T_e, T_f) be the total weight of edges between T_e and T_f. In particular, we denote w(T_e) = w(T_e, T \ T_e). Then:

1. an edge e ∈ T is said to be cross-interested in an edge f ∈ T \ T_e if w(T_e) < 2w(T_e, T_f),
2. an edge e ∈ T is said to be down-interested in an edge f ∈ T_e if w(T_e) < 2w(T_f, T \ T_e),
3. an edge e ∈ T is said to be interested in a path p ∈ P if it is cross-interested or down-interested in some edge of p,
4. two paths p, q ∈ P are said to be an interested path pair if p has an edge interested in q and vice versa.

Claim 4.8 ([MN20, GMW20]). For any tree edge e:

1. all the edges that e is cross-interested in form a single path in T going down from the root of T to some node c_e,
2. all the edges that e is down-interested in form a single path in T going down from e to some node d_e.
Observe that, from Property 4.3 and Claim 4.8, an edge is interested in at most O(log n) paths from P. Hence, the total number of interested path pairs is O(n log n). Finding such a subset of paths is one of the main insights of Gawrychowski, Mozes, and Weimann's simplification [GMW20] of MN's original algorithm. They do this through a centroid decomposition of the tree T which, for every edge e ∈ T, serves to guide the search for the nodes c_e and d_e that delimit the paths in which an edge can be interested. Once these nodes have been identified, using the data structure of Lemma 4.5 one may find all interested edge pairs.

Before looking at the parallel implementation of this procedure, however, we give the following lemma to efficiently determine in parallel whether an edge e is interested in another edge f. This also serves to compute the cut query cut(e, f) with optimal work and low depth.

Lemma 4.9.

Given a weighted graph G = (V, E) and a spanning tree T of G, there exists a data structure that can be preprocessed in parallel with O(m log m) work and O(log n) depth, and given two edges e and f, it can report the following with O(log n) work and O(log n) depth: (1) the value cut(e, f), (2) whether e is cross-interested in f, and (3) whether e is down-interested in f.

Proof. See the appendix.
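The three quantities reported by this lemma can be spelled out directly from their definitions. The following is a brute-force sequential reference, not the data structure of the lemma: it assumes the tree is given as a hypothetical `parent` array (with `parent[root]` set to `None`), identifies a tree edge with its lower endpoint, takes graph edges as `(a, b, weight)` triples, and uses the interest thresholds w(T_e) < 2w(T_e, T_f) and w(T_e) < 2w(T_f, T \ T_e) of [MN20, GMW20]. All function names are illustrative.

```python
def subtree(parent, u):
    """Nodes of the subtree rooted at u; parent[root] is None."""
    nodes = set()
    for v in range(len(parent)):
        x = v
        while x is not None and x != u:
            x = parent[x]          # climb towards the root
        if x == u:
            nodes.add(v)
    return nodes


def weight_between(edges, A, B):
    """Total weight of graph edges with one endpoint in A and the other in B."""
    return sum(w for a, b, w in edges
               if (a in A and b in B) or (a in B and b in A))


def cut_value(parent, edges, u, v):
    """cut(e, f) for e = (u, parent[u]) and f = (v, parent[v]), with u != v."""
    everything = set(range(len(parent)))
    Tu, Tv = subtree(parent, u), subtree(parent, v)
    if Tu <= Tv:                   # e lies inside T_f
        side = Tv - Tu
    elif Tv <= Tu:                 # f lies inside T_e
        side = Tu - Tv
    else:                          # disjoint subtrees
        side = Tu | Tv
    return weight_between(edges, side, everything - side)


def cross_interested(parent, edges, u, v):
    """True if w(T_e) < 2 w(T_e, T_f); only defined here for disjoint subtrees."""
    everything = set(range(len(parent)))
    Tu, Tv = subtree(parent, u), subtree(parent, v)
    if Tu & Tv:
        return False
    return weight_between(edges, Tu, everything - Tu) < 2 * weight_between(edges, Tu, Tv)


def down_interested(parent, edges, u, v):
    """True if f lies strictly inside T_e and w(T_e) < 2 w(T_f, T minus T_e)."""
    everything = set(range(len(parent)))
    Tu, Tv = subtree(parent, u), subtree(parent, v)
    if not Tv < Tu:
        return False
    outside = everything - Tu
    return weight_between(edges, Tu, outside) < 2 * weight_between(edges, Tv, outside)
```

Each call here scans all m edges, so this only serves as a correctness oracle for small inputs; the point of the lemma is to answer the same questions with range queries instead.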

Finding interested path pairs

We now look at the parallel procedure to find the subset of interested path pairs. To begin, we look at the definition of a centroid decomposition and its parallel construction.
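As a concrete point of reference for the definitions that follow, here is a minimal sequential sketch of a centroid decomposition. It uses the standard notion of a centroid (a node whose removal leaves components of at most half the size) and is not the parallel construction analyzed in this section. The adjacency-dictionary input and the function names are assumptions made for illustration.

```python
def find_centroid(adj, alive, start):
    """Return a centroid of the component of `start` among the nodes in `alive`."""
    parent = {start: None}
    order = [start]
    for u in order:                      # BFS; the list grows while we iterate
        for v in adj[u]:
            if v in alive and v not in parent:
                parent[v] = u
                order.append(v)
    size = {u: 1 for u in order}
    for u in reversed(order):            # accumulate sub-tree sizes bottom-up
        if parent[u] is not None:
            size[parent[u]] += size[u]
    total = len(order)
    for u in order:
        heaviest = total - size[u]       # component on the side of u's parent
        for v in adj[u]:
            if v in alive and parent.get(v) == u:
                heaviest = max(heaviest, size[v])
        if 2 * heaviest <= total:        # every component has at most total/2 nodes
            return u


def centroid_decomposition(adj):
    """Return {node: parent in the decomposition tree}, root mapped to None."""
    alive = set(adj)
    decomposition = {}
    stack = [(next(iter(adj)), None)]
    while stack:
        start, par = stack.pop()
        c = find_centroid(adj, alive, start)
        decomposition[c] = par
        alive.remove(c)                  # removing c splits the component
        for v in adj[c]:
            if v in alive:
                stack.append((v, c))
    return decomposition
```

Because every component shrinks by at least half after its centroid is removed, the resulting decomposition tree has O(log n) depth, which is the property the search for c_e and d_e relies on.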

Definition 4.10 (Centroid). Given a tree T, a node v ∈ T is a centroid if every connected component of T \ {v} consists of at most |T|/2 nodes.

We assume that T is a binary tree. Otherwise, simply replace each node of degree d with a binary tree of size O(d), where internal edges have weight ∞ and edges incident to leaves preserve their original weight. It is easy to see that this can be done in parallel with O(d) work and O(log d) depth by constructing each tree in a bottom-up fashion.

Definition 4.11 (Centroid decomposition). Given a tree T, its centroid decomposition consists of another tree T′ defined recursively as follows:

• the root of T′ is a centroid of T;
• the children of the root of T′ are centroids of the sub-trees that result from removing the centroid of T.

Lemma 4.12.

The centroid decomposition of a tree T of size n can be computed in parallel with optimal O(n log n) work and O(log n) depth.

Proof. First, we compute sub-tree sizes for each node u ∈ T. In parallel, this can be done with optimal O(n) work and O(log n) depth by applying the Eulerian circuit technique from parallel graph algorithms [J´92]. To find a centroid, we can simply test each node for the centroid property independently in parallel. This can be done with O(n) work and O(1) depth, as each node reads the sub-tree sizes of at most 3 other nodes. Next, we aggregate the results in a bottom-up fashion and return any of the identified centroids. This requires O(n) work and O(log n) depth. Since we recurse O(log n) times (in parallel) and at each level of the recursion perform only O(n) work and O(log n) depth, the overall parallel complexity follows.

As mentioned previously, to find the set of interested path pairs, we identify the nodes c_e and d_e for every edge e ∈ T.

Claim 4.13.

For every edge e ∈ T, we can identify the nodes c_e and d_e in parallel using O(n log n) work and O(log n) depth.

Proof. Let e′ be a tree edge, and let c be the centroid node of T. We obtain the node c_e′ (resp. d_e′) as follows: (1) for the (up to) three edges e_1, e_2 and e_3 incident to c, check if edge e′ is cross-interested (resp. down-interested) in each of them; (2) if e′ is interested in e_i, recurse on the subtree T_{e_i}; (3) else, return the current centroid node as c_e′ (resp. d_e′). The correctness of the algorithm rests on Claim 4.8.

Using Lemma 4.9 we can check with O(log n) work and O(log n) depth whether edge e′ is interested in e_1, e_2 and e_3. And since the depth of the recursion is O(log n) (as we assume T is binary), the procedure takes O(log n) work and O(log n) depth for a single edge e′. Since we may execute this algorithm independently in parallel for each edge, the main claim follows from multiplying the work by the number of edges n.

Checking interested path pairs.

For each pair of interested paths p, q ∈ P, we want to determine the list r = {e_1, e_2, …} of edges of p that are interested in q and the list s = {f_1, f_2, …} of edges of q that are interested in p. Observe that since the number of interested path pairs is only O(n log n), the total length of these lists is also O(n log n).

Definition 4.14 (Interest tuples). For a tree edge e ∈ p such that p ∈ P, given its terminal nodes c_e and d_e, we define its interest tuples T_e as the list of tuples of the form (p, q, e) such that q ∈ P intersects the unique path from c_e to the root of T, and likewise for d_e.

Claim 4.15.

The list of interest tuples Γ = ∪_{e ∈ T} T_e can be computed in parallel using O(n log n) work and O(log n) depth.

Proof. For an edge e ∈ T, given the nodes c_e and d_e, we can identify the set of paths P_e from P that e is cross- and down-interested in simply by issuing two Root-paths queries to the data structure of Lemma 4.5 with nodes c_e and d_e. This requires O(log n) work and depth per edge. By issuing all O(n) queries in parallel, we get the claimed complexity bounds. Preparing the interest tuple (p, q, e) on each iteration accounts for only O(1) time.

Lemma 4.16.

Given a sequence Γ of n tuples of the form (A, B, x), with A and B drawn from a totally ordered set S, there exists a parallel algorithm that generates a sequence F of tuples of the form (A, {x_1, …, x_a}, B, {y_1, …, y_b}) with O(n log n) work and O(log n) depth, such that for each x_i the tuple (A, B, x_i) appears in Γ, and for each y_i the tuple (B, A, y_i) appears in Γ.

Proof. See Lemma 4.16 in [LM20].

The application of Claim 4.15 together with Lemma 4.16 serves two purposes: (1) to identify the set of interested pairs, and (2) for each such pair p, q ∈ P, to identify the lists r and s of edges that actively participate in the interest relation. Once we have a sequence of tuples F as in Lemma 4.16, and thus the set of all interested path pairs, we can proceed to find the minimum 2-respecting cut as follows.

Consider any tuple (p, r, q, s) ∈ F. Let |r| and |s| denote the sizes of lists r and s, respectively, and let ℓ be the combined size of the two. Collapsing the tree edges e′ ∉ r ∪ s produces a residual graph with r ∪ s as its spanning tree, without changing the cut values determined by one edge being in r and another in s. Let M_rs be the (|r| − 1) × (|s| − 1) matrix defined by M_rs[i, j] = cut(e_i, f_j), where e_i is the i-th edge of r and f_j the j-th edge of s. Similar to the single-path case, Mukhopadhyay and Nanongkai observe that the matrix M_rs is a Monge matrix (see e.g., [MN20, Claim 3.5] and [GMW20, Lemma 2] for details). That is, it satisfies M_rs[i, j] − M_rs[i, j + 1] ≥ M_rs[i + 1, j] − M_rs[i + 1, j + 1] for any i, j (in contrast with i ≠ j for single paths).

A simplified variant of the divide-and-conquer algorithm discussed in Section 4.1.2 can be easily parallelized with optimal O(ℓ log ℓ · log n) work and O(log ℓ · log n) depth [LM20]. However, we can opt for an algorithm with better work and use the parallel (and randomized) algorithm of Raman and Vishkin [RV94] that inspects only a linear O(ℓ) number of entries of a Monge matrix with O(log ℓ) depth. Now observe that each tuple in F may be processed independently in parallel. Using Lemma 4.9 for each entry inspection, and the fact that the sum Σ ℓ over all interested pairs is O(n log n), we obtain the following parallel bounds.

Lemma 4.17.

Finding the minimum 2-respecting cut among all interested path pairs p, q ∈ P can be done in parallel with O(n log n) work and O(log n) depth.

Putting together the results of Lemmas 4.5, 4.6, 4.9 and 4.17, we obtain the parallel bounds stated in Theorem 4.2.
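To illustrate why the Monge condition M[i, j] − M[i, j + 1] ≥ M[i + 1, j] − M[i + 1, j + 1] reduces the number of entries that must be inspected, the sketch below finds the minimum of such a matrix with a sequential divide and conquer over rows. Under this inequality the column of the leftmost row minimum never increases from one row to the next, so each half of the rows only needs a sub-range of columns. This is a plain illustration of the principle, not the algorithm of [AKPS90] or [RV94]; the `entry(i, j)` callback stands in for a cut query and is an assumption of the sketch.

```python
def satisfies_monge(rows, cols, entry):
    """Check M[i][j] - M[i][j+1] >= M[i+1][j] - M[i+1][j+1] on all adjacent cells."""
    return all(
        entry(i, j) - entry(i, j + 1) >= entry(i + 1, j) - entry(i + 1, j + 1)
        for i in range(rows - 1)
        for j in range(cols - 1)
    )


def monge_min(rows, cols, entry):
    """Return (value, i, j) of the minimum entry of a matrix satisfying the condition."""
    best = None
    stack = [(0, rows - 1, 0, cols - 1)]
    while stack:
        top, bot, lo, hi = stack.pop()
        if top > bot:
            continue
        mid = (top + bot) // 2
        # Leftmost minimum of the middle row within the allowed column range.
        j_min = min(range(lo, hi + 1), key=lambda j: entry(mid, j))
        cand = (entry(mid, j_min), mid, j_min)
        if best is None or cand < best:
            best = cand
        stack.append((top, mid - 1, j_min, hi))   # rows above: minima at columns >= j_min
        stack.append((mid + 1, bot, lo, j_min))   # rows below: minima at columns <= j_min
    return best
```

For an r × c matrix this inspects O((r + c) log r) entries rather than all r · c, which is the same kind of saving that the cited parallel algorithms obtain with low depth.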

Let G = (V, E) be an undirected weighted graph. Karger already showed [Kar00, Corollary 4.2] that computing an appropriate tree packing in parallel can be done with O(m log n + n log n) work and O(log n) depth. Using this result as a black box in Karger's tree packing framework, along with Theorem 4.2, one can easily derive a randomized parallel min-cut algorithm that requires only O(m log n + n log n) work and O(log n) depth. This is already an improvement compared to Geissmann and Gianinazzi's O(m log n) work algorithm when graphs are at least near-linear in size (that is, m = Ω(n log n)). However, the algorithm is not work-optimal in any setting.

In this section we show that the work bound for packing an appropriate set of spanning trees can be improved by an O(log n) factor. More precisely, we obtain the following:

Theorem 4.18.

Given a weighted graph G = (V, E), we can compute a packing S of O(log n) spanning trees (by weight) such that, w.h.p., the minimum cut 2-respects at least one of the trees in S, using O(m log n + n log n) work and O(log n) depth.

Plugging the bounds of Theorem 4.18 and Theorem 4.2 into equations (1) and (2) of Section 4, we obtain the complexity claim of Theorem 4.1.

Recall that Karger's [Kar00] tree packing step consists of two phases: (1) a sparsifier phase, where we compute a skeleton graph H of our input graph containing fewer edges; and (2) a packing phase, where we find a set of size O(log n) (by weight) of appropriate spanning trees in H. The latter phase is a classical packing algorithm of Plotkin, Shmoys, and Tardos [PST95, TK00, You95] and consists of a series of O(log n) sequential minimum spanning tree (MST) computations. This algorithm can be easily parallelized with O(n log n) work and O(log n) depth by using an appropriate parallel MST algorithm: e.g., Pettie and Ramachandran's [PR02] O(n) optimal work and O(log n) depth (randomized) algorithm. This leaves the sparsifier phase as the sole bottleneck for constructing an appropriate tree packing with better work complexity.

In the following, we assume that we have computed a constant factor underestimate λ̃ of the min-cut of G via our approximation algorithm of Theorem 3.1. Simply find a sufficiently accurate constant-factor approximation λ′ of the min-cut and set λ̃ = λ′/2. This accounts for the additive O(n log n) term in the work bound of Theorem 4.18.

Given a weighted graph G = (V, E) with minimum cut λ, and an error parameter ε, we are interested in finding a sparse subgraph H = (V, E′) on the same vertices that satisfies the following properties with high probability:

[Footnote on the tree packing bound of [Kar00, Corollary 4.2]: In Karger's paper this result is stated in terms of parallel time and the number of processors that realize it.]

Property 4.19. H has total weight O(n log n/ε).

Property 4.20.

The minimum cut in H has value λ′ = O(log n/ε).

Property 4.21. The value of the minimum cut in G corresponds (under the same vertex partition) to a (1 ± ε) times minimum cut of H.

In the sequential case, Karger constructs H by (i) finding a sparse connectivity certificate G′ of G with O(n log n) total weight, and then (ii) building the skeleton H with bounded min-cut value O(log n). In parallel, however, this does not work, as there is no parallel algorithm to construct a sparse connectivity certificate with complexity independent of the min-cut value λ. One way to remedy this parallel dependency on λ is to first construct an appropriate skeleton of G to reduce the effective minimum cut in the graph to O(log n), and then construct a sparse k-connectivity certificate with k = O(log n) to bound the total edge weight to O(n log n).

To make sampling efficient, we make the following simple observation (also appearing in [BLS20]).

Observation 4.22.

For a skeleton graph H of G satisfying Properties 4.20 and 4.21, the weight of each edge in H need not be greater than the maximum size of the minimum cut in H, thus O(log n/ε).

Proof. From Theorem 2.4, the minimum cut in H will be at most (1 + ε) times its expected value λ̂ = O(log n/ε) with high probability. Since edges with weight greater than λ̂ never cross the minimum cut, capping their weight to a value greater than λ̂ but still O(log n/ε) (e.g., ⌈c + λ̂⌉ for some constant c > 0) preserves the minimum cut of H, and Properties 4.20 and 4.21, which only depend on this min-cut being approximately preserved, are still satisfied.

Recall that we can build the weighted skeleton G′ of a graph G by letting edge e in G′ have weight drawn from the binomial distribution with probability p and number of trials equal to the weight w(e) in G. Using inverse transform sampling [Fis79, Fis13] and Observation 4.22, we can avoid sampling values greater than the maximum size of the min-cut in G′, which is O(log n). Thus, the weight of each edge e ∈ G′ can be obtained using only O(log n) sequential work. By sampling independently in parallel for every e ∈ G, and then applying Theorem 2.6 for the certificate construction, we obtain the following lemma.

Lemma 4.23.

Given a weighted graph G = (V, E), an error parameter ε > 0, and a constant underestimate of the min-cut λ̃, using O(m log n/ε) work and O(log n/ε) depth, we can construct a sparse subgraph H = (V, E′) satisfying Properties 4.19, 4.20 and 4.21.

The correctness follows from using an appropriate underestimate λ̃ of the min-cut which, by Theorem 2.4, produces a skeleton graph satisfying Properties 4.20 and 4.21. Property 4.19 is satisfied by the definition of a sparse connectivity certificate.

In this section we further improve the work bound of Theorem 4.1 for non-sparse input graphs; that is, when m ≥ n^{1+ǫ} for some constant ǫ > 0. We need the following 1- and 2-dimensional data structures:

Lemma 4.24.

For any ǫ > 0, given a set S of m weighted points in [n], there is a parallel data structure that preprocesses S with O(m/ǫ) work and O(log n) depth, and reports the total weight of all points in any given interval [x_1, x_2] in parallel with O(n^ǫ/ǫ) work and O(log n) depth.

Proof. We first show how to construct the data structure, and then analyze the query operation.

Preprocessing. First we sort the points of S in ascending order with O(m) work and O(log n) depth using a parallel radix sort algorithm [Ble96]. Next, we construct a complete tree C on S with degree n^ǫ, where the leaves store the points of S in sorted order, and every inner node stores the tuple (key(u), key(v)), where u and v are, respectively, the leftmost and rightmost leaf nodes in its sub-tree. The tree can be constructed in parallel in a bottom-up (or up-sweep) fashion starting at the leaves using O(n) work and O(1/ǫ) < O(log n) depth.

Each internal node u stores the value W(u), consisting of the total weight of the leaves in its sub-tree C_u. This can be computed in an up-sweep fashion with O(n^ǫ) work and O(ǫ · log n) depth. Hence, at each level of the tree C, a total of O(m) work and O(ǫ · log n) depth is done. And since there are a total of O(1/ǫ) levels in C, the whole construction takes O(m/ǫ) work and O(log n) depth.

Query.

For an internal node u, the idea is to inspect each of its O(n^ǫ) possible children independently in parallel and aggregate their weights in an up-sweep fashion. What we want is to identify the left child u_ℓ and right child u_r of u whose leaves (the leaves of their corresponding sub-trees) are not entirely contained in the query interval [x_1, x_2], and to sum the total weight contributed by each of the children in between. This can be done with O(n^ǫ) work, as node u has up to O(n^ǫ) children, and O(ǫ · log n) depth. Now, each level must be processed sequentially, and the tree C has O(1/ǫ) depth. Hence, the total query work is O(n^ǫ/ǫ) and the depth is O(log n).

Lemma 4.25. For any ǫ > 0, given m ≥ n weighted points in the [n] × [n] grid, we can construct with work O(m/ǫ) and depth O(log n) a data structure that reports the total weight of all points in any given rectangle [x_1, x_2] × [y_1, y_2] with O ( n ǫ /ǫ ) work and O(log n) depth.

Proof. We show how to construct the data structure and then analyze the query operation.
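Both the one-dimensional structure of Lemma 4.24 and the first level of the construction below rest on the same idea: a tree of high degree over the sorted points, with every node storing the total weight below it. The following sequential sketch shows that idea for interval sums. The class name, the explicit `degree` parameter (playing the role of n^ǫ, and assumed to be at least 2), and the use of `bisect` to locate the interval endpoints are choices made for illustration, not details taken from the paper.

```python
import bisect


class RangeSumTree:
    """Sums of weights over key intervals, using a tree of branching factor `degree`."""

    def __init__(self, points, degree):
        if degree < 2:
            raise ValueError("degree must be at least 2")
        pts = sorted(points)                      # (key, weight) pairs
        self.keys = [k for k, _ in pts]
        self.degree = degree
        self.levels = [[w for _, w in pts]]       # level 0 holds the leaf weights
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            # Each parent stores the total weight of up to `degree` children.
            self.levels.append(
                [sum(prev[i:i + degree]) for i in range(0, len(prev), degree)]
            )

    def query(self, x1, x2):
        """Total weight of points with x1 <= key <= x2."""
        lo = bisect.bisect_left(self.keys, x1)
        hi = bisect.bisect_right(self.keys, x2)   # half-open leaf range [lo, hi)
        d = self.degree
        total = 0
        for level in self.levels:
            if lo >= hi:
                break
            # Peel off nodes at both ends that do not fill a whole parent.
            while lo < hi and lo % d != 0:
                total += level[lo]
                lo += 1
            while lo < hi and hi % d != 0:
                hi -= 1
                total += level[hi]
            lo //= d                              # move to the parents
            hi //= d
        return total
```

A query touches fewer than 2 · degree nodes per level, and there are about log(m)/log(degree) levels, which mirrors the O(n^ǫ/ǫ) query work of the one-dimensional lemma.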

Preprocessing.

Similar to the standard 2-dimensional range tree data structure [Ben79], we split our construction into two parts: (1) constructing a main, or first-level, x-coordinate tree T on S; and (2) at the second level, constructing a y-coordinate auxiliary array A_aux(u) and tree T_aux(u) on the leaves spanned by the sub-tree rooted at u, for each internal node u ∈ T.

1. First level: We begin by sorting the points in S in ascending order along their x-coordinates using a parallel radix sort algorithm [Ble96] with O(m) work and O(log n) depth. Next, we construct the complete O(n^ǫ)-degree tree T in parallel as in the one-dimensional case using O(m/ǫ) work and O(log n) depth.

2. Second level: We break this step into: (i) the construction of the auxiliary arrays A_aux(u), and (ii) the construction of the auxiliary trees T_aux(u), for each node u ∈ T.

(a) Auxiliary arrays: For each node u ∈ T, we want an array A_aux(u) that contains the leaf elements of the sub-tree rooted at u sorted by y-coordinate. For an internal node v, this can be obtained by combining the sorted arrays of its children {A_aux(w) | w ∈ children(v)}. One way to do this is to use radix sort on the union of the elements in such arrays. This takes linear work and logarithmic depth in the total size of the arrays. Now, observe that at any given level of T the total size of the auxiliary arrays is m. Hence, processing the nodes at each level of the tree independently in parallel accounts for O(m) work and O(log n) depth. And since there are O(1/ǫ) levels in the tree, the construction of the auxiliary arrays requires total O(m/ǫ) work and O(log n/ǫ) < O(log n) depth.

[Footnote: This is true because the degree n^ǫ must be greater than or equal to 2 to be valid, hence ǫ ≥ 1/log n and 1/ǫ ≤ log n.]

(b) Auxiliary trees: For each node u ∈ T, we want to build a one-dimensional structure T_aux(u) as in Lemma 4.24 for the leaf elements of the sub-tree rooted at u, sorted by y-coordinate. With the arrays A_aux(u) from the previous step, it is easy to construct such trees in parallel by processing the nodes at each level of the main tree T simultaneously. From Lemma 4.24, a tree T_aux(u) can be constructed with O(|A_aux|) work and O(log |A_aux|) depth. At each level of T, a total of O(m) work is performed, hence O(m/ǫ) overall. On the other hand, the total depth adds up to O(log n/ǫ), which we bound as O(log n).

Query.

Consider a query rectangle [x_1, x_2] × [y_1, y_2]. The query proceeds by searching for x_1 and x_2 in the main tree T, identifying in the process the set V_x of internal nodes u whose corresponding leaves (determined by the sub-tree rooted at u) are entirely contained in the interval [x_1, x_2] (and whose parent nodes do not have all their leaves in the interval). From this set, the 2-dimensional query is reduced to performing 1-dimensional queries (with the interval [y_1, y_2]) to the auxiliary structures T_aux at each node in V_x, and adding up the results. Let S(u) denote the leaf elements in the sub-tree rooted at u. Then we are interested in finding the set V_x = {v ∈ T | S(v) ⊆ S ∩ [x_1, x_2] and S(p(v)) ⊄ S ∩ [x_1, x_2]}, where p(u) denotes the parent node of u.

To find the set V_x in parallel, we inspect each child of an internal node u independently in parallel and test whether it is entirely contained in the interval [x_1, x_2] or not. In an up-sweep fashion, we identify the "bounding" children such that all the children in between are fully contained in [x_1, x_2], and we add such "bounded" children to V_x. This requires O(n^ǫ) work and O(ǫ · log n) depth. We then continue the search recursively on the two "bounding" children, performing the same work and depth per level of the tree, until reaching the leaves of T. Since there are O(1/ǫ) levels in the tree, we have total O(n^ǫ/ǫ) work and O(log n) depth. Observe that at each level of the tree, at most O(n^ǫ) nodes are added to the set V_x, hence the set has cardinality at most O(n^ǫ/ǫ).

Now, for each element of V_x we perform a 1-dimensional query with the interval [y_1, y_2] as in Lemma 4.24 and we aggregate (sum) the results in a bottom-up fashion. Performing the O(n^ǫ/ǫ) one-dimensional queries takes total O ( n ǫ /ǫ ) work and O(log n) depth.
And the final sum simply takes linear work and O(log n) depth, from which the claimed complexity follows.

If we replace the parallel geometric data structure of Lemma 4.9 with the 2-dimensional data structure of Lemma 4.25, we obtain a parallel 2-respecting min-cut algorithm that has O ( m/ǫ + n ǫ (log n ) /ǫ + n log n ) work and O(log n) depth. Plugging these bounds into equations (1) and (2), along with the bounds of Theorem 4.18, results in our main theorem.

Theorem 4.26.

Given a weighted graph G = (V, E), the minimum cut can be computed in parallel w.h.p. using O ( m (log n ) /ǫ + n ǫ (log n ) /ǫ + n log n ) work and O(log n) depth, for any fixed ǫ > 0.

By readjusting the parameter ǫ and suppressing constant scaling factors, we can obtain a work bound of O(m log n + n^{1+ǫ}) for any constant ǫ > c (log log n)/log n for big enough c. If m ≥ n^{1+ǫ}, this can be simplified to O(m log n) work.

References

[AB21] Daniel Anderson and Guy E. Blelloch. Parallel minimum cuts in O(m log^2 n) work and low depth. CoRR, abs/2102.05301, 2021.
[AKPS90] A. Aggarwal, D. Kravets, J. Park, and S. Sen. Parallel searching in generalized Monge arrays with applications. In Proceedings of the Second Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '90, pages 259–268, New York, NY, USA, 1990. Association for Computing Machinery.
[Ben79] Jon Louis Bentley. Decomposable searching problems. Information Processing Letters, 8(5):244–251, 1979.
[Ble96] Guy E. Blelloch. Programming parallel algorithms. Commun. ACM, 39:85–97, 1996.
[BLS20] Nalin Bhardwaj, Antonio J. Molina Lovett, and Bryce Sandlund. A simple algorithm for minimum cuts in near-linear time. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
[Bre74] Richard P. Brent. The parallel evaluation of general arithmetic expressions. J. ACM, 21(2):201–206, April 1974.
[CLRS09] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
[Fis79] George S. Fishman. Sampling from the binomial distribution on a computer. Journal of the American Statistical Association, 74(366a):418–423, 1979.
[Fis13] George S. Fishman. Discrete-Event Simulation: Modeling, Programming, and Analysis. Springer Science & Business Media, 2013.
[GG18] Barbara Geissmann and Lukas Gianinazzi. Parallel minimum cuts in near-linear work and low depth. In Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '18, pages 1–11, New York, NY, USA, 2018. Association for Computing Machinery.
[GMW19] Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Minimum cut in O(m log^2 n) time. arXiv preprint arXiv:1911.01145, 2019.
[GMW20] Paweł Gawrychowski, Shay Mozes, and Oren Weimann. A note on a recent algorithm for minimum cut, 2020.
[GMW21] Paweł Gawrychowski, Shay Mozes, and Oren Weimann. A note on a recent algorithm for minimum cut. In Symposium on Simplicity in Algorithms (SOSA), pages 74–79. SIAM, 2021.
[GNT20] Mohsen Ghaffari, Krzysztof Nowicki, and Mikkel Thorup. Faster algorithms for edge connectivity via random 2-out contractions. In SODA. SIAM, 2020.
[HRW17] Monika Henzinger, Satish Rao, and Di Wang. Local flow partitioning for faster edge connectivity. In SODA, pages 1919–1938. SIAM, 2017.
[HZ01] Shay Halperin and Uri Zwick. Optimal randomized EREW PRAM algorithms for finding spanning forests. Journal of Algorithms, 39(1):1–46, 2001.
[J´92] Joseph JáJá. An Introduction to Parallel Algorithms. Addison Wesley Longman Publishing Co., Inc., USA, 1992.
[Kar94] David R. Karger. Random Sampling in Graph Optimization Problems. PhD thesis, Stanford University, Stanford, CA 94305, 1994.
[Kar99] David R. Karger. Random sampling in cut, flow, and network design problems. Mathematics of Operations Research, 24(2):383–413, 1999.
[Kar00] David R. Karger. Minimum cuts in near-linear time. J. ACM, 47(1):46–76, 2000. Announced at STOC'96.
[KS88] Voratas Kachitvichyanukul and Bruce W. Schmeiser. Binomial random variate generation. Commun. ACM, 31(2):216–222, February 1988.
[LM20] Andrés López Martínez. Parallel minimum cuts: An improved CREW PRAM algorithm. Master's thesis, KTH, School of Electrical Engineering and Computer Science (EECS), 2020.
[Mat93] David W. Matula. A linear time 2+ε approximation algorithm for edge connectivity. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 500–504, 1993.
[MN20] Sagnik Mukhopadhyay and Danupon Nanongkai. Weighted min-cut: Sequential, cut-query, and streaming algorithms. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 496–509, New York, NY, USA, 2020. Association for Computing Machinery.
[NI92a] Hiroshi Nagamochi and Toshihide Ibaraki. Computing edge-connectivity in multigraphs and capacitated graphs. SIAM Journal on Discrete Mathematics, 5(1):54–66, 1992.
[NI92b] Hiroshi Nagamochi and Toshihide Ibaraki. A linear-time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica, 7(1-6):583–596, 1992.
[PR02] Seth Pettie and Vijaya Ramachandran. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM Journal on Computing, 31(6):1879–1895, 2002.
[PST95] Serge A. Plotkin, David B. Shmoys, and Éva Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20(2):257–301, 1995.
[RV94] Rajeev Raman and Uzi Vishkin. Optimal randomized parallel algorithms for computing the row maxima of a totally monotone matrix. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '94, pages 613–621, USA, 1994. Society for Industrial and Applied Mathematics.
[ST83] Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.
[SV82] Yossi Shiloach and Uzi Vishkin. An O(n^2 log n) parallel max-flow algorithm. Journal of Algorithms, 3(2):128–146, 1982.
[TK00] Mikkel Thorup and David R. Karger. Dynamic graph algorithms with applications. In Scandinavian Workshop on Algorithm Theory, pages 1–9. Springer, 2000.
[You95] Neal E. Young. Randomized rounding without solving the linear program. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, volume 76, page 170. SIAM, 1995.

A Proof of Lemma 4.9

Recall that T_e denotes the sub-tree of T rooted at the lower endpoint (furthest from the root) of e, and w(T_e, T_f) is the total weight of edges between T_e and T_f. In particular, w(T_e) = w(T_e, T \ T_e). Let p(u) denote the parent of a node u in T. Given an edge e = (u, p(u)), we use u↓ to denote the set of nodes in the tree T_e.

To prove the lemma, we first need the following data structure.

Lemma A.1.

Given a weighted graph $G = (V, E)$ and a spanning tree $T$ of $G$, there exists a data structure that can be preprocessed in parallel with $O(m \log m)$ work and $O(\log n)$ depth, and supports the following queries with $O(\log n)$ work and $O(\log n)$ depth:

• cost($u$): given a node $u \in V$, compute $w(T_e)$, with $e = (u, p(u))$,
• cross-cost($u, v$): given nodes $u$ and $v$, compute $w(T_e, T_f)$, with $e = (u, p(u))$ and $f = (v, p(v))$,
• down-cost($u, v$): given nodes $u$ and $v$, compute $w(T_e, V \setminus T_f)$, with $e = (u, p(u))$ and $f = (v, p(v))$.

Proof. We start by computing for each node $u \in V$ its rank in a postorder traversal of $T$, denoted by $\mathrm{post}(u)$, in parallel with $O(n)$ work and $O(\log n)$ depth [J´92]. Additionally, we determine for each node $u \in V$ its number of descendants, denoted by $\mathrm{size}(u)$, with $O(n)$ work and $O(\log n)$ depth [J´92].

Let $A$ be the array constructed by setting $A[\mathrm{post}(u)] = u$ for all $u \in V$ in parallel, with the same work and depth as before. From the definition of postorder traversal, we know that the postorder traversals of the children of a node $u$ appear, from left to right, immediately before $u$ in the array $A$. This means that the leftmost leaf $v$ of the subtree rooted at $u$ can be found $\mathrm{size}(u)$ positions to the left of $u$ in array $A$; i.e., $\mathrm{post}(v) = \mathrm{post}(u) - \mathrm{size}(u)$. We denote this position as $\mathrm{start}(u)$. Thus:

1. $\forall u \in V$, $u^{\downarrow}$ defines the contiguous range $A[\mathrm{start}(u) : \mathrm{post}(u)]$,
2. $\forall u \in V$, $V - u^{\downarrow}$ defines two contiguous ranges $A[0 : \mathrm{start}(u) - 1]$ and $A[\mathrm{post}(u) + 1 : n - 1]$.

Now consider the plane with $x$ and $y$ axes both labeled from $0$ to $n - 1$. Suppose that graph $G$ is represented as an array of edges $A_E$. We create a new array $A_P$ in parallel, such that for each $e_i = (u, v) \in A_E$ with weight $w_i$ we set $A_P[i] = (\mathrm{post}(u), \mathrm{post}(v))$ with the same weight $w_i$. Array $A_P$ represents points in the plane labeled by the postorder traversal of $T$. We use $A_P$ as input to construct a parallel 2-d range searching data structure $D$ as in Lemma 4.25. Setting $\epsilon = \log(n)$, the preprocessing takes $O(m \log n)$ work and $O(\log n)$ depth.

From facts (1) and (2) above, at most two queries to $D$ suffice to answer each of the cut queries from the claim. Specifically:

cost($u$): This query can be answered from the sum of two range addition queries to $D$ with ranges
$R_1 = [\mathrm{start}(u), \mathrm{post}(u)] \times [0, \mathrm{start}(u) - 1]$,
$R_2 = [\mathrm{start}(u), \mathrm{post}(u)] \times [\mathrm{post}(u) + 1, n - 1]$.

cross-cost($u, v$): This query can be answered with one range addition query to $D$ with range
$R = [\mathrm{start}(v), \mathrm{post}(v)] \times [\mathrm{start}(u), \mathrm{post}(u)]$.

down-cost($u, v$): This query can be answered from the sum of two range addition queries to $D$ with ranges
$R_1 = [\mathrm{start}(u), \mathrm{post}(u)] \times [0, \mathrm{start}(v) - 1]$,
$R_2 = [\mathrm{start}(u), \mathrm{post}(u)] \times [\mathrm{post}(v) + 1, n - 1]$.
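The reduction from the three queries to rectangle sums can be illustrated with a small sequential sketch. It is not the parallel structure of the lemma: a brute-force scan over the points stands in for the range searching structure $D$ of Lemma 4.25, and the names `TreeCutQueries`, `_range_sum`, `cross_cost` and `down_cost` are hypothetical. As a further assumption, each edge is inserted in both orientations, so that a rectangle finds an edge regardless of how its endpoints are ordered in $A_E$.

```python
from collections import defaultdict


class TreeCutQueries:
    """Sequential sketch: postorder ranges plus brute-force rectangle sums."""

    def __init__(self, n, parent, edges):
        # parent[u] is the parent of u in T (the root has parent -1);
        # edges is a list of (u, v, weight) triples of G.
        self.n = n
        children = defaultdict(list)
        root = None
        for u, p in enumerate(parent):
            if p < 0:
                root = u
            else:
                children[p].append(u)

        # Iterative postorder traversal computing post(u) and size(u),
        # where size(u) counts the proper descendants of u.
        self.post = [0] * n
        size = [0] * n
        counter = 0
        stack = [(root, False)]
        while stack:
            u, finished = stack.pop()
            if finished:
                self.post[u] = counter
                counter += 1
                size[u] = sum(size[c] + 1 for c in children[u])
            else:
                stack.append((u, True))
                for c in reversed(children[u]):
                    stack.append((c, False))

        # start(u) = post(u) - size(u): first position of u's subtree range.
        self.start = [self.post[u] - size[u] for u in range(n)]

        # Assumption: store both orientations of every edge as weighted points.
        self.points = []
        for u, v, w in edges:
            self.points.append((self.post[u], self.post[v], w))
            self.points.append((self.post[v], self.post[u], w))

    def _range_sum(self, x_lo, x_hi, y_lo, y_hi):
        # Brute-force stand-in for one range query to D; empty ranges give 0.
        return sum(w for x, y, w in self.points
                   if x_lo <= x <= x_hi and y_lo <= y <= y_hi)

    def cost(self, u):
        s, p = self.start[u], self.post[u]
        return (self._range_sum(s, p, 0, s - 1)
                + self._range_sum(s, p, p + 1, self.n - 1))

    def cross_cost(self, u, v):
        return self._range_sum(self.start[v], self.post[v],
                               self.start[u], self.post[u])

    def down_cost(self, u, v):
        s, p = self.start[u], self.post[u]
        return (self._range_sum(s, p, 0, self.start[v] - 1)
                + self._range_sum(s, p, self.post[v] + 1, self.n - 1))


# Example: root 0 with children 1 and 2; node 1 has children 3 and 4.
parent = [-1, 0, 0, 1, 1]
edges = [(0, 1, 1), (0, 2, 2), (1, 3, 3), (1, 4, 4), (3, 2, 5), (4, 2, 6)]
q = TreeCutQueries(5, parent, edges)
print(q.cost(1), q.cross_cost(1, 2), q.down_cost(3, 1))
```

In this example the subtree of node 1 occupies positions 0 to 2 of the postorder array, so cost(1) sums the points whose $x$-coordinate lies in $[0, 2]$ and whose $y$-coordinate lies in $[3, 4]$; the other rectangle, $[0, 2] \times [0, -1]$, is empty.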

As for the parallel complexity, the preprocessing cost of this structure is $O(m \log m)$ work and $O(\log n)$ depth, which mostly comes from the construction of $D$. The postorder numbering and point mapping only account for additional logarithmic depth and linear work. Accessing $\mathrm{post}$ and $\mathrm{start}$ can be done with constant work, hence the parallel cost of each query operation simply follows from the cost of the query in Lemma 4.25.

Lemma A.2.

The data structure from Lemma A.1 supports weighted cut-queries of the form $\mathrm{cut}(e, f)$ with $O(\log n)$ work and $O(\log n)$ depth.

Proof. Given edges $e = (u', u)$ and $f = (v', v)$, let $u$ and $v$ be the lower endpoints, i.e., the children of $u'$ and $v'$, respectively. There are two possibilities for $e$ and $f$: (1) they belong to different subtrees in $T$, or (2) one edge is a descendant of the other. From the $\mathrm{post}$ and $\mathrm{start}$ indices created in the preprocessing step of Lemma A.1, we can easily check which case is satisfied, since $\mathrm{start}(v) \le \mathrm{post}(u) \le \mathrm{post}(v)$ iff $u$ is in the subtree rooted at $v$. Then, we can compute the cut query as follows:

$$
\mathrm{cut}(e, f) =
\begin{cases}
w(T_e) + w(T_f) - 2\,w(T_e, T_f) & \text{if } e \notin T_f \wedge f \notin T_e, \\
w(T_e) + w(T_f) - 2\,w(T_e, T \setminus T_f) & \text{if } e \in T_f, \\
w(T_f) + w(T_e) - 2\,w(T_f, T \setminus T_e) & \text{if } f \in T_e,
\end{cases}
$$

where queries of the form $w(T_x)$ and $w(T_x, T_y)$ can both be answered with $O(\log n)$ work and O (log nn