A Deterministic Parallel APSP Algorithm and its Applications


Adam Karczmarz ∗ and Piotr Sankowski †
Institute of Informatics, University of Warsaw, Poland

Abstract

In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has $\tilde{O}(nm+(n/d)^3)$ work and $\tilde{O}(d)$ depth for any depth parameter $d\in[1,n]$. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has $\tilde{O}(nm+n^3/d^2)$ work and $\tilde{O}(d)$ depth [Ullman and Yannakakis, SIAM J. Comput. '91].

Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter $d$ and applying known techniques, we obtain:

(1) nearly work-efficient $\tilde{O}(n^{1/6})$-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph,
(2) an $\tilde{O}(n^{9/8})$-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph,
(3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph.

One notable ingredient of our parallel APSP algorithm is a simple deterministic $\tilde{O}(nm)$-work $\tilde{O}(d)$-depth procedure for computing $\tilde{O}(n/d)$-size hitting sets of shortest $d$-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called $d$-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an $\tilde{O}(nm)$-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has so far only been achieved using a randomized algorithm [Orlin et al., Discret. Appl. Math. '18].

∗ [email protected]. Supported by ERC Consolidator Grant 772346 TUgbOAT, the Polish National Science Centre 2018/29/N/ST6/00757 grant, and by the Foundation for Polish Science (FNP) via the START programme.
† [email protected]. Supported by ERC Consolidator Grant 772346 TUgbOAT.

Introduction

The all-pairs shortest paths problem (APSP) is one of the most fundamental graph problems. It has been studied in numerous variants, for many computational models and graph classes. In this paper we study the APSP problem on real-weighted, possibly sparse graphs in the parallel setting.

The efficiency of a parallel algorithm is usually characterized using the notions of work and depth (also called span or time). The work is the total number of primitive operations performed. The depth is the longest chain of sequential dependencies between these operations. A parallel algorithm of work $W(n)$ and depth $D(n)$ (where $n$ is the problem size) can generally be scheduled to run in $\tilde{O}(D(n))$ time using $\tilde{O}(W(n)/D(n))$ processors [10], where the $\tilde{O}(\cdot)$ notation suppresses $O(\mathrm{polylog}\,n)$ factors. The quantity $W(n)/D(n)$ is often called the parallelism of a parallel algorithm. An algorithm is called nearly work-efficient if $W(n)=\tilde{O}(T(n))$, where $T(n)$ is the best known time bound needed to solve the problem using a sequential algorithm.

There exists a very simple folklore parallel algorithm for the APSP problem via repeatedly squaring the weighted adjacency matrix using the min-plus product. It has $\tilde{O}(n^3)$ work and $O(\mathrm{polylog}\,n)$ depth. This can be slightly tweaked to obtain a polylogarithmic-factor improvement in work and depth [25]. However, these algorithms are nearly work-efficient only for dense graphs, since the best known sequential algorithms run in $\tilde{O}(nm)$ time [30, 54]. Hence, we are missing APSP algorithms competitive with the state-of-the-art sequential algorithms even for moderately dense graphs.

Dealing with sparser graphs in the parallel setting turns out to be a much more challenging task, even for the easier problem of computing the transitive closure. To the best of our knowledge, the state of the art for parallel transitive closure on sparse graphs is the classical trade-off of Ullman and Yannakakis [60]. They showed a Monte Carlo randomized parallel algorithm with $\tilde{O}(nm+n^3/d^2)$ work and $\tilde{O}(d)$ depth for any parameter $d\in[1,n]$. Hence, their algorithm is nearly work-efficient for $d=\tilde{\Omega}(n/\sqrt{m})$ and can achieve parallelism of order $\tilde{O}(m^{3/2})$ while being nearly work-efficient.

Our main result is a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs that improves upon the 30-year-old randomized transitive closure trade-off of [60].
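The threshold and parallelism quoted for the trade-off of [60] follow from a short calculation. The derivation below is only a sketch; it assumes that "nearly work-efficient" means matching the $\tilde{O}(nm)$ sequential bound and it ignores polylogarithmic factors.

```latex
\[
  \frac{n^3}{d^2} \le nm
  \;\iff\; d^2 \ge \frac{n^2}{m}
  \;\iff\; d \ge \frac{n}{\sqrt{m}},
  \qquad\text{so at } d = \frac{n}{\sqrt{m}}:\quad
  \frac{W}{D} = \frac{nm}{n/\sqrt{m}} = m^{3/2}.
\]
```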

Theorem 1.

Let $G$ be a real-weighted digraph. For any $d\in[1,n]$, there exists a deterministic parallel algorithm computing all-pairs shortest paths in $G$ with $\tilde{O}(nm+(n/d)^3)$ work and $\tilde{O}(d)$ depth.

Observe that our algorithm is nearly work-efficient for $d=\Omega(n^{2/3}/m^{1/3})$, which is $\Omega(n^{1/3})$ for sparse graphs. As a result, as long as the number of processors used is $p=\tilde{O}(n^{1/3}m^{4/3})$, we can compute all-pairs shortest paths in $\tilde{O}(nm/p)$ parallel time. To the best of our knowledge, such a trade-off for real-weighted digraphs has only been achieved for the single-source shortest paths (SSSP) and negative cycle detection problems [11]. Both of these results require randomization, whereas our algorithm is deterministic. Bringmann et al. [11] also show a deterministic variant of their algorithm with work $\tilde{O}(nmd+(n/d)^3)$ that is not nearly work-efficient unless the graph is dense.

Footnote (on the deterministic variant of [11]): Bringmann et al. [11, Theorem 19] mistakenly state the work of their algorithm to be $\tilde{O}(nm+(n/d)^3+n^2d)$, even though in Section 4.1.2 they correctly bound the work in the first step of their algorithm to be $\Theta(nmd)$.

In the aforementioned previous results [11, 60], randomization is used only for computing a small subset of $V$ that is, roughly speaking, guaranteed to contain some vertex of each "long" shortest path consisting of at least $h$ hops. Following [33], we call such a set an $h$-hub set of $G$. A classical argument of Ullman and Yannakakis [60] shows that a randomly sampled $\Theta((n/h)\cdot\log n)$-size subset of $V$ constitutes an $h$-hub set with high probability.

We show that for any $h\in[1,n]$, an $h$-hub set of size $\tilde{O}(n/h)$ can be computed deterministically using $\tilde{O}(nm)$ work and $\tilde{O}(h)$ depth. Our procedure works in the presence of negative edge weights and even, to some extent, when negative cycles are allowed. The constructed hub set is guaranteed to hit all-pairs shortest $h$-hop paths (which are well-defined regardless of whether the APSP problem is feasible) unless the shortest (in terms of hops) negative cycle has at most $h$ edges. As a by-product, we also obtain the following result in the sequential regime.

Theorem 2.

Let $G$ be a real-weighted directed graph. A shortest (in terms of hops) negative-weight cycle in $G$ can be found deterministically in $O(nm\log^2 n)$ time.

So far, an $\tilde{O}(nm)$ bound for the shortest negative cycle problem has only been obtained using a randomized algorithm by Orlin et al. [53]. The best known deterministic algorithms [57, 58] require $\tilde{\Omega}(n^3)$ worst-case time. One can easily argue that our bound is the best possible, up to polylogarithmic factors, unless an unexpected algorithmic breakthrough is made that would imply progress on other core problems as well. In particular, the shortest negative cycle problem captures the unweighted directed girth problem (for which the trivial $O(nm)$ bound has stood for decades). Moreover, one should not hope for an $O(n^{3-\epsilon})$ fast matrix multiplication-based algorithm, since the shortest negative cycle problem also captures the negative triangle detection problem, known to be subcubic-equivalent to the APSP problem [64].

We believe that our procedure for computing hub sets might be useful in obtaining other deterministic sequential and parallel algorithms. As a direct application of our APSP algorithm one can obtain, e.g., more efficient parallel algorithms for computing closeness centrality [4]. Moreover, as shown below, we can use it to improve algorithms for several planar graph problems.

Theorem 1 can be used to highly parallelize the framework used by Fakcharoenphol and Rao [20] to obtain nearly optimal algorithms for two fundamental planar graph problems.

Theorem 3.

Let $G$ be a real-weighted planar digraph. There exists a deterministic parallel algorithm for negative cycle detection and single-source shortest paths in $G$ with $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth.

By a well-known duality-based reduction [50], this also implies feasible flow and bipartite perfect matching algorithms with the same bounds. It is worth noting that even though there exist $O(\mathrm{polylog}\,n)$-depth algorithms (i.e., belonging to the class NC) for finding perfect matchings in bipartite planar graphs, they are very far from being work-efficient [47, 50]. The same applies to the $\tilde{O}(\sqrt{n})$-depth algorithm implied by the interior-point method-based result for general graphs [24]. Since the $s,t$-max flow problem on planar graphs with capacities in $[1,C]$ can be reduced to $O(\log C)$ feasible flow computations [50], this also yields a nearly work-efficient $\tilde{O}(n^{1/6})$-depth algorithm for maximum $s,t$-flow with polynomially bounded capacities. Similar bounds can be obtained for the replacement paths problem using the recent reduction [16] to the all-edge shortest cycles problem (see Section 4.2 and Appendix A).

Footnote (on hub sets): Zwick [66] uses the name bridging set for an analogous concept. Some works also use the term hitting set, but hitting set is a more general notion, which in our paper is used in multiple different contexts.

Footnote (on the algorithm of Orlin et al.): The algorithm of [53] runs in $O(nm\log n)$ expected time. However, if one aims at high-probability correctness, its running time is $O(nm\log^2 n)$, which matches our bound.

To the best of our knowledge, the parallel complexity of the aforementioned problems on planar graphs has not received much attention in recent years. However, one can easily obtain near-optimal work and $\tilde{O}(\sqrt{n})$-depth algorithms for these problems using the breakthrough framework of Fakcharoenphol and Rao [20]. In this framework, one repeatedly computes all-pairs shortest paths on certain dense distance graphs using multiple runs of a clever implementation of Dijkstra's algorithm (so-called FR-Dijkstra). Unfortunately, it is not clear how to break the $\Omega(\sqrt{n})$ depth bound this way, since Dijkstra's algorithm is inherently sequential. Our improved depth bound is obtained by replacing the simple-minded Dijkstra-based APSP algorithm with that of Theorem 1. As we show, the Monge property of dense distance graphs that is crucial for the efficiency of FR-Dijkstra can also be employed in our algorithm to yield a significant parallel speed-up.

Minimum mean and cost-to-time ratio cycle problems.

By plugging our parallel negative cycle detection algorithm into Megiddo's parametric search framework [48], we obtain improved strongly polynomial algorithms for the minimum mean cycle and minimum cost-to-time ratio cycle problems on planar graphs (for formal definitions of these problems, refer to Section 4.3).

Theorem 4.

Let $G$ be a real-weighted planar graph. There exists an $\tilde{O}(n^{9/8})$-time strongly polynomial algorithm for computing a minimum ratio cycle (and thus also a minimum mean cycle) in $G$.

The minimum mean cycle and minimum cost-to-time ratio cycle problems are classical graph problems studied since the seventies. They are used to construct strongly polynomial algorithms for computing minimum-cost flows [59]. Moreover, via the cut-cycle duality in planar graphs, both problems have found practical applications in the area of image segmentation [29, 61–63].

It is known that both problems can be reduced to negative cycle detection via binary search [42]. However, this way we can obtain only weakly polynomial algorithms, with running times dependent on the magnitude of edge weights. For general graphs, the classical algorithm of Karp [34] solves the minimum mean cycle problem in $O(nm)$ time, matching the best known strongly polynomial negative cycle detection bound achieved by the Bellman-Ford algorithm. Karp's algorithm (and other minimum mean cycle algorithms for general graphs [26, 65]) operates on limited-hop shortest paths. As a result, it is not clear how to take advantage of planarity to speed it up.

The $\tilde{O}(nm)$ bound has not been matched to date for the more general minimum cost-to-time ratio problem, which seems to be the original inspiration for the invention of the parametric search technique [48] that later found other applications (e.g., [1]). This technique can be used to convert an efficient parallel negative cycle detection algorithm into a strongly polynomial minimum ratio cycle algorithm. The best known strongly polynomial $\tilde{O}(m^{3/4}n^{3/2})$ bound for the minimum ratio cycle problem is due to Bringmann et al. [11] and also follows by plugging their aforementioned parallel negative cycle detection algorithm into Megiddo's framework.

It seems that no strongly polynomial algorithms for the minimum mean cycle problem to date have been designed specifically for planar graphs. However, by plugging previously known parallel negative cycle detection algorithms [20, 44] into the parametric search framework, one would only obtain $\tilde{O}(n^{3/2})$-time strongly polynomial algorithms. Theorem 4 improves upon this significantly.

Footnote (on the minimum cost-to-time ratio cycle problem): Also known as the minimum ratio cycle problem.

Footnote (on exploiting planarity in Karp's algorithm): At-most-$h$-hop shortest paths connecting pairs of vertices of a single face of a plane graph do not seem to admit algorithmically useful properties, like the non-crossing property of usual shortest paths.

Computing external dense distance graphs.

Finally, our parallel APSP algorithm can be used to improve the bound for computing so-called external dense distance graphs with respect to a planar graph's recursive decomposition, an important black box with applications in computing maximum flows [46], minimum cuts [9], and constructing distance oracles [14, 15, 51].

Suppose a planar graph $G$ is recursively decomposed using small cycle separators [49] of size $O(\sqrt{n})$ until the obtained pieces have constant size. The decomposition procedure produces a binary tree $\mathcal{T}(G)$ whose nodes correspond to subgraphs of $G$ (pieces). The boundary vertices $\partial H$ of a piece $H\in\mathcal{T}(G)$ are the vertices that $H$ shares with the remaining part $G-H$ of the entire graph $G$. We denote by $\mathrm{DDG}_H$ the dense distance graph, a complete weighted graph on $\partial H$ whose edge weights represent distances between all pairs of vertices of $\partial H$ in $H$. Efficient construction of piecewise DDGs and FR-Dijkstra alone are enough to obtain, e.g., nearly linear-space static and dynamic exact distance oracles with sublinear query time [20, 31, 37]. Originally, Fakcharoenphol and Rao [20] gave an $O(n\log^3 n)$ algorithm for computing each of the graphs $\mathrm{DDG}_H$, $H\in\mathcal{T}(G)$, inductively, based on the children graphs $\mathrm{DDG}_{H_1}$, $\mathrm{DDG}_{H_2}$. The key ingredient in their algorithm was the aforementioned fast implementation of Dijkstra's algorithm on a dense distance graph.

Later, Klein [37] showed a more efficient, $O(n\log^2 n)$-time algorithm that computes every $\mathrm{DDG}_H$ directly, by building the so-called multiple-source shortest paths (MSSP) data structure for each piece $H\in\mathcal{T}(G)$ separately. This was possible since the total size of all pieces $H$ of the decomposition is $O(n\log n)$. However, some important applications [9, 14, 15, 46, 51] also require external graphs $\mathrm{DDG}_{G-H}$ representing distances between vertices of $\partial H$ in the complement graph $G-H$ (for all $H\in\mathcal{T}(G)$). In this case Klein's approach fails, since the total size of all possible graphs $G-H$ is $\Omega(n^2)$. Instead, one has to stick to the original inductive method of [20] and compute each $\mathrm{DDG}_{G-H_i}$ based on $\mathrm{DDG}_{G-H}$ and $\mathrm{DDG}_{H_{3-i}}$ (where, again, $H$ is the parent of $H_1$, $H_2$ in $\mathcal{T}(G)$). This takes $O(n\log^3 n)$ time through all $H$. We show that our parallel APSP algorithm can be used to obtain an algorithm whose running time almost matches the $O(n\log^2 n)$ MSSP-based bound.

Lemma 5.

The external dense distance graphs $\mathrm{DDG}_{G-H}$ for all $H\in\mathcal{T}(G)$ can be computed in $O(n\log^2 n\cdot\log\log n\cdot\alpha(n))$ time.

For general directed graphs, the parallel single-source shortest paths problem (SSSP) is much better understood. Ullman and Yannakakis [60] showed an $\tilde{O}(m\sqrt{n})$-work algorithm with $\tilde{O}(\sqrt{n})$ depth for unweighted digraphs. Besides the aforementioned result of Bringmann et al. [11] and $\tilde{\Theta}(n)$-depth parallel implementations of Dijkstra's algorithm [12], all known parallel exact SSSP algorithms work for weighted graphs with non-negative and integral weights bounded by $W$ and are weakly polynomial. Klein and Subramanian [41] generalized the bound of [60] to the weighted setting at the cost of an $\tilde{O}(\log W)$ factor in the work and depth bounds. In this setting, they also gave a nearly work-efficient polylog-depth algorithm for planar graphs [40]. Spencer [56] gave a trade-off algorithm with $\tilde{O}((n^3/d^2+m)\log W)$ work and $\tilde{O}(d)$ depth. Forster and Nanongkai [22] in turn have recently shown a trade-off algorithm with $\tilde{O}((md+mn/d+(n/d)^3)\log W)$ work and $\tilde{O}(d)$ depth.

Recently, following similar results for single-source reachability [21, 45], Cao et al. [13] showed that $(1+\epsilon)$-approximate single-source shortest paths can be found using near-optimal work $\tilde{O}(m\log W)$ and $\tilde{O}(n^{1/2+o(1)})$ depth. Note that this SSSP algorithm, if run from every source, yields a nearly work-efficient (approximate) APSP algorithm, but with polynomially worse depth than ours.

Finally, parallel single-source shortest paths problems in undirected graphs have also received much attention, both in the exact [8, 18, 55] and the approximate [3, 19, 43] setting.

Technical Overview

Our parallel APSP algorithm is based on the techniques used in the state-of-the-art Monte Carlo randomized decremental all-pairs shortest paths algorithm for weighted digraphs, due to Bernstein [7]. Without loss of generality assume that our depth parameter is a power of two, i.e., $d=2^K$. Our algorithm makes use of a hierarchy of hub sets $V=H_1,H_2,\ldots,H_{2^i},\ldots,H_d\subseteq V$ such that each $H_{2^i}$ is a $2^i$-hub set of $G$ and has size $O((n/2^i)\log n)$. Roughly speaking, this means that for all pairs $u,v\in V$, some shortest $u\to v$ path, if it consists of at least $2^i$ edges, contains a vertex of $H_{2^i}$.

Let us start with describing a randomized version of our algorithm. A well-known fact attributed to Ullman and Yannakakis [60] states that picking each $H_{2^i}$ to be a random $\Theta((n/2^i)\cdot\log n)$-subset of $V$ guarantees that $H_{2^i}$ forms a $2^i$-hub set of $G$ with high probability.

Rather than using the inherently sequential Dijkstra's algorithm, as sequential $\tilde{O}(nm)$-time APSP algorithms [30, 54] do, we rely exclusively on the Bellman-Ford algorithm. A variant of the Bellman-Ford algorithm, given a source $s$, maintains distance labels and performs a number of steps, each relaxing all edges in arbitrary order in $O(m)$ time. After $k$ steps, the distance label of $v\in V$ is equal to the length $\delta^k_G(s,v)$ of a shortest at-most-$k$-hop path from $s$ to $v$. In a single step, for each vertex we need to combine the edge relaxations ending in this vertex, so a single step requires $O(\mathrm{polylog}\,n)$ depth. Consequently, performing $k$ steps of Bellman-Ford requires $O(mk)$ work but only $\tilde{O}(k)$ depth.

The key idea behind Bernstein's algorithm [7] is that computing the distances $\delta_G(s,v)$ for $(s,v)\in H_{2^i}\times V$ can be reduced to computing at-most-$2^{i+1}$-hop shortest paths on a graph $G_{i,s}$ obtained from $G$ by augmenting it with $|H_{2^{i+1}}|=O(n)$ auxiliary edges $st$ for all $t\in H_{2^{i+1}}$, such that the weight of $st$ equals the distance $\delta_G(s,t)$ from $s$ to $t$. This idea suggests an inductive algorithm that, given distances for all pairs in $(H_{2^{i+1}}\times V)\cup(V\times H_{2^{i+1}})$, where $i<K$, lifts them to distances for all pairs in $(H_{2^i}\times V)\cup(V\times H_{2^i})$ using $|H_{2^i}|$ parallel $2^{i+1}$-step Bellman-Ford runs. These can be performed in $O(|H_{2^i}|\cdot 2^{i+1}\cdot m)=O(nm\log n)$ work and $\tilde{O}(2^i)$ depth. Eventually, since $H_1=V$, we obtain the all-pairs distances in $G$. Summing through all inductive steps, the work is $O(nm\log n\log d)$, whereas the depth is $\tilde{O}(2^0+\ldots+2^K)=\tilde{O}(2^K)=\tilde{O}(d)$.

The last thing missing in the above algorithm is the induction base, i.e., computing $\delta_G(s,t)$ for all $s,t\in H_d$ in only $\tilde{O}(d)$ depth. This is where we depart from Bernstein's approach [7]. Note that the lengths $\delta^{d+1}_G(s,t)$ of shortest at-most-$(d+1)$-hop paths from all $s\in H_d$ can be computed using $(d+1)$-step parallel Bellman-Ford with $O(|H_d|\,dm)=O(nm\log n)$ work and $\tilde{O}(d)$ depth. In order to combine these bounded-hop lengths into actual distances between $s,t\in H_d$, we switch to the repeated-squaring algorithm for dense graphs, as was also done in the parallel SSSP [11] and transitive closure [60] algorithms. Namely, we run this algorithm on a complete graph $G_d$ on $H_d$ whose edge $uv$ has weight $\delta^{d+1}_G(u,v)$. The correctness of this approach follows by observing that a shortest $s\to t$ path that uses more than $d+1$ hops has to encounter an intermediate vertex $z\in H_d$ after no more than $d+1$ hops. The $O(\mathrm{polylog}\,n)$-depth repeated-squaring algorithm adds an $\tilde{O}((n/d)^3)$ term to the work but does not increase the depth, up to polylogarithmic factors.

Computing hub sets deterministically.

Now let us briefly describe how to derandomize the above parallel APSP algorithm by replacing sampling with a deterministic preprocessing step that computes all hub sets $H_1,\ldots,H_d$ within $\tilde{O}(nm)$ work and $\tilde{O}(d)$ depth.

Footnote (on the induction base): The all-pairs distances for $H_d$ can be lifted to distances for all pairs in $(H_d\times V)\cup(V\times H_d)$ using $\tilde{O}(nm)$ work and $\tilde{O}(d)$ depth as in the inductive step by setting $H_{2d}:=H_d$.

Footnote (on switching to repeated squaring): The same strategy of switching to a dense-graph algorithm has also proved useful in obtaining a deterministic incremental algorithm for APSP in weighted directed graphs [33].

Previous deterministic algorithms computing $h$-hub sets typically obtained them by (1) computing (or maintaining, in the dynamic setting) at-most-$h$-hop shortest paths between all pairs of vertices $s,t\in V$, and (2) running a greedy $O(\log n)$-approximation algorithm for computing an $O((n/h)\log n)$-size hitting set of a family of $h$-element subsets of an $n$-element universe (see e.g. [35] for the analysis). Unfortunately, the first step of this approach seems to require $\Omega(nmh)$ time.

To obtain an improved bound, we reuse the inductive approach. Specifically, we show that given a $2^i$-hub set $H_{2^i}$, one can obtain a $2^{i+1}$-hub set $H_{2^{i+1}}$ of size $O((n/2^i)\cdot\log n)$ by running the aforementioned greedy hitting set algorithm on shortest $2^i$-hop paths from $H_{2^i}$. Constructing these paths using Bellman-Ford costs $O(|H_{2^i}|\cdot 2^i\cdot m)=O(nm\log n)$ time and requires $\tilde{O}(2^i)$ depth. Luckily, the deterministic greedy hitting set algorithm has a nearly work-efficient parallel version with $O(\mathrm{polylog}\,n)$ depth [6]. Therefore, one can construct $H_d$ using $\tilde{O}(nm)$ work and $\tilde{O}(d)$ depth.

It is worth noting that the analysis of the above algorithm breaks if the shortest negative cycle in $G$ has at most $d$ edges. We stress that this is not a problem for the APSP application though, since any negative cycles in $G$ make the APSP problem infeasible.
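The greedy hitting set step used in the hub set construction can be illustrated with a small sequential sketch. It is not the nearly work-efficient parallel version of [6]; the function name `greedy_hitting_set`, the integer vertex labels, and the representation of each path as a plain list of its vertices are assumptions made only for this illustration.

```python
def greedy_hitting_set(paths):
    """Return vertices that together intersect every non-empty path.

    paths -- list of vertex collections (vertices assumed to be ints)
    """
    remaining = [set(p) for p in paths if len(p) > 0]
    chosen = []
    while remaining:
        # Count, for every vertex, how many still-unhit paths contain it.
        counts = {}
        for p in remaining:
            for v in p:
                counts[v] = counts.get(v, 0) + 1
        # Pick the most frequent vertex; ties go to the smallest label so
        # that the outcome is deterministic.
        best = max(counts, key=lambda v: (counts[v], -v))
        chosen.append(best)
        # Discard every path that the new vertex already hits.
        remaining = [p for p in remaining if best not in p]
    return chosen


paths = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [1, 2, 7]]
hubs = greedy_hitting_set(paths)
```

In the setting of the text, `paths` would hold the vertex sets of the shortest bounded-hop paths computed by Bellman-Ford, and the returned list would play the role of the next hub set.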
Moreover, our deterministic hub set algorithm can be easily extended to correctly report the shortest negative cycle within the same bounds, and thus to match the randomized bound of [53] for the shortest negative cycle problem.

We also remark that even though hub sets are very useful in designing dynamic graph algorithms, it is unlikely that our construction can help match the best known randomized bounds for dynamic problems (like [7]) using deterministic algorithms. The power of a randomly sampled hub set $H$ (and, possibly, the so-called oblivious adversary assumption) lies in the fact that $H$ retains its guarantees through all, in particular future, versions of a dynamic graph.

Comparison to the transitive closure algorithm of [60].

Ullman and Yannakakis [60] use a single randomly sampled $d$-hub set $H_d$ of size $O((n/d)\log n)$. In a similar way, they compute reachability between the nodes of $H_d$ in $\tilde{O}(nm+(n/d)^3)$ work and $\tilde{O}(d)$ depth: they first apply limited-depth BFS from all vertices of $H_d$, and then use repeated squaring. Adding the $\tilde{O}((n/d)^2)$ "shortcuts" between the vertices of $H_d$ to $G$ reduces the diameter of $G$ to $\tilde{O}(d)$. Finally, a limited-depth BFS is run from each source in parallel. Since the augmented graph has $\tilde{O}(m+(n/d)^2)$ edges, this takes $\tilde{O}(nm+n^3/d^2)$ work and $\tilde{O}(d)$ depth. The efficiency of this approach crucially relies on the fact that BFS has near-linear work and, as a result, does not generalize to real-weighted graphs, which require using Bellman-Ford.

Planar graph applications.

All of the numerous consequences of our parallel APSP algorithm for planar graphs essentially follow by showing improved parallel and sequential bounds for the following problem, which we call dense distance graph APSP (DDG APSP).

Let $H_1,H_2\in\mathcal{T}(G)$ be two pieces of a recursive decomposition of $G$ and let $b=|\partial H_1|+|\partial H_2|$. We would like to compute all-pairs shortest paths in the graph $\mathrm{DDG}_{H_1}\cup\mathrm{DDG}_{H_2}$.

A parallel algorithm solving the above problem using $T(b,d)=\Omega(b^2)$ work and $\tilde{O}(d)$ depth, plugged into the framework of Fakcharoenphol and Rao [20], implies:

• An $\tilde{O}(n+T(\sqrt{n},d))$-work and $\tilde{O}(d)$-depth algorithm for negative-cycle detection on real-weighted planar graphs. Via known reductions [16, 50], the same bound can be achieved for the feasible flow, bipartite perfect matching, and replacement paths problems.
• $\tilde{O}(nd+T(\sqrt{n},d))$-time sequential strongly polynomial algorithms for the minimum mean and minimum cost-to-time ratio cycle problems.

Moreover, a sequential algorithm solving the DDG APSP problem in $S(b)=\Omega(b^2)$ time implies an $O(S(\sqrt{n})\log n)$ algorithm for computing all external dense distance graphs $\mathrm{DDG}_{G-H}$ for $H\in\mathcal{T}(G)$.

Footnote (on reporting the shortest negative cycle): A somewhat similar trick has been used in [33] for improving the state-of-the-art randomized partially dynamic APSP algorithms [5, 7] from Monte Carlo to Las Vegas.

Fakcharoenphol and Rao's [20] algorithm for solving DDG APSP uses Johnson's [30] approach: first, a feasible price function is computed using the Bellman-Ford algorithm to reduce the task to the non-negatively weighted case. Subsequently, Dijkstra's algorithm is run from each of the $\Omega(b)$ sources. Fakcharoenphol and Rao gave a very efficient Dijkstra implementation running in $O(b\log^2 b)$ time on $\mathrm{DDG}_{H_1}\cup\mathrm{DDG}_{H_2}$. They also showed that a single step of Bellman-Ford can be performed on $\mathrm{DDG}_{H_1}\cup\mathrm{DDG}_{H_2}$ in $O(b\log b)$ time. Klein et al. [39] later noticed that a Bellman-Ford step can be performed in $O(b\cdot\alpha(b))$ time. By these bounds, the dense distance graph APSP can be solved sequentially in $O(b^2\log^2 b)$ time and using a parallel algorithm with $\tilde{O}(b^2)$ work and $\tilde{O}(b)$ depth.

We show that the dense distance graph APSP problem can be solved using a parallel algorithm with work $\tilde{O}(b^2+(b/d)^3)$ and depth $\tilde{O}(d)$. This essentially follows by using a more efficient parallel implementation of a Bellman-Ford step in the algorithm of Theorem 1. As observed in [20], a Bellman-Ford step on a dense distance graph can be rephrased as computing column minima of a certain matrix with the Monge property (a Monge matrix, in short). Since column minima of a Monge matrix can be computed using a polylogarithmic-time parallel algorithm [2], a single Bellman-Ford step can be implemented using only $\tilde{O}(b)$ work and $O(\mathrm{polylog}\,n)$ depth, even though $\mathrm{DDG}_{H_1}\cup\mathrm{DDG}_{H_2}$ has $\Omega(b^2)$ edges.

Moreover, we show a faster sequential algorithm for DDG APSP with $O(b^2\log b\cdot\log\log b\cdot\alpha(b))$ running time. First, we observe that our parallel APSP algorithm (given a feasible price function) has a sequential implementation with $O\left(n\cdot B(n,m)\cdot\log n\cdot\log d+\left(\frac{n}{d}\log n\right)\cdot D(n,m)\right)$ time, where $B(n,m)$ denotes the cost of a single Bellman-Ford step and $D(n,m)$ denotes the cost of a single run of Dijkstra's algorithm (on a graph with $n$ vertices and $m$ edges). Second, we leverage the asymmetry between the $O(b\cdot\alpha(b))$ cost of a Bellman-Ford step and the $O(b\log^2 b)$ cost of running FR-Dijkstra on $\mathrm{DDG}_{H_1}\cup\mathrm{DDG}_{H_2}$. To obtain our improved bound, it is enough to choose $d=\log^2 b$.

In this paper we deal with real-weighted directed graphs. We write $V(G)$ and $E(G)$ to denote the sets of vertices and edges of $G$, respectively. A graph $H$ is a subgraph of $G$, which we denote by $H\subseteq G$, if and only if $V(H)\subseteq V(G)$ and $E(H)\subseteq E(G)$. We write $uv\in E(G)$ when referring to edges of $G$ and use $w_G(uv)$ to denote the weight of $uv$. If $uv\notin E(G)$, we assume $w_G(uv)=\infty$.

A sequence of edges $P=e_1\ldots e_k$, where $k\ge 1$ and $e_i=u_iv_i\in E(G)$, is called an $s\to t$ path in $G$ if $s=u_1$, $v_k=t$ and $v_{i-1}=u_i$ for each $i=2,\ldots,k$. For brevity we sometimes also express $P$ as a sequence of $k+1$ vertices $u_1u_2\ldots u_kv_k$ or as a subgraph of $G$ with vertices $\{u_1,\ldots,u_k,v_k\}$ and edges $\{e_1,\ldots,e_k\}$. The hop-length $|P|$ is equal to the number of edges in $P$. We also say that $P$ is a $k$-hop path. The length of the path $\ell(P)$ is defined as $\ell(P)=\sum_{i=1}^{k}w_G(e_i)$. For convenience, we sometimes consider a single edge $uv$ as a path of hop-length 1. If $P_1$ is a $u\to v$ path and $P_2$ is a $v\to w$ path, we denote by $P_1\cdot P_2$ (or simply $P_1P_2$) the path obtained by concatenating $P_1$ with $P_2$.

We define $\delta^k_G(u,v)$ to be the length of the shortest path from $u$ to $v$ among paths of at most $k$ edges. Formally, $\delta^k_G(u,v)=\min\{\ell(P) : u\to v=P\subseteq G \text{ and } |P|\le k\}$. The distance $\delta_G(u,v)$ between the vertices $u,v\in V(G)$ is the length of the shortest $u\to v$ path in $G$, or $\infty$ if no $u\to v$ path exists in $G$. In other words, $\delta_G(u,v)=\min_{k\ge 0}\delta^k_G(u,v)$. Note that the distance is well-defined only if $G$ contains no negative cycles. It is well known that $G$ has no negative cycles if and only if there exists a feasible price function $p:V\to\mathbb{R}$ satisfying $w_G(e)+p(u)-p(v)\ge 0$ for all $uv=e\in E(G)$. We define minimal paths as follows.

Definition 6.

We call a $u \to v$ path $P \subseteq G$ minimal if $\ell(P) = \delta^{|P|}_G(u,v) < \delta^{|P|-1}_G(u,v)$.

Observation 7.

All subpaths of a minimal path are also minimal.

If the graph $G$ that we refer to is clear from the context, we sometimes omit the subscript $G$ and write $w(uv)$, $\delta(u,v)$, $\delta^k(u,v)$, etc. instead of $w_G(uv)$, $\delta_G(u,v)$, $\delta^k_G(u,v)$, etc., respectively.

Parallel model.

Formally, we use the work-depth model as used in recent literature on parallel reachability and shortest paths, e.g., [21, 43]. An algorithm in this PRAM model differs from a sequential algorithm only by the inclusion of the parallel foreach loops. The work of an algorithm is the total number of operations performed if all parallel foreach loops were executed sequentially. The depth of an algorithm is the total number of operations performed by sequential steps, plus the sum, over all parallel foreach loops, of the maximum (sequential) iteration cost of a loop. In order not to focus on low-level details, we specify neither the depth overhead of a $k$-way parallel foreach loop, nor the precise shared memory access model (e.g., EREW, CREW). Instead, we state all our parallel work and depth bounds using $\tilde{O}(\cdot)$ notation that suppresses polylogarithmic factors. This is justified by the existence of general reductions between these different PRAM variants that yield only polylogarithmic multiplicative overhead in work and depth [28].

Bellman-Ford algorithm.

The Bellman-Ford algorithm is a classical algorithm for computing shortest paths from a single source $s$ or detecting a negative cycle if one exists. It maintains a distance label vector $d : V \to \mathbb{R}$, where initially $d(s) = 0$ and $d(v) = \infty$ for all $v \in V \setminus \{s\}$, and proceeds in steps. Classically, a Bellman-Ford step consists of performing edge relaxations, i.e., substitutions $d(v) := \min(d(v), d(u) + w(uv))$ for all edges $e = uv$, in any order. It is well known that: (1) if $d(v) \ne \infty$ then $d(v)$ is the length of some $s \to v$ path in $G$, (2) after $k$ Bellman-Ford steps we have $d(v) \le \delta^k_G(s,v)$ for all $v \in V$, (3) if $d(u) + w(uv) < d(v)$ for some $uv \in E$ after $n-1$ steps, then $G$ has a negative cycle, (4) if $G$ has no negative cycle and $s$ can reach every other vertex, then the obtained distance label vector constitutes a feasible price function of $G$. A single Bellman-Ford step clearly takes $O(m)$ time, so the Bellman-Ford algorithm runs in $O(mk)$ time if $k$ steps are performed. However, in general, the results of individual relaxations in a single step may depend on the results of relaxations that happened earlier in that step.

In this paper we actually use a variant of the Bellman-Ford algorithm that might converge to the answer slower, but is easier to reason about. Namely, in a single step, for all $v$ at once, we replace $d(v)$ with $\min\left(d(v), \min_{uv \in E}\{d(u) + w(uv)\}\right)$. Equivalently, at the beginning of a step we could store a copy of vector $d$ as $d'$, and then again relax the edges in any order, where the relaxation is now a substitution $d(v) := \min(d(v), d'(u) + w(uv))$. It is easy to prove that in this variant, after $k$ steps we have precisely $d(v) = \delta^k_G(s,v)$ for all $v \in V$. The properties (3) and (4) of the classical Bellman-Ford algorithm hold for this variant as well. Moreover, the result of each relaxation now only depends on the distance labels in the previous step.
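To make the synchronous variant concrete, the following Python sketch implements one step that reads only the labels of the previous step, and iterates it $k$ times to obtain $\delta^k_G(s, \cdot)$. The function names, the `(u, v, w)` edge-list representation and the example graph are illustrative assumptions rather than anything prescribed by the paper; the sequential loop only mimics what a parallel implementation would do for all edges of a step at once.

```python
import math

def bellman_ford_step(d, edges):
    """One synchronous step: every relaxation reads the labels of the previous step."""
    new_d = list(d)  # new_d plays the role of d, the argument d plays the role of d'
    for u, v, w in edges:
        new_d[v] = min(new_d[v], d[u] + w)
    return new_d

def hop_bounded_distances(n, edges, s, k):
    """Return [delta^k(s, v) for v in range(n)] after k synchronous steps."""
    d = [math.inf] * n
    d[s] = 0
    for _ in range(k):
        d = bellman_ford_step(d, edges)
    return d

# Made-up example graph: edges are (u, v, weight) triples.
edges = [(0, 1, 5), (0, 2, 1), (2, 1, 1), (1, 3, 1)]
for k in range(4):
    print(k, hop_bounded_distances(4, edges, 0, k))
```

On this example the labels after 1, 2 and 3 steps are `[0, 5, 1, inf]`, `[0, 2, 1, 6]` and `[0, 2, 1, 3]`: vertex 3 needs three steps because its shortest path uses three edges, whereas a classical in-order pass over the edges could reach it sooner depending on the relaxation order.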
Consequently, for this variant, a single Bellman-Ford step can clearly be performed in parallel using $\tilde{O}(m)$ work and $O(\operatorname{polylog} n)$ depth.

Hitting sets.

Let $\mathcal{F}$ be a collection of subsets of some universe $U$. Then $X \subseteq U$ is called a hitting set of $\mathcal{F}$ if $X \cap S \ne \emptyset$ for all $S \in \mathcal{F}$.

Lemma 8 ([6, 35]). Let $\Pi$ be a collection of $k$ simple $h$-hop paths of $G$ (i.e., $k$ $(h+1)$-element subsets of $V(G)$). A hitting set of $\Pi$ of size $O((n/h) \cdot \log k)$ can be computed in a deterministic way:
• sequentially using a greedy algorithm in $O(kh + n)$ time,
• using a parallel algorithm with $O(\operatorname{polylog} n)$ depth and $\tilde{O}(kh + n)$ work.

In this section we describe the main result of this paper. Our algorithm will use the concept of an $h$-hub set, as defined below.

Definition 9.

We call a set $H_h \subseteq V$ an $h$-hub set if for all $u, v \in V$ such that $\delta^h_G(u,v) < \delta^{h-1}_G(u,v)$ there exists a minimal path $P = u \to v$ in $G$ such that $|P| = h$ and $P$ goes through a vertex of $H_h$.

The following randomized construction of $h$-hub sets is attributed to Ullman and Yannakakis [60].

Fact 10 ([60]). For any $h \in [1, n]$, a random $\Theta\left(\frac{n}{h} \log n\right)$-element subset of $V$ constitutes an $h$-hub set of $G$ with high probability.

First, we will show how having hub sets for various values $h$ leads to a randomized algorithm, whereas the deterministic construction of hubs will follow.

In the remaining part of this section we assume that $G$ has no negative cycles. We will deal with negative cycle detection later on.

Let $d \in [1, n]$ be a parameter that controls the depth of our algorithm. By possibly decreasing the demanded depth by just a constant factor, we can assume, without loss of generality, that $d = 2^K$, where $K$ is an integer. We first show how to compute APSP given hub sets $V = H_1, H_2, \ldots, H_{2^i}, \ldots, H_d$, where each $H_{2^i}$ has size $O((n/2^i) \cdot \log n)$. By Fact 10, all these hub sets can be obtained with $\tilde{O}(n)$ work and $O(\operatorname{polylog} n)$ depth using sampling.

The first step of our algorithm is to compute shortest $\le (d+1)$-hop distances $\delta^{d+1}_G(s,t)$ for all $s, t \in H_d$. This can be done by running $(d+1)$ steps of the Bellman-Ford algorithm from all $s \in H_d$ in parallel using $\tilde{O}(|H_d| \cdot d \cdot m) = \tilde{O}(nm)$ work and $\tilde{O}(d)$ depth, or in $O(nm \log n)$ time sequentially.

Let $G_d$ be defined as a complete graph on $H_d$ with weights given by $w_{G_d}(uv) = \delta^{d+1}_G(u,v)$ for all $u, v \in H_d$. The second step is computing the actual shortest path lengths $\delta_G(s,t)$ for all $s, t \in H_d$ by running the APSP algorithm based on repeated squaring (of the weighted adjacency matrix using min-plus product) on the graph $G_d$.

Lemma 11.

Let $s, t \in H_d$. Then $\delta_{G_d}(s,t) = \delta_G(s,t)$.

Proof. Since the edge lengths in $G_d$ encode path lengths in $G$, we clearly have $\delta_{G_d}(s,t) \ge \delta_G(s,t)$. Let $h$ be the hop-length of a minimal shortest $s \to t$ path in $G$, i.e., minimum $h$ such that $\delta^h_G(s,t) = \delta_G(s,t)$. We prove $\delta_{G_d}(s,t) \le \delta_G(s,t)$ by induction on $h$.

If $h \le d+1$, then $\delta_G(s,t) = \delta^{d+1}_G(s,t) = w_{G_d}(st) \ge \delta_{G_d}(s,t)$. Otherwise, suppose $h > d+1$ and consider some minimal shortest $s \to t$ path $P$ in $G$. Let us write $P = P'QP''$, where $|P'|, |P''| \ge 1$, $|Q| = d$, and $Q$ is a $p \to q$ path, $p, q \in V$. Since $P$ is minimal, so is $Q$ (by Observation 7) and, as a result, we have $\delta^{d-1}_G(p,q) > \delta^d_G(p,q)$. Therefore, by the definition of a $d$-hub set, there exists a path $Q' = p \to q$ in $G$ with length $\delta^d_G(p,q) = \ell(Q)$, $|Q'| = |Q|$, and going through a vertex $z \in H_d$. Consequently, $P'Q'P''$ is a minimal shortest $s \to t$ path with $z \in H_d$ as an intermediate vertex. By Observation 7, this implies $\delta^{h-1}_G(s,z) = \delta_G(s,z)$ and $\delta^{h-1}_G(z,t) = \delta_G(z,t)$. By the inductive assumption we conclude $\delta_{G_d}(s,z) = \delta_G(s,z)$ and $\delta_{G_d}(z,t) = \delta_G(z,t)$. Finally, by $z \in V(G_d)$ and the triangle inequality we obtain $\delta_{G_d}(s,t) \le \delta_{G_d}(s,z) + \delta_{G_d}(z,t) = \delta_G(s,z) + \delta_G(z,t) = \delta_G(s,t)$.

Footnote (to Lemma 8): The algorithm of Berger et al. [6] actually produces a hitting set a constant factor larger than the greedy algorithm.
Footnote (to Fact 10): Here we abuse the notation slightly. Formally, by increasing the constant $c \ge 1$ hidden in the $\Theta$ notation, one can achieve probability $1 - 1/n^c$.

The repeated squaring APSP algorithm has $\tilde{O}(|H_d|^3) = \tilde{O}((n/d)^3)$ work and $O(\operatorname{polylog} n)$ depth (one can also think of the min-plus product as $n$ Bellman-Ford steps that can be performed in parallel, and hence a single product requires $O(\operatorname{polylog} n)$ depth).
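As an illustration of the repeated-squaring step, here is a minimal sequential Python sketch of APSP via min-plus products. It assumes a dense weight matrix `W` with `math.inf` for missing edges and no negative cycles; the names `min_plus` and `apsp_repeated_squaring` are made up for this example. In the algorithm above the same idea would be applied to the weight matrix of $G_d$ on the vertex set $H_d$, with all entries of a product computed in parallel.

```python
import math

def min_plus(A, B):
    """Min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apsp_repeated_squaring(W):
    """All-pairs distances from a weight matrix W, assuming no negative cycles."""
    n = len(W)
    # Zero diagonal, so that D always encodes distances over AT MOST `hops` edges.
    D = [[0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    hops = 1
    while hops < n - 1:       # shortest paths use at most n - 1 edges
        D = min_plus(D, D)    # squaring doubles the hop bound
        hops *= 2
    return D

# Made-up 4-vertex example with one negative edge.
inf = math.inf
W = [[inf, 4, inf, 10],
     [inf, inf, -2, inf],
     [inf, inf, inf, 3],
     [inf, inf, inf, inf]]
D = apsp_repeated_squaring(W)
```

Only $O(\log n)$ products are needed, which is what keeps the depth polylogarithmic once each product is parallelized.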
If one wanted to implement this step sequentially, the Floyd-Warshall APSP algorithm would yield $O((n/d)^3 \log^3 n)$ time.

Finally, the last step is to inductively compute, for each $k = K, \ldots$, down to $0$, the distances $\delta_G(s,t)$ for all pairs $(s,t) \in (H_{2^k} \times V) \cup (V \times H_{2^k})$. Recall that we have set $H_1 = V$, so after completing the step for $k = 0$, this will give the all-pairs distance matrix as desired.

For convenience, let us set $H_{2^{K+1}} := H_{2^K}$. Let us focus on computing $\delta_G(s,t)$ for $s \in H_{2^k}$ and all $t \in V$ assuming that the steps for larger $k$ have already been completed and so the distances $\delta_G(u,v)$ for $(u,v) \in (H_{2^{k+1}} \times V) \cup (V \times H_{2^{k+1}})$ are already known. Actually, for the inductive step we only require these distances for the pairs from $(H_{2^{k+1}} \times H_{2^k}) \cup (H_{2^k} \times H_{2^{k+1}})$, which implies that in the first step (for $k = K$) they can be retrieved from the distance matrix of $G_d$. Computing $\delta_G(s,t)$ for $(s,t) \in V \times H_{2^k}$ is symmetric. Let $G_{k,s}$ be $G$ with an auxiliary edge $e_v = sv$ of weight $w_{G_{k,s}}(e_v) = \delta_G(s,v)$ added for all $v \in H_{2^{k+1}}$. Observe that all the auxiliary edges' weights have already been computed in the previous step. We compute the desired distances $\delta_G(s,t)$ by running $2^{k+1}+1$ steps of the Bellman-Ford algorithm on $G_{k,s}$ from each $s \in H_{2^k}$ (in parallel). This costs $\tilde{O}(|H_{2^k}| \cdot m \cdot 2^k) = \tilde{O}(mn)$ work and $\tilde{O}(2^k)$ depth in parallel, or $O(nm \log n)$ time sequentially.

Note that through all $k$, the total work is $\tilde{O}(nm)$, whereas the depth is $\tilde{O}(d)$. The sequential time cost of the final phase is $O(nm \log n \log d)$. The correctness of the above algorithm follows from the lemma below.

Lemma 12.

For any $(s,t) \in H_{2^k} \times V$, $\delta^{2^{k+1}+1}_{G_{k,s}}(s,t) = \delta_G(s,t)$.

Proof. Let $b := 2^{k+1}+1$. Since $G \subseteq G_{k,s}$, and the auxiliary edges encode some path lengths in $G$, we clearly have $\delta^b_{G_{k,s}}(s,t) \ge \delta_{G_{k,s}}(s,t) = \delta_G(s,t)$.

Let us now prove $\delta^b_{G_{k,s}}(s,t) \le \delta_G(s,t)$. Let $P$ be a minimal shortest $s \to t$ path in $G$. If $|P| \le b$, then our claim holds by $G \subseteq G_{k,s}$. Suppose $|P| > b$. Let us write $P = QR$, where $|R| = 2^{k+1}$ and $R = u \to t$. Observe that the path $R$ is shortest and minimal. Hence, by the definition of $H_{2^{k+1}}$, there exists a minimal shortest path $R' = u \to t$ going through a vertex $z \in H_{2^{k+1}}$ and $|R'| = 2^{k+1}$. Therefore, $QR'$ is also a minimal shortest $s \to t$ path. Let us express $QR'$ as $P_1 P_2$, where $P_2$ is a minimal shortest $z \to t$ path. Note that we have $|P_2| \le 2^{k+1}$. Moreover, $\ell(P_1) = \delta_G(s,z) = w_{G_{k,s}}(e_z)$. Hence, the $s \to t$ path $e_z P_2$ of length $\delta_G(s,t)$ is contained in $G_{k,s}$ and its hop-length does not exceed $2^{k+1}+1 = b$. Its existence implies $\delta^b_{G_{k,s}}(s,t) \le \delta_G(s,t)$.

Hence, we obtain the following lemma.

Lemma 13.

Let $d = 2^K \le n$ and assume that $G$ does not contain a negative cycle. Given a collection of sets $H_{2^k}$, where $k = 0, \ldots, K$, such that $H_{2^k}$ is a $2^k$-hub set of $G$ of size $O((n/2^k) \log n)$, all-pairs shortest paths can be computed in parallel using $\tilde{O}(nm)$ work and $\tilde{O}(d)$ depth, or sequentially in $O(nm \log n \log d)$ time.

3.2 Deterministic Construction of Hubs

In this section we show how to construct hub sets in a deterministic way. We start with a few technical lemmas.

Lemma 14.

Let $H_h$ be an $h$-hub set of $G$. Let $\Pi$ be a collection of minimal $h$-hop paths $P_{st} = s \to t$, one path for each pair $(s,t) \in H_h \times V$ for which such a minimal path exists. Let $B$ be a hitting set of $\Pi$. Then $B$ is a $2h$-hub set of $G$.

Proof. We need to prove that, for each $u, v \in V$ such that $\delta^{2h}_G(u,v) < \delta^{2h-1}_G(u,v)$, there exists a minimal $2h$-hop $u \to v$ path in $G$ going through a vertex of $B$. To this end, let $P$ be some minimal $u \to v$ path such that $|P| = 2h$. Split $P$ evenly into two paths $P_1 P_2$ of hop-length $h$. Note that, by Observation 7, every subpath of $P$, in particular $P_1 = u \to w$, is minimal. Since $H_h$ is a hub set, there is a vertex $z \in H_h$ on some minimal path $Q = u \to w$ such that $|Q| = h$. Let $P' = QP_2$, and let us express $P' = RST$ so that $S = z \to y$ is a path satisfying $|S| = h$. Note that by minimality of $P'$, $S$ is also minimal. Therefore, $\delta^h_G(z,y) < \delta^{h-1}_G(z,y)$, and thus we have a minimal $z \to y$ path $P_{zy}$ of hop-length $h$ in $\Pi$. Finally, observe that $R P_{zy} T$ is a minimal $u \to v$ path of hop-length $2h$ and goes through a vertex of $B$ by the definition of a hitting set of $\Pi$.

Lemma 15.

Suppose that $G$ has no negative cycles with at most $h$ edges. Then every minimal $h$-hop path in $G$ is simple.

Proof. A non-simple path contains a cycle. If a minimal $h$-hop path $P$ contained a non-negative cycle, there would exist a path of hop-length $< h$ and length no more than $\ell(P)$, thus contradicting minimality of $P$. If $P$ contained a negative cycle, the cycle would have at most $h$ edges, contradicting our assumption.

Lemma 16.

Suppose that $G$ has no negative cycles with at most $h$ edges. Let $H_h$ be an $h$-hub set of $G$. Then in $O(|H_h| h m)$ time we can:
• construct a $2h$-hub set $H_{2h}$ of $G$ such that $|H_{2h}| = O((n/h) \cdot \log n)$,
• find a shortest negative cycle in $G$ with no more than $2h$ edges, if one exists.

Proof. To find $H_{2h}$, we first compute shortest $\le h$-hop paths from all $s \in H_h$ to all $v \in V$ using $h$ steps of the Bellman-Ford algorithm in $O(|H_h| h m)$ time. Note that indeed minimal $h$-hop paths can be inferred from the Bellman-Ford execution by storing (1) the distance labels $d_i$ from all the steps, and (2) the predecessor vectors $p_i$ such that
$$d_{i+1}(v) = \min_{uv \in E}\{d_i(u) + w(uv)\} = d_i(p_i(v)) + w(p_i(v)v).$$
Clearly, an $s \to v$ minimal path of hop-length $h$ exists if $d_h(v) < d_{h-1}(v)$ and it can be easily reconstructed from the predecessor vectors. Since no negative cycle in $G$ has $\le h$ edges, by Lemma 15, the computed minimal $h$-hop paths are all simple. As a result, by Lemma 8, we can compute a hitting set $B$ of size $O((n/h) \log n)$ of these paths in $O((|H_h| \cdot n) \cdot h)$ time. By Lemma 14, $B$ constitutes a $2h$-hub set of $G$.

Suppose a shortest negative cycle $C = v_1 \ldots v_k$ with $|C| = k \in (h, 2h]$ exists in $G$. Since $C$ has a minimal number of hops (in particular, it is simple), the subpath $P = v_1 \to v_{h+1}$ of $C$ satisfies $\delta^h_G(v_1, v_{h+1}) \le \ell(P) < \delta^{h-1}_G(v_1, v_{h+1})$. As a result, and by the definition of $H_h$, there exists an $h$-hop minimal path $P' = v_1 \to v_{h+1}$ such that $z \in H_h \cap V(P')$ and $\ell(P') \le \ell(P)$. Hence, if we replace the prefix $P$ of $C$ with $P'$, we obtain another shortest negative cycle $C'$ with $|C'| \in (h, 2h]$ and going through $z$. It follows that $\delta^{2h}_G(z,z) < 0$. By performing $2h$ Bellman-Ford steps from all $z \in H_h$, we can find the smallest $k \le 2h$ such that $\delta^k(z,z) < 0$ for some $z \in H_h$, if one exists. This $k$ clearly equals $|C|$.
Since we have to check all $z \in H_h$, this takes $O(|H_h| h m)$ time through all $z$.

We are now ready to describe the preprocessing step of our APSP algorithm that computes the hub sets $H_1, \ldots, H_{d = 2^K}$ and possibly detects the shortest negative cycle. Let us first discuss a sequential algorithm. We proceed inductively, using Lemma 16.

We set $H_1 = V$. For each $k = 0, \ldots, K-1$ we proceed as follows. We maintain an invariant that $G$ contains no negative cycles of hop-length at most $2^k$ and $|H_{2^k}| = O((n/2^k) \log n)$. The invariant is true initially for $k = 0$ since, clearly, there are no single-edge negative cycles. By our invariant, we can apply Lemma 16 for $h = 2^k$ and thus in $O(nm \log n)$ time either detect a shortest negative cycle with no more than $2h$ edges (and declare the APSP problem infeasible) or construct a $2^{k+1}$-hub set $H_{2^{k+1}}$ of size $O((n/2^{k+1}) \log n)$ and guarantee the invariant for $k+1$.

The above sequential algorithm trivially implies a parallel algorithm: all the above $O(\log d)$ steps of our computation can be implemented using a number of parallel invocations of an $O(d)$-step Bellman-Ford algorithm and an $O(\operatorname{polylog} n)$-depth parallel computation (as given in Lemma 8) of a small hitting set of a collection of simple paths with a total hop-length of $O(n^2 \log n)$ (the collection in step $k$ has $O((n^2/2^k) \cdot \log n)$ paths of length $\Theta(2^k)$). We thus obtain the following theorem.

Theorem 17.

Let $d = 2^K \le n$. Then, a shortest negative cycle of $G$, provided it has hop-length at most $d$, can be computed deterministically:
• in $O(nm \log n \log d)$ time using a sequential algorithm,
• using a parallel algorithm with $\tilde{O}(nm)$ work and $\tilde{O}(d)$ depth.

If no negative cycle of hop-length at most $d$ exists in $G$, then the algorithm can produce, within the same time bounds, a collection of sets $H_{2^k}$, where $k = 0, \ldots, K$, such that $H_{2^k}$ is a $2^k$-hub set of $G$ of size $O((n/2^k) \cdot \log n)$.

Together with Lemma 13 this finishes the proof of our main theorem.
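The doubling construction of hub sets described above (Lemmas 14 and 16) can be sketched sequentially in Python as follows. This is only an illustration under stated assumptions: vertices are `0..n-1`, edges are `(u, v, w)` triples, the graph has no negative cycles, and negative cycle detection is omitted. The greedy routine is a plain textbook greedy, not the $O(kh+n)$-time implementation nor the parallel algorithm referred to in Lemma 8, and all function names are hypothetical.

```python
import math

def minimal_hop_paths(n, edges, s, h):
    """One minimal h-hop path s -> v (as a vertex list) for every v that has one."""
    d = [math.inf] * n
    d[s] = 0
    preds = []                  # preds[i][v]: best predecessor of v in step i + 1
    improved = [False] * n
    for _ in range(h):
        best, pred = [math.inf] * n, [None] * n
        for u, v, w in edges:   # synchronous step: read only previous labels d
            if d[u] + w < best[v]:
                best[v], pred[v] = d[u] + w, u
        improved = [best[v] < d[v] for v in range(n)]
        d = [min(d[v], best[v]) for v in range(n)]
        preds.append(pred)
    paths = []
    for v in range(n):
        if improved[v]:         # delta^h(s, v) < delta^{h-1}(s, v)
            path = [v]
            for pred in reversed(preds):
                path.append(pred[path[-1]])
            paths.append(path[::-1])
    return paths

def greedy_hitting_set(paths):
    """Repeatedly pick the vertex that hits the most not-yet-hit paths."""
    remaining = [set(p) for p in paths]
    chosen = set()
    while remaining:
        counts = {}
        for p in remaining:
            for x in p:
                counts[x] = counts.get(x, 0) + 1
        top = max(counts, key=counts.get)
        chosen.add(top)
        remaining = [p for p in remaining if top not in p]
    return chosen

def hub_sets(n, edges, K):
    """Return [H_1, H_2, H_4, ..., H_{2^K}], starting from H_1 = V."""
    hubs = [set(range(n))]
    for k in range(K):
        h = 2 ** k
        paths = [p for s in sorted(hubs[-1])
                 for p in minimal_hop_paths(n, edges, s, h)]
        hubs.append(greedy_hitting_set(paths))   # a 2h-hub set by Lemma 14
    return hubs
```

Each round only runs $h$ Bellman-Ford steps from the current hubs, so the shrinking hub sets offset the growing hop bound; this is the balance behind the $O(nm \log n)$ cost per round stated above.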

Theorem 1.

Theorem 2.

Let $G$ be a real-weighted directed graph. A shortest (in terms of hops) negative-weight cycle in $G$ can be found deterministically in $O(nm \log n)$ time.

In this section we present our improved planar graph algorithms. We start by describing the framework used by Fakcharoenphol and Rao [20] to solve the negative cycle detection problem on planar graphs in near-linear time. This framework forms the basis for all our parallel and sequential algorithms. We remark that the negative cycle detection algorithm of [20] has been subsequently improved by Klein et al. [39], whereas the currently best known bound $O(n \log^2 n / \log\log n)$ is due to Mozes and Wulff-Nilsen [52]. These algorithms take a slightly simpler recursive approach, but the fundamental difference compared to [20] is using Klein's MSSP algorithm [37] on each piece $H$ to compute $\mathrm{DDG}_H$ in $O((|V(H)| + |\partial H|^2) \log n)$ time. This approach avoids the costly FR-Dijkstra (Lemma 19) and thus saves at least an $O(\log n)$ factor in the running time. However, it seems that Klein's MSSP algorithm is inherently sequential and this is why in the following we will stick to the original approach of [20].

Transferring the approach of [20] to the parallel setting directly leads to $\tilde{O}(\sqrt{n})$-depth work-efficient parallel algorithms for single-source shortest paths and negative cycle detection. We first show how to improve the depth to $\tilde{O}(n^{1/6})$ using our parallel APSP algorithm. Using this, we later show improved algorithms for the minimum cost-to-time ratio cycle problem and external dense distance graphs computation.

Let $H$ be a weighted plane digraph with a distinguished set $\partial H$ of boundary vertices that necessarily lie on $O(1)$ faces of $H$.
We denote by $\mathrm{DDG}_H$ (a dense distance graph) the complete weighted graph on $\partial H$ whose edge weights represent distances between all pairs of vertices of $\partial H$ in $H$. Fakcharoenphol and Rao [20] developed the concept of a dense distance graph along with efficient algorithms for constructing and processing DDGs as a way to obtain their breakthrough $O(n \log^3 n)$-time algorithm for negative cycle detection in a real-weighted planar digraph. We now review a variant of this algorithm using slightly more modern terminology.

After augmenting $G$ to be connected and triangulated, $G$ is recursively decomposed using small cycle separators [49] of size $O(\sqrt{n})$ until the obtained pieces have constant size. The decomposition procedure produces in $O(n \log n)$ time a binary tree $\mathcal{T}(G)$ whose nodes correspond to subgraphs of $G$ (pieces), with the root being all of $G$ and the leaves being pieces of constant size. We identify each piece $H$ with the node representing it in $\mathcal{T}(G)$. We can thus abuse notation and write $H \in \mathcal{T}(G)$. The boundary vertices $\partial H$ of a piece $H$ are vertices that $H$ shares with some other piece $Q \in \mathcal{T}(G)$ that is not $H$'s ancestor. For convenience we extend the boundary set $\partial L$ of a leaf piece $L$ to its entire vertex set $V(L)$. It is known that (see e.g., [9, 38]) one can additionally guarantee that for each piece $H \in \mathcal{T}(G)$, (1) $H$ is connected, (2) $\partial H$ lies on some $O(1)$ faces of $H$, and (3) $|\partial H| = O(\sqrt{n})$. Moreover, one can assume that $\sum_{H \in \mathcal{T}(G)} |V(H)| = O(n \log n)$ and $\sum_{H \in \mathcal{T}(G)} |\partial H|^2 = O(n \log n)$.

Given the decomposition, the algorithm processes the pieces $H \in \mathcal{T}(G)$ bottom-up. Clearly, if any piece contains a negative cycle, the whole $G$ does so as well. On the other hand, if $H$ contains no negative cycle, the dense distance graph $\mathrm{DDG}_H$ on $\partial H$ is well-defined. Therefore, the computation for a piece $H$ either detects a negative cycle in $H$ or produces $\mathrm{DDG}_H$ otherwise.
Let $H_1, H_2$ be the children of the node $H$ in $\mathcal{T}(G)$ and suppose neither of them contains a negative cycle. Let $H' = \mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$. It can be easily shown that (1) $H$ contains a negative cycle if and only if $H'$ contains a negative cycle, (2) for any $u, v \in \partial H$, if $H$ contains no negative cycle, then $\delta_H(u,v) = \delta_{H'}(u,v)$. Consequently, in the algorithm of [20], for each piece $H$ we first run a Bellman-Ford-based SSSP algorithm on $H'$ which either detects a negative cycle or produces a feasible price function $p$ on $H'$. The second step is to compute all-pairs shortest paths on $H'$ by running $|\partial H_1 \cup \partial H_2|$ Dijkstra-based single-source computations with edge costs in $H'$ reduced with $p$. The graph $\mathrm{DDG}_H$ can be easily obtained from the computed distance matrix since $\partial H \subseteq \partial H_1 \cup \partial H_2$.

Since each $\mathrm{DDG}_H$ has $|\partial H|^2$ edges, using Bellman-Ford and Dijkstra naively would lead to $\tilde{O}\left(\sum_{H \in \mathcal{T}(G)} |\partial H|^3\right) = \tilde{O}(n^{3/2})$ running time. The main contribution of [20] lies in showing that the special structure of a dense distance graph (that is, the distance matrix behind $\mathrm{DDG}_H$ consists of two so-called staircase Monge matrices) can be leveraged to speed up naive implementations of these algorithms. The original implementations of [20] have been slightly improved, and currently we have the following bounds.

Lemma 18 ([20, 39]). A single step of Bellman-Ford algorithm can be simulated on

DDG H ∪ DDG H in O (( | ∂H | + | ∂H | ) α ( n )) time, where α ( n ) is the inverse Ackermann function. The above lemma is a simple application of the algorithm of Klawe and Kleitman [36] forcomputing row minima of a staircase m × m Monge matrix in O ( mα ( m )) time. Lemma 19 (FR-Dijkstra [20, 23]) . Given a feasible price function p on DDG H ∪ DDG H , onecan simulate Dijkstra’s algorithm on DDG H ∪ DDG H in O (cid:16) ( | ∂H | + | ∂H | ) log n log log n (cid:17) time. The above lemmas imply that the running time of Fakcharoenphol and Rao’s algorithm is O (cid:16)P H ∈T ( G ) | ∂H | log n/ log log n (cid:17) = O ( n log n/ log log n ) . A parallel implementation.

We note that the above algorithm can be easily turned into a parallel algorithm with $\tilde{O}(n)$ work and $\tilde{O}(\sqrt{n})$ depth, as explained below.

First, computing the decomposition within $\tilde{O}(n)$ work and $O(\operatorname{polylog} n)$ depth follows from the fact that $O(\sqrt{n})$-size simple cycle separators can be computed within these bounds [40, 49]. Next, it is known that row minima of a staircase Monge matrix can be computed using near-linear work and $O(\operatorname{polylog} n)$ depth [2]. This implies the following.

Lemma 20.

A single step of Bellman-Ford algorithm can be simulated on $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ using a parallel algorithm with $\tilde{O}(|\partial H_1| + |\partial H_2|)$ work and $O(\operatorname{polylog} n)$ depth.

Consequently, the Bellman-Ford algorithm can be simulated on $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ using $\tilde{O}\left((|\partial H_1| + |\partial H_2|)^2\right)$ work and $\tilde{O}(|\partial H_1| + |\partial H_2|) = \tilde{O}(\sqrt{n})$ depth.

Finally, since the $|\partial H|$ runs of Dijkstra's algorithm used to compute $\mathrm{DDG}_H$ based on $\mathrm{DDG}_{H_1}$ and $\mathrm{DDG}_{H_2}$ are independent, they can be performed in parallel to give an $\tilde{O}(\sqrt{n})$ depth bound on this step. As the depth of $\mathcal{T}(G)$ is $O(\log n)$, the total work is $\tilde{O}(n)$, whereas the total depth is $\tilde{O}(\sqrt{n})$.

In this section we apply our parallel APSP algorithm to obtain a polynomially smaller bound on the depth required to detect a negative cycle and to compute single-source shortest paths in a real-weighted planar digraph.

First, observe that having an APSP algorithm that can handle negative cycle detection on the fly allows us to replace the two-phase Bellman-Ford-Dijkstra approach in the framework of Fakcharoenphol and Rao. Indeed, then for each piece $H$ all we do is compute $\mathrm{DDG}_H$ by running the all-pairs shortest paths algorithm directly on $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$, without using price functions that reduce the problem to the non-negatively weighted case.

Recall that the algorithm of Theorem 1 is a certain combination of a number of limited-hop Bellman-Ford invocations on various graphs $G'$ obtained from $G$ by adding $O(n)$ auxiliary edges, and a single run of a repeated squaring algorithm. In fact, if we were able to execute a single Bellman-Ford step on such a graph $G'$ using $O(t(n,m))$ work and $O(\operatorname{polylog} n)$ depth, the algorithm of Theorem 1 would have $\tilde{O}(n^2 + n \cdot t(n,m) + (n/d)^3)$ work and $\tilde{O}(d)$ depth for any $d$. We use this fact to prove the following lemma.

Lemma 21.

Let $H_1, H_2$ be the children of a node $H \in \mathcal{T}(G)$. Let $b = |\partial H_1| + |\partial H_2|$. We can compute all-pairs shortest paths in $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ (and thus obtain $\mathrm{DDG}_H$), using a parallel algorithm with $\tilde{O}(b^2)$ work and $\tilde{O}(b^{1/3})$ depth. If the problem is infeasible, the algorithm detects a negative cycle within the same bounds.

Proof. Let $X$ be the graph $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ with some $O(b)$ auxiliary edges added. Every Bellman-Ford step in the algorithm of Theorem 1 run on $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ is performed on a graph of this form. Hence, it is enough to show how to perform a Bellman-Ford step on $X$, which can have $\Omega(b^2)$ edges. Since in a Bellman-Ford step the order of edge relaxations is arbitrary, we can first relax all edges of $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ and then all the auxiliary edges. Relaxing the former takes $\tilde{O}(b)$ work and $O(\operatorname{polylog} n)$ depth by Lemma 20. The latter can be relaxed within the same bounds naively. Hence, the APSP algorithm of Theorem 1 can be implemented so that it has $\tilde{O}(b^2 + (b/d)^3)$ work and $\tilde{O}(d)$ depth for any $d$. By choosing $d = b^{1/3}$ we obtain the desired bounds.

Theorem 3.

Let $G$ be a real-weighted planar digraph. There exists a deterministic parallel algorithm for negative cycle detection and single-source shortest paths in $G$ with $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth.

Proof. To obtain a parallel negative cycle detection algorithm, we simply replace the Bellman-Ford and Dijkstra steps in the computation for each piece $H \in \mathcal{T}(G)$ in the algorithm of Fakcharoenphol and Rao [20] by a single computation of $\mathrm{DDG}_H$ from $\mathrm{DDG}_{H_1}$ and $\mathrm{DDG}_{H_2}$ as in Lemma 21. The work remains $\tilde{O}\left(\sum_{H \in \mathcal{T}(G)} |\partial H|^2\right) = \tilde{O}(n)$, whereas the depth is
$$\tilde{O}\left(\max_{H \in \mathcal{T}(G)} |\partial H|^{1/3}\right) = \tilde{O}\left((\sqrt{n})^{1/3}\right) = \tilde{O}(n^{1/6}).$$

In order to solve the single-source shortest paths problem one can use a trick first described by Cohen [17] (and also used in [32]). Denote by $H^*$ a complete graph on $\partial H_1 \cup \partial H_2$ whose edge weights represent distances in $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$. Note that $H^*$ is precisely the graph that we compute using our new APSP algorithm to obtain $\mathrm{DDG}_H \subseteq H^*$. Now, consider the graph
$$G^* = \Big(\bigcup_{\text{non-leaf } H \in \mathcal{T}(G)} H^*\Big) \cup \Big(\bigcup_{\text{leaf } L \in \mathcal{T}(G)} L\Big).$$
Since $G \subseteq G^*$ and the edges in $G^*$ correspond to paths in $G$, it is clear that $\delta_G(s,t) = \delta_{G^*}(s,t)$ for all $s, t \in V(G)$. However, as proven in [17, 32], the graph $G^*$, although no longer planar, has hop-diameter $O(\log n)$. In other words, we in fact have $\delta^{O(\log n)}_{G^*}(s,t) = \delta_G(s,t)$ for all $s, t \in V$. As a result, shortest paths in $G$ from a single source $s$ can be found by running $O(\log n)$ simple-minded Bellman-Ford steps from $s$ on $G^*$. Since $|E(G^*)| = \tilde{O}\left(n + \sum_{H \in \mathcal{T}(G)} |\partial H|^2\right) = \tilde{O}(n)$, this takes $\tilde{O}(n)$ work and $O(\operatorname{polylog} n)$ depth.

Corollary 22.

The following problems on planar directed graphs have nearly work-efficient algorithms with $\tilde{O}(n^{1/6})$ depth:
(1) computing a feasible flow for real-weighted capacities,
(2) computing a bipartite perfect matching,
(3) computing a maximum $s,t$-flow for polynomially bounded integral edge capacities,
(4) finding a shortest cycle going through each $e \in E(G)$,
(5) finding $s,t$-replacement paths.

Footnote (to item (1)): That is, given a vertex demand function $b : V \to \mathbb{R}$, a flow $f$ such that the excess $e_f(v)$ of each vertex $v$ is $b(v)$.

Proof. For the feasible flow problem, Miller and Naor [50] gave a duality-based nearly work-efficient, $O(\operatorname{polylog} n)$-depth reduction to the real-weighted single-source shortest paths problem. Bipartite perfect matching is directly reducible to the feasible flow problem by setting the vertex demands on one side of the graph to $-1$ and on the other side to $1$.

The maximum $s,t$-flow problem can be reduced at the cost of $O(\log(nC))$ multiplicative overhead to the feasible flow problem via binary search over the flow value, where the edge capacities are from $\mathbb{Z} \cap [0, C]$. This implies an $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth algorithm for that problem if $C = \operatorname{poly} n$.

In Appendix A we describe how external dense distance graphs, defined and discussed later in this section, can be used to compute the shortest cycles through all edges within the desired bounds.

Finally, Chechik and Nechushtan [16] have recently shown that one can reduce the $s,t$-replacement paths problem (using a planarity-preserving near-linear time reduction) to computing the shortest cycles through all edges of a fixed shortest path of a graph. All the reduction does is basically computing a shortest $s \to t$ path $P$ in $G$ and reversing it. Therefore, by Theorem 3, the reduction can be performed in a nearly work-efficient manner using $\tilde{O}(n^{1/6})$ depth. The final step is to compute shortest cycles through all edges of the reversed path $P$, which can be done by item (4).

In this section we assume that each edge $e$ of a planar digraph $G$ is assigned, besides a real weight $w(e)$, a time parameter $t(e) \in \mathbb{R}_{>0}$. Our goal is to compute a directed cycle $C \subseteq G$ minimizing the value $\lambda^* = \frac{\sum_{e \in C} w(e)}{\sum_{e \in C} t(e)}$. Such a cycle is called a minimum cost-to-time ratio cycle, or minimum ratio cycle, in short. In the special case when $t(e) = 1$ for all $e \in E$, $C$ is called a minimum mean cycle.

Parametric search.

It is well known that one can reduce the minimum ratio cycle problem to the negative cycle detection problem using binary search, as follows. The binary search algorithm maintains an interval $[\lambda_1, \lambda_2]$ such that $\lambda^* \in [\lambda_1, \lambda_2]$. Given some $\lambda \in [\lambda_1, \lambda_2]$, we wish to decide whether $\lambda^* < \lambda$ or $\lambda \le \lambda^*$. Note that $\lambda^* < \lambda$ if and only if there exists a cycle $C$ such that
$$\sum_{e \in C} \big(w(e) - \lambda \cdot t(e)\big) < 0.$$
This condition can be clearly checked by running a negative cycle detection algorithm on the graph $G_\lambda$ obtained from $G$ by changing the edge weight function to $w_\lambda(e) := w(e) - \lambda t(e)$. By picking $\lambda = (\lambda_1 + \lambda_2)/2$, this also allows us to shrink the interval $[\lambda_1, \lambda_2]$ by half using a single negative cycle detection step. If all the edge weights and times are integers whose absolute values are bounded by $W$, the algorithm stops in $O(\log(nW))$ steps. Since negative cycle detection can be solved in $\tilde{O}(n)$ time for planar graphs, this leads to a weakly polynomial $\tilde{O}(n \log W)$-time algorithm for the minimum ratio cycle problem.

Megiddo's parametric search technique can be used to convert a strongly polynomial parallel negative cycle detection algorithm into a strongly polynomial time minimum ratio cycle algorithm. All known strongly polynomial algorithms for this problem use variants of this technique.

Suppose we have two strongly polynomial negative cycle detection algorithms: (1) a parallel one $\mathcal{P}$ with work $W(n,m)$ and depth $D(n,m)$, and (2) a sequential one $\mathcal{S}$ with running time $T(n,m)$. Additionally, suppose the parallel algorithm operates on edge weights/times (and all the stored values dependent on the weights) only by additions/subtractions and comparisons. The idea is, conceptually, to simulate (sequentially) the parallel algorithm $\mathcal{P}$ "generically" on all the possible graphs $G_\lambda$ with $\lambda \in [\lambda_1, \lambda_2]$ (where $[\lambda_1, \lambda_2]$ shrinks in time).
This, in particular, means that the edge weights of $G_\lambda$ and all the stored values (e.g., the distance labels in the Bellman-Ford algorithm) are linear functions of the form $-\lambda \cdot a + b$. Adding and subtracting linear functions can be done straightforwardly, and clearly leads to functions of the same form. However, the result of a comparison of two values parameterized by $\lambda$ generally depends on $\lambda$, and different results of such a comparison might cause a different flow of $\mathcal{P}$ in the future: we would need to handle both branches of the algorithm's flow if we wanted to detect a negative cycle in all $G_\lambda$. However, this is not our goal: we only care about locating $\lambda^*$. So, instead, whenever $\mathcal{P}$ performs a comparison $-\lambda a + b < -\lambda c + d$, we compute the breakpoint $x = (d - b)/(c - a)$ (for $c \ne a$) and use the sequential negative cycle detection algorithm on $G_x$ to test whether $\lambda^* < x$ or $\lambda^* \ge x$. This allows us to decide which branch would be chosen for $\lambda = \lambda^*$ and discard the other branch (i.e., shrink $[\lambda_1, \lambda_2]$ to either $[\lambda_1, x]$ or $[x, \lambda_2]$). Note that this already implies a strongly polynomial bound of $O(W(n,m) \cdot T(n,m))$ since $\mathcal{P}$ performs $O(W(n,m))$ comparisons.

However, one can use the parallelism of $\mathcal{P}$ to do better. In each of $O(D(n,m))$ parallel steps $s$, $\mathcal{P}$ performs some number $p_s$ of comparisons, whose results depend on where $\lambda^*$ lies relative to some breakpoints $\lambda_1 = x_0 \le x_1 < \ldots < x_{p_s} \le x_{p_s+1} = \lambda_2$, where the sum of $p_s$ over all parallel steps $s$ is $O(W(n,m))$. Note that the breakpoints can be sorted in $\tilde{O}(p_s)$ time. Afterwards, we can find such $x_i$ that $\lambda^* \in [x_i, x_{i+1}]$ via binary search using $O(\log(W(n,m)))$ sequential negative cycle detection runs. Observe that this allows us to choose the correct branch for all the comparisons in parallel step $s$ at once in just $\tilde{O}(T(n,m))$ time.
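The batched resolution of one parallel step's comparisons can be sketched in Python as below. The sketch makes several assumptions that are not in the text: coefficients are integers (so exact `Fraction` arithmetic applies), the oracle `lam_star_less_than(x)` stands in for one run of the sequential negative cycle detection on $G_x$, the invariant $lo \le \lambda^* < hi$ holds on entry, and $\lambda^*$ does not coincide with any breakpoint (ties would need extra care). The function name and the tuple encoding `(a, b, c, d)` of a comparison $-\lambda a + b < -\lambda c + d$ are hypothetical.

```python
from fractions import Fraction

def resolve_comparisons(comparisons, lo, hi, lam_star_less_than):
    """Decide all comparisons -lam*a + b < -lam*c + d at lam = lambda*.

    Returns (results, new_lo, new_hi, oracle_calls).
    """
    # Breakpoints strictly inside (lo, hi) where some comparison flips.
    breakpoints = set()
    for a, b, c, d in comparisons:
        if c != a:
            x = Fraction(d - b, c - a)
            if lo < x < hi:
                breakpoints.add(x)
    points = sorted(breakpoints)

    # Binary search; invariant: points[:left] <= lambda* < points[right:].
    left, right, calls = 0, len(points), 0
    while left < right:
        mid = (left + right) // 2
        calls += 1
        if lam_star_less_than(points[mid]):
            right = mid
        else:
            left = mid + 1
    if left > 0:
        lo = points[left - 1]
    if right < len(points):
        hi = points[right]

    # No breakpoint lies strictly between lo and hi any more, so every
    # comparison has the same outcome at lambda* as at any interior point.
    probe = Fraction(lo + hi) / 2
    results = [(-probe * a + b) < (-probe * c + d) for a, b, c, d in comparisons]
    return results, lo, hi, calls
```

With $p$ distinct breakpoints the loop makes at most $\lceil \log_2(p+1) \rceil$ oracle calls, which is the source of the logarithmic number of sequential negative cycle detection runs per parallel step.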
Consequently, we obtain a (sequential) strongly polynomial algorithm with $\tilde{O}(W(n, m) + D(n, m) \cdot T(n, m))$ running time.

Planar graphs.

Note that all our algorithms (and therefore also the algorithm of Lemma 20) indeed operate on edge weights only by performing additions and comparisons. Hence, by plugging the algorithm of Theorem 3 as both the parallel and the sequential algorithm into the parametric search framework, we already obtain an $\tilde{O}(n^{7/6})$-time strongly polynomial minimum ratio cycle algorithm (then $T(n, m), W(n, m) \in \tilde{O}(n)$ and $D(n, m) = \tilde{O}(n^{1/6})$).

However, we can do better by slightly decreasing the depth of the parallel algorithm of Theorem 3 at the cost of increasing its work (we stress that we still stick to using a sequential algorithm with $T(n, m) = \tilde{O}(n)$). Recall from the proof of Lemma 21 that we could actually achieve depth $\tilde{O}(d)$ within work bounded by

$$\tilde{O}\Big(n + \sum_{H \in \mathcal{T}(G)} |\partial H|^3 / d^3\Big) = \tilde{O}\Big(n + \frac{\sqrt{n}}{d^3} \sum_{H \in \mathcal{T}(G)} |\partial H|^2\Big) = \tilde{O}\big(n + n^{3/2}/d^3\big).$$

As a result, we can obtain a minimum ratio cycle algorithm that runs in time $\tilde{O}(dn + n^{3/2}/d^3)$ for any $d$. We balance these terms by choosing $d = n^{1/8}$ and obtain the following theorem.

Theorem 4.

Let $G$ be a real-weighted planar graph. There exists an $\tilde{O}(n^{9/8})$-time strongly polynomial algorithm for computing a minimum ratio cycle (and thus also a minimum mean cycle) in $G$.

Piecewise dense distance graphs $\mathrm{DDG}_H$ for $H \in \mathcal{T}(G)$ have numerous other applications in sequential planar graph algorithms beyond negative cycle detection. Typically, though, they are computed using the aforementioned Klein's MSSP data structure [37] in $O((|H| + |\partial H|^2) \log n)$ time rather than inductively in $O\big((|\partial H_1| + |\partial H_2|)^2 \frac{\log^2 n}{\log^2 \log n}\big)$ time (for the children $H_1, H_2$ of $H$) using FR-Dijkstra. However, in some situations, only the latter inductive method can be applied. One such application is computing so-called external dense distance graphs, i.e., the graphs $\mathrm{DDG}_{G-H}$ for all $H \in \mathcal{T}(G)$, where $G - H$ is the graph obtained from $G$ by removing the vertices $V(H) \setminus \partial H$. We set $\partial(G - H) := \partial H$, since indeed $\partial H$ contains all vertices that $G - H$ shares with $H$. One can also argue that if the faces of $H$ containing $\partial H$ are simple and disjoint, then $\partial H$ can be assumed to lie on $O(1)$ faces of $G - H$ as well.

It is well known (and also not difficult to prove, see, e.g., [51]) that if $H'$ is a sibling of node $H$, whereas $H''$ is $H$'s parent in $\mathcal{T}(G)$, then $\mathrm{DDG}_{G-H}$ can be obtained by computing all-pairs shortest paths on $\mathrm{DDG}_{G-H''} \cup \mathrm{DDG}_{H'}$. Using FR-Dijkstra (Lemma 19), this takes $O\big((|\partial H''| + |\partial H'|)^2 \frac{\log^2 n}{\log^2 \log n}\big) = O\big((|\partial H''|^2 + |\partial H'|^2) \frac{\log^2 n}{\log^2 \log n}\big)$ time and thus $O\big(n \frac{\log^3 n}{\log^2 \log n}\big)$ time over all pieces $H \in \mathcal{T}(G)$. On the other hand, using MSSP to compute all the external dense distance graphs is very inefficient, since clearly $\sum_{H \in \mathcal{T}(G)} |V(G - H)| = \Omega(n^2)$. In fact, the inductive FR-Dijkstra-based approach has been so far the only known way to compute external dense distance graphs efficiently. External DDGs alone can be used to compute, for example, shortest cycles through all edges of the graph (see Appendix A). They have also proved useful in obtaining very efficient algorithms for maximum flow and minimum cut related problems [9, 46].
Moreover, the computation of external dense distance graphs is the bottleneck of construction algorithms for, e.g., the so-called cycle-MSSP data structure [51], or distance oracles supporting failed vertices [15].

The lemma below shows that by combining the sequential version of our APSP algorithm with FR-Dijkstra we can obtain a faster (by a factor of almost $\Theta(\log n)$) algorithm for inductively computing dense distance graphs. For simplicity, we focus on computing $\mathrm{DDG}_H$ from $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$, as the algorithm for computing $\mathrm{DDG}_{G-H}$ (as argued above) is identical.

Lemma 23.

Let $H_1, H_2$ be the children of node $H \in \mathcal{T}(G)$. Let $b = |\partial H_1| + |\partial H_2|$. We can compute all-pairs shortest paths in $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ in $O(b^2 \log n \cdot \log\log n \cdot \alpha(n))$ sequential time. If the problem is infeasible, the algorithm finds a negative cycle within the same bounds.

Proof. By Lemma 18, in $O(b^2 \cdot \alpha(n))$ time we can either find a negative cycle in $\mathrm{DDG}_{H_1} \cup \mathrm{DDG}_{H_2}$ or compute a feasible price function $p$ on that graph. The price function allows us to make the weights in each $\mathrm{DDG}_{H_i}$ non-negative. Now we use the sequential version of the algorithm of Theorem 1 with two changes. First, we use a combination of Lemma 18 and $O(b)$ naive relaxations for each Bellman-Ford step performed, as we did in the proof of Lemma 21. Even more importantly, we replace the $O((n/d)^3 \log^3 n)$-time Floyd-Warshall-based computation of shortest paths between all $s, t \in H_d$ with $|H_d|$ invocations of FR-Dijkstra, as given in Lemma 19. As a result, the running time of the APSP algorithm becomes

$$O\Big(b \cdot (b \cdot \alpha(n)) \cdot \log n \log d + \Big(\frac{b}{d} \log n\Big) \cdot b \log^2 n\Big) = O\Big(b^2 \log n \cdot \Big(\alpha(n) \log d + \frac{\log^2 n}{d}\Big)\Big).$$

We obtain the desired bound by choosing $d = \log^2 n$.

Lemma 23 easily implies the following lemma.

Lemma 5.

The external dense distance graphs $\mathrm{DDG}_{G-H}$ for all $H \in \mathcal{T}(G)$ can be computed in $O(n \log^2 n \cdot \log\log n \cdot \alpha(n))$ time.

Dealing with non-simple or non-disjoint faces is merely a tedious technical difficulty (see, e.g., [14, 27, 31]); they can be avoided by suitably extending the input graph along with its decomposition.

References

[1] Pankaj K. Agarwal, Micha Sharir, and Sivan Toledo. Applications of parametric searching in geometric optimization. J. Algorithms, 17(3):292–318, 1994. doi:10.1006/jagm.1994.1038.
[2] Alok Aggarwal, Dina Kravets, James K. Park, and Sandeep Sen. Parallel searching in generalized Monge arrays.

Algorithmica, 19(3):291–317, 1997. doi:10.1007/PL00009175.
[3] Alexandr Andoni, Clifford Stein, and Peilin Zhong. Parallel approximate undirected shortest paths via low hop emulators. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 322–335, 2020. doi:10.1145/3357713.3384321.
[4] D. A. Bader and K. Madduri. Parallel algorithms for evaluating centrality indices in real-world networks. In , pages 539–550, 2006.
[5] Surender Baswana, Ramesh Hariharan, and Sandeep Sen. Improved decremental algorithms for maintaining transitive closure and all-pairs shortest paths. J. Algorithms, 62(2):74–92, 2007. doi:10.1016/j.jalgor.2004.08.004.
[6] Bonnie Berger, John Rompel, and Peter W. Shor. Efficient NC algorithms for set cover with applications to learning and geometry. J. Comput. Syst. Sci., 49(3):454–477, 1994. doi:10.1016/S0022-0000(05)80068-6.
[7] Aaron Bernstein. Maintaining shortest paths under deletions in weighted directed graphs. SIAM J. Comput., 45(2):548–574, 2016. doi:10.1137/130938670.
[8] Guy E. Blelloch, Yan Gu, Yihan Sun, and Kanat Tangwongsan. Parallel shortest paths using radius stepping. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016, pages 443–454, 2016. doi:10.1145/2935764.2935765.
[9] Glencora Borradaile, Piotr Sankowski, and Christian Wulff-Nilsen. Min st-cut oracle for planar graphs with near-linear preprocessing time. ACM Trans. Algorithms, 11(3):16:1–16:29, 2015. doi:10.1145/2684068.
[10] Richard P. Brent. The parallel evaluation of general arithmetic expressions. J. ACM, 21(2):201–206, 1974. doi:10.1145/321812.321815.
[11] Karl Bringmann, Thomas Dueholm Hansen, and Sebastian Krinninger. Improved algorithms for computing the cycle of minimum cost-to-time ratio in directed graphs. In , pages 124:1–124:16, 2017. doi:10.4230/LIPIcs.ICALP.2017.124.
[12] Gerth Stølting Brodal, Jesper Larsson Träff, and Christos D. Zaroliagis. A parallel priority queue with constant time operations. J. Parallel Distributed Comput., 49(1):4–21, 1998. doi:10.1006/jpdc.1998.1425.
[13] Nairen Cao, Jeremy T. Fineman, and Katina Russell. Efficient construction of directed hopsets and parallel approximate shortest paths. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 336–349. ACM, 2020. doi:10.1145/3357713.3384270.
[14] Panagiotis Charalampopoulos, Pawel Gawrychowski, Shay Mozes, and Oren Weimann. Almost optimal distance oracles for planar graphs. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 138–151, 2019. doi:10.1145/3313276.3316316.
[15] Panagiotis Charalampopoulos, Shay Mozes, and Benjamin Tebeka. Exact distance oracles for planar graphs with failing vertices. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2110–2123, 2019. doi:10.1137/1.9781611975482.127.
[16] Shiri Chechik and Moran Nechushtan. Simplifying and unifying replacement paths algorithms in weighted directed graphs. In , pages 29:1–29:12, 2020. doi:10.4230/LIPIcs.ICALP.2020.29.
[17] Edith Cohen. Efficient parallel shortest-paths in digraphs with a separator decomposition. J. Algorithms, 21(2):331–357, 1996. doi:10.1006/jagm.1996.0048.
[18] Edith Cohen. Using selective path-doubling for parallel shortest-path computations. J. Algorithms, 22(1):30–56, 1997. doi:10.1006/jagm.1996.0813.
[19] Edith Cohen. Polylog-time and near-linear work approximation scheme for undirected shortest paths. J. ACM, 47(1):132–166, 2000. doi:10.1145/331605.331610.
[20] Jittat Fakcharoenphol and Satish Rao. Planar graphs, negative weight edges, shortest paths, and near linear time. J. Comput. Syst. Sci., 72(5):868–889, 2006. doi:10.1016/j.jcss.2005.05.007.
[21] Jeremy T. Fineman. Nearly work-efficient parallel algorithm for digraph reachability. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 457–470. ACM, 2018. doi:10.1145/3188745.3188926.
[22] Sebastian Forster and Danupon Nanongkai. A faster distributed single-source shortest paths algorithm. In Mikkel Thorup, editor, , pages 686–697. IEEE Computer Society, 2018. doi:10.1109/FOCS.2018.00071.
[23] Pawel Gawrychowski and Adam Karczmarz. Improved bounds for shortest paths in dense distance graphs. In , pages 61:1–61:15, 2018. doi:10.4230/LIPIcs.ICALP.2018.61.
[24] Andrew V. Goldberg, Serge A. Plotkin, David B. Shmoys, and Éva Tardos. Using interior-point methods for fast parallel algorithms for bipartite matching and related problems. SIAM J. Comput., 21(1):140–150, February 1992. doi:10.1137/0221011.
[25] Y. Han, V. Pan, and John Reif. Efficient parallel algorithms for computing all pair shortest paths in directed graphs. In Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '92, pages 353–362, New York, NY, USA, 1992. Association for Computing Machinery. doi:10.1145/140901.141913.
[26] Mark Hartmann and James B. Orlin. Finding minimum cost to time ratio cycles with small integral transit times.

Networks, 23(6):567–574, 1993. doi:10.1002/net.3230230607.
[27] Giuseppe F. Italiano, Adam Karczmarz, Jakub Lacki, and Piotr Sankowski. Decremental single-source reachability in planar digraphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1108–1121. ACM, 2017. doi:10.1145/3055399.3055480.
[28] Joseph JáJá. An Introduction to Parallel Algorithms. Addison-Wesley, 1992.
[29] Ian Jermyn and Hiroshi Ishikawa. Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Trans. Pattern Anal. Mach. Intell., 23(10):1075–1088, 2001. doi:10.1109/34.954599.
[30] Donald B. Johnson. Efficient algorithms for shortest paths in sparse networks. J. ACM, 24(1):1–13, 1977. doi:10.1145/321992.321993.
[31] Haim Kaplan, Shay Mozes, Yahav Nussbaum, and Micha Sharir. Submatrix maximum queries in Monge matrices and partial Monge matrices, and their applications. ACM Trans. Algorithms, 13(2):26:1–26:42, 2017. doi:10.1145/3039873.
[32] Adam Karczmarz. Decremental transitive closure and shortest paths for planar digraphs and beyond. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 73–92. SIAM, 2018. doi:10.1137/1.9781611975031.5.
[33] Adam Karczmarz and Jakub Łącki. Reliable hubs for partially-dynamic all-pairs shortest paths in directed graphs. In , pages 65:1–65:15, 2019. doi:10.4230/LIPIcs.ESA.2019.65.
[34] Richard M. Karp. A characterization of the minimum cycle mean in a digraph. Discret. Math., 23(3):309–311, 1978. doi:10.1016/0012-365X(78)90011-0.
[35] Valerie King. Fully dynamic algorithms for maintaining all-pairs shortest paths and transitive closure in digraphs. In , pages 81–91, 1999. doi:10.1109/SFFCS.1999.814580.
[36] Maria M. Klawe and Daniel J. Kleitman. An almost linear time algorithm for generalized matrix searching. SIAM J. Discret. Math., 3(1):81–97, 1990. doi:10.1137/0403009.
[37] Philip N. Klein. Multiple-source shortest paths in planar graphs. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, pages 146–155, 2005.
[38] Philip N. Klein, Shay Mozes, and Christian Sommer. Structured recursive separator decompositions for planar graphs in linear time. In Symposium on Theory of Computing Conference, STOC'13, pages 505–514, 2013. doi:10.1145/2488608.2488672.
[39] Philip N. Klein, Shay Mozes, and Oren Weimann. Shortest paths in directed planar graphs with negative lengths: A linear-space $O(n \log^2 n)$-time algorithm. ACM Trans. Algorithms, 6(2):30:1–30:18, 2010. doi:10.1145/1721837.1721846.
[40] Philip N. Klein and Sairam Subramanian. A linear-processor polylog-time algorithm for shortest paths in planar graphs. In , pages 259–270, 1993. doi:10.1109/SFCS.1993.366861.
[41] Philip N. Klein and Sairam Subramanian. A randomized parallel algorithm for single-source shortest paths. J. Algorithms, 25(2):205–220, 1997. doi:10.1006/jagm.1997.0888.
[42] Eugene L. Lawler. Optimal cycles in doubly weighted linear graphs. In Theory of Graphs: International Symposium, pages 209–213, 1966.
[43] Jason Li. Faster parallel algorithm for approximate shortest path. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 308–321, 2020. doi:10.1145/3357713.3384268.
[44] Richard J. Lipton and Robert Endre Tarjan. Applications of a planar separator theorem. SIAM J. Comput., 9(3):615–627, 1980. doi:10.1137/0209046.
[45] Yang P. Liu, Arun Jambulapati, and Aaron Sidford. Parallel reachability in almost linear work and square root depth. In David Zuckerman, editor, , pages 1664–1686. IEEE Computer Society, 2019. doi:10.1109/FOCS.2019.00098.
[46] Jakub Łącki, Yahav Nussbaum, Piotr Sankowski, and Christian Wulff-Nilsen. Single source - all sinks max flows in planar digraphs. In , pages 599–608, 2012. doi:10.1109/FOCS.2012.66.
[47] Meena Mahajan and Kasturi R. Varadarajan. A new NC-algorithm for finding a perfect matching in bipartite planar and small genus graphs (extended abstract). In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC '00, pages 351–357, New York, NY, USA, 2000. Association for Computing Machinery. doi:10.1145/335305.335346.
[48] Nimrod Megiddo. Applying parallel computation algorithms in the design of serial algorithms. J. ACM, 30(4):852–865, 1983. doi:10.1145/2157.322410.
[49] Gary L. Miller. Finding small simple cycle separators for 2-connected planar graphs. J. Comput. Syst. Sci., 32(3):265–279, 1986. doi:10.1016/0022-0000(86)90030-9.
[50] Gary L. Miller and Joseph Naor. Flow in planar graphs with multiple sources and sinks.

SIAM J. Comput., 24(5):1002–1017, 1995. doi:10.1137/S0097539789162997.
[51] Shay Mozes and Christian Sommer. Exact distance oracles for planar graphs. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 209–222, 2012. doi:10.1137/1.9781611973099.19.
[52] Shay Mozes and Christian Wulff-Nilsen. Shortest paths in planar graphs with real lengths in $O(n \log^2 n / \log\log n)$ time. In Algorithms - ESA 2010, 18th Annual European Symposium. Proceedings, Part II, pages 206–217, 2010. doi:10.1007/978-3-642-15781-3_18.
[53] James B. Orlin, K. Subramani, and Piotr J. Wojciechowski. Randomized algorithms for finding the shortest negative cost cycle in networks. Discret. Appl. Math., 236:387–394, 2018. doi:10.1016/j.dam.2017.10.011.
[54] Seth Pettie. A new approach to all-pairs shortest paths on real-weighted graphs. Theor. Comput. Sci., 312(1):47–74, 2004. doi:10.1016/S0304-3975(03)00402-X.
[55] Hanmao Shi and Thomas H. Spencer. Time-work tradeoffs of the single-source shortest paths problem. J. Algorithms, 30(1):19–32, 1999. doi:10.1006/jagm.1998.0968.
[56] Thomas H. Spencer. Time-work tradeoffs for parallel algorithms. J. ACM, 44(5):742–778, 1997. doi:10.1145/265910.265923.
[57] K. Subramani. Optimal length resolution refutations of difference constraint systems. J. Autom. Reasoning, 43(2):121–137, 2009. doi:10.1007/s10817-009-9139-4.
[58] K. Subramani, Matthew D. Williamson, and Xiaofeng Gu. Improved algorithms for optimal length resolution refutation in difference constraint systems. Formal Asp. Comput., 25(2):319–341, 2013. doi:10.1007/s00165-011-0186-3.
[59] Éva Tardos. A strongly polynomial minimum cost circulation algorithm. Combinatorica, 5(3):247–256, 1985. doi:10.1007/BF02579369.
[60] Jeffrey D. Ullman and Mihalis Yannakakis. High-probability parallel transitive-closure algorithms. SIAM J. Comput., 20(1):100–125, 1991. doi:10.1137/0220006.
[61] Olga Veksler. Stereo correspondence with compact windows via minimum ratio cycle. IEEE Trans. Pattern Anal. Mach. Intell., 24(12):1654–1660, 2002. doi:10.1109/TPAMI.2002.1114859.
[62] Song Wang and Jeffrey Mark Siskind. Image segmentation with minimum mean cut. In Proceedings of the Eighth International Conference On Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7-14, 2001 - Volume 1, pages 517–524, 2001. doi:10.1109/ICCV.2001.10090.
[63] Song Wang and Jeffrey Mark Siskind. Image segmentation with ratio cut. IEEE Trans. Pattern Anal. Mach. Intell., 25(6):675–690, 2003. doi:10.1109/TPAMI.2003.1201819.
[64] Virginia Vassilevska Williams and R. Ryan Williams. Subcubic equivalences between path, matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018. doi:10.1145/3186893.
[65] Neal E. Young, Robert Endre Tarjan, and James B. Orlin. Faster parametric shortest path and minimum-balance algorithms. Networks, 21(2):205–221, 1991. doi:10.1002/net.3230210206.
[66] Uri Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM, 49(3):289–317, 2002. doi:10.1145/567112.567114.

A Computing All-Edges Shortest Cycles

The following corollary follows easily from Lemma 21 by extending the algorithm behind Theorem 3 to also compute external DDGs.

Corollary 24.

All external dense distance graphs $\mathrm{DDG}_{G-H}$ can be computed in parallel using $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth.

This allows us to prove the next lemma.

Lemma 25.

Let $G$ be a real-weighted planar digraph with no negative cycles. Then, one can compute for each $e \in E(G)$ the shortest cycle going through $e$:
• sequentially in $O(n \log^2 n \cdot \log\log n \cdot \alpha(n))$ time,
• in parallel using $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth.

Proof. Let $\mathcal{T}(G)$ be the recursive decomposition of $G$. By Lemma 5, we can compute the external dense distance graphs $\mathrm{DDG}_{G-H}$ for all $H \in \mathcal{T}(G)$ in $O(n \log^2 n \cdot \log\log n \cdot \alpha(n))$ time. By Corollary 24, the same task can be completed within $\tilde{O}(n)$ work and $\tilde{O}(n^{1/6})$ depth.

Let $L_e \in \mathcal{T}(G)$ be any leaf node containing $e = uv$. The length of the shortest cycle going through $e$ is clearly $\delta_G(v, u) + w(e)$, so we need to compute $\delta_G(v, u)$. Since $\partial L_e = V(L_e)$, we have $u, v \in \partial L_e$. Consider the graph $G_e = \mathrm{DDG}_{G-L_e} \cup L_e$. We show that $\delta_G(v, u) = \delta_{G_e}(v, u)$. Every edge of $G_e$ corresponds to a path of the same length in $G$, so $\delta_{G_e}(v, u) \ge \delta_G(v, u)$. For the other direction, consider a shortest $v \to u$ path $P = P_1 \ldots P_k$ in $G$ such that each $P_i$ is either a maximal subpath entirely contained in $G - L_e$, or a single edge in $L_e$. Observe that if $P_i = x_i \to y_i \subseteq G - L_e$, then by maximality and $u, v \in \partial L_e$, we have $x_i, y_i \in \partial L_e$. As a result, the weight of the edge $x_i y_i$ in $G_e$ is $\delta_{G-L_e}(x_i, y_i) \le \ell(P_i)$ by the definition of $\mathrm{DDG}_{G-L_e}$. On the other hand, if $x_i y_i$ is a single edge in $L_e$, then it is also preserved in $G_e$. As a result, a path of length at most $\ell(P)$ can be found in $G_e$.

Since each $G_e$ has $O(1)$ size, computing shortest paths in all $G_e$ takes linear extra time. As the computations for each $G_e$ are independent, they can be parallelized within $\tilde{O}(n)$ work and $O(\operatorname{polylog}(n))$