Constructing a Distance Sensitivity Oracle in O(n^{2.5794}M) Time
Yong Gu* and Hanlin Ren†
Institute for Interdisciplinary Information Sciences, Tsinghua University
February 18, 2021
Abstract
We continue the study of distance sensitivity oracles (DSOs). Given a directed graph G with n vertices and edge weights in {1, 2, ..., M}, we want to build a data structure such that given any source vertex u, any target vertex v, and any failure f (which is either a vertex or an edge), it outputs the length of the shortest path from u to v not going through f. Our main result is a DSO with preprocessing time O(n^{2.5794}M) and constant query time. Previously, the best preprocessing time of DSOs for directed graphs is O(n^{2.8729}M), and even in the easier case of undirected graphs, the best preprocessing time is O(n^{2.7233}M) [Ren, ESA 2020]. One drawback of our DSOs, though, is that they only support distance queries but not path queries.

Our main technical ingredient is an algorithm that computes the inverse of a degree-d polynomial matrix (i.e. a matrix whose entries are degree-d univariate polynomials) modulo x^r. The algorithm is adapted from [Zhou, Labahn and Storjohann, Journal of Complexity, 2015], and we replace some of its intermediate steps with faster rectangular matrix multiplication algorithms.

We also show how to compute unique shortest paths in a directed graph with edge weights in {1, 2, ..., M}, in O(n^{2.5286}M) time. This algorithm is crucial in the preprocessing algorithm of our DSO. Our solution improves the O(n^{2.6865}M) time bound in [Ren, ESA 2020], and matches the current best time bound for computing all-pairs shortest paths.

1 Introduction

In this paper, we consider the problem of constructing a distance sensitivity oracle (DSO).
A DSO is a data structure that preprocesses a directed graph G = (V, E) with n vertices and m edges, and supports queries of the following form: given a source vertex u, a target vertex v, and a failure f (which can be either a vertex or an edge), output the length of the shortest path from u to v that does not go through f.

* [email protected]  † [email protected]

One motivation for constructing DSOs is the fact that real-life networks often suffer from failures. Consider a communication network among n servers. When a server u wants to send a message to another server v, the most efficient way would be to send the message along the shortest path from u to v. However, if a failure happens in a server or in a link between two servers, we need to recompute the shortest path with the failure taken into account. It may be too slow to compute the shortest path from scratch each time a failure happens. A better solution is to construct a DSO for the communication network, and invoke the query algorithm of the DSO whenever a failure happens.

The problem of constructing DSOs has received a lot of attention in the literature. A naive solution is to precompute the answers for every possible query (u, v, f), but this requires Ω(n²m) space to store. Demetrescu, Thorup, Chowdhury and Ramachandran [DTCR08] designed a DSO with O(n² log n) space that answers a query in constant time. However, the preprocessing time of the DSO in [DTCR08] is O(mn² + n³ log n), which is inefficient for large networks. Subsequently, Bernstein and Karger improved the preprocessing time to Õ(n²√m) [BK08], and finally to Õ(mn) [BK09]. The preprocessing time Õ(mn) matches the current best time bound for the easier problem of computing all-pairs shortest paths (APSP), and it is conjectured that APSP requires mn^{1−o(1)} time [LWW18]. In this sense, the Õ(mn) time bound of [BK09] is optimal.
Duan and Zhang [DZ17] improved the space complexity of the DSO to O(n²), eliminating the last log n factor, while preserving constant query time and Õ(mn) preprocessing time.

However, for dense graphs (i.e. m = Θ(n²)) with edge weights in [−M, M], it is possible to compute APSP in time faster than Õ(mn) = Õ(n³). The best APSP algorithm for undirected graphs runs in Õ(n^ω M) time [Sei95, SZ99], and the best APSP algorithm for directed graphs runs in O(n^{2.5286}M) time [AGM97, Zwi02]. (Here ω < 2.3729 is the exponent of matrix multiplication [AW21].) Therefore, it is natural to ask whether one can beat Õ(n³) preprocessing time for DSOs in this regime.

The answer turned out to be yes. Weimann and Yuster [WY13] showed that for any constant 0 < α < 1, there is a DSO with Õ(n^{1−α+ω}M) preprocessing time and Õ(n^{1+α}) query time. Subsequently, Grandoni and Williams [GW20] showed that for any constant 0 < α < 1, there is a DSO with Õ(n^{ω+1/2}M + n^{ω+α(4−ω)}M) preprocessing time and Õ(n^{1−α}) query time. Recently, Chechik and Cohen [CC20] constructed the first DSO that achieves both sub-cubic (O(n^{2.8729}M)) preprocessing time and poly-logarithmic query time simultaneously. For the case that edge weights are positive, Ren [Ren20] improved the previous results by presenting a much simpler DSO with Õ(n^{2.7233}M) preprocessing time and constant query time.

Note that most DSOs mentioned above are randomized. Recently, there have also been some efforts on derandomizing these DSOs; see e.g. [ACC19, SP21].

Our results. Our main result is an improved randomized DSO for directed graphs with integer edge weights in [1, M]. In particular, our DSO has preprocessing time O(n^{2.5794}M) and constant query time.

Theorem 1.1 (main). Given as input a directed graph G = (V, E) with edge weights in {1, 2, ..., M}, we can construct a DSO with O(n^{2.5794}M) preprocessing time and constant query time.

Remark.
Our preprocessing algorithm uses fast rectangular matrix multiplication algorithms. To express our time bound as a function of ω, we could also simulate rectangular matrix multiplications by square matrix multiplications, e.g. multiply an n × m matrix and an m × n matrix by ⌈m/n⌉ square matrix multiplications of dimension n. In this case, the preprocessing time becomes Õ(n^{2+1/(4−ω)}M).

(Here and throughout, Õ hides polylog(n) factors. We also note that there are three previous DSOs with both sub-cubic preprocessing time and constant query time: [GW20], [CC20], and [Ren20]. The query time of the first two DSOs can be brought down to constant using Observation 2.1 of [Ren20]; in the case of [GW20], this increases the preprocessing time by an additive factor of Õ(n^{3−α}). Even when ω = 2, the preprocessing time bounds of these DSOs are Õ(n^{8/3}) (setting α appropriately), Õ(n^{5/2}), and Õ(n^{5/2}), respectively.)

Inverting a polynomial matrix modulo x^r. Let r be an integer parameter, and let F be a polynomial matrix of degree d (i.e. each entry of F is a degree-d polynomial over some formal variable x) that is invertible. Our main technical result is an algorithm that computes F^{−1} mod x^r in time

    Õ(dn^ω) + (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

(That is, we only preserve the monomials in F^{−1} with degrees at most r − 1.) Here, MM(n₁, n₂, n₃) is the time complexity of multiplying an n₁ × n₂ matrix and an n₂ × n₃ matrix.

It is shown in [ZLS15] that we can compute the full F^{−1} (instead of F^{−1} mod x^r) in Õ(n³d) time. We examine their algorithm carefully, and adapt it to our case where we only want to compute F^{−1} mod x^r. We reduce each polynomial in the intermediate steps of the algorithm modulo x^r, and use fast rectangular matrix multiplication to speed up the algorithm.

Theorem 1.4. Let r be an integer, F be a finite field. Let F ∈ F[x]^{n×n} be an n × n matrix over the ring of univariate polynomials F[x], and let d ≥ 1 be an upper bound on the degrees of entries of F. If F is
If F isinvertible, the number of field operations to compute F − mod x r is at most ˜ O ( dn ω ) + ( r /d ) · MM ( n, nd/r, nd/r ) · n o (1) . Computing consistent shortest path trees. Our DSO needs to invoke [Ren20, Observation 2.1] (seealso [BK09]), which needs a consistent set of (incoming and outgoing) shortest path trees rooted at eachvertex. Here, by consistent , we mean that for every pair of vertices u, v , and any two trees T and T , if u can reach v in both T and T , then the u v paths in T and T are the same path. In other words,we want to specify a unique shortest path between each pair of vertices, such that for every vertex v , theshortest paths starting from v (or ending at v , respectively) form a tree.Note that this problem is quite nontrivial in small-weighted graphs. There may be many shortestpaths between two vertices, and it is not obvious how to pick one shortest path for each vertex pair,while guaranteeing consistency. Also, we cannot randomly perturb the edge weights by small values, asthat would brak the property that edge weights are small integers. It is also unclear how to construct sucha set of shortest path trees from the APSP algorithm in [Zwi02]. Previously, [Ren20] observed that analgorithm in [DP09b] can be used to compute such shortest path trees in ˜ O ( n (3+ ω ) / M ) ≤ O ( n . M ) time; unfortunately, this time bound is worse than our claimed time bound O ( n . M ) in Theorem 1.1.In this paper, we show how to construct consistent shortest paths trees in O ( n . M ) time, match-ing the currently best time bound for APSP [Zwi02]. Theorem 1.5 (informal version) . Given a directed graph G = ( V, E ) with edge weights in { , , . . . , M } ,we can compute a set of incoming and outgoing shortest path trees rooted at each vertex that are con-sistent, in O ( n . M ) time. 
A Warm-Up: A DSO with Õ(n^{(3+ω)/2}M) Preprocessing Time

Actually, the ideas in [vdBS19] of maintaining the adjoint of the symbolic adjacency matrix (see Section 3), together with ideas in [Ren20], already give us a DSO with Õ(n^{(3+ω)/2}M) preprocessing time and constant query time. As a warm-up, we briefly describe this DSO before we proceed to the details of Theorem 1.1.

An r-truncated DSO [Ren20] is a DSO that only needs to be correct for the queries (u, v, f) whose answer (i.e. the length of the corresponding shortest path) is at most r. If the answer is greater than r, it should return r instead. In what follows, we will describe how to construct an r-truncated DSO in Õ(rn^ω) preprocessing time and Õ(r) query time. Using techniques in [Ren20] (see also Section 3.3), this implies a DSO with Õ(n^{(3+ω)/2}M) preprocessing time and constant query time.

Let F be a sufficiently large finite field, and let A be the following matrix. For every pair of vertices u, v, if there is an edge from u to v with weight l, then let A_{u,v} = a_{u,v} · x^l, where a_{u,v} is a random element of F and x is an indeterminate. Furthermore, for every vertex v, let A_{v,v} = 1. It is well-known [San05] that with high probability over the choices of a_{u,v}, the adjoint matrix of A encodes the shortest path information of the input graph, as follows. Let adj(A) be the adjoint matrix of A, and let u, v be two vertices; then the lowest degree of adj(A)_{u,v} is exactly the distance from u to v. For example, if adj(A)_{u,v} = 7x^3 + 6x^5 − x^8, then the distance from u to v is 3. A big advantage of the adjoint matrix is that it is easy to perform low-rank updates, by the Sherman-Morrison-Woodbury formula (see Theorem 3.2).
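The rank-one case of this update can be checked numerically over the rationals: with γ = 1 + v^T A^{−1} u, one has adj(A + uv^T) = det(A) · (γ · A^{−1} − A^{−1} u v^T A^{−1}). The 3 × 3 matrix and the vectors below are arbitrary examples of our own.

```python
from fractions import Fraction

# Sanity check of the rank-one adjoint-update identity over the rationals:
#   adj(A + u v^T) = det(A) * (gamma * A^{-1} - A^{-1} u v^T A^{-1}),
# where gamma = 1 + v^T A^{-1} u.  A, u, v below are arbitrary examples.

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def minor(M, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):  # Laplace expansion, fine for tiny matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adj(M):  # adjugate: adj(M)[i][j] = (-1)^{i+j} det(minor(M, j, i))
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]

def inv(M):
    d = det(M)
    return [[a / d for a in row] for row in adj(M)]

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [0, 1, 1], [1, 0, 3]]]
u = [[Fraction(1)], [Fraction(0)], [Fraction(2)]]   # column vector
v = [[Fraction(0)], [Fraction(3)], [Fraction(1)]]   # column vector

Ainv = inv(A)
vT = [[v[i][0] for i in range(3)]]
gamma = Fraction(1) + matmul(vT, matmul(Ainv, u))[0][0]
assert gamma != 0                                   # update is invertible
upd = matmul(Ainv, matmul(u, matmul(vT, Ainv)))     # A^{-1} u v^T A^{-1}
dA = det(A)
lhs = adj([[A[i][j] + u[i][0] * v[j][0] for j in range(3)] for i in range(3)])
rhs = [[dA * (gamma * Ainv[i][j] - upd[i][j]) for j in range(3)] for i in range(3)]
assert lhs == rhs
```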
Given a matrix A, its adjoint adj(A), and a low-rank matrix B, we can compute a specific element adj(A + B)_{u,v} in time much faster than brute force. Therefore, we answer a query (u, v, f) as follows: we first express the failure as a rank-one matrix F, such that A + F is the matrix corresponding to the graph with f removed. Then we can compute adj(A + F)_{u,v} quickly. Given this element (a polynomial over F), we can easily compute the answer to the query.

What is the complexity of this DSO? Recall that we only want to construct an r-truncated DSO, so we can reduce every entry in the process of computing adj(A) modulo the polynomial x^r. Every arithmetic operation in the commutative ring F[x]/x^r only takes Õ(r) time. Computing the adjoint of a matrix reduces to inverting that matrix, which takes Õ(n^ω) arithmetic operations [BH74]. Therefore it takes Õ(rn^ω) time to compute adj(A) mod x^r. A close inspection of the Sherman-Morrison-Woodbury formula shows that each query can be completed in O(1) arithmetic operations, i.e. Õ(r) time.

The Õ(rn^ω)-time algorithm for inverting a polynomial matrix modulo x^r is not optimal; the time bound in Theorem 1.4 is better. In Section 4, we use fast rectangular matrix multiplication algorithms to speed up the algorithm in [ZLS15], obtaining a faster algorithm for inverting polynomial matrices modulo x^r.

2 Preliminaries

In this paper, we say an event happens with high probability (w.h.p.) if it happens with probability at least 1 − 1/n^c, for a constant c that can be made arbitrarily large. Our DSOs (or r-truncated DSOs) will have a randomized preprocessing algorithm and a deterministic query algorithm. We say a DSO is correct with high probability if w.h.p. over its (randomized) preprocessing algorithm, it answers every possible query (u, v, f) correctly.

Notation.
We use the following notation from [DP09a, Ren20].

• Let p be a path. We use |p| to denote the number of edges in p, and ‖p‖ to denote the length of p (i.e. the total weight of edges in p).

• Let u, v be two vertices. We define ‖uv‖ as the length of the shortest path from u to v. Furthermore, let f be a failure (which is either an edge or a vertex); we define ‖uv ⋄ f‖ as the length of the shortest path from u to v that does not go through f.

• Let u, v be two vertices. We define |uv| as the number of edges in the shortest path from u to v. In the case that there are many shortest paths from u to v, it turns out that the following definition will be convenient in Section 5: we define |uv| as the largest number of edges in any shortest path from u to v.

Fast matrix multiplication. Let ω be the exponent of matrix multiplication; the current best upper bound is ω ≤ 2.3729 [AW21]. For positive integers n₁, n₂, n₃, let MM(n₁, n₂, n₃) denote the minimum number of arithmetic operations needed to multiply an n₁ × n₂ matrix and an n₂ × n₃ matrix. We define ω(a, b, c) to be the exponent of multiplying an n^a × n^b matrix and an n^b × n^c matrix, i.e.

    ω(a, b, c) = inf{w : MM(n^a, n^b, n^c) = O(n^w)}.

It is a classical result that ω(1, 1, λ) = ω(1, λ, 1) = ω(λ, 1, 1) for any real number λ > 0 [LR83]; we denote ω(λ) = ω(1, 1, λ).

We will need the following lemmas about the exponent of rectangular matrix multiplication. For completeness, we include proofs of these lemmas in Appendix A.

Lemma 2.1. Let a, b, c, r be positive real numbers. Then r + ω(a, b, c) ≤ ω(a, b + r, c + r).

Lemma 2.2. Consider the function f(τ) = ω(1, 1 − τ, 1 − τ), where τ ∈ [0, 1]. Then τ + f(τ) is monotonically non-increasing in τ, and 2τ + f(τ) is monotonically non-decreasing in τ.

Polynomial operations. Let p, q ∈ F[x] be two polynomials of degree d. It is easy to compute p + q or p − q in O(d) field operations.
We can also compute p · q in Õ(d) field operations using the fast Fourier transform. (Here, Õ hides polylog(d) factors.) When p is invertible, it is also possible to compute p^{−1} mod x^d in Õ(d) field operations [AHU74, Section 8.3].

3 Constructing a DSO in O(n^{2.5794}M) Time

In this section, we show how to preprocess a distance sensitivity oracle in O(n^{2.5794}M) time, such that every query can be answered in constant time. Our preprocessing algorithm is randomized; with high probability over the preprocessing algorithm, the query algorithm always returns the correct answer.

3.1 Preliminaries

First, our preprocessing algorithm will use the following algorithm for inverting a polynomial matrix. A detailed description of this algorithm will be given in Section 4.

Theorem 1.4. Let r be an integer, F be a finite field. Let F ∈ F[x]^{n×n} be an n × n matrix over the ring of univariate polynomials F[x], and let d ≥ 1 be an upper bound on the degrees of entries of F. If F is invertible, the number of field operations to compute F^{−1} mod x^r is at most

    Õ(dn^ω) + (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

Let G be a directed graph whose edge weights are integers in [1, M]. We define its symbolic adjacency matrix SA(G) as (see [San05])

    SA(G)_{i,j} =  1,             if i = j,
                   z_{i,j} · x^l, if there is an edge from i to j with weight l in G,
                   0,             otherwise,

where the z_{i,j} are unique variables corresponding to edges of G.

It would be inefficient to deal with these variables z_{i,j}; therefore we will pick a suitably large field F, and substitute each variable z_{i,j} by a random element of F. However, we still keep the indeterminate x. Now, let Z be a matrix where each Z_{i,j} ∈ F; we will use SA_Z(G) to denote the matrix SA(G) with each formal variable z_{i,j} substituted by the field element Z_{i,j}. Note that SA_Z(G) is a polynomial matrix where every entry is a polynomial over x with degree at most M.

We recall the definition of the adjoint matrix, which will be crucial to our algorithm. Let A be an n × n matrix and i, j ∈ [n].
We denote by A^{i,j} the matrix A with every element in the i-th row and the j-th column set to zero, except that (A^{i,j})_{i,j} = 1. The adjoint matrix of A, denoted adj(A), is an n × n matrix such that adj(A)_{i,j} = det(A^{j,i}) for every i, j ∈ [n]. A basic fact about adj(A) is that if det(A) ≠ 0, then adj(A) = det(A) · A^{−1}.

There is a close relationship between the distances in the graph G and the entries in the adjoint of SA(G). Let p be a multivariate polynomial; we define deg*_x(p) as the lowest degree of the variable x in any monomial of p. If p = 0, then we define deg*_x(p) := +∞. We have:

Theorem 3.1 ([San05, Lemma 4]). Let G be a directed graph with positive integer weights, and let i, j be two vertices. Then the distance from i to j in G is deg*_x(adj(SA(G))_{i,j}).

We need the following theorem that allows us to maintain the adjoint of a matrix under rank-one updates.

Theorem 3.2. Let R be an arbitrary commutative ring, A ∈ R^{n×n} be an invertible matrix, u, v ∈ R^n be column vectors, and γ = 1 + v^T A^{−1} u. Suppose γ is invertible; then A + uv^T is also invertible, and

    adj(A + uv^T) = det(A) · (γ · A^{−1} − (A^{−1} u v^T A^{−1})).

Proof Sketch. By the matrix determinant lemma, we have det(A + uv^T) = γ · det(A). Since γ is invertible, we can use the Sherman-Morrison-Woodbury formula [SM50, Woo50]:

    (A + uv^T)^{−1} = A^{−1} − γ^{−1} (A^{−1} u v^T A^{−1}).

The theorem is proved by multiplying the above two formulas together.

We need the Schwartz-Zippel lemma, which guarantees the correctness of our randomized algorithm.

Theorem 3.3 (Schwartz-Zippel Lemma, [Sch80, Zip79]). Let p(x₁, x₂, ..., x_m) be a non-zero polynomial of (total) degree d over a field F. Let S be a finite subset of F, and let r₁, r₂, ..., r_m be independently and uniformly sampled from S. Then

    Pr[p(r₁, r₂, ..., r_m) = 0] ≤ d/|S|.

We also need the following algorithm that computes the determinant of a polynomial matrix.

Theorem 3.4 ([Sto03, LNZ17]).
Let B ∈ F[x]^{n×n} be a matrix of degree at most d. Then we can compute det(B) in Õ(dn^ω) field operations.

3.2 An r-Truncated DSO

Recall that for a failure f (which is either a vertex or an edge), ‖uv ⋄ f‖ denotes the length of the shortest path from u to v that avoids f. An r-truncated DSO, as defined in [Ren20], is a DSO that, given a query (u, v, f), outputs the value min{‖uv ⋄ f‖, r}. The main result of this subsection is that given an integer r and an input graph G, an r-truncated DSO can be constructed in time

    Õ(n^ω M) + (r²/M) · MM(n, nM/r, nM/r) · n^{o(1)}.

Preprocessing algorithm. Let C be a large enough constant. First, we choose a prime p ∈ [n^C, 2n^C] and let F = Z_p. Then we let Z be an n × n matrix over F, where every Z_{i,j} is sampled independently from F uniformly at random. We substitute Z into SA(G) to obtain the matrix SA_Z(G). Recall that each element of SA_Z(G) is a polynomial over x with coefficients in F, whose degree is at most M. Then we compute SA_Z(G)^{−1} and det(SA_Z(G)) using Theorem 1.4 and Theorem 3.4, respectively.

Since we only want an r-truncated DSO, we only need to compute SA_Z(G)^{−1} modulo x^r, i.e. we only preserve the monomials with degree less than r in every entry of SA_Z(G)^{−1}. By Theorem 1.4, we can compute SA_Z(G)^{−1} mod x^r in time

    Õ(n^ω M) + (r²/M) · MM(n, nM/r, nM/r) · n^{o(1)}.

By Theorem 3.4, we can compute det(SA_Z(G)) in Õ(n^ω M) time. Again, we only need to store the polynomial det(SA_Z(G)) mod x^r. This concludes the preprocessing algorithm.

For the following query algorithms, we use e_i to denote the i-th standard unit vector, i.e. (e_i)_i = 1 and (e_i)_j = 0 for every index j ≠ i.

Query algorithm for an edge failure. A query consists of vertices u, v ∈ V and a failed edge e. We assume that e goes from vertex a to vertex b, and has weight l.
Let G′ be the graph obtained by removing e from G. Then we have SA(G′) = SA(G) + uv^T, where u = e_a and v = −z_{a,b} x^l e_b. Let

• γ = 1 + v^T SA(G)^{−1} u = 1 − z_{a,b} x^l · SA(G)^{−1}_{b,a},
• β = (SA(G)^{−1} u v^T SA(G)^{−1})_{u,v} = −SA(G)^{−1}_{u,a} · z_{a,b} · SA(G)^{−1}_{b,v} · x^l, and
• α = det(SA(G)) · (γ · SA(G)^{−1}_{u,v} − β);

then by Theorem 3.2, we have α = adj(SA(G′))_{u,v}. (Note that since l ≥ 1, γ is always invertible.)

Query algorithm for a vertex failure. A query consists of vertices u, v ∈ V and a failed vertex f ∈ V. It suffices to remove every outgoing edge from f (and we do not need to also remove incoming edges to f), as f then cannot appear as an intermediate vertex in any path from u to v. Therefore, we need to compute adj(SA(G′))_{u,v}, where G′ is obtained by removing all outgoing edges from f in G. Let u = e_f, and let v be the negation of the transpose of the f-th row of SA(G), except that v_f = 0, i.e.

    v_j =  −z_{f,j} x^l, if there is an edge from f to j with weight l in G,
           0,            otherwise.

It is easy to see SA(G′) = SA(G) + uv^T. To compute adj(SA(G′))_{u,v} using Theorem 3.2, we let

• γ = 1 + v^T SA(G)^{−1} u. Note that (e_f − v)^T is exactly the f-th row of SA(G), so (e_f − v)^T SA(G)^{−1} = e_f^T, and v^T SA(G)^{−1} = e_f^T SA(G)^{−1} − e_f^T. We have γ = 1 + e_f^T SA(G)^{−1} u − e_f^T u = SA(G)^{−1}_{f,f};
• β = (SA(G)^{−1} u v^T SA(G)^{−1})_{u,v} = (e_u^T SA(G)^{−1} u)(v^T SA(G)^{−1} e_v) = SA(G)^{−1}_{u,f} · (e_f^T SA(G)^{−1} e_v) = SA(G)^{−1}_{u,f} · SA(G)^{−1}_{f,v};
• and α = det(SA(G)) · (γ · SA(G)^{−1}_{u,v} − β);

then we have α = adj(SA(G′))_{u,v}. (Note that γ is always invertible, since the constant term of SA(G)^{−1}_{f,f} must be 1.)

In the actual query algorithm, we will substitute each formal variable z_{i,j} by Z_{i,j}. Let γ_Z denote the resulting polynomial after this substitution. Note that γ_Z is a polynomial in F[x]. Similarly we can define β_Z and α_Z.
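A minimal end-to-end run of the edge-failure query, on a toy instance of our own choosing (a 4-vertex DAG, truncation x^R with R = 8, and small fixed values Z_{i,j} in place of random ones): the lowest degree of α_Z recovers the replacement distance ‖uv ⋄ e‖, as promised by Theorems 3.1 and 3.2.

```python
# Toy run of the edge-failure query: build SA_Z(G) over F_p with entries
# truncated mod x^R, form gamma, beta, alpha as above, and read off the
# distance avoiding the failed edge as the lowest degree of alpha.
# The prime P, the truncation R, and the 4-vertex DAG are our own choices.
P, R = 1_000_003, 8

def pmul(a, b):                                # product in F_p[x] / x^R
    c = [0] * R
    for i, ai in enumerate(a):
        if ai:
            for j in range(min(len(b), R - i)):
                c[i + j] = (c[i + j] + ai * b[j]) % P
    return c

def padd(a, b): return [(x + y) % P for x, y in zip(a, b)]
def pneg(a):    return [-x % P for x in a]
def pconst(c):  return [c % P] + [0] * (R - 1)

def minor(M, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def pdet(M):                                   # Laplace expansion (toy sizes)
    if len(M) == 1:
        return M[0][0]
    acc = pconst(0)
    for j in range(len(M)):
        t = pmul(M[0][j], pdet(minor(M, 0, j)))
        acc = padd(acc, t if j % 2 == 0 else pneg(t))
    return acc

def low_deg(p):                                # deg*_x, or None if p = 0
    return next((i for i, c in enumerate(p) if c), None)

# Edges (i, j, weight, Z_ij).  dist(0,3) = 2 via 0-1-3; avoiding edge
# (0,1), the shortest replacement path 0-2-3 has length 3.
edges = [(0, 1, 1, 2), (0, 2, 2, 3), (1, 3, 1, 5), (2, 3, 1, 7)]
n = 4
SA = [[pconst(1) if i == j else pconst(0) for j in range(n)] for i in range(n)]
for i, j, w, z in edges:
    SA[i][j] = pconst(0); SA[i][j][w] = z

d = pdet(SA)                                   # det(SA_Z(G)); here it is 1
Ainv = [[pmul(pdet(minor(SA, j, i)), pconst((-1) ** (i + j)))
         for j in range(n)] for i in range(n)] # adj(SA) = det(SA) * SA^{-1};
                                               # det = 1 here, so this is SA^{-1}
u, v, (a, b, l, z) = 0, 3, edges[0]            # query: 0 -> 3 avoiding edge (0,1)
zxl = pconst(0); zxl[l] = z                    # the monomial z_{a,b} x^l
gamma = padd(pconst(1), pneg(pmul(zxl, Ainv[b][a])))
beta = pneg(pmul(Ainv[u][a], pmul(zxl, Ainv[b][v])))
alpha = pmul(d, padd(pmul(gamma, Ainv[u][v]), pneg(beta)))
assert low_deg(Ainv[u][v]) == 2                # ||0->3|| = 2  (Theorem 3.1)
assert low_deg(alpha) == 3                     # ||0->3 <> (0,1)|| = 3
```

Because the example is a DAG with vertices in topological order, det(SA_Z(G)) = 1 and the adjugate coincides with the inverse; for general graphs one would compute SA_Z(G)^{−1} mod x^r and det(SA_Z(G)) as in the preprocessing algorithm above.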
If α_Z ≢ 0 (mod x^r), then our query algorithm outputs deg*_x(α_Z); otherwise it outputs r.

From the above formulas, we can compute γ_Z, β_Z, and α_Z in O(1) arithmetic operations over polynomials. Note that we only need to compute these polynomials modulo x^r, so each such arithmetic operation takes Õ(r) time. The total query time is thus Õ(r).

Remark. Our r-truncated DSO can also deal with undirected graphs, but the details are a bit different from the case of directed graphs. To remove an undirected edge, we need to update two entries in SA(G), which corresponds to a rank-2 update to SA(G). To remove a vertex, we need to update one row and one column in SA(G), which is also a rank-2 update to SA(G). Therefore, we need to use the rank-2 version of Theorem 3.2 (see [vdBS19, Lemma 1.6]). Actually, our r-truncated DSOs also support deleting f failures, and the query time is Õ(f^ω r). We omit the details here and refer the interested readers to [vdBS19].

Theorem 3.6. For every integer r, we can construct an r-truncated DSO with preprocessing time

    Õ(n^ω M) + (r²/M) · MM(n, nM/r, nM/r) · n^{o(1)},

and query time Õ(r). Our r-truncated DSO is correct w.h.p.

(When we say an r-truncated DSO is correct w.h.p., we mean that w.h.p. over its randomized preprocessing algorithm, it answers every query correctly.)

Proof of Theorem 3.6. We only need to prove the correctness of our r-truncated DSO. Consider a query (u, v, f) where f is an edge or a vertex, and let G′ be the graph obtained by removing f from G. By Theorem 3.2, we have α_Z = adj(SA_Z(G′))_{u,v}. (Note that the constant term of γ_Z is always 1, so γ_Z is always invertible.)

If ‖uv ⋄ f‖ ≥ r, then by Theorem 3.1, adj(SA(G′))_{u,v} must be a polynomial whose minimum degree over x is at least r. In this case, we have α_Z ≡ 0 (mod x^r) for every Z.
Therefore, our algorithm returns r, which is correct.

If ‖uv ⋄ f‖ = k < r, then by Theorem 3.1, adj(SA(G′))_{u,v} must be a polynomial whose minimum degree is exactly k. In this case, the coefficient of x^k in α is a polynomial in the z_{i,j} of (total) degree at most n. (This is because adj(SA(G′))_{u,v} is the determinant of a certain n × n matrix in which every entry has total degree at most one in the variables z_{i,j}.) If this polynomial is nonzero at Z, then deg*_x(α_Z) = k and our query algorithm is correct. By Theorem 3.3, this polynomial vanishes at Z with probability at most 1/n^{C−1}. Therefore, our query algorithm returns the correct answer k with probability at least 1 − 1/n^{C−1}.

In conclusion, for every fixed query (u, v, f), our query algorithm is correct with probability 1 − 1/n^{C−1} over the choice of Z. By a union bound over the O(n³) possible queries, the probability (over our randomized preprocessing algorithm) that every query is answered correctly is at least 1 − 1/Θ(n^{C−4}), which is a high probability.

3.3 From an r-Truncated DSO to a Full DSO

Now we have constructed an r-truncated DSO, which we denote by D_start. In this subsection, we will extend it to a full DSO using the techniques in [Ren20]. Specifically, we use the following two algorithms from [Ren20].

The first algorithm transforms an (r-truncated) DSO with a possibly large query time into an (r-truncated) DSO with query time O(1). More precisely:

Lemma 3.7 ([Ren20, Observation 2.1]). Given an r-truncated DSO D with preprocessing time P and query time Q, we can build an r-truncated DSO Fast(D) with query time O(1) which is correct w.h.p. The preprocessing algorithm of Fast(D) is as follows:

• It needs the all-pairs distance matrix of the input graph G, as well as the set of consistent (incoming and outgoing) shortest path trees rooted at each vertex in G. By Theorem 1.5, these shortest path trees can be computed in O(n^{2.5286}M) time. For details, see Section 5.
• It invokes the preprocessing algorithm of D on the input graph G once, and makes Õ(n²) queries to D. The preprocessing time is P + Õ(n²) · Q.

The second algorithm we use is implicit in the argument of [Ren20, Section 2.3]. We formalize it as the following lemma.

Lemma 3.8. Given an r-truncated DSO D with preprocessing time P and query time O(1), we can build a (3/2)r-truncated DSO Extend(D) with preprocessing time P + O(n²) and query time Õ(nM/r). The new DSO is correct w.h.p.

Now we are ready to explain our algorithm to build a full DSO. Given an r-truncated DSO D_start, we first obtain an r-truncated DSO D_0 with query time O(1) by applying Lemma 3.7. Let i* = ⌊log_{3/2}(nM/r)⌋. For every 0 ≤ i ≤ i*, we construct an r·(3/2)^{i+1}-truncated DSO D_{i+1} by applying Lemma 3.8 and Lemma 3.7 sequentially to D_i, i.e. D_{i+1} = Fast(Extend(D_i)). Let the resulting DSO be D_final = D_{i*+1}; since r·(3/2)^{i*+1} ≥ nM, D_final is a full DSO.

We can also summarize our construction algorithm in one formula:

    D_final = Fast(Extend(··· Fast(Extend(Fast(D_start))) ···)),

where the Extend/Fast pair is applied O(log(nM/r)) times.

Time complexity. Let r = n^α M, where α ∈ [0, 1] is a parameter to be determined. By Theorem 3.6, the preprocessing time of D_start is

    Õ(n^ω M) + (r²/M) · MM(n, nM/r, nM/r) · n^{o(1)} ≤ Õ(n^ω M) + n^{2α + ω(1, 1−α, 1−α) + o(1)} M,

and the query time of D_start is Õ(r) = Õ(n^α M). By Lemma 3.7, the preprocessing time of D_0 is

    Õ(n^{2+α} M + n^ω M) + n^{2α + ω(1, 1−α, 1−α) + o(1)} M.

Now consider the preprocessing algorithm of D_final. We need to compute the all-pairs distance matrix and in/out shortest path trees of G as required by Lemma 3.7, which takes Õ(n^μ M) time by Theorem 1.5 (here μ ≤ 2.5286 denotes the exponent in Theorem 1.5). We also need to run the preprocessing algorithm of D_0.
Also, for every 0 ≤ i ≤ i*, we need to preprocess the oracle D_{i+1}, which takes

    n² · Õ(nM / (r · (3/2)^{i+1})) ≤ Õ(n^{3−α} M · (3/2)^{−i})

time. Therefore, the preprocessing time of D_final is

    Õ(n^{2+α} M + n^ω M + n^μ M) + n^{2α + ω(1, 1−α, 1−α) + o(1)} M + Σ_{i=0}^{⌊log_{3/2}(nM/r)⌋} Õ(n^{3−α} M · (3/2)^{−i})
    ≤ n^{max{2+α, μ, 3−α, 2α + ω(1, 1−α, 1−α)} + o(1)} M.

Let α = 0.4206 and β = 1/(1 − α); then 1.5 < β < 1.75. Recall that for any real number λ, ω(λ) is a shorthand for ω(1, 1, λ). We have

    ω(1, 1−α, 1−α) = (1 − α) · ω(β)
                   ≤ (1 − α) · ((1.75 − β) · ω(1.5) + (β − 1.5) · ω(1.75)) / 0.25    (1)
                   ≤ 0.5794 · 4 · (0.0241 · ω(1.5) + 0.2259 · ω(1.75)) ≤ 1.7382.    (2)

Here, Eq. (1) uses the convexity of the ω(·) function [LR83], and Eq. (2) uses the recent bounds in [GU18] that ω(1.5) ≤ 2.796537 and ω(1.75) ≤ 3.021591. We can see that

    max{2 + α, μ, 3 − α, 2α + ω(1, 1−α, 1−α)} ≤ 2.5794.

By Lemma 3.7, the query time of D_final is O(1). Therefore, we can construct a DSO with O(n^{2.5794}M) preprocessing time and O(1) query time.

4 Inverting a Polynomial Matrix Modulo x^r

As we saw in Section 3, the algorithm of Theorem 1.4 for inverting a polynomial matrix modulo x^r is crucial for our results.

Theorem 1.4. Let r be an integer, F be a finite field. Let F ∈ F[x]^{n×n} be an n × n matrix over the ring of univariate polynomials F[x], and let d ≥ 1 be an upper bound on the degrees of entries of F. If F is invertible, the number of field operations to compute F^{−1} mod x^r is at most

    Õ(dn^ω) + (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

In this section, we work in a (large enough) field F, and regard each polynomial in the matrix as an element of the commutative ring R = F[x]/x^r. Without loss of generality, we assume n and r are powers of 2 throughout this section.

4.1 An Informal Treatment

Our algorithm is essentially the algorithm in [ZLS15]. In fact, the only difference is that we only consider polynomials modulo x^r.
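As a point of contrast for Theorem 1.4, the Õ(rn^ω) baseline from Section 3 (inverting a polynomial matrix modulo x^r directly) can be realized by textbook Newton iteration, X ← X(2I − FX), doubling the x-precision each round. This is not the kernel basis algorithm of [ZLS15]; the field size and the 2 × 2 example below are toy choices of ours.

```python
# Baseline for inverting a polynomial matrix modulo x^r: Newton iteration
# X <- X (2I - F X), doubling the precision in x each round.  Each round is
# O(1) matrix products over F_p[x]/x^prec, giving O~(r n^omega) in total.
# (NOT the kernel-basis algorithm of [ZLS15]; parameters are toy choices.)
P = 97

def pmul(a, b, prec):                 # truncated product in F_p[x]/x^prec
    c = [0] * prec
    for i, ai in enumerate(a[:prec]):
        if ai:
            for j in range(min(len(b), prec - i)):
                c[i + j] = (c[i + j] + ai * b[j]) % P
    return c

def mat_mul(A, B, prec):
    n = len(A)
    C = [[[0] * prec for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = [0] * prec
            for k in range(n):
                t = pmul(A[i][k], B[k][j], prec)
                acc = [(x + y) % P for x, y in zip(acc, t)]
            C[i][j] = acc
    return C

def const_inv(F):
    """Invert the constant-term matrix F(0) over Z_p by Gauss-Jordan."""
    n = len(F)
    M = [[F[i][j][0] % P for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        s = pow(M[col][col], P - 2, P)
        M[col] = [x * s % P for x in M[col]]
        for r2 in range(n):
            if r2 != col and M[r2][col]:
                f = M[r2][col]
                M[r2] = [(x - f * y) % P for x, y in zip(M[r2], M[col])]
    return [row[n:] for row in M]

def inverse_mod_xr(F, r):
    n = len(F)
    X = [[[c] for c in row] for row in const_inv(F)]   # correct mod x^1
    prec = 1
    while prec < r:
        prec = min(2 * prec, r)
        FX = mat_mul(F, X, prec)
        T = [[[-c % P for c in FX[i][j]] for j in range(n)] for i in range(n)]
        for i in range(n):
            T[i][i][0] = (T[i][i][0] + 2) % P          # T = 2I - F X
        X = mat_mul(X, T, prec)                        # X <- X (2I - F X)
    return X

r = 8
F = [[[1, 0], [0, 3]], [[0, 5], [1, 0]]]   # F = [[1, 3x], [5x, 1]], F(0) = I
Finv = inverse_mod_xr(F, r)
FFinv = mat_mul(F, Finv, r)
I = [[[int(i == j)] + [0] * (r - 1) for j in range(2)] for i in range(2)]
assert FFinv == I                          # F * F^{-1} = I  (mod x^8)
```

The iteration needs F(0) to be invertible over F_p, which is exactly the setting of Section 3 where the symbolic adjacency matrix has constant term I.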
In Section 4.2, we will provide an improved analysis of this algorithm using rectangular matrix multiplication. Here we present a brief exposition of the algorithm in [ZLS15].

Let F be an input polynomial matrix where each entry has degree at most d. We will compute a kernel basis decomposition of F, which is a chain of matrices A₁, A₂, ..., A_{log n} and a diagonal matrix B, such that

    F^{−1} = A₁ A₂ ··· A_{log n} B^{−1}.    (3)

Then, to compute F^{−1}, we simply multiply the above matrices. Note that B is a diagonal matrix, so its inverse is easy to compute.

To start, we write

    F = [ F_U ]
        [ F_D ],

where each of F_U and F_D is an (n/2) × n matrix. Then we compute two n × (n/2) matrices N_R and N_L with full rank, such that F_U N_R = 0 and F_D N_L = 0. (This can be done by [ZLS12, Theorem 4.2].) Let A₁ = [N_L  N_R]; then A₁ has full rank, and

    F · A₁ = [ F_U N_L   F_U N_R ] = [ F_U N_L      0    ]
             [ F_D N_L   F_D N_R ]   [    0      F_D N_R ].

Therefore, F · A₁ is a block diagonal matrix with two blocks, each of size (n/2) × (n/2). We can then recursively invoke the kernel basis decomposition on these two blocks, and form the matrices A₂, ..., A_{log n}. The diagonal matrix B is created at the base case of the recursion, where the diagonal blocks of F · A₁ ··· A_{log n} are of size 1 × 1. It is shown in [ZLS15] that the kernel basis decomposition costs only Õ(dn^ω) time to compute.

We still need to compute Eq. (3). From the above algorithm, we can see that each A_i is a block-diagonal matrix, which consists of 2^{i−1} blocks of size (n/2^{i−1}) × (n/2^{i−1}). Now we assume that each entry in A_i also has degree at most d · 2^{i−1}. (In reality, the behavior of degrees in A_i may be complicated, and we need the notion of shifted column degree (see Definition 4.1) to control it.)

To compute Eq. (3), we define M_i = A₁ A₂ ··· A_i, and compute each M_i by the formula

    M_{i+1} = M_i A_{i+1}.    (4)

The degree of each entry in M_i will be at most O(2^i · d).
As we only need the results modulo x^r, we can assume the degrees are actually O(min{r, 2^i · d}). Note that A_{i+1} consists of 2^i blocks, each of size (n/2^i) × (n/2^i), and the degree of each (nonempty) entry of A_{i+1} is also O(min{r, 2^i · d}). Therefore, we can compute Eq. (4) in

O(min{r, 2^i · d}) · 2^i · MM(n, n/2^i, n/2^i)   (5)

time. (It is basically 2^i matrix products of sizes n × (n/2^i) and (n/2^i) × (n/2^i); we need to multiply another factor of min{r, 2^i · d}, which is the degree of the polynomials in these matrices.)

Now, it is easy to see that the bottleneck of this algorithm occurs when 2^i · d = r, in which case the time for computing Eq. (4) is:

(5) = (r²/d) · MM(n, nd/r, nd/r).

4.2 Proof of Theorem 1.4

As opposed to the informal description above, the maximum degrees in the matrices may not behave well. We need to introduce the concepts of column degrees and shifted column degrees to capture the behavior of the degrees in these matrices.

Definition 4.1 ([ZLS15, Section 2.2]). Let ~p be a length-n column vector whose entries are polynomials. Then the column degree of ~p, denoted cdeg ~p, is the maximum of the degrees of the entries of ~p. That is:

cdeg ~p = max_{i=1}^{n} {deg(p_i)}.

Let ~s be a length-n vector of integers, called the shift of the degrees. Then the ~s-shifted column degree of ~p, or simply the ~s-column degree of ~p, denoted cdeg_~s ~p, is defined as

cdeg_~s ~p = max_{i=1}^{n} {s_i + deg(p_i)}.

It is easy to see that cdeg ~p = cdeg_~0 ~p, where ~0 is the all-zero vector.

Let A be an m × n polynomial matrix. Then the column degree (~s-column degree resp.) of A, denoted cdeg A (cdeg_~s A resp.), is the length-n row vector whose i-th entry is the column degree (~s-column degree resp.) of the i-th column of A.

We need the following theorem.
It is essentially Theorem 3.7 of [ZLS12], where we replace the invocations of square matrix multiplication algorithms with (the faster) rectangular matrix multiplication algorithms. It is straightforward to adapt the original proof in [ZLS12] to use rectangular matrix multiplication, but for completeness we include a proof in Section 4.3.

Theorem 4.2. Let A be an n^p × n^q polynomial matrix, and B be an n^q × n^r polynomial matrix. Suppose ~s ≥ cdeg A is a shift that bounds the corresponding column degrees of A, and

ξ = max{ (1/n^q) Σ_{i=1}^{n^q} s_i , (1/n^r) Σ_{i=1}^{n^r} (cdeg_~s B)_i } + 1.

Then the product A · B can be computed in ξ · n^{ω(p,q,r)+o(1)} field operations.

Now we can prove Theorem 1.4.

Theorem 1.4. Let r be an integer and F be a finite field. Let F ∈ F[x]^{n×n} be an n × n matrix over the ring of univariate polynomials F[x], and let d ≥ 1 be an upper bound on the degrees of the entries of F. If F is invertible, the number of field operations needed to compute F^{−1} mod x^r is at most

Õ(dn^ω) + (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

Proof Sketch. In this sketch, we will use some results of [ZLS15] directly. We will also use the notation introduced in Section 4.1.

Let ~s = cdeg F. We first invoke the kernel basis decomposition algorithm INVERSE of [ZLS15]:

(A_1, A_2, ..., A_{log n}, B) ← INVERSE(F, ~s).

By [ZLS15, Theorem 8], the algorithm INVERSE takes only Õ(dn^ω) time. Then we compute

F^{−1} = A_1 A_2 ··· A_{log n} B^{−1}.

Note that B is a diagonal matrix, so it suffices to compute A_1 A_2 ··· A_{log n}. Also recall that for every 0 ≤ i < log n, A_{i+1} is a block diagonal matrix that consists of 2^i diagonal blocks of size (n/2^i) × (n/2^i). Letting A^{(j)}_{i+1} denote the j-th block, we write A_{i+1} = diag(A^{(1)}_{i+1}, ..., A^{(2^i)}_{i+1}). Let M_i = A_1 A_2 ··· A_i. Then for every 1 ≤ i < log n,

M_{i+1} = M_i A_{i+1}.
(4)

In order to use the results of [ZLS15, Lemma 10], we need to partition each A^{(k)}_{i+1} into two kernel bases. Like how A_1 was formed in Section 4.1, we write

A^{(k)}_{i+1} = [N^{(k)}_{i+1,L} N^{(k)}_{i+1,R}].

Here, each of N^{(k)}_{i+1,L} and N^{(k)}_{i+1,R} is of dimension (n/2^i) × (n/2^{i+1}). We divide M_i into submatrices ("column blocks") of dimension n × (n/2^i) accordingly:

M_i = [M^{(1)}_i M^{(2)}_i ... M^{(2^i)}_i].

Then Eq. (4) is equivalent to

M^{(2k−1)}_{i+1} = M^{(k)}_i · N^{(k)}_{i+1,L}, and M^{(2k)}_{i+1} = M^{(k)}_i · N^{(k)}_{i+1,R}.   (6)

We use Theorem 4.2 to multiply these matrices. For each 0 ≤ i < log n, in Eq. (6), we need to perform 2^{i+1} matrix multiplications of the form M · N. Here M = M^{(k)}_i, and N is either N^{(k)}_{i+1,L} or N^{(k)}_{i+1,R}. The dimension of M is n × (n/2^i), and the dimension of N is (n/2^i) × (n/2^{i+1}). Moreover, let ~t = cdeg_~s M^{(k)}_i; then by [ZLS15, Lemma 10]:

(a) Σ_{j=1}^{n/2^i} t_j ≤ Σ_{j=1}^{n} s_j ≤ dn.
(b) Σ_{j=1}^{n/2^{i+1}} (cdeg_~t N^{(k)}_{i+1,L})_j ≤ Σ_{j=1}^{n} s_j ≤ dn; similarly, Σ_{j=1}^{n/2^{i+1}} (cdeg_~t N^{(k)}_{i+1,R})_j ≤ dn.

(Recall that ~s is the column degree of F.) Let

ξ_i = max{ (2^i/n) Σ_{j=1}^{n/2^i} t_j , (2^{i+1}/n) Σ_{k=1}^{n/2^{i+1}} (cdeg_~t N)_k } ≤ 2^{i+1} · d.

Note that we are only interested in the polynomials modulo x^r; thus by definition, every element of ~t and cdeg_~t N can be upper bounded by O(r). Therefore, if 2^{i+1}d ≥ r, we use the bound ξ_i ≤ O(r) instead. By Theorem 4.2, the time complexity for computing M · N is ξ_i · n^{ω(1,1−τ,1−τ)+o(1)}, where τ = log_n(2^{i+1}).

Let τ⋆ = log(r/d)/log n be the threshold such that 2^{i+1}d ≤ r if and only if τ ≤ τ⋆. Suppose 2^{i+1}d ≤ r; then the time complexity for computing all 2^{i+1} (= n^τ) matrix products is

n^τ · ξ_i · n^{ω(1,1−τ,1−τ)+o(1)} ≤ d · n^{2τ+ω(1,1−τ,1−τ)+o(1)}
≤ d · n^{2τ⋆+ω(1,1−τ⋆,1−τ⋆)+o(1)}   (by Lemma 2.2)
≤ (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.
On the other hand, suppose 2^{i+1}d > r; then the time complexity for computing all n^τ matrix products is

n^τ · r · n^{ω(1,1−τ,1−τ)+o(1)} ≤ r · n^{τ⋆+ω(1,1−τ⋆,1−τ⋆)+o(1)}   (by Lemma 2.2)
≤ (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

Summing over every 0 ≤ i < log n, we can see that the time complexity for inverting F is at most

Õ(dn^ω) + (r²/d) · MM(n, nd/r, nd/r) · n^{o(1)}.

4.3 Proof of Theorem 4.2

Theorem 4.2. Let A be an n^p × n^q polynomial matrix, and B be an n^q × n^r polynomial matrix. Suppose ~s ≥ cdeg A is a shift that bounds the corresponding column degrees of A, and

ξ = max{ (1/n^q) Σ_{i=1}^{n^q} s_i , (1/n^r) Σ_{i=1}^{n^r} (cdeg_~s B)_i } + 1.

Then the product A · B can be computed in ξ · n^{ω(p,q,r)+o(1)} field operations.

Proof. W.l.o.g. we assume that n^p, n^q, n^r are powers of 2. For every 1 ≤ c ≤ r log n − 1, let B_c denote the set of columns of B whose ~s-column degrees are in the range (2^c ξ, 2^{c+1} ξ]; let B_0 denote the rest of the columns of B, i.e. those with ~s-column degrees at most 2ξ. Then B_0, B_1, ..., B_{r log n − 1} form a partition of the columns of B. By the definition of ξ, for every 1 ≤ c ≤ r log n − 1, there are at most n^r/2^c columns in B_c. To compute A · B, it suffices to compute A · B_c for each c.

Now fix an integer c; we need to compute A · B_c. Using the same method as above, we can also partition the columns of A into q log n groups. More precisely, for every 1 ≤ c′ ≤ q log n − 1, let A_{c′} be the set of columns of A whose column degrees are in the range (2^{c′} ξ, 2^{c′+1} ξ]; let A_0 be the rest of the columns of A, i.e. those with column degrees at most 2ξ. For notational convenience, we may assume that

A = [A_0 A_1 ... A_{q log n − 1}],

as otherwise we can rearrange the columns of A (along with the rows of B and the entries of ~s). We also note that for every 1 ≤ c′ ≤ q log n − 1, there are at most n^q/2^{c′} columns in A_{c′}.

The partition of the columns of A induces a partition of the rows of B_c.
In particular, we define B_{c,c′} as the rows of B_c corresponding to the columns of A_{c′}, so

B_c = [B_{c,0}; B_{c,1}; ...; B_{c,q log n − 1}].

We can see that for every c′ > c, B_{c,c′} is the zero matrix. In fact, suppose the entry in the j-th row and k-th column of B_c is nonzero, and this entry belongs to B_{c,c′} for some c′ > c. Denote this column by b_k; then cdeg_~s b_k ≥ s_j. As the j-th column of A belongs to A_{c′}, we have s_j > 2^{c′} ξ ≥ 2^{c+1} ξ. However, by the definition of B_c, we also have cdeg_~s b_k ≤ 2^{c+1} ξ, a contradiction. Therefore

A · B_c = Σ_{c′=0}^{c} A_{c′} · B_{c,c′}.

Again, fix c′ ∈ [0, c]; we want to compute A_{c′} · B_{c,c′}. Recall that the dimension of A_{c′} is at most n^p × (n^q/2^{c′}), and each entry of A_{c′} is a polynomial of degree at most 2^{c′+1} ξ; the dimension of B_{c,c′} is at most (n^q/2^{c′}) × (n^r/2^c), and each entry of B_{c,c′} is a polynomial of degree at most 2^{c+1} ξ. Let ∆ = 2^{c′+1} ξ. We "decompose" B_{c,c′} into ℓ = 2^{c−c′} matrices {B_{c,c′,i}}_{i=0}^{ℓ−1}, such that

B_{c,c′} = B_{c,c′,0} + B_{c,c′,1} · x^∆ + B_{c,c′,2} · x^{2∆} + ··· + B_{c,c′,ℓ−1} · x^{(ℓ−1)∆},

and each entry of each matrix B_{c,c′,i} has degree at most ∆. We concatenate these degree-∆ matrices together, to form a matrix

B̂_{c,c′} = [B_{c,c′,0} B_{c,c′,1} ... B_{c,c′,ℓ−1}].

This matrix has at most (n^r/2^c) · ℓ ≤ n^r/2^{c′} columns. Then we compute

Ĉ_{c,c′} = A_{c′} · B̂_{c,c′}.

We can see that

Ĉ_{c,c′} = [A_{c′} B_{c,c′,0} A_{c′} B_{c,c′,1} ... A_{c′} B_{c,c′,ℓ−1}],

and we can directly compute A_{c′} · B_{c,c′} from Ĉ_{c,c′}, as

A_{c′} · B_{c,c′} = Σ_{i=0}^{ℓ−1} A_{c′} B_{c,c′,i} · x^{i·∆}.

This finishes the description of the algorithm.

We now analyze the time complexity. Fix integers 0 ≤ c′ ≤ c; we need to multiply A_{c′} and B̂_{c,c′}. Let τ = log_n(2^{c′}).
In both of these matrices, the degree of every entry is at most ∆ = O(2^{c′} ξ) = O(n^τ ξ). The dimensions of A_{c′} and B̂_{c,c′} are upper bounded by n^p × n^{q−τ} and n^{q−τ} × n^{r−τ} respectively. Therefore the time complexity of this step is

Õ(n^τ ξ · n^{ω(p, q−τ, r−τ)}),

which is at most ξ · n^{ω(p,q,r)+o(1)} by Lemma 2.1. As we only need to consider O(log² n) pairs (c, c′), it follows that the total time complexity of our algorithm is ξ · n^{ω(p,q,r)+o(1)}.

5 Computing Unique Shortest Paths

In this section, we show how to compute unique shortest paths in a directed graph in Õ(n^{2+µ}M) time, matching the current best time bound for computing the all-pairs distances [Zwi02]. Here µ < 0.5286 is the solution of ω(1, µ, 1) = 1 + 2µ [GU18]. This algorithm is needed before we can use Lemma 3.7. We may assume that before we proceed, we have already computed the all-pairs distances ‖uv‖ for every u, v ∈ V, using the APSP algorithm of [Zwi02].

Our tie-breaking method requires a (random) permutation π of all vertices, or equivalently a bijection between the vertex set V and [n], i.e. π : V → [n]. According to π, for every graph G on V and every u, v ∈ V, we will specify a shortest path ρ_G(u, v) in G from u to v in a certain way. These shortest paths will be consistent and easy to compute, which is captured by the following theorem. (See also [Ren20, Theorems 1.3 and 1.4].)

Theorem 5.1. Given a graph G on V, a representation of the set of shortest paths {ρ_G(u, v)}_{u,v ∈ V} can be computed in Õ(n^{2+µ}M) time, with high probability over the random choice of the permutation π, such that the following hold.

(Property a) Let G be a graph on V. For every u′, v′ ∈ ρ_G(u, v) such that u′ appears before v′, the portion from u′ to v′ in ρ_G(u, v) coincides with the path ρ_G(u′, v′).

(Property b) Let G be a graph on V, u, v ∈ V, and G′ be a subgraph of G.
Suppose ρ_G(u, v) is completely contained in G′; then ρ_{G′}(u, v) = ρ_G(u, v).

From (Property a), for every vertex u, the shortest paths from u to every other vertex in G form a tree, which we call the outgoing shortest path tree rooted at u, denoted T_out(u). Similarly, the shortest paths to u from every other vertex in G also form a tree, which we call the incoming shortest path tree rooted at u, denoted T_in(u). Actually, the "representation" computed is exactly the set of n outgoing shortest path trees {T_out(u)}_{u ∈ V} and the set of n incoming shortest path trees {T_in(u)}_{u ∈ V}.

The rest of this section. We first define the paths ρ_G(u, v) in Section 5.1. Then we explain how to compute them efficiently in Section 5.2, by presenting an algorithm that computes the incoming and outgoing shortest path trees in Õ(Mn^{2+µ}) time. Finally, we prove (Property a) and (Property b) in Section 5.3.

5.1 Defining ρ_G(u, v)

Let G be an input graph, and π : V → [n] be a (random) bijection. Let u, v ∈ V and let P be a path from u to v; we say that any vertex on P that is neither u nor v is an internal vertex of P.

Recall that we defined |uv| as the largest number of edges in any shortest path from u to v. In particular:

• |uv| = 0 if and only if u = v;
• |uv| = 1 if and only if the edge (u, v) is the only shortest path from u to v;
• |uv| = ∞ if there is no path from u to v in G;
• otherwise, we have 2 ≤ |uv| < ∞.

We claim that the set of vertices mapped to small values by π is a good "hitting set" w.h.p.:

Claim 5.2. Fix the graph G. For some large constant C, with high probability over the choice of π, the following holds. For every pair of vertices u, v ∈ V such that 2 ≤ |uv| < ∞, there is a shortest path ρ′(u, v) from u to v, and an internal vertex z on ρ′(u, v), such that π(z) ≤ CMn ln n/‖uv‖.

Proof. Fix two vertices u, v ∈ V, and any shortest path ρ′(u, v) from u to v.
Denote r = ‖uv‖. If r ≤ CM ln n then the claim is trivial. Otherwise, there are at least r/(2M) internal vertices on ρ′(u, v). Therefore, the probability over a random bijection π : V → [n] that π maps every internal vertex of ρ′(u, v) to an integer greater than CMn ln n/r is at most

(1 − CM ln n/r)^{r/(2M)} ≤ 1/n^{C/2}.

Thus by a union bound over all pairs, the probability that the above condition holds (for every u, v) is at least 1 − 1/n^{C/2−2}, which is a high probability.

Let u, v ∈ V be such that 2 ≤ |uv| < ∞. Define w(u, v) as the intermediate vertex with the smallest label on any shortest path from u to v, i.e.

w(u, v) = arg min_w {π(w) : ‖uv‖ = ‖uw‖ + ‖wv‖, w ≠ u and w ≠ v}.   (7)

Claim 5.2 states that w.h.p., for all vertices u, v ∈ V such that 2 ≤ |uv| < ∞, we have

π(w(u, v)) ≤ CMn ln n/‖uv‖.   (8)

In the rest of this section, we assume that Eq. (8) holds for all vertices u, v ∈ V such that 2 ≤ |uv| < ∞. Now we define the paths ρ_G(u, v).

Definition 5.3. Let u, v ∈ V be such that |uv| ≠ ∞. The path ρ_G(u, v) is recursively defined as follows.

• If u = v, then ρ_G(u, v) is the empty path that starts and ends at u.
• If |uv| = 1, then ρ_G(u, v) consists of a single edge, namely the edge from u to v.
• Otherwise, let w = w(u, v); then ρ_G(u, v) is the concatenation of ρ_G(u, w) and ρ_G(w, v).

For every u, v such that 2 ≤ |uv| < ∞, since w is an intermediate vertex on some shortest path from u to v, it is easy to see that |uw| < |uv| and |wv| < |uv|. Therefore ρ_G(u, v) is well defined, as it is constructed inductively in nondecreasing order of |uv|.

5.2 Computing the Shortest Path Trees in Õ(Mn^{2+µ}) Time

We will need the following classical algorithm for computing distance products:

Lemma 5.4 ([Zwi02]). Let A be an n × m matrix, and B be an m × n matrix. Suppose every entry of A and B is either +∞ or an integer with absolute value at most M. Then the distance product of A and B can be computed in Õ(M · MM(n, m, n)) time.
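The standard encoding behind this kind of lemma can be sketched as follows. This is a toy Python version with our own function name: a finite entry a is encoded as (m+1)^{M−a}, so that the largest surviving power of m+1 in an ordinary integer product of the encoded matrices reveals min_k A[i,k]+B[k,j]. The actual speedup in Lemma 5.4 comes from performing that integer product with fast (rectangular) matrix multiplication, which the plain loop below does not attempt.

```python
INF = float('inf')

def distance_product(A, B, M):
    """Min-plus product of an n x m and an m x n matrix whose finite
    entries lie in {0, ..., M}, via one ordinary integer matrix product."""
    n, m, q = len(A), len(B), len(B[0])
    base = m + 1
    enc = lambda a: 0 if a == INF else base ** (M - a)
    Ae = [[enc(a) for a in row] for row in A]
    Be = [[enc(b) for b in row] for row in B]
    out = [[INF] * q for _ in range(n)]
    for i in range(n):
        for j in range(q):
            # a fast implementation obtains C[i][j] from one big integer
            # matrix multiplication instead of this inner loop
            c = sum(Ae[i][k] * Be[k][j] for k in range(m))
            if c == 0:
                continue  # no finite sum: every summand was encoded as 0
            e = 0
            while base ** (e + 1) <= c:
                e += 1  # e = largest exponent present = 2M - (minimum sum)
            out[i][j] = 2 * M - e
    return out
```

The decoding is correct because at most m terms contribute, each at most (m+1)^e for the largest exponent e, so the lower-order terms can never carry into the next power of m+1.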
Computing w(u, v). We first show how to compute w(u, v) for every u, v ∈ V such that 2 ≤ |uv| < ∞ in Õ(Mn^{2+µ}) time. Then we use the values of all w(u, v) to compute the incoming and outgoing shortest path trees in Õ(n²) additional time. Our strategy for computing w(u, v) is to mimic the algorithm of [KL05, SYZ11] for computing maximum witnesses of Boolean matrix multiplication. In particular, we divide the possible witnesses into blocks, and use fast matrix multiplication algorithms to find the block containing w(u, v), for every u, v. After that, we use brute force to find w(u, v) inside that block. Details follow.

Let r = 2^k be a parameter; we show how to compute w(u, v) for every pair of vertices u, v ∈ V such that r ≤ ‖uv‖ < 2r. Let H_r = {z ∈ V : π(z) ≤ CMn ln n/r}. By Claim 5.2, for all vertices u, v such that ‖uv‖ ∈ [r, 2r), we have w(u, v) ∈ H_r.

We define an n × |H_r| matrix A and an |H_r| × n matrix B as follows. For every u ∈ V and z ∈ H_r, we define

A[u, z] = ‖uz‖ if ‖uz‖ ≤ 2r and u ≠ z, and +∞ otherwise;
B[z, u] = ‖zu‖ if ‖zu‖ ≤ 2r and u ≠ z, and +∞ otherwise.

Then we compute the minimum witnesses of the distance product A ⋆ B. To be more precise, we compute the matrix W[·, ·] such that for every u, v ∈ V,

W[u, v] = arg min_z {π(z) : ‖uv‖ = A[u, z] + B[z, v]}.

Correctness. Fix u, v ∈ V, where ‖uv‖ ∈ [r, 2r). We will show that if |uv| = 1, then W[u, v] does not exist; otherwise W[u, v] coincides with the vertex w(u, v) defined in Eq. (7).

First, suppose |uv| = 1. Then there is no intermediate vertex z such that ‖uv‖ = ‖uz‖ + ‖zv‖, which means W[u, v] does not exist.

Now we assume |uv| ≥ 2. Since ‖uv‖ ≥ r, by Claim 5.2, there is an intermediate vertex z ∈ H_r such that ‖uz‖ + ‖zv‖ = ‖uv‖. Since ‖uz‖, ‖zv‖ ≤ ‖uv‖ < 2r, we can see that ‖uv‖ = A[u, z] + B[z, v]; therefore W[u, v] exists. Let z = W[u, v]; then by Eq.
(7), we have π(w(u, v)) ≤ π(z). On the other hand, Claim 5.2 shows that w(u, v) ∈ H_r, so by the definition of z = W[u, v], we have π(z) ≤ π(w(u, v)). Therefore z = w(u, v), and we have established the correctness of W[·, ·].

Time complexity. Now we show how to compute the matrix W[·, ·] efficiently. Let s = n^µ, where µ ∈ (0, 1) is a parameter to be determined later. If |H_r| < s, then we can compute the matrix W by brute force in Õ(n²s) time. Otherwise, we partition H_r into blocks of size s, where the i-th block contains the vertices that are mapped by π to values between (i − 1) · s + 1 and i · s. For every block i, we compute the distance product of A and B where only vertices in block i are allowed as witnesses. In other words, we compute the following matrix:

D_i[u, v] = min{A[u, z] + B[z, v] : (i − 1) · s + 1 ≤ π(z) ≤ i · s}.

By Lemma 5.4, this matrix can be computed in Õ(r · MM(n, s, n)) time. There are O(|H_r|/s) = Õ(Mn/(rs)) blocks, and we need to compute a distance product D_i for each block i. Therefore the total time for computing all these distance products is

Õ(r · MM(n, s, n) · Mn/(rs)) = Õ(M · (n/s) · MM(n, s, n)).

Now, for every u, v ∈ V such that ‖uv‖ ∈ [r, 2r) and |uv| ≥ 2, we want to compute W[u, v], which is the vertex z ∈ H_r with the minimum π(z) such that ‖uv‖ = A[u, z] + B[z, v]. First, we find the smallest i such that D_i[u, v] = ‖uv‖, and we know that W[u, v] is in the i-th block. (If such an i does not exist, then W[u, v] does not exist either, and |uv| = 1.) This step takes Õ(Mn/(rs)) time per pair. Then we iterate through the vertices in this block, and find the vertex z with the smallest π(z) such that A[u, z] + B[z, v] = ‖uv‖.
This step takes O(s) time. It follows that the time complexity for computing every w(u, v) with ‖uv‖ ∈ [r, 2r) is

Õ(M · MM(n, s, n) · (n/s) + n² · Mn/(rs) + n²s) ≤ Õ(M · MM(n, s, n) · (n/s) + n²s)   (9)
≤ Õ(M · n^{1−µ+ω(1,µ,1)} + n^{2+µ}).

Here, Eq. (9) holds because n² · Mn/(rs) ≤ n² · M · (n/s) ≤ M · MM(n, s, n) · (n/s).

Let µ be the solution to ω(1, µ, 1) = 1 + 2µ; then µ < 0.5286 ([Zwi02, GU18]). It follows that the time complexity for computing every w(u, v) with r ≤ ‖uv‖ < 2r is at most Õ(Mn^{2+µ}).

Putting it together. We run the above algorithm for k from 0 to ⌊log(nM)⌋, and for each k, we compute the values w(u, v) with ‖uv‖ ∈ [2^k, 2^{k+1}). The total time to compute w(u, v) for all u, v is thus Õ(Mn^{2+µ}).

From w(u, v) to unique shortest paths. For every u, v ∈ V, we will compute the parent of u in the tree T_in(v), denoted parent_v(u). In other words, parent_v(u) is the second vertex on the path ρ_G(u, v) (the first being u). After computing parent_v(u) for every u, v ∈ V, it is easy to construct T_in(v) for every vertex v. We can compute every T_out(u) in a symmetric fashion.

We proceed in nondecreasing order of ‖uv‖. Suppose that for every (u′, v′) with ‖u′v′‖ < ‖uv‖, we have already computed parent_{v′}(u′). Now we compute parent_v(u) as follows. Let w = w(u, v). If w does not exist, let parent_v(u) = v; otherwise parent_v(u) = parent_w(u).

This algorithm (which, given every w(u, v), computes every parent_v(u)) clearly runs in Õ(n²) time. Notice that if w exists, then w is an intermediate vertex of ρ_G(u, v); thus ‖uw‖ < ‖uv‖, and the second vertex of the path ρ_G(u, v) coincides with the second vertex of the path ρ_G(u, w). Hence, the correctness of the algorithm can be proved by induction on ‖uv‖.

5.3 Proofs of (Property a) and (Property b)

Theorem 5.1.
Given a graph G on V, a representation of the set of shortest paths {ρ_G(u, v)}_{u,v ∈ V} can be computed in Õ(n^{2+µ}M) time, with high probability over the random choice of the permutation π, such that the following hold.

(Property a) Let G be a graph on V. For every u′, v′ ∈ ρ_G(u, v) such that u′ appears before v′, the portion from u′ to v′ in ρ_G(u, v) coincides with the path ρ_G(u′, v′).

(Property b) Let G be a graph on V, u, v ∈ V, and G′ be a subgraph of G. Suppose ρ_G(u, v) is completely contained in G′; then ρ_{G′}(u, v) = ρ_G(u, v).

In this subsection, for any path P and vertices u′, v′ ∈ P such that u′ appears before v′ on P, we use P[u′, v′] to denote the portion from u′ to v′ on the path P.

Proof of (Property a). We prove it by induction on the number of edges of ρ_G(u, v). Let P = ρ_G(u, v). If u = v or P has only one edge, (Property a) is trivial. Now suppose P has k edges, where k > 1. Let w = w(u, v); then w must lie on P. Consider the following three cases:

• Suppose u′ appears after (or coincides with) w on P. By definition, P[w, v] = ρ_G(w, v). Then P[u′, v′] = ρ_G(u′, v′) by the induction hypothesis on ρ_G(w, v), since it has fewer edges than ρ_G(u, v).

• Suppose v′ appears before (or coincides with) w. This case is symmetric to the above case.

• Otherwise, w lies strictly between u′ and v′ on P.

First, we claim that w = w(u′, v′). As w lies on some shortest path from u′ to v′ (namely P[u′, v′]), we have π(w(u′, v′)) ≤ π(w). On the other hand, suppose there exists w′ such that π(w′) < π(w) and w′ lies on some shortest path from u′ to v′. Then w′ also lies on some shortest path from u to v, so it is a better candidate for w(u, v), contradicting the definition of w.

Second, by the induction hypothesis on ρ_G(u, w), which has fewer edges than ρ_G(u, v), we have P[u′, w] = ρ_G(u′, w). Similarly, P[w, v′] = ρ_G(w, v′).
Therefore, by definition,

P[u′, v′] = P[u′, w] ∘ P[w, v′] = ρ_G(u′, w) ∘ ρ_G(w, v′) = ρ_G(u′, v′).

Proof of (Property b). We prove it by induction on the number of edges of ρ_G(u, v). Let P = ρ_G(u, v). If u = v or P has only one edge, (Property b) is trivial.

Now suppose P has more than one edge. Let w = w_G(u, v) (i.e. the vertex w(u, v) defined in Eq. (7) in the graph G); we claim that w coincides with w_{G′}(u, v) (i.e. the vertex w(u, v) defined in Eq. (7) in the graph G′). Since P is also a shortest path from u to v in G′, we have π(w_{G′}(u, v)) ≤ π(w). On the other hand, suppose there exists w′ such that π(w′) < π(w) and w′ lies on some shortest path from u to v in G′. Then w′ also lies on some shortest path from u to v in G, so it is a better candidate for w_G(u, v), contradicting the definition of w.

Since ρ_G(u, w) has fewer edges than ρ_G(u, v), and ρ_G(u, w) is completely contained in G′, we can use the induction hypothesis on ρ_G(u, w) to conclude that P[u, w] = ρ_{G′}(u, w). Similarly, we can use the induction hypothesis on ρ_G(w, v) to conclude that P[w, v] = ρ_{G′}(w, v). Therefore, by definition,

ρ_{G′}(u, v) = ρ_{G′}(u, w) ∘ ρ_{G′}(w, v) = P.

6 Conclusions

We presented an improved DSO for directed graphs with integer weights in [1, M]. The preprocessing time is O(n^{2.5794}M) and the query time is O(1). However, there is still a small gap between the preprocessing time of our DSO and the current best time bound for the APSP problem in directed graphs, which is Õ(n^{2+µ}M) ≤ O(n^{2.5286}M) [Zwi02]. Can we improve the preprocessing time to Õ(n^{2+µ}M), matching the latter time bound? Another interesting problem is to investigate the complexity of preprocessing a DSO on undirected graphs; here, the best time bound for APSP is Õ(n^ω M) [Sei95, SZ99]. Can we preprocess a DSO in Õ(n^ω M) time on undirected graphs?

Compared to other DSOs [WY13, GW20, CC20], our oracle has two drawbacks.
First, our query algorithm only outputs the shortest distance, but we do not know how to find the actual shortest paths. So another open problem is whether we can find the actual shortest path in additional O(ℓ) query time, where ℓ is the number of edges on the returned shortest path. Second, since we used [Ren20, Observation 2.1], our oracle can only deal with positive edge weights. Can we extend our oracle to also deal with negative edge weights?

For every parameter f, the r-truncated DSO in Section 3.2 can actually handle f edge/vertex deletions in Õ(f^ω r) query time. (See also [vdBS19].) However, as far as we know, [Ren20, Observation 2.1] only works for one failure. It would be exciting to extend [Ren20, Observation 2.1] or our (full) DSO to also handle f failures.

Acknowledgment

We would like to thank Ran Duan and Tianyi Zhang for helpful discussions during the initial stage of this research. We are grateful to the anonymous reviewers for their helpful comments, and for suggesting the title of Section 5.

References

[ACC19] Noga Alon, Shiri Chechik, and Sarel Cohen. Deterministic combinatorial replacement paths and distance sensitivity oracles. In Proc. 46th International Colloquium on Automata, Languages and Programming (ICALP), volume 132 of LIPIcs, pages 12:1–12:14, 2019. doi:10.4230/LIPIcs.ICALP.2019.12. (cit. on p. 2)

[AGM97] Noga Alon, Zvi Galil, and Oded Margalit. On the exponent of the all pairs shortest path problem. Journal of Computer and System Sciences, 54(2):255–262, 1997. doi:10.1006/jcss.1997.1388. (cit. on p. 2)

[AHU74] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974. (cit. on p. 5)

[AW21] Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Proc. 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 522–539, 2021. doi:10.1137/1.9781611976465.32. (cit. on p. 2, 4)

[BH74] James R. Bunch and John E.
Hopcroft. Triangular factorization and inversion by fast matrix multiplication. Mathematics of Computation, 28(125):231–236, 1974. doi:10.2307/2005828. (cit. on p. 4)

[BK08] Aaron Bernstein and David R. Karger. Improved distance sensitivity oracles via random sampling. In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 34–43, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347087. (cit. on p. 2)

[BK09] Aaron Bernstein and David R. Karger. A nearly optimal oracle for avoiding failed vertices and edges. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 101–110, 2009. doi:10.1145/1536414.1536431. (cit. on p. 2, 3)

[Blä13] Markus Bläser. Fast matrix multiplication. Theory of Computing, Graduate Surveys, 5:1–60, 2013. doi:10.4086/toc.gs.2013.005. (cit. on p. 21)

[CC20] Shiri Chechik and Sarel Cohen. Distance sensitivity oracles with subcubic preprocessing time and fast query time. In Proc. 52nd Annual ACM Symposium on Theory of Computing (STOC), pages 1375–1388, 2020. doi:10.1145/3357713.3384253. (cit. on p. 2, 18)

[DP09a] Ran Duan and Seth Pettie. Dual-failure distance and connectivity oracles. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 506–515, 2009. doi:10.1137/1.9781611973068.56. (cit. on p. 4)

[DP09b] Ran Duan and Seth Pettie. Fast algorithms for (max, min)-matrix multiplication and bottleneck shortest paths. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 384–391, 2009. doi:10.1137/1.9781611973068.43. (cit. on p. 3)

[DTCR08] Camil Demetrescu, Mikkel Thorup, Rezaul Alam Chowdhury, and Vijaya Ramachandran. Oracles for distances avoiding a failed node or link. SIAM Journal on Computing, 37(5):1299–1318, 2008. doi:10.1137/S0097539705429847. (cit. on p. 2)

[DZ17] Ran Duan and Tianyi Zhang. Improved distance sensitivity oracles via tree partitioning. In Proc.
15th International Symposium on Algorithms and Data Structures (WADS), volume 10389 of LNCS, pages 349–360, 2017. doi:10.1007/978-3-319-62127-2_30. (cit. on p. 2)

[GU18] François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In Proc. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1029–1046, 2018. doi:10.1137/1.9781611975031.67. (cit. on p. 9, 14, 17)

[GW20] Fabrizio Grandoni and Virginia Vassilevska Williams. Faster replacement paths and distance sensitivity oracles. ACM Transactions on Algorithms, 16(1):15:1–15:25, 2020. doi:10.1145/3365835. (cit. on p. 2, 18)

[KL05] Miroslaw Kowaluk and Andrzej Lingas. LCA queries in directed acyclic graphs. In Proc. 32nd International Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of LNCS, pages 241–248, 2005. doi:10.1007/11523468_20. (cit. on p. 16)

[LNZ17] George Labahn, Vincent Neiger, and Wei Zhou. Fast, deterministic computation of the Hermite normal form and determinant of a polynomial matrix. Journal of Complexity, 42:44–71, 2017. doi:10.1016/j.jco.2017.03.003. (cit. on p. 6)

[LR83] Grazia Lotti and Francesco Romani. On the asymptotic complexity of rectangular matrix multiplication. Theoretical Computer Science, 23:171–185, 1983. doi:10.1016/0304-3975(83)90054-3. (cit. on p. 4, 9)

[LWW18] Andrea Lincoln, Virginia Vassilevska Williams, and R. Ryan Williams. Tight hardness for shortest cycles and paths in sparse graphs. In Proc. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1236–1252, 2018. doi:10.1137/1.9781611975031.80. (cit. on p. 2)

[Ren20] Hanlin Ren. Improved distance sensitivity oracles with subcubic preprocessing time. In Proc. 28th European Symposium on Algorithms (ESA), volume 173 of LIPIcs, pages 79:1–79:13, 2020. doi:10.4230/LIPIcs.ESA.2020.79. (cit. on p. 2, 3, 4, 6, 8, 14, 18)

[San05] Piotr Sankowski. Shortest paths in matrix multiplication time. In Proc.
13th European Symposium on Algorithms (ESA), volume 3669 of LNCS, pages 770–778, 2005. doi:10.1007/11561071_68. (cit. on p. 3, 5)

[Sch80] Jacob T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701–717, 1980. doi:10.1145/322217.322225. (cit. on p. 6)

[Sei95] Raimund Seidel. On the all-pairs-shortest-path problem in unweighted undirected graphs. Journal of Computer and System Sciences, 51(3):400–403, 1995. doi:10.1006/jcss.1995.1078. (cit. on p. 2, 18)

[SM50] Jack Sherman and Winifred J. Morrison. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950. (cit. on p. 6)

[SP21] Karthik C. S. and Merav Parter. Deterministic replacement path covering. In Proc. 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 704–723, 2021. doi:10.1137/1.9781611976465.44. (cit. on p. 2)

[Sto03] Arne Storjohann. High-order lifting and integrality certification. Journal of Symbolic Computation, 36(3-4):613–648, 2003. doi:10.1016/S0747-7171(03)00097-X. (cit. on p. 6)

[SYZ11] Asaf Shapira, Raphael Yuster, and Uri Zwick. All-pairs bottleneck paths in vertex weighted graphs. Algorithmica, 59(4):621–633, 2011. doi:10.1007/s00453-009-9328-x. (cit. on p. 16)

[SZ99] Avi Shoshan and Uri Zwick. All pairs shortest paths in undirected graphs with integer weights. In Proc. 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 605–615, 1999. doi:10.1109/SFFCS.1999.814635. (cit. on p. 2, 18)

[vdBS19] Jan van den Brand and Thatchaphol Saranurak. Sensitive distance and reachability oracles for large batch updates. In Proc. 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 424–435, 2019. doi:10.1109/FOCS.2019.00034. (cit. on p. 3, 7, 18)

[Woo50] Max A. Woodbury. Inverting modified matrices. Memorandum report, 42(106):336, 1950. (cit. on p.
6)

[WY13] Oren Weimann and Raphael Yuster. Replacement paths and distance sensitivity oracles via fast matrix multiplication. ACM Transactions on Algorithms, 9(2):14:1–14:13, 2013. doi:10.1145/2438645.2438646. (cit. on p. 2, 18)

[Zip79] Richard Zippel. Probabilistic algorithms for sparse polynomials. In Symbolic and Algebraic Computation, EUROSAM ’79, volume 72 of LNCS, pages 216–226, 1979. doi:10.1007/3-540-09519-5_73. (cit. on p. 6)

[ZLS12] Wei Zhou, George Labahn, and Arne Storjohann. Computing minimal nullspace bases. In Proc. 37th International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 366–373, 2012. doi:10.1145/2442829.2442881. (cit. on p. 10, 11)

[ZLS15] Wei Zhou, George Labahn, and Arne Storjohann. A deterministic algorithm for inverting a polynomial matrix. Journal of Complexity, 31(2):162–173, 2015. doi:10.1016/j.jco.2014.09.004. (cit. on p. 3, 4, 10, 11, 12)

[Zwi02] Uri Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. Journal of the ACM, 49(3):289–317, 2002. doi:10.1145/567112.567114. (cit. on p. 2, 3, 14, 15, 17, 18)

A Omitted Proofs in Section 2

Lemma 2.1. Let a, b, c, r be positive real numbers. Then r + ω(a, b, c) ≤ ω(a, b + r, c + r).

Proof. The proof is adapted from [Blä13, Lemma 7.7]; readers familiar with tensors and tensor rank may refer to the proof of that lemma.

Let g(n) = MM(n^a, n^{b+r}, n^{c+r}). In g(n) operations we can compute n^r matrix multiplication instances of size n^a × n^b × n^c: concatenate the n^r left factors horizontally into a single n^a × n^{b+r} matrix, and place the n^r right factors on the block diagonal of a single n^{b+r} × n^{c+r} matrix. We will prove by induction that for every integer k ≥ 1, n^r matrix multiplication instances of size n^{ka} × n^{kb} × n^{kc} can be computed in ⌈g(n)/n^r⌉^k · n^r operations.

The case k = 1 is trivial. When k > 1, we can compute n^r matrix multiplication instances of size n^{ka} × n^{kb} × n^{kc} as follows.
First, we partition every size-(n^{ka} × n^{kb}) matrix into size-(n^{(k−1)a} × n^{(k−1)b}) blocks, and partition every size-(n^{kb} × n^{kc}) matrix into size-(n^{(k−1)b} × n^{(k−1)c}) blocks. This reduces the problem to computing n^r matrix multiplication instances of size n^a × n^b × n^c using "big operations", where each "big operation" is a matrix multiplication instance of size n^{(k−1)a} × n^{(k−1)b} × n^{(k−1)c}. It suffices to perform g(n) "big operations". On the other hand, by the induction hypothesis, we can perform every batch of n^r "big operations" in ⌈g(n)/n^r⌉^{k−1} · n^r operations. By partitioning these g(n) "big operations" into ⌈g(n)/n^r⌉ batches of size n^r, we can compute all of them in ⌈g(n)/n^r⌉ · ⌈g(n)/n^r⌉^{k−1} · n^r operations, and we are done.

Now it is easy to see that

ω(a, b, c) ≤ inf_{n,k} { log_{n^k} ( ⌈g(n)/n^r⌉^k · n^r ) } ≤ inf_n { log_n ⌈g(n)/n^r⌉ } = ω(a, b + r, c + r) − r.

Lemma 2.2. Consider the function f(τ) = ω(1, 1 − τ, 1 − τ), where τ ∈ [0, 1]. Then τ + f(τ) is monotonically non-increasing in τ, and 2τ + f(τ) is monotonically non-decreasing in τ.

Proof. Let 0 ≤ τ_1 < τ_2 ≤ 1. Then:

• By Lemma 2.1, (τ_2 − τ_1) + ω(1, 1 − τ_2, 1 − τ_2) ≤ ω(1, 1 − τ_1, 1 − τ_1), which means τ_1 + f(τ_1) ≥ τ_2 + f(τ_2).

• For every integer n, we can compute the product of an n × n^{1−τ_1} matrix and an n^{1−τ_1} × n^{1−τ_1} matrix by using n^{2(τ_2−τ_1)} invocations of multiplication algorithms for matrices of dimension n × n^{1−τ_2} and n^{1−τ_2} × n^{1−τ_2}. Therefore ω(1, 1 − τ_1, 1 − τ_1) ≤ 2(τ_2 − τ_1) + ω(1, 1 − τ_2, 1 − τ_2), which means 2τ_1 + f(τ_1) ≤ 2τ_2 + f(τ_2).
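The first step of the proof of Lemma 2.1 packs n^r independent products of size n^a × n^b × n^c into one product of size n^a × n^{b+r} × n^{c+r}. This embedding can be sanity-checked concretely; the following Python sketch uses toy dimensions and helper names of our own choosing, purely for illustration:

```python
# Packing step from the proof of Lemma 2.1: k independent (p x q)·(q x s)
# products are recovered from a single (p x kq)·(kq x ks) product, by
# concatenating the left factors horizontally and placing the right
# factors on a block diagonal.

def matmul(A, B):
    """Naive product of list-of-lists matrices."""
    p, q, s = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(q)) for j in range(s)]
            for i in range(p)]

def pack(As, Bs):
    """Build A = [A_1 | ... | A_k] and B = diag(B_1, ..., B_k)."""
    k, p, q, s = len(As), len(As[0]), len(Bs[0]), len(Bs[0][0])
    A = [[As[t][i][j] for t in range(k) for j in range(q)] for i in range(p)]
    B = [[Bs[t][i][j] if t == u else 0 for u in range(k) for j in range(s)]
         for t in range(k) for i in range(q)]
    return A, B

# Two independent 2x2 products, packed into one 2x4 times 4x4 product:
As = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Bs = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
A, B = pack(As, Bs)
C = matmul(A, B)
# The t-th column block of C equals A_t · B_t.
assert all([row[2 * t:2 * t + 2] for row in C] == matmul(As[t], Bs[t])
           for t in range(2))
```

Since each small product appears as one column block of the big product, a single invocation of the large rectangular multiplication indeed yields all n^r small instances, which is exactly the "g(n) operations" claim in the proof.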
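The counting step in the second bullet of the proof of Lemma 2.2 is a two-dimensional tiling: when multiplying an n × n^{1−τ_1} matrix by an n^{1−τ_1} × n^{1−τ_1} matrix, only the two n^{1−τ_1}-dimensions are cut into n^{1−τ_2}-sized tiles (the n-dimension is untouched), so the number of kernel calls is the square of the ratio, n^{2(τ_2−τ_1)}. A toy sketch with illustrative names and concrete sizes:

```python
# Tiling step from the proof of Lemma 2.2: a (p x q)·(q x q) product reduces
# to (q // blk)**2 calls of a (p x blk)·(blk x blk) kernel, since only the
# two q-dimensions are tiled.

def matmul(A, B):
    """Naive product of list-of-lists matrices (stands in for the kernel)."""
    p, q, s = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(q)) for j in range(s)]
            for i in range(p)]

def tiled_matmul(A, B, blk):
    """Compute a (p x q)·(q x q) product using only (p x blk)·(blk x blk)
    kernel calls; returns the product and the number of kernel calls."""
    p, q = len(A), len(B)
    m = q // blk                      # tiles per q-dimension (assumes blk | q)
    C = [[0] * q for _ in range(p)]
    calls = 0
    for j in range(m):                # output column block
        for t in range(m):            # inner block index
            Ab = [row[t * blk:(t + 1) * blk] for row in A]
            Bb = [B[t * blk + i][j * blk:(j + 1) * blk] for i in range(blk)]
            P = matmul(Ab, Bb)        # one kernel call
            calls += 1
            for i in range(p):
                for jj in range(blk):
                    C[i][j * blk + jj] += P[i][jj]
    return C, calls

A = [[1, 2, 3, 4], [5, 6, 7, 8]]                              # 2 x 4
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 4, 0], [0, 3, 0, 4]]  # 4 x 4
C, calls = tiled_matmul(A, B, blk=2)
assert C == matmul(A, B)
assert calls == (4 // 2) ** 2        # square of the dimension ratio
```

Here q plays the role of n^{1−τ_1} and blk the role of n^{1−τ_2}; the call counter confirms the n^{2(τ_2−τ_1)} factor that yields the inequality ω(1, 1−τ_1, 1−τ_1) ≤ 2(τ_2 − τ_1) + ω(1, 1−τ_2, 1−τ_2).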