Deterministic Replacement Path Covering
Karthik C. S. ∗ Tel Aviv University
Merav Parter † Weizmann Institute of Science
Abstract
In this article, we provide a unified and simplified approach to derandomize central results in the area of fault-tolerant graph algorithms. Given a graph $G$, a vertex pair $(s,t) \in V(G) \times V(G)$, and a set of edge faults $F \subseteq E(G)$, a replacement path $P(s,t,F)$ is an $s$-$t$ shortest path in $G \setminus F$. For integer parameters $L, f$, a replacement path covering (RPC) is a collection of subgraphs of $G$, denoted by $\mathcal{G}_{L,f} = \{G_1, \ldots, G_r\}$, such that for every set $F$ of at most $f$ faults (i.e., $|F| \le f$) and every replacement path $P(s,t,F)$ of at most $L$ edges, there exists a subgraph $G_i \in \mathcal{G}_{L,f}$ that contains all the edges of $P$ and does not contain any of the edges of $F$. The covering value of the RPC $\mathcal{G}_{L,f}$ is then defined to be the number of subgraphs in $\mathcal{G}_{L,f}$.

In the randomized setting, it is easy to build an $(L,f)$-RPC with covering value of $O(\max\{L,f\}^{\min\{L,f\}} \cdot \min\{L,f\} \cdot \log n)$, but to this date, there is no efficient deterministic algorithm with matching bounds. As noted recently by Alon, Chechik, and Cohen (ICALP 2019), this poses the key barrier for derandomizing known constructions of distance sensitivity oracles and fault-tolerant spanners. We show the following:

• There exist efficient deterministic constructions of $(L,f)$-RPCs whose covering values almost match the randomized ones, for a wide range of parameters. Our time and value bounds improve considerably over the previous construction of Parter (DISC 2019). Our algorithms are based on the introduction of a novel notion of hash families that we call Hit and Miss hash families. We then show how to construct these hash families from (algebraic) error correcting codes such as Reed-Solomon codes and Algebraic-Geometric codes.

• For every $L$, $f$, and $n$, there exists an $n$-vertex graph $G$ whose $(L,f)$-RPC covering value is $\Omega(L^f)$. This lower bound is obtained by exploiting connections to the problem of designing sparse fault-tolerant BFS structures.

An application of our deterministic constructions is the derandomization of the algebraic construction of the distance sensitivity oracle by Weimann and Yuster (FOCS 2010). The preprocessing and query time of our deterministic algorithm nearly match the randomized bounds. This resolves the open problem of Alon, Chechik and Cohen (ICALP 2019).

Additionally, we show a derandomization of the randomized construction of vertex fault-tolerant spanners by Dinitz and Krauthgamer (PODC 2011) and Braunschvig et al. (Theor. Comput. Sci., 2015). The time complexity and the size bounds of the output spanners nearly match the randomized counterparts.

∗ This work was partially supported by the Israel Science Foundation (grant number 552/16) and the Len Blavatnik and the Blavatnik Family foundation.
† Partially supported by the Israel Science Foundation (grant number 2084/18).

arXiv [cs.DS]

Contents
(L, f)-Replacement Path Covering
1.2.2 Derandomization of Weimann-Yuster DSO
1.3 Gap between Det. and Randomized (L, f)-Replacement Path Covering
(L, f) Covering
2.2 Error Correcting Codes
(L, f)-Replacement Path Covering
5 Lower Bounds for (L, f)-Replacement Path Covering
6 Derandomization of the Algebraic DSO by Weimann and Yuster
A Comparison with [Par19a] and [BDR20]
B Missing Proofs
C Improved RPC given Input Sets

1 Introduction
Resilience of combinatorial graph structures to faults is a major requirement in the design of modern graph algorithms and data structures. The area of fault-tolerant (FT) graph algorithms is a rapidly growing subarea of network design in which resilience against faults is taken into consideration. The common challenge addressed in those algorithms is to gain immunity against all possible fault events without losing out on the efficiency of the computation. Specifically, for a given graph $G$ and some bound $f$ on the number of faults, the FT-algorithm is required, in principle, to address all $\binom{|E(G)|}{f}$ fault events, but (usually) using considerably less space and time. The traditional approach to mitigate these challenges is based on a combinatorial exploration of the structure of the graph under faults. While this approach has led to many exciting results in the area, it is limited in two aspects. First, in many cases the combinatorial characterization is considerably harder when moving from a single failure event to events with two or more failures. Second, this characterization is mostly problem specific and rarely generalizes to more than one class of problems.

One of the most notable techniques in this area which overcomes the aforementioned two limitations is the fault-tolerant sampling technique introduced by Weimann and Yuster [WY13]. This technique is inspired by the color-coding technique [AYZ95], and provides a general recipe for translating a given fault-free algorithm for a given task into a fault-tolerant one while paying a relatively small overhead in terms of computation time and other complexity measures of interest (e.g., space). Indeed, this approach has been applied in the context of distance sensitivity oracles [GW20, GW20, CC20b], fault-tolerant spanners [DK11, BCPS15, DR20a], fault-tolerant reachability preservers [CC20a], distributed minimum-cut computation [Par19a], and resilient distributed computation [PY19b, PY19a, CPT20, HP20]. The high-level idea of this technique is based on sampling a (relatively) small number of subgraphs $G_1, \ldots, G_\ell$ of the input graph $G$ by oversampling edges (or nodes) to act as faulty edges, in a way that a single sampled subgraph accounts for potentially many fault events. An additional benefit of this approach is that it smoothly extends to accommodate multiple edge and vertex faults.

Two central applications of the above approach that we focus on are distance sensitivity oracles and fault-tolerant spanners. An $f$-sensitivity distance oracle ($f$-DSO) is a data structure that reports shortest path distances when at most $f$ edges of the graph fail. Weimann and Yuster [WY13] employed the above technique to provide the first randomized construction of $f$-DSO for $n$-vertex directed graphs accommodating $f = O(\log n / \log\log n)$ faults. Their data structure has subcubic preprocessing time and subquadratic query time, and these bounds are still the state-of-the-art results for a wide range of parameters. Recently, van den Brand and Saranurak [vdBS19] presented a randomized Monte Carlo DSO that can handle $f \ge \log n$ updates. For small edge weights, their bounds improve over [WY13]. For the single failure case, Grandoni and Williams [GW20] also employed the sampling technique to provide an improved 1-DSO with subquadratic preprocessing time and sublinear query time. Very recently, Chechik and Cohen [CC20b] improved their construction and obtained subcubic preprocessing time with $\widetilde{O}(1)$ query time. Since the key randomized component in these DSO constructions is the sampling of the subgraphs $\{G_i\}_{i \in [\ell]}$, Alon, Chechik and Cohen [ACC19] posed the following question (stated specifically here for $f$-DSOs):

“It remains an open question if there exists a DSO with subcubic deterministic preprocessing algorithm and subquadratic deterministic query algorithm, matching their randomized equivalents”.

Given an $n$-vertex graph $G$, and integer parameters $f$ and $k$, an $f$-fault-tolerant $k$-spanner $H \subseteq G$ is a subgraph that contains a $k$-spanner in $G \setminus F$ for any set $F \subseteq V$ of at most $f$ vertices in $G$. The problem of designing sparse fault-tolerant spanners resilient to vertex faults was introduced by Chechik et al. [CLPR10]. Using a careful combinatorial construction, they showed that one can build such spanners while paying an additional overhead of $k^f$ in the size of the output spanner (when compared to the standard $k$-spanner). Dinitz and Krauthgamer [DK11] simplified and improved their construction. Using the sampling technique with the right setting of parameters, they provided a meta-algorithm for constructing fault-tolerant spanners where the time and size overheads are bounded by the factor O ( k − /f ). Their approach was later extended by Braunschvig et al. [BCPS15] to provide the first (and currently state-of-the-art) constructions of nearly-additive fault-tolerant spanners. Very recently, Chakraborty and Choudhary [CC20a] employed this technique to provide a randomized construction of strong-connectivity preservers of directed graphs under $f$ failures with $\widetilde{O}(f 2^f \cdot n^{2-1/f})$ edges. To this date, there are no known efficient deterministic constructions that match the size bounds of the above-mentioned randomized constructions.

In this work we provide a unified and simplified approach for derandomizing the above-mentioned central results. We introduce the notion of replacement path covering (RPC), which captures the key properties of the collection of sampled subgraphs obtained by the FT-sampling technique. Given a graph $G$, a vertex pair $(s,t) \in V(G) \times V(G)$, and a set of edge faults $F \subseteq E(G)$, a replacement path $P(s,t,F)$ is an $s$-$t$ shortest path in $G \setminus F$. To avoid repetitive descriptions, we mostly consider in this paper the setting of edge faults. However, all our definitions of RPC and their constructions naturally extend to vertex faults.
Definition 1 (Replacement Path Covering (RPC)). A subgraph $G' \subseteq G$ covers a replacement path $P(s,t,F)$ if $P(s,t,F) \subseteq G'$ and $F \cap E(G') = \emptyset$. A collection of subgraphs of $G$, say $\mathcal{G}_{L,f}$, is an $(L,f)$-RPC if for every $s,t \in V$ and every $F \subseteq E$ such that $|F| \le f$, we have that each $P(s,t,F)$ replacement path with at most $L$ edges is covered by some subgraph $G'$ in $\mathcal{G}_{L,f}$. The covering value (CV) of an $(L,f)$-RPC $\mathcal{G}_{L,f}$ is the number of subgraphs in $\mathcal{G}_{L,f}$, i.e., $CV(\mathcal{G}_{L,f}) := |\mathcal{G}_{L,f}|$.

In case there are multiple $s$-$t$ shortest paths in $G \setminus F$ with at most $L$ edges, it is sufficient to cover one of them.

In some algorithmic applications of $(L,f)$-RPC, we have that $L \le f$ and in other applications we have $L > f$. However, for simplicity of the discussion of this paragraph, we assume that $L > f$. The FT-sampling technique provides an efficient randomized procedure for computing an $(L,f)$-RPC of covering value $r = c \cdot f L^f \log n$ for some constant $c$ (e.g., Lemma 2 in [GW20]): sample $r$ subgraphs $G_1, \ldots, G_r$, where each $G_i \subseteq G$ is formed by sampling each edge $e \in E(G)$ into $G_i$ independently with probability $p = 1 - 1/L$. It is easy to show that a subgraph $G_i$ covers a fixed $P(s,t,F)$ with probability $\Omega(1/L^f)$: the at most $L$ path edges are all kept with probability at least $(1-1/L)^L$, and the at most $f$ faults are all omitted with probability at least $(1/L)^f$. Thus, by taking $c$ to be large enough, using the Chernoff bound, and employing the union bound over all $n^{O(f)}$ distinct $P(s,t,F)$ paths, one gets that this graph collection is an $(L,f)$-RPC, with high probability (see Lemma 7 for a formal proof). The computation time of this randomized procedure is $O(r \cdot m)$ (where $m := |E(G)|$). Alon, Chechik and Cohen [ACC19] noted that in many settings, the deterministic computation of $(L,f)$-RPC poses the main barrier for derandomization, and raised the following question:

“What is the minimum $r$ such that we can deterministically compute such graphs $G_1, \ldots, G_r$ in $\widetilde{O}(n^2 r)$ time such that for every $P(s,t,F)$ on at most $L$ nodes there is a subgraph $G_i$ that does not contain $F$ but contains $P(s,t,F)$?”

[ACC19] also mentioned that it is not clear how to efficiently derandomize a degenerated version of the above construction and proposed some relaxation of these requirements, for which we indeed obtain improved bounds in this paper.

Independently of the work of [ACC19], Parter [Par19a] recently provided a deterministic construction of $(L,f)$-RPC for the purposes of providing an efficient distributed computation of small cuts in a graph. These RPCs are obtained by introducing the notion of $(n,k)$ universal hash functions. For the purpose of small cuts computation, $L$ was taken to be the diameter of the graph, and $f$ was considered to be constant. The goal in [Par19a] was to provide an $(L,f)$-RPC of value $\mathrm{poly}(L)$. Their construction in fact yields a value of L f +1 . This value is already too large for several applications such as the DSO by [WY13]. Indeed, for our centralized applications, it is desirable to improve both the computation time as well as the covering value of these $(L,f)$-RPC constructions, and to match (to the extent possible) the bounds of their randomized counterparts.
We take a principled approach for efficiently computing almost optimal $(L,f)$-RPC for a wide range of parameters of interest. Our algorithms extend the approach of [Par19a] and are based on the introduction of a novel notion of hash families that we call Hit and Miss (HM) hash families. We show how any Boolean alphabet HM hash family can be used to build an RPC, and in turn give near optimal constructions of HM hash families based on (algebraic) error correcting codes such as Reed-Solomon codes and Algebraic-Geometric codes. Our key result is as follows:

Theorem 2 (($L,f$)-RPC). Given a graph $G$ on $m$ edges, length parameter $L$, and fault parameter $f$, there is a deterministic algorithm $\mathcal{A}$ for computing an $(L,f)$-RPC of $G$ denoted by $\mathcal{G}_{L,f}$ such that

$$CV(\mathcal{G}_{L,f}) \le \begin{cases} (\alpha c L f)^{b+1}, & \text{if } a \ge m^{1/c} \text{ for some constant } c \in \mathbb{N}, \\ (\alpha L f)^{b+2} \cdot \log m, & \text{if } a = m^{o(1)} \text{ and } b = \Omega(\log m), \\ (\alpha L f)^{b+2} \cdot \log m, & \text{if } a \le \log m, \\ (\alpha L f \log m)^{b+1}, & \text{otherwise,} \end{cases}$$

where $a = \max\{L,f\}$, $b = \min\{L,f\}$, and $\alpha \in \mathbb{N}$ is some small universal constant. Moreover, the running time of $\mathcal{A}$, denoted by $T(\mathcal{A})$, is

$$T(\mathcal{A}) = \begin{cases} m^{o(1)} \cdot CV(\mathcal{G}_{L,f}), & \text{if } a = m^{o(1)} \text{ and } b = \Omega(\log m), \\ m \cdot (\log m)^{O(1)} \cdot CV(\mathcal{G}_{L,f}), & \text{otherwise.} \end{cases}$$

This resolves the open problem of Alon, Chechik and Cohen [ACC19] and considerably improves over the bounds of the second author [Par19a] in the entire range of parameters. We further improve on the parameters of Theorem 2 (see Theorem 48) when instead of accounting for all fault [...]

(In [Par19a], the term $(L,f)$-RPC is not used, and instead the deterministic algorithm is referred to as a derandomization of the FT-sampling technique.)

RPCs are designed to handle faults in graphs, and error correcting codes are constructed to handle errors in messages. Both do this by adding redundancy to the underlying information in some way: the encoding of a message adds many new coordinates to the message without adding any new information, and similarly an RPC of a graph is a redundant way to represent a graph, as we only store subgraphs of the same original graph. In this work, we formalize this meta-connection to an extent through the ideas involved in proving Theorem 2.
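As a rough illustration of how the first case of Theorem 2 compares with the randomized bound, consider a constant $f$ and $L = n^{\epsilon}$. The derivation below is only a sketch under stated assumptions: it takes $m \le n^2$ and $L \ge f$, it picks the constant $c \ge 2/\epsilon$ so that the first case of the theorem applies, and it uses the randomized value $r = c' \cdot f L^f \log n$ quoted from the FT-sampling discussion, with $c'$ a hypothetical name for that constant. The universal constant $\alpha$ is left unspecified, as in the theorem.

```latex
\begin{align*}
a &= \max\{L,f\} = L = n^{\epsilon} \;\ge\; n^{2/c} \;\ge\; m^{1/c},
  \qquad b = \min\{L,f\} = f, \\
CV(\mathcal{G}_{L,f}) &\le (\alpha c L f)^{b+1}
  = (\alpha c f)^{f+1} \cdot L^{f+1}
  = O\!\left(L^{f+1}\right), \\
\text{randomized: } r &= c' \cdot f \cdot L^{f} \cdot \log n
  = O\!\left(L^{f} \log n\right).
\end{align*}
```

Under these assumptions, the deterministic covering value exceeds the randomized one by a factor of roughly $L / \log n$.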
Lower Bound for $(L,f)$-RPCs. We also prove lower bounds on the covering value of RPC, which to the best of our knowledge had not been addressed before. That is, despite the ubiquity of the FT-sampling approach to build $(L,f)$-RPCs, it is still unclear whether the bound that it provides on the covering value is the best possible. This question is interesting even if the items to be covered correspond to arbitrary subsets of edges. The question becomes even more acute in our setting where the covered items are structured, i.e., correspond to shortest paths in some underlying subgraphs. The optimality of the randomized procedure in this context is even more questionable, as it is totally invariant to the structure of the graph. In principle, one might hope to improve these bounds by taking the graph structure into account.

Perhaps surprisingly, we show that the covering values obtained by the randomized FT-sampling procedure are nearly optimal, at least for the setting where $L \ge f$. Since our deterministic bounds almost match the randomized ones, we obtain almost-optimality for our bounds.

Theorem 3 (Lower Bound for the Covering Value of $(L,f)$-RPC). For every integer parameters $n$, $L$, and $f$ such that $(L/f)^{f+1} \le n$, there exists an $n$-vertex weighted graph $G^* = (V,E,w)$, such that any $(L,f)$-RPC of $G^*$ has CV of $\Omega((L/f)^f)$.

Interestingly, the lower bound graph is obtained by employing slight modifications to the lower bound graphs used by [Par15] in the context of fault-tolerant BFS (FT-BFS) structures. For a given (possibly weighted) graph $G = (V,E)$ and a source vertex $s \in V$, a subgraph $H \subseteq G$ is an $f$-fault-tolerant (FT)-BFS if $\mathrm{dist}(s,t,H \setminus F) = \mathrm{dist}(s,t,G \setminus F)$ for every vertex $t \in V$ and every sequence $F$ of at most $f$ edge faults. The definition can be naturally extended to vertex faults as well. The second author and Peleg [PP16] presented a lower-bound construction for $f = 1$ with $\Omega(n^{3/2})$ edges. The second author extended this lower bound construction to any $f \ge 1$ with $\Omega(n^{2-1/(f+1)})$ edges [Par15]. We show that a slight modification to the (unweighted) lower-bound graph of [Par15], by means of introducing weights, naturally implies a lower bound for the covering value of an $(L,f)$-RPC.

Derandomization of the Algebraic DSO by Weimann-Yuster. Our key application of the construction of efficient $(L,f)$-RPC is for implementing the algebraic DSO of [WY13]. [ACC19] presented a derandomization of the combinatorial $f$-DSO of [WY13], resulting in a preprocessing time of (cid:101) O ( n − α ) and a query time of (cid:101) O ( n − α/f ), matching the randomized bounds of [WY13]. In this paper we focus on derandomizing the algebraic algorithm of [WY13], as the latter can be implemented in subcubic preprocessing time and subquadratic query time. We show:

Theorem 4. Let $G = (V,E)$ be a directed $n$-vertex $m$-edge graph with real edge weights in $[-M, M]$. There exists a deterministic algorithm that, given $G$ and parameters $f = O(\log n / \log\log n)$ and $0 < \alpha < 1$, constructs an $f$-sensitivity distance oracle in time
1. O ( M n . /f − α · (c'f)^{f+1} ) if $\alpha = 1/c$ for some constant $c$,
2. O ( M n . /f − α · (c'f log n)^{f+1} ) if $\alpha = o(1)$,
for some constant $c'$. Given a query $(s,t,F)$ with $s,t \in V$ and $F \subseteq E \cup V$ being a set of at most $f$ edges or vertices, the deterministic query algorithm computes in $O(n^{2-(1-\alpha)/f})$ time the distance from $s$ to $t$ in the graph $G \setminus F$.

Observe that for constant number of at least f ≥ O ( n − − α ) /f ). This is because our algorithm also integrates ideas and optimizations from [ACC19] and [CC20b]. This resolves the open problem of [ACC19] concerning the existence of a deterministic DSO with subcubic preprocessing time and subquadratic query time (at least with small edge weights).

While the deterministic $(L,f)$-RPC of Theorem 2 constitutes the key tool for the derandomization, the final algorithm requires additional effort. Specifically, we use the notion of FT-trees introduced in [ACC19] for the purpose of the deterministic combinatorial DSO. We provide an improved algebraic construction of these trees using the $(L,f)$-RPCs. One obstacle that we need to handle is that the approach of [ACC19] assumed that shortest paths are unique by providing an algorithm that breaks the ties in a consistent manner. In our setting, the computation time of this algorithm is too heavy and thus we avoid this assumption by making more delicate arguments.
Derandomization of Fault-Tolerant Spanner Constructions. Finally, we show that the integration of the $(L,f)$-RPC of Theorem 2 into the existing algorithms for (vertex) fault-tolerant spanners provides the first deterministic constructions of these structures. The running time and the size bounds of the spanners nearly match those obtained by the randomized counterparts. Specifically, for $f$-fault tolerant multiplicative spanners, we provide a nearly-optimal derandomization of Dinitz and Krauthgamer's construction [DK11]. This follows directly by using our vertex variant of $(L,f)$-RPC of Theorem 2 with $L = 2$. A subgraph $H \subseteq G$ is an $f$-fault tolerant $t$-spanner if $\mathrm{dist}(u,v,H \setminus F) \le t \cdot \mathrm{dist}(u,v,G \setminus F)$ for every $u,v \in V$ and every $F \subseteq V$ with $|F| \le f$. We show:

Theorem 5 (Derandomization of Theorem 2.1 of [DK11], Informal). If there is a deterministic algorithm $\mathcal{A}$ that on every $n$-vertex $m$-edge graph builds a $t$-spanner of size $s(n)$ and time $\tau(n,m,t)$, then there is an algorithm that on any such graph builds an $f$-fault tolerant $t$-spanner of size $\widetilde{O}(f^3 \cdot s(2n/f))$ and time $\widetilde{O}(f^3 (\tau(2n/f, m, t) + m))$.

The above derandomization matches the randomized construction of [DK11] up to a multiplicative factor of $\log n$ in the size and time bounds. In the same manner, we also apply derandomization for the nearly-additive fault-tolerant spanners of Braunschvig et al. [BCPS15]. This provides the first deterministic constructions of nearly additive spanners.

Comparison with a recent independent work of [BDR20]. Independently of our work, [BDR20] presented a new slack version of the greedy algorithm from [BDPW18, DR20b] to obtain (vertex) fault-tolerant spanners with optimal size bounds. Their main algorithm is randomized, and the emphasis there is on optimizing the size of the output spanner. To derandomize their construction, [BDR20] used the notion of universal hash functions to compute deterministically an $(L = 2, f)$-RPC of covering value (cid:101) O ( f ) for $f \le n^{o(1)}$ and a value of (cid:101) O ( f ) for $f \ge n^c$ for some constant $c$. Using our $(L = 2, f)$-RPC of Theorem 2 yields a covering value of $\widetilde{O}(f^3)$ for every value $f$. Up to a logarithmic factor, our bounds match the value of the randomized construction. The quality of the spanner construction of [BDR20] depends, however, not only on the value of the covering, but also on additional useful properties. These properties are also addressed in our paper for the sake of the applications of derandomizing the works of [DK11, WY13]. In particular, we show that our $(L = 2, f)$-RPC with $\widetilde{O}(f^3)$ subgraphs also satisfies the desired properties in the same manner as provided by the randomized construction. Consequently, by using our $(L = 2, f)$-RPCs in the algorithm of [BDR20], we can close the gap of Theorem 1.2 of [BDR20] and get a deterministic construction which matches the randomized time bounds (of Theorem 1.1 in [BDR20]) for any value of $f$. In Appendix A, we provide a further detailed comparison to the related constructions of [Par19a] and [BDR20]. In addition, we provide a proof sketch for improving Thm. 1.2 of [BDR20] (see Lemma 47).

In this section, we detail some of the key techniques introduced in this paper.

(L, f)-Replacement Path Covering

While the introduction of the notion of RPC is our key conceptual contribution, we elaborate in this subsection on our framework to construct deterministic RPC, which we also believe will be of independent interest.
Hit and Miss Hash Families. We introduce a new notion of hash families called Hit and Miss (HM) Hash Families. Informally, given integer parameters $N$, $a$, $b$, and $q$, a family $\mathcal{H}$ of hash functions from $[N]$ to $[q]$ is said to be an HM hash family if for every pair of mutually disjoint subsets of $[N]$, say $(A,B)$, of sizes at most $a$ and $b$ respectively, there exists a hash function $h \in \mathcal{H}$ such that no pair $(x,y) \in A \times B$ collides under $h$ (see Definition 13 for a formal statement). We show that every error correcting code with relative distance greater than $1 - \frac{1}{ab}$ can be seen as an HM hash family. This insight yields a systematic way to construct HM hash families.

Connection to Replacement Path Covering. We then consider HM hash families over the Boolean alphabet and associate the domain of the hash family with the edges (or vertices) of the graph for which we would like to design an RPC. We observe that every hash function of the Boolean HM hash family immediately gives a subgraph in the RPC, where we view the function as a Boolean vector of length equal to the number of edges in the graph, and thus the hash function acts as an indicator vector of whether to pick the edge or not in the subgraph. Moreover, the property of an RPC always avoiding faults but containing the replacement path in at least one of the subgraphs (see Definition 1) exactly coincides with the definition of a Boolean HM hash family, and thus a Boolean HM hash family yields an RPC.

Overview. We now provide a short summary of our deterministic construction of $(L,f)$-RPC (assuming $L \ge f$) for a graph $G$ with $m$ edges. We start from an error correcting code $C$ over an alphabet of size $q$, with block length $\ell$, message length $\log_q m$, and relative distance greater than $1 - \frac{1}{Lf}$. Next, we interpret $C$ as an HM hash family from $[m]$ to $[q]$ with $\ell$ hash functions. Then we apply the alphabet reduction lemma to obtain an HM hash family from $[m]$ to $\{0,1\}$ with $\ell \cdot q^f$ many hash functions. Finally, using the connection between Boolean HM hash families and Replacement Path Covering, we construct an $(L,f)$-RPC $\mathcal{G}_{L,f}$ with covering value $2 \cdot q^f \cdot \ell$ in time $CV(\mathcal{G}_{L,f}) \cdot \widetilde{O}(m)$. In other words, the alphabet size and the block length of the starting code $C$ directly determine the covering number of our RPC. Depending on the relationship between $L$ and $f$, we use either just a Reed-Solomon code or a concatenation of an Algebraic-Geometric code (as outer code) with a Reed-Solomon code (as inner code) to obtain the parameters given in Theorem 2.
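The pipeline in the overview (code, then HM hash family, then alphabet reduction, then subgraphs) can be made concrete on small inputs. The Python sketch below is an illustration under several assumptions and is not the paper's algorithm: it uses a Reed-Solomon code over a prime field only, identifies edges with the indices 0..m-1, and realizes the alphabet reduction in one simple way, by guessing the set S of at most f hash values taken by the faults and keeping every edge whose hash value lies outside S. All function names are hypothetical.

```python
from itertools import combinations


def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))


def rs_parameters(m, L, f):
    """Smallest prime p, with k = ceil(log_p m), such that (k - 1) / p < 1 / (L * f)."""
    p = 2
    while True:
        if is_prime(p):
            k = 1
            while p ** k < m:
                k += 1
            if p > (k - 1) * L * f:
                return p, k
        p += 1


def rs_hash(x, i, p, k):
    """Coordinate i of the codeword of x: the base-p digits of x are the
    coefficients of a polynomial of degree < k, evaluated at i over F_p."""
    value, power = 0, 1
    for _ in range(k):
        x, digit = divmod(x, p)
        value = (value + digit * power) % p
        power = (power * i) % p
    return value


def deterministic_rpc(m, L, f):
    """Edge-index sets of the subgraphs (one per coordinate i and guessed set S)."""
    p, k = rs_parameters(m, L, f)
    subgraphs = []
    for i in range(p):  # one hash function per codeword coordinate
        h = [rs_hash(x, i, p, k) for x in range(m)]
        for size in range(f + 1):  # guess the hash values of the faults
            for S in combinations(range(p), size):
                subgraphs.append(frozenset(x for x in range(m) if h[x] not in S))
    return subgraphs


def covers_all_disjoint_pairs(subgraphs, m, L, f):
    """Brute-force check: every A (|A| <= L) and disjoint B (|B| <= f) is separated."""
    for a in range(L + 1):
        for A in combinations(range(m), a):
            rest = [x for x in range(m) if x not in A]
            for b in range(f + 1):
                for B in combinations(rest, b):
                    if not any(set(A) <= g and g.isdisjoint(B) for g in subgraphs):
                        return False
    return True
```

The sketch covers every disjoint pair of edge sets, not only actual replacement paths, because two distinct codewords agree on at most k - 1 coordinates, so the at most L·f (path edge, fault) pairs cannot spoil all p coordinates. It makes no attempt to match the covering values of Theorem 2.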
1.2.2 Derandomization of Weimann-Yuster DSO

Our key contribution is in utilizing the $(L,f)$-RPC to compute fault-tolerant trees with improved time bounds compared to that of [ACC19]. Fault-tolerant trees were introduced by [CCFK17, ACC19] and specifically, in [ACC19] they served as the basis for implementing the combinatorial DSO of [WY13]. For a given vertex pair $s,t$, and integer parameters $L, f$, the FT-tree $FT_{L,f}(s,t)$ consists of $O(L^f)$ nodes, where each node is labeled by a pair $\langle P, F \rangle$ where $P$ is an $s$-$t$ path in $G \setminus F$ with at most $L$ edges, and $F$ is a sequence of at most $f$ faults which $P$ avoids. Let $d^L(s,t,G')$ denote the weight of the shortest $s$-$t$ paths in $G'$ among all $s$-$t$ paths with at most $L$ edges. The key application of FT-trees is that given a query $(s,t,F)$ and the FT-tree $FT_{L,f}(s,t)$, one can compute $d^L(s,t,G \setminus F)$ in time $O(f \log n)$. [ACC19] provided an efficient combinatorial construction of all the FT-trees in time $\widetilde{O}(m \cdot n \cdot L^{f+1})$, thus super-cubic time for dense graphs.

By using our $(L,f)$-RPC family $\mathcal{G}_{L,f}$, we provide an improved (algebraic) construction of these trees in sub-cubic time for graphs with small integer weights. The construction of these trees boils down to a simple computational task which we can efficiently solve using the $(L,f)$-RPC. The task is as follows: given a triplet $(s,t,F)$, compute $d^L(s,t,G \setminus F)$. To build the trees, it is required to solve this task for $O(n^2 \cdot L^f)$ triplets. Our algorithm starts by applying a variant of the All-Pairs-Shortest-Path (APSP) algorithm in each of the subgraphs $G' \in \mathcal{G}_{L,f}$. This variant, denoted $\mathrm{APSP}^{\le L}$ [CC20b], restricts attention to computing only the shortest paths that contain at most $L$ edges, which can be done in time $\widetilde{O}(M L n^{\omega})$ using matrix multiplications. Then, to compute $d^L(s,t,G \setminus F)$ for a given triplet $(s,t,F)$, we show that it is sufficient to consider a small collection of subgraphs $\mathcal{G}_F \subseteq \mathcal{G}_{L,f}$ where $|\mathcal{G}_F| = O(f L \log n)$, and to return the minimum $d^L(s,t,G'')$ over every $G'' \in \mathcal{G}_F$. Since the $d^L(s,t,G')$ distances are precomputed by the $\mathrm{APSP}^{\le L}$ algorithm, each $d^L(s,t,G \setminus F)$ can be computed in $\widetilde{O}(L)$ time.

1.3 Gap between Det. and Randomized (L, f)-Replacement Path Covering

For the sake of discussion, assume that $f = O(1)$ and $L = n^{\epsilon}$ for some constant $\epsilon$. Our current deterministic constructions provide $(L,f)$-RPC with covering value $\widetilde{O}(L^{f+1})$, whereas the randomized constructions obtain a value of $\widetilde{O}(L^f)$. This gap is rooted in the following distinction between the randomized and deterministic constructions. For the purposes of the randomized construction, the $(L,f)$-RPC should cover $n^{O(f)}$ replacement paths. The reason is that there are $n^{O(f)}$ possible fault events, and for each set $F$ of faults, the subgraph $G \setminus F$ contains $n^2$ shortest paths (i.e., replacement paths avoiding $F$). In particular, if there are multiple $s$-$t$ shortest paths in $G \setminus F$, it is sufficient for the RPC to cover one of them. Since a single sampled subgraph $G_i$ covers a given path $P(s,t,F)$ with probability of $c/L^f$, by taking $r = O(f L^f \log n)$ subgraphs, we get that $P(s,t,F)$ is covered by at least one of the subgraphs with probability of $1 - 1/n^{c \cdot f}$. Applying the union bound over all $n^{O(f)}$ replacement paths establishes the correctness of the construction. In contrast, our deterministic construction provides a covering for any $P(s,t,F)$ paths, and also for any arbitrary collection of $L$ edges $A$ and $f$ edges $B$ with $A \cap B = \emptyset$. That is, since our construction does not exploit the structure of the paths, it provides a covering for $n^{\Omega(L)}$ paths. Note that if the randomized construction were required to cover $n^{\Omega(L)}$ paths rather than $n^{O(f)}$, we would have ended up with $O(L^{f+1})$ subgraphs in that covering as well. In other words, the current gap in the bounds can be explained by the number of replacement paths that the $(L,f)$-RPCs are required to cover. Since in the deterministic constructions it is a priori unknown which set of replacement paths is required to be covered, they cover all $n^{\Omega(L)}$ possible paths.

Importantly, in Appendix C, we consider a relaxed variant of the $(L,f)$-RPC problem, introduced by [ACC19], for which we are able to provide nearly matching bounds to the randomized construction. Specifically, in that setting, we are given as input a collection of $k$ pairs $\{(P,F)\}$ where $P$ is a path with at most $L$ edges and $F$ is a set of at most $f$ faults which $P$ avoids. We then provide an efficient deterministic construction of a restricted $(L,f)$-RPC family $\mathcal{G}$ of value $\widetilde{O}(\log k \cdot L^f)$, i.e., of the same value as obtained by the randomized construction. The graph collection $\mathcal{G}$ then satisfies that for every pair $(P,F)$ in the input set, there is a subgraph $G' \in \mathcal{G}$ such that $P \subseteq G'$ and $G' \cap F = \emptyset$. This further demonstrates that the only reason for the gap between our deterministic and randomized bounds is rooted in the gap in the number of replacement paths that those constructions are required to cover.

Notations. Throughout this paper, $G$ denotes a (possibly weighted) graph, $V(G)$ denotes the vertex set of a graph $G$, and $E(G)$ denotes the edge set of a graph $G$. In case the graph is weighted, the weights are integers in $[-M, M]$. For $u,v \in V$ and a subgraph $G'$, let $\mathrm{dist}(u,v,G')$ denote the shortest $u$-$v$ path distance in $G'$. For an $x$-$y$ path $P$ and a $y$-$w$ path $P'$, let $P \circ P'$ denote the concatenation of the two paths. Also, for any $n \in \mathbb{N}$ and $j \in \mathbb{N}$, we denote by $\binom{[n]}{j}$ the collection of all subsets of $[n]$ of size exactly $j$, by $\binom{[n]}{\le j}$ the collection of all subsets of size at most $j$, and by $\binom{n}{\le j}$ the sum $\sum_{i \in [j]} \binom{n}{i}$.

(L, f) Covering

For a weighted graph $G = (V,E,w)$ and a path $P \subseteq G$, let $|P|$ be the number of edges in $P$ and let $\omega(P) = \sum_{e \in P} w(e)$ be the weighted sum of the edges in $P$. Let $SP_G(s,t,F)$ be the collection of all $s$-$t$ shortest paths in $G \setminus F$. Every path $P_G(s,t,F) \in SP_G(s,t,F)$ is called a replacement path. For a given integer $L$, let $SP^L_G(s,t,F)$ be the collection of all the shortest $s$-$t$ paths in $G \setminus F$ that contain at most $L$ edges. A path in $SP^L_G(s,t,F)$ is referred to as $P^L_G(s,t,F)$. Let $d^L(s,t,G \setminus F) = \omega(P^L_G(s,t,F))$. If $SP^L_G(s,t,F) = \emptyset$, i.e., there is no path from $s$ to $t$ in $G \setminus F$ containing at most $L$ edges, then define $P^L_G(s,t,F) = \emptyset$ and $d^L(s,t,G \setminus F) = \infty$. For $F = \emptyset$, we abbreviate $P^L_G(s,t,\emptyset) = P^L_G(s,t)$ as the shortest $s$-$t$ path with at most $L$ edges, and $d^L(s,t,G) = \omega(P^L_G(s,t))$ is the length of the path. When the graph $G$ is clear from the context, we may omit it and write $P(s,t,F)$ and $P^L(s,t,F)$.

The following lemma is obtained via the doubling method of [YZ05], recently used in [CC20b]. (The algorithm provided in [YZ05] is randomized, and it is described how to derandomize it with essentially no loss in efficiency in Sec. 8 of [YZ05].)

Lemma 6 (Lemma 5 of [CC20b]). For every $n$-vertex subgraph $G' \subseteq G$, there is an algorithm that computes $\{d^L(s,t,G'), P^L(s,t,G')\}_{s,t \in V}$ in time $\widetilde{O}(L M n^{\omega})$.

The next lemma summarizes the quality of the randomized $(L,f)$-RPC procedures as obtained in [WY13] and [DK11]. The proof is deferred to Appendix B.
Lemma 7 (Randomized (
L, f )- RPC ) . For every n -vertex graph G = ( V, E ) and integer parameters L, f ≤ n , one can compute a collection G = { G , . . . , G r } of r subgraph such that w.h.p. G is an ( L, f ) - RPC , where r = O ( f · max { L, f } min { L,f } · log n ) . The computation time is O ( r · | E | ) . In this subsection, we recall the definition of error correcting codes and some standard code con-structions known in literature. We define below a notion of distance used in coding theory (called
Hamming distance) and then define error correcting codes with its various parameters.
Definition 8 (Distance). Let $\Sigma$ be a finite set and $\ell \in \mathbb{N}$. The distance between $x, y \in \Sigma^{\ell}$ is defined to be:
$$\Delta(x, y) = \frac{1}{\ell} \cdot |\{ i \in [\ell] \mid x_i \neq y_i \}|.$$

Definition 9 (Error Correcting Code). Let $\Sigma$ be a finite set. For every $\ell \in \mathbb{N}$, a subset $C \subseteq \Sigma^{\ell}$ is said to be an error correcting code with block length $\ell$, message length $k$, and relative distance $\delta$ if $|C| \ge |\Sigma|^k$ and for every two distinct $x, y \in C$, $\Delta(x, y) \ge \delta$. We then write $\Delta(C) = \delta$. Moreover, we say that $C$ is a $[k, \ell, \delta]_q$ code to mean that $C$ is a code defined over an alphabet of size $q$ and is of message length $k$, block length $\ell$, and relative distance $\delta$. Finally, we refer to the elements of a code $C$ as codewords.

For the results in this article, we require codes with certain extremal properties. First, we recall Reed-Solomon codes, whose codewords are simply the evaluations of univariate polynomials over a finite field.

Theorem 10 (Reed-Solomon Codes [RS60]). For every prime power $q$, and every $k \le q$, there exists a $\left[k, q, 1 - \frac{k-1}{q}\right]_q$ code.

These codes achieve the best possible tradeoff between the rate of the code (i.e., the ratio of message length to block length) and the relative distance of the code in the large alphabet regime, as they meet the Singleton bound [Sin64]. However, if we desire codes with alphabet size much smaller than the block length, then Algebraic-Geometric codes [Gop70, TVZ82] are the best known construction of codes achieving a good tradeoff between rate and relative distance (but they do not meet the Singleton bound). We specify below a specific construction of such codes.

Theorem 11 (Algebraic-Geometric Codes [GS96]). Let $p$ be a prime square greater than or equal to 49, and let $q := p^c$ for any $c \in \mathbb{N}$. Then for every $k \in \mathbb{N}$, there exists a $\left[k, k \cdot \sqrt{q}, 1 - \frac{3}{\sqrt{q}}\right]_q$ code.

Finally, we recall here a well-known fact about code concatenation (for example, see Chapter 10.1 of [GRS19]).

Fact 12. Let $k, \ell_1, \ell_2, c, q \in \mathbb{N}$ and let $\delta_1, \delta_2 \in [0, 1]$. Suppose we are given a $[k, \ell_1, \delta_1]_{q^c}$ outer code $C_1$ and a $[c, \ell_2, \delta_2]_q$ inner code $C_2$. Then the concatenation of the two codes $C_1 \circ C_2$ is a $[k, \ell_1 \cdot \ell_2, \delta_1 \cdot \delta_2]_q$ code.

Hit and Miss Hash Families
In this section, we show the construction of a certain class of hash families which will subsequently be used to design a deterministic algorithm for computing an $(L, f)$-RPC with a small CV. Below we define the notion of Hit and Miss hash families.

Definition 13 (Hit and Miss Hash Family). For every $N, a, b, \ell, q \in \mathbb{N}$ such that $b \le a$, we say that $\mathcal{H} := \{h_i : [N] \to [q] \mid i \in [\ell]\}$ is a $[N, a, b, \ell]_q$-Hit and Miss (HM) hash family if for every pair of mutually disjoint subsets $A, B$ of $[N]$, where $|A| \le a$ and $|B| \le b$, there exists some $i \in [\ell]$ such that:
$$\forall (x, y) \in A \times B, \quad h_i(x) \neq h_i(y). \qquad (1)$$
In the cases when $N, a, b$ are clear from the context, we simply refer to $\mathcal{H}$ as a $[\ell]_q$-HM hash family. Moreover, the computation time of a $[\ell]_q$-HM hash family is defined to be the time needed to output the $\ell \times N$ matrix with entries in $[q]$ whose $(i, x)$-th entry is simply $h_i(x)$ (for $h_i \in \mathcal{H}$).

We begin our discussion by noting that there exist a naive $[1]_N$-HM hash family and a naive $\left[\binom{N}{\le b}\right]_2$-HM hash family. Our goal is to construct a $[\ell]_2$-HM hash family with the smallest possible value for $\ell$, as this is important for the applications in later sections. Towards this goal we prove the theorem below.

Theorem 14 (Small Boolean Hit and Miss Hash Family). Given integers $N, a, b$ such that $b \le a$, there is a deterministic algorithm $\mathcal{A}$ for computing an $[N, a, b, \ell]_2$-HM hash family where:
$$\ell \le \begin{cases} (\alpha c a b)^{b+1}, & \text{if } a \ge N^{1/c} \text{ for some constant } c \in \mathbb{N}, \\ (\alpha a b)^{b+2} \cdot \log N, & \text{if } a = N^{o(1)} \text{ and } b = \Omega(\log N), \\ (\alpha a b)^{b+2} \cdot \log N, & \text{if } a \le \log N, \\ (\alpha a b \log N)^{b+1}, & \text{otherwise,} \end{cases}$$
for some small universal constant $\alpha \in \mathbb{N}$. Moreover, the running time of $\mathcal{A}$, denoted by $T(\mathcal{A})$, is
$$T(\mathcal{A}) = \begin{cases} N^{1+o(1)} \cdot \ell, & \text{if } a = N^{o(1)} \text{ and } b = \Omega(\log N), \\ N \cdot (\log N)^{O(1)} \cdot \ell, & \text{otherwise.} \end{cases}$$

Note that the above theorem significantly improves on the naive $\left[\binom{N}{\le b}\right]_2$-HM hash family whenever $ab \ll N$. Before we formally prove the above theorem, let us briefly outline our proof strategy. Our approach is to start from the naive $[1]_N$-HM hash family and first construct a $[\ell]_q$-HM hash family (for some $q, \ell \in \mathbb{N}$) where we try to minimize the quantity $\binom{q}{\le b} \cdot \ell$. The reason for minimizing $q^b \cdot \ell$ is that we show below how to start from a $[\ell]_q$-HM hash family and trade off the size of the range of the hash function for the size of the hash family, in order to obtain an $\left[\ell \cdot \binom{q}{\le b}\right]_2$-HM hash family.

(Footnote: The reasoning behind the name Hit and Miss Hash Family is as follows. Fix $A$ and $B$. There exists a hash function $h$ in the family and a subset $S$ of $[q]$ of size at most $b$ such that $S$ completely hits $h(B)$ and completely misses $h(A)$. All other interpretations of the name "Hit and Miss" Hash Family are for the entertainment of the reader.)

Lemma 15 (Alphabet Reduction). Given integers $N, a, b, q, \ell$ such that $b \le a$, and a $[N, a, b, \ell]_q$-HM hash family $\mathcal{H}$, there exists a $\left[N, a, b, \ell \cdot \binom{q}{\le b}\right]_2$-HM hash family $\mathcal{H}'$ which can be computed in time $O(q^b \cdot T_{\mathcal{H}})$, where $T_{\mathcal{H}}$ is the time needed to compute $\mathcal{H}$.

Proof. Given $\mathcal{H} := \{h_i : [N] \to [q] \mid i \in [\ell]\}$, we define $\mathcal{H}' := \{h'_{i,S} : [N] \to \{0, 1\} \mid i \in [\ell], S \subseteq [q], |S| \le b\}$ as follows:
$$\forall (i, S) \in [\ell] \times \binom{[q]}{\le b},\ \forall x \in [N], \quad h'_{i,S}(x) = \begin{cases} 0 & \text{if } h_i(x) \in S, \\ 1 & \text{otherwise.} \end{cases}$$
There are $\ell \cdot \binom{q}{\le b}$ many hash functions in $\mathcal{H}'$, and therefore in order to show that $\mathcal{H}'$ is a $\left[N, a, b, \ell \cdot \binom{q}{\le b}\right]_2$-HM hash family, it suffices to show that (1) holds. To see this, fix any disjoint sets $A, B \subseteq [N]$ such that $|A| \le a$ and $|B| \le b$. Since $\mathcal{H}$ is a $[N, a, b, \ell]_q$-HM hash family, there exists some $i^* \in [\ell]$ such that
$$\forall (x, y) \in A \times B, \quad h_{i^*}(x) \neq h_{i^*}(y). \qquad (2)$$
Consider the subset $S^* := \{h_{i^*}(y) \mid y \in B\}$. Clearly $|S^*| \le |B| \le b$. Therefore, for every $y \in B$, $h'_{i^*,S^*}(y) = 0$. On the other hand, from (2), we have that for all $x \in A$, $h_{i^*}(x) \notin S^*$. Therefore, for every $x \in A$, $h'_{i^*,S^*}(x) = 1$. Thus we have established (1). The computation time of $\mathcal{H}'$ follows from noting that $\binom{q}{\le b} \le (1 + q)^b$.

As a simple demonstration of how we will use the above lemma, notice that if we combine it with the naive $[1]_N$-HM hash family, then we obtain the $\left[\binom{N}{\le b}\right]_2$-HM hash family. Following the proof strategy mentioned before the statement of Lemma 15, we focus now on constructing non-trivial $[\ell]_q$-HM hash families, with the goal of minimizing the quantity $\binom{q}{\le b} \cdot \ell$.
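The alphabet reduction of Lemma 15 can be illustrated with a small brute-force sketch. This is only a toy under stated assumptions, not the paper's implementation: a hash family is represented as a list of rows (row $i$ lists $h_i(x)$ for $x = 0, \ldots, N-1$, with 0-based labels), and the function names `alphabet_reduction` and `is_hit_and_miss` are hypothetical. The checker enumerates all disjoint pairs $(A, B)$, so it is only usable for tiny $N$.

```python
from itertools import combinations


def alphabet_reduction(family, q, b):
    """Sketch of Lemma 15: for each row h_i and each nonempty S in range(q)
    with |S| <= b, emit the Boolean row h'_{i,S}(x) = 0 if h_i(x) in S else 1."""
    boolean_rows = []
    for row in family:
        for size in range(1, b + 1):
            for subset in combinations(range(q), size):
                chosen = set(subset)
                boolean_rows.append([0 if value in chosen else 1 for value in row])
    return boolean_rows


def is_hit_and_miss(family, a, b):
    """Brute-force check of condition (1) over all nonempty disjoint A, B
    with |A| <= a and |B| <= b. Exponential time; for illustration only."""
    n = len(family[0])
    for size_b in range(1, b + 1):
        for B in combinations(range(n), size_b):
            rest = [x for x in range(n) if x not in B]
            for size_a in range(1, a + 1):
                for A in combinations(rest, size_a):
                    separated = any(
                        all(row[x] != row[y] for x in A for y in B)
                        for row in family
                    )
                    if not separated:
                        return False
    return True


# The naive [1]_N family for N = 5 is the single identity function h(x) = x.
naive = [list(range(5))]
boolean_family = alphabet_reduction(naive, q=5, b=2)
print(len(boolean_family), is_hit_and_miss(boolean_family, a=3, b=2))
```

In this toy run the reduction of the identity family yields one Boolean function per nonempty subset $S$ of size at most $b$, mirroring the naive $\left[\binom{N}{\le b}\right]_2$-HM hash family mentioned above.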
As a warm up, we show below a simple construction that achieves very good parameters.

Lemma 16. Given integers $N, a, b$ such that $b \le a$, there exists a $[N, a, b, ab \log N]_{O(ab (\log N)^2)}$-HM hash family.

Proof. The family $\mathcal{H}$ we consider consists of all functions $h_p(x) = x \pmod{p}$ for the first $1 + ab \log N$ prime numbers $p$. Note that the $(1 + ab \log N)$-th prime number is at most $1 + 2ab \log N (1 + \log a + \log b + \log \log N) = O(ab (\log N)^2)$. Thus, in order to show that $\mathcal{H}$ is a $[N, a, b, ab \log N]_{O(ab (\log N)^2)}$-HM hash family, we just need to show (1). Fix two disjoint sets $A, B \subseteq [N]$ such that $|A| \le a$ and $|B| \le b$. Consider the following quantity:
$$\alpha_{A,B} := \prod_{x \in A,\, y \in B} |y - x|.$$
Note that since $|y - x| \in [0, N]$ for every $(x, y) \in A \times B$, we have that $\alpha_{A,B} \le N^{ab}$. It is known that the product of the first $m$ primes (called the primorial function) is at least $e^{m(1 + o(1))}$. Let $\alpha' \in [1, \alpha_{A,B}]$ be the number with the largest number of distinct prime factors. It is clear then that the number of prime factors of $\alpha'$ is the largest $m$ for which we have $e^{m(1 + o(1))} \le \alpha' \le N^{ab}$. This implies $m \le ab \log N$. Thus, $\alpha_{A,B}$ has at most $ab \log N$ distinct prime factors. Therefore, given any set of $1 + ab \log N$ prime numbers, there must exist a prime that does not divide $\alpha_{A,B}$. On the other hand, note that for $(x, y) \in A \times B$ and a prime $p$, we have that $x \pmod{p} = y \pmod{p}$ implies that $p$ divides $\alpha_{A,B}$. Thus, there must exist a prime $p$ among the first $1 + ab \log N$ prime numbers for which we have $x \pmod{p} \neq y \pmod{p}$ for all $(x, y) \in A \times B$.

We remark that the above proof strategy of using (modulo) prime numbers has been used many times in the literature, for example [AN96]. Next, we show a systematic way to construct a HM hash family from error correcting codes and then use specific codes to improve on the parameters of the above lemma.

Proposition 17. Let $N, a, b, \ell \in \mathbb{N}$ and $\delta \in [0, 1]$ such that $\delta > 1 - \frac{1}{ab}$. Then, every $[\log_q N, \ell, \delta]_q$ code can be seen as a $[N, a, b, \ell]_q$-HM hash family.

Proof. Given a $[\log_q N, \ell, \delta]_q$ code $C$, where for every $x \in [N]$, $C(x)$ denotes the $x$-th codeword (under some canonical labeling of the codewords of $C$), we define the hash family $\mathcal{H} := \{h_i : [N] \to [q] \mid i \in [\ell]\}$ as follows:
$$\forall i \in [\ell],\ \forall x \in [N], \quad h_i(x) = C(x)_i,$$
where $C(x)_i$ denotes the $i$-th coordinate of $C(x)$ (i.e., the $i$-th coordinate of the $x$-th codeword). To see that $\mathcal{H}$ is a $[N, a, b, \ell]_q$-HM hash family, we need to show (1). Fix disjoint $A, B \subseteq [N]$ where $|A| \le a$ and $|B| \le b$. For every $(x, y) \in A \times B$ we have:
$$\Pr_{i \sim [\ell]}[h_i(x) \neq h_i(y)] = \Delta(C(x), C(y)) \ge \delta. \qquad (3)$$
By a simple union bound we have that
$$\Pr_{i \sim [\ell]}[\forall (x, y) \in A \times B,\ h_i(x) \neq h_i(y)] \ge 1 - ab \cdot (1 - \delta). \qquad (4)$$
Finally, (1) follows by noting that $\delta > 1 - \frac{1}{ab}$, which makes the right-hand side of (4) strictly positive.

By a direct application of the parameters of Reed-Solomon codes (Theorem 10) to the above proposition we obtain the following.

Corollary 18 (Reed-Solomon Hash Family). Given integers
$N, a, b$ such that $b \le a$, there exists a $\left[N, a, b, O\left(\frac{ab \log N}{\log a}\right)\right]_{O\left(\frac{ab \log N}{\log a}\right)}$-HM hash family. Moreover, the computation time of the HM hash family is $O(abN (\log N)^2)$.

Proof. Let $q$ be the smallest prime greater than $\frac{ab \log N}{\log a}$ (note that $q \in \left(\frac{ab \log N}{\log a}, \frac{2ab \log N}{\log a}\right)$). Let $C$ be the $\left[\log_q N, q, 1 - \frac{\log_q N}{q}\right]_q$ code guaranteed from Theorem 10. From Proposition 17 we can think of $C$ as a $[N, a, b, q]_q$-HM hash family since
$$\Delta(C) = 1 - \frac{\log N}{q \log q} > 1 - \frac{\log N / \log a}{ab \log N / \log a} = 1 - \frac{1}{ab}.$$
By noting that $q < \frac{2ab \log N}{\log a}$, we may say that $C$ is a $\left[N, a, b, O\left(\frac{ab \log N}{\log a}\right)\right]_{O\left(\frac{ab \log N}{\log a}\right)}$-HM hash family.

It is known that the generator matrix of the Reed-Solomon codes mentioned in Theorem 10 can be constructed in time near linear in the size of the generator matrix [RS60]. Once we are given the generator matrix of $C$, outputting any codeword can be done in $O(q \log \log N)$ time using the Fast Fourier Transform. Therefore the computation of the corresponding HM hash family can be done in time $O(qN \log \log N) = O(abN \log N \log \log N)$.

In fact, we obtain a $\left[\frac{ab \log N}{\log a + \log b + \log \log N}\right]_{\frac{ab \log N}{\log a + \log b + \log \log N}}$-HM hash family from Reed-Solomon codes but chose to write a less cumbersome version in the corollary statement. Note that the sizes of the hash families of Lemma 16 and the above corollary are the same when $a \ll N^{o(1)}$, but even in that case we save a $\log N$ factor in the alphabet size of the hash function.

In order to explore further savings in the alphabet size of the hash function, we apply the parameters of Algebraic-Geometric codes (Theorem 11) to Proposition 17 and obtain the following.

Corollary 19 (Algebraic-Geometric Hash Family). Given integers
$N, a, b$ such that $b \le a$, there exists a $[O(ab \log N)]_{O(a^2 b^2)}$-HM hash family. Moreover, the computation time of the HM hash family is $O((ab \log N)^3 + N ab \log N)$.

Proof. Let $p$ be the smallest prime greater than $3ab$ (note that $p \in (3ab, 6ab)$) and let $q = p^2$. Let $C$ be the $\left[\log_q N, \sqrt{q} \cdot \log_q N, 1 - \frac{3}{\sqrt{q}}\right]_q$ code guaranteed from Theorem 11. From Proposition 17 we can think of $C$ as a $[N, a, b, \sqrt{q} \cdot \log_q N]_q$-HM hash family since
$$\Delta(C) = 1 - \frac{3}{p} > 1 - \frac{1}{ab}.$$
By noting that $q \le 36 a^2 b^2$, we may say that $C$ is a $\left[N, a, b, O\left(\frac{ab \log N}{\log a}\right)\right]_{O(a^2 b^2)}$-HM hash family.

It is known that the generator matrix of the Algebraic-Geometric codes mentioned in Theorem 11 can be constructed in time near cubic in the block length of the code [SAK+01]. Therefore the computation of the corresponding HM hash family can be done in time $O((ab \log N)^3 + N ab \log N)$.

However, these parameters are worse than the parameters of Corollary 18 whenever $ab \gg \log N$. We construct below a specific code concatenation of Reed-Solomon codes and Algebraic-Geometric codes that does indeed improve on the parameters of Corollary 18 for the setting when $a, b$ are not too small.

Lemma 20.
Let $p$ be a prime square greater than or equal to 49, and let $q := p^c$ for any $c \in \mathbb{N}$. Then for every $k \in \mathbb{N}$, there exists a $\left[k, k \cdot q, 1 - \frac{4}{\sqrt{q}}\right]_{\sqrt{q}}$ code.

Proof. We concatenate the $\left[k, k \cdot \sqrt{q}, 1 - \frac{3}{\sqrt{q}}\right]_q$ code from Theorem 11 (treated as the outer code) with the $\left[2, \sqrt{q}, 1 - \frac{1}{\sqrt{q}}\right]_{\sqrt{q}}$ code from Theorem 10 (treated as the inner code). From Fact 12, this gives us the desired code, since $\left(1 - \frac{3}{\sqrt{q}}\right)\left(1 - \frac{1}{\sqrt{q}}\right) \ge 1 - \frac{4}{\sqrt{q}}$.

It is worth noting that while concatenation codes obtained by combining Reed-Solomon codes and Algebraic-Geometric codes have appeared many times in the literature, to the best of our knowledge, this is the first time that Algebraic-Geometric codes are the outer code and Reed-Solomon codes are the inner code (as Algebraic-Geometric codes are typically used for their small alphabet size).

An immediate corollary of Proposition 17 and Lemma 20 is the following.

Corollary 21 (Concatenated Hash Family). Given integers
$N, a, b$ such that $b \le a$, there exists a $\left[N, a, b, O\left(\frac{a^2 b^2 \log N}{\log a}\right)\right]_{O(ab)}$-HM hash family. Moreover, the computation time of the HM hash family is $O(N \cdot (ab \log N)^3)$.

Proof. Let $p$ be the smallest prime greater than $4ab$ (note that $p \in (4ab, 8ab)$). Let $q := p^2$ and $C$ be the $\left[\log_q N, q \cdot \log_q N, 1 - \frac{4}{p}\right]_p$ code guaranteed from Lemma 20. From Proposition 17 we can think of $C$ as a $[N, a, b, q \cdot \log_q N]_p$-HM hash family since
$$\Delta(C) = 1 - \frac{4}{p} > 1 - \frac{1}{ab}.$$
By noting that $p \le 8ab$, we may say that $C$ is a $\left[N, a, b, O\left(\frac{a^2 b^2 \log N}{\log a}\right)\right]_{O(ab)}$-HM hash family.

It is known that the generator matrix of the codes mentioned in Theorem 11 (resp. Theorem 10) can be constructed in cubic time in the block length of the code [SAK+01] (resp. linear time in the block length of the code [RS60], as the message length is 2). Therefore the computation of the corresponding HM hash family can be done in time $O(N \cdot (ab \log N)^3)$.

We finally wrap up by noting below that the proof of Theorem 14 follows from combining Lemma 15 with Corollaries 18 and 21.

Proof of Theorem 14.
Suppose $a \ge N^{1/c}$ for some constant $c \in \mathbb{N}$. Then consider the $\left[O\left(\frac{ab \log N}{\log a}\right)\right]_{O\left(\frac{ab \log N}{\log a}\right)}$-HM hash family from Corollary 18 and note that $\frac{\log N}{\log a} \le c$. Let the alphabet size of this HM hash family be $\beta cab$, for some universal constant $\beta$. Then, we invoke Lemma 15 on this $[O(cab)]_{\beta cab}$-HM hash family to obtain the desired Boolean HM hash family. The computation time of the final HM hash family is $O((\beta cab)^b \cdot abN (\log N)^2) = O(N \cdot (\log N)^2 \cdot (\beta cab)^{b+1})$.

Suppose $a = N^{o(1)}$ and $b = \Omega(\log N)$ (or suppose $a \le \log N$). Then consider the $\left[O\left(\frac{a^2 b^2 \log N}{\log a}\right)\right]_{O(ab)}$-HM hash family from Corollary 21 and ignore the $\log a$ term in the denominator in the expression for the size of the hash family. Let the alphabet size of this HM hash family be $\beta' ab$, for some universal constant $\beta'$. Then, we invoke Lemma 15 on this $[O(a^2 b^2 \log N)]_{\beta' ab}$-HM hash family to obtain the desired Boolean HM hash family. The computation time of the final HM hash family is $O((\beta' ab)^b \cdot N (ab \log N)^3) = O(N \cdot \log N \cdot (\beta' ab)^{b+2} \cdot (ab \cdot (\log N)^2))$. Notice that if $a \le \log N$ then the expression $ab \cdot (\log N)^2$ is $O(\log^4 N)$. Otherwise, if $a = N^{o(1)}$, then the expression $ab \cdot (\log N)^2$ is still $N^{o(1)}$.

In every other case, consider the $\left[O\left(\frac{ab \log N}{\log a}\right)\right]_{O\left(\frac{ab \log N}{\log a}\right)}$-HM hash family from Corollary 18 and ignore the $\log a$ term in the denominator in the expressions for both the size of the hash family and the alphabet size. Let the alphabet size of this HM hash family be $\beta'' ab \log N$, for some universal constant $\beta''$. Then, we invoke Lemma 15 on this $[O(ab \log N)]_{\beta'' ab \log N}$-HM hash family to obtain the desired Boolean HM hash family. The computation time of the final HM hash family is $O((\beta'' ab \log N)^b \cdot N ab \cdot (\log N)^2) = O(N \cdot \log N \cdot (\beta'' ab \log N)^{b+1})$.

In order to facilitate the applications in the next section we introduce the notation $HM(C)$ to denote the following: given a code $C$, we first interpret it as a HM hash family in accordance with Proposition 17 and then apply Lemma 15 to this hash family to obtain a Boolean HM hash family, denoted by $HM(C)$.

Optimality of Reed-Solomon based HM hash family. We digress for a short discussion on the optimality of the parameters of the HM hash family constructed from Reed-Solomon codes. There are two reasons why one might suspect that the parameters of Corollary 18 can be improved. The first is the union bound applied in (4). The second is the bounding of the number of disagreements between two codewords by the relative distance in (3). It seems intuitively unreasonable that there exist two subsets of codewords, say $A$ and $B$, such that for every pair of codewords in $A \times B$ there is a unique set of coordinates on which they agree. Additionally, the expected fraction of disagreements between any two Reed-Solomon codewords is $1 - 1/q$, and bounding it instead by the relative distance, particularly when we are taking a union bound later in (4), raises the concern that the analysis has slack. Therefore we ask:

Open Question 1.
Let $a, b, d \in \mathbb{N}$. What is the smallest prime $q$ such that the following holds? For every two disjoint subsets of degree $d$ polynomials over $\mathbb{F}_q$, denoted by $A$ and $B$, where $|A| = a$ and $|B| = b$, there exists some $\alpha \in \mathbb{F}_q$ such that no pair of polynomials in $A \times B$ evaluate to the same value at $\alpha$.

Clearly, from Proposition 17, we have that if $q$ is at least $dab + 1$ then it suffices. But can we get away with a smaller value of $q$?

Perfect Hash Families. We conclude the discussion on HM hash families by noting the connection between HM hash families and the notion of perfect hash families, which has received considerable attention in the literature (for example see [FK84, FKS84, SS90, AAB+92, Nil94, AYZ95, NSS95, AN96, FN01, AG10]). If we replace (1) in Definition 13 with
$$\forall (x, y) \in S \times S,\ x \neq y, \text{ we have } h_i(x) \neq h_i(y), \qquad (5)$$
where $S \subseteq [N]$, then it coincides with the notion of perfect hash families. In other words, a HM hash family can be seen as a bichromatic variant of perfect hash families. Indeed, a connection between error correcting codes and perfect hash families (much like Proposition 17) was already known in the literature [Alo86]. We also remark that a construction of perfect hash families based on AG codes was also known in the literature [WX01], but to the best of our knowledge, the construction of hash families based on the concatenated AG codes (with the specific parameters of Lemma 20) is a novel contribution of this paper.

Additionally, one may see the randomized construction of RPC in Lemma 7 as coloring each edge with a random color in $[L]$ if $L \ge f$ (resp. in $[f]$ if $f \ge L$) and then randomly choosing one of the colors in $[L]$ (resp. $[f]$) and deleting (resp. retaining) all the edges corresponding to that color. The randomized procedure stated in the above way is very closely related to the celebrated color coding technique [AYZ95], and a well-known way to derandomize the color coding technique is via perfect hash functions.

In order to have certain applications, we introduce the following strengthening of Definition 13.
Definition 22 (Strong Hit and Miss Hash Family). For every $N, a, b, \ell, q \in \mathbb{N}$ such that $b \le a$, we say that $\mathcal{H} := \{h_i : [N] \to [q] \mid i \in [\ell]\}$ is a $[N, a, b, \ell]_q$-Strong Hit and Miss (SHM) hash family if for every pair of mutually disjoint subsets $A, B$ of $[N]$, where $|A| \le a$ and $|B| \le b$, we have:
$$\Pr_{i \sim [\ell]}[\forall (x, y) \in A \times B,\ h_i(x) \neq h_i(y)] \ge \frac{1}{2}. \qquad (6)$$
In the cases when $N, a, b$ are clear from the context, we simply refer to $\mathcal{H}$ as a $[\ell]_q$-Strong HM hash family.

Similar to Corollaries 18 and 21, we can prove the following bounds for Strong HM hash families.

Lemma 23. Given integers $N, a, b$ such that $b \le a$, there exists:
• Reed-Solomon Strong HM hash family: a $\left[N, a, b, O\left(\frac{ab \log N}{\log a}\right)\right]_{O\left(\frac{ab \log N}{\log a}\right)}$-Strong HM hash family whose computation time is $O(abN (\log N)^2)$.
• Algebraic-Geometric Strong HM hash family: a $\left[N, a, b, O\left(\frac{a^2 b^2 \log N}{\log a}\right)\right]_{O(ab)}$-Strong HM hash family whose computation time is $O(N \cdot (ab \log N)^3)$.

Proof Sketch. The proof follows by noting the following. First, Proposition 17 can be strengthened to say that if $\delta \ge 1 - \frac{1}{2ab}$ then every $[\log_q N, \ell, \delta]_q$ code can be seen as a $[N, a, b, \ell]_q$-Strong HM hash family. Second, Corollaries 18, 19, and 21 can be modified to yield Strong HM hash families (instead of just HM hash families), by simply choosing the alphabet size of the underlying code in the proofs to be at least twice (for Reed-Solomon codes) or four times (for AG codes concatenated with Reed-Solomon codes) as large as what is currently written.

In order to facilitate the applications in the next section we introduce the notation $SHM(C)$ to denote the following: given a code $C$, we first interpret it as a Strong HM hash family and then apply Lemma 15 to this hash family to obtain a Boolean Strong HM hash family, denoted by $SHM(C)$.

$(L, f)$-Replacement Path Covering

Equipped with the construction of Boolean Hit and Miss hash families from the previous section, we show in this section how to use them in order to efficiently construct RPCs.

Proposition 24.
Given a graph $G$ on $m$ edges and integer parameters $L, f$, and a $[m, \max\{L, f\}, \min\{L, f\}, \ell]_2$-HM hash family $\mathcal{H}$, we can construct an $(L, f)$-RPC of $G$, denoted by $\mathcal{G}^{\mathcal{H}}_{L,f}$, such that $CV(\mathcal{G}^{\mathcal{H}}_{L,f}) = 2 \cdot \ell$. Moreover, the construction of $\mathcal{G}^{\mathcal{H}}_{L,f}$ can be done in time $O(m\ell + T_{\mathcal{H}})$, where $T_{\mathcal{H}}$ is the computation time of $\mathcal{H}$.

Proof. Label the edges of $G$ using $[m]$. For every $(i, \rho) \in [\ell] \times \{0, 1\}$, we construct a subgraph $G_{i,\rho}$ of $G$ as follows: for every $x \in [m]$, the edge with label $x$ in $G$ is retained in $G_{i,\rho}$ if and only if $h_i(x) = \rho$. Then $\mathcal{G}^{\mathcal{H}}_{L,f}$ is simply $\{G_{i,\rho} \mid i \in [\ell], \rho \in \{0, 1\}\}$.

To see that $\mathcal{G}^{\mathcal{H}}_{L,f}$ is an $(L, f)$-RPC, fix any vertex pair $(s, t) \in V(G) \times V(G)$ and fix any fault set $F := \{e_{r_1}, \ldots, e_{r_d}\} \subseteq E(G)$ where $|F| = d \le f$. Let $P(s, t, F)$ be a replacement path with at most $L$ edges, i.e., $P(s, t, F) = \{e_{j_1}, \ldots, e_{j_t}\} \subseteq E(G)$, where $t \le L$. Consider the following two subsets of $[m]$: $A = \{j_1, \ldots, j_t\}$ and $B = \{r_1, \ldots, r_d\}$. Note that since $P(s, t, F)$ is a replacement path, $A$ and $B$ are disjoint subsets of $[m]$. From (1) (which is symmetric in the two sets, so it applies whether $L \ge f$ or $L \le f$) we have that there exists some $i^* \in [\ell]$ such that for all $(x, y) \in A \times B$ we have $h_{i^*}(x) \neq h_{i^*}(y)$. Therefore, if $h_{i^*}(j_1) = 0$ (resp. if $h_{i^*}(j_1) = 1$), then in the graph $G_{i^*,0}$ (resp. $G_{i^*,1}$) all edges of $P(s, t, F)$ are present and all edges of $F$ are absent.

In order to justify the computation time of $\mathcal{G}^{\mathcal{H}}_{L,f}$, we first compute the $\ell \times m$ Boolean matrix $M_{\mathcal{H}}$ corresponding to $\mathcal{H}$, where the $(i, x)$-th entry of $M_{\mathcal{H}}$ is simply $h_i(x)$. After the computation of $M_{\mathcal{H}}$ we simply go over each row of the matrix to build the subgraphs.

Proof of Theorem 2.
The proof follows immediately by putting together Theorem 14 with Proposition 24 and noting that for every $[N, a, b, \ell]_2$-HM hash family $\mathcal{H}$ used in Theorem 14, we have $T_{\mathcal{H}} > N \cdot \ell$.

Remark 25. For all the applications in this paper, we never use the construction of $(L, f)$-RPC given in Theorem 2 when $a = m^{o(1)}$ and $b = \Omega(\log m)$ (mainly because it has a prohibitive run time), and the result for that regime is merely of interest for bounding the covering number.

Useful properties of $(L, f)$-RPC when $L \ge f$. A crucial property of the $(L, f)$-RPC that is needed for applications in later sections is that for every fixed set of faults $F$ there will be only a very small set of subgraphs in the covering set $\mathcal{G}_{L,f}$ that avoid $F$. As we see below, the construction of $(L, f)$-RPC of Theorem 2 gives this additional property for free.
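Before turning to these additional properties, the basic construction in the proof of Proposition 24 can be sketched in a few lines. This is a toy illustration under stated assumptions rather than the paper's implementation: edges are labeled by their position in a list (0-based), a Boolean HM hash family is a list of 0/1 rows, and the names `build_rpc` and `find_cover` are hypothetical. The example family consists of the four indicator rows obtained from the identity function via Lemma 15 with $b = 1$, which suffices for $L = 3$ and $f = 1$ on a 4-cycle.

```python
def build_rpc(edges, boolean_family):
    """For every Boolean row h_i and every rho in {0, 1}, build the subgraph
    G_{i,rho} that keeps exactly the edges x with h_i(x) == rho."""
    subgraphs = []
    for row in boolean_family:
        for rho in (0, 1):
            kept = {edges[x] for x in range(len(edges)) if row[x] == rho}
            subgraphs.append(kept)
    return subgraphs


def find_cover(subgraphs, path_edges, fault_edges):
    """Return the index of a subgraph containing every path edge and no
    fault edge, or None if no such subgraph exists."""
    for index, kept in enumerate(subgraphs):
        if set(path_edges) <= kept and kept.isdisjoint(fault_edges):
            return index
    return None


# A 4-cycle; row s maps edge label s to 0 and every other label to 1.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
family = [[0 if x == s else 1 for x in range(4)] for s in range(4)]
rpc = build_rpc(cycle, family)

# If edge (0, 1) fails, the replacement path from 0 to 1 uses the other three edges.
detour = [(3, 0), (2, 3), (1, 2)]
print(len(rpc), find_cover(rpc, detour, [(0, 1)]))
```

The covering value of the sketch is $2 \cdot \ell$, matching the statement of Proposition 24, because each Boolean row contributes one subgraph per value of $\rho$.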
Theorem 26. Let $L \ge f$ and $f = o(\log m)$. Then one can compute an $(L, f)$-RPC $\mathcal{G}_{L,f}$ with the same CV and time bounds as in Theorem 2 that in addition satisfies the following property. Let $F$ be a set of $d \le f$ edge failures. Then, there exists a collection $\mathcal{G}_F$ of at most $fL \cdot \mathrm{polylog}(m)$ subgraphs in $\mathcal{G}_{L,f}$ that satisfy the following:
• Every subgraph in $\mathcal{G}_F$ does not contain any of the edges in $F$.
• For every vertex pair $(s, t)$ and every $P(s, t, F)$ path of length at most $L$, there exists a subgraph $G' \in \mathcal{G}_F$ that contains $P(s, t, F)$.
Finally, given $F$ and $\mathcal{G}_{L,f}$, one can detect the subgraphs in $\mathcal{G}_F$ in time $f d L \cdot \mathrm{polylog}(m)$.

The proof of the above theorem follows from the more general statement below about code based constructions of HM hash families, by applying to it the parameters of specific codes.

Lemma 27. Given a graph $G$ on $m$ edges and integer parameters $L, f, q, \ell$, and a $[\log_q m, \ell, \delta]_q$ code $C$ with relative distance $\delta > 1 - \frac{1}{Lf}$, the $(L, f)$-RPC $\mathcal{G}_{L,f}$ given by Proposition 24 on providing $HM(C)$ has the following property. Let $F$ be a set of $d \le f$ edge failures. Then, there exists a collection $\mathcal{G}_F$ of at most $\ell$ subgraphs in $\mathcal{G}_{L,f}$ that satisfy the following:
• Every subgraph in $\mathcal{G}_F$ does not contain any of the edges in $F$.
• For every vertex pair $(s, t)$ and every $P(s, t, F)$ path of length at most $L$, there exists a subgraph $G' \in \mathcal{G}_F$ that contains $P(s, t, F)$.
Moreover, given $F$ and $\mathcal{G}_{L,f}$, one can detect the subgraphs in $\mathcal{G}_F$ in time $O(d \cdot (\ell + en(C)))$, where $en(C)$ is the time needed to encode a message using $C$.

Proof. For every $i \in [\ell]$ let $S_i \subseteq [q]$ be defined as:
$$S_i := \{C(r_j)_i \mid j \in [d]\},$$
where $F = \{e_{r_1}, \ldots, e_{r_d}\}$, and $C(r_j)_i$ is the $i$-th coordinate of the $r_j$-th codeword of $C$. For every $i \in [\ell]$ we include in $\mathcal{G}_F$ the subgraph $G_i$ in which the edges of $G$ that are removed are exactly the ones mapped to an element of $S_i$ under the $i$-th coordinate of $C$. It is clear that $|\mathcal{G}_F|$ is by definition at most $\ell$. Moreover, the computation time of the indices of the graphs in $\mathcal{G}_F$ is $O(d \cdot (\ell + en(C)))$, as once we encode the $d$ edges of $F$ using $C$, we can specify the indices of the subgraphs in $\mathcal{G}_F$ explicitly as defined above.

To note that $\mathcal{G}_F$ is a subset of $\mathcal{G}_{L,f}$, notice that for every $i \in [\ell]$ and every $S_i$ as defined above, we have in $HM(C)$ a hash function $h : [m] \to \{0, 1\}$ which maps to 0 exactly those edges (labels of edges) whose corresponding codeword on the $i$-th coordinate is contained in $S_i$ (see the proof of Lemma 15 to verify this). Then, when $HM(C)$ is provided to Proposition 24, the graph $G_{i,1}$ in $\mathcal{G}^{HM(C)}_{L,f}$ in the proof of Proposition 24 is precisely the graph $G_i$ in $\mathcal{G}_F$.

All that is left to show are the structural properties of $\mathcal{G}_F$. By definition of $G_i$, it is clear that all the edges in $F$ are removed in each $G_i$. Furthermore, for every vertex pair $(s, t) \in V(G) \times V(G)$ and every replacement path $P(s, t, F) = \{e_{j_1}, \ldots, e_{j_t}\}$ with at most $L$ edges, we have from (1) that there is some $i^* \in [\ell]$ such that for all $\kappa \in [t]$, we have $C(j_\kappa)_{i^*} \notin S_{i^*}$ (i.e., we apply Proposition 17 on $C$ to obtain a HM hash family and use (1) with $A = \{j_1, \ldots, j_t\}$ and $B = \{r_1, \ldots, r_d\}$). Therefore all the edges of $P(s, t, F)$ are retained in $G_{i^*}$.

Proof of Theorem 26.
Since we have $L \ge f$ and $f = o(\log m)$, the bounds in Theorem 2 follow here as well by setting $a = L$ and $b = f$, while avoiding the case when $b = \Omega(\log m)$. In order to see that the additional property holds, we only need to verify that for the Reed-Solomon code $C_{RS}$ and the concatenated code $C_{AG \circ RS}$ (from Lemma 20), when we plug $HM(C_{RS})$ and $HM(C_{AG \circ RS})$ respectively into Lemma 27, the parameters are as claimed in the theorem statement.

The block length $\ell$ of $C_{RS}$ is set to be at most $\frac{Lf \log m}{\log L}$ in Corollary 18. If $L \ge m^{1/c}$ then $|\mathcal{G}_F| \le \ell = O(cLf)$, and otherwise we have $|\mathcal{G}_F| \le \ell = O(Lf \log m)$.

The block length $\ell$ of $C_{AG \circ RS}$ is set to be at most $\frac{L^2 f^2 \log m}{\log L}$ in Corollary 21. Since we apply this bound to the case where $L \le \log m$, we have $|\mathcal{G}_F| \le \ell = O(L^2 f^2 \log m) = O(Lf \log^3 m)$.

Plugging the above bounds on the block lengths of the two codes into Lemma 27 gives the bounds of the additional property in the theorem statement. Note that the encoding time of $C_{RS}$ is $\ell \cdot \mathrm{polylog}(m) = Lf \cdot \mathrm{polylog}(m)$, and while the encoding time of a codeword of $C_{AG \circ RS}$ is polynomial in $\ell$, since $f \le L \le \log m$, we have that the encoding time of $C_{AG \circ RS}$ is also $\mathrm{polylog}(m)$.

Useful properties of $(L, f)$-RPCs when $L \le f$. Parts of the next theorem may be morally seen as the analog of Theorem 26, only that for the setting of $L \le f$, we bound the number of subgraphs that fully contain a given path segment with at most $L$ edges.

Theorem 28.
Let $L \le f$. Then one can compute an $(L, f)$-RPC $\mathcal{G}_{L,f}$ with the same CV and time bounds as in Theorem 2 that in addition satisfies the following property. Let $P$ be a replacement path segment of at most $L$ edges. Then, there exists a collection $\mathcal{G}_P$ of subgraphs in $\mathcal{G}_{L,f}$ that satisfy the following:
(I1) $|\mathcal{G}_P| = fL \cdot \mathrm{polylog}(m)$.
(I2) Given $P$ and $\mathcal{G}_{L,f}$, one can detect the subgraphs in $\mathcal{G}_P$ in time $f d L \cdot \mathrm{polylog}(m)$.
(I3) Every subgraph in $\mathcal{G}_P$ fully contains $P$.
(I4) For every set $F \subseteq E$ of at most $f$ edges, there are at least $|\mathcal{G}_P| / 2$ subgraphs in $\mathcal{G}_P$ that fully avoid $F$.
(I5) Every subgraph in $\mathcal{G}_P$ has at most $m/f$ many edges.
(I6) Computing the subset of edges in each $G_i \in \mathcal{G}_{L,f}$ takes $\widetilde{O}(m/f)$ time.
Additionally, (I5) and (I6) when applied to the vertex variant RPC $\mathcal{G}^v_{L,f}$ over a graph $G$ on $n$ vertices with vertex fault parameter $f$ yield the following: (I5v) every subgraph in $\mathcal{G}_P$ has at most $n/f$ many vertices, and (I6v) computing the subset of vertices in each $G_i \in \mathcal{G}^v_{L,f}$ takes $\widetilde{O}(n/f)$ time.

The proofs of (I1) to (I4) of the above theorem follow from the more general statement below about code based constructions of Strong HM hash families, by applying to it the parameters of specific codes. The proofs of (I5) and (I6) follow from a nice property of linear codes.

Lemma 29.
Given a graph G on m edges and integer parameters L, f, q, (cid:96) , and a (cid:2) log q m, (cid:96), δ (cid:3) q code C with relative distance δ > − Lf , then, the ( L, f ) - RPC G L,f given by Proposition 24 onproviding
SHM($C$) has the following property. Let $P$ be a replacement path segment of $d \le L$ edges. Then there exists a collection $\mathcal{G}_P$ of at most $\ell$ subgraphs in $\mathcal{G}_{L,f}$ that satisfies the following:
• Every subgraph in $\mathcal{G}_P$ fully contains $P$.
• For every set $F \subseteq E$ of at most $f$ edges, there are at least $|\mathcal{G}_P|/2$ subgraphs in $\mathcal{G}_P$ that fully avoid $F$.
Moreover, given $P$ and $\mathcal{G}_{L,f}$, one can detect the subgraphs in $\mathcal{G}_P$ in time $O(d \cdot (\ell + en(C)))$, where $en(C)$ is the time needed to encode a message using $C$.

(Footnote: We recall Remark 25 to say that when $a = m^{o(1)}$ and $b = \Omega(\log m)$ in the statement of Theorem 2, the covering number we aim to achieve is $(\alpha L f \log m)^{b+1}$ instead of $(\alpha L f)^{b+2} \cdot \log m$.)

Proof. For every $i \in [\ell]$ let $S_i \subseteq [q]$ be defined as
\[ S_i := \{\, C(r_j)_i \mid j \in [d] \,\}, \]
where $P = \{e_{r_1}, \ldots, e_{r_d}\}$. For every $i \in [\ell]$ we include the subgraph $G_i$ in $\mathcal{G}_P$ if and only if the only edges of $G$ preserved in $G_i$ are the ones mapped to an element of $S_i$ under the $i$-th coordinate of $C$. By definition, $|\mathcal{G}_P|$ is at most $\ell$. Moreover, the computation time of the indices of the graphs in $\mathcal{G}_P$ is $O(d \cdot (\ell + en(C)))$: once we encode the $d$ edges of $P$ using $C$, we can specify the indices of the subgraphs in $\mathcal{G}_P$ explicitly as defined above.

To see that $\mathcal{G}_P$ is a subset of $\mathcal{G}_{L,f}$, notice that for every $i \in [\ell]$ and every $S_i$ as defined above, SHM($C$) contains a hash function $h : [m] \to \{0,1\}$ which maps to 0 exactly those edges (labels of edges) whose corresponding codeword on the $i$-th coordinate is contained in $S_i$ (see the proof of Lemma 15 to verify this). Then, when SHM($C$) is provided to Proposition 24, the graph of $\mathcal{G}^{SHM(C)}_{L,f}$ built from $h$ in the proof of Proposition 24 is precisely the graph $G_i$ in $\mathcal{G}_P$.

All that is left to show are the structural properties of $\mathcal{G}_P$. By definition of $G_i$, all the edges of $P$ are preserved in each $G_i$. Furthermore, for every set $F := \{e_{j_1}, \ldots, e_{j_t}\} \subseteq E$ of at most $f$ edges, we have from (6) that
\[ \Pr_{i \sim [\ell]}\big[\, \forall (x,y) \in [d] \times [t],\ C(e_{r_x})_i \ne C(e_{j_y})_i \,\big] \ge \frac{1}{2}. \]
Therefore all the edges of $F$ are avoided in at least half the graphs in $\mathcal{G}_P$.

Proof of Theorem 28.
Since we have L ≤ f , the bounds in Theorem 2 follow here as well withsetting a = f and b = L , while we avoid the case when b = Ω(log m ) in order to get the rightbounds (we consider this case to be covered by the ‘otherwise’ case construction in Theorem 2). Inorder to see that (I1) to (I4) holds, we only need to verify that for the Reed Solomon code C RS andthe concatenated code C AG ◦ RS (from Lemma 20) when we plug in SHM ( C RS ) and SHM ( C AG ◦ RS )respectively into Lemma 29, that the parameters are as claimed in the theorem statement.The block length (cid:96) of C RS is set to be at most Lf log m log L in Lemma 23. If L ≥ m /c then |G P | ≤ (cid:96) = O ( cLf ) and otherwise we have |G P | ≤ (cid:96) = O ( Lf log m ).The block length (cid:96) of C AG ◦ RS is set to be at most L f log m log L in Lemma 23. Since we applythis bound to the case where L ≤ log m then |G P | ≤ (cid:96) = O ( L f log m ) = O ( Lf log m ).Plugging in the bound on the above block lengths of the two codes into Lemma 29 gives (I1) to(I4) in the theorem statement. Note that the encoding time of C RS is (cid:96) · polylog ( m ) = Lf polylog ( m )and while the encoding time of a codeword C AG ◦ RS is O ( (cid:96) ), since L ≤ f ≤ log m , we have that theencoding time of C AG ◦ RS is also polylog ( m ).Thus we now look towards proving (I5) and (I6). Notice that since L ≤ f , and the Boolean HM hash family provided to Proposition 24 in the proof of Theorem 2 arises from the alphabetreduction of Lemma 15, we know that we can even exclude all the subgraphs G i, (for all i ∈ [ (cid:96) ]) inthe proof of Proposition 24, to only have (cid:96) many subgraphs in G L,f . 
We will use this simplification later in this proof.

To see that every subgraph in $\mathcal{G}_{L,f}$ has at most $m/f$ edges (i.e., (I5)), we only need to verify that the Reed-Solomon code $C_{RS}$ and the concatenated code $C_{AG \circ RS}$ (from Lemma 20) are 1-wise independent. A code $C \subseteq [q]^{\ell}$ is said to be 1-wise independent if and only if for every $i \in [\ell]$ and every $\zeta \in [q]$ we have
\[ \Pr_{x \sim C}[x_i = \zeta] = \frac{1}{q}. \]
Let us first see why it suffices for (I5) to show that $C_{RS}$ and $C_{AG \circ RS}$ are 1-wise independent. Given $L, f$, fix a code $C \in \{C_{RS}, C_{AG \circ RS}\}$ which optimizes the parameters of Theorem 2. Fix a subgraph $G'$ in $\mathcal{G}_{L,f}$. By construction of $\mathcal{G}_{L,f}$ there exists $h \in HM(C)$ such that the edge $e_i$ of $G$ is retained in $G'$ if and only if $h(i) = 0$. Since the hash functions in $HM(C)$ are indexed by the set $[\ell] \times \binom{[q]}{\le L}$ (where $q$ is the alphabet size and $\ell$ is the block length of $C$), let the index of $h$ be $(j, S) \in [\ell] \times \binom{[q]}{\le L}$. Notice that the edge set of $G'$ is simply the subset $E' \subseteq [m]$ defined as
\[ E' = \{\, x \in [m] \mid C(x)_j \in S \,\}. \]
Since $C$ is 1-wise independent, we have $\Pr_{x \sim C}[x_j \in S] = |S|/q$, and thus $|E'| = m \cdot |S|/q \le mL/q$. If $C = C_{RS}$ then $q \ge Lf \log m$, and thus $|E'| \le m/(f \log m)$; if $C = C_{AG \circ RS}$ then $q \ge Lf$, and thus $|E'| \le m/f$. This proves (I5).

We now return to showing that $C_{RS}$ and $C_{AG \circ RS}$ are 1-wise independent. In fact we will show a stronger statement: every linear code $C$ is 1-wise independent. Let $A_{\ell \times \log_q m} := (\vec{a}_1, \ldots, \vec{a}_{\ell})$ be the generator matrix of $C \subseteq [q]^{\ell}$. Then the claim of 1-wise independence can be rewritten as follows: for every $i \in [\ell]$ and every $\zeta \in [q]$ we have
\[ \Pr_{y \sim [q]^{\log_q m}}\big[(Ay)_i = \zeta\big] = \frac{1}{q}. \]
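As a sanity check of the displayed claim, the following minimal sketch counts, for one nonzero row of a generator matrix, how often each symbol appears among all messages. It assumes a prime alphabet size `q` so that arithmetic modulo `q` is a field; the function names and the toy row are illustrative and not taken from the paper. The second function lists all messages whose inner product with the row falls in a set `S` by solving for the last coordinate, which additionally assumes that the last entry of the row is nonzero.

```python
from itertools import product

def coordinate_counts(a, q):
    """For each zeta in F_q, count messages y in F_q^k with <a, y> = zeta (mod q)."""
    counts = [0] * q
    for y in product(range(q), repeat=len(a)):
        counts[sum(ai * yi for ai, yi in zip(a, y)) % q] += 1
    return counts

def preimage_of_set(a, S, q):
    """List T = {x in F_q^k : <a, x> in S} by solving for the last coordinate.

    Assumes q is prime and a[-1] != 0 (mod q); both are assumptions of this sketch.
    """
    inv_last = pow(a[-1], -1, q)  # modular inverse (Python 3.8+)
    T = set()
    for r in product(range(q), repeat=len(a) - 1):
        partial = sum(ai * ri for ai, ri in zip(a, r)) % q
        for z in S:
            alpha = ((z - partial) * inv_last) % q
            T.add(r + (alpha,))
    return T
```

The enumeration touches only $q^{k-1} \cdot |S|$ tuples for a row of length $k$, rather than all $q^{k}$ messages.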
We now rewrite $(Ay)_i$ as $\langle \vec{a}_i, y \rangle$, and since $\vec{a}_i$ is not the zero vector the claim follows (even by a simple induction argument on the dimension).

Now we show (I6). Fix some $G_i$ in $\mathcal{G}_{L,f}$. By construction of $\mathcal{G}_{L,f}$ we may interpret the index $i$ as some $(j, S) \in [\ell] \times \binom{[q]}{\le L}$ such that the edge $e_x$ of $G$ is retained in $G_i$ if and only if $C(x)_j \in S$. Let $A_C := (\vec{a}_1, \ldots, \vec{a}_{\ell})$ be the generator matrix of $C$. We can determine the subset $T$ of $[q]^{\log_q m}$ defined as follows:
\[ T := \{\, x \in [q]^{\log_q m} \mid \langle \vec{a}_j, x \rangle \in S \,\}. \]
Interpreting $T$ as a subset of $[m]$ then simply gives us the edge set of $G_i$. To compute $T$ efficiently, we first compute for every $r \in [q]^{(\log_q m) - 1}$ and every $z \in S$ the value
\[ \alpha := \Big( z - \sum_{w=1}^{(\log_q m) - 1} \vec{a}_j(w) \cdot r_w \Big) \cdot \big( \vec{a}_j(\log_q m) \big)^{-1}. \]
Then we include the vector $(r, \alpha) \in [q]^{\log_q m}$ in $T$. Thus $T$ can be computed in time $\widetilde{O}(m|S|/q) = \widetilde{O}(mL/q)$. As before, if $C = C_{RS}$ then $q \ge Lf \log m$, and thus $\widetilde{O}(mL/q) = \widetilde{O}(m/(f \log m))$; if $C = C_{AG \circ RS}$ then $q \ge Lf$, and thus $\widetilde{O}(mL/q) = \widetilde{O}(m/f)$. This proves (I6).

Lower Bounds for $(L,f)$-Replacement Path Covering

In this section we provide a lower bound construction for the covering value of $(L,f)$-RPC and establish Theorem 3. Our lower bound graph is based on a modification of the graph construction used to obtain a lower bound on the size of $f$-failure FT-BFS structures, defined as follows.

Definition 30 (FT-BFS Structures). [PP16, Par15] Given a (possibly weighted) $n$-vertex graph $G = (V, E)$, a source vertex $s \in V$, and a bound $f$ on the number of (edge) faults, a subgraph $H \subseteq G$ is an $f$-failure FT-BFS structure with respect to $s$ if
\[ \mathrm{dist}(s, t, H \setminus F) = \mathrm{dist}(s, t, G \setminus F) \quad \text{for every } t \in V,\ F \subseteq E(G),\ |F| \le f. \]

FT-BFS structures were introduced by the second author and Peleg [PP16] for the case of a single (edge or vertex) failure. It was shown that for any unweighted $n$-vertex graph and any source node $s$, one can compute a 1-failure FT-BFS subgraph with $O(n^{3/2})$ edges. This was complemented by a matching lower bound graph. In [Par15], the lower bound graph construction was extended to any number of faults $f$, and this extension serves as the basis for our $(L,f)$-RPC lower bound argument.
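To make Definition 30 concrete, here is a minimal brute-force checker. It is a sketch for small instances only: it enumerates every fault set of size at most `f`, which is exponential in `f`. The graph encoding (a dict from undirected edge keys to weights) and the function names are assumptions of this example, not notation from the paper.

```python
import heapq
from itertools import combinations

def dijkstra(n, edges, s):
    """Distances from s; edges is a dict {(u, v): weight} of undirected edges."""
    adj = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def is_ft_bfs(n, G, H, s, f):
    """Check dist(s,t,H\\F) == dist(s,t,G\\F) for all t and all F in E(G), |F| <= f."""
    for k in range(f + 1):
        for F in combinations(G, k):
            faults = set(F)
            g = {e: w for e, w in G.items() if e not in faults}
            h = {e: w for e, w in H.items() if e not in faults}
            if dijkstra(n, g, s) != dijkstra(n, h, s):
                return False
    return True
```

For instance, on a 4-cycle with unit weights, a BFS tree rooted at the source passes the check for `f = 0` but fails for `f = 1`, since removing a tree edge disconnects a vertex in the tree while the cycle still reaches it.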
Fact 31. [Par15] For every n ≥ o (1) , and f ≥ , there exists an n -vertex graph G ∗ f and a sourcevertex s such that any f -failure-BFS structure with respect to s has Ω( n − / ( f +1) ) edges. In the high-level, the lower bound graph G ∗ f consists of a dense bipartite subgraph B withΩ( n − / ( f +1) ) edges, and a collection of { s } × V paths, that serve as replacement paths from s to all other vertices in G . The collection of paths are defined in a careful manner in a way thatforces any f -failure FT-BFS for s to include all the edge of the bipartite graph B . To translatethis construction into one that yields an ( L, f )- RPC of large CV , our key idea is to shortcut theedge-length of { s } × V replacement paths of G ∗ f by means of introducing weights to the edges.As a result, we get a weighted graph G wf whose all { s } × V replacement paths have at most L edges for any given parameter L ≤ ( n/f ) / ( f +1) . By setting the weights carefully, one can showthat any f -failure FT-BFS for the designated source s must have Ω( L f · n ) edges. To complementthe argument, consider the optimal ( L, f )- RPC G of minimal value for G wf . Since all the { s } × V paths are of length at most L , the replacement paths are resiliently covered by G . This yields thefollowing simple construction of f -failure FT-BFS H ⊆ G : Compute a shortest-path tree in eachsubgraph G (cid:48) ∈ G , and take the union of these subgraphs as the output subgraph H . Since thisconstruction yields an f -failure FT-BFS with O ( |G| n ) edges, we conclude that |G| = Ω( L f · n ). Wenext explain this construction in details.In the next description, we use the notation of [Par15] and introduced several key adaptationsalong the way. Our lower bound graph G wf similarly to Fact 31 is based on a graph G f ( d ) which isdefined inductively. 
Note that whereas in [Par15], the graph G f ( d ) is unweighted, for our purposes(making all replacement paths short in terms of number of edges) some edges will be given weights.For f = 1, G ( d ) consists of three components: (i) a set of vertices U = { u , . . . , u d } connectedby a path P = [ u , . . . , u d ], (ii) a set of terminal vertices Z = { z , . . . , z d } , and (iii) a collectionof d edges e i of weight w ( e i ) = 6 + 2( d − i ) connecting u i and z i for every i ∈ { , . . . , f } . Thevertex r ( G ( d )) = u , and the terminal vertices of Z are the leaves of the graph denoted by Leaf ( G ( d )) = Z . Each leaf node z i ∈ Leaf ( G ( d )) is assigned a label based on a labeling function Label : Leaf ( G ( d )) → E ( G ( d )) . The label of the leaf corresponds to a set of edge faults underwhich the path from root to leaf is still maintained. Specifically, Label ( z i , G ( d )) = ( u i , u i +1 ) for24 ≤ d − Label e ( z i , G ( d )) = ∅ . In addition, define P ( z i , G ( d )) = P [ r ( G ( d )) , u i ] ◦ Q i to bethe path from the root u to the leaf z i .We next describe the inductive construction of the graph G f ( d ) = ( V f , E f ), for every f ≥ G f − ( d ) = ( V f − , E f − ). The weights are introduced only in this induction step,i.e., for f ≥
2. The graph G f ( d ) = ( V f , E f ) consists of the following components. First, itcontains a path P f = [ u f , . . . , u fd ], where the node r ( G f ( d )) = u f is fixed to be the root. Inaddition, it contains d disjoint copies of the graph G (cid:48) = G f − ( d ), denoted by G (cid:48) , . . . , G (cid:48) d (viewedby convention as ordered from left to right), where each G (cid:48) i is connected to u fi by a collection of d edges e fi , for i ∈ { , . . . , d } , connecting the vertices u fi with r ( G (cid:48) i ). The edge weight of each e fi is w ( e fi ) = ( d − i ) · Depth ( G f − ( d )). In the construction of [Par15], each edge e fi is replaced by a path Q fi of length w ( e fi ). This is the only distinction compared to [Par15]. Note that by replacinga path Q fi by a single edge e fi of weight | Q fi | , the weighted length of the replacement paths wouldpreserve but their length in terms in number of edges is considerably shorter. The leaf set of thegraph G f ( d ) is the union of the leaf sets of G (cid:48) j ’s, Leaf ( G f ( d )) = (cid:83) dj =1 Leaf ( G (cid:48) j ). See Fig. 1 for anillustration for the special case of f = 2.Finally, it remains to define the labels Label f ( z i ) for each z i ∈ Leaf ( G f ( d )). For every j ∈ { , . . . , d − } and any leaf z j ∈ Leaf ( G (cid:48) j ), let Label f ( z j , G f ( d )) = ( u fj , u fj +1 ) ◦ Label f − ( z j , G (cid:48) j ).Denote the size (number of nodes) of G f ( d ) by N ( f, d ), its depth (maximal weighted distancebetween two nodes) by Depth ( f, d ), and its number of leaves by nLeaf ( f, d ) = | Leaf ( G f ( d )) | . Notethat for f = 1, N (1 , d ) = 2 d + (cid:80) di =1 · ( d − i ) ≤ d , Depth (1 , d ) = 6 + 2( d −
1) (correspondingto the length of the path Q ), and nLeaf (1 , d ) = d . Since in our construction, we only shortcut thelength of the paths, the following inductive relations hold as in [Par15]. Observation 32 (Observation 4.2 of [Par15]) . (a) Depth ( f, d ) = O ( d f ) .(b) nLeaf ( f, d ) = d f .(c) N ( f, d ) = c · d f +1 for some constant c . Consider the set of λ = nLeaf ( f, d ) leaves in G f ( d ), Leaf ( G f ( d )) = (cid:83) di =1 Leaf ( G (cid:48) i ) = { z , . . . , z λ } ,ordered from left to right according to their appearance in G ( f, d ). Lemma 33 (Slight modification of Lemma 4.3 of [Par15]) . For every z j it holds that:(1) The path P ( z j , G f ( d )) is the only u f − z j path in G f ( d ) .(2) P ( z j , G f ( d )) ⊆ G \ Label f ( z j , G f ( d )) .(3) P ( z i , G f ( d )) (cid:54)⊆ G \ Label f ( z j , G f ( d )) for every i > j .(4) ω ( P ( z i , G f ( d ))) > ω ( P ( z j , G f ( d ))) for every i < j . In Lemma 4.3 of [Par15], the forth claim discusses the length of the paths P ( z i , G f ( d )). Inour case, since we shortcut the path by introducing an edge weight the equals to the length of theremoved sub-path, the same claim holds only for the weighted length of the path. We next showthat thanks to our modifications the hop-diameter (i.e., measured by number of edges) of G f − ( d )is bounded, and consequently, all { s } × V replacement paths are short. Claim 34.
The hop-diameter of G f ( d ) is O ( f · d ) .Proof. The claim is shown by induction on f . For f = 1, the hop-diameter of G ( d ) is | P | = d .Assume that the claim holds up to f − G f − ( d ) is at most ( f − d .25he graph G f ( d ) is then connected to G f − ( d ) via the path P f = [ u f , . . . , u fd ] of hop-length d .Each u fi is connected to the root of the i th copy of G f − ( d ) via an edge. Thus the hop-diameter of G f ( d ) is at most f · d .Finally, we turn to describe the graph G wf which establishes our lower bound. The graph G wf consists of three components. The first is the modified weighted graph G f ( d ) for d ≤ (cid:100) ( n/ c ) / ( f +1) (cid:101) ,where c is some constant to be determined later. By Obs. 32, n/ ≤ | V ( G f ( d )) | . Note that d ≤ (5 / / ( f +1) · ( n/ c ) / ( f +1) = (5 n/ c ) / ( f +1) for sufficiently large n , hence N ( f, d ) = c · d f +1 ≤ n/ G wf is a set of nodes X = { x , . . . , x χ } and an additional vertex v ∗ thatis connected to u fd and to all the vertices of X . The cardinality of X is χ = n − N ( f, d ) − G wf is a complete bipartite graph B connecting the nodes of X with theleaf set Leaf ( G f ( d )), i.e., the disjoint leaf sets Leaf ( G (cid:48) ) , . . . , Leaf ( G (cid:48) d ). The vertex set of the re-sulting graph is thus V = V ( G f ( d )) ∪ { v ∗ } ∪ X and hence | V | = n . By Prop. (b) of Obs. 32, nLeaf ( G (cid:48) i ) = d f = (cid:100) ( n/ c ) / ( f +1) (cid:101) f ≥ ( n/ c ) f/ ( f +1) , hence | E ( B ) | = Θ( n · d f ). The followinglemma follows the exact same proof as in [Par15]. Lemma 35. [Analogue of Theorem 4.1 in [Par15]] Every f -failure FT-BFS H w.r.t s = u f in G wf must contain all the edges of B . Thus, | E ( H ) | = Ω( n · d f ) . We are now ready to prove the lower bound on covering value of the (
L, f )- RPC . Proof of Thm. 3.
Let $L = f \cdot d$ and consider the graph $G^w_f$ with the source node $s = u^f_1$. By the construction of $G^w_f$ it holds that $(d/f)^{f+1} \le n$. Let $\mathcal{G}_{L,f}$ be an optimal $(L,f)$-RPC for $G^w_f$, i.e., one of minimal CV. Our goal is to show that $|\mathcal{G}_{L,f}| = \Omega((L/f)^f)$. We next claim that one can use this RPC (or any RPC) to compute an $f$-failure FT-BFS structure $H$ with $O(|\mathcal{G}_{L,f}| \cdot n)$ edges. Specifically, let
\[ H = \bigcup_{G' \in \mathcal{G}_{L,f}} SPT(s, G'), \]
where $SPT(s, G')$ is a shortest-path tree rooted at $s$ in $G'$. It remains to show that $H$ is indeed an $f$-failure FT-BFS structure with respect to $s$.

By Claim 34, every $s$-$t$ replacement path avoiding $f$ faults has $O(fd)$ edges. Thus, for every $P(s,t,F)$ with $|F| \le f$ there exists a subgraph $G' \in \mathcal{G}_{L,f}$ such that $P(s,t,F) \subseteq G'$ and $F \cap G' = \emptyset$. Therefore, the $s$-$t$ path in the shortest-path tree $SPT(s, G')$ is necessarily $P(s,t,F)$. We conclude that $H \subseteq G^w_f$ is an $f$-failure FT-BFS structure with respect to $s$ with $O(|\mathcal{G}_{L,f}| \cdot n)$ edges. Combining with Lemma 35, we get that $|\mathcal{G}_{L,f}| = \Omega((L/f)^f)$.

DSO by Weimann and Yuster
In this section, we prove Theorem 4 by providing a derandomization of the algebraic constructionof the distance sensitivity oracle of [WY13]. This construction has sub-cubic preprocessing timeand sub-quadratic query time. We will use the following lemma from [ACC19].
Lemma 36. [Lemma 2 of [ACC19]] Let $D_1, D_2, \ldots, D_q \subseteq V$ satisfy $|D_i| > L$ for every $1 \le i \le q$, and let $|V| = n$. One can deterministically find in $\widetilde{O}(q \cdot L)$ time a set $R \subset V$ such that $|R| = O(n \log n / L)$ and $D_i \cap R \ne \emptyset$ for every $1 \le i \le q$.

Figure 1: Illustration of the lower-bound graph $G^w_f$ for $f = 2$. The bold red edges are the only modification compared to the construction of [Par15]. That is, in [Par15] each red line corresponds to a path, and in our construction it is replaced by a weighted edge whose weight equals the length of the path. As a result, the weights of all replacement paths are preserved, but their length in edges is bounded by $O(fd)$.

We start by providing a short overview of the randomized algebraic construction of [WY13]. As we will see, although the query algorithm of [WY13] is in fact deterministic, the derandomization of the preprocessing part forces a change in the query algorithm, which will be similar to that of [ACC19]. Following [ACC19], it will be convenient to set $\epsilon = 1 - \alpha$. Throughout, we describe the construction for $0 < \epsilon < 1$, $f = O(\log n / \log\log n)$ and a bound $L = n^{\epsilon/f}$. We need the following definition.

Definition 37 (Long and Short $(s,t,F)$). A triplet $(s,t,F) \in V \times V \times E(G)^{f}$ is $L$-short if $d^{L}(s,t,G \setminus F) = \mathrm{dist}(s,t,G \setminus F)$. That is, there exists a replacement path $P(s,t,F)$ with at most $L$ edges in $G$. Otherwise, $(s,t,F)$ is $L$-long. When $L$ is clear from the context, we may omit it and write short (or long) $(s,t,F)$.

Outline of the Weimann-Yuster
DSO . The preprocessing algorithm starts by computing an(
L, f )- RPC G L,f = { G , . . . , G r ⊆ G } for all replacement paths with at most L edges, where r = O ( f n (cid:15) log n ). This RPC is generated randomly by sampling each edge in G into G j independentlywith probability of 1 − /L for every j ∈ { , . . . , r } . Let R be a random sample of O ( f n log n/L )vertices in G , that we call hitting set as they hit every replacement path segment with at least L edges, w.h.p.Given the ( L, f )- RPC G L,f and the hitting set R , there are two variants of the algorithm. Inone variant, a collection of matrices A , . . . , A r is computed in in time O ( r · M . · n . (cid:15) ) forstoring the all-pairs distances in G , . . . , G r . In an alternative variant, the algorithm computes forevery subgraph G j ∈ G L,f a pair of matrices B j and D j in time O ( rM n . (cid:15) ). The matrix B j stores the R × R distances in G j and it is computed based on a matrix D j in O ( | R | n ) time.For a query ( s, t, F ), the query algorithm first computes a collection of O ( f log n ) graphs G F ⊆ G L,f that avoid all edges of F . For an L -short query, the distance dist G \ F ( s, t ) is obtained bytaking the minimum s - t distance over all subgraphs G (cid:48) ∈ G F . To support L -long queries ( s, t, F ),the algorithm uses the matrices A j (or the matrix pairs D j , B j ) to compute a dense graph G F withvertex set V ( G F ) = R ∪ { s, t } . The edge weight ( x, y ) for every x, y ∈ V ( G F ) is set to be theminimum x - y distance over all the subgraphs in G F . The answer to the ( s, t, F ) query is obtainedby computing the s - t distance in G F . In the preprocessing variant that computes the A j matrices,the query algorithm takes (cid:101) O ( n − (cid:15)/f ) time. In the variant that computes the B j , D j matrices,the query time is O ( n − (cid:15)/f ). 
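The randomized covering step in the outline above can be illustrated with a small sketch: each of `r` subgraphs keeps every edge independently with probability $1 - 1/L$, and a (path, fault set) pair is covered by a subgraph that contains the whole path and none of the faults. The function names, the edge encoding, and the fixed seed are assumptions of this example; the sketch does not reproduce the distance matrices or the hitting set of [WY13].

```python
import random

def sample_rpc(edges, L, r, seed=0):
    """Sample r edge subsets, keeping each edge independently with probability 1 - 1/L."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    keep = 1.0 - 1.0 / L
    return [frozenset(e for e in edges if rng.random() < keep) for _ in range(r)]

def covering_subgraphs(family, path_edges, faults):
    """Indices of subgraphs that contain every path edge and avoid every fault."""
    P, F = set(path_edges), set(faults)
    return [i for i, g in enumerate(family) if P <= g and not (F & g)]
```

In a full implementation one would run a distance computation inside every subgraph returned by `covering_subgraphs` for the fault set alone and take the minimum, which is how the outline answers $L$-short queries.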
In the following subsections, we explain how to derandomize the preprocessing algorithm and combine it with the modified query algorithm of [ACC19].

The structure of the remainder of the section is as follows. In Sec. 6.1, we present an improved construction of a structure called Fault-Tolerant trees (FT-trees). Then, in Subsec. 6.2, we provide a complete description of the preprocessing and query algorithms, both of which are based on the construction of the FT-trees.

For a given vertex pair $s, t$, the FT-tree $FT_{L,f}(s,t)$ consists of $O(L^f)$ nodes. Each node is labeled by a pair $\langle P, F \rangle$ where $P$ is an $s$-$t$ path in $G \setminus F$ with at most $L$ edges, and $F$ is a sequence of at most $f$ faults which $P$ avoids. [ACC19] described a construction of FT-trees $FT_{L,f}(s,t)$ for every pair $s, t$ and used it to implement the combinatorial DSO of [WY13]. The computation time of the FT-trees algorithm of [ACC19] is $O(m \cdot n \cdot L^{f+1})$, which is too costly for our purposes (e.g., for implementing the algebraic DSO of [WY13]).

(Footnote: In particular, for an $L$-long $(s,t,F)$ triplet it holds that every $s$-$t$ shortest path in $G \setminus F$ has at least $L + 1$ edges.)
(Footnote: To avoid confusion, we call the vertices of the FT-trees nodes.)

Defining FT-Trees.
Fix a pair $s, t \in V$. For every $i \in \{0, \ldots, f\}$ and every sequence of faults $F \subseteq E$, $|F| \le f - i$, the tree $FT_{L,i}(s,t,F)$ is defined in an inductive manner. Throughout, $P^{L}(s,t,F)$ refers to some shortest $s$-$t$ path in $G \setminus F$ with at most $L$ edges. If there are several such paths, the algorithm picks one as will be described later.

Base case: The tree $FT_{L,0}(s,t,F)$ for every $F \subseteq E$ with $|F| \le f$ is defined as follows. If $d^{L}(s,t,G \setminus F) = \infty$ (i.e., there is no $s$-$t$ path with at most $L$ edges in $G \setminus F$), then $FT_{L,0}(s,t,F)$ is empty. Otherwise, $FT_{L,0}(s,t,F)$ consists of a single node (the root node) labeled by $\langle P^{L}(s,t,F), F \rangle$. This root node is associated with a binary search tree which stores the edges of the path $P^{L}(s,t,F)$.

Inductive step: Assume the construction of $FT_{L,j}(s,t,F)$ for every $j$ up to $i$ and every $F \subseteq E$, $|F| \le f - j$. The tree $FT_{L,i+1}(s,t,F')$ is defined as follows for every set $F'$ of $f - (i+1)$ faults in $E$. If $d^{L}(s,t,G \setminus F') = \infty$, then $FT_{L,i+1}(s,t,F')$ is empty. Assume from now on that $d^{L}(s,t,G \setminus F') < \infty$. The root node $r$ of $FT_{L,i+1}(s,t,F')$ is labeled by $\langle P^{L}(s,t,F'), F' \rangle$, and the edges of $P^{L}(s,t,F')$ are stored in a binary search tree. This root node is connected to the roots of the trees $FT_{L,i}(s,t,F' \cup \{a_j\})$ for every $a_j \in P^{L}(s,t,F')$ satisfying $d^{L}(s,t,G \setminus (F' \cup \{a_j\})) < \infty$. Letting $r_j$ be the root node of $FT_{L,i}(s,t,F' \cup \{a_j\})$ (if such exists), we have:
\[ FT_{L,i+1}(s,t,F') = \big\{\, FT_{L,i}(s,t,F' \cup \{a_j\}) \cup \{(r, r_j)\} \ \big|\ a_j \in P^{L}(s,t,F'),\ d^{L}(s,t,G \setminus (F' \cup \{a_j\})) < \infty \,\big\}. \]
For $i = f$, we abbreviate $FT_{L,f}(s,t,\emptyset) = FT_{L,f}(s,t)$.

Observation 38.
Each tree $FT_{L,f}(s,t)$ has at most $L^f$ nodes (in the case of vertex faults, it has at most $(L+1)^f$ nodes).

Proof. The depth of the tree $FT_{L,f}(s,t)$ is at most $f$. For the case of edge faults, each node in $FT_{L,f}(s,t)$ has at most $L$ children, as each node is labeled by a path of at most $L$ edges. In the case of vertex faults, a path of at most $L$ edges has $L + 1$ vertices.

Algebraic Construction of FT-Trees.
We now provide a new algorithm for computing the FT-trees $FT_{L,f}(s,t)$ based on the $(L,f)$-RPC of Thm. 2. This algorithm will be applied in the preprocessing phase of the $f$-DSO. The next theorem improves upon the $\widetilde{O}(m \cdot n \cdot L^{f+1})$-time algorithm provided in [ACC19] for dense graphs. The key difference is that the algorithm of [ACC19] is combinatorial (e.g., it uses Dijkstra for shortest path computations), whereas our algorithm is algebraic (e.g., it uses matrix multiplication).

Theorem 39 (Improved Computation of FT-Trees). For every $L$ and $f = O(\log n / \log\log n)$, there exists a deterministic algorithm that computes $\bigcup_{s,t \in V} FT_{L,f}(s,t)$ in time:
1. $\widetilde{O}((\alpha c L f)^{f+1} \cdot L M n^{\omega})$ if $L \ge m^{1/c}$ for some constant $c$, and
2. $\widetilde{O}((\alpha L f \log n)^{f+1} \cdot L M n^{\omega})$ otherwise,
where $\alpha$ is the universal constant of Theorem 2.

The algorithm first computes an $(L,f)$-RPC $\mathcal{G}_{L,f}$. Then, it applies the $\mathrm{APSP}_{\le L}$ algorithm of Lemma 6 to compute in each $G' \in \mathcal{G}_{L,f}$ the collection of all $V(G') \times V(G')$ shortest paths $P^{L}_{G'}(s,t)$ with at most $L$ edges, for every $s, t \in V(G')$. This computation serves as the basis for the following key task in the construction of the FT-trees: given a triplet $(s, t, F)$, compute $d^{L}(s,t,G \setminus F)$ and some path $P^{L}(s,t,F)$ if such exists.
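The FT-tree definition and the key task just stated can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm of Theorem 39: the hop-bounded shortest path is computed directly in $G \setminus F$ by $L$ rounds of Bellman-Ford-style relaxation instead of through the $(L,f)$-RPC and the algorithm of Lemma 6, edge weights are assumed positive, and a Python `set` stands in for the binary search tree stored at each node. All names are hypothetical.

```python
def hop_bounded_path(n, edges, s, t, L, faults=frozenset()):
    """Min-weight s-t path with at most L edges in G minus faults, or None.

    edges: dict {(u, v): weight} of undirected edges; returns a list of edge keys.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    path = {s: []}
    for _ in range(L):  # round k handles paths with at most k edges
        new_dist, new_path = dist[:], dict(path)
        for (u, v), w in edges.items():
            if (u, v) in faults:
                continue
            for a, b in ((u, v), (v, u)):
                if dist[a] + w < new_dist[b]:
                    new_dist[b] = dist[a] + w
                    new_path[b] = path[a] + [(u, v)]
        dist, path = new_dist, new_path
    return path[t] if dist[t] < INF else None

def build_ft_tree(n, edges, s, t, L, f, faults=frozenset()):
    """Node = dict with the labeled path, its fault set, and one child per path edge."""
    P = hop_bounded_path(n, edges, s, t, L, faults)
    if P is None:
        return None  # empty tree
    node = {"path": P, "edges": set(P), "faults": faults, "children": {}}
    if f > 0:
        for a in P:
            child = build_ft_tree(n, edges, s, t, L, f - 1, faults | {a})
            if child is not None:
                node["children"][a] = child
    return node

def query_ft_tree(root, F):
    """Descend from the root until the labeled path avoids F; None if no such path."""
    node, F = root, set(F)
    while node is not None:
        hit = next((e for e in F if e in node["edges"]), None)
        if hit is None:
            return node["path"]
        node = node["children"].get(hit)
    return None
```

The recursion makes the branching visible: a node at remaining depth `f` spawns at most one child per edge of its path, each with one more fault and depth `f - 1`.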
Lemma 40.
Consider a pre-computation of the $(L,f)$-RPC $\mathcal{G}_{L,f}$ for $f = O(\log n / \log\log n)$, and the application of algorithm $\mathrm{APSP}_{\le L}$ in each of the subgraphs $G' \in \mathcal{G}_{L,f}$. Then, given a triplet $(s,t,F)$, in time $\widetilde{O}(L)$ one can compute the distance $d^{L}(s,t,G \setminus F)$ and a corresponding path $P^{L}(s,t,F)$ (if such exists).

Proof. By Theorem 26, given the $(L,f)$-RPC $\mathcal{G}_{L,f}$, one can compute in time $\widetilde{O}(L)$ a collection of subgraphs $\mathcal{G}_F$ that fully avoid $F$. In addition, for any $s$-$t$ path $P$ in $G \setminus F$ with at most $L$ edges, there must exist a subgraph $G' \in \mathcal{G}_F$ that fully contains $P$. In particular, letting $P^{*}$ be the shortest $s$-$t$ path with at most $L$ edges in $G \setminus F$ (breaking ties arbitrarily), there is a subgraph in $\mathcal{G}_F$ that fully contains $P^{*}$. Since the algorithm $\mathrm{APSP}_{\le L}$ is applied on each of the subgraphs $G' \in \mathcal{G}_F$, we have that
\[ d^{L}(s,t,G \setminus F) = \min_{G' \in \mathcal{G}_F} d^{L}(s,t,G'). \qquad (7) \]
The desired path $P^{L}(s,t,F)$ corresponds to the output path of algorithm $\mathrm{APSP}_{\le L}$ in the subgraph $G' \in \mathcal{G}_F$ that minimizes the distance in Eq. (7).

The computation of the FT-tree $FT_{L,f}(s,t)$ for every $s, t \in V$ is described as follows. The root node is simply $P^{L}(s,t)$ as computed by applying algorithm $\mathrm{APSP}_{\le L}$ in $G$. If $d^{L}(s,t,G) = \infty$, then $FT_{L,f}(s,t)$ is empty. The binary search tree for storing $P^{L}(s,t)$ can be computed in $\widetilde{O}(L)$ time. Now, for every labeled node $\langle P^{L}(s,t,F), F \rangle$, the algorithm computes its child nodes $\langle P^{L}(s,t,F \cup \{a_j\}), F \cup \{a_j\} \rangle$ for every $a_j \in P^{L}(s,t,F)$. For that purpose, it applies the algorithm of Lemma 40 with input $(s,t,F \cup \{a_j\})$ for every $a_j \in P^{L}(s,t,F)$. We are now ready to complete the proof of Theorem 39.

Proof of Theorem 39.
The correctness of the algorithm follows by Lemma 40. Therefore, it remains to bound the computation time. The computation of the $(L,f)$-RPC is done in time $O(CV(\mathcal{G}_{L,f}) \cdot m)$. Applying algorithm $\mathrm{APSP}_{\le L}$ on every $G' \in \mathcal{G}_{L,f}$ takes $O(CV(\mathcal{G}_{L,f}) \cdot L M n^{\omega})$ time by Lemma 6. The computation of each child node in the FT-tree takes $O(fL \log n)$ time, by Lemma 40. By Observation 38, the total number of nodes in all the trees is bounded by $O(L^f \cdot n^2)$. Thus, the total time to compute all the FT-trees is bounded by $O(CV(\mathcal{G}_{L,f}) \cdot L M n^{\omega})$. The theorem holds by plugging in the covering values of Theorem 2 (the first and last bounds).

The applicability of the FT-trees in the context of DSOs is expressed in the next lemma.

Lemma 41. [Lemma 17 of [ACC19]] Given the computation of the trees $FT_{L,f}(s,t)$, $s, t \in V$, for every triplet $(s,t,F)$ one can compute $d^{L}(s,t,G \setminus F)$ and a replacement path $P^{L}(s,t,F)$ (if such exists) in time $O(f^2 \log L)$.

Proof. Given $(s,t,F)$, we query the FT-tree $FT_{L,f}(s,t)$ as follows. First check whether the path $P^{L}(s,t)$ labeled at the root of the tree intersects $F$. If not, then output $P^{L}(s,t)$. Otherwise, letting $a_j \in P^{L}(s,t) \cap F$, we continue with the child node labeled by $P^{L}(s,t,\{a_j\})$. Again, if $P^{L}(s,t,\{a_j\}) \cap F = \emptyset$, we output that path, and otherwise we continue to its child node $P^{L}(s,t,\{a_j, a_{j'}\})$ for some $a_{j'} \in P^{L}(s,t,\{a_j\}) \cap F$. Using the binary search tree at each node $P^{L}(s,t,F')$, finding some edge $e' \in P^{L}(s,t,F') \cap F$ can be done in $O(f \log L)$ time, by searching for each of the at most $f$ edges of $F$. Since the depth of the tree is $f$, the total time is $O(f^2 \log L)$.

The randomized preprocessing algorithm of Weimann and Yuster has two randomized ingredients. The first is the computation of the (
L, f )- RPC given by the subgraphs G , . . . , G r . The second is acomputation of the set R which, w.h.p., hits every L -length segment of every long P ( s, t, F ) paths.Our deterministic preprocessing algorithm is presented below: Deterministic Preprocessing Algorithm • (i): Compute FT-trees . Using ( L, f )- RPC of Thm. 2, apply Theorem 39 to computethe collection of trees (cid:83) s,t FT L,f ( s, t ) with O ( n · L f ) nodes. • (ii): Compute Critical Paths. Let D L,f be the collection of all the pairs (cid:104)
P, F (cid:105) corresponding to the nodes of the FT-trees. Define the collection of critical paths D L = { P | (cid:104) P, F (cid:105) ∈ D
L,f , | P | ∈ [ L/ , L ] } which consists of all sufficiently long paths. • (iii): Compute Hitting Set for the Critical Paths. Apply the algorithm of Lemma36 to compute a hitting set R ⊆ V for the paths in D L where | R | = O ( n log n/L ).This completes the description of the preprocessing algorithm. We note that the computation ofthe FT-trees substitutes the A j , B j , D j matrices used in [WY13]. Lemma 42 (Preprocessing time) . The preprocessing time of the deterministic algorithm is boundedby 1. (cid:101) O (( αcLf ) f +1 · LM n ω ) if L ≥ m /c for some constant c ,2. (cid:101) O (( αLf log n ) f +1 · LM n ω ) otherwise, where α is the universal constant of Theorem 2.Proof. The computation time is dominated by the computation of the FT-trees, see Theorem 39.The FT-trees consists of O ( n · L f ) = O ( n (cid:15) ) labeled nodes, and thus |D L | = O ( n (cid:15) ). By Lemma36, the computation of the hitting set R takes O ( n (cid:15) + (cid:15)/f ) time, and | R | = O ( n log n/L ).By setting the matrix multiplication exponent to ω = 2 . (cid:15) = 1 − α , Lemma 42 achievesthe bound of Theorem 4. The Query Algorithm.
Once the FT-trees are computed, the query algorithm is the same as in [ACC19]; for completeness we describe it here. Note that in contrast to [ACC19], we do not assume here that shortest path ties are decided in a consistent manner. Thus the correctness of the procedure is somewhat more delicate. Given a short query $(s,t,F)$, i.e., $d^{L}(s,t,G \setminus F) = \mathrm{dist}(s,t,G \setminus F)$, the desired distance $d^{L}(s,t,G \setminus F)$ can be computed in time $O(f^2 \log L)$ by using the query algorithm of Lemma 41. From now on assume that the query $(s,t,F)$ is long. Unlike [WY13], we are not able to show that there are few subgraphs in the $(L,f)$-RPC $\mathcal{G}_{L,f}$ that fully avoid $F$. Nevertheless, we can still efficiently compute the dense graph $G_F$, within nearly the same time bounds as in [WY13]. Recall that $R$ is the hitting set of the critical set of replacement paths. The vertex set of the graph $G_F$ is given by $V_F = R \cup \{s, t\}$, and the weight of each edge $(x,y) \in V_F \times V_F$ is given by $w(x,y) = d^{L}(x,y,G \setminus F)$. This weight can be computed by applying the query algorithm of Lemma 41 on the FT-tree $FT_{L,f}(x,y)$ with the query $(x,y,F)$.

To answer the $(s,t,F)$ query it remains to compute the $s$-$t$ distance in the dense graph $G_F$. Using the method of feasible price functions, in exactly the same manner as in [WY13], this computation is done in $\widetilde{O}(|E(G_F)|) = \widetilde{O}(n^{2 - 2\epsilon/f})$ time, since $|V_F| = O(n \log n / L)$ and $L = n^{\epsilon/f}$. This completes the description of the query algorithm. Given the computation of the FT-trees in the preprocessing step, by Lemma 41 the computation of the graph $G_F$ takes $O(|E(G_F)| \cdot f^2 \log L) = \widetilde{O}(n^{2 - 2\epsilon/f})$ time. This matches the query time of Weimann and Yuster [WY13] (up to poly-logarithmic terms). We finalize the section by showing the correctness of the query algorithm. Because we do not assume uniqueness of shortest paths as in [ACC19], the argument is more delicate.

Claim 43.
$\mathrm{dist}(s,t,G_F) = \mathrm{dist}(s,t,G \setminus F)$.

Proof. The correctness for the short queries $(s,t,F)$ follows by the correctness of Lemma 41. Consider a long query $(s,t,F)$ and let $P(s,t,F)$ be the $s$-$t$ shortest path in $G \setminus F$ with the minimal number of edges. If there are several such paths, pick one in an arbitrary manner. By definition, $P = P(s,t,F)$ has at least $L$ edges. Partition it into segments of length in $[L/4, L/2]$ (e.g., partition $P(s,t,F)$ into consecutive segments of length $L/4$, where the last segment has length at most $L/2$), and let $s_i$-$t_i$ be the endpoints of the $i$-th segment. That is,
$$P = P[s_1 = s, t_1 = s_2] \circ P[s_2, t_2] \circ \dots \circ P[s_\ell, t_\ell = t].$$
By the definition of $P$, every $s_i$-$t_i$ shortest path in $G \setminus F$ must have at least $L/4$ edges. Otherwise, there is a pair $s_i, t_i$ with a shorter (in number of edges) $s_i$-$t_i$ shortest path in $G \setminus F$. This implies that we can obtain an $s$-$t$ shortest path $P''$ of the same weight but with fewer edges, in contradiction to the minimality (in edges) of $P$. Since $d^{L/2}(s_i,t_i,G \setminus F) = \mathrm{dist}(s_i,t_i,G \setminus F)$ for every $i \in \{1,\dots,\ell\}$, there is an $s_i$-$t_i$ path $P_L(s_i,t_i,F)$ of length at most $L$ in the FT-tree $FT_{L,f}(s_i,t_i)$. Specifically, this path can be found by applying the query algorithm of Lemma 41 with the query $(s_i,t_i,F)$. By Lemma 41, this results in the distance $d^{L}(s_i,t_i,G \setminus F)$ along with a path $P_L(s_i,t_i,F)$.

Consider now an alternative $s$-$t$ path $P' = P_L(s_1,t_1,F) \circ P_L(s_2,t_2,F) \circ \dots \circ P_L(s_\ell,t_\ell,F)$. Since $d^{L/2}(s_i,t_i,G \setminus F) = \mathrm{dist}(s_i,t_i,G \setminus F)$ for every $i \in \{1,\dots,\ell\}$, we have that $P' \cap F = \emptyset$ and $\omega(P') = \omega(P) = \mathrm{dist}(s,t,G \setminus F)$.

By definition, every $P_L(s_i,t_i,F) \in \mathcal{D}_{L,f}$, and since $P_L(s_i,t_i,F)$ has at least $L/4$ and at most $L$ edges, $P_L(s_i,t_i,F) \in \mathcal{D}_L$. Since $R$ is a hitting set of all paths in $\mathcal{D}_L$, there exists some $x_i \in P_L(s_i,t_i,F) \cap R$ for every $i$. This implies that $P'$ can be written as a concatenation of replacement path segments, each with at most $L$ edges and with both endpoints in $V(G_F) = R \cup \{s,t\}$. Let $\{s = x_0, x_1, \dots, x_k, x_{k+1} = t\}$ be the ordered set of the representatives of the $V(G_F)$ vertices on $P'$. By the description of the query algorithm, for every $i \in \{0,\dots,k\}$, it holds that $w(x_i,x_{i+1}) = d^{L}(x_i,x_{i+1},G \setminus F)$. By the above argument, $d^{L}(x_i,x_{i+1},G \setminus F) = \mathrm{dist}(x_i,x_{i+1},G \setminus F)$. In addition, for every pair $x,y \in V(G_F)$, $w(x,y) = d^{L}(x,y,G \setminus F) \geq \mathrm{dist}(x,y,G \setminus F)$. We therefore conclude that $\mathrm{dist}(s,t,G_F) = \omega(P') = \mathrm{dist}(s,t,G \setminus F)$.

Footnote (on the subgraphs of $\mathcal{G}_{L,f}$ that fully avoid $F$): There are $\widetilde{O}(L)$ such subgraphs, which is too costly for our purposes.

We next consider the applications of the $(L,f)$-RPC to deterministic constructions of fault-tolerant spanners resilient to at most $f$ vertex faults. For a given $n$-vertex (possibly) weighted graph $G = (V,E)$, a subgraph $H \subseteq G$ is an $f$-fault tolerant $(\alpha,\beta)$-spanner if
$$\mathrm{dist}(s,t,H \setminus F) \leq \alpha \cdot \mathrm{dist}(s,t,G \setminus F) + \beta, \quad \text{for every } s,t \in V,\ F \subseteq V,\ |F| \leq f.$$
When $\beta = 0$, the spanner is called a multiplicative spanner, denoted an $f$-fault tolerant $t$-spanner for short, where $t$ is the stretch factor. When $\alpha = 1$, the spanner is additive.

Chechik, Langberg, Peleg, and Roditty [CLPR10] presented the first non-trivial construction of $f$-fault tolerant multiplicative spanners resilient to vertex faults. The size overhead of their construction (compared to a standard spanner) is $k^f$, that is, exponential in the number of faults. Dinitz and Krauthgamer [DK11] provided a simpler and sparser solution by using the notion of RPCs. They showed:
Theorem 44 (Theorem 1.1 of [DK11]). For every graph $G = (V,E)$ with positive edge lengths and odd $t \geq 3$, there is an $f$-fault tolerant $t$-spanner with size $O(f^{2-2/(t+1)} \cdot n^{1+2/(t+1)} \log n)$.

This theorem is a consequence of a general conversion scheme that turns any $\tau(n,m)$-time algorithm for constructing $t$-spanners with size $s(n)$ into an algorithm for constructing an $f$-fault tolerant $t$-spanner with size $O(f^3 \log n \cdot s(2n/f))$ and time complexity $O(f^3 \log n \cdot \tau(2n/f, m))$. Specifically, applying this conversion to the greedy spanner algorithm yields an $f$-fault tolerant $(2k-1)$-spanner with $O(f^3 \log n \cdot (n/f)^{1+1/k})$ edges in time $O(f^3 \log n \cdot k \cdot m \cdot (2n/f)^{1+1/k})$. In this section we provide the derandomization of Theorem 2.1 of [DK11] (which is used to obtain Theorem 1.1) and show:

Theorem 45 (Derandomization of Theorem 2.1 of [DK11]). If there is a deterministic algorithm $\mathcal{A}$ that on every $n$-vertex $m$-edge graph builds a $t$-spanner of size $s(n)$ in time $\tau(n,m,t)$, then there is an algorithm that on any such graph builds an $f$-fault tolerant $t$-spanner of:
1. size $O(f \cdot s(n/f))$ and time $O(f(\tau(n/f,m,t) + m))$, if $f \geq n^{1/c}$ for some constant $c \in \mathbb{N}$.
2. size $O(\log n \cdot s(n/f))$ and time $O(\log n(\tau(n/f,m,t) + m))$, if $f \leq \log n$.
3. size $O((f \log n) \cdot s(n/f))$ and time $O((f \log n)(\tau(n/f,m,t) + m))$, if $f \in [\log n, n^{o(1)}]$.

Proof. The algorithm applies the vertex variant of Theorem 28 to compute an $(L = 2, f)$-RPC $\mathcal{G}$. Then, it applies the fault-free algorithm $\mathcal{A}$ to compute a $t$-spanner $H_j$ for each subgraph $G_j \in \mathcal{G}$. The output spanner $H = \bigcup_{j=1}^{r} H_j$ is simply the union of all these spanner subgraphs.

We first consider correctness. Fix a replacement path $P(s,t,F)$. It is required to show that $\mathrm{dist}(s,t,H \setminus F) \leq t \cdot \mathrm{dist}(s,t,G \setminus F)$, and thus it is sufficient to show that $\mathrm{dist}(u,v,H \setminus F) \leq t \cdot w(u,v)$ for every edge $(u,v) \in P(s,t,F)$, where $w(u,v)$ is the weight of the edge $(u,v)$ in $G$. Since $\mathcal{G}$ is a $(2,f)$-RPC, there exists a subgraph $G_j \in \mathcal{G}$ satisfying that $(u,v) \in G_j$ and $F \cap V(G_j) = \emptyset$. Thus, the $t$-spanner $H_j \subseteq H$ satisfies that $\mathrm{dist}(u,v,H_j \setminus F) = \mathrm{dist}(u,v,H_j) \leq t \cdot w(u,v)$, as desired.

We now turn to show that the computation time is $O(|\mathcal{G}| \cdot (\tau(n/f,m,t) + m))$ and that the size of the spanner is $O(|\mathcal{G}| \cdot s(n/f))$. By Theorem 28 (I5v), we get that $|V(G_j)| = O(n/f)$ for every $G_j \in \mathcal{G}$. The bounds then follow by plugging in the covering value $|\mathcal{G}|$ and the computation time of the covering of Theorem 2.

In [BCPS15], the approach of [DK11] was extended to provide vertex fault-tolerant spanners with nearly additive stretch.
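Both conversions in this section (Theorem 45 above and Theorem 46 below) follow the same template: run the fault-free spanner algorithm on every subgraph of the covering and return the union of the outputs. The following Python sketch illustrates that template for vertex faults and L = 2. It is only an illustration under stated assumptions: `naive_vertex_rpc` is a hypothetical brute-force stand-in (all vertex sets obtained by deleting exactly f vertices, which is a valid covering when n ≥ f + 2) and is not the construction of Theorem 28, and `greedy_spanner` is the classical greedy algorithm used here as one possible choice of the fault-free algorithm. All function names are ours.

```python
import heapq
from itertools import combinations


def dijkstra(adj, src, banned=frozenset()):
    """Distances from src in adj = {u: {v: w}}, never entering banned vertices."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if v in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def greedy_spanner(vertices, edges, t):
    """Greedy t-spanner of the subgraph induced on `vertices` (illustrative choice)."""
    adj = {v: {} for v in vertices}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if u not in adj or v not in adj:
            continue  # edge is not inside the induced subgraph
        # Keep the edge only if the current spanner does not already t-approximate it.
        if dijkstra(adj, u).get(v, float("inf")) > t * w:
            adj[u][v] = w
            adj[v][u] = w
            kept.append((u, v, w))
    return kept


def naive_vertex_rpc(vertices, f):
    """Hypothetical stand-in covering: every vertex set missing exactly f vertices."""
    vs = sorted(vertices)
    return [set(vs) - set(removed) for removed in combinations(vs, f)]


def ft_spanner(vertices, edges, f, t, spanner=greedy_spanner):
    """Union of fault-free t-spanners, one per subgraph of the covering."""
    union = set()
    for part in naive_vertex_rpc(vertices, f):
        union.update(spanner(part, edges, t))
    return sorted(union)
```

With the naive covering the number of subgraphs is binomial in n and f, so this sketch carries none of the size or time guarantees of Theorem 45; only the correctness argument (every surviving edge lies in some subgraph that avoids F) is the same.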
Theorem 46 (Derandomization of Theorem 3.1 of [BCPS15]). Let $\mathcal{A}$ be an algorithm for computing a $(\mu,\alpha)$-spanner of size $O(n^{1+\delta})$ in time $\tau$ for an $n$-vertex $m$-edge graph $G = (V,E)$. Set $L = \lceil \alpha \cdot \epsilon^{-1} \rceil + 1$. Then, for any $\epsilon > 0$ and $f \leq L$, one can compute an $f$-vertex fault-tolerant $(\mu + \epsilon, \alpha)$-spanner with:
1. $O((c' f L)^{f+1} \cdot n^{1+\delta})$ edges in time $\widetilde{O}((c' f L)^{f+1} \cdot \tau)$, if $L \geq n^{1/c}$ for some constant $c \in \mathbb{N}$.
2. $O((c' f L)^{f+2} \cdot \log n \cdot n^{1+\delta})$ edges in time $\widetilde{O}((c' f L)^{f+2} \cdot \log n \cdot \tau)$, if $L \leq \log n$.
3. $O((c' f L \log n)^{f+1} \cdot n^{1+\delta})$ edges in time $\widetilde{O}((c' f L \log n)^{f+1} \cdot \tau)$, otherwise,
for some constant $c'$.

Proof. The proof follows the exact same line as that of Theorem 3.1 of [BCPS15], except that Theorem 2 is used to build an $(L+1,f)$-RPC $\mathcal{G} = \{G_1,\dots,G_\gamma\}$. The algorithm then applies $\mathcal{A}$ on each of these subgraphs and takes the union of the output spanners as the final subgraph $H$. The size and time bounds are immediate by Theorem 2. To see the stretch argument, it is sufficient to show that for any path of length at most $L$ in $G \setminus F$, there is a corresponding path in $H \setminus F$ of bounded length. The stretch argument for longer paths is obtained by decomposing the path into $L$-length segments (except perhaps for the last segment) and accumulating the additive stretch from each segment. Fix an $L$-length path $P \subseteq P(s,t,F)$, and let $u,v$ be the endpoints of $P$. Since $\mathcal{G}$ is an $(L+1,f)$-RPC, there exists a subgraph $G_i \in \mathcal{G}$ such that $P \subseteq G_i$ and $F \cap G_i = \emptyset$. Since $H_i$ is a $(\mu,\alpha)$-spanner for $G_i$, we have that $\mathrm{dist}(u,v,H_i \setminus F) = \mathrm{dist}(u,v,H_i) \leq \mu \cdot L + \alpha$. Partition any path $P(s,t,F)$ into $\lceil (1/L) \cdot \mathrm{dist}(s,t,G \setminus F) \rceil$ segments, each of length at most $L$. We then have that
$$\mathrm{dist}(s,t,H \setminus F) \leq \mu \cdot \mathrm{dist}(s,t,G \setminus F) + \alpha \cdot \lceil (1/L) \cdot \mathrm{dist}(s,t,G \setminus F) \rceil.$$
Since $1/L < \epsilon/\alpha$, the stretch bound holds.

Acknowledgment
We would like to thank Swastik Kopparty, Gil Cohen, and Amnon Ta-Shma for discussion oncoding theory, Moni Naor for discussion on universal hash functions, and Eylon Yogev for variousdiscussions.
References

[AAB+92] Miklós Ajtai, Noga Alon, Jehoshua Bruck, Robert Cypher, Ching-Tien Ho, Moni Naor, and Endre Szemerédi. Fault tolerant graphs, perfect hash functions and disjoint paths. In , pages 693–702, 1992.
[ACC19] Noga Alon, Shiri Chechik, and Sarel Cohen. Deterministic combinatorial replacement paths and distance sensitivity oracles. In , pages 12:1–12:14, 2019.
[AG10] Noga Alon and Shai Gutner. Balanced families of perfect hash functions and their applications. ACM Trans. Algorithms, 6(3):54:1–54:12, 2010.
[Alo86] Noga Alon. Explicit construction of exponential sized families of k-independent sets. Discret. Math., 58(2):191–193, 1986.
[AN96] Noga Alon and Moni Naor. Derandomization, witnesses for boolean matrix multiplication and construction of perfect hash functions. Algorithmica, 16(4-5):434–449, 1996.
[AYZ95] Noga Alon, Raphael Yuster, and Uri Zwick. Color-coding. Journal of the ACM (JACM), 42(4):844–856, 1995.
[BCPS15] Gilad Braunschvig, Shiri Chechik, David Peleg, and Adam Sealfon. Fault tolerant additive and (µ, α)-spanners. Theor. Comput. Sci., 580:94–100, 2015.
[BDPW18] Greg Bodwin, Michael Dinitz, Merav Parter, and Virginia Vassilevska Williams. Optimal vertex fault tolerant spanners (for fixed stretch). In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1884–1900. SIAM, 2018.
[BDR20] Greg Bodwin, Michael Dinitz, and Caleb Robelle. Optimal vertex fault-tolerant spanners in polynomial time. CoRR, abs/2007.08401, 2020.
[CC20a] Diptarka Chakraborty and Keerti Choudhary. New extremal bounds for reachability and strong-connectivity preservers under failures. In , pages 25:1–25:20, 2020.
[CC20b] Shiri Chechik and Sarel Cohen. Distance sensitivity oracles with subcubic preprocessing time and fast query time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1375–1388, 2020.
[CCFK17] Shiri Chechik, Sarel Cohen, Amos Fiat, and Haim Kaplan. (1+eps)-approximate f-sensitive distance oracles. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1479–1496. SIAM, 2017.
[CLPR10] Shiri Chechik, Michael Langberg, David Peleg, and Liam Roditty. Fault tolerant spanners for general graphs. SIAM Journal on Computing, 39(7):3403–3423, 2010.
[CPT20] Julia Chuzhoy, Merav Parter, and Zihan Tan. On packing low-diameter spanning trees. In , pages 33:1–33:18, 2020.
[DK11] Michael Dinitz and Robert Krauthgamer. Fault-tolerant spanners: better and simpler. In Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pages 169–178. ACM, 2011.
[DR20a] Michael Dinitz and Caleb Robelle. Efficient and simple algorithms for fault tolerant spanners. 2020.
[DR20b] Michael Dinitz and Caleb Robelle. Efficient and simple algorithms for fault-tolerant spanners. In PODC '20: ACM Symposium on Principles of Distributed Computing, Virtual Event, Italy, August 3-7, 2020, pages 493–500, 2020.
[FK84] Michael L. Fredman and János Komlós. On the size of separating systems and families of perfect hash functions. SIAM Journal on Algebraic and Discrete Methods, 5(1):61–68, 1984.
[FKS84] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538–544, 1984.
[FN01] Emanuela Fachini and Alon Nilli. Recursive bounds for perfect hashing. Discret. Appl. Math., 111(3):307–311, 2001.
[Gop70] Valerii Denisovich Goppa. A new class of linear correcting codes. Problemy Peredachi Informatsii, 6(3):24–30, 1970.
[GRS19] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential Coding Theory. 2019. Available at .
[GS96] Arnaldo Garcia and Henning Stichtenoth. On the asymptotic behaviour of some towers of function fields over finite fields. Journal of Number Theory, 61(2):248–273, 1996.
[GW20] Fabrizio Grandoni and Virginia Vassilevska Williams. Faster replacement paths and distance sensitivity oracles. ACM Trans. Algorithms, 16(1):15:1–15:25, 2020.
[HP20] Yael Hitron and Merav Parter. Round-efficient distributed byzantine computation. CoRR, abs/2004.06436, 2020.
[Nil94] Alon Nilli. Perfect hashing and probability. Comb. Probab. Comput., 3:407–409, 1994.
[NSS95] Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and near-optimal derandomization. In , pages 182–191, 1995.
[Par15] Merav Parter. Dual failure resilient BFS structure. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, pages 481–490, 2015.
[Par19a] Merav Parter. Small cuts and connectivity certificates: A fault tolerant approach. In , 2019.
[Par19b] Merav Parter. Small cuts and connectivity certificates: A fault tolerant approach. CoRR, abs/1908.03022, 2019.
[PP16] Merav Parter and David Peleg. Sparse fault-tolerant BFS structures. ACM Trans. Algorithms, 13(1):11:1–11:24, 2016.
[PY19a] Merav Parter and Eylon Yogev. Low congestion cycle covers and their applications. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 1673–1692, 2019.
[PY19b] Merav Parter and Eylon Yogev. Secure distributed computing made (nearly) optimal. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29 - August 2, 2019, pages 107–116, 2019.
[RS60] Irving S. Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics (SIAM), 8(2):300–304, 1960.
[SAK+01] Kenneth W. Shum, Ilia Aleshnikov, P. Vijay Kumar, Henning Stichtenoth, and Vinay Deolalikar. A low-complexity algorithm for the construction of algebraic-geometric codes better than the Gilbert-Varshamov bound. IEEE Trans. Information Theory, 47(6):2225–2241, 2001.
[Sin64] Richard C. Singleton. Maximum distance q-nary codes. IEEE Trans. Information Theory, 10(2):116–118, 1964.
[SS90] Jeanette P. Schmidt and Alan Siegel. The spatial complexity of oblivious k-probe hash functions. SIAM J. Comput., 19(5):775–786, 1990.
[TVZ82] M. A. Tsfasman, S. G. Vlăduţ, and Th. Zink. Modular curves, Shimura curves, and Goppa codes, better than Varshamov-Gilbert bound. Mathematische Nachrichten, 109(1):21–28, 1982.
[vdBS19] Jan van den Brand and Thatchaphol Saranurak. Sensitive distance and reachability oracles for large batch updates. In , pages 424–435. IEEE, 2019.
[WX01] Huaxiong Wang and Chaoping Xing. Explicit constructions of perfect hash families from algebraic curves over finite fields. J. Comb. Theory, Ser. A, 93(1):112–124, 2001.
[WY13] Oren Weimann and Raphael Yuster. Replacement paths and distance sensitivity oracles via fast matrix multiplication. ACM Transactions on Algorithms (TALG), 9(2):14, 2013.
[YZ05] Raphael Yuster and Uri Zwick. Answering distance queries in directed graphs using fast matrix multiplication. In , pages 389–396. IEEE, 2005.
A Comparison with [Par19a] and [BDR20]
In [Par19a], the second author provided the first deterministic constructions of $(L,f)$-RPC for $L \geq f$. The notion of $(L,f)$-RPC is introduced for the first time in the current paper, and in [Par19a] the construction is referred to as a derandomization of the FT-sampling technique. The construction of [Par19a, Par19b] is based on a computation of a family of perfect hash functions $\mathcal{H} = \{h : [n] \to [2(L+f)^2]\}$ with $\mathrm{poly}(Lf \log n)$ functions. The covering subgraph family $\mathcal{G}$ of [Par19a, Par19b] consists of $|\mathcal{H}| \cdot (4Lf)^{2f} = (4Lf \log n)^{O(1)+2f}$ subgraphs. In the context of [Par19a], it was sufficient for the value of the covering to be polynomial in $L$, and for the computation time to be polynomial in $n$. Also note that despite the fact that [Par19a, Par19b] explicitly consider the setting where $L \geq f$, their construction can be extended to provide a covering of value $\mathrm{poly}(f \log n)$ also for the case of $L \leq f$. Specifically, this can be done by applying very minor modifications to Lemma 17 of [Par19b]: set $a = f$ and $b = L$, then let the set $S_{h,i_1,i_2,\dots,i_b}$ of the lemma be given by
$$S_{h,i_1,i_2,\dots,i_b} = \{\ell \in [n] \mid h(\ell) \in \{i_1,i_2,\dots,i_b\}\}, \quad \forall h \in \mathcal{H} \text{ and } i_1,i_2,\dots,i_b \in [2(L+f)^2]. \quad (8)$$
I.e., the only modification for $L \leq f$ is in replacing the $\notin$ sign with $\in$ in Eq. (8). The argument then follows in a symmetric manner as in the proof of Lemma 17 of [Par19b]. To summarize, the construction of [Par19a, Par19b] provides an $(L,f)$-RPC of value $\mathrm{poly}(\min\{L,f\} \log n)$.

In this work, we considerably optimize the construction of [Par19a] in several ways. First, we almost match the optimal values of $(L,f)$-RPCs for a wide range of parameters (e.g., when $f = O(1)$), providing a polynomial improvement in $\max\{L,f\}$ compared to [Par19a, Par19b]. Second, we establish several key properties of $(L,f)$-RPCs (e.g., Theorem 28) which have extensive applications. Those properties follow immediately by the randomized construction, and are proven in a quite natural manner in our deterministic setting as well.
For example, in order to provide a "perfect" derandomization of the Weimann and Yuster DSO [WY13] as provided in the paper, we must use our nearly optimal constructions of $(L,f)$-RPCs. Using the suboptimal $(L,f)$-RPC constructions of [Par19a, Par19b] leads to a polynomially larger query time compared to that of [WY13]. Third, we provide the first lower bound for the values of the $(L,f)$ covering. We also note that our techniques differ from [Par19a, Par19b] and are based on various coding schemes.

Footnote: This is similar to the random construction of $(L,f)$-RPC, where the sampling probability also differs between the cases $L \leq f$ and $L > f$.

Independently of our work, very recently [BDR20] presented a (randomized) slack version of the greedy algorithm to obtain (vertex) fault-tolerant spanners of optimal size. To derandomize their construction, [BDR20] provided a deterministic construction of an $(L = 2, f)$-RPC (using our terminology) with additional properties. The work of [BDR20] leaves a gap in the running time depending on the value of the number of faults, $f$. Specifically, for $f \geq n^{c}$ for some constant $c$, their derandomization matches the bounds of their randomized construction. In contrast, for smaller values of $f$, there is a gap of a $\mathrm{poly}(f)$ factor in the running time. In our work, using the generalized construction of $(L,f)$-RPC with $L = 2$, and in particular using Theorem 28 (instead of Theorem 5.3 of [BDR20]), we close this gap.

Elaborating, their non-optimality in derandomization stems from a not completely tight analysis of some additional properties of the RPC that they construct, and in order to compensate for this analysis, they rely on using "bulkier" objects such as almost $k$-wise independent families in a black-box manner.

More formally, by using Theorem 28 instead of Lemma 5.3 of [BDR20], we show:

Lemma 47 (Improvement of Thm. 2.1 of [BDR20]). There is a deterministic algorithm which computes an $f$-(vertex) fault tolerant $(2k-1)$-spanner with at most $O(f^{1-1/k} n^{1+1/k})$ edges in time $\widetilde{O}(f^{1-1/k} n^{2+1/k} + m \cdot f^2)$ (matching the bounds of the randomized construction of Theorem 1.1 of [BDR20]).

Proof. Let $\mathcal{G}_{2,f}$ be the $(2,f)$-RPC of Theorem 28. By claim (I2) of Theorem 28, there is a collection of $\widetilde{O}(f^2)$ subgraphs $\mathcal{G}_{P=e}$ that contain both endpoints of $e$. This is the analogue of the set $L_e$ defined by [BDR20]. For every fixed set $F$ of at most $f$ vertex faults, let $\mathcal{G}_{e,F}$ be the subset of subgraphs in $\mathcal{G}_e$ that fully avoid $F$. To provide a spanner of optimal size in Alg. 2 of [BDR20], it is required that for every $P, F$, the ratio $|\mathcal{G}_{e,F}| / |\mathcal{G}_e| \geq c$, for some constant $c$. Indeed, by claim (I4) it holds that $|\mathcal{G}_{P,F}| \geq |\mathcal{G}_P| /$ … $F$. By setting $\tau$ to $1/$ … $|\mathcal{G}_{e,F}| / |\mathcal{G}_e|$ depends also on some parameter $\delta$ of their universal hash function).

It remains to bound the running time. By Theorem 28, for every $e$, computing the collection $\mathcal{G}_e$ takes $\widetilde{O}(f^2)$ time, and thus $\widetilde{O}(f^2 m)$ time for all the edges.
Next, for every fixed edge $e$, computing the vertices of each subgraph in $\mathcal{G}_e$ takes $\widetilde{O}(n/f)$ time per subgraph and $|\mathcal{G}_e| \cdot \widetilde{O}(n/f) = \widetilde{O}(fn)$ in total. The rest of the time argument works line by line as in Lemma 5.6 of [BDR20].

B Missing Proofs
Proof of Lemma 7.
First consider the case where $L \geq f$. Let $\mathcal{G} = \{G_1,\dots,G_r\}$ be a collection of independently sampled subgraphs for $r = c \cdot f \cdot L^f \log n$, where $c$ is a sufficiently large constant. Each subgraph $G_i$ is obtained by sampling each edge $e \in E(G)$ into $G_i$ independently with probability $p = 1 - 1/L$. We now show that $\mathcal{G}$ is indeed an $(L,f)$-RPC. Fix a replacement path $P(s,t,F)$ of length at most $L$ that avoids a set $F$ of edges. The probability that a subgraph $G_i$ covers $P(s,t,F)$ is at least $q = p^L \cdot (1/L)^f = 1/(e \cdot L^f)$. Thus the probability that none of the $r$ subgraphs covers $P(s,t,F)$ is at most $(1-q)^r \leq (1 - 1/(e \cdot L^f))^{c \cdot f \cdot L^f \log n} = 1/n^{c' f}$ for a sufficiently large constant $1 < c' < c$. By taking $c$ to be a sufficiently large constant, and applying the union bound over all $n^{f+2}$ triplets of $s,t,F$, we get that w.h.p. $\mathcal{G}$ is an $(L,f)$-RPC.

Next, assume that $L \leq f$. The definition of $\mathcal{G}$ is almost the same up to a small modification in the selection of the parameters. Set $r = c \cdot f^{L+1} \log n$ and let $p = 1/f$. To see the correctness, fix a replacement path $P(s,t,F)$ with at most $L$ edges. The probability that $G_i$ covers $P(s,t,F)$ is at least $q = p^L \cdot (1-p)^f = 1/(e \cdot f^L)$. Thus the probability that none of the $r$ subgraphs covers $P(s,t,F)$ is at most $(1-q)^r \leq (1 - 1/(e \cdot f^L))^{c \cdot f^{L+1} \log n} = 1/n^{c' f}$ for a sufficiently large constant $1 < c' < c$. By taking $c$ to be a sufficiently large constant, and applying the union bound over all $n^{f+2}$ triplets of $s,t,F$, we get that w.h.p. $\mathcal{G}$ is an $(L,f)$-RPC.

C Improved RPC given Input Sets
In this section, we show an improved
RPC computation based on a given input set D . Specifically,we consider a relaxed notion of the problem as suggested by Alon, Chechik, and Cohen [ACC19]and provide an ( L, f )- RPC for this relaxed notion with nearly optimal covering value. The mainresult of this section is the following.
Theorem 48. Let $L, f$ be integer parameters such that $L \geq f$. There exists an algorithm $\mathcal{A}$ that takes as input a graph $G$ on $n$ vertices and $m$ edges and a list $\mathcal{D} = \{(P_1,F_1),\dots,(P_k,F_k)\}$ of $k$ pairs, each consisting of an $L$-length replacement path $P_i$ and a set of faults $F_i$ that it avoids, and outputs a restricted $(L,f)$-RPC $\mathcal{G}(\mathcal{D})$ satisfying that for every $(P_i,F_i) \in \mathcal{D}$, there is a subgraph $G' \in \mathcal{G}(\mathcal{D})$ that contains $P_i$ and avoids $F_i$. Moreover, the running time of $\mathcal{A}$ is $(m+k) \cdot (\log m)^{O(1)} \cdot (\alpha L f \log m)^f$, where $\alpha \in \mathbb{N}$ is some small universal constant.

Towards the goal of proving Theorem 48, we start by showing that for every $a, b, N$, given an explicit set $\mathcal{S} = \{(A,B) \mid A,B \subseteq [N], |A| \leq a, |B| \leq b, A \cap B = \emptyset\}$, there exists a considerably smaller set of hash functions $\mathcal{H}_{\mathcal{S}} = \{h : [N] \to [q]\}$ with the following property. For every $(A,B) \in \mathcal{S}$, there exists a function $h \in \mathcal{H}_{\mathcal{S}}$ that does not collide on $(A,B)$. The next lemma should be compared with Corollary 18. The latter works for any pair of disjoint sets $A, B$, while the next lemma satisfies the collision-free property for every $(A,B) \in \mathcal{S}$. This allows us to obtain a considerably smaller family of functions.

Lemma 49.
Let $b \leq a \leq N$ all be integers. There is an algorithm $\mathcal{A}$ which, given a set $\mathcal{S} = \{(A,B) \mid A,B \subseteq [N], |A| \leq a, |B| \leq b, A \cap B = \emptyset\}$ and an $[N,a,b,\ell]_q$-Strong HM hash family $\mathcal{H}$ as input, outputs a collection of hash functions $\mathcal{H}_{\mathcal{S}} = \{h : [N] \to [q]\}$ such that the following holds:
• (P1) For every $(A,B) \in \mathcal{S}$, $\exists h \in \mathcal{H}_{\mathcal{S}}$ such that $\forall (x,y) \in A \times B$, we have $h(x) \neq h(y)$.
• (P2) $|\mathcal{H}_{\mathcal{S}}| = O(\log |\mathcal{S}|)$.
Moreover, $\mathcal{A}$ runs in time $O(T_{\mathcal{H}} + a \cdot \ell \cdot |\mathcal{S}|)$, where $T_{\mathcal{H}}$ is the computation time of $\mathcal{H}$.

Proof. For every $(A,B) \in \mathcal{S}$, let $\mathcal{H}_{A,B} = \{i \in [\ell] \mid \forall (x,y) \in A \times B,\ h_i(x) \neq h_i(y)\}$. Since $\mathcal{H}$ is a Strong HM hash family, we have $|\mathcal{H}_{A,B}| \geq \ell/2$. The desired collection of hash functions $\mathcal{H}_{\mathcal{S}}$ is obtained by computing a small hitting set for the sets $\{\mathcal{H}_{A,B} \mid (A,B) \in \mathcal{S}\}$. This can be done by the algorithm of Lemma 36.

We next analyze the computation time. First we compute the $\ell \times N$ Boolean matrix $M_{\mathcal{H}}$ corresponding to $\mathcal{H}$, where the $(i,x)$-th entry of $M_{\mathcal{H}}$ is simply $h_i(x)$. After the computation of $M_{\mathcal{H}}$ we simply go over each $(A,B) \in \mathcal{S}$ and compute the sets $\mathcal{H}_{A,B}$. The computation of all the $\mathcal{H}_{A,B}$ sets takes $O(|\mathcal{S}| \cdot \ell \cdot (a+b))$ time. Then, the set $\mathcal{H}_{\mathcal{S}}$ is computed by applying the hitting set algorithm of Lemma 36 with parameters $n = \ell$, $L = \ell/2$, and $q = |\mathcal{S}|$. Thus the total computation time is $O(T_{\mathcal{H}} + a \cdot \ell \cdot |\mathcal{S}|)$.

Footnote (to Theorem 48): In the problem statement of [ACC19], $k = O(n^{2+\epsilon})$.

Finally, we show how to compute a covering graph family $\mathcal{G}^{*}_{L,f}$ for the critical set $\mathcal{D}_{L,f}$. The proof of the following lemma is similar to that of Theorem 2, but it is based on Lemma 23 and Lemma 49 rather than on Theorem 14. For the sake of brevity, we only prove the lemma below for Reed-Solomon codes, as it suffices to give the claim in Theorem 48.
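The selection step in the proof of Lemma 49 can be sketched in Python as follows. The sketch assumes that the hitting set is computed greedily, which may differ from the algorithm of Lemma 36, and it represents each hash function by its row of the matrix M_H; the names `good_indices` and `select_hashes` are ours. When every set H_{A,B} contains at least half of the functions, an averaging argument shows that some function separates at least half of the remaining pairs, so the greedy loop stops after at most floor(log2 |S|) + 1 rounds.

```python
def good_indices(hashes, A, B):
    """Indices i whose function h_i separates A from B (no x in A, y in B collide)."""
    return {
        i
        for i, h in enumerate(hashes)
        if not ({h[x] for x in A} & {h[y] for y in B})
    }


def select_hashes(hashes, pairs):
    """Greedy hitting set over the sets H_{A,B}; returns indices of chosen functions.

    hashes: list of tables, hashes[i][x] = h_i(x) (rows of the matrix M_H).
    pairs:  iterable of (A, B) with A and B disjoint sets of elements.
    """
    remaining = [good_indices(hashes, A, B) for A, B in pairs]
    if any(not g for g in remaining):
        raise ValueError("some pair is separated by no function in the family")
    chosen = []
    while remaining:
        # Count, for every function, how many still-uncovered pairs it separates.
        counts = {}
        for g in remaining:
            for i in g:
                counts[i] = counts.get(i, 0) + 1
        best = max(counts, key=lambda i: (counts[i], -i))
        chosen.append(best)
        # Drop every pair that the chosen function already separates.
        remaining = [g for g in remaining if best not in g]
    return chosen
```

The returned indices play the role of the collection H_S: by construction, every input pair is separated by at least one chosen function, which is property (P1).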
Lemma 50. Given a critical set $\mathcal{D}$, there is a deterministic algorithm for computing an $(L,f)$-RPC $\mathcal{G}(\mathcal{D})$ of cardinality $O((2Lf \log N)^f \cdot \log(|\mathcal{D}|))$ in time $\widetilde{O}((2Lf \log N)^{f+1} \cdot m + (n^2 \cdot L^f) \cdot (L \cdot f))$.

Proof. Set $a = L$, $b = f$ and $N = m$, and let $\mathcal{S} = \mathcal{D}_{L,f}$. Note that since each pair in $\mathcal{D}$ is given by $(P,F)$ where $P \cap F = \emptyset$, $|P| \leq L$ and $|F| \leq f$, the set $\mathcal{S}$ is a legal input to Lemma 49 combined with the Reed-Solomon Strong HM hash family from Lemma 23.

We then safely apply Lemma 49 to compute a collection of hash functions $\mathcal{H}_{\mathcal{S}} = \{h : [N] \to [2ab \log N]\}$ that satisfies properties (P1) and (P2). For every $h \in \mathcal{H}_{\mathcal{S}}$ and for every $i_1,\dots,i_b \in [1, 2ab \log N]$, define:
$$G_{h,i_1,i_2,\dots,i_b} = \{e_\ell \in E(G) \mid h(\ell) \notin \{i_1,i_2,\dots,i_b\}\}. \quad (9)$$
Overall, $\mathcal{G}(\mathcal{D}) = \{G_{h,i_1,i_2,\dots,i_b} \mid h \in \mathcal{H}_{\mathcal{S}},\ i_1,i_2,\dots,i_b \in [1, 2ab \log N]\}$.

The cardinality of $\mathcal{G}(\mathcal{D})$ is bounded by $O(|\mathcal{H}_{\mathcal{S}}| \cdot (2Lf \log N)^b) = O((2Lf \log N)^f \cdot \log(|\mathcal{D}|))$. To show that $\mathcal{G}(\mathcal{D})$ satisfies the properties of Theorem 48, it is sufficient to show that it resiliently covers all the pairs in the critical set $\mathcal{D}_{L,f}$. Fix $(P,F) \in \mathcal{D}$ where $P$ is a $u$-$v$ path. We will show that there exists at least one subgraph $G' \in \mathcal{G}(\mathcal{D})$ satisfying that $P \subseteq G'$ and $F \cap G' = \emptyset$. Letting $A = E(P)$ and $B = F$, we have that $(A,B) \in \mathcal{S}$. By property (P1) of $\mathcal{H}_{\mathcal{S}}$, there exists a function $h \in \mathcal{H}_{\mathcal{S}}$ that does not collide on $A, B$; that is, $h(i) \neq h(j)$ for every $i \in A$ and $j \in B$. Thus, letting $B = \{s_1,\dots,s_b\}$ and $i_1 = h(s_1),\dots,i_b = h(s_b)$, we have that $h(s'_j) \notin \{i_1,\dots,i_b\}$ for every $s'_j \in A$. Therefore, the subgraph $G_{h,i_1,i_2,\dots,i_b}$ satisfies that $A \subseteq G_{h,i_1,i_2,\dots,i_b}$ and $B \cap G_{h,i_1,i_2,\dots,i_b} = \emptyset$.

Finally, we analyze the computation time. By Lemma 49, the computation of $\mathcal{H}_{\mathcal{S}}$ takes $\widetilde{O}(Lf \cdot m + (n^2 \cdot L^f) \cdot (L \cdot f))$ time. Next, consider the evaluation of all functions in $\mathcal{H}_{\mathcal{S}}$ on all the elements in $[m]$. This takes $\widetilde{O}(m \cdot \log(|\mathcal{D}|))$ time. Next, for a fixed hash function $h \in \mathcal{H}_{\mathcal{S}}$ and $i_1,i_2,\dots,i_b \in [1, 2ab \log N]$, the computation of the subgraph $G_{h,i_1,i_2,\dots,i_b}$ can be done in $O(m)$ time. Thus, the computation of all the subgraphs takes $\widetilde{O}((L \cdot f \log N)^f \cdot \log(n^2 \cdot L^f) \cdot m)$ time.

Finally, we show that for every $(P,F) \in \mathcal{D}$, there are at most $O(\log N)$ subgraphs in $\mathcal{G}(\mathcal{D})$ that contain no edge from $F$.

Lemma 51.