Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs
Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh Saxena, Zhao Song, Huacheng Yu
aa r X i v : . [ c s . D S ] F e b Near-Optimal Two-Pass Streaming Algorithmfor Sampling Random Walks over Directed Graphs
Lijie Chen ∗ Gillat Kol † Dmitry Paramonov ‡ Raghuvansh R. Saxena § Zhao Song ¶ Huacheng Yu ‖ February 23, 2021
Abstract
For a directed graph G with n vertices and a start vertex u start , we wish to (approximately)sample an L -step random walk over G starting from u start with minimum space using an al-gorithm that only makes few passes over the edges of the graph. This problem found manyapplications, for instance, in approximating the PageRank of a webpage. If only a single passis allowed, the space complexity of this problem was shown to be e Θ( n · L ). Prior to our work,a better space complexity was only known with e O ( √ L ) passes.We settle the space complexity of this random walk simulation problem for two-pass stream-ing algorithms, showing that it is e Θ( n ·√ L ), by giving almost matching upper and lower bounds.Our lower bound argument extends to every constant number of passes p , and shows that any p -pass algorithm for this problem uses e Ω( n · L /p ) space. In addition, we show a similar e Θ( n ·√ L )bound on the space complexity of any algorithm (with any number of passes) for the relatedproblem of sampling an L -step random walk from every vertex in the graph. ∗ MIT. [email protected] † Princeton University. [email protected] ‡ Princeton University. [email protected] § Princeton University. [email protected] ¶ Institute for Advanced Study. [email protected] ‖ Princeton University. [email protected]
Introduction
Graph streaming algorithms.
Graph streaming algorithms have been the focus of extensivestudy over the last two decades, mainly due to the important practical motivation in analyzingpotentially huge structured data representing the relationships between a set of entities ( e . g ., thelink graph between webpages and the friendship graph in a social network). In the graph streamingsetting, an algorithm gets access to a sequence of graph edges given in an arbitrary order and itcan read them one-by-one in the order in which they appear in the sequence. The goal here is todesign algorithms solving important graph problems that only make one or few passes through theedge sequence, while using as little memory as possible.Much of the streaming literature was devoted to the study of one-pass algorithms and an Ω( n )space lower bound for such algorithms was shown for many fundamental graph problems. Apartial list includes: maximum matching and minimum vertex cover [FKM +
04, GKK12], s - t reachability and topological sorting [CGMV20, FKM +
04, HRR98], shortest path and diameter[FKM +
04, FKM + s - t ) minimum cut [Zel11], maximal independent set[ACK19, CDK19], and dominating set [AKL16, ER14].Recently, the multi-pass streaming setting received quite a bit of attention. For some graphproblems, allowing a few passes instead of a single pass can reduce the memory consumption of astreaming algorithm dramatically. In fact, even a single additional pass over the input can alreadygreatly enhance the capability of the algorithms. For instance, minimum cut and s - t minimumcut in undirected graphs can be solved in two passes with only e O ( n ) and O ( n / ) space, respec-tively [RSW18] (as mentioned above, any one-pass algorithm for these problems must use Ω( n )space). Additional multi-pass algorithms include an O (1)-pass algorithm for approximate match-ing [GKMS19, GKK12, Kap13, McG05], an O (log log n )-pass algorithm for maximal independentset [ACK19, CDK19, GGK + O (log n )-pass algorithms for approximate dominating set[AKL16, CW16, HPIMV16] and weighted minimum cut [MN20]. Simulating random walks on graphs.
Simulating random walks on graphs is a well-studiedalgorithmic problem with may applications in different areas of computer science, such as connectiv-ity testing [Rei08], clustering [ACL07, AP09, COP03, ST13], sampling [JVV86], generating randomspanning tree [Sch18], and approximate counting [JS89]. Since most applications of random-walksimulation are concerned with huge networks that come from practice, it is of practical interest todesign low-space graph streaming algorithms with few passes for this problem.In an influential paper by Das Sarma, Gollapudi and Panigrahy [SGP11], an e O ( √ L )-pass and e O ( n ) space algorithm for simulating L -step random walks on directed graphs was established.(Streaming algorithms with almost linear space complexity, like this one, are often referred to as semi-streaming algorithms). Using this algorithm together with some additional ideas, [SGP11]obtained space-efficient algorithms for estimating PageRank on graph streams. Recall that thePageRank of a webpage corresponds to the probability that a person that randomly clicks on weblinks arrives at this particular page . However, scanning the sequence of edges e O ( √ L ) times may Given a web-graph G = ( V, E ) representing the webpages and links between them, the PageRank of the ver-tices satisfy
PageRank ( u ) = P ( v,u ) ∈ E PageRank ( v ) /d ( v ), simultaneously for all u , where d ( · ) denotes the out-degree,[BP98].
1e time-inefficient in many realistic settings.In the one-pass streaming setting, a folklore algorithm with e O ( n · L ) space complexity for simu-lating L -step random walks is known [SGP11] (see Section 2.1 for a description of this algorithm),and it is proved to be optimal [Jin19]. We mention that the work of [Jin19] also considers randomwalks on undirected graphs , and shows that e Θ( n · √ L ) space is both necessary and sufficient forsimulating L -step random walks on undirected graphs with n vertices in one pass.Both of these known algorithms for general directed graphs have their advantages and dis-advantage (either requiring many passes or more space). A natural question is whether one caninterpolate between these two results and obtain an algorithm with pass complexity much smallerthan √ L , yet with a space complexity much smaller than n · L . Prior to our work, it was not evenknown if an o ( √ L )-pass streaming algorithm with n · L . space is possible. We answer the above question in the affirmative by giving a two-pass streaming algorithm with e O ( n · √ L ) space for sampling a random walk of length L on a directed graph with n vertices. Wecomplement this result by an almost matching e Ω( n · √ L ) lower bound on the space complexity ofevery two-pass streaming algorithm for this problem. In fact, our two-pass lower bound generalizesto an e Ω( n · L /p ) lower bound on the space consumption of any p -pass algorithm, for a constant p . For a directed graph G = ( V, E ), a vertex u start ∈ V and a non-negative integer L , we use RW GL ( u start ) to denote the distribution of L -step random walks ( v , . . . , v L ) in G starting from v = u start (see Section 3.2 for formal definitions). For a distribution D over a finite domain Ω,we say that a randomized algorithm samples from D if, over its internal randomness, it outputsan element ω ∈ Ω distributed according to D . We give a space-efficient streaming algorithm for(approximate) sampling from RW GL ( u start ) with small error: Theorem 1.1 (Two-pass algorithm) . There exists a streaming algorithm A two - pass that given an n -vertex directed graph G = ( V, E ) , a starting vertex u start ∈ V , a non-negative integer L indicat-ing the number of steps to be taken, and an error parameter δ ∈ (0 , /n ) , satisfies the followingconditions:1. A two - pass uses at most e O ( n · √ L · log δ − ) space and makes two passes over the input graph G .2. A two - pass samples from some distribution D over V L +1 satisfying kD − RW GL ( u start ) k TV ≤ δ . Our algorithm can also be generalized to the turnstile model, paying a poly log n factor in thespace usage. See Section 4.4.Observe that our algorithm A two - pass allows for a considerable saving in space compared to thefolklore single-pass algorithm ( e O ( n · √ L ) vs e O ( n · L )) and considerable saving in the number ofpasses compared to [SGP11] (2 vs e O ( √ L )), at least if we allow some small error δ . The e O hides logarithmic factors in n . We may assume without loss of generality that L ≤ n , as otherwise n · √ L > n and that algorithm can store the entire input graph.
2e mention that A two - pass can also be used to sample a random path from every vertex of G with the same storage cost of e O ( n · √ L · log δ − ) and two passes . This is because A two - pass satisfiesthe useful property of obliviousness to the starting vertex u start , meaning that it scans the inputgraph before the start vertex is revealed. More formally, we say that an algorithm A is obliviousto the starting vertex if it first runs a preprocessing algorithm P and then a sampling algorithm S ;the algorithm P reads the input graph stream without knowing the starting vertex u start (if A is a p -pass streaming algorithm, P makes p passes over the input graph stream), and outputs a string; S takes both the string outputted by P and a starting vertex u start as an input, and outputs a walkon the input graph G . Note, however, that the random walks from different vertices in the graph may be correlated. We count towards the space complexity only the space on the work tape used by the algorithm and do not countspace on the output tape (otherwise an Ω( n · L ) lower bound is trivial). .2.2 Lower Bounds We prove the following lower bound:
Theorem 1.2 (Multi-pass lower bound) . Fix a constant β ∈ (0 , and an integer p ≥ . Let n ≥ be a sufficiently large integer and let L = ⌈ n β ⌉ . Any randomized p -pass streaming algorithmthat, given an n -vertex directed graph G = ( V, E ) and a starting vertex u start ∈ V , samples from adistribution D such that kD − RW GL ( u start ) k TV ≤ − n requires e Ω( n · L /p ) space. Plugging in p = 2 in Theorem 1.2, implies that our two-pass algorithm from Theorem 1.1 isessentially optimal. Also, with p = 1, the theorem reproduces the one-pass lower bound by [Jin19].In addition, Theorem 1.2 rules out the possibility of a semi-streaming algorithm with any constantnumber of passes.Recall from Section 1.2.1, that our two-pass algorithm A two - pass utilizes e O ( n · √ L ) space and isoblivious to the starting vertex. Interestingly, we are able to show that any oblivious algorithm forrandom walk sampling (with any number of passes) requires e Ω( n · √ L ) space. Thus, any algorithmfor random walk sampling with significantly less space than ours, has to be inherently differentand have its storage depend on the starting vertex. Our lower bound for oblivious algorithms alsoimplies that A two - pass gives an almost optimal algorithm for sampling a pass from every start vertex,even if any number of passes are allowed. Theorem 1.3 (Lower bound for oblivious algorithms) . Let n ≥ be a sufficiently large integer andlet L denote an integer satisfying that L ∈ [log n, n ] . Any randomized algorithm that is obliviousto the start vertex and given an n -vertex directed graph G = ( V, E ) and a starting vertex u start ∈ V ,samples from a distribution D such that kD − RW GL ( u start ) k TV ≤ − n requires e Ω( n ·√ L ) space . Better space complexity with more passes?
Our results leave open a couple of interestingdirections for future work. The most significant open question is to understand the streamingspace complexity of sampling random walks with more than two passes. In particular, Theorem 1.2implies that a three-pass streaming algorithm has space complexity at least e Ω( n · L / ). Can one get e O ( n · L / ) space with three passes, or at least O ( n · L / − ε ) space, for some constant ε >
0? Notethat, as explained in Section 1.2.2, such an algorithm must utilize its knowledge of the startingvertex when it reads the graph stream.Theorem 1.2 does not rule out semi-streaming e O ( n ) space algorithms even when p is a moder-ately growing function of n and L . In [SGP11], it is shown that such an e O ( n ) space algorithm existswith p = e O ( √ L ) passes. Does a semi-streaming algorithm with, say, poly log( L ) passes exist? Undirected graphs?
It would also be interesting to see what is the best two-pass streamingalgorithm for simulating random walks on undirected graphs . Specifically, is it possible to combineour algorithm with the algorithm from [Jin19] to obtain an improvement over the optimal e O ( n ·√ L )space complexity of a one-pass streaming algorithm for this problem? In fact, we show that Theorem 1.3 holds even if the preprocessing algorithm P and the sampling algorithm S areallowed to use an arbitrarily large amount of memory, as long as P passes a string of length at most (roughly) n · √ L to S . nly outputting the end vertex? Finally, our lower bounds only apply to the case where thealgorithms need to output an entire random path ( v , . . . , v L ). If instead only the last vertex v L in the random walk is required, can one design better two-pass algorithms or prove a non-triviallower bound? We next overview our two-pass algorithm from Theorem 1.1, that simulates random walks withonly e O ( n · √ L ) space. The folklore one-pass algorithm.
Before discussing our algorithm, it would be instructiveto review the folklore e O ( n · L )-space one-pass algorithm for simulating L -step random walks in adirected graph G = ( V, E ) (for simplicity, we will always assume L ≤ n in the discussions). Thealgorithm is quite simple:1. For every vertex v ∈ V , sample L of its outgoing neighbors with replacement and store themin a list L save v of length L (that is, for each j ∈ [ L ], the j -th element of L save v is an independentuniformly random outgoing vertex of v ). This can be done in a single pass over input graphstream using reservoir sampling [Vit85].2. Given a starting vertex u start ∈ V , our random walk starts from u start and repeats the followingfor L steps: suppose we are currently at vertex v and it is the k -th time we visit this vertex,then we go from v to the k -th vertex in the list L save v .It is not hard to see that the above algorithm works: whenever we visit a vertex v ∈ V , the nextelement in the list L save v will always be a uniformly random outgoing neighbor of v , conditionedon the walk we have produced so far; and we will never run out of the available neighbors of v as | L save v | = L . A naive attempt and the obstacles.
Since we are aiming at only using e O ( n · √ L ) space, anaive attempt to improve the above algorithm is to just sample and store τ = O ( √ L ) outgoingneighbors instead of L neighbors, and simulate the walk starting from u start in the same way. Theissue here is that, during the simulation of an L -step walk, whenever one visits a vertex v morethan τ times, one would run out of available vertices in the list L save v , and the algorithm can nolonger produce a legit random walk. For a simple example, imagine we have a star-like graphwhere n − L -step random walkstarting at the center would require at least Ω( L ) samples from the center’s neighbors, and ournaive algorithm completely breaks. Our approach: heavy and light vertices.
Observe, however, that in the above example ofa star-like graph, we are only at risk of not storing enough random neighbors of the center node,as an L -step random walk would only visit the other non-center vertices a very small number of5imes. Thus, the algorithm may simply record all edges from the center with only O ( n ) space. Thisobservation inspires the following approach for a two-pass algorithm:1. In the first pass, we identify all the vertices that are likely to be visited many times by arandom walk (starting from some vertex). We call such vertices heavy , while all other verticesare called light .2. In the second pass, we record all outgoing neighbors of all heavy vertices, as well as O ( τ )random outgoing neighbors with replacement of each of the light vertices.Observe that the obtained algorithm is indeed oblivious to the starting vertex : the two passesdescribed above do not use the starting vertex. Still, given the set of outgoing neighbors stored bythe second pass, we are able to sample a random walk from any start vertex. First pass: how do we detect heavy vertices?
The above approach requires that we detect,in a single pass, all vertices v that with a decent probability (say, 1 / poly( n )), are visited morethan O ( τ ) times by an L -step random walk. To this end, we observe that if a random walk visitsa vertex v more than τ times, this random walk must follow more than τ − v in L steps. This, in turn, implies that a random walk that starts from v is likely to return to v inroughly L/τ = O ( √ L ) steps.The above discussion suggests the following definition of heavy vertices: a vertex v is heavy , ifa random walk starting from v is likely (say, with probability at least 1 /
3) to revisit v in O ( √ L )steps. Indeed, this property is much easier to detect: we can run O (log n ) independent copies ofthe folklore one-pass streaming algorithms to sample O (log n ) O ( √ L )-step random walks startingfrom v , and count how many of them return to v at some step. Second pass: can we afford to store the neighbors?
In Lemma 4.11, we show that fora light vertex v , an L -step random walk starting at any vertex visits v O ( √ L ) times with highprobability. Therefore, in the second pass, we can safely record only O ( √ L ) outgoing neighbors forall light vertices. Still, we have to record all the outgoing neighbors for heavy vertices.The crux of our analysis is a structural result about directed graphs, showing that the totaloutgoing degree of all heavy vertices is bounded by O ( n · √ L ), and therefore we can simply storeall of their outgoing neighbors. This is proved in Lemma 4.3, which may also be of independentinterest. Intuition behind the structure lemma.
Finally, we discuss the insights behind the abovestructure lemma for directed graphs. We will use d out ( v ) to denote the number of outgoing neighborsof v . For concreteness, we now say a vertex v is heavy if a random walk starting from v revisits v in √ L steps with probability at least 1 / V heavy ⊆ V be the set of heavy vertices and let v ∈ V heavy . By a simple calculation, onecan see that for at least a 1 / u of v , a random walk starting from u visits v in √ L steps with probability at least 1 /
6. The key insight is to consider the number ofpairs ( u, v ) ∈ V such that a random walk starting from u visits v in √ L steps with probability atleast 1 /
6. We will use S to denote this set. 6 By the previous discussions, we can see that for each heavy vertex v , it adds at least 1 / d out ( v ) pairs to the set S . Hence, we have |S| ≥ · X v ∈ V heavy d out ( v ) . (1) • On the other hand, it is not hard to see that for each vertex v , there are at most O ( √ L )many pairs of the form ( v, u ) ∈ S , since a √ L -step walk can visit only √ L vertices. So wealso have |S| ≤ O ( n √ L ) . (2)Putting the above (Equation 1 and Equation 2) together, we get the desired bound X v ∈ V heavy d out ( v ) ≤ O ( n √ L ) . p -Pass Algorithms We now describe the ideas behind the proof of Theorem 1.2, our e Ω( n · L /p ) space lower boundfor p -pass randomized streaming algorithms for sampling random walks. We mention that manyof the tools developed for proving space lower bounds are not directly applicable when one wishesto lower bound the space complexity of a sampling task and are more suitable for proving lowerbounds on the space required to compute a function or a search problem . From sampling to function computation.
Our way around this is to first prove a reductionfrom streaming algorithms that sample a random walk from u start to streaming algorithms thatcompute the ( p + 1)-neighborhood of the vertex u start . This is done by considering a graph where arandom walk returns to the vertex u start every p + 2 steps. If p is a constant, then a random walkof length L on such a graph can be seen as L/ ( p + 2) = O ( L ) copies of a random walk of length p + 2. Observe that if the ( p + 1)-neighborhood of the vertex u start has (almost) L vertices (andthe probability of visiting each vertex is more or less uniform), then a random walk of length L islikely to visit all the vertices in the neighborhood and an algorithm that samples a random walkalso outputs the entire neighborhood with high probability. A lower bound for computing the ( p + 1) -neighborhood via pointer-chasing. Havingreduced sampling a random walk to outputting the ( p + 1)-neighborhood, we now need to provethat a space efficient p -pass streaming algorithms cannot output the ( p + 1)-neighborhood of u start ,if this neighborhood has roughly L vertices. This is reminiscent of the “ pointer-chasing ” lowerbounds found in the literature.Pointer-chasing results are typically concerned with a graph with p +1 layers of vertices ( p layersof edges) and show that given a vertex in the first layer, finding a vertex that is reachable from One such tool that cannot be used directly for our purpose is the very useful
Yao’s minimax principle [Yao77] thatallows proving randomized communication lower bounds by proving the corresponding distributional (deterministic)communication lower bounds. V V V p +2 ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· Figure 1: A depiction of our hard instance for p -pass streaming algorithms. Some edges omitted.it in the last layer cannot be done with less than p passes, unless the memory is huge. Classicalpointer-chasing lower bounds ( e.g. , [NW91]), consider graphs where the out-degree of each vertexis 1, thus the start vertex reaches a unique vertex in the last layer. Unfortunately, this type ofpointer-chasing instances are very sparse and a streaming algorithm can simply remember the entiregraph in one pass using e O ( n ) memory.Since we wish to have roughly L vertices in a ( p + 1)-neighborhood of u start , the out-degree ofeach vertex should be roughly Ω( L p +1 ) (assuming uniform degrees). Pointer-chasing lower boundsfor this type of dense graphs were also proved ( e.g. , [GO16] and [FKM + p -passalgorithms essentially need to store an entire layer of edges, which is Ω( n · L p +1 ) in our case.However, this still does not give us the Ω( n · L p ) lower bound we aspire for (and which is tight, atleast for two passes). Towards a tight lower bound: combining dense and sparse.
To get a better lower bound,we construct a hard instance that is a combination of the two above mentioned types of pointer-chasing instances, the dense and the sparse. Specifically, for a p -pass lower bound, we construct alayered graph with p + 2 layers of vertices V , . . . , V p +2 , where the first layer has only one vertex u start and all the other layers are of equal size (see Figure 1). To ensure that vertex u start is reachedevery p +2 steps, we connect all vertices in the last layer to u start . Every vertex in layers V , . . . , V p +1 connects to a random set of roughly L p vertices in the next layer. Using Guruswami and Onakstyle arguments ([GO16]), it can be shown that when the edges are presented to the algorithmfrom right to left, finding a vertex in layer V p +2 that is reachable form a given vertex in V with a( p − n · L p ) space. We “squeeze out” an extra pass in the algorithmby connecting the start vertex u start in V to a single random vertex in V . Note that with thisconstruction, it is indeed the case that a ( p + 1)-neighborhood of u start consists of only roughly L vertices, but still, the out-degrees of vertices in V , . . . , V d +1 are roughly L p instead of only L p +1 .8 .3 Lower bounds for Oblivious Algorithms Finally, we discuss the intuitions behind the proof of Theorem 1.3, showing that any algorithm thatis oblivious to the starting vertex must use e Ω( n ·√ L ) space space. Our proof is based on a reductionfrom a multi-output generalization of the well-studied INDEX problem for one-way communicationprotocols, denoted by
INDEX m,ℓ . In
INDEX m,ℓ , Alice gets ℓ strings X , . . . , X ℓ ∈ { , } m and Bobgets an index i ∈ [ ℓ ]. Alice sends a message to Bob and then Bob is required to output the string X i . (Note that when m = 1 it becomes the original INDEX problem).It is not hard to show that any one-way communication protocol solving
INDEX m,ℓ with non-trivial probability (say, 1 / poly log( m )) requires Alice to send at least e Ω( mℓ ) bits to Bob (see Lemma C.3).Our key observation here is that if there is a starting vertex oblivious algorithm A = ( P , S )with S space for approximate simulation of an L = e O ( m )-step random walk on a graph with n = O ( √ m · ℓ ) vertices, then it implies a one-way communication protocol for INDEX m,ℓ with com-munication complexity S and a decent success probability. Recall the lower bound for INDEX m,ℓ ,we immediately have S = e Ω( mℓ ) = e Ω( n √ L ).In more detail, given an m -bit string X , we will build an O ( √ m )-vertex graph H ( X ) by encodingall bits of X as existence/non-existence of edges in H (this is possible since there are more than m potential edges in H ). We also add some artificial edges to H to make sure it is strongly connected.Our construction will make sure that an L = e O ( m ) steps random walk in H will reveal all edges in H with high probability, which in turn reveals all bits of X (see the proof of Theorem 1.3 for moredetails).Now the reduction can be implemented as follows: given ℓ strings X , . . . , X ℓ ∈ { , } m , Aliceconstructs a graph G = F ℓi =1 H ( X i ), as the joint union of ℓ graphs. Note that G has n = O ( √ m · ℓ )vertices. Alice then runs the preprocessing algorithm P on G to obtain a string M , and sends it toBob. Given an index i ∈ [ ℓ ], Bob simply runs S with M together with a suitable starting vertexinside the H ( X i ) component of G . By previous discussions, this reveals the string X i with highprobability and proves the correctness of this reduction. Hence, the space complexity of A must be e Ω( mℓ ) = e Ω( n √ L ). Organization of this paper
In Section 3 we introduce the necessary preliminaries for this paper. In Section 4 we present ournearly optimal two-pass streaming algorithm for simulating random walks and prove Theorem 1.1.In Section 5 we prove our lower bounds against general multi-pass streaming algorithms for sim-ulating random walks (Theorem 1.2). In Appendix A we present some additional preliminaries ininformation theory. In Appendix B we provide some missing proofs in Section 5. In Appendix Cwe prove Theorem 1.3.
Let n ∈ N . We use [ n ] to denote the set { , . . . , n } . We often use sans-serif letters ( e . g ., X ) todenote random variables, and calligraphic font letters ( e . g ., X ) to denote distributions. For two9andom variables X and Y , and for Y ∈ supp( Y ), we use ( X | Y = Y ) to denote X conditioned on Y = Y . For two lists a and b , we use a ◦ b to denote their concatenation.For two distributions D and D on set X and Y respectively, we use D ⊗ D to denote theirproduct distribution over X × Y , and kD − D k TV to denote the total variation distance betweenthem. In this paper we will always consider directed graphs without multi-edges. A directed G is a pair( V, E ), where V is the vertex set and E ⊆ V × V is the set of all edges.For a vertex u in a graph G = ( V, E ), we let N G out ( u ) := { v : ( u, v ) ∈ E } and N G in ( u ) := { v :( v, u ) ∈ E } . We also use d G out ( u ) and d G in ( u ) to denote its out and in degrees ( i . e ., | N G out ( u ) | and | N G in ( u ) | ). For an edge ( u, v ) ∈ E , we say v is the out-neighbor of u and u is the in-neighbor of v . Random walks on directed graphs.
For a vertex u in a graph G = ( V, E ) and an non-negativeinteger L , an L -step random walk ( v , v , . . . , v L ) starting at u is generated as follows: set v = u ,for each i ∈ [ L ], we draw v i uniformly random from N G out ( u ). We say that v = u is the 0-th vertexon the walk, and v i is the i -th vertex for each i ∈ [ L ]. We use RW GL ( u ) to denote the distributionof an L -step random walk starting from u in G .We use visit G [ a,b ] ( u, v ) to denote the probability of a b -step random walk starting from u visits v between the a -th vertex and b -th vertex on the walk.We often omit the superscript G when the graph G is clear from the context. Starting vertex oblivious algorithms.
Now we formally define a starting vertex obliviousstreaming algorithm for simulating random walks.
Definition 3.1.
We say a p -pass S -space streaming algorithm A for simulating random walks isstarting vertex oblivious, if A can be decomposed into a preprocessing subroutine P and a samplingsubroutine S , such that:1. ( Starting vertex oblivious preprocessing phase ) P makes p passes over the input graphstream, using at most S words of space. After that, P outputs at most S words, denoted as M .2. ( Sampling phase ) S takes both the starting vertex u start and M as input, and outputs adesired walk starting from u start , using at most S words of space. The following standard concentration bounds will be useful for us.
Lemma 3.2 (Multiplicative Chernoff bound, [Che52]) . Suppose X , · · · , X n are independent ran-dom variables taking values in [0 , . Let X denote their sum and let µ = E [ X ] denote the sum’sexpected value. Then, Pr ( X ≥ (1 + δ ) µ ) ≤ e − δ µ δ , ∀ ≤ δ,
10r ( X ≤ (1 − δ ) µ ) ≤ e − δ µ , ∀ ≤ δ ≤ . In particular, we have that:
Pr ( X ≥ (1 + δ ) µ ) ≤ e − δµ · min( δ, , ∀ ≤ δ, Pr ( | X − µ | ≥ δµ ) ≤ · e − δ µ , ∀ ≤ δ ≤ . We also need the following Azuma-Hoeffding inequality.
Lemma 3.3 (Azuma-Hoeffding inequality, [Azu67, Hoe94]) . Let Z , . . . , Z n be random variablessatisfying (1) E [ | Z i | ] < ∞ for every i ∈ { , . . . , n } and E [ Z i | Z , . . . , Z i − ] ≤ Z i − for every i ∈ [ n ] (i.e., { Z i } forms a supermartingale) and (2) for every i ∈ [ n ] , | Z i − Z i − | ≤ , then for all λ > ,we have Pr[ Z n − Z ≥ λ ] ≤ exp( − λ / n ) . In particular, the following corollary will be useful for us.
Corollary 3.4 (Azuma-Hoeffding inequality for Boolean random variables, [Azu67, Hoe94]) . Let X , . . . , X n be random variables satisfying X i ∈ { , } for each i ∈ [ n ] . Suppose that E [ X i | X , . . . , X i − ] ≤ p i for all i . Then for any λ > , Pr " n X i =1 X i ≥ λ + X i =1 p i ≤ exp( − λ / n ) . Proof.
For i ∈ { , . . . , n } , let Z i = P ij =1 ( X j − p j ). From the assumption one can see that all the Z i form a supermartingale and | Z i − Z i − | ≤
1, hence the corollary follows directly from Lemma 3.3.
We will need (a weak form of) Stirling’s approximation. We include a proof for completeness.
Lemma 3.5.
For all n > , we have e · (cid:16) n e (cid:17) n ≤ n ! ≤ e n · (cid:16) n e (cid:17) n . Proof.
We have: n n n ! = n n − ( n − n − Y i =1 (cid:18) i + 1 i (cid:19) i = n − Y i =1 (cid:18) i (cid:19) i ≤ n − Y i =1 e = e n − . We also have: n ! n n +1 = ( n − n n = n − Y i =1 (cid:18) ii + 1 (cid:19) i +1 ≤ n − Y i =1 (cid:18) − i + 1 (cid:19) i +1 ≤ n − Y i =1 e − = e − n . Rearranging gives the result. 11he following bound on binomial coefficients follows:
Lemma 3.6.
For all < k < n , we have n · (cid:16) nk (cid:17) k · (cid:18) nn − k (cid:19) n − k ≤ (cid:18) nk (cid:19) ≤ n · (cid:16) nk (cid:17) k · (cid:18) nn − k (cid:19) n − k . Proof.
We have: (cid:18) nk (cid:19) = n !( n − k )! · k ! ≥ e · (cid:0) n e (cid:1) n e k · (cid:0) k e (cid:1) k · e( n − k ) · (cid:0) n − k e (cid:1) n − k (Lemma 3.5) ≥ n · (cid:16) nk (cid:17) k · (cid:18) nn − k (cid:19) n − k . We also have: (cid:18) nk (cid:19) = n !( n − k )! · k ! ≤ e n · (cid:0) n e (cid:1) n e · (cid:0) k e (cid:1) k · e · (cid:0) n − k e (cid:1) n − k (Lemma 3.5) ≤ n · (cid:16) nk (cid:17) k · (cid:18) nn − k (cid:19) n − k . In this section, we present our two-pass streaming algorithms for simulating random walks ondirected graphs.
We first define the notion of heavy and light vertices.
Definition 4.1 (Heavy and light vertices) . Given a directed graph G = ( V, E ) with n vertices and ℓ ∈ N . • ( Heavy vertices .) We say a vertex u is ℓ -heavy in G , if visit [1 ,ℓ ] ( u, u ) ≥ / (i.e., if arandom walk starting from u will revisit u in at most ℓ steps with probability at least / .) • ( Light vertices .) We say a vertex u is ℓ -light in G , if visit [1 ,ℓ ] ( u, u ) ≤ / (i.e., if a randomwalk starting from u will revisit u in at most ℓ steps with probability at most / .)We also let V ℓ heavy ( G ) and V ℓ light ( G ) be the sets of ℓ -heavy and ℓ -light vertices in G . When G and ℓ are clear from the context, we simply refer to them as V heavy and V light . emark 4.2. Note that if the revisiting probability is between [1 / , / , then the vertex is consid-ered to be both heavy and light. The following lemma is crucial for the analysis of our algorithm.
Lemma 4.3 (Upper bounds on the total out-degrees of heavy vertices) . Given a directed graph G with n vertices and ℓ ∈ N , it holds that X u ∈ V ℓ heavy ( G ) d out ( u ) ≤ O ( n · ℓ ) . Proof.
We define a set S of pairs of vertices as follows: S := { ( u, v ) ∈ V : visit [0 ,ℓ ] ( u, v ) ≥ / } . That is, a pair of vertices u and v belongs to S if and only if an ℓ -step random walk starting from u visits v with probability at least 1 / u , we further define S u := (cid:8) v ∈ N out ( u ) | visit [0 ,ℓ ] ( v, u ) ≥ / (cid:9) , and H u := (cid:8) v ∈ V | visit [0 ,ℓ ] ( u, v ) ≥ / (cid:9) . The following claim will be useful for the proof.
Claim 4.4.
The following two statements hold:1. For every u ∈ V , it holds that |H u | ≤ O ( ℓ ) .2. For every u ∈ V heavy , it holds that |S u | ≥ / · d out ( u ) .Proof. Fixing u ∈ V , the first item follows from the simple fact that X v ∈ V visit [0 ,ℓ ] ( u, v ) ≤ ℓ + 1 . Now we move to the second item, and fix u ∈ V heavy . For the sake of contradiction, supposethat |S u | < / · d out ( u ). We have visit [1 ,ℓ ] ( u, u ) = E v ∈ N out ( u ) [ visit [0 ,ℓ − ( v, u )] ≤ E v ∈ N out ( u ) [ visit [0 ,ℓ ] ( v, u )] < Pr v ∈ N out ( u ) [ v ∈ S u ] · v ∈ N out ( u ) [ v / ∈ S u ] · / < / / < / , a contradiction to the assumption that u is heavy.Finally, note that by definition of H u and S u we immediately have |S| = X u ∈ V |H u | ≥ X u ∈ V |S u | .
13y Claim 4.4, we have X u ∈ V heavy d out ( u ) ≤ · X u ∈ V |S u | ≤ · X u ∈ V |H u | ≤ O ( n · ℓ ) , which completes the proof. We first describe a simple one-pass algorithm for simulating random walks, which will be usedas a sub-routine in our two-pass algorithm. Moreover, this one-pass algorithm is starting vertexoblivious, which will be crucial for us later.
Reservoir sampling in one pass.
Before describing our one-pass subroutine, we need thefollowing basic reservoir sampling algorithm.
Lemma 4.5 ([Vit85]) . Given input access to a stream of n items such that each item can bedescribed by O (1) words, we can uniformly sample m of them without replacement using O ( m ) words of space. Using m independent reservoir samplers each with capacity 1, one can also sample m itemsfrom the stream with replacement in a space-efficient way. Corollary 4.6.
Given input access to a stream of n items such that each item can be described by O (1) words, we can uniformly sample m of them with replacement using O ( m ) words of space. Description of the one-pass algorithm.
Now we describe our one-pass algorithm for simu-lating random walks. Our algorithm A one - pass is starting vertex oblivious, and can be describedby a preprocessing subroutine P one - pass and a sampling subroutine S one - pass . Recall that as definedin Definition 3.1, P one - pass takes a single pass over the input graph streaming without knowing thestarting vertex u start , and S one - pass takes the output of P one - pass together with u start , and outputs adesired sample fo the random walk. Algorithm 1
Preprocessing phase of A one - pass : P one - pass ( G, τ, V full ) Input:
One pass streaming access to a directed graph G = ( V, E ). A parameter τ ∈ N . A subset V full ⊆ V , and we also let V samp = V \ V full . For each vertex v ∈ V full , we record all its out-neighbors in the list L save v . (That is, V full standsfor the set of vertices that we keep all its edges.) For each vertex v ∈ V samp , using Corollary 4.6, we sample τ of its out-neighbors uniformly atrandom with replacement in the list L save v . (That is, V samp stands for the set of vertices that wesample some of its edges.) For a big enough constant c >
1, whenever the number of out-neighbors stored exceeds c · τ · n ,the algorithm stops recording them. If this happens, we say the algorithm operates incorrectly and otherwise we say it operates correctly . Output:
A collection of lists ~L save = { L save v } v ∈ V .14 lgorithm 2 Sampling phase of A one - pass : S one - pass ( V, u start , L, V full , ~L save = { L save v } v ∈ V ) Input:
A starting vertex u start . The path length L ∈ N . A subset V full ⊆ V , and we also let V samp = V \ V full . Let v = u start . For each v ∈ V , we set k v = 1. for i := 1 → L do if v i − ∈ V full then v i is set to be a uniformly random element from L save v i − else if k v i − > | L save v i − | then return failure else v i ← ( L save v i − ) k vi − . k v i − ← k v i − + 1. end if end forOutput: The walk ( v , v , . . . , v L ). Analysis of the one-pass algorithm.
Now we analyze the correctness of our one-pass algo-rithm. We first observe its space complexity can be easily bounded.
Observation 4.7 (Space complexity of A one - pass ) . Given a directed graph G = ( V, E ) with n vertices. For every τ ∈ N and subset V full ⊆ V , P one - pass ( G, τ, V full ) always takes at most O ( τ · n ) words of space. Next we bound the statistical distance between its output distribution and the correct distri-bution of the random walk by the following lemma.
Lemma 4.8 (Correctness of A one - pass ) . Given a directed graph G = ( V, E ) with n vertices. Forevery integers τ, L ∈ N and subset V full ⊆ V such that τ · ( n − | V full | ) + P v ∈ V full d out ( v ) ≤ c · τ · n ,let ~ L save be random variable of the output of P one - pass ( G, τ, V full ) . For every u start ∈ V , the outputdistribution of S one - pass ( V, u start , L, V full , ~ L save ) has statistical distance β to RW GL ( u start ) , where β isthe probability that S one - pass ( V, u start , L, V full , ~ L save ) outputs failure.Proof. Conclude from τ · ( n − | V full | ) + P v ∈ V full d out ( v ) ≤ c · τ · n that P one - pass ( G, τ, V full ) alwaysoperates correctly.To bound the statistical distance between the distribution of S one - pass ( V, u start , L, V full , ~ L save ) and RW GL ( u start ). We construct another random variable ( ~ L save ) ′ , in which for every vertex u , we sampleanother L out-neighbors of u uniformly at random with replacement, and add them to the end ofthe list L save v in ~ L save .Note that S one - pass ( V, u start , L, V full , ( ~ L save ) ′ ) never outputs failure, and distributes exactly thesame as RW GL ( u start ). On the other hand, S one - pass ( V, u start , L, V full , ( ~ L save ) ′ ) and S one - pass ( V, u start , L, V full , ~ L save )are the same as long as S one - pass ( V, u start , L, V full , ~ L save ) does not output failure, which completes theproof.The following corollary follows immediately from the lemma above. (Note that this special caseexactly corresponds to the folklore one-pass streaming algorithm for simulating random walks.)15 orollary 4.9. Given a directed graph G = ( V, E ) with n vertices and an integer L ∈ N . Let ~ L save be random variable of the output of P one - pass ( G, L, ∅ ) . For every u start ∈ V , the output distributionof S one - pass ( V, u start , L, ∅ , ~ L save ) distributes identically as RW GL ( u start ) . Description of the two-pass algorithm.
Now we are ready to describe our two pass algorithm A two - pass , which is also starting vertex oblivious, and can be described by the following two sub-routines P two - pass and S two - pass . Algorithm 3
Preprocessing phase of A two - pass : P two - pass ( G, L, δ ) Input:
A directed graph G = ( V, E ) with n vertices. An integer L ∈ N . A failure parameter δ ∈ (0 , /n ). We also let ℓ = √ L , and γ = c · log δ − where c ≥ First pass: estimation of heavy and light vertices.
1. Run γ independent instances of P one - pass ( G, ℓ, ∅ ) and let ( L save ) (1) , . . . , ( L save ) ( γ ) be thecorresponding collections of lists.2. For each vertex u ∈ V , by running S one - pass ( V, u, ℓ, ∅ , ( L save ) ( j ) ) for each j ∈ [ γ ], we take γ independent samples from RW Gℓ . Let w u be the fraction of these random walks thatrevisit u in ℓ steps.3. Let e V heavy be the set of vertices with w u ≥ .
5, and e V light be the set of vertices with w u < . Second Pass: heavy-light edge recording
1. Let V full = e V heavy .2. Run P one - pass ( G, γ · ℓ, V full ) to obtain a collection of lists ~L save . Output:
The set V full and the collection of lists ~L save . Algorithm 4
Sampling phase of A two - pass : S two - pass ( V, u start , L, V full , ~L save = { L save v } v ∈ V ) Input:
A starting vertex u start . The path length L ∈ N . A subset V full ⊆ V , and a collection oflists ~L save . Output:
Simulate S one - pass ( V, u start , L, V full , ~L save ) and return its output.
Analysis of the algorithm.
We first show that with high probability, e V light and e V heavy are subsetsof V light and V heavy respectively. Lemma 4.10.
Given a directed graph G = ( V, E ) with n vertices, L ∈ N and δ ∈ (0 , /n ) , letting ℓ = √ L , with probability at least − δ/ over the internal randomness of P two - pass ( G, L, δ ) , it holdsthat e V light ⊆ V light and e V heavy ⊆ V heavy .Proof. Setting c in Algorithm 3 to be a large enough constant and applying Corollary 4.9 and theChernoff bound, with probability at least 1 − n · δ ≥ − δ/ | w u − visit [1 ,ℓ ] ( u, u ) | ≤ . u ∈ V . The lemma then follows from the definition of heavy and light vertices.16ext, we show that with high probability, a random walk does not visit a light vertex too manytimes. Lemma 4.11.
Given a directed graph G = ( V, E ) with n vertices, L ∈ N and δ ∈ (0 , /n ) , letting ℓ = √ L and γ = c · log δ − , where c > is the sufficiently large constant, for every vertex u start ∈ V and vertex v ∈ V ℓ light ( G ) , an L -step random walk starting from u start visits v more than γ · ℓ times with probability at most δ/ n .Proof. Suppose we have an infinite random walk W starting from u start in G . Letting τ = γℓ , thegoal here is to bound the probability that during the first L steps, W visits v more than τ times.We denote this as the bad event E bad .Let Z i be the random variable representing the step at which W visits v for the i -th time (if W visits v less than i times in total, we let Z i = ∞ ). E bad is equivalent to that Z τ +1 ≤ L . Z τ +1 ≤ L further implies that for at least ( τ − ℓ ) i ∈ [ τ ], Z i +1 − Z i ≤ ℓ and Z i < ∞ . In thefollowing we denote this event as E and bounds its probability instead.For each i ∈ [ τ ], let Y i be the random variable which takes value 1 if both Z i < ∞ and Z i +1 − Z i ≤ ℓ hold, and 0 otherwise. Letting Y
For every i ∈ [ τ ] and every possible assignments Y
By the Markov property of the random walk, and noting that Y i is always 0 when Z i = ∞ ,we have. E [ Y i | Y
3, which completes the proof.Then by the Azuma-Hoeffding inequality (Corollary 3.4),Pr W [ E bad ] ≤ Pr W [ E ]= Pr W " τ X i =1 Y i ≥ ( τ − ℓ ) exp( − Ω( τ − ℓ − / · τ )) ≤ δ/ n, the last inequality follows from the fact that γ = c · log δ − for a sufficiently large constant c .The correctness of the algorithm is finally completed by the following theorem. Theorem 4.13 (Formal version of Theorem 1.1) . Given a directed graph G = ( V, E ) with n ver-tices, L ∈ N and δ ∈ (0 , /n ) . Let ~ L save and V full be the two random variables of the output of P two - pass ( G, L, δ ) . For every u start ∈ V , the following hold: • The output distribution of S two - pass ( V, u start , L, V full , ~ L save ) has statistical distance at most δ from RW GL ( u start ) . • Both of P two - pass ( G, L, δ ) and S two - pass ( V, u start , L, V full , ~ L save ) use at most O ( n · √ L · log δ − ) words of space.Proof. Note that we can safely assume L ≤ n , since otherwise one can always use O ( n ) words tostore all the edges in the graph. In this case, we have that L ≤ n · √ L and the space for restoringthe L -step output walk can be ignored.Let e V heavy = V full and e V light = V \ e V heavy . Let E good be the event that e V light ⊆ V light and e V heavy ⊆ V heavy . By Lemma 4.10, we have that Pr[ E good ] ≥ − δ/ E good . In this case, it follows from Lemma 4.3 that P one - pass ( G, γ · ℓ, e V heavy ) operates correctly (by setting the constant c in Algorithm 1 to be sufficiently large).By Lemma 4.11 and a union bound, the probability of S two - pass ( V, u start , L, e V heavy , ~ L save ) outputsfailure is at most δ/
2. By Lemma 4.8, it follows that the statistical distance between the outputdistribution of S two - pass ( V, u start , L, e V heavy , ~ L save ) and RW GL ( u start ) is at most δ/ E good ] ≥ − δ/ Similar to the algorithm in [Jin19], our algorithms can also be easily adapted to work for the turnstilegraph streaming model , where both insertions and deletions of edges are allowed. Note that ourtwo-pass algorithm A two - pass only accesses the input graph stream via the one-pass preprocessingsubroutine P one - pass . Hence, it suffices to implement P one - pass in the turnstile model as well. Thereare two distinct tasks in P one - pass : (1) for light vertices, we need to sample their outgoing neighborswith replacement and (2) for heavy vertices, we need to record all their outgoing neighbors. Uniformly sampling via ℓ sampler. For light vertices, uniformly sampling some out-neighborsfrom each vertex without replacement can be implemented via the following ℓ sampler in theturnstile model. Lemma 4.14 ( ℓ sampler in the turnstile model [JW18]) . Let n ∈ N , failure probability δ ∈ (0 , / and f ∈ R n be a vector defined by a streaming of updates to its coordinates of the form f i ← f i + ∆ , here ∆ ∈ {− , } . There is a randomized algorithm which reads the stream, and with probabilityat most δ it outputs FAIL, otherwise it outputs an index i ∈ [ n ] such that: Pr( i = j ) = | f j |k f k + O ( n − c ) , ∀ j ∈ [ n ] where c ≥ is some arbitrarily large constant.The space complexity of this algorithm is bounded by O (log ( n ) · log(1 /δ )) bits in the randomoracle model, and O (log ( n ) · (log log n ) · log(1 /δ )) bits otherwise. Remark 4.15.
To get error in the statistical distance also to be at most δ , one can simply set n to be larger than /δ . And in that case the space complexity can be bounded by O (log ( n/δ )) . Recording all outgoing neighbors via ℓ heavy hitter. For heavy vertices, recording alltheir outgoing neighbors can be implemented using the following ℓ heavy hitter in the turnstilemodel. (Recall that we assumed our graphs is a simple graph without multiple edges.) Lemma 4.16 ( ℓ heavy hitter in the turnstile model [CCFC02]) . Let n, k ∈ N , δ ∈ (0 , . and f ∈ R n be a vector defined by a streaming of updates to its coordinates of the form f i ← f i +∆ , where ∆ ∈ {− , } . There is an algorithm which reads the stream and returns a subset L ⊂ [ n ] such that i ∈ L for every i ∈ [ n ] such that | f i | ≥ k f k /k and i L for every i ∈ [ n ] such that | f i | ≤ k f k / k .The failure probability is at most δ , and the space complexity is at most O ( k · log( n ) · log( n/δ )) . Algorithm in the turnstile model.
Modifying P one - pass with Lemma 4.14 and Lemma 4.16,we can generalize our two-pass streaming algorithm to work in two-pass turnstile model. Remark 4.17 (Two-pass algorithm in the turnstile model) . There exists a streaming algorithm A turnstile that given an n -vertex directed graph G = ( V, E ) via a stream of both edge insertions andedge deletions, a starting vertex u start ∈ V , a non-negative integer L indicating the number of stepsto be taken, and an error parameter δ ∈ (0 , /n ) , satisfies the following conditions:1. A turnstile uses at most e O ( n · √ L · log δ − ) space and makes two passes over the input graph G .2. A turnstile samples from some distribution D over V L +1 satisfying kD − RW GL ( u start ) k TV ≤ δ . Reminder of Theorem 1.2.
Fix a constant β ∈ (0 , and an integer p ≥ . Let n ≥ bea sufficiently large integer and let L = ⌈ n β ⌉ . Any randomized p -pass streaming algorithm that,given an n -vertex directed graph G = ( V, E ) and a starting vertex u start ∈ V , samples from adistribution D such that kD − RW GL ( u start ) k TV ≤ − n requires e Ω( n · L /p ) space.Proof. We show Theorem 1.2 in two steps, that are captured in Lemma 5.1 and Theorem 5.2 below.Theorem 1.2 is a direct corollary of Lemma 5.1 and Theorem 5.2. In more details, for each light vertex u , we run τ independent copies of the ℓ sampler to obtain τ samples fromits outgoing neighbors with replacement. We also let k = c · τ · n and use the ℓ heavy hitter to record all outgoingneighbors for all heavy vertices in e O ( n · √ L · log(1 /δ )) space. L when it is clear from context. Hard Input Distribution D n,p,L • Setup:
Define ∆ = (cid:18) L (log n ) p (cid:19) p and ∆ ′ = ∆ /n . • Vertices:
We construct a layered graph G with p + 2 layers V , V , · · · , V p +2 satisfying | V | = 1 and | V i | = n for all i ∈ [ p + 2] \ { } . We use s to refer to the unique vertex in V . • Edges:
There are p + 2 sets of edges E , E , · · · , E p +2 where edges E i are between V i and layer V i +1 (indices taken modulo p + 2). These are constructed as follows: – The set E is a singleton { ( s, v ) } where v is a vertex sampled uniformly at randomfrom V . – For all 1 < i < p + 2, and all v ∈ V i , sample a a subset S v ⊆ V i +1 uniformly andindependently such that | S v | = ∆. Set E i = { ( v, v ′ ) | v ∈ V i , v ′ ∈ S v } . – Define the set E p +2 = { ( v, s ) | v ∈ V p +2 } . • Edge ordering:
The edges are revealed to the streaming algorithm in the order E p +2 , E p +1 , · · · , E . Lemma 5.1.
Suppose there exists a constant β ∈ (0 , , an integer p ≥ , integers n, L thatare sufficiently large and satisfy L = ⌈ n β ⌉ such that there exists a (randomized) p -pass streamingalgorithm A that takes space n · L /p (log n ) p and, given an n -vertex directed graph G = ( V, E ) and astarting vertex u start ∈ V , can sample from a distribution D such that kD − RW GL ( u start ) k TV ≤ − n ) .Then, there exists another randomized p -pass streaming algorithm A ′ that takes space n · L /p (log n ) p and satisfies: Pr G ∼D n,p (cid:0) A ′ ( G ) = P p +1 ( s ) (cid:1) ≥ n ) . Proof.
Let A ′ be the algorithm that first runs A on its input and s to get as output a walk W = ( v , v , . . . , v L ). Define E ( W ) = { ( v i − , v i ) | i ∈ [ L ] } to be the set of edges witnessed by W .The algorithm A ′ then outputs all paths P p +1 ( s, W ) of length p + 1 starting for s using only theedges E ( W ).Let N ≤ p ( s ) = S pi ′ =0 N i ( s ). Observe that if a walk W satisfies P p +1 ( s, W ) = P p +1 ( s ), theneither E ( W ) E or P ( N ≤ p ( s )) E ( W ). Thus, we have, for all G ∈ supp( D n,p ) that:Pr (cid:0) A ′ ( G ) = P p +1 ( s ) (cid:1) = Pr W ∼A ( G ) (cid:0) P p +1 ( s, W ) = P p +1 ( s ) (cid:1) ≤ Pr W ∼ RW GL ( s ) (cid:0) P p +1 ( s, W ) = P p +1 ( s ) (cid:1) + 1 − n ) Pr W ∼ RW GL ( s ) (cid:0) E ( W ) E ∨ P ( N ≤ p ( s )) E ( W ) (cid:1) + 1 − n ) . (Union bound)As E ( W ) ⊆ E for all W ∼ RW GL ( s ), we have:Pr (cid:0) A ′ ( G ) = P p +1 ( s ) (cid:1) ≤ Pr W ∼ RW GL ( s ) (cid:0) P ( N ≤ p ( s )) E ( W ) (cid:1) + 1 − n ) . Thus, to finish the proof, it suffices to show that Pr W ∼ RW GL ( s ) (cid:0) P ( N ≤ p ( s )) E ( W ) (cid:1) ≤ n ) .This is done in the rest of the proof. First, observe from the definition of D n,p that P ( N ≤ p ( s )) isa collection of at most p · ∆ p ≪ L (log n ) p edges. We get by a union bound:Pr W ∼ RW GL ( s ) (cid:0) P ( N ≤ p ( s )) E ( W ) (cid:1) ≤ L · max e ∈ P ( N ≤ p ( s )) Pr W ∼ RW GL ( s ) ( e / ∈ E ( W )) . (3)Fix e ∈ P ( N ≤ p ( s )) and observe that v i = s for every i that is a multiple of p + 2. Using the Markovproperty of random walks, we get:Pr W ∼ RW GL ( s ) ( e / ∈ E ( W )) ≤ Pr W ∼ RW GL ( s ) ( ∀ i ∈ [ p + 2] : e = ( v i − , v i )) ! L p . As the out-degree of s is 1 and the out-degree of every other vertex is at most ∆ (Observation 5.3),we conclude that: Pr W ∼ RW GL ( s ) ( e / ∈ E ( W )) ≤ (cid:18) − p (cid:19) L p ≤ e − L p · ∆ p ≤ n , as p · ∆ p ≪ L (log n ) p . Plugging into Equation 3 finishes the proof. Theorem 5.2.
Let a constant β ∈ (0 , and an integer p ≥ be given. Let n ≥ be sufficientlylarge and L = ⌈ n β ⌉ . For all (randomized) p -pass streaming algorithms A that takes space n · L /p (log n ) p ,we have that: Pr G ∼D n,p (cid:0) A ( G ) = P p +1 ( s ) (cid:1) ≤ n ) , The proof of Theorem 5.2 spans the rest of this section. We start with some notation and someproperties of the distribution D n,p . Fix p ≥ n large enough (as a function of p ) for the restof this subsection. D n,p Notation.
As the set E p +2 is fixed, we shall sometimes view D n,p as a distribution over the sets E , E , · · · , E p +1 . We shall use V to denote the set of all vertices and E to denote the set of alledges, thus G = ( V, E ). For i ∈ [ p + 1] we define E − i to be E \ E i .21or a vertex v ∈ V and i ≥
0, define the set P i ( v ) to be the set of paths of length i startingfrom v . Also define P i ( S ) for a subset S ⊆ V of vertices as P i ( S ) = S v ∈ S P i ( v ). We drop thesuperscript i when i = 1. Observe that for all i ∈ [ p + 2], we have P ( V i ) = E i . Similarly, define N i ( v ) to be the set of all vertices that can be reached by a path of length exactly i from v , i.e. , avertex v ′ ∈ N i ( v ) if and only if there is a path ending at v ′ in P i ( v ).Throughout, we shall use h ( x ) = − x log( x ) − (1 − x ) log(1 − x ) to denote the binary entropyfunction. Observe that h ( · ) is concave and monotone increasing for 0 < x < .In this section, we collect some useful properties of the distribution D n,p defined above. Allthese properties can be proved by straightforward but tedious calculations, so we defer their proofsto Section B.1. N k ( s )We state without proof the following observation: Observation 5.3.
It holds that:1. For all v ∈ V ∪ V p +2 , we have | N ( v ) | = 1 .2. For all v ∈ V \ ( V ∪ V p +2 ) , we have | N ( v ) | = ∆ .3. For all k ∈ [ p + 1] , we have (cid:12)(cid:12) N k ( s ) (cid:12)(cid:12) ≤ ∆ k − . Owing to item 3 above, it shall be useful to define, for all k ∈ [ p + 1], the notation∆ k = ∆ k − and ∆ ′ k = ∆ k n . (4)Also recall that we used ∆ ′ to denote ∆ /n . Lemma 5.4.
For all k ∈ [ p + 1] , we have: Pr (cid:12)(cid:12)(cid:12) N k ( s ) (cid:12)(cid:12)(cid:12) ≤ ∆ k · − k (log n ) p !! ≤ kn . Corollary 5.5 (Corollary of Observation 5.3 and Lemma 5.4) . For all k ∈ [ p + 1] , we have: Pr ∆ k · − n ) p − ! ≤ (cid:12)(cid:12)(cid:12) N k ( s ) (cid:12)(cid:12)(cid:12) ≤ ∆ k ! ≥ − n . N k ( s ) Lemma 5.6.
For all v ∈ V \ ( V ∪ V p +2 ) and any event E , we have: H ( N ( v ) | E ) ≤ n · h (cid:0) ∆ ′ (cid:1) · n ) p ! and H ( N ( v )) ≥ n · h (cid:0) ∆ ′ (cid:1) · − n ) p ! . orollary 5.7. For all < k ≤ p + 1 and any event E , we have: H ( E k | E ) ≤ n · h (cid:0) ∆ ′ (cid:1) · n ) p ! and H ( E k ) ≥ n · h (cid:0) ∆ ′ (cid:1) · − n ) p ! . Lemma 5.8.
We have H ( N ( s )) = log n and, for all < k ≤ p + 1 : H (cid:16) N k ( s ) (cid:17) ≥ n · h (cid:0) ∆ ′ k (cid:1) · − n ) p ! . P k ( s ) Lemma 5.9.
For all events E , we have H ( P ( s ) | E ) ≤ log n and, for all < k ≤ p + 1 : H (cid:16) P k ( s ) | E (cid:17) ≤ ∆ ′ k − · n ) p ! · H ( E k ) . Lemma 5.10.
We have H ( P ( s )) = log n and, for all < k ≤ p + 1 : H (cid:16) P k ( s ) (cid:17) ≥ ∆ ′ k − · − n ) p ! · H ( E k ) . Lemma 5.11.
For all events E , it holds that: − H ∞ ( P p +1 ( s ) | E ) ≤ − H (cid:0) P p +1 ( s ) | E (cid:1) − ′ p · (cid:18) n ) p (cid:19) · H ( E p +1 ) . Reminder of Theorem 5.2.
Let a constant β ∈ (0 , and an integer p ≥ be given. Let n ≥ be sufficiently large and L = ⌈ n β ⌉ . For all (randomized) p -pass streaming algorithms A that takesspace n · L /p (log n ) p , we have that: Pr G ∼D n,p (cid:0) A ( G ) = P p +1 ( s ) (cid:1) ≤ n ) , Communication game.
To show our lower bound, we consider a communication game Π with p + 1 players. The edges E p +2 are known to all players. In addition, player i ∈ [ p + 1] also knowsthe edges E p +2 − i . Define T = p ( p + 1). The communication takes place in T − i , player i sends a message M i to player i + 1 (indices taken modulo p + 1) based on itsinput and all the messages received so far. After these T − p + 123utputs an answer based on its input and all the received messages. We treat the output as the T th message and denote it by M T . We use k Π k to denote the maximum (over all inputs) totalcommunication in Π (excluding the output). Proof of Theorem 5.2.
Proof of Theorem 5.2. To start, note that we can assume that A is deterministic without loss of generality. The algorithm A implies a deterministic communication game Π as above satisfying

‖Π‖ ≤ T · n · L^{1/p}/(log n)^{2p} ≤ n · ∆/(log n)^p.

For j ∈ [T], we shall use M_j to denote the random variable corresponding to the j-th message of the protocol. This random variable is over the probability space defined by D_{n,p}. Also, define M_{≤j} = (M_1, . . . , M_j).

Lemma 5.12. For all 0 ≤ j ≤ T and i ∈ [p+1], we have:

I( E_i : E_{−i} | M_{≤j} ) = 0.

Proof. We repeatedly apply Lemma A.13 to remove the conditioning on M_{≤j}. This is possible as for all j′ ∈ [j], either message j′ is not sent by player p+2−i, in which case M_{j′} is independent of E_i given E_{−i} and M_{<j′}, or it is sent by player p+2−i, in which case M_{j′} is independent of E_{−i} given E_i and M_{<j′}.

Lemma 5.13. For all 1 ≤ k ≤ p+1, we have:

H( P^k(s) | M_{≤kp} ) = log n, if k = 1;
H( P^k(s) | M_{≤kp} ) ≥ (1 − ε_k) · ∆′_{k−1} · H(E_k), if k > 1.

Before proving Lemma 5.13, we need the following two important technical lemmas, whose proofs can be found in Section B.2. Let v_k^{(1)}, v_k^{(2)}, · · · , v_k^{(n)} be the vertices in layer V_k and, for i ∈ [n], define the set V_k^{(<i)} = {v_k^{(1)}, . . . , v_k^{(i−1)}}. Throughout, we write t = kp and t′ = (k−1)p.

Lemma 5.14. For all 1 < k ≤ p+1, assuming Lemma 5.13 holds for k−1, there exists a set B ⊆ supp(M_{≤(k−1)p}) such that Pr(M_{≤(k−1)p} ∈ B) ≤ √ε_{k−1} and for all M_{≤(k−1)p} ∉ B, we have:

H( N^{k−1}(s) | M_{≤(k−1)p} ) ≥ H( N^{k−1}(s) ) − 10 · √ε_{k−1} · ∆′_{k−2} · H(E_{k−1}), if k > 2;
H( N^{k−1}(s) | M_{≤(k−1)p} ) = H( N^{k−1}(s) ) = log n, if k = 2.

Lemma 5.15. For all 1 < k ≤ p+1, assuming Lemma 5.13 holds for k−1, and letting B be the set promised by Lemma 5.14, for all M_{≤t′} ∉ B we have:

|{ i ∈ [n] | Pr( v_k^{(i)} ∈ N^{k−1}(s) | M_{≤t′} ) ≤ ∆′_{k−1} · ε_k }| ≤ ε_k · n.

Now we are ready to prove Lemma 5.13.

Proof of Lemma 5.13. Induction on k. For the base case, we have k = 1. Recall that M_{≤p} is determined by E_{−1} while P(s) is determined by E_1. As these two are independent, we have that:

H( P(s) | M_{≤p} ) = H( P(s) ) = log n,

as desired. For the induction step, we show the lemma holds for k > 1 assuming it holds for k−1. Let B be the set promised by Lemma 5.14. Define, for all M_{≤t′}, the set:

S(M_{≤t′}) = { i ∈ [n] | Pr( v_k^{(i)} ∈ N^{k−1}(s) | M_{≤t′} ) ≤ ∆′_{k−1} · ε_k }. (5)

By Lemma 5.15, for all M_{≤t′} ∉ B we have |S(M_{≤t′})| ≤ ε_k · n. Letting ∇ := H( P^k(s) | M_{≤t} ) for simplicity, we have:

∇ ≥ H( P(N^{k−1}(s)) | M_{≤t}, E_{−k} )   (Lemma A.4 and P^k(s) determines P(N^{k−1}(s)))
  ≥ H( P(N^{k−1}(s)) | M_{≤t′}, E_{−k} )   (As (M_{≤t′}, E_{−k}) and (M_{≤t}, E_{−k}) determine each other)
  = Σ_{M_{≤t′}} Pr(M_{≤t′}) · Σ_{E_{−k}} Pr(E_{−k} | M_{≤t′}) · H( P(N^{k−1}(s)) | M_{≤t′}, E_{−k} ).   (Definition A.2 and E_{−k} determines N^{k−1}(s))

To continue, note that P(N^{k−1}(s)) is determined by E_k and is independent of E_{−k} conditioned on M_{≤t′} by Lemma 5.12. We get:

∇ ≥ Σ_{M_{≤t′}} Pr(M_{≤t′}) · Σ_{E_{−k}} Pr(E_{−k} | M_{≤t′}) · H( P(N^{k−1}(s)) | M_{≤t′} )
  ≥ Σ_{M_{≤t′}} Pr(M_{≤t′}) · Σ_{E_{−k}} Pr(E_{−k} | M_{≤t′}) · Σ_{i ∈ [n] : v_k^{(i)} ∈ N^{k−1}(s)} H( P(v_k^{(i)}) | P(V_k^{(<i)}), M_{≤t′} ).

Discarding the at most ε_k · n indices in S(M_{≤t′}), whose total contribution is negligible by Lemma 5.15, and lower bounding each remaining term via Lemma 5.6 yields ∇ ≥ (1 − ε_k) · ∆′_{k−1} · H(E_k), completing the induction.

Lemma 5.16. There exists a set B* ⊆ supp(M_{≤T}) such that Pr(M_{≤T} ∈ B*) ≤ √ε_{p+1} and for all M_{≤T} ∉ B*, we have:

H( P^{p+1}(s) | M_{≤T} ) ≥ H( P^{p+1}(s) ) − 10 · √ε_{p+1} · ∆′_p · H(E_{p+1}).

Let B* be the set from Lemma 5.16. We apply Lemma 5.11 for all M_{≤T} ∉ B* to get:

2^{−H_∞(P^{p+1}(s) | M_{≤T})} ≤ 1 − ( H(P^{p+1}(s)) − 10·√ε_{p+1}·∆′_p·H(E_{p+1}) − 1 ) / ( ∆′_p · (1 + 3(p+1)/(log n)^p) · H(E_{p+1}) )
  ≤ 1 − ( 1 − ε_{p+1} − 11·√ε_{p+1} ) / ( 1 + ε_{p+1} )   (Lemma 5.10)
  ≤ 20 · √ε_{p+1}.

It follows that:

Pr_{G ∼ D_{n,p}}( A(G) = P^{p+1}(s) ) ≤ 21 · √ε_{p+1} ≤ 1/log n.

Acknowledgments

Lijie Chen is supported by an IBM Fellowship.
Zhao Song is supported in part by the Schmidt Foundation, Simons Foundation, NSF, DARPA/SRC, Google and Amazon AWS. We would like to thank Rajesh Jayaram for discussions on ℓ_2 heavy hitters.

References

[ACK19] Sepehr Assadi, Yu Chen, and Sanjeev Khanna. Sublinear algorithms for (∆ + 1) vertex coloring. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 767–786. SIAM, 2019.

[ACL07] Reid Andersen, Fan Chung, and Kevin Lang. Using PageRank to locally partition a graph. Internet Mathematics, 4(1):35–64, 2007.

[AKL16] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 698–711. Association for Computing Machinery, 2016.

[AP09] Reid Andersen and Yuval Peres. Finding sparse cuts locally using evolving sets. In Proceedings of the forty-first annual ACM symposium on Theory of computing (STOC), pages 235–244, 2009.

[Azu67] Kazuoki Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, Second Series, 19(3):357–367, 1967.

[BP98] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107–117, 1998.

[CCFC02] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 693–703. Springer, 2002.

[CDK19] Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex-arrival streams. In International Colloquium on Automata, Languages, and Programming (ICALP). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

[CGMV20] Amit Chakrabarti, Prantar Ghosh, Andrew McGregor, and Sofya Vorotnikova. Vertex ordering problems in directed graph streams. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1786–1802. SIAM, 2020.

[Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, pages 493–507, 1952.

[COP03] Moses Charikar, Liadan O'Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing (STOC), pages 30–39, 2003.

[CT06] Thomas M. Cover and Joy A. Thomas. Elements of information theory (2. ed.). Wiley, 2006.

[CW16] Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms (SODA), pages 1365–1373. SIAM, 2016.

[ER14] Yuval Emek and Adi Rosén. Semi-streaming set cover. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 453–464. Springer, 2014.

[FKM+04] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 531–543. Springer, 2004.

[FKM+09] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709–1727, 2009.

[GGK+18] Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrović, and Ronitt Rubinfeld. Improved massively parallel computation algorithms for MIS, matching, and vertex cover. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing (PODC), pages 129–138, 2018.

[GKK12] Ashish Goel, Michael Kapralov, and Sanjeev Khanna.
On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms (SODA), pages 468–485. SIAM, 2012.

[GKMS19] Buddhima Gamlath, Sagar Kale, Slobodan Mitrovic, and Ola Svensson. Weighted matchings via unweighted augmentations. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC), pages 491–500, 2019.

[GO16] Venkatesan Guruswami and Krzysztof Onak. Superlinear lower bounds for multipass graph processing. Algorithmica, 76(3):654–683, 2016.

[Hoe94] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer, 1994.

[HPIMV16] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 371–383, 2016.

[HRR98] Monika Rauch Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. Computing on data streams. External Memory Algorithms, 50:107–118, 1998.

[Jin19] Ce Jin. Simulating random walks on graphs in the streaming model. In 10th Innovations in Theoretical Computer Science Conference (ITCS), volume 124 of LIPIcs, pages 46:1–46:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[JS89] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM Journal on Computing, 18(6):1149–1178, 1989.

[JVV86] Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986.

[JW18] Rajesh Jayaram and David P. Woodruff. Perfect Lp sampling in a data stream. In 59th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 544–555. IEEE Computer Society, 2018.

[Kap13] Michael Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms (SODA), pages 1679–1697. SIAM, 2013.

[McG05] Andrew McGregor. Finding graph matchings in data streams. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 170–181. Springer, 2005.

[MN20] Sagnik Mukhopadhyay and Danupon Nanongkai. Weighted min-cut: sequential, cut-query, and streaming algorithms. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 496–509, 2020.

[NW91] Noam Nisan and Avi Wigderson. Rounds in communication complexity revisited. In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing (STOC), pages 419–429. ACM, 1991.

[Rei08] Omer Reingold. Undirected connectivity in log-space. Journal of the ACM (JACM), 55(4):1–24, 2008.

[RSW18] Aviad Rubinstein, Tselil Schramm, and S. Matthew Weinberg. Computing exact minimum cuts without knowing the graph. In 9th Innovations in Theoretical Computer Science Conference (ITCS), page 39. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.

[Sch18] Aaron Schild. An almost-linear time algorithm for uniform random spanning tree generation. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 214–227, 2018.

[SGP11] Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. Estimating PageRank on graph streams. Journal of the ACM, 58(3):13:1–13:19, 2011.

[ST13] Daniel A. Spielman and Shang-Hua Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing, 42(1):1–26, 2013.

[Vit85] Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1):37–57, 1985.
[Yao77] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (FOCS), pages 222–227. IEEE Computer Society, 1977.

[Zel11] Mariano Zelke. Intractability of min- and max-cut in streaming graphs. Information Processing Letters, 111(3):145–150, 2011.

Appendix

A Preliminaries in Information Theory

Throughout this subsection, we use sans-serif letters to denote random variables and reserve E to denote an arbitrary event. All random variables will be assumed to be discrete and we shall adopt the convention 0 · log(1/0) = 0. All logarithms are taken with base 2.

A.1 Entropy

Definition A.1 (Entropy). The (binary) entropy of X is defined as:

H(X) = Σ_{x ∈ supp(X)} Pr(x) · log(1/Pr(x)).

The entropy of X conditioned on E is defined as:

H(X | E) = Σ_{x ∈ supp(X)} Pr(x | E) · log(1/Pr(x | E)).

Definition A.2 (Conditional Entropy). We define the conditional entropy of X given Y and E as:

H(X | Y, E) = Σ_{y ∈ supp(Y)} Pr(y | E) · H(X | Y = y, E).

Henceforth, we shall omit writing the supp(·) when it is clear from context.

Lemma A.3 (Chain Rule for Entropy). It holds for all X, Y and E that:

H(XY | E) = H(X | E) + H(Y | X, E).

Proof. We have:

H(XY | E) = Σ_{x,y} Pr(x, y | E) · log( 1/Pr(x, y | E) )
          = Σ_{x,y} Pr(x, y | E) · log( 1/(Pr(x | E) · Pr(y | x, E)) )
          = Σ_{x,y} Pr(x, y | E) · log( 1/Pr(x | E) ) + Σ_{x,y} Pr(x, y | E) · log( 1/Pr(y | x, E) )
          = H(X | E) + Σ_x Pr(x | E) · Σ_y Pr(y | x, E) · log( 1/Pr(y | x, E) )
          = H(X | E) + H(Y | X, E).

Lemma A.4 (Conditioning reduces Entropy). It holds for all X, Y and E that:

H(X | Y, E) ≤ H(X | E).

Equality holds if and only if X and Y are independent conditioned on E.

Proof. We have:

H(X | Y, E) = Σ_y Pr(y | E) · H(X | Y = y, E)
            = Σ_{x,y} Pr(y | E) · Pr(x | y, E) · log( 1/Pr(x | y, E) )
            = Σ_{x,y} Pr(x | E) · Pr(y | x, E) · log( Pr(y | E)/(Pr(x | E) · Pr(y | x, E)) )
            ≤ Σ_x Pr(x | E) · log( 1/Pr(x | E) )   (Concavity of log(·))
            = H(X | E).

Lemma A.5. It holds for all X and E that:

0 ≤ H(X | E) ≤ log(|supp(X)|).

The second inequality is tight if and only if X conditioned on E is the uniform distribution over supp(X).

Proof. The first inequality is direct. For the second, we have by the concavity of log(·) that:

H(X | E) = Σ_x Pr(x | E) · log( 1/Pr(x | E) ) ≤ log(|supp(X)|).

A.2 Min-Entropy

Definition A.6 (Min-Entropy). The min-entropy of a discrete random variable X is

H_∞(X) = min_{x : Pr(x) > 0} log( 1/Pr(x) ).

Fact A.7. If the random variable X takes values in the set Ω, it holds that

0 ≤ H_∞(X) ≤ H(X) ≤ log|Ω|.

Recall that h(x) = −x·log(x) − (1−x)·log(1−x) for x ∈ [0, 1] is the binary entropy function.

Lemma A.8. If the random variable X takes values in the set Ω and |Ω| > 1, it holds that

2^{−H_∞(X)} ≤ 1 − ( H(X) − 1 )/log(|Ω|).

Proof. If X is a point mass, there is nothing to show. Otherwise, let x* be such that Pr(x*) is the largest possible (breaking ties arbitrarily). We have:

H(X) = Pr(x*) · log( 1/Pr(x*) ) + Σ_{x ≠ x*} Pr(x) · log( 1/Pr(x) )
     = h(Pr(x*)) − (1 − Pr(x*)) · log( 1/(1 − Pr(x*)) ) + Σ_{x ≠ x*} Pr(x) · log( 1/Pr(x) )
     = h(Pr(x*)) + (1 − Pr(x*)) · Σ_{x ≠ x*} ( Pr(x)/(1 − Pr(x*)) ) · log( (1 − Pr(x*))/Pr(x) )
     = h(Pr(x*)) + (1 − Pr(x*)) · H(X | X ≠ x*).

Using the fact that h(·) ≤ 1, we get H(X) ≤ 1 + (1 − Pr(x*)) · log(|Ω|). Rearranging gives:

2^{−H_∞(X)} = Pr(x*) ≤ 1 − ( H(X) − 1 )/log(|Ω|).
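A quick numeric sanity check of Lemma A.8 on random distributions (function names are ours):

    import math, random

    def H(dist):        # Shannon entropy, base 2
        return -sum(p * math.log2(p) for p in dist if p > 0)

    def H_inf(dist):    # min-entropy
        return -math.log2(max(dist))

    random.seed(1)
    for _ in range(1000):
        size = random.randint(2, 20)
        w = [random.random() for _ in range(size)]
        dist = [x / sum(w) for x in w]
        lhs = 2 ** (-H_inf(dist))                     # = max_x Pr(x)
        rhs = 1 - (H(dist) - 1) / math.log2(size)     # bound from Lemma A.8
        assert lhs <= rhs + 1e-9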
A.3 Mutual Information

Definition A.9 (Mutual Information). The mutual information between X and Y is defined as:

I(X : Y) = H(X) − H(X | Y) = H(Y) − H(Y | X).

The mutual information between X and Y conditioned on Z is defined as:

I(X : Y | Z) = H(X | Z) − H(X | YZ) = H(Y | Z) − H(Y | XZ).

Fact A.10. We have 0 ≤ I(X : Y | Z) ≤ H(X).

Fact A.11 (Chain Rule for Mutual Information). If A, B, C, D are random variables, then

I(AB : C | D) = I(A : C | D) + I(B : C | AD).

The following lemmas are standard.

Lemma A.12. For random variables A, B, C, D, if A is independent of D given C, then

I(A : B | C) ≤ I(A : B | C, D).

Proof. Since A and D are independent conditioned on C, by Lemma A.4, H(A | C) = H(A | C, D) and H(A | B, C, D) ≤ H(A | B, C). We have:

I(A : B | C) = H(A | C) − H(A | B, C)
            = H(A | C, D) − H(A | B, C)
            ≤ H(A | C, D) − H(A | B, C, D) = I(A : B | C, D).

Lemma A.13. For random variables A, B, C, D, if A is independent of D given B, C, then

I(A : B | C, D) ≤ I(A : B | C).

Proof. Since A and D are independent conditioned on B, C, by Lemma A.4, H(A | B, C) = H(A | B, C, D). Moreover, since conditioning can only reduce the entropy (again by Lemma A.4),

I(A : B | C, D) = H(A | C, D) − H(A | B, C, D)
               = H(A | C, D) − H(A | B, C)
               ≤ H(A | C) − H(A | B, C) = I(A : B | C).

B Missing Proofs in Section 5

In this section we provide the missing proofs in Section 5.

B.1 Missing Proofs in Section 5.1

Reminder of Lemma 5.4. For all k ∈ [p+1], we have:

Pr( |N^k(s)| ≤ ∆_k · (1 − k/(log n)^p) ) ≤ k/n.

Proof. Proof by induction on k. The base case k = 1 follows from Observation 5.3. We show the result for k > 1 assuming it holds for k−1. Letting

z_k = ∆_k · (1 − k/(log n)^p)

for convenience, we have:

Pr( |N^k(s)| ≤ z_k ) ≤ Pr( |N^{k−1}(s)| ≤ z_{k−1} ) + Pr( |N^k(s)| ≤ z_k | |N^{k−1}(s)| > z_{k−1} )
                    ≤ (k−1)/n + Pr( |N^k(s)| ≤ z_k | |N^{k−1}(s)| > z_{k−1} ).   (Induction Hypothesis)

Thus, it is sufficient to show that the second term is at most 1/n. To show this we fix a set S such that |S| > z_{k−1} and show that:

Pr( |N^k(s)| ≤ z_k | N^{k−1}(s) = S ) ≤ 1/n. (8)

To see why this holds, first note that conditioned on N^{k−1}(s) = S, the set N^k(s) is just the set of vertices in layer V_{k+1} that can be reached from vertices in S (which itself is a subset of V_k). Thus, we have:

Pr( |N^k(s)| ≤ z_k | N^{k−1}(s) = S ) ≤ Pr( |{v ∈ V_{k+1} | ∃u ∈ S s.t. (u, v) ∈ E_k}| ≤ z_k | N^{k−1}(s) = S )
                                     = Pr( |{v ∈ V_{k+1} | ∃u ∈ S s.t. (u, v) ∈ E_k}| ≤ z_k ), (9)

where the last step is because N^{k−1}(s) = S is determined by E_{−k}, which is independent of E_k. We now want to upper bound the probability of an event defined by E_k, but instead of analyzing it directly, we first define two auxiliary random variables E′_k and E′′_k. The values taken by the random variables E′_k and E′′_k are just a set of edges between V_k and V_{k+1}. Let z denote

z = ∆′ · (1 − 0.5/(log n)^p).

In the random variable E′_k, each edge (u, v) for u ∈ V_k, v ∈ V_{k+1} is included independently with probability z.
In the random variable E′′_k, we first sample edges as in E′_k and then, if the number of edges coming out of any vertex u ∈ V_k is d′_u ≤ ∆, we make it equal to ∆ by sampling ∆ − d′_u additional edges uniformly at random (and do nothing if d′_u > ∆). Denoting by dist(X) the distribution of the random variable X, note first that

dist( E′′_k | ∀u ∈ V_k : d′_u ≤ ∆ ) = dist( E_k ), and hence ‖dist(E′′_k) − dist(E_k)‖_TV ≤ Pr_{E′_k}( ∃u ∈ V_k s.t. d′_u > ∆ ). (10)

Plugging Equation 10 into Equation 9 and noting that E′_k samples at most as many edges as E′′_k, we have:

Pr( |N^k(s)| ≤ z_k | N^{k−1}(s) = S ) ≤ Pr_{E′_k}( |{v ∈ V_{k+1} | ∃u ∈ S s.t. (u, v) ∈ E′_k}| ≤ z_k ) + Pr_{E′_k}( ∃u ∈ V_k s.t. d′_u > ∆ )
  ≤ Pr_{E′_k}( |{v ∈ V_{k+1} | ∃u ∈ S s.t. (u, v) ∈ E′_k}| ≤ z_k ) + Σ_{u ∈ V_k} Pr_{E′_k}( |{v ∈ V_{k+1} | (u, v) ∈ E′_k}| > ∆ ). (11)

Now, for v ∈ V_{k+1}, define the indicator random variable X_v to be 1 if and only if ∃u ∈ S : (u, v) ∈ E′_k. Also, for u ∈ V_k, v ∈ V_{k+1}, define the indicator random variable Y_{u,v} to be 1 if and only if (u, v) ∈ E′_k. Clearly, the random variables X_v are mutually independent for all v ∈ V_{k+1} and so are the random variables Y_{u,v} for u ∈ V_k, v ∈ V_{k+1}. Moreover, we have, for all u ∈ V_k and v ∈ V_{k+1}, that:

E[X_v] = 1 − (1 − z)^{|S|} ≥ z · z_{k−1} · (1 − 0.1/(log n)^p) and E[Y_{u,v}] = z, (12)

as 1 − x ≤ e^{−x} ≤ 1 − x + x² for all x ∈ (0, 1/10) and z_{k−1} < |S| ≤ ∆_{k−1} by Observation 5.3. We now continue Equation 11 using a Chernoff bound (Lemma 3.2):

Pr( |N^k(s)| ≤ z_k | N^{k−1}(s) = S ) ≤ Pr_{E′_k}( Σ_{v ∈ V_{k+1}} X_v ≤ z_k ) + Σ_{u ∈ V_k} Pr_{E′_k}( Σ_{v ∈ V_{k+1}} Y_{u,v} > ∆ )
  ≤ exp( −(∆/10) · 0.1/(log n)^{2p} ) + Σ_{u ∈ V_k} exp( −(∆/10) · 1/(log n)^{2p} )   (Lemma 3.2)
  ≤ 1/n,

as required for Equation 8.

Reminder of Lemma 5.6. For all v ∈ V \ (V_1 ∪ V_{p+2}) and any event E, we have:

H(N(v) | E) ≤ n · h(∆′) · (1 + 1/(log n)^p) and H(N(v)) ≥ n · h(∆′) · (1 − 1/(log n)^p).

Proof. As v ∉ V_1 ∪ V_{p+2}, we get from Lemma 3.6 that:

H(N(v) | E) ≤ log (n choose ∆) ≤ log n + n · h(∆′) ≤ n · h(∆′) · (1 + 1/(log n)^p).

For the furthermore part, note that Lemma 3.6 also says that:

H(N(v)) = log (n choose ∆) ≥ n · h(∆′) − 2·log n ≥ n · h(∆′) · (1 − 1/(log n)^p).

Reminder of Corollary 5.7. For all 1 < k ≤ p+1 and any event E, we have:

H(E_k | E) ≤ n² · h(∆′) · (1 + 1/(log n)^p) and H(E_k) ≥ n² · h(∆′) · (1 − 1/(log n)^p).

Proof. From Lemma A.3 and Lemma A.4, we have that:

H(E_k | E) ≤ Σ_{v ∈ V_k} H(N(v) | E) ≤ n² · h(∆′) · (1 + 1/(log n)^p).   (Lemma 5.6)

From Lemma A.3 and the independence of N(v) for all v ∈ V_k, we have that:

H(E_k) = Σ_{v ∈ V_k} H(N(v)) ≥ n² · h(∆′) · (1 − 1/(log n)^p).   (Lemma 5.6)
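The E′_k/E′′_k argument in the proof of Lemma 5.4 above is a top-up coupling: conditioned on no vertex overflowing ∆, topping up E′_k yields exactly the law of E_k, so the total-variation error is at most the overflow probability, as in Equation 10. The following minimal Python sketch illustrates the coupling for the out-edges of a single vertex; the parameters are chosen for the demo and are not those of the proof:

    import random

    def sample_E_double_prime(n, Delta, z, rng):
        # E'_k: include each of the n potential out-edges independently with
        # probability z; E''_k: if the resulting out-degree d' is at most Delta,
        # top it up with Delta - d' further uniform edges (nothing if d' > Delta).
        out = {v for v in range(n) if rng.random() < z}
        if len(out) <= Delta:
            rest = [v for v in range(n) if v not in out]
            out |= set(rng.sample(rest, Delta - len(out)))
        return out

    rng = random.Random(0)
    n, Delta = 1000, 100
    z = (Delta / n) * 0.8   # z slightly below Delta/n, margin exaggerated here
    overflow = sum(len(sample_E_double_prime(n, Delta, z, rng)) > Delta
                   for _ in range(2000)) / 2000
    print("fraction of samples exceeding Delta:", overflow)
    # Conditioned on no overflow, the sample is exactly Delta uniform
    # out-neighbors, i.e., the law of E_k restricted to one vertex.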
Reminder of Lemma 5.8. We have H(N(s)) = log n and, for all 1 < k ≤ p+1:

H(N^k(s)) ≥ n · h(∆′_k) · (1 − 2/(log n)^{p−1}).

Proof. That H(N(s)) = log n follows from Observation 5.3. For the rest, fix k > 1 and let

z_k = ∆_k · (1 − 1/(log n)^{p−1}).

From Lemma A.4, we can conclude that H(N^k(s)) ≥ H( N^k(s) | |N^k(s)| ). By Definition A.2, this implies:

H(N^k(s)) ≥ Σ_{z ≥ z_k} Pr( |N^k(s)| = z ) · H( N^k(s) | |N^k(s)| = z ).

By symmetry, conditioned on |N^k(s)| = z, N^k(s) is just a uniformly random subset of size z. Thus, we have from Lemma A.5 that:

H(N^k(s)) ≥ Σ_{z ≥ z_k} Pr( |N^k(s)| = z ) · log (n choose z)
  ≥ Σ_{z ≥ z_k} Pr( |N^k(s)| = z ) · ( n · h(z/n) − 2·log n )   (Lemma 3.6)
  ≥ ( n · h(z_k/n) − 2·log n ) · Σ_{z ≥ z_k} Pr( |N^k(s)| = z )   (As h(·) is increasing for 0 < x < 1/2)
  ≥ ( n · h( ∆′_k · (1 − 1/(log n)^{p−1}) ) − 2·log n ) · Σ_{z ≥ z_k} Pr( |N^k(s)| = z ).   (Definition of z_k)

As k > 1, the first factor is non-negative. Bounding the second factor by Corollary 5.5, we have:

H(N^k(s)) ≥ ( n · h( ∆′_k · (1 − 1/(log n)^{p−1}) ) − 2·log n ) · (1 − k/n)
  ≥ ( n · (1 − 1/(log n)^{p−1}) · h(∆′_k) − 2·log n ) · (1 − k/n)   (Concavity of h(·))
  ≥ n · h(∆′_k) · (1 − 2/(log n)^{p−1}).   (As k > 1)

Reminder of Lemma 5.9. For all events E, we have H(P(s) | E) ≤ log n and, for all 1 < k ≤ p+1:

H(P^k(s) | E) ≤ ∆′_{k−1} · (1 + 3k/(log n)^p) · H(E_k).

Proof. Proof by induction. The base case k = 1 follows from Observation 5.3. We show the result for k > 1 assuming it holds for k−1. As P^k(s) is determined by P^{k−1}(s) and P(N^{k−1}(s)), we have:

H(P^k(s) | E) ≤ H( P^{k−1}(s) P(N^{k−1}(s)) | E )
  ≤ H( P^{k−1}(s) | E ) + H( P(N^{k−1}(s)) | P^{k−1}(s), E )   (Lemma A.3)
  = H( P^{k−1}(s) | E ) + Σ_{P^{k−1}(s)} Pr( P^{k−1}(s) | E ) · H( P(N^{k−1}(s)) | P^{k−1}(s), E ).   (Definition A.2 and the fact that P^{k−1}(s) determines N^{k−1}(s))

To continue, we again use Lemma A.3 followed by Lemma A.4:

H(P^k(s) | E) ≤ H( P^{k−1}(s) | E ) + Σ_{P^{k−1}(s)} Pr( P^{k−1}(s) | E ) · Σ_{v ∈ N^{k−1}(s)} H( P(v) | P^{k−1}(s), E )
  ≤ H( P^{k−1}(s) | E ) + Σ_{P^{k−1}(s)} Pr( P^{k−1}(s) | E ) · |N^{k−1}(s)| · n · h(∆′) · (1 + 1/(log n)^p)   (Lemma 5.6)
  ≤ H( P^{k−1}(s) | E ) + Σ_{P^{k−1}(s)} Pr( P^{k−1}(s) | E ) · ∆_{k−1} · n · h(∆′) · (1 + 1/(log n)^p)   (Observation 5.3)
  = H( P^{k−1}(s) | E ) + ∆_{k−1} · n · h(∆′) · (1 + 1/(log n)^p)
  ≤ H( P^{k−1}(s) | E ) + ∆′_{k−1} · H(E_k) · (1 + 3/(log n)^p).   (Corollary 5.7)

Finally, we bound the term H(P^{k−1}(s) | E) using the induction hypothesis. When k = 2, this term is at most log n ≤ H(E_k)/(n·(log n)^p) = ( ∆′_{k−1}/(log n)^p ) · H(E_k).
Otherwise, when k > 2, we have38 ( E k − ) = H ( E k ) and this term is at most ∆ ′ k − · (cid:18) n ) p (cid:19) · H ( E k ) ≤ ∆ ′ k − (log n ) p · H ( E k ).Plugging in, we have: H (cid:16) P k ( s ) (cid:17) ≤ ∆ ′ k − · n ) p ! · H ( E k ) . Reminder of Lemma 5.10. We have H ( P ( s )) = log n and, for all < k ≤ p + 1 : H (cid:16) P k ( s ) (cid:17) ≥ ∆ ′ k − · − n ) p ! · H ( E k ) . Proof. The case k = 1 follows from Observation 5.3. We show the result for k > 1. As P k ( s )determines P ( N k − ( s )), we have: H (cid:16) P k ( s ) (cid:17) ≥ H (cid:16) P ( N k − ( s )) (cid:17) ≥ H (cid:16) P ( N k − ( s )) (cid:12)(cid:12)(cid:12) P k − ( s ) (cid:17) (Lemma A.4) ≥ X P k − ( s ) Pr (cid:16) P k − ( s ) (cid:17) · H (cid:16) P ( N k − ( s )) (cid:12)(cid:12)(cid:12) P k − ( s ) (cid:17) (Definition A.2) ≥ X P k − ( s ) Pr (cid:16) P k − ( s ) (cid:17) · H (cid:16) P ( N k − ( s )) (cid:12)(cid:12)(cid:12) P k − ( s ) (cid:17) . (As P k − ( s ) determines N k − ( s ))Note that P ( N k − ( s )) is determined by E k and P k − ( s ) is determined by E − k . As these are inde-pendent, we have: H (cid:16) P k ( s ) (cid:17) ≥ X P k − ( s ) Pr (cid:16) P k − ( s ) (cid:17) · H (cid:16) P ( N k − ( s )) (cid:17) . As P ( v ) are mutually independent for all v ∈ N k − ( s ), we have: H (cid:16) P k ( s ) (cid:17) ≥ X P k − ( s ) Pr (cid:16) P k − ( s ) (cid:17) · X v ∈ N k − ( s ) H ( P ( v )) ≥ X P k − ( s ) Pr (cid:16) P k − ( s ) (cid:17) · (cid:12)(cid:12)(cid:12) N k − ( s ) (cid:12)(cid:12)(cid:12) · H ( E k ) n . Using Lemma 5.4, we get: H (cid:16) P k ( s ) (cid:17) ≥ ∆ k − · − n ) p ! · H ( E k ) n ≥ ∆ ′ k − · − n ) p ! · H ( E k ) . eminder of Lemma 5.11. For all events E , it holds that: − H ∞ ( P p +1 ( s ) | E ) ≤ − H (cid:0) P p +1 ( s ) | E (cid:1) − ′ p · (cid:18) n ) p (cid:19) · H ( E p +1 ) . Proof. Let Ω denote the support of P p +1 ( s ) and note that Lemma 5.9 implies that log( | Ω | ) ≤ ∆ ′ p · (1 + ǫ p ) · H ( E p +1 ). Applying Lemma A.8 on the random variable P p +1 ( s ) | E , we have:2 − H ∞ ( P p +1 ( s ) | E ) ≤ − H (cid:0) P p +1 ( s ) | E (cid:1) − | Ω | ) ≤ − H (cid:0) P p +1 ( s ) | E (cid:1) − ′ p · (1 + ǫ p ) · H ( E p +1 ) . B.2 Missing Proofs in Section 5.2 B.2.1 Proof of Lemma 5.14Reminder of Lemma 5.14. For all < k ≤ p + 1 , assuming Lemma 5.13 holds for k − , thereexists a set B ⊆ supp (cid:0) M ≤ ( k − p (cid:1) such that Pr (cid:0) M ≤ ( k − p ∈ B (cid:1) ≤ √ ε k − and for all M ≤ ( k − p / ∈ B ,we have: H (cid:16) N k − ( s ) | M ≤ ( k − p (cid:17) ≥ H (cid:16) N k − ( s ) (cid:17) − · √ ε k − · ∆ ′ k − · H ( E k − ) , if k > H (cid:16) N k − ( s ) | M ≤ ( k − p (cid:17) = H (cid:16) N k − ( s ) (cid:17) = log n, if k = 2 . Proof. If k = 2, then we have from the assumption that Lemma 5.13 holds for k − H ( P ( s ) | M ≤ t ′ ) = H ( P ( s )) = log n . Thus, we have I ( N ( s ) : M ≤ t ′ ) ≤ I ( P ( s ) : M ≤ t ′ ) = 0 and the result follows.If k > 2, we have from the assumption that Lemma 5.13 holds for k − H (cid:0) P k − ( s ) | M ≤ t ′ (cid:1) ≥ (1 − ǫ k − ) · ∆ ′ k − · H ( E k − ). Combining with Lemma 5.9, we have that: I (cid:16) N k − ( s ) : M ≤ t ′ (cid:17) ≤ I (cid:16) P k − ( s ) : M ≤ t ′ (cid:17) ≤ ǫ k − · ∆ ′ k − · H ( E k − ) . Using Definition A.9 and Definition A.2, we get that: E M ≤ t ′ h H (cid:16) N k − ( s ) (cid:17) − H (cid:16) N k − ( s ) | M ≤ t ′ (cid:17)i ≤ ǫ k − · ∆ ′ k − · H ( E k − ) . (13)We claim that: 40 laim B.1. 
Claim B.1. It holds for all M_{≤t′} that:

H( N^{k−1}(s) ) − H( N^{k−1}(s) | M_{≤t′} ) ≥ −4 · ε_{k−1} · ∆′_{k−2} · H(E_{k−1}).

Proof. We derive:

H( N^{k−1}(s) | M_{≤t′} ) ≤ H( |N^{k−1}(s)| | M_{≤t′} ) + H( N^{k−1}(s) | |N^{k−1}(s)|, M_{≤t′} )   (Lemma A.3)
  ≤ log n + log (n choose ∆_{k−1})   (Lemma A.5 and |N^{k−1}(s)| ≤ ∆_{k−1} by Observation 5.3)
  ≤ 2·log n + n · h(∆′_{k−1})   (Lemma 3.6)
  ≤ H( N^{k−1}(s) ) · (1 + 2·ε_{k−1})   (Lemma 5.8)
  ≤ H( N^{k−1}(s) ) + H( P^{k−1}(s) ) · 2·ε_{k−1}   (As H(N^{k−1}(s)) ≤ H(P^{k−1}(s)))
  ≤ H( N^{k−1}(s) ) + 4 · ε_{k−1} · ∆′_{k−2} · H(E_{k−1}).   (Lemma 5.9)

Conclude from Equation 13 that:

E_{M_{≤t′}}[ H(N^{k−1}(s)) − H(N^{k−1}(s) | M_{≤t′}) + 4·ε_{k−1}·∆′_{k−2}·H(E_{k−1}) ] ≤ 6 · ε_{k−1} · ∆′_{k−2} · H(E_{k−1}).

As the left hand side above is always non-negative by Claim B.1, we can apply Markov's inequality to conclude that:

Pr_{M_{≤t′}}( H(N^{k−1}(s)) − H(N^{k−1}(s) | M_{≤t′}) ≥ 10 · √ε_{k−1} · ∆′_{k−2} · H(E_{k−1}) ) ≤ √ε_{k−1}.

B.2.2 Proof of Lemma 5.15

Reminder of Lemma 5.15. For all 1 < k ≤ p+1, assuming Lemma 5.13 holds for k−1, and letting B be the set promised by Lemma 5.14, for all M_{≤t′} ∉ B we have:

|{ i ∈ [n] | Pr( v_k^{(i)} ∈ N^{k−1}(s) | M_{≤t′} ) ≤ ∆′_{k−1} · ε_k }| ≤ ε_k · n.

Proof. Fix M_{≤t′} ∉ B and let

S = { i ∈ [n] | Pr( v_k^{(i)} ∈ N^{k−1}(s) | M_{≤t′} ) ≤ ∆′_{k−1} · ε_k }

for convenience. For k = 2, our definitions imply S = ∅, so we assume k > 2 and suppose for the sake of contradiction that |S| > ε_k · n. For i ∈ [n], define the indicator random variable X_i to be 1 if and only if v_k^{(i)} ∈ N^{k−1}(s) and define α_i = Pr( v_k^{(i)} ∈ N^{k−1}(s) | M_{≤t′} ) = Pr( X_i = 1 | M_{≤t′} ). By Lemma 5.14, we have:

H( X_1 · · · X_n | M_{≤t′} ) ≥ H( N^{k−1}(s) | M_{≤t′} )
  ≥ H( N^{k−1}(s) ) − 10 · √ε_{k−1} · ∆′_{k−2} · H(E_{k−1})
  ≥ n · h(∆′_{k−1}) · (1 − ε_{k−1}) − 10 · √ε_{k−1} · ∆′_{k−2} · H(E_{k−1}).   (Lemma 5.8)

By Lemma A.3 and Lemma A.4, we also have H( X_1 · · · X_n | M_{≤t′} ) ≤ Σ_{i=1}^n h(α_i), implying that:

n · h(∆′_{k−1}) · (1 − ε_{k−1}) − 10 · √ε_{k−1} · ∆′_{k−2} · H(E_{k−1}) ≤ Σ_{i=1}^n h(α_i). (14)

Using the notation ∆′′_{k−1} = ε_k · ∆′_{k−1}, we use the following claim to upper bound Σ_{i=1}^n h(α_i).

Claim B.2. It holds that:

Σ_{i=1}^n h(α_i) ≤ ε_k · n · h(∆′′_{k−1}) + n · (1 − ε_k) · h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) ).

Proof. We break the proof into two cases. The easy case is when Σ_{i=1}^n α_i ≤ n·∆′′_{k−1}. In this case, we simply use the concavity and monotonicity of h(·) for 0 < x < 1/2 to get:

Σ_{i=1}^n h(α_i) ≤ n · h(∆′′_{k−1}) ≤ ε_k · n · h(∆′′_{k−1}) + n · (1 − ε_k) · h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) ),

and the claim follows. We now deal with the hard case Σ_{i=1}^n α_i > n·∆′′_{k−1}. In this case, by the definition of S, there exists a λ ∈ [0, 1] satisfying

( Σ_{i ∈ S} α_i + λ · Σ_{i ∉ S} α_i ) / ( |S| + λ · |S̄| ) = ∆′′_{k−1}.
This is equivalent to:

Σ_{i=1}^n α_i − ∆′′_{k−1} · |S| = (1 − λ) · Σ_{i ∉ S} α_i + λ · ∆′′_{k−1} · |S̄|. (15)

Using the concavity of h multiple times, we have:

Σ_{i=1}^n h(α_i) ≤ Σ_{i ∈ S} h(α_i) + λ · Σ_{i ∉ S} h(α_i) + (1 − λ) · Σ_{i ∉ S} h(α_i)
  ≤ ( |S| + λ·|S̄| ) · h(∆′′_{k−1}) + (1 − λ) · |S̄| · h( Σ_{i ∉ S} α_i / |S̄| )
  ≤ |S| · h(∆′′_{k−1}) + |S̄| · h( ( Σ_{i=1}^n α_i − ∆′′_{k−1}·|S| ) / |S̄| )   (Equation 15)
  ≤ ε_k · n · h(∆′′_{k−1}) + n · (1 − ε_k) · h( ( Σ_{i=1}^n α_i − n·ε_k·∆′′_{k−1} ) / ( n·(1 − ε_k) ) ).   (As |S| > ε_k · n)

To continue, note by Observation 5.3 that Σ_{i=1}^n α_i = E[ |N^{k−1}(s)| | M_{≤t′} ] ≤ ∆_{k−1}. This gives:

Σ_{i=1}^n h(α_i) ≤ ε_k · n · h(∆′′_{k−1}) + n · (1 − ε_k) · h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) ).

Combining Claim B.2 and Equation 14 and rearranging, we get:

h(∆′_{k−1}) − ε_k·h(∆′′_{k−1}) − (1 − ε_k)·h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) ) ≤ ε_{k−1}·h(∆′_{k−1}) + (10/n)·√ε_{k−1}·∆′_{k−2}·H(E_{k−1}). (16)

To derive a contradiction, we show that Equation 16 cannot hold. For this, we first lower bound the left hand side. Recall that h(x) = −x·log(x) − (1−x)·log(1−x) and observe that both of these terms are concave. Thus, we can lower bound:

h(∆′_{k−1}) − ε_k·h(∆′′_{k−1}) − (1 − ε_k)·h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) )
  ≥ ∆′_{k−1}·log(1/∆′_{k−1}) − ε_k·∆′′_{k−1}·log(1/∆′′_{k−1}) − (∆′_{k−1} − ε_k·∆′′_{k−1})·log( (1 − ε_k)/(∆′_{k−1} − ε_k·∆′′_{k−1}) )
  = ∆′_{k−1}·(1 − ε_k²)·( log(1 + ε_k) + log(1/(1 − ε_k)) ) − ε_k·∆′′_{k−1}·log(1/ε_k)   (As ∆′′_{k−1} = ε_k·∆′_{k−1})
  ≥ ∆′′_{k−1}·( 2·log e·(1 − ε_k²) − ε_k·log(1/ε_k) ).   (As log(1+x) ≥ log e·(x − x²/2) and log(1/(1−x)) ≥ log e·(x + x²/2))

In particular:

h(∆′_{k−1}) − ε_k·h(∆′′_{k−1}) − (1 − ε_k)·h( (∆′_{k−1} − ε_k·∆′′_{k−1})/(1 − ε_k) ) ≥ ∆′_{k−1} · ε_k. (17)

We now upper bound the right hand side of Equation 16:

ε_{k−1}·h(∆′_{k−1}) + (10/n)·√ε_{k−1}·∆′_{k−2}·H(E_{k−1})
  ≤ ε_k²·h(∆′_{k−1}) + (10/n)·ε_k²·∆′_{k−2}·H(E_k)   (Definition of ε_k and k > 2)
  ≤ ε_k²·h(∆′_{k−1}) + 20·ε_k²·∆_{k−2}·h(∆′).   (Corollary 5.7)

Now, note that, for 1/n² < x < 1/2, we have h(x) ≤ −2x·log(x) ≤ 4·x·log n. We get:

ε_{k−1}·h(∆′_{k−1}) + (10/n)·√ε_{k−1}·∆′_{k−2}·H(E_{k−1}) ≤ 100·ε_k²·log n·∆′_{k−1}. (18)

Equation 16, Equation 17, and Equation 18 cannot all hold together (as 100·ε_k·log n < 1), a contradiction.

Reminder of Lemma 5.16. There exists a set B* ⊆ supp(M_{≤T}) such that Pr(M_{≤T} ∈ B*) ≤ √ε_{p+1} and for all M_{≤T} ∉ B*, we have:

H( P^{p+1}(s) | M_{≤T} ) ≥ H( P^{p+1}(s) ) − 10 · √ε_{p+1} · ∆′_p · H(E_{p+1}).

Proof. As H(P^{p+1}(s) | M_{≤T}) ≥ (1 − ε_{p+1}) · ∆′_p · H(E_{p+1}) by Lemma 5.13 with k = p+1, we can conclude from Lemma 5.9 that:

I( P^{p+1}(s) : M_{≤T} ) ≤ 2 · ε_{p+1} · ∆′_p · H(E_{p+1}).

Using Definition A.9 and Definition A.2, we get that:

E_{M_{≤T}}[ H(P^{p+1}(s)) − H(P^{p+1}(s) | M_{≤T}) ] ≤ 2 · ε_{p+1} · ∆′_p · H(E_{p+1}). (19)

Using Lemma 5.9 and Lemma 5.10, we get that, for all M_{≤T}:

H( P^{p+1}(s) ) − H( P^{p+1}(s) | M_{≤T} ) ≥ −2 · ε_{p+1} · ∆′_p · H(E_{p+1}). (20)
Conclude from Equation 19 that:

E_{M_{≤T}}[ H(P^{p+1}(s)) − H(P^{p+1}(s) | M_{≤T}) + 2·ε_{p+1}·∆′_p·H(E_{p+1}) ] ≤ 4 · ε_{p+1} · ∆′_p · H(E_{p+1}).

As the left hand side above is always non-negative by Equation 20, we can apply Markov's inequality to conclude that:

Pr_{M_{≤T}}( H(P^{p+1}(s)) − H(P^{p+1}(s) | M_{≤T}) ≥ 10 · √ε_{p+1} · ∆′_p · H(E_{p+1}) ) ≤ √ε_{p+1}.

The lemma follows.

C Lower Bounds against Starting Vertex Oblivious Streaming Algorithms

In this section we prove Theorem 1.3 (restated below). See also Definition 3.1 for a formal definition of a starting vertex oblivious streaming algorithm for simulating random walks.

Reminder of Theorem 1.3. Let n ≥ 1 be a sufficiently large integer and let L be an integer satisfying L ∈ [log² n, n]. Any randomized algorithm that is oblivious to the start vertex and, given an n-vertex directed graph G = (V, E) and a starting vertex u_start ∈ V, samples from a distribution D such that ‖D − RW_G^L(u_start)‖_TV ≤ 1 − 1/log n, requires Ω̃(n·√L) space.

The following inequality will be useful for the proof.

Lemma C.1 (Fano's inequality (see, e.g., [CT06, Page 38])). Let Z and Z′ be two jointly distributed random variables over the same set Z. It holds that

Pr[Z ≠ Z′] ≥ ( H(Z | Z′) − 1 )/log|Z| and Pr[Z = Z′] ≤ ( log|Z| − H(Z | Z′) + 1 )/log|Z|.

We will also need the following variant of the standard INDEX problem.

Definition C.2 (Multi-output generalization of INDEX). In the INDEX_{m,ℓ} problem, Alice gets ℓ strings X_1, . . . , X_ℓ ∈ {0, 1}^m and Bob gets an index i ∈ [ℓ]. Alice sends a message to Bob and then Bob is required to output the string X_i.

The lower bound for INDEX_{m,ℓ} below will be crucial for our proof of Theorem 1.3.

Lemma C.3 (One-way communication lower bound for INDEX_{m,ℓ}). Let D^{m,ℓ}_INDEX be the input distribution in which Alice gets ℓ independent random strings, each uniformly distributed over {0, 1}^m, and Bob gets a uniformly random index from [ℓ] that is independent of Alice's input. Solving INDEX_{m,ℓ} over D^{m,ℓ}_INDEX with success probability at least 2/log² m requires Alice to send at least mℓ/log² m bits to Bob.

Proof. Over the input distribution D^{m,ℓ}_INDEX, Alice gets ℓ strings X_1, . . . , X_ℓ, all distributed uniformly over {0, 1}^m. Bob gets a uniformly random index I from [ℓ]. By Yao's minimax theorem, to prove the theorem it suffices to bound the success probability of all deterministic one-way communication protocols between Alice and Bob in which Alice sends at most mℓ/log² m bits. In the following we fix such a protocol.

Let M = M(X_1, . . . , X_ℓ) be the message sent from Alice to Bob. We need the following claim.

Claim C.4. It holds that

E_{i ∈ [ℓ]} I( X_i : M ) ≤ m/log² m.

Proof. For i ∈ [ℓ], let X_{<i} denote (X_1, . . . , X_{i−1}). As the strings X_1, . . . , X_ℓ are mutually independent, Lemma A.12 and Fact A.11 give

Σ_{i ∈ [ℓ]} I( X_i : M ) ≤ Σ_{i ∈ [ℓ]} I( X_i : M | X_{<i} ) = I( X_1 · · · X_ℓ : M ) ≤ |M| ≤ mℓ/log² m,

and dividing by ℓ proves the claim.

Now, Bob's output is a function X′ = X′(M, I) of the message and his index. By Lemma C.1 applied with Z = X_I and Z′ = X′, together with Claim C.4, the success probability of the protocol is at most

E_{i ∈ [ℓ]}[ ( m − H(X_i | M) + 1 )/m ] = E_{i ∈ [ℓ]}[ ( I(X_i : M) + 1 )/m ] ≤ 1/log² m + 1/m < 2/log² m,

as desired.
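A small numeric check of Fano's inequality as stated in Lemma C.1, over random joint distributions (function names are ours):

    import math, random

    def fano_check(joint):  # joint[z][z2] = Pr(Z = z, Z' = z2)
        size = len(joint)
        pr_eq = sum(joint[z][z] for z in range(size))
        h_cond = 0.0        # H(Z | Z') = sum_{z'} Pr(z') * H(Z | Z' = z')
        for z2 in range(size):
            pz2 = sum(joint[z][z2] for z in range(size))
            for z in range(size):
                if joint[z][z2] > 0:
                    q = joint[z][z2] / pz2
                    h_cond += pz2 * (-q * math.log2(q))
        log_size = math.log2(size)
        assert pr_eq <= (log_size - h_cond + 1) / log_size + 1e-9
        assert 1 - pr_eq >= (h_cond - 1) / log_size - 1e-9

    random.seed(2)
    for _ in range(500):
        size = random.randint(2, 12)
        w = [[random.random() for _ in range(size)] for _ in range(size)]
        tot = sum(map(sum, w))
        fano_check([[x / tot for x in row] for row in w])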
Proof of Theorem 1.3. Let τ = √L/log n. For a string X ∈ {0, 1}^{τ²} and (i, j) ∈ [τ]², we use X_{i,j} to denote the ((i−1)·τ + j)-th bit of X. We will also need the following construction of gadget graphs.

Gadget Construction H_τ(X)

• Setup: Given a string X ∈ {0, 1}^{τ²}.

• Vertices: We construct a layered graph with 2 layers V_1, V_2 satisfying |V_1| = τ and |V_2| = τ + 1. For convenience, we always use V_{i,j} to denote the j-th vertex in the layer V_i. We will call V_{2,τ+1} the starting vertex of H_τ(X).

• Edges: For every (i, j) ∈ [τ]², we first add an edge from V_{2,j} to V_{1,i}, and then also add an edge from V_{1,i} to V_{2,j} if X_{i,j} = 1. For every i ∈ [τ], we add an edge from V_{1,i} to V_{2,τ+1} and another edge from V_{2,τ+1} back to V_{1,i}.

• Edge ordering: The edges are given in lexicographic order. (Note that the edge ordering is not important for the lower bound, and we specify it only for concreteness.)

Now, let m = τ² and ℓ = n/(2τ + 1). Suppose there is a starting vertex oblivious streaming algorithm A for simulating L-step random walks with space complexity n·√L/log⁴ n and statistical distance at most 1 − 1/log n; we show it implies a one-way communication protocol solving INDEX_{m,ℓ} over D^{m,ℓ}_INDEX that contradicts Lemma C.3.

Let P and S be the preprocessing subroutine and the sampling subroutine of A, respectively. The protocol is described as follows:

Protocol Π for INDEX_{m,ℓ}

1. Given input strings X_1, . . . , X_ℓ ∈ {0, 1}^{τ²}, Alice generates a graph G = ⊔_{i=1}^{ℓ} H_τ(X_i). That is, G is an n-vertex graph consisting of ℓ clusters, with the i-th cluster being H_τ(X_i). From now on, we will use H_τ(X_i) to denote the corresponding subgraph of G.

2. Alice then simulates the preprocessing subroutine P on the graph G, and sends its output to Bob.

3. Bob gets an index i ∈ [ℓ] and sets the starting vertex of H_τ(X_i) to be u_start. Bob then simulates S with Alice's message and u_start as the input to obtain a walk W.

4. Given an L-step walk W starting from the starting vertex of H_τ(X_i), we define a string X_rec(W) ∈ {0, 1}^{τ²} as follows: letting V_1, V_2 be the two layers in H_τ(X_i), set X_rec(W)_{a,b} = 1 if and only if W passes an edge from V_{1,a} to V_{2,b}.

5. Bob outputs X_rec(W).
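The following Python sketch builds H_τ(X) exactly as described in the box above, runs an L-step walk from the starting vertex, and checks that X_rec recovers X, illustrating Claim C.5 below; variable names are ours:

    import random

    def run_gadget(tau, L, rng):
        X = [[rng.randint(0, 1) for _ in range(tau)] for _ in range(tau)]
        # Layer-1 vertices are 0..tau-1, layer-2 vertices are tau..2*tau, with
        # vertex 2*tau playing the role of the starting vertex V_{2,tau+1}.
        out = {v: [] for v in range(2 * tau + 1)}
        for i in range(tau):
            out[2 * tau].append(i)              # start -> V_{1,i}
            out[i].append(2 * tau)              # V_{1,i} -> start
            for j in range(tau):
                out[tau + j].append(i)          # V_{2,j} -> V_{1,i}, always
                if X[i][j]:
                    out[i].append(tau + j)      # V_{1,i} -> V_{2,j} iff X_ij = 1
        rec = [[0] * tau for _ in range(tau)]
        v = 2 * tau                             # the walk starts at u_start
        for _ in range(L):
            w = rng.choice(out[v])
            if v < tau and tau <= w < 2 * tau:  # traversed V_{1,a} -> V_{2,b}
                rec[v][w - tau] = 1
            v = w
        return rec == X

    rng = random.Random(3)
    tau = 5
    L = tau * tau * 100                         # stands in for tau^2 * log^2 n
    print(sum(run_gadget(tau, L, rng) for _ in range(20)), "of 20 walks recover X")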
We first show that, with high probability, an L-step random walk starting from the starting vertex of H_τ(X) determines the string X. Formally, the following claim captures what we need.

Claim C.5. For every X ∈ {0, 1}^{τ²}, letting u_start be the starting vertex of H_τ(X), it holds that

Pr_{W ∼ RW^{H_τ(X)}_L(u_start)}[ X_rec(W) = X ] ≥ 1 − exp(−Ω(log² n)).

Proof. We will first bound the probability that X_rec(W)_{i,j} = X_{i,j} for each (i, j) ∈ [τ]² and then apply a union bound. Fix (i, j) ∈ [τ]². If X_{i,j} = 0, then since there is no edge from V_{1,i} to V_{2,j}, clearly X_rec(W)_{i,j} is always 0 as well. Hence we only need to consider the case that X_{i,j} = 1.

In this case, one can observe that in every two steps, the random walk visits the edge from V_{1,i} to V_{2,j} with probability at least 1/(τ+1)². Moreover, all these events are independent. Hence, a random walk with L = τ²·log² n steps visits the edge from V_{1,i} to V_{2,j} with probability

1 − (1 − 1/(τ+1)²)^{L/2} ≥ 1 − exp(−Ω(L/τ²)) ≥ 1 − exp(−Ω(log² n)).

The claim then follows from a union bound over the τ² pairs (i, j).

Finally, since our streaming algorithm A has space complexity n·√L/log⁴ n < mℓ/log² m and sampling error at most 1 − 1/log n, the protocol Π has communication complexity less than mℓ/log² m and success probability at least 1/log n − exp(−Ω(log² n)) ≥ 2/log² m for sufficiently large n. This contradicts Lemma C.3 and completes the proof of Theorem 1.3.