Local Access to Random Walks
Amartya Shankha Biswas
CSAIL, MIT, Cambridge MA, [email protected]
Edward Pyne
Harvard University, Cambridge MA, [email protected]
Ronitt Rubinfeld
CSAIL, MIT, Cambridge MA, [email protected]
Abstract
For a graph $G$ on $n$ vertices, naively sampling the position of a random walk at time $t$ requires work $\Omega(t)$. We desire local access algorithms supporting position$(G, s, t)$ queries, which return the position of a random walk from some start vertex $s$ at time $t$, where the joint distribution of returned positions is $1/\mathrm{poly}(n)$ close to the uniform distribution over such walks in $\ell_1$ distance.

We first give an algorithm for local access to walks on undirected regular graphs with $\tilde{O}(\frac{1}{1-\lambda}\sqrt{n})$ runtime per query, where $\lambda$ is the second-largest eigenvalue in absolute value. Since random $d$-regular graphs are expanders with high probability, this gives an $\tilde{O}(\sqrt{n})$ algorithm for $G(n, d)$, which improves on the naive method for small numbers of queries.

We then prove that no algorithm with subconstant error given probe access to random $d$-regular graphs can have runtime better than $\Omega(\sqrt{n}/\log(n))$ per query in expectation, obtaining a nearly matching lower bound. We further show an $\Omega(n^{1/4})$ runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance).

We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime $\mathrm{polylog}(n)$ per query. This also allows for efficient local access to walks on polylog degree expanders. We extend our results to graphs constructed using the tensor product (giving local access to walks on degree $n^{\epsilon}$ graphs for any ϵ ∈ (0 ,

Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords and phrases sublinear time algorithms, random generation, local computation
Funding
Amartya Shankha Biswas: Big George Ventures Fund, MIT-IBM Watson AI Lab and Research Collaboration Agreement No. W1771646, NSF awards CCF-1733808 and IIS-1741137
Ronitt Rubinfeld: NSF awards CCF-2006664, CCF-1740751, IIS-1741137, Fintech@CSAIL. Part of this work was done while the author was participating in the program on Probability, Geometry, and Computation in High Dimensions at the Simons Institute for the Theory of Computing.
Acknowledgements
The second author thanks Andrew Lu for valuable comments on a draft of this paper.
Given some huge random object that an algorithm would like to query, is it always necessary to generate the entire object up front? For sublinear time algorithms, generating such a large object would dominate the runtime. Recent works [3, 10, 15, 5] demonstrated this was not always necessary, giving incremental query access to random objects such as random graphs, Dyck paths and graph colorings. These local access algorithms answer queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it).

In this work we explore the question of implementing local access to random walks. Given a graph $G$ on $n$ vertices, taking a random walk of length $t$ requires time $\Omega(t)$. Random walks are a critical primitive in many algorithms [13, 12, 7], including sublinear ones [11, 16, 2]. But since $t$ can be large, one may want to generate only the segments of the walk that are needed at the present time, while ensuring the joint distribution of the returned segments is close to the true distribution of random walks.

As is common in the setting of sublinear and local algorithms, we assume that we are given access to a graph $G$ on $n$ vertices through query oracles. This allows us to work with graphs that are too large to fit in main memory, and also results in running times that are not dominated by the size of the input. Our goal is to implement position$(G, s, t)$ queries, which return the position of a random walk starting from vertex $s$ at time $t$, such that, given a sequence of queries, the joint distribution of returned positions is $1/\mathrm{poly}(n)$-close to the true uniform distribution of those positions over random walks (in $\ell_1$ distance). We desire per query runtime that is sublinear in $n$ and $t$, and preferably polylogarithmic in both.
In that case, locally generating all vertices in a walk of length $t$ (in an arbitrary order) has total work within a polylog factor of the naive runtime.

Obtaining efficient random access for arbitrary graphs without knowing the entire structure seems to be a very difficult problem, and therefore in this paper we restrict our attention to regular graphs. However, regular graphs include widely studied families such as random regular and Cayley graphs, both of which we analyze.

We begin by presenting a $\tilde{O}(\frac{1}{1-\lambda}\sqrt{n})$ algorithm that provides local access to undirected $d$-regular graphs with spectral expansion $\lambda$. This algorithm maintains a collection of revealed time values and the associated positions of the random walk, where the revealed positions are a superset of the queried positions, specifically the positions that were either queried directly, or were "determined" by the local access algorithm in order to answer a query. The key idea in this algorithm is to handle queries in three different ways, based on the queried time relative to all other revealed times. The algorithm maintains the invariant that all determined times are either directly adjacent or separated by at least twice the mixing time. Given query time $t$, if the new query time is further than twice the mixing time away from any other revealed time, the algorithm simply returns a random vertex as the corresponding position and appends $t$ to the list of revealed locations. Alternatively, if $t$ is close to exactly one revealed time (either smaller or greater) and far from the other one, then the algorithm simply simulates the entire walk between $t$ and the closer revealed time, which takes $\tilde{O}(\frac{1}{1-\lambda})$ steps. Finally, the most interesting case is when there are two revealed positions on either side that are close to $t$, but not too close. In this setting, the algorithm samples $\tilde{O}(\sqrt{n})$ random walks from both revealed locations, each of length half the interval, until a collision is found.
Since walks of this length are well mixed, a collision occurs with high probability. The two colliding walks are then stitched together to interpolate the walk between the nearest revealed locations, and then the corresponding position at time $t$ can be returned.

Moving forwards, we demonstrate that such a runtime is optimal in general. Specifically, our lower bound holds for the case of random $d$-regular graphs, which provides some evidence that obtaining fast query algorithms for "large" classes is challenging. Our lower bounds present adaptively chosen query sequences, and demonstrate that for the vast majority of these random graphs, any algorithm making $\tilde{O}(\sqrt{n}/\log n)$ random-neighbor and random-vertex probes to the underlying graph $G$ will fail to answer the queries in a consistent manner. The main structural result used here is Lemma 4.8 which states that as long as the algorithm makes fewer than $\Theta(\sqrt{n})$ probes, the revealed edges and vertices of the graph will form a forest, and additionally, no trees will ever be merged, with probability at least 0 . d ( · , · ) where $d(u, v)$ is the distance between vertices $u$ and $v$ using only the revealed edges, and is defined to be $\infty$ if no such path has been revealed.

The high level strategy in the lower bound is to first query the positions v , $v_e$ of the walk at time $t = 0$ and $t = \sqrt{n}$ respectively, and then adaptively query $O(\log n)$ intermediate positions (where the query times may depend on the internal state of the algorithm), until an inconsistency is found. The hypothesis at this point is that the algorithm does not actually know of a path of the correct length between the two returned vertices. Specifically, we show that either the revealed edges fail to connect the vertices in the limited number of available probes, or the known path between them is shorter than $\sqrt{n}/20$. In the first case, we can perform binary search for a location such that we end up with two reported positions which are adjacent in time, but do not have an edge between them, thus yielding the inconsistency. The latter case is more complicated, and requires some case analysis, but we are able to query adaptively and always find two positions $v_i$ and $v_j$ (revealed at times $t_i$ and $t_j$), such that one of the following two outcomes holds: either the distance is too large, $d(v_i, v_j) > |t_i - t_j|$, or the distance is too small, $d(v_i, v_j) < |t_i - t_j|/2$. In the first outcome, if the distance is greater, we can again perform binary search to find adjacent positions in the walk that are not connected by an edge. For the second outcome, we again perform binary search to find a short segment with unusually short distance, and then query all intermediate locations to find a segment of the walk σ , σ , · · · , σ l of length $\Theta(\log n)$, such that d ( σ , σ l ) < l/ O (log n ). The fact that this segment has much smaller distance than the time interval implies that there is a significant amount of backtracking, and we demonstrate that the probability of significant backtracking over a truly random walk is $o(1)$.

We also prove an oblivious lower bound of $\Omega(n^{1/4})$, for the case when the queries do not depend on the internal state of our algorithm. In this case, we present the sequence of query times ( n / , , , · · · , n / − O ( n / ) graph probes, then the total number of probes is bounded above by $\Theta(\sqrt{n})$, and therefore we use the same structural Lemma 4.8 mentioned above in order to derive a contradiction.

Finally, motivated by the lack of efficient local access to walks on general classes of graphs, we turn to algorithms for local access on families of graphs with additional algebraic structure. We give efficient local access to walks on small-degree abelian Cayley graphs (for instance, cycles and hypercubes). This also allows for efficient local access to walks on a class of polylog degree expanders. We extend our results to graphs constructed using the tensor product (giving local access to walks on degree $n^{\epsilon}$ graphs for any ϵ ∈ (0 ,

The problem of providing local access to huge random objects was first proposed in [10, 9]. Subsequent work in [15] presented algorithms that provide access to sparse Erdős-Rényi $G(n, p)$ graphs through All-Neighbors queries, as long as the number of queries is small and $p = O(\mathrm{poly}(\log n))$.
Many of the results in these earlier works only guarantee that the generated random objects appear random, as long as the number of queries is bounded, usually by $O(\mathrm{poly}(\log n))$. More recently, in [5], implementations of random recursive trees and BA preferential attachment graphs are presented. Further, local access is given for the Next-Neighbor query that returns the neighbors of a vertex in lexicographic order, which is useful for accessing graphs where the degree is not bounded. Subsequently, [3] presented implementations for random $G(n, p)$ graphs for any value of $n$, while supporting Next-Neighbor as well as the newly introduced Random-Neighbor queries. In [3], algorithms are provided for accessing random walks on the line, random Dyck paths, and random colorings of a graph. Implementing access to random walks on the line graph was motivated by the implementation of interval summable functions in [10, 8].
In Section 2 we introduce notation and basic sampling tools. In Section 3 we give a local access algorithm for undirected regular graphs with runtime in terms of expansion. In Section 4, we first apply the previous algorithm to random $d$-regular graphs. We then prove a nearly matching lower bound with respect to an adaptive or non-oblivious adversary (one who has access to the internal state of our algorithm), and a weaker bound with respect to an oblivious adversary. In Section 5 we give local access algorithms for small degree abelian Cayley graphs, such as hypercubes and cycles. In Appendix B, we give local access algorithms for the tensor and Cartesian graph products.

We first define terminology and introduce basic tools for sampling. We characterize the closeness of query responses to true random walks via $\ell_1$ distance, and use $\ell_2$ distance for spectral arguments.
▶ Notation 2.1.
Given distributions $A, B$ over a set $[S]$, the $\ell_1$ distance between $A$ and $B$ is defined as $\|A - B\|_1 = \sum_{i=1}^{S} |A_i - B_i|$. The $\ell_2$ distance is defined as $\|A - B\|_2 = \sqrt{\sum_{i=1}^{S} (A_i - B_i)^2}$.
For some set $S$, let $U_S$ denote the uniform distribution over $S$. Let $s \leftarrow U_S$ be an element drawn from this distribution.

Next, we define notation for the distribution of random walks on fixed graphs.
▶ Notation 2.2.
Given regular $G = (V, E)$ where $V = [n]$, $v_0, v_1 \in V$ and $t \in \mathbb{N}$:
Let $\lambda(G) = \max_{x \in \mathbb{R}^n : x \perp \mathbf{1}} \|Wx\|_2 / \|x\|_2$ where $W$ is the random walk matrix of $G$.
Let $D_C(G, v_0, v_1, t)$ be the distribution over random walks of length $t$ from $v_0$ that end at $v_1$. As $G$ is regular, this is the uniform distribution over all satisfying walks.
Let $U_G^{\ell}$ be the unconditional distribution of random walks from vertex 1 of length $\ell$.
For any finite set of times $S \in \mathbb{N}^k$, let $P_S(U_G)$ be the distribution of the positions at times $S$ of random walks from vertex 1. We will measure the accuracy of an algorithm given time queries $S$ by bounding the $\ell_1$ distance of its responses to $P_S(U_G)$. For notational convenience, let $P_i = P_{\{i\}}$.

We can then define the class of algorithms we consider.
▶ Definition 2.3. A local access algorithm $\mathcal{A}$ for a graph $G$ is an algorithm that, given $\epsilon > 0$ and $B \in \mathbb{N}$ at initialization and a sequence of queries $T = t_1, \ldots, t_r$ for $r \le B$, returns vertices $(v_{t_1}, \ldots, v_{t_r}) \leftarrow D$ such that $\|D - P_T(U_G)\|_1 \le \epsilon$.

We believe the useful regime to be setting $B = n^c$ and $\epsilon = n^{-c-c'}$ for desired constants $c, c'$, giving a polynomial approximation in $\ell_1$ distance. Moreover, one can implicitly restrict all query times to be below some polynomial threshold for the remainder of the paper.
▶ Definition 2.4.
We say a local access algorithm $\mathcal{A}$ is efficient if for $\epsilon = 1/\mathrm{poly}(n)$, $B = \mathrm{poly}(n)$ and $t_i \le \mathrm{poly}(n)$ for all $i$, the algorithm runs in time $\mathrm{polylog}(n)$ per query.

Our definition of efficiency is motivated by the fact that taking a random walk of length $t$ requires time $\Omega(t)$, so an efficient algorithm allows one to incrementally construct a random walk in an arbitrary query order with total runtime within a polylog factor of the naive algorithm.
▶ Definition 2.5.
A sequence of queries $T$ to a local access algorithm is considered to be adaptive with respect to a local access algorithm $\mathcal{A}$, if the $i$th query is allowed to depend on the internal state of the algorithm after query $i - 1$. Additionally, we call a local access algorithm robust if it is able to answer adaptive queries according to the correct distribution.

For the remainder of the paper all presented algorithms will be robust (as they will succeed with high probability over any sequence of queries), and we only consider the weaker notion in the context of lower bounds.

There are a few subtleties with the definition. The first is that even in the non-adaptive case, future queries may depend on the vertices returned by the algorithm. The $i$th query of $T$ is thus a function of $v_{t_1}, \ldots, v_{t_{i-1}}$ (in the non-adaptive case) and the state of $\mathcal{A}$ after query i − D is defined recursively by conditioning on the first $i$ query responses. However, at the end of any query sequence with queried times $T$, the projection $P_T$ is clearly independent of the order elements in $T$ were queried, so the distribution $P_T(U_G)$ is still well defined.

Finally, we state a basic result on partial sampling.
▶ Proposition 2.6.
Let $G$ be a graph and $T$ an ordered list of determined times in a walk on $G$. Let $V_T$ be the associated set of determined positions. Suppose $V_T$ has been sampled to within $\epsilon$ of the true distribution in $\ell_1$ distance. For any new query $t$, let $t^- < t < t^+$ be the closest low and high previously determined times. These are denoted the bracketing queries. Then:
The distribution of $v_t$ conditioned on $v_{t^-}, v_{t^+}$ is equal to the distribution conditioned on all previously determined vertices.
If $v_t$ is sampled from a distribution $D$ where $\|D - P_{t - t^-}(D_C(v_{t^-}, v_{t^+}, t^+ - t^-))\|_1 \le \delta$, then $(V_T, v_t)$ is $\epsilon + \delta$ close to the true distribution. Furthermore, if the true distribution of $v_t$ is some deterministic function of $k$ distributions, an equivalent result holds for sampling each distribution to within $\delta/k$ and returning the deterministic function applied to these samples.

In effect, this gives us the ability to only focus on bracketing queries while analyzing the closeness of a local access algorithm to uniform.
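To make the role of bracketing queries concrete, the following is a minimal sketch, not part of the paper's algorithms, that computes the exact conditional distribution of an intermediate position given its two bracketing vertices on a small graph, and measures its distance to uniform. It uses the ratio of walk-matrix entries that appears in the analysis of Theorem 3.1. The helper names (`walk_matrix`, `mat_pow`, `bridge_distribution`, `l1_distance`) and the complete graph on four vertices are assumptions chosen for illustration only.

```python
from fractions import Fraction


def walk_matrix(adj):
    """Random walk matrix W of a graph given as adjacency lists."""
    n = len(adj)
    W = [[Fraction(0)] * n for _ in range(n)]
    for v, nbrs in enumerate(adj):
        for w in nbrs:
            W[v][w] += Fraction(1, len(nbrs))
    return W


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][j] * B[j][c] for j in range(n)) for c in range(n)]
            for i in range(n)]


def mat_pow(W, e):
    """W raised to the e-th power (e >= 0) by repeated squaring."""
    n = len(W)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    base = W
    while e > 0:
        if e & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        e >>= 1
    return result


def bridge_distribution(W, v_lo, v_hi, l, r):
    """Exact law of the position l steps after v_lo and r steps before v_hi."""
    Wl, Wr = mat_pow(W, l), mat_pow(W, r)
    weights = [Wl[v_lo][v] * Wr[v][v_hi] for v in range(len(W))]
    total = sum(weights)
    if total == 0:  # no walk of length l + r joins the two bracketing vertices
        raise ValueError("bracketing vertices are not connected by such a walk")
    return [w / total for w in weights]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


# Illustration on the complete graph K_4 (a hypothetical toy input).
K4 = [[w for w in range(4) if w != v] for v in range(4)]
W = walk_matrix(K4)
uniform = [Fraction(1, 4)] * 4
near = l1_distance(bridge_distribution(W, 0, 0, 1, 1), uniform)   # equals 1/2
far = l1_distance(bridge_distribution(W, 0, 0, 10, 10), uniform)  # tiny
```

When the bracketing times are one step away on each side, the middle vertex is far from uniform, while for bracketing times ten steps away on each side the conditional law is already very close to uniform. This is the effect the first case of the algorithm in Section 3 relies on.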
We first give an algorithm for undirected regular graphs with $\tilde{O}(\frac{1}{1-\lambda}\sqrt{n})$ work per query. This is sublinear for small numbers of queries on graphs with good expansion, but is far from polylog work per query.
▶ Theorem 3.1.
Fix $\epsilon > 0$ and $B \in \mathbb{N}$ and λ ≥ . Given rand_neighbor and rand_vertex probe access to an undirected $d$-regular graph $G$ on $n$ vertices with $\lambda(G) \le \lambda$, there is a deterministic local access algorithm which uses $O(\log(nt)\frac{1}{1-\lambda}\log(nB/\epsilon))$ additional space and $O(\sqrt{n} \cdot \log(nt)\frac{1}{1-\lambda}\log^2(nB/\epsilon))$ time and working space per query.
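The three ways of answering a query can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class name `LocalWalk`, the parameter `k` (standing for the "far" threshold, taken here to be twice an assumed mixing length), the cap `max_tries`, and the toy input are all hypothetical. Two deviations are deliberate: the right-hand walks take the remaining steps when the interval has odd length, and the sketch raises an error where the proof falls back to an arbitrary path.

```python
import bisect
import math
import random


class LocalWalk:
    """Hedged sketch of three-case local access to a walk on a regular graph."""

    def __init__(self, adj, start, k, max_tries, seed=None):
        self.adj = adj              # adjacency lists, every list has d entries
        self.k = k                  # assumed "far" threshold (twice mixing length)
        self.max_tries = max_tries  # assumed cap on collision attempts
        self.rng = random.Random(seed)
        self.times = [0]            # sorted determined times
        self.pos = {0: start}       # determined time -> vertex

    def rand_path(self, v, length):
        """Vertices visited by `length` successive random-neighbor steps from v."""
        path = [v]
        for _ in range(length):
            path.append(self.rng.choice(self.adj[path[-1]]))
        return path

    def _record(self, t0, path, step=1):
        """Store path[i] at time t0 + step * i, skipping already determined times."""
        for i, v in enumerate(path):
            t = t0 + step * i
            if t not in self.pos:
                bisect.insort(self.times, t)
                self.pos[t] = v

    def _stitch(self, lo, hi):
        """Sample walks from both bracketing vertices until two endpoints collide."""
        d_left = (hi - lo) // 2
        d_right = (hi - lo) - d_left
        left, right = {}, {}        # endpoint -> first walk that reached it
        for _ in range(self.max_tries):
            p = self.rand_path(self.pos[lo], d_left)
            q = self.rand_path(self.pos[hi], d_right)
            left.setdefault(p[-1], p)
            right.setdefault(q[-1], q)
            for w in (p[-1], q[-1]):
                if w in left and w in right:
                    # left walk followed by the reversed right walk
                    self._record(lo, left[w] + right[w][::-1][1:])
                    return
        raise RuntimeError("no collision found within max_tries")

    def position(self, t):
        """Position at time t >= 0, consistent with all earlier answers."""
        if t in self.pos:
            return self.pos[t]
        i = bisect.bisect_left(self.times, t)
        lo = self.times[i - 1]      # time 0 is always determined, so i >= 1
        hi = self.times[i] if i < len(self.times) else None
        l = t - lo
        r = hi - t if hi is not None else math.inf
        if l > self.k and r > self.k:        # far from both: fresh random vertex
            self._record(t, [self.rng.randrange(len(self.adj))])
        elif l <= self.k and r > self.k:     # close on the left only: walk forward
            self._record(lo, self.rand_path(self.pos[lo], l))
        elif l > self.k and r <= self.k:     # close on the right only: walk backward
            self._record(hi, self.rand_path(self.pos[hi], r), step=-1)
        else:                                # close on both sides: collide and stitch
            self._stitch(lo, hi)
        return self.pos[t]


# Usage on the complete graph K_5 (a hypothetical toy input, k chosen by hand).
K5 = [[w for w in range(5) if w != v] for v in range(5)]
walk = LocalWalk(K5, start=0, k=4, max_tries=1000, seed=1)
answers = [walk.position(t) for t in (100, 3, 98, 10, 6)]
```

The query order above exercises all branches: time 100 is far from everything, times 3 and 98 are each close to one determined time, time 10 is again far, and time 6 then falls between two close determined times and triggers the collision step. After every query, determined times are either adjacent or at least `k` apart, mirroring the invariant used in the proof.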
Proof.
Let $k = O(\frac{1}{1-\lambda}\log(Bn/\epsilon))$ be the smallest integer such that $\lambda(G^k) \le \epsilon/nB$. The algorithm maintains a sorted list of previously determined positions (a superset of previous queries) and associated vertices $T = t_1 < \cdots < t_r$, $V_T = v_{t_1}, \ldots, v_{t_r}$. Between queries, we maintain a constraint that for all $t_i, t_{i+1}$ it is either the case that $t_{i+1} - t_i = 1$ (so the queries are direct neighbors) or $t_{i+1} - t_i \ge k$.

For a new query $t$, let $t^- \le t < t^+$ be the bracketing queries, where $t^+ = \infty$ if the constraint is unidirectional. For notational convenience, let rand_path$(G, v, d)$ be a sequence of vertices obtained from making $d$ successive rand_neighbor calls starting at vertex $v$. Furthermore let $l = t - t^-$ and $r = t^+ - t$.
If $l > k$ and $r > k$, set $v_t =$ rand_vertex$(G)$.
If $l \le k$ and $r > k$, determine the vertices at $[t^- + 1, t]$ as rand_path$(G, v_{t^-}, l)$.
If $l > k$ and $r \le k$, determine the vertices at $\{t^+ - 1, t^+ - 2, \ldots, t\}$ as rand_path$(G, v_{t^+}, r)$.
If neither condition is satisfied, we have 2 k < | t + − t − | ≤ k . Let $d = \lfloor (t^+ - t^-)/2 \rfloor$ and let $L, R$ be empty sets of walks of length $d$ from $v_{t^-}$ and $v_{t^+}$ respectively. Let COL be the event a path from $L$ and $R$ share an endpoint.
a. Let $L \leftarrow L \cup$ rand_path$(G, v_{t^-}, d)$.
b. Let $R \leftarrow R \cup$ rand_path$(G, v_{t^+}, d)$.
c. If COL, go to Phase II.
d. After $2\sqrt{n}\log(B/\epsilon)$ iterations determine the vertices at $[t^-, t^+]$ as an arbitrary path, else repeat.
In Phase II we have paths $p_l, p_r$ sharing an endpoint. If there are multiple colliding paths, choose the first to occur. Let the determined vertices $[t^-, t^+]$ be $p_l \bar{p}_r$ where $\bar{p}$ is the reverse of path $p$.

Since the algorithm determines at most $2k$ timesteps per query (in Case III), the incremental persistent storage is at most $(\log(t) + \log(n)) 2k = O(\log(nt)\frac{1}{1-\lambda}\log(Bn/\epsilon))$. The runtime is immediate from the description.

We now show the algorithm is $\epsilon$-close to uniform. Slightly abusing notation, let $T = t_1, \ldots, t_s$ be the ordered list of determined times after query $s$. Let the $(s+1)$st query be denoted $t$ and write the bracketing vertices of $t$ as $t^- < t < t^+$. As before, let $l = t - t^-$ and $r = t^+ - t$. By Proposition 2.6, showing the distribution of the vertices decided at query $s + 1$ is $\epsilon/B$-close to the true conditional distribution given the bracketing vertices $\{v_{t^-}, v_{t^+}\} = \Phi$ suffices to show the algorithm is $\epsilon$ close via a union bound over the $B$ queries.

If $t$ was decided in Case 1, we have $\Pr(v_t = v \mid \Phi) = 1/n$. Let $W$ be the random walk matrix of $G$. Then
$$\Pr(P_t(D_C(G, \Phi)) = v) = \frac{W^l_{v_{t^-},v}\, W^r_{v,v_{t^+}}}{\sum_{w \in V} W^l_{v_{t^-},w}\, W^r_{w,v_{t^+}}} \le \frac{(1/n + \epsilon/Bn)^2}{n(1/n - \epsilon/Bn)^2} \le \frac{1}{n}(1 + O(\epsilon/B)),$$
with a nearly identical lower bound. Taking a union bound over all $n$ potential vertices and adjusting $\epsilon$ by a constant factor completes the proof.

If $t$ was decided in Case 2, we first decide the position at the end of the random walk (abusing notation assume $t$ was this), and then assign the connecting walk. This is because fixing the endpoint $v_t$, the distribution of the decided path $v_{t^-} \to v_t$ is clearly equal to the true distribution, since it was sampled via an unconstrained random walk. In the case where the walk was sampled $v_{t^+} \to v_t$, since $G$ is undirected and regular the probability of a walk is equal to that of the reversed walk, so the decided path remains truly uniform. Then the analysis of deciding $v_t$ is nearly identical to Case I.

If $t$ was decided in Case 3, since the distribution of left and right endpoints are 1 /n close to uniform in ℓ distance, by a simple collision probability argument $\mathcal{A}$ fails to find a collision with probability $O(\epsilon/B)$ and loses an equal amount in $\ell_1$ distance. Otherwise, we first decide the position at the midpoint of the random walk (abusing notation assume $t$ was this), and then assign the connecting walks.
This is because fixing a collision (and thus endpoint) at $v_t$, the distribution of the decided paths $v_{t^-} \to v_t$ and $v_t \to v_{t^+}$ are equal to the true distribution, since they were sampled via unconstrained random walks. Then the analysis of deciding $v_t$ is nearly identical to Case I. ◀

Next, we study the question of implementing access to random regular graphs, which have the property that for all $d \ge 3$, the probability a random $d$-regular graph is an expander tends to 1. This implies that Theorem 3.1 composed with the set of random regular graphs achieves runtime $\tilde{O}(\sqrt{n})$ per query. In fact, this is nearly the best possible runtime, as we prove no local access algorithm given probe access to random regular graphs making $o(\sqrt{n}/\log(n))$ probes per query achieves subconstant error on adaptive query sequences. Furthermore, no local access algorithm making $o(n^{1/4})$ probes per query achieves subconstant error on non-adaptive (in fact fixed in advance) query sequences. We first introduce notation for random $d$-regular graphs.
▶ Definition 4.1.
Let $G(n, d)$ be the uniform distribution over $d$-regular graphs on $n$ vertices. For $d$ odd, we implicitly restrict to even $n$ when taking limits.
For a set of edges $S = \{(v_1, w_1), \ldots, (v_k, w_k)\}$, let $G(n, d) \cap S$ be the uniform distribution over $d$-regular graphs on $n$ vertices containing all edges in $S$. Note that for certain $S$ (for instance, any containing a self-loop), this set is empty.

For the remainder of the section we treat $d$ as a constant while $n$ tends to infinity, so $O$ notation sometimes hides factors dependent on $d$. Furthermore we assume $d \ge 3$, since the other two cases are degenerate. We now state informal versions of the main results. First, a sublinear algorithm for $G(n, d)$ obtained as a consequence of Theorem 3.1.
▶ Corollary 4.2.
There exists a deterministic local access algorithm $\mathcal{A}$ with time per query $O(\sqrt{n}\log^3(n))$ that, given rand_neighbor and rand_vertex probe access to $G(n, d)$, satisfies for any adaptive query sequence $Q$ with $|Q| \le \mathrm{poly}(n)$,
$$\mathbb{E}_{G \leftarrow G(n,d)} \|D_{G,\mathcal{A},Q} - P_Q(U_G)\|_1 = o_n(1),$$
where $D_{G,\mathcal{A},Q}$ is the distribution of $\mathcal{A}$'s responses given probe access to $G$ over sequence $Q$.

Next, an $\tilde{\Omega}(\sqrt{n})$ lower bound against robust local access algorithms (i.e. those that face adaptive sequences).
▶ Theorem 4.3 (Informal Statement of Theorem 4.16). There is a constant $n_0$ and an adaptive sequence $Q$ such that any robust local access algorithm $\mathcal{A}$ given rand_neighbor and rand_vertex probe access to random $d$-regular graphs for $n \ge n_0$ with parameters ( ϵ, B ) = ( . , $O(\log(n))$) makes $\Omega(\sqrt{n}/\log(n))$ graph probes per time query of $Q$ in expectation.

Finally, an $\Omega(n^{1/4})$ lower bound that does not rely on adaptive query sequences.
▶ Theorem 4.4 (Informal Statement of Theorem 4.20). There is a constant $n_0$ and a fixed query sequence $Q$ such that any local access algorithm $\mathcal{A}$ given rand_neighbor and rand_vertex probe access to random $d$-regular graphs for $n \ge n_0$ with parameters ( ϵ, B ) = ( . , $n^{1/4}$) makes $\Omega(n^{1/4})$ graph probes per time query of $Q$ in expectation.
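One way to read the two per-query bounds is as a probe budget. The short calculation below is a sketch, assuming the forest lemma of this section (Lemma 4.8) applies to any algorithm that uses at most $\sqrt{n}/k_d$ probes in total, where $k_d$ is the constant of that lemma and $c$ is the constant in the $O(\log n)$ bound on the number of adaptive queries. The symbol $q$ for the number of probes per query is introduced here for illustration only.

```latex
\[
  \underbrace{c \log n}_{\text{adaptive queries}} \cdot
  \underbrace{q}_{\text{probes per query}} \le \frac{\sqrt{n}}{k_d}
  \iff q \le \frac{\sqrt{n}}{k_d\, c \log n},
  \qquad
  \underbrace{n^{1/4}}_{\text{fixed queries}} \cdot q \le \frac{\sqrt{n}}{k_d}
  \iff q \le \frac{n^{1/4}}{k_d}.
\]
```

Under this reading, an algorithm that beats either bound never leaves the regime in which its probes reveal only non-merging trees, which is the regime the lower bound arguments exploit.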
It is impossible to prove lower bounds for all subfamilies in $G(n, d)$ (in fact we give efficient local access algorithms for some later), but any possible algorithm being $\Omega(1)$ from uniform on at least 99% of random regular graphs effectively rules out a unified approach.

We begin by proving the $\tilde{O}(\sqrt{n})$ upper bound using the algorithm from Section 3. To do so, we recall the famous result that almost all random regular graphs are good expanders.
▶ Lemma 4.5 ([6]). For all $d \ge 3$, $\Pr(\lambda(G(n, d)) \le 0.95) = 1 - o_n(1)$.
Then the proof follows directly.
Proof of Corollary 4.2.
Choose $B = \mathrm{poly}(n)$, $\epsilon = 1/\mathrm{poly}(n)$ and compose the algorithm of Theorem 3.1 with $G(n, d)$, where we promise that $\lambda \le 0.95$. In the case of poorly expanding graphs this will result in walks that are arbitrarily far from truly random, but the runtime per query will still be as claimed.
For $G$ such that $\lambda(G) \le 0.95$ we obtain that for any (potentially adaptive) query sequence $Q$, $\|D_{G,\mathcal{A},Q} - P_Q(U_G)\|_1 \le 1/\mathrm{poly}(n)$. Then taking the expectation over $G(n, d)$ we obtain
$$\mathbb{E}_{G \leftarrow G(n,d)} \|D_{G,\mathcal{A},Q} - P_Q(U_G)\|_1 \le 1/\mathrm{poly}(n) + \Pr[\lambda(G(n, d)) > 0.95] = o_n(1).$$ ◀

To prove the lower bounds, we first give three structural results which establish any algorithm must succeed even when the first $\Omega(\sqrt{n})$ graph probes define disjoint forests, and give tests for closeness of walks to the uniform distribution supported on only a few queries.

Our first goal is to show no algorithm making rand_neighbor probes to $G \leftarrow G(n, d)$ can efficiently find cycles. This is essential, as the entire lower bound rests on the probes made by the algorithm defining a tree with $\Omega(1)$ probability. To do so, we first show conditioning on a small number of edges (e.g. those already known by the algorithm) does not increase the conditional probabilities of non-revealed edges by more than a constant factor.
▶ Lemma 4.6.
For all $d \in \mathbb{N}$ there is a constant $c_d$ depending only on $d$ such that for an arbitrary set of edges $S$ with $|S| \le \sqrt{n}$ and $v, w \in V$ arbitrary vertices where $(v, w) \notin S$, we have $\Pr_{G \leftarrow G(n,d) \cap S}[(v, w) \in G] \le c_d/n$.

We defer the proof to Appendix A. We use the configuration model of Bollobás [4] and a strengthening to handle degree sequences with small amounts of variation by [14].

Furthermore, probe access to $G(n, d)$ is equivalent to successively generating edges uniformly at random over the set of regular graphs satisfying the existing constraint; in effect, we can only determine edges when required, and this is the perspective we will use for the proof.
▶ Lemma 4.7.
Let $\mathcal{A}$ be an algorithm having made $k$ arbitrary rand_neighbor probes to $G(n, d)$ and let the returned edges be $E$. Then the conditional distribution over graphs given the probe responses is uniform over $G(n, d) \cap E$.
Proof.
Let $v_1, \ldots, v_k$ be the origin vertices for the rand_neighbor probes and $w_1, \ldots, w_k$ the returned vertices. For $H \in G(n, d) \cap E$ we have $\Pr[\forall i,\ \texttt{rand\_neighbor}_H(v_i) = w_i] = 1/d^k$ whereas for $H \in G(n, d) \setminus E$ the equivalent probability is zero. ◀

Given these lemmas, we can now show the first $\Omega(\sqrt{n})$ probes made by any local access algorithm will fail to find cycles or merge forests with constant probability.
▶ Lemma 4.8.
Let $\mathcal{A}$ be an algorithm, where at each step $\mathcal{A}$ makes a rand_neighbor or rand_vertex probe to $G(n, d)$ or marks any vertex. Each vertex touched by a probe is marked. Then there is a constant $k_d$ depending only on $d$ such that for any $\mathcal{A}$ with at most $\sqrt{n}/k_d$ steps, with probability at least . ,
the rand_neighbor probes will define a forest,
no rand_neighbor probe will ever merge two marked trees.
Proof.
Let V
1, and E
For any $\ell \le \sqrt{n}/\log(n)$, we have that $\Pr_{G \leftarrow G(n,d)} \Pr_{\sigma \leftarrow U_G^{\ell}}(\sigma \text{ defines a tree}) \ge 1 - O(1/\log^2(n))$.
Proof.
This directly follows from setting $q = \sqrt{n}/\log(n)$ in the above proof, as a random walk is simply a sequence where at each step we probe rand_neighbor at the current head. ◀

We now show random walks of length $\sqrt{n}/\log(n)$ over random regular graphs exhibit a distinguishing feature that can be checked on small segments. Intuitively, with high probability there will be no segment of length $r = \Omega(\log(n))$ where the simple path over the edges traversed in the walk between the endpoints of the segment is shorter than $r/2$. Since the edges traversed by the walk will define a tree with high probability, an unusually short induced simple path implies the biased random walk corresponding to the tree metric in that segment is much shorter than its expectation, which is vanishingly unlikely. To show this, we formally define the path length of the induced simple path.
▶ Definition 4.10.
For a partially determined vertex sequence $s = (s_1, \ldots, s_\ell) \in ([n], *)^{\ell}$, let path length $\mathrm{PL}(s)$ be the distance between $s_1$ and $s_\ell$ in the induced (undirected, unweighted) graph $G' = ([n], E')$, where $(u, v) \in E'$ if and only if there exists $i$ such that $s_i = u, s_{i+1} = v$.

We obtain that an unusually short simple path is vanishingly unlikely in any segment of a random walk.
▶ Lemma 4.11.
Let $\sigma \in [n]^{\ell}$ be a walk of length $\ell \le \sqrt{n}/\log(n)$. Let $F(\sigma)$ be the event any segment $s = (\sigma_i, \ldots, \sigma_j)$ of length $|s| \ge 40\log(n)$ has PL( s ) < | s | / . Then $\Pr_{G \leftarrow G(n,d)} \Pr_{\sigma \leftarrow U_G^{\ell}}[F(\sigma)] = o_n(1)$.
Proof.
Let $\Psi(\sigma)$ be the event $\sigma$ defines a tree. Then Pr G ← G ( n,d ) Pr σ ← U ℓG (Ψ( σ )) ≥ − O (1 / log ( n )) by Corollary 4.9.

We now fix $G$ and sequentially generate a random walk $\sigma$. For each vertex in the random walk, there is some edge that was the first traversed by $\sigma$ (where we pick some edge for the first vertex arbitrarily). For each step of $\sigma$, at the current vertex $v$, label this first traversed edge a $-$ edge and all others $+$ edges. Then in a random walk in any $d$-regular graph the probability of step $i$ being a $+$ step is exactly $(d - 1)/d$ and these events are independent for all $i$. Furthermore, for all $\sigma$ that define a tree the $+$ and $-$ labels exactly correspond to the distance metric on the tree induced by the random walk, with $-$ corresponding to backtracking towards the initial vertex.

Now let $s$ be any segment of length at least $40\log(n)$. Let $F(s)$ be the event $F(\sigma)$ holds in this segment. Let $h(s)$ be the sum over $+$ and $-$ steps in $s$. Then by the definition of simple path and the correspondence between step labels and the tree metric: { h ( s ) ≥ s/ } ∩ Ψ( σ ) = ⇒ F ( s ) . But then we can apply a basic Chernoff bound to obtain Pr[ h ( s ) < (1 − δ ) µ ] ≤ exp( − δ µ/ δ = 1 / µ = E [ h ( s )] ≥ s/ h ( s ) < s/ ≤ exp( − (2 s/ / ≤ n − . Then taking a union bound over the at most ℓ such segments, we obtain Pr( F ( σ )) ≤ Pr(Ψ( σ )) + X s ⊆ σ : | s |≥ 40 log( n ) Pr( { h ( s ) < s/ } ) ≤ O (1 / log ( n )) + ℓ · n − . = o n (1) . ◀

We are now prepared to prove the lower bound. For the remainder of the section let $\mathcal{A}$ be a local access algorithm with rand_neighbor and rand_vertex probe access to $G(n, d)$. We give a sequence of at most $c\log(n)$ time queries. By Lemma 4.8, any algorithm that makes fewer than $\sqrt{n}/(k_d c \log(n))$ probes per query sees non-merging trees with probability 0.995 for the duration of the query sequence. Given this occurs, we force the algorithm to return a walk segment that appears with probability $o(1)$ over the true distribution of random walks on $G$.

We now begin to work with fixed instantiations of $\mathcal{A}$. We use the perspective of $\mathcal{A}$ successively determining the graph by making new rand_neighbor probes.
▶ Definition 4.12.
For a fixed instantiation of $\mathcal{A}$ on $G(n,d)$, let $T(Q) = (V_Q, S)$ be the transcript of the history of the algorithm after a sequence of queries $Q$. $V_Q$ holds the vertices returned at the times in $Q$, and $S$ holds the set of edges revealed by rand_neighbor probes. Note the distribution over possible graphs at this time is $G(n,d) \cap S$.

(Footnote, Chernoff bound: Let $X_1, \dots, X_n$ be independent random variables taking values in $\{0, 1\}$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. Then for any $\delta \in [0, 1]$, $\Pr[X \le (1-\delta)\mu] \le \exp(-\delta^2 \mu / 2)$.)

An adaptive query sequence is simply a function $f : T(Q) \to \mathbb{N}$, where the next query is a (in our case deterministic) function of the existing transcript. A non-adaptive query sequence is a function $g : V_Q \to \mathbb{N}$, where the next query can only depend on the vertices returned by $\mathcal{A}$, but not on the internal state of the algorithm. ▶ Notation 4.13.
Given a queried time $t$, denote by $v_t \in V_Q$ the vertex returned by $\mathcal{A}$ for this time.
Given a transcript $T(Q) = (V_Q, S)$, for vertices $v, w \in V$, let $d(v, w)$ be the length of the simple path between the vertices $v, w$ in the graph induced by the edges in $S$, where $d(v, w) = \infty$ if no path exists. Denote the simple path itself (if one exists) as $SP(v, w)$.

In the case where probes define non-merging trees, for all $v, w \in V$ once $d(v, w) < \infty$ it is fixed for the duration of the query sequence, and there are never multiple simple paths between vertices. This is a central component of the proof, as it implies $\mathcal{A}$ cannot “extend” paths without guessing.

We first give a family of distinguishing functions that we will use to lower bound $\ell_1$ distance, and show that truly random walks satisfy them with vanishing probability. The function $F_G$ checks two conditions: whether the “walk” traversed edges that do not actually exist, and whether the path length of a sufficiently large segment of the walk is too short. ▶ Definition 4.14.
For an arbitrary graph G = ( V, E ) let F G : { V, ∗} e → { , } be defined as F G ( w , . . . , w e ) = I ( ∃ i st. w i ̸ = ∗ , w i +1 ̸ = ∗ and ( w i , w i +1 ) / ∈ E OR ∃ i < j −
40 log( n ) st. PL( w i , . . . , w j ) < ( j − i ) / Furthermore PL is nonincreasing (and thus F is nondecreasing) with respect to revealing newvertices. Interestingly, the only reason we require knowing G to define F G is to rule out edges thatare not actually in the graph. ▶ Remark 4.15.
For ℓ ≤ √ n/ log( n ) we have E G ← G ( n,d ) F G ( U ℓG ) = o n (1) as a simpleconsequence of Lemma 4.11. Furthermore, as F G is nondecreasing with regard to addi-tional queries, for any set of timesteps W ⊆ [ ℓ ] and associated projection P W we obtain E G ← G ( n,d ) F G ( P W ( U ℓG )) = o n (1)For our first lower bound, as we chose the next query time based on the transcript of A after the previous query, we obtain that, for all local access algorithms, there exists asequence of bad queries. Note that the algorithm of Theorem 3.1 succeeds asymptoticallyalmost surely even on such adaptive sequences. ▶ Theorem 4.16.
There exist constants q d , n depending only on d , a family of distinguishingfunctions { F G : G ∈ G ( n, d ) } , and an adaptive query sequence Q of at most O (log( n )) queriessuch that any (possibly randomized) local access algorithm A , given rand_neighbor and rand_vertex probe access to G ( n, d ) that makes fewer than √ n/q d log( n ) probes per querysatisfies for all n ≥ n : E G ← G ( n,d ) | F G ( P Q ( U G )) − F G ( D G, A ,Q ) | ≥ . where D G, A ,Q is the distribution of A ’s responses given probe access to G over sequence Q . Proof.
Let q d = k d ·
203 where k d is from Lemma 4.8. Our procedure generates a sequenceof at most 203 log( n ) queries, so by assumption A makes at most √ n/k d probes. Thus by the lemma there is n such that for n > n with probability .
995 the algorithm never findscycles or merges trees. Note that we treat returned vertices as marked. Denote this event byΞ, and for the remainder of the proof we assume it holds (and otherwise we can terminatethe sequence).Our first query is at time e = √ n/ log( n ) (and there is an implicit query at time 0). Weclaim either d ( v , v e ) = ∞ or d ( v , v e ) < e/
20. Otherwise we would have ∞ > d ( v , v e ) ≥ e/
20 = √ n/
20 log( n ) , so the algorithm made at least √ n/
20 log( n ) probes at the first query, violating our assumptionon probe complexity.If d ( v , v e ) < e/
20, we apply Lemma 4.18. Thus we can extend the query sequence Q ← ( Q, Q ′ ) by at most 201 log( n ) queries such that any returned transcript T ( Q ) eithersatisfies F G ( V Q ) = 1 for all G (in which case we are done) or contains v t , v t ′ ∈ V Q such that d ( v t , v t ′ ) > | t ′ − t | .Now we have v t , v t ′ ∈ V Q such that d ( v t , v t ′ ) > | t ′ − t | , so we apply Lemma 4.17. Thuswe can extend the query sequence Q ← ( Q, Q ′ ) by at most log( n ) queries such that anyreturned transcript T ( Q ) contains v t , v t +1 ∈ V Q such that d ( v t , v t +1 ) >
1. Then let S be theedges in the transcript at the termination of the query sequence. We have | S | ≤ √ n and soby Lemma 4.6, and the definition of F G ,Pr G ← G ( n,d ) ∩ S [ F G ( V Q ) = 1] ≥ Pr G ← G ( n,d ) ∩ S [( v t , v t +1 ) / ∈ G ] = 1 − o n (1) . Then taking n such that this term is at least . n > max( n , n ) we obtain E G ← G ( n,d ) F G ( D G, A ,Q ) ≥ . n such that for anyprojection P Q , for all n > n E G ← G ( n,d ) F G ( P Q ( U ℓG )) ≤ E G ← G ( n,d ) F G ( U ℓG ) < . , and by taking n = max( n , n , n ) the result follows. ◀ To complete the proof, we must give short query sequences that when Ξ holds drivealmost all distinguishing functions to 1. We first show an algorithm that does not know of ashort enough path between returned vertices can be forced to return consecutive vertices inthe walk that it does not know a connecting edge between. ▶ Lemma 4.17 (No Viable Path Known) . Assuming Ξ holds, given a transcript T ( Q ) supposethere are prior queries v x , v y ∈ V Q such that d ( v x , v y ) > | y − x | . Then there exists an adaptiveextension of the sequence Q ′ ← ( Q, q ) of at most log( n ) queries such that for any returnedtranscript T ( Q ′ ) there are v t , v t +1 ∈ V Q ′ such that d ( v t , v t +1 ) > . Proof.
We show this by binary searching on the “gap”. WLOG assume $x < y$. At each step:
1. Query at time $m = \lfloor (x + y)/2 \rfloor$.
2. We have $d(v_x, v_m) + d(v_m, v_y) \ge d(v_x, v_y) > y - x = (m - x) + (y - m)$, so either $d(v_x, v_m) > m - x$ or $d(v_m, v_y) > y - m$.
3. If the first holds, let $y \leftarrow m$ and recurse. Otherwise let $x \leftarrow m$ and recurse.
Since $y - x < \sqrt{n}$ at the start of the recursion, after $\log(n)$ queries we drive $|x - y|$ to 1, and so obtain $v_t, v_{t+1} \in V_Q$ such that $d(v_t, v_{t+1}) > 1$. ◀

We next show algorithms cannot “fake” the existence of longer paths. The key idea is that modifying $SP(v_0, v_e)$ (or finding a second simple path) after returning $v_e$ is impossible when the algorithm fails to find cycles. We force $\mathcal{A}$ to return vertices that either trigger Lemma 4.17 or feature excessive backtracking, which drives the distinguishing function to 1.

▶ Lemma 4.18 (Known Path Too Short). Assuming $\Xi$ holds, given a transcript $T(Q)$ with $v_e \in V_Q$ suppose $d(v_0, v_e) < e/20$. Then there exists an adaptive extension of the sequence $Q' \leftarrow (Q, q)$ of at most
201 log( n ) queries such that any returned transcript T ( Q ′ ) eithercontains v t , v t ′ ∈ V Q ′ where d ( v t , v t ′ ) > | t − t ′ | or satisfies F G ( V Q ′ ) = 1 for all G . Proof.
For the remainder of the analysis we implicitly assume that for all queries t, t ′ , d ( v t , v t ′ ) ≤ | t − t ′ | since otherwise the transcript satisfies the first condition and we are done.We give a recursive construction of q that “pushes down” the short path. Let x ← , y ← e .At each step we maintain the invariants that d ( v x , v y ) < ( y − x ) /
10 + 20 log( n ) + 2 and200 log( n ) ≤ y − x , which are initially satisfied by the lemma statement. Query A at time m = ⌊ ( x + y ) / ⌋ . Let r m = min v ∈ V { d ( v m , v ) : v ∈ SP ( v x , v y ) } be the length of the simple path from v m tothe simple path from v x to v y . If r m ≥
20 log( n ), we apply Lemma 4.19 with ( x, y, m ) which uses at most 3 log( n )additional queries and achieves the condition. If r m <
20 log( n ), we can bound the path length from some endpoint to v m . Either d ( v x , v m ) ≤ d ( v x , v y ) / r m or d ( v m , v y ) ≤ d ( v x , v y ) / r m . In the first case, d ( v x , v m ) ≤ d ( v x , v y ) / r m < (( y − x ) /
10 + 20 log( n ) + 2) / n ) ≤ ( m − x ) /
10 + 20 log( n ) + 2so letting y ← m the requirements of the recursion are satisfied. In the other case we set x ← m and achieve the same. Then if y − x <
200 log( n ), we have d ( v x , v y ) < ( y − x ) /
20 + 20 log( n ) + 2 < ( y − x ) / A at times { x + 1 , x + 2 , . . . , y − , y − } . Then any set of vertices { v x , . . . , v y } ⊂ V Q ′ where d ( v t , v t +1 ) ≤ t lies entirely inside S , and thus mustcontain SP ( v x , . . . , v y ). Therefore we have a walk segment of length at least 40 log( n )where PL( v x , . . . , v y ) = d ( v x , v y ) < ( y − x ) / F G ( V Q ′ ) = 1 for all G by thedefinition of F G as desired.Then the total number of queries is bounded above by (1 + 200) log( n ) by inspection andLemma 4.19, so we conclude. ◀▶ Lemma 4.19.
Assuming Ξ holds, given a transcript T ( Q ) suppose there are v x , v m , v y ∈ V Q where min v ∈ V { d ( v m , v ) : v ∈ SP ( v x , v y ) } ≥
20 log( n ) . Then there exists an adaptiveextension of the sequence Q ′ ← ( Q, q ) of at most n ) queries such that any returnedtranscript T ( Q ′ ) either contains v t , v t ′ ∈ V Q ′ where d ( v t , v t ′ ) > | t − t ′ | or satisfies F G ( V Q ′ ) = 1 for all G . Proof.
As before, we assume that for all queries t, t ′ , the returned vertices v t , v t ′ satisfy d ( v t , v t ′ ) ≤ | t − t ′ | since otherwise the transcript satisfies the first condition and we are done.We have x, m, y with a tree structure where the distance from v m to the simple pathfrom v x to v y is at least r t ≥
20 log( n ). Let w = arg min v ∈ V { d ( v m , v ) : v ∈ SP ( v x , v y ) } be the vertex (which has not necessarily been returned) at the point where the simple pathto v m branches from SP ( v x , v y ). With at most log( n ) queries, we force A to output that therandom walk visits w at times t ≤ m −
20 log( n ) and t ≥ m + 20 log( n ).To do so, we apply the following recursion. Let a ≤ b be times and u a vertex where u ∈ SP ( v a , v b ).Query A at time t = ⌊ ( a + b ) / ⌋ . If v t = u , halt.We have d ( v a , v t ) ≤ t − a and d ( v t , v b ) ≤ b − t by assumption.Either u ∈ SP ( v a , v t ) or u ∈ SP ( v t , v b ). If the first let b ← t and otherwise a ← t .After log( n ) queries we drive b − a to 1. By assumption d ( v a , v b ) ≤ b − a = 1 and u ∈ SP ( v a , v b ),so A must have returned u at some timestep.We use this subrecursion twice, with ( a, b, u ) = ( x, m, w ) for the first call and ( a, b, u ) =( m, y, w ) for the second. Let t , t be the times obtained from these applications where v t = v t = w . We claim t ≤ m −
20 log( n ) and t ≥ m + 20 log( n ). If t < m −
20 log( n ),we have d ( v t , v m ) = d ( w, v m ) ≥
20 log( n ) and thus | t − m | < d ( v t , v m ), violating ourfirst assumption (and the other case is identical). But then if this does not occur, wehave a segment { v t , . . . , v t } ⊂ V Q ′ of length at least 40 log( n ) where v t = v t = w , soPL( v t , . . . , v t ) = 0 <
40 log( n ) / F G ( V Q ′ ) = 1 for all G as desired. ◀ This concludes the proof of our adaptive lower bound.
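The binary search on the “gap” from Lemma 4.17 can be illustrated with a short Python sketch. This is only an illustration under simplifying assumptions: `query` and `dist` are hypothetical stand-ins for the answers of the local access algorithm and for the transcript distance $d(\cdot, \cdot)$, and the toy transcript places vertices on a path so that distances are absolute differences.

```python
def find_unit_gap(query, dist, x, y):
    """Binary search from Lemma 4.17 (illustrative sketch).

    query(t)   -> vertex returned for time t (hypothetical interface).
    dist(u, v) -> distance between u and v in the revealed edges; assumed
                  to satisfy the triangle inequality and to stay fixed.
    Precondition: x < y and dist(query(x), query(y)) > y - x.
    Returns a time t with dist(query(t), query(t + 1)) > 1.
    """
    if not (x < y and dist(query(x), query(y)) > y - x):
        raise ValueError("need x < y and d(v_x, v_y) > y - x")
    while y - x > 1:
        m = (x + y) // 2
        if dist(query(x), query(m)) > m - x:
            y = m  # the left half is still too far apart
        else:
            x = m  # then the right half must be, by the triangle inequality
    return x


# Toy transcript (an assumption for illustration): vertices are integers on
# a path, and the mock algorithm jumps by 5 between times 6 and 7.
def toy_query(t):
    return 0 if t < 7 else 5


def toy_dist(u, v):
    return abs(u - v)


gap_time = find_unit_gap(toy_query, toy_dist, 5, 8)
```

The loop preserves the invariant $d(v_x, v_y) > y - x$ while halving $y - x$, so it stops after at most $\lceil \log_2(y - x) \rceil$ queries with two consecutive times whose returned vertices are not joined by a known edge.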
The Theorem 4.16 lower bound constructs valid adaptive query sequences, but relies on looking at the edges known to $\mathcal{A}$ to choose the next query and so does not rule out non-robust local access algorithms for $G(n, d)$. We now give a weaker $\Omega(n^{1/4})$ lower bound that uses a global query sequence (not even depending on the returned vertices) that still suffices to rule out efficient local access by an exponential margin. ▶ Theorem 4.20.
There exist constants k d , n depending only on d , a family of distinguishingfunctions { F G : G ∈ G ( n, d ) } , and a fixed query sequence Q of n / queries such that any(possibly randomized) algorithm A given rand_neighbor and rand_vertex probe accessthat makes fewer than n / /k d probes per query satisfies for all n ≥ n : E G ← G ( n,d ) | F G ( P Q ( U G )) − F G ( D G, A ,Q ) | ≥ . where D G, A ,Q is the distribution of A ’s responses given probe access to G over sequence Q . Proof.
Take k d as in Lemma 4.8, and define the query sequence as Q = ( n / , , , . . . , n / − e = n / . The distinguishing function is identical to before, so byRemark 4.15 there is n such that for n ≥ n we have E G ← G ( n,d ) F G ( P Q ( U G )) < . | Q | = n / , the number of probes made by A is bounded by √ n/k d , and so byLemma 4.8 there is n such that for all n > n , with probability .
995 the algorithm neverfinds cycles or merges trees. Note that we treat returned vertices as marked. Denote thisevent by Ξ.Given Ξ holds, we claim that at the completion of the query sequence either d ( v , v e ) = ∞ or d ( v , v e ) < n / /
2. If this was not the case, since A cannot alter d ( v , v e ) after the firstquery without finding cycles, A made at least n / / > n / /k d probes after the first querywhich violates our assumption on probe complexity.Then let the transcript at the end of the sequence be T ( Q ) = ( V Q , S ), recalling S is theedges revealed via probes. . S. Biswas, E. Pyne and R. Rubinfeld 15 If there exist v t , v t +1 ∈ V Q such that d ( v t , v t +1 ) >
1, we have | S | ≤ √ n and so byLemma 4.6 and the definition of F G , Pr G ← G ( n,d ) ∩ S ( F G ( V Q ) = 1) ≥ Pr G ← G ( n,d ) ∩ S [( v t , v t +1 ) / ∈ G ] = 1 − o n (1) . If this never occurred, the segment { v , . . . , v e } ⊆ V Q traverses only edges in S , so itmust contain all edges in SP ( v , v e ). Therefore PL( v , . . . , v e ) = d ( v , v e ) < n / / F G ( V Q ) = 1 for all G .Then taking n = max( n , n , n ) where n is chosen such that the 1 − o n (1) term is above . ◀ We now turn to classes of graphs with algebraic structure. We achieve efficient (i.e. runtimepolylogarithmic in n ) local access to the hypercube, n -cycle and spectral expanders. InAppendix B, we achieve efficient local access for arbitrarily dense graphs via the tensor andCartesian product. This is comparable to the work of [3, 10], which (among many otherresults) give a local access algorithms for random walks with fixed start and end vertices onspecific classes of graphs. ▶ Definition 5.1.
For a group $\Gamma$ of order $n$ and $S \subseteq \Gamma$, the Cayley graph $G = \mathrm{Cay}(\Gamma, S)$ is the degree $|S|$ graph on $n$ vertices where for all $g \in \Gamma$, $e \in S$ we add the edge $(g, ge)$ with label $e$. We call $S$ the generators of $G$. We say the Cayley graph is abelian if the subgroup generated by $S$ is.

More concretely, a Cayley graph is abelian if for all $e_i, e_j \in S$ we have $e_i e_j = e_j e_i$. We do not require $S$ to be closed under inverses. ▶ Theorem 5.2.
Fix $\epsilon > 0$ and $B \in \mathbb{N}$. Let $G = \mathrm{Cay}(\Gamma, S)$ be an abelian Cayley graph on $n$ elements with $d = |S|$, where products of group elements are computable in $\mathrm{polylog}(n)$ time. There is a local access algorithm using $O(d \log(t))$ additional space and $d \cdot \mathrm{polylog}(n, t, B/\epsilon)$ time and working space per query.

We defer the proof to Appendix B. In the parallel model, [17] gives an algorithm for efficient generation of random walks on all Cayley graphs. For a walk of length $t$, they sample $\sigma \in S^t$ and compute the $t$ prefixes $\{s_i\}_{i \in [t]} = \{\prod_{j=1}^{i} \sigma_j\}_{i \in [t]}$ in parallel. Unfortunately, even computing a single prefix of a product of generators in sequential sublinear time is not obviously possible without further restrictions.

Although abelianness represents a strong algebraic assumption, Theorem 5.2 immediately provides local access algorithms for several graph families of interest in computer science.

▶ Corollary 5.3.
1. By considering $\Gamma = (\mathbb{Z}/2\mathbb{Z})^d$ and taking $S = (e_1, \dots, e_d)$ as the generating set, there is an efficient local access algorithm for random walks on the dimension $d$ hypercube for all $d$.
2. By considering $\Gamma = \mathbb{Z}/n\mathbb{Z}$ and taking $S = (1, -1)$ as the generating set, there is an efficient local access algorithm for random walks on the $n$-cycle for all $n$.
3. By Proposition 5 of [1], for all $m \in \mathbb{N}$ there is an explicitly constructible set $S_m$ where $|S_m| = O(m)$ such that $\mathrm{Cay}(\mathbb{Z}_2^m, S_m)$ has constant spectral gap. Thus there is an efficient local access algorithm for random walks on a class of polylog degree expanders of size $2^m$ for all $m$.

We remark that despite all constant-degree abelian Cayley graphs being poor expanders, efficient local access is easy to provide, while for well-expanding random regular graphs we obtain a polynomial lower bound. This indicates sublogarithmic mixing time is not a determinative property for efficient local access.
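For the $n$-cycle case of Corollary 5.3, the approach deferred to Appendix B can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the class name `CycleWalk` and its methods are hypothetical, and the binomial and hypergeometric samplers are naive exact samplers taking time linear in the interval length, standing in for the polylogarithmic-time approximate samplers the paper relies on. The point is the bookkeeping: between consecutive queried times only the count of $+1$ steps is stored, and a new query in the middle of an interval splits that count with a hypergeometric draw.

```python
import bisect
import random


class CycleWalk:
    """Local access to a walk on Cay(Z/nZ, (1, -1)) (illustrative sketch)."""

    def __init__(self, n, start=0, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.times = [0]              # sorted list of queried times
        self.pos = {0: start % n}     # queried time -> returned vertex
        self.plus = {}                # times[i] -> number of +1 steps up to times[i+1]

    def _binomial(self, trials):
        # Naive exact Binomial(trials, 1/2): one fair coin per step.
        return sum(self.rng.randrange(2) for _ in range(trials))

    def _hypergeometric(self, draws, successes, population):
        # Naive exact sampler: draw without replacement, one item at a time.
        got = 0
        for _ in range(draws):
            if self.rng.randrange(population) < successes:
                got += 1
                successes -= 1
            population -= 1
        return got

    def position(self, t):
        if t < 0:
            raise ValueError("time must be non-negative")
        if t in self.pos:
            return self.pos[t]
        i = bisect.bisect_left(self.times, t)
        lo = self.times[i - 1]
        if i == len(self.times):
            # Past the last queried time: counts are unconditional.
            k = self._binomial(t - lo)
            self.plus[lo] = k
        else:
            # Between two queried times: split the stored count.
            hi = self.times[i]
            total = self.plus[lo]
            k = self._hypergeometric(t - lo, total, hi - lo)
            self.plus[lo] = k
            self.plus[t] = total - k
        # k steps of +1 and (t - lo - k) steps of -1 since time lo.
        self.pos[t] = (self.pos[lo] + 2 * k - (t - lo)) % self.n
        self.times.insert(i, t)
        return self.pos[t]
```

A typical use would be `w = CycleWalk(n=12, start=3, seed=1)` followed by `w.position(t)` for times in any order; repeated queries return the stored answer, and positions at consecutive times always differ by $\pm 1$ modulo $n$.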
References

[1] Noga Alon and Yuval Roichman. Random Cayley graphs and expanders. Random Struct. Algorithms, 5(2):271–285, 1994. doi:10.1002/rsa.3240050203.
[2] Alexandr Andoni, Robert Krauthgamer, and Yosef Pogrow. On solving linear systems in sublinear time. In Avrim Blum, editor, volume 124 of LIPIcs, pages 3:1–3:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ITCS.2019.3.
[3] Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Local access to huge random objects through partial sampling. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
[4] Béla Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4):311–316, 1980.
[5] Guy Even, Reut Levi, Moti Medina, and Adi Rosén. Sublinear random access generators for preferential attachment graphs. arXiv preprint arXiv:1602.06159, 2016.
[6] Joel Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. CoRR, cs.DM/0405020, 2004. URL: http://arxiv.org/abs/cs/0405020.
[7] Alan M. Frieze, Navin Goyal, Luis Rademacher, and Santosh S. Vempala. Expanders via random spanning trees. SIAM J. Comput., 43(2):497–513, 2014. doi:10.1137/120890971.
[8] Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, Sivaramakrishnan Muthukrishnan, and Martin J. Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 389–398, 2002.
[9] Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. Pages 68–79. IEEE Computer Society, 2003. doi:10.1109/SFCS.2003.1238182.
[10] Oded Goldreich, Shafi Goldwasser, and Asaf Nussboim. On the implementation of huge random objects. SIAM Journal on Computing, 39(7):2761–2822, 2010.
[11] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex., 7(20), 2000. URL: http://eccc.hpi-web.de/eccc-reports/2000/TR00-020/index.html.
[12] Jonathan A. Kelner and Aleksander Madry. Faster generation of random spanning trees. Pages 13–21. IEEE Computer Society, 2009. doi:10.1109/FOCS.2009.75.
[13] Aleksander Madry, Damian Straszak, and Jakub Tarnawski. Fast generation of random spanning trees and the effective resistance metric. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 2019–2036. SIAM, 2015. doi:10.1137/1.9781611973730.134.
[14] Brendan D. McKay and Nicholas C. Wormald. Asymptotic enumeration by degree sequence of graphs with degrees $o(n^{1/2})$. Combinatorica, 11(4):369–382, 1991.
[15] Moni Naor and Asaf Nussboim. Implementing huge sparse random graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 596–608. Springer, 2007.
[16] Ronitt Rubinfeld and Arsen Vasilyan. Approximating the noise sensitivity of a monotone boolean function. In Dimitris Achlioptas and László A. Végh, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2019, September 20-22, 2019, Massachusetts Institute of Technology, Cambridge, MA, USA, volume 145 of LIPIcs, pages 52:1–52:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.APPROX-RANDOM.2019.52.
[17] Shang-Hua Teng. Independent sets versus perfect matchings. Theor. Comput. Sci., 145(1&2):381–390, 1995. doi:10.1016/0304-3975(94)00289-U.

A Proof of Lemma 4.6
We apply the configuration model of [4], extended to sequences of degrees. In the configuration model, given a degree sequence $\mathbf{d} = (d_i)_{i \in [n]}$, we place $d_i$ half-edges at vertex $i$ and connect all half-edges with a random matching. In the case where $d_i = d$ for all $i$, if the graph induced by a random matching is simple, we produce a random draw from $G(n, d)$. In our case, we “remove” half-edges that are already occupied by $S$, place a random matching on the remaining half-edges, and show that if the induced graph is simple and does not duplicate edges in $S$, we obtain a random draw from $G(n, d) \cap S$. We can then use this to analyze conditional edge probabilities.

We first recall a lower bound on the probability that such a random matching induces a simple graph. For a degree sequence $\mathbf{d}$, define $D_1 = D_1(\mathbf{d}) = \sum_{i=1}^{n} d_i$, $D_2 = \sum_{i=1}^{n} d_i(d_i - 1)$ and $D_3 = \sum_{i=1}^{n} d_i(d_i - 1)(d_i - 2)$. Let $P(\mathbf{d})$ be the probability that a random matching on $\mathbf{d}$ has no loops or multiple edges. The forthcoming lemma assumes $\max_i d_i = o(D_1)$, which clearly holds in our application.

▶ Lemma A.1 ([14] Lemma 5.1). $P(\mathbf{d}) \ge \exp\left(-\frac{D_2}{2D_1} - \frac{D_2^2}{4D_1^2} - \frac{D_2^2 D_3}{2D_1^4}\right)$.

We can then apply this lemma to prove the main claim. We remark that the bound $|S| \le \sqrt{n}$ can be improved to $|S| = o(n)$, with $c_d$ depending on when $|S|/n$ falls below some constant threshold. ▶ Lemma 4.6.
For all $d \in \mathbb{N}$ there is a constant $c_d$ depending only on $d$ such that for an arbitrary set of edges $S$ with $|S| \le \sqrt{n}$ and $v, w \in V$ arbitrary vertices where $(v, w) \notin S$, we have $\Pr_{G \leftarrow G(n,d) \cap S}[(v, w) \in G] \le c_d / n$.

Proof. Let $\mathbf{d} = (d_i)_{i \in [n]}$ be the sequence where $d_i$ is the remaining degree of vertex $i$ given $S$. We place a random matching on this degree sequence. Given such a matching $M$, we contract it to a (multi)graph $G_M$ by treating each bucket as a single vertex.

▷ Claim A.2. Given a randomly drawn matching $M$ where $G_M$ is simple and $G_M \cap S = \emptyset$, $G_M \cup S$ is a uniform draw from $G(n, d) \cap S$.

Proof. All possible simple graphs $G_M$ are induced by exactly $\prod_i (d_i!)$ matchings, so the conditional distribution over such graphs is uniform. Then multiplying by the indicator variable $I[G_M \cap S = \emptyset]$, which corresponds to there being no duplicated edges between $G_M$ and $S$, produces the uniform distribution over the desired subset of graphs. ◀

We next show $G_M$ satisfies the conditions of Claim A.2 with probability depending only on $d$. First, we show that, ignoring the edges of $S$, the matching is simple with constant probability. ▷ Claim A.3.
We have Pr( I [ G M simple]) = P ( d ) ≥ exp( − d ( d + 2)). Proof.
We use the (crude) bounds $D_1 \ge dn/2$, $D_2 \le d^2 n$ and $D_3 \le d^3 n$. Then applying Lemma A.1,
$$P(\mathbf{d}) \ge \exp\left(-\frac{d^2 n}{dn} - \frac{d^4 n^2}{d^2 n^2} - \frac{8\, d^4 n^2 \cdot d^3 n}{d^4 n^4}\right) = \exp\left(-d\,(d + 1 + 8d^2/n)\right)$$
and choosing $n \ge 8d^2$ gives the claimed bound. ◀

We then show $G_M$ duplicates edges in $S$ with vanishing probability, which suffices to establish a constant lower bound on the probability of a “good” draw. ▷ Claim A.4.
Pr( { G M simple } ∩ { G M ∩ S = ∅} ) = ρ d > Proof.1.
Taking n large enough Pr( G M simple) ≥ exp( − d ( d + 2)) by the previous claim. The probability of an edge between any two vertices in G M is at most 2 d /dn by a unionbound. There are at most √ n pairs of vertices with edges in S , so by a further unionbound all such pairs are missing with probability at least 1 − d/ √ n .Then taking n large enough that the second term is at least 1 − exp( − d ( d + 2)) /
2, we havePr( I [ G M simple] ∩ I [ G M ∩ S = ∅ ]) ≥ exp( − d ( d + 2) / / ρ d as desired. ◀ Now we are almost done. We have Pr[( v, w ) ∈ G M ] ≤ d/n and thusPr G ← G ( n,d ) ∩ S [( v, w ) ∈ G ] = Pr[( v, w ) ∈ G M |{ G M simple } ∩ { G M ∩ S = ∅} ] ≤ Pr[( v, w ) ∈ G M ] / Pr[ { G M simple } ∩ { G M ∩ S = ∅} ] ≤ d/ρ d n So taking c d = 2 d/ρ d (and increasing as needed to handle the small n cases by making thebound greater than 1) we conclude. ◀ B Local Access With Algebraic Structure
We now detail the approach for local access to abelian Cayley graphs and graph products. The methods we use are simple and similar to those of [17], who construct algorithms for efficient parallel generation of random walks on a variety of structured graphs. In each case, there is some element of algebraic structure that enables sampling the relevant feature of a walk (its position at a new timestep) via sampling lower-dimensional distributions.

We first recall the Multinomial (MNom) and Multivariate Hypergeometric (MHGeom) distributions, which we can sample from efficiently.

▶ Proposition B.1 ([3] Theorem 21). Given $\epsilon > 0$, we can sample from the following distributions within $\epsilon$ in $\ell_1$ distance:
1. given $t \in \mathbb{N}$, $(p_1, \dots, p_d) \in \mathbb{Q}^d$, we can generate $S \leftarrow \mathrm{MNom}(t, (p_1, \dots, p_d))$ in time $O(d \cdot \mathrm{polylog}(t, 1/\epsilon))$,
2. given $m \in \mathbb{N}$, $(c_1, \dots, c_d) \in \mathbb{N}^d$, we can generate $S \leftarrow \mathrm{MHGeom}(m, (c_1, \dots, c_d))$ in time $O(d \cdot \mathrm{polylog}(m, \sum_i c_i, 1/\epsilon))$.
Note that sample time is linear in the dimension of the distribution but (poly)logarithmic in the number of elements.
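To see why the sample time can be linear in the dimension $d$, note that both distributions factor into $d$ univariate draws: a multinomial is a chain of conditional binomials, and a multivariate hypergeometric is a chain of conditional univariate hypergeometrics. The Python sketch below illustrates only this decomposition. It is an assumption-laden illustration, not the sampler of [3]: the function names are hypothetical, and the univariate samplers here are naive exact ones that take time linear in the number of trials rather than polylogarithmic.

```python
import random
from fractions import Fraction


def binomial(rng, trials, p):
    """Binomial(trials, p) for rational p, via exact coin flips (naive)."""
    p = Fraction(p)
    return sum(rng.randrange(p.denominator) < p.numerator for _ in range(trials))


def hypergeometric(rng, draws, successes, population):
    """Successes among `draws` items drawn without replacement (naive)."""
    got = 0
    for _ in range(draws):
        if rng.randrange(population) < successes:
            got += 1
            successes -= 1
        population -= 1
    return got


def multinomial(rng, t, probs):
    """MNom(t, probs) as a chain of conditional binomial draws."""
    counts, remaining, mass = [], t, Fraction(1)
    for p in probs[:-1]:
        p = Fraction(p)
        # Conditioned on earlier counts, category i gets Binomial(remaining, p / mass).
        c = binomial(rng, remaining, p / mass) if mass > 0 else 0
        counts.append(c)
        remaining -= c
        mass -= p
    counts.append(remaining)  # the last category takes whatever is left
    return counts


def multivariate_hypergeometric(rng, m, colors):
    """MHGeom(m, colors) as a chain of conditional hypergeometric draws."""
    population = sum(colors)
    if m > population:
        raise ValueError("cannot draw more items than the urn holds")
    counts, remaining = [], m
    for c in colors:
        x = hypergeometric(rng, remaining, c, population)
        counts.append(x)
        remaining -= x
        population -= c
    return counts
```

Replacing `binomial` and `hypergeometric` with fast approximate univariate samplers is what would give the $O(d \cdot \mathrm{polylog})$ bounds stated in the proposition; the chain structure stays the same.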
B.1 Low-Degree Abelian Cayley Graphs
For all Cayley graphs, sampling a walk of length $\ell$ is equivalent to sampling a random product of elements in $S$ of length $\ell$. But in the abelian case, the value of a random product (and thus endpoint of a random walk) only depends on the counts of elements in the product. Thus we can sample the distribution of edge labels, and thus endpoints, in time linear in $d$ but logarithmic in $\ell$.

To do this, we first recall the distribution of edge labels in a random product. ▶ Proposition B.2.
Let $G = \mathrm{Cay}(\Gamma, (e_1, \dots, e_d))$ be an abelian Cayley graph where $|\Gamma| = n$.
1. The counts of edge labels in a random walk of length $\ell$ from any vertex are distributed $\mathrm{MNom}(\ell, (1/d, \dots, 1/d))$.
2. Let $D_C(c_1, \dots, c_d)$ be the set of random walks from any vertex of length $\ell = \sum_{i=1}^{d} c_i$ that traverse $c_i$ edges with label $i$. Then the counts of edge labels along the first $t \le \ell$ steps of walks in $D_C(c_1, \dots, c_d)$ are distributed $\mathrm{MHGeom}(t, (c_1, \dots, c_d))$.

We can then provide local access to abelian Cayley graphs. In the random regular graph case, the difficulty came from sampling conditional “products”, but here we take advantage of the fact that permuting the order of elements in a product preserves endpoints in order to sample counts of edge labels unconditionally. ▶ Theorem 5.2.
Fix $\epsilon > 0$ and $B \in \mathbb{N}$. Let $G = \mathrm{Cay}(\Gamma, S)$ be an abelian Cayley graph on $n$ elements with $d = |S|$, where products of group elements are computable in $\mathrm{polylog}(n)$ time. There is a local access algorithm using $O(d \log(t))$ additional space and $d \cdot \mathrm{polylog}(n, t, B/\epsilon)$ time and working space per query.

Proof. The algorithm maintains a sorted list of previous times and positions $T = t_0 < \dots < t_r$, $V_T = v_{t_0}, \dots, v_{t_r}$ where $t_0 = 0$ and $v_0$ is fixed at initialization. In addition, for all $i \in [r - 1]$ it maintains a dictionary $L$, where $L_{t_i} = [l_1, \dots, l_d]$ with the invariant that the walk has $l_j$ steps with label $j$ between $t_i$ and $t_{i+1}$. Given a dictionary entry $L_t = [l_1, \dots, l_d]$ and $v \in G$, define $G[v, L_t] = v \prod_{e_i \in S} e_i^{l_i}$.

Given a new query $t$:
1. If $t > t_r$, set $L_{t_r} \leftarrow \mathrm{MNom}(t - t_r, (1/d, \dots, 1/d), \epsilon/B)$, and set $v_t \leftarrow G[v_{t_r}, L_{t_r}]$.
2. Otherwise let $t^- < t < t^+$ be the bracketing queries. Sample $D \leftarrow \mathrm{MHGeom}(t - t^-, L_{t^-}, \epsilon/B)$, set $v_t \leftarrow G[v_{t^-}, D]$, set $L_t \leftarrow L_{t^-} - D$ and set $L_{t^-} \leftarrow D$.

Storing the query time takes incremental space $O(\log(t))$, storing the determined vertex $O(\log(n))$, and storing the dictionary $O(d \log(t))$.

The runtime is immediate from Proposition B.1 and the assumption that group products are computable in time $\mathrm{polylog}(n)$, so we can use $d$ iterations of repeated squaring with each requiring time $\mathrm{polylog}(t, n)$.

In both the unidirectionally and bidirectionally constrained case, the vertex reached by a random walk on an abelian Cayley graph is a deterministic function of the counts of the bracketing edge labels. Therefore ensuring the counts in each new dictionary are sampled to within $\epsilon/B$ of the true distribution is sufficient to establish the approximation by Proposition 2.6. Since in both cases we approximate the true distribution to within $\epsilon/B$ in $\ell_1$ distance by Proposition B.2, the result follows. ◀

B.2 Graph Products
We can utilize the structure of common graph product operations to provide local access,given algorithms for their components. To do so, we give arguably the simplest possible local access algorithm, one that is only efficient when the time queries are far larger than the sizeof the graph, to use as the basis for product constructions. ▶ Lemma B.3.
Fix ϵ > and B ∈ N . Given a graph G = ( V, E ) on n vertices with λ ( G ) < c , there is a local access algorithm, which uses O (log( tn )) additional space andruntime O (poly( n, log( t/ϵ )) time and working space per query. Proof.
The algorithm solely remembers previously determined times and vertices $v_{t_0}, \dots, v_{t_k}$. Given query $t$, let $t^- < t < t^+$ be the bracketing times. For convenience, define $l = t^+ - t^-$ and $m = t - t^-$. We then explicitly sample the desired distribution to within $\epsilon$ in $\ell_1$ distance. Let $W$ be the transition matrix of $G$. Then for $v \in V$,
$$\Pr\left(P_m(D_C(G, v_{t^-}, v_{t^+}, l)) = v\right) = \frac{W^{m}_{v_{t^-}, v}\, W^{l-m}_{v, v_{t^+}}}{\sum_{u \in V} W^{m}_{v_{t^-}, u}\, W^{l-m}_{u, v_{t^+}}}.$$
We can then use $n \log(t)$ repeated squares of the transition matrix to compute the PDF, and then sample to the desired accuracy and return. ◀

To make this algorithm concrete, for an undirected aperiodic graph $G$ on $n$ vertices with $\epsilon = n^{-c}$, we obtain a runtime of $\tilde{O}(n^{\omega})$, while we desire runtime polylogarithmic in $n$.

We first examine the tensor product of graphs. ▶ Definition B.4.
Given graphs $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$ the tensor product of $G_1$ and $G_2$, denoted $G_1 \times G_2$, is the graph with vertex set $V_1 \times V_2$ where $(v_1, v_2), (w_1, w_2)$ are adjacent if and only if $(v_1, w_1) \in E_1$ and $(v_2, w_2) \in E_2$.

The projection of a random walk on the tensor product onto its component graphs is an independent random walk over each graph. Then we can easily decompose sampling conditional products to sampling on the components. ▶ Lemma B.5.
Given local access algorithms $\mathcal{A}_1, \mathcal{A}_2$ for graphs $G_1, G_2$ running in time $T(\mathcal{A}_1, \epsilon, B, t)$, $T(\mathcal{A}_2, \epsilon, B, t)$, there is a local access algorithm $\mathcal{A}_T$ for $G_1 \times G_2$ with runtime
$$T(\mathcal{A}_T, \epsilon, B, t) = T(\mathcal{A}_1, \epsilon/2, B, t) + T(\mathcal{A}_2, \epsilon/2, B, t) + O(\log(|G_1| \cdot |G_2|, t, B/\epsilon)).$$

Proof. The algorithm initializes both sub-algorithms $\mathcal{A}_1, \mathcal{A}_2$ with parameters $\epsilon/2, B$. Upon receiving query $t$, $\mathcal{A}_T$ itself queries $\mathcal{A}_1, \mathcal{A}_2$ with time $t$. Let the obtained vertices be $v', w'$ respectively, and $\mathcal{A}_T$ returns $(v', w')$. Since the vertex in a walk on a tensor product is a deterministic function of the two (independent) component distributions, by Proposition 2.6 we obtain the desired approximation. The runtime is composed of the required calls to the sub-algorithms, plus the time to write the inputs to each and output the returned vertex. ◀

We then obtain efficient local access to walks on arbitrarily dense graphs. ▶ Corollary B.6.
Fix ϵ > and B ∈ N . Let G be an arbitrary graph. For all k ≥ there is alocal access algorithm for G × k with runtime O ( k log ( B/ϵ )) , where we hide factors polynomialin | G | . Proof.
Let b ( i ) ∈ { , } log( k ) be the representation of i in binary. Then G × k ∼ = × j ∈ b ( k ) G × j and so the iterated tensor product can be written as a binary tree of products with depthbounded by 2 log( k ). Choosing the uppermost local access algorithm algorithm to have error ϵ/B implies the leaves have error parameter ϵ/B k ) = Ω( ϵ/B log ( n )). Then the resultfollows by applying the algorithm from Lemma B.3 to the k copies of G at the leaves of thetree. ◀ . S. Biswas, E. Pyne and R. Rubinfeld 21 For G a regular graph with degree d , since | G × k | = | G | k and deg( G × k ) = d k , we obtainefficient local access to infinite families of degree ( | G | k ) δ graphs for any δ = log | G | ( d ) ∈ (0 , ▶ Definition B.7.
Given regular graphs $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$, the Cartesian product of $G_1$ and $G_2$, denoted $G_1 \square G_2$, is the graph with vertex set $V_1 \times V_2$ where $(v_1, v_2)$, $(w_1, w_2)$ are adjacent if and only if $v_1 = w_1$ and $(v_2, w_2) \in E_2$, or $v_2 = w_2$ and $(v_1, w_1) \in E_1$.

In a similar manner to the abelian Cayley case, we use the ability to decompose walks of length $r$ into $t$ and $r - t$ steps on the first and second coordinate respectively, so that we always sample counts with a unidirectional constraint. Conditioning on these counts, we can sample efficiently given the local access algorithms for each component.

▶ Lemma B.8.
Given local access algorithms $A_1, A_2$ for regular graphs $G_1, G_2$ running in time $T(A_1, \epsilon, t)$, $T(A_2, \epsilon, t)$, there is a local access algorithm $A_C$ for $G_1 \square G_2$ running in time
$$T(A_C, \epsilon, t) = T(A_1, \epsilon/3, t) + T(A_2, \epsilon/3, t) + \mathrm{polylog}(|G_1| \cdot |G_2|, t, B/\epsilon).$$
Proof.
The approach is similar to that of Lemma 5.2. Rather than sample a conditional walk at each step, for a unidirectionally constrained walk of length $\ell$ we sample the number of steps on each component, and use these “times” as inputs to $A_1$ and $A_2$.

Initialize $A_1, A_2$ with $\epsilon' = \epsilon/3$, $B' = B$. Let $d_1 = \deg(G_1)$ and $d_2 = \deg(G_2)$. The algorithm maintains a sorted list of previously queried times and positions $T = t_1, \ldots, t_r$, $V_T = v_{t_1}, \ldots, v_{t_r}$ where $t_1 = 0$ and $v_0$ is fixed at initialization. In addition, the algorithm maintains $S = s_{t_1}, \ldots, s_{t_r}$ where $s_{t_i}$ is the number of steps on $G_1$ in the interval $[0, t_i]$. Given a new query $t$:

If $t > t_r$, set $s_t \leftarrow s_{t_r} + \mathrm{BNom}(t - t_r, (d_1/(d_1 + d_2), d_2/(d_1 + d_2)), \epsilon/(3B))$.
Otherwise let $t_- < t < t_+$ be the bracketing queries. Set $s_t \leftarrow s_{t_-} + \mathrm{HGeom}(t - t_-, (s_{t_+} - s_{t_-}, t_+ - t_-), \epsilon/(3B))$.
Finally set $v_t \leftarrow (A_1(s_t), A_2(t - s_t))$.

The runtime consists of sampling via Proposition B.1, calling the algorithms for the components and writing the output to the tape. The analysis of closeness in $\ell_1$ distance is nearly identical to that of Theorem 5.2, except that we take $\epsilon \leftarrow \epsilon/3$. ◀

In an identical manner to the construction of higher tensor powers, composing Lemma B.8 with itself in a binary tree and using Lemma B.3 as a base case gives local access to the $k$th Cartesian product $G^{\square k}$ of a $d$-regular graph $G$ with runtime $O(k\,\mathrm{polylog}(B/\epsilon))$, again hiding factors in $|G|$.
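The step-count bookkeeping in the proof of Lemma B.8 can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the algorithm as analyzed: the names `CycleWalk`, `CartesianWalk`, `binomial` and `hypergeometric` are hypothetical, the two samplers are exact and take linear time (standing in for the approximate samplers of Proposition B.1), and the component oracles simply simulate and cache a walk on a cycle rather than running in sublinear time.

```python
import bisect
import random


class CycleWalk:
    """Hypothetical stand-in for a component oracle A_i: a walk on an n-cycle.

    It simulates and caches the walk prefix, so answers are consistent across
    queries but cost O(s) time; a real local access algorithm would be faster.
    """
    degree = 2  # every vertex of a cycle has two neighbours

    def __init__(self, n, rng):
        self.n, self.rng, self.path = n, rng, [0]

    def position(self, s):
        # Extend the cached walk one uniform +/-1 step at a time until it covers s.
        while len(self.path) <= s:
            self.path.append((self.path[-1] + self.rng.choice((-1, 1))) % self.n)
        return self.path[s]


def binomial(rng, trials, p):
    """Exact (slow) Binomial(trials, p) sample; assumption: replaces BNom."""
    return sum(rng.random() < p for _ in range(trials))


def hypergeometric(rng, draws, successes, population):
    """Exact (slow) sample of the successes seen in `draws` draws without
    replacement from `population` items; assumption: replaces HGeom."""
    got = 0
    for _ in range(draws):
        if rng.randrange(population) < successes:
            got, successes = got + 1, successes - 1
        population -= 1
    return got


class CartesianWalk:
    """Position queries for a walk on the Cartesian product of two regular graphs."""

    def __init__(self, a1, a2, rng):
        self.a1, self.a2, self.rng = a1, a2, rng
        self.p1 = a1.degree / (a1.degree + a2.degree)  # chance a step uses G_1
        self.times = [0]      # sorted list of previously queried times
        self.steps = {0: 0}   # steps[t] = number of G_1 steps in [0, t]

    def position(self, t):
        if t not in self.steps:
            i = bisect.bisect_left(self.times, t)
            lo = self.times[i - 1]
            if i == len(self.times):
                # t is later than every previous query: fresh binomial count.
                extra = binomial(self.rng, t - lo, self.p1)
            else:
                # lo < t < hi: split the already fixed count on [lo, hi].
                hi = self.times[i]
                extra = hypergeometric(self.rng, t - lo,
                                       self.steps[hi] - self.steps[lo], hi - lo)
            self.steps[t] = self.steps[lo] + extra
            self.times.insert(i, t)
        s = self.steps[t]
        # The component oracles are queried with the sampled "times".
        return (self.a1.position(s), self.a2.position(t - s))
```

For instance, with `rng = random.Random(0)` and `walk = CartesianWalk(CycleWalk(5, rng), CycleWalk(7, rng), rng)`, the calls `walk.position(1000)`, `walk.position(10)` and `walk.position(500)` can be made in any order, and repeating a query returns the same vertex. Storing only the count of first-coordinate steps per queried time is what keeps out-of-order queries consistent with one underlying walk.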