Faster algorithms for sampling connected induced subgraphs
Marco Bressan
Sapienza University of Rome

July 24, 2020
Abstract
We consider the following problem: given a graph $G = (V, E)$ and an integer $k$, sample a connected induced $k$-node subgraph of $G$ (also called $k$-graphlet) uniformly at random. The best algorithms known achieve $\varepsilon$-uniformity and are based on random walks or color coding. The random walk approach is elegant, but has a worst-case running time of $\Delta^{\Theta(k)} \log\frac{n}{\varepsilon}$, where $n = |V|$ and $\Delta$ is the maximum degree of $G$. Color coding is more efficient, but requires a preprocessing phase with running time and space $2^{\Theta(k)}\, O(m \log\frac{1}{\varepsilon})$, where $m = |E|$.

Our main result is an algorithm for $\varepsilon$-uniform sampling with the following guarantees. The preprocessing runs in time $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{\frac{2}{k-1}}\, n \log n\big)$ and space $O(n)$. This implies a sublinear preprocessing when $G$ is dense enough. Afterwards, the algorithm yields independent $\varepsilon$-uniform $k$-graphlets in $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{8+\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$ expected time per sample. The preprocessing phase computes in one pass an approximate ordering of $G$ that makes rejection sampling efficient in the sampling phase, and the $\varepsilon$-uniformity rests on cut estimation and coupling arguments. In fact, the algorithm derives from a much simpler algorithm which has $O(m \log \Delta)$ preprocessing time and returns perfectly uniform $k$-graphlets from $G$ in $k^{O(k)}\, O(\log \Delta)$ expected time per sample. To the best of our knowledge this is the first efficient algorithm for perfectly uniform graphlet sampling. In addition, we give an almost-tight bound for the random walk technique. More precisely, we show that the most commonly used random walk has mixing time $k^{O(k)}\, O\big(t(G)\,(\frac{\Delta}{\delta})^{k-1} \log n\big)$, where $t(G)$ is the mixing time of $G$ and $\delta$ is its minimum degree. This improves on recent results and is tight up to a factor $k^{O(k)}\, \delta \log n$.

1  Introduction
We address the following problem. Given an undirected graph $G$, sample uniformly at random a connected and induced $k$-node subgraph of $G$, where $k \ge 2$. Sampling some subgraph on $k$ nodes uniformly at random can be done simply by sampling a random subset of $k$ nodes from $G$. As [15] notes, however, guaranteeing that the subgraph is connected and induced is a challenge by itself. These connected induced subgraphs are usually called $k$-graphlets, and we follow this convention. Although graphlet sampling is a natural and relevant problem, a tight characterization of its complexity is missing.

Currently, all graphlet sampling algorithms with formal guarantees are based on one of two unrelated techniques. The first technique is Markov chain Monte Carlo: one runs a random walk over the space of all $k$-graphlets of $G$ until a stationary distribution is reached. The graphlets are sampled from this stationary distribution, and rejection sampling is then applied to obtain a uniform distribution. This technique is widely used [1, 5, 11, 12, 15, 19, 23]. The fundamental issue with this technique is the mixing time of the chain. Knowing the mixing time is crucial, since it bounds the running time of the algorithm and is in fact even necessary for running it (it tells us when to stop the walk). Yet the best upper bound on the mixing time has a gap of $(\frac{\Delta}{\delta})^{k-1}$ against the lower bound, where $\Delta$ and $\delta$ are the maximum and minimum degrees of $G$. Thus in the worst case the walk takes $\Delta^{\Theta(k)}$ steps, and moreover a tight bound is unknown. The second technique is based on color coding [4]. As shown in [7], color coding yields a two-phase algorithm for sampling graphlets. The algorithm has preprocessing time $2^{O(k)}\, O(m)$ and space $2^{O(k)}\, O(n)$, and sampling time $k^{O(k)}\, O(\Delta)$; by increasing the space to $2^{O(k)}\, O(m)$, one can sample in time $k^{O(k)}$. The algorithm does in fact much more than just sampling, as it counts the colorful treelets rooted at every node of $G$ of every possible shape, size, and subset of colors. This suffices for sampling from a (random) sub-population of all graphlets of $G$, and can be extended to all graphlets with a multiplicative time and space overhead of $O(\log\frac{1}{\varepsilon})$. Whether faster or simpler algorithms exist remains unknown. Moreover, both techniques yield only approximately uniform samples.

Our contribution.
In this work we give two contributions. The first consists of two algorithms for sampling $k$-graphlets uniformly and $\varepsilon$-uniformly at random. These algorithms use neither random walks nor color coding. The first algorithm, u-sampler, yields:

Theorem 1.
The preprocessing of u-sampler(G) uses time $O(m \log \Delta)$ and space $O(n)$. Afterwards, u-sampler(G) draws $k$-graphlets independently and uniformly at random from $G$ in $k^{O(k)}\, O(\log \Delta)$ expected time per sample.

u-sampler is simple, and is mainly intended as an entry point for the second algorithm. Yet, it guarantees perfectly uniform samples; to the best of our knowledge it is the first efficient algorithm to do so. The idea behind the algorithm is to make rejection sampling efficient. Indeed, rejection sampling becomes inefficient because of the gap between the largest and the smallest probability of the sampling distribution. We show that we can bypass this obstacle by sorting the nodes of $G$ greedily, in time $O(m)$, by iteratively removing the node of maximum degree. This implicitly partitions the graphlets of $G$ into $n$ buckets, so that the max-to-min probability ratio is bounded by $k^{O(k)}$, both in the distribution of bucket sizes and in the distribution of graphlet samples inside each bucket. We obtain a two-stage sampling (first the bucket, then the graphlet) where each sample is accepted with probability at least $k^{-O(k)}$, so we need $k^{O(k)}$ trials per sample. To sample a single graphlet efficiently, in the preprocessing we sort the adjacency lists of $G$ according to the mentioned order, in time $O(m \log \Delta)$. We can then use binary search to locate the neighbors of a node during the sampling, which explains the $O(\log \Delta)$ factor in the sampling time.
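The greedy ordering step is simple enough to sketch concretely. Below is a minimal Python rendition of the $O(m)$ bucket-queue construction (function and variable names are ours, not from the paper): it repeatedly extracts a node of maximum remaining degree, yielding the order $\prec$ together with the degrees $d(v \,|\, G(v))$.

```python
from collections import defaultdict

def greedy_max_degree_order(adj):
    """Order V by repeatedly removing a node of maximum remaining degree.

    adj: dict mapping each node to a set of its neighbors (undirected graph).
    Returns (order, deg_at_removal), where deg_at_removal[v] = d(v | G(v)),
    the degree of v among the nodes that come after it in the order.
    Runs in O(n + m) time using a bucket queue indexed by degree.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)      # remaining degree -> nodes with that degree
    for v, d in deg.items():
        buckets[d].add(v)
    cur = max(deg.values(), default=0)
    order, deg_at_removal, removed = [], {}, set()
    for _ in range(len(adj)):
        while not buckets[cur]:     # the maximum remaining degree never increases
            cur -= 1
        v = buckets[cur].pop()
        order.append(v)
        deg_at_removal[v] = cur
        removed.add(v)
        for u in adj[v]:            # v leaves the graph: its neighbors lose one degree
            if u not in removed:
                buckets[deg[u]].discard(u)
                deg[u] -= 1
                buckets[deg[u]].add(u)
    return order, deg_at_removal
```

For instance, on a star the center is extracted first with its full degree, and every leaf then has residual degree 0; this is exactly the property that $d(v\,|\,G(v))$ is the maximum degree of $G(v)$.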
Starting from u-sampler we derive our second algorithm, eps-u-sampler, which yields:

Theorem 2. The preprocessing of eps-u-sampler(G, ε) uses time $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{\frac{2}{k-1}}\, n \log n\big)$ and space $O(n)$. With high probability, thereafter eps-u-sampler(G, ε) draws $k$-graphlets independently and $\varepsilon$-uniformly at random from $G$ in $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{8+\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$ expected time per sample.

Note that the preprocessing time is independent of $m$, and the sampling time is independent of $G$ altogether. eps-u-sampler is considerably more complex than u-sampler. Loosely speaking, it sketches the graph ordering of u-sampler, and then simulates the sampling phase over the sketch. The obstacle in the sketching is that we need to guarantee a bounded ratio between the degree of a node $v$ and that of all the nodes following it in the order, both in $G$ and in the subgraph induced by those nodes (this is necessary for efficient sampling). In fact, the sketch guarantees this boundedness only for a fraction of the nodes, but we can prove that the remaining fraction is irrelevant, since it corresponds to a fraction $O(\varepsilon)$ of all graphlets and thus can be ignored. The difficulty in the sampling phase is that proceeding as in u-sampler fails, because we do not have the sorted adjacency lists anymore. Thus, we have to approximate the behaviour of u-sampler by approximating the distribution of the choices it makes. Unfortunately, u-sampler samples edges from potentially very sparse cuts (with only one "good" edge out of $\Delta$, for instance), so we cannot approximate such distributions point-wise up to a multiplicative error $\varepsilon$. We resort to a coarser approximation of the cuts, and by a coupling argument we show that the outputs of the two algorithms are identical with probability $1 - \varepsilon$, implying that eps-u-sampler is $\varepsilon$-uniform.

Our second result concerns the "classic" graphlet random walk mentioned above and widely employed in the literature [5, 11, 12, 15, 17, 19, 23]. In this technique, one defines a graph $G_k$ where the nodes are the graphlets of $G$ and two graphlets are adjacent if their intersection is a graphlet on $k-1$ nodes. One then runs a random walk over $G_k$ until reaching the stationary distribution, or more precisely, until the walk is within variation distance $\varepsilon$ from it. Assuming $G$ is connected, the walk can be made ergodic via standard techniques. Finally, this $\varepsilon$-approximate distribution is turned into an $\varepsilon$-uniform one via rejection sampling. As said, the fundamental question with this approach is the mixing time $t_\varepsilon(G_k)$, that is, the number of steps for the walk to be within variation distance $\varepsilon$ of the stationary distribution. The best bound known so far is from [1]: $t(G_k) = k^{O(k)}\, \tilde O\big(t(G)\,(\frac{\Delta}{\delta})^{2(k-1)}\big)$. This still gives a mixing time of $t(G)\,(\frac{\Delta}{\delta})^{2}$ for sampling 2-graphlets, i.e., edges. In this work we prove the following bound:

Theorem 3.
For all graphs $G$ and all $k \ge 2$, the $\varepsilon$-mixing time of the lazy random walk over $G_k$ satisfies $t_\varepsilon(G_k) = k^{O(k)}\, O\big(t(G)\,(\frac{\Delta}{\delta})^{k-1} \log\frac{n}{\varepsilon}\big)$.

Ignoring $k^{O(k)}$ factors, this result improves the upper bound of [1] by a factor $(\frac{\Delta}{\delta})^{k-1}$, and is only a factor $k^{O(k)}\, \delta \log\frac{n}{\varepsilon}$ away from their $k^{-O(k)}\, \Omega\big(t(G)\, \frac{\Delta^{k-1}}{\delta^{k}}\big)$ lower bound. As a consequence, we obtain an algorithm mc-sampler yielding:

Theorem 4. mc-sampler(G, ε) returns an $\varepsilon$-uniform $k$-graphlet from $G$ in expected running time $k^{O(k)}\, O\big(t(G)\,(\frac{\Delta}{\delta})^{k-2} \log\frac{n}{\varepsilon}\big)$.

Note that this bound depends on $(\frac{\Delta}{\delta})^{k-2}$, less than the $(\frac{\Delta}{\delta})^{k-1}$ in the mixing time bound. The reason is that, as observed in [15, 23], we can sample $k$-graphlets indirectly by sampling edges of $G_{k-1}$; each edge of $G_{k-1}$ is a pair of $(k-1)$-graphlets whose union is a $k$-graphlet. By standard results, to achieve $\varepsilon$-uniformity over the edges of $G_{k-1}$, we just need to run the walk over it until its mixing time. We note that recently [15] gave an $\varepsilon$-uniform sampling algorithm with running time $k^{O(k)}\, O\big((\Delta \log\frac{n}{\varepsilon})^{k-2}\big)$, assuming one can sample edges uniformly at random from $G$ in time $O(1)$, which requires an $O(n)$-time preprocessing of the graph (which we do not need). Our mixing time bound is obtained using a technique completely different from [1]. We recursively relate the relaxation times of $G_k$ and $G_{k-1}$ via spectral gaps, Dirichlet forms, and Markov chain comparison theorems. In this way we avoid Cheeger's inequality, which in [1] leaves a quadratic factor on the ground. As part of the proof, we relate the relaxation time of the walk on an arbitrary graph $G$ and that on its line graph $L(G)$, that is, the graph of adjacencies between the edges of $G$, proving:

Lemma 1. If $|E(G)| \ge 1$, then $\tau(L(G)) \le 20\,\frac{\Delta}{\delta}\,\tau(G)$, where $L(G)$ is the line graph of $G$ and $\tau(\cdot)$ denotes the relaxation time of the lazy walk.

Comparison with existing bounds.
The table below puts our bounds in perspective. The uniform sampling bound for [7] is obtained by running the whole color-coding algorithm, including the preprocessing phase, for each single sample. The bound of [15] has preprocessing time $O(n)$ since the authors assume sampling edges uniformly at random in $O(1)$. The bounds for $\varepsilon$-uniform sampling via [7] are explained in Appendix A.

|                | preprocessing time                                              | pr. space                                   | time per sample                                                                | output    |
|----------------|-----------------------------------------------------------------|---------------------------------------------|--------------------------------------------------------------------------------|-----------|
| [7]            | -                                                               | -                                           | $2^{O(k)} O(m) + k^{O(k)}$                                                     | uniform   |
| u-sampler      | $O(m \log \Delta)$                                              | $O(n)$                                      | $k^{O(k)} O(\log \Delta)$                                                      | uniform   |
| [15]           | $O(n)$                                                          | -                                           | $k^{O(k)} O\big(\Delta^{k-2} (\log\frac{n}{\varepsilon})^{k-2}\big)$           | ε-uniform |
| [7]            | $2^{O(k)} O(m \log\frac{1}{\varepsilon}) + f(\varepsilon, k)$   | $2^{O(k)} O(n \log\frac{1}{\varepsilon})$   | $k^{O(k)} O\big(\Delta (\log\frac{1}{\varepsilon})^2\big)$                      | ε-uniform |
| [7]            | $2^{O(k)} O(m \log\frac{1}{\varepsilon}) + f(\varepsilon, k)$   | $2^{O(k)} O(m \log\frac{1}{\varepsilon})$   | $k^{O(k)} O\big((\log\frac{1}{\varepsilon})^2\big)$                             | ε-uniform |
| mc-sampler     | -                                                               | -                                           | $k^{O(k)} O\big(t(G)\,(\frac{\Delta}{\delta})^{k-2} \log\frac{n}{\varepsilon}\big)$ | ε-uniform |
| eps-u-sampler  | $k^{O(k)} O\big((\frac{1}{\varepsilon})^{\frac{2}{k-1}} n \log n\big)$ | $O(n)$                               | $k^{O(k)} O\big((\frac{1}{\varepsilon})^{8+\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$ | ε-uniform |

Table 1: Summary of results. Here $f(\varepsilon, k) = k^{O(k)}\, O(\frac{1}{\varepsilon^2} \log\frac{1}{\varepsilon})$; see Appendix A.

Preliminaries.
For each $v \in V$ we let $d_v$ be the degree of $v$. We let $\Delta = \max_{v \in V} d_v$ and $\delta = \min_{v \in V} d_v$. We assume the following graph access model: for every $v \in V$, in constant time we can read $d_v$, access the $i$-th neighbor of $v$, and check whether $\{u, v\} \in E$. For the random walks we assume $G$ is connected (this is standard, and necessary for ergodicity). A $k$-graphlet $g = (V_g, E_g)$ is a subgraph of $G$ that is connected and induced (that is, $g = G[V_g]$). We consider graphlets as unlabelled; therefore $g = (V_g, E_g)$ is identified by its node set $V_g$, and we use $g$ and $V_g$ interchangeably. We denote by $\mathcal{V}_k$ the set of all $k$-graphlets of $G$. To measure the distance between two distributions $\pi, \sigma$ over a finite domain $X$ (e.g., $X = \mathcal{V}_k$), we use the total variation distance, $\mathrm{tvd}(\pi, \sigma) = \max_{A \subseteq X} \{\pi(A) - \sigma(A)\}$. If $\pi$ is the uniform distribution and $\mathrm{tvd}(\pi, \sigma) \le \varepsilon$, then we say $\sigma$ is $\varepsilon$-uniform. In this paper, "high probability" means that, for any fixed $a > 0$, the claim holds with probability at least $1 - 1/n^a$ by increasing the cost by a multiplicative factor $O(a)$. A weighted graph is denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E}, w)$ where $w: \mathcal{E} \to \mathbb{R}^+$. For every $u \in \mathcal{V}$ the weight of $u$ is $w(u) = \sum_{e \in \mathcal{E}: u \in e} w(e)$. The random walk on $\mathcal{G}$ is defined as follows. Consider the Markov chain with transition matrix $P$ given by $P(u, v) = \frac{w(u,v)}{w(u)}$. The walk is the lazy version of this chain, with an added loop of weight $w(u)$ at each node, whose transition matrix is $\frac{1}{2}(P + I)$. By standard Markov chain theory, this chain is ergodic and converges to the limit distribution $\pi$ given by $\pi(u) = \frac{w(u)}{\sum_{v \in \mathcal{V}} w(v)}$. Given an ergodic Markov chain $X = \{X_t\}_{t \ge 0}$ with limit distribution $\pi$, if $\pi_t$ is the distribution of $X_t$, the $\varepsilon$-mixing time of $X$ is $t_\varepsilon(X) = \min\{t : \mathrm{tvd}(\pi_{t'}, \pi) \le \varepsilon$ for all $t' \ge t$ and all starting states $X_0\}$. We write $t_\varepsilon(G)$ for the mixing time of the random walk on $G$ as defined above. All our walks are time-reversible. Therefore, if $\pi^* = \min_{u \in \mathcal{V}} \pi(u)$, then
$(\tau - 1) \log\frac{1}{2\varepsilon} \le t_\varepsilon(X) \le \tau \log\frac{1}{\varepsilon \pi^*}$, where $\tau = 1/\gamma$ is the relaxation time of the chain and $\gamma$ is the spectral gap of the transition matrix (see [13], Theorems 12.3 and 12.4). Appendix E gives the necessary background on Markov chains. The other appendices give the full proofs of all our theorems and the pseudocode of our algorithms.

2  The uniform and the ε-uniform algorithms

This section gives a walk-through of the ideas and proofs behind u-sampler and eps-u-sampler. We start with u-sampler, which helps set the stage. The pseudocode and the proof of its guarantees (Theorem 1) are given in Appendix B.1 and B.2.
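Before diving in, here is a concrete rendering of the $\varepsilon$-uniformity notion from the preliminaries: a small Python helper (ours, for illustration only) that measures the total variation distance of an empirical sample distribution from uniform, which is how one would empirically check a sampler's output.

```python
from collections import Counter

def tvd_from_uniform(samples, support_size):
    """Total variation distance between the empirical distribution of
    `samples` and the uniform distribution over a support of the given size.

    Uses the equivalent form tvd(pi, sigma) = (1/2) * sum_x |pi(x) - sigma(x)|.
    """
    counts = Counter(samples)
    n = len(samples)
    u = 1.0 / support_size
    # elements that appear in the sample ...
    dist = sum(abs(c / n - u) for c in counts.values())
    # ... plus support elements that were never sampled
    dist += (support_size - len(counts)) * u
    return dist / 2.0
```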
Warm-up: rejection sampling. To begin, consider a generic rejection sampling technique. First, we draw a random $k$-graphlet $g$ from $G$ according to some known distribution $p$. Second, with probability $\frac{p^*}{p(g)}$ we accept $g$ and output it. Here, $p^*$ is (a lower bound on) the smallest probability of any graphlet. For any fixed $g$, in any given trial, the probability that $g$ is returned is $p(g)\,\frac{p^*}{p(g)} = p^*$, which is constant, i.e., independent of $g$. Thus, the distribution of the returned graphlets is uniform. The problem is that the expected number of trials before some $g$ is accepted is $\frac{1}{|\mathcal{V}_k|\, p^*}$, which is at most the max-to-min ratio of the probabilities of $p$ and can thus be of order $\frac{1}{p^*}$. Now, $\frac{1}{p^*}$ is at least the number of graphlets in $G$, which in turn can be as large as $k^{-O(k)}\, \Delta^{k-1}$. Thus, we could end up performing $k^{-O(k)}\, \Delta^{k-1}$ trials. (This is effectively the obstacle in the random walk approach, where the $\Delta^{k-1}$ is paid in the mixing time.) To avoid this situation, we shall construct a distribution $p$ that is already close to uniform, that is, where $\frac{p^*}{p(g)} \ge k^{-O(k)}$ for all $g$, so that the expected number of trials is at most $k^{O(k)}$.

Two-step sampling with $k^{O(k)}$ rejection trials. Suppose we can partition the $k$-graphlets into $n$ buckets $B(1), \ldots, B(n)$ associated to the nodes of $G$, such that for every $B(v)$ we can:

1. obtain an estimate $a_v$ such that $k^{-O(k)}\, |B(v)| \le a_v \le k^{O(k)}\, |B(v)|$
2. sample efficiently from a distribution $p_v$ over $B(v)$ such that $\frac{k^{-O(k)}}{|B(v)|} \le p_v(g) \le \frac{k^{O(k)}}{|B(v)|}$ for all $g$
3. compute $p_v(g)$ efficiently for any $g$

Then we can achieve uniform sampling with a two-stage rejection sampling. First, we choose a bucket $B(v)$ with probability $a_v / \sum_u a_u$; then we draw a graphlet $g$ from $B(v)$ according to $p_v$; finally, we accept $g$ with probability $\frac{k^{-O(k)}}{a_v\, p_v(g)}$. The acceptance probability is constant over $B(v)$, and since $k^{-O(k)} \le a_v\, p_v(g) \le k^{O(k)}$, it is well defined and at least $k^{-O(k)}$. Thus, with $k^{O(k)}$ rejection trials in expectation we can return a uniform sample.

Bucketing by sorting G. The aforementioned bucketing can be obtained by sorting $G$ in one pass. Consider indeed the order $\prec$ given by repeatedly removing the highest-degree node. A standard greedy algorithm does this in time $O(m)$ and space $O(n)$. Denote by $G(v)$ the graph induced by all $u \succeq v$ (the transitive closure of $v$ in $\prec$). We define the bucket $B(v)$ as the set of all graphlets of $G(v)$ containing $v$. Clearly, this bucketing is a partition of all graphlets. Now let $d(v | G(v))$ be the degree of $v$ in $G(v)$. By construction, $d(v | G(v))$ is the maximum degree of $G(v)$ as well. Therefore, by standard counting arguments, if $B(v) \ne \emptyset$:

$$\frac{d(v | G(v))^{k-1}}{(k-1)^{k-1}} \;\le\; |B(v)| \;\le\; \big((k-1)\, d(v | G(v))\big)^{k-1} \qquad (1)$$

Therefore $a_v = d(v | G(v))^{k-1}$ is our desired estimate of $|B(v)|$. (Whether $B(v) = \emptyset$ can be checked quickly with a graph search.) We then let $a = \sum_{v \in V : B(v) \ne \emptyset} a_v$, and $p(v) = \frac{a_v}{a}$ for each $v \in V$. Note that the crucial point here is that $d(v | G(v))$, the degree of $v$ in $G(v)$, is also the maximum degree of $G(v)$. In eps-u-sampler we lose this property and need to guarantee it approximately. Finally, we sort the adjacency lists of $G$ according to $\prec$ in time $O(\sum_{v \in V} d_v \log d_v) = O(m \log \Delta)$.
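The two-stage scheme can be written down generically. The sketch below is our code, not the paper's: `sample_from_bucket` and `prob_in_bucket` stand for the second and third primitives listed above, and `c` plays the role of the constant $k^{-O(k)}$. It shows why the acceptance probability is the same for every graphlet.

```python
import random

def two_stage_sampler(buckets, a, sample_from_bucket, prob_in_bucket, c):
    """Generic two-stage rejection sampling.

    buckets: list of bucket identifiers v
    a[v]: estimate with k^{-O(k)} |B(v)| <= a[v] <= k^{O(k)} |B(v)|
    sample_from_bucket(v): draws g from B(v) with probability p_v(g)
    prob_in_bucket(v, g): returns p_v(g)
    c: constant with c <= a[v] * p_v(g) for all v, g

    Returns a graphlet uniform over the union of the buckets.
    """
    weights = [a[v] for v in buckets]
    total = sum(weights)
    while True:
        # stage 1: pick a bucket proportionally to its size estimate
        v = random.choices(buckets, weights=weights)[0]
        # stage 2: draw a graphlet inside the bucket
        g = sample_from_bucket(v)
        # accept with probability c / (a_v p_v(g)); the probability of
        # returning a fixed g in one trial is
        # (a_v/total) * p_v(g) * c/(a_v p_v(g)) = c/total, the same for all g
        if random.random() < c / (a[v] * prob_in_bucket(v, g)):
            return g
```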
Sampling from B(v). Now we want to sample a graphlet in $B(v)$ from a distribution $p_v$ that guarantees $\frac{k^{-O(k)}}{|B(v)|} \le p_v(g) \le \frac{k^{O(k)}}{|B(v)|}$ for all $g$. Thanks again to the fact that $d(v | G(v))$ is the maximum degree of $G(v)$, a straightforward random subset-growing procedure does the trick. We start with $S_1 = \{v\}$. For each $i = 1, \ldots, k-1$, we expand $S_i$ into $S_{i+1}$ by choosing an edge uniformly at random in the cut between $S_i$ and $G(v) \setminus S_i$. Since $B(v) \ne \emptyset$ and every node in $G(v)$ has degree at most $d(v | G(v))$, the size $\varphi_i$ of the cut between $S_i$ and $G(v) \setminus S_i$ satisfies $\frac{1}{2i}\, d(v | G(v)) \le \varphi_i \le i\, d(v | G(v))$. Thus, the probability of any specific sequence $S_1, S_2, \ldots, S_k$ is governed by:

$$k^{-O(k)}\, d(v | G(v))^{-(k-1)} \;\le\; \prod_{i=1}^{k-1} \frac{1}{\varphi_i} \;\le\; k^{O(k)}\, d(v | G(v))^{-(k-1)} \qquad (2)$$

This is also, up to $k^{O(k)}$ factors, the probability of obtaining any specific graphlet $g$. Therefore this procedure yields the desired sampling distribution $p_v$. The crucial point is once again the bound given by $d(v | G(v))$, which we lose in eps-u-sampler.

Computing $p_v(g)$ and the role of sorted lists. Finally, we have to compute the probability $p_v(g)$, which we need in the rejection step. The procedure is essentially the same as for sampling, so we describe it. To select the random edge in the cut of $S_i$, for each $u \in S_i$ we compute the size $\varphi_i(u)$ of the cut between $u$ and $G(v) \setminus S_i$. We can do this in time $O(\log \Delta)$ using the sorted adjacency lists of $G$, since the sub-list containing the neighbors of $u$ in $G(v)$ starts where $v$ would sit in the list, a position we can find with a binary search. Then, we select $u$ with probability proportional to $\varphi_i(u)$ and select an edge in its cut uniformly at random. This again takes time $O(\log \Delta)$ via binary search. Thus we have our desired efficient procedure to compute $p_v(g)$ for every $g$. However, in eps-u-sampler the lists are not sorted (we do not have enough time), so we have to find another way to sample and to compute $p_v(g)$.
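A minimal Python version of the subset-growing procedure may help intuition. This is our simplified sketch (names are ours): it scans cut edges explicitly instead of using the binary-search trick, so it illustrates the distribution $p_v$ but not the logarithmic running time.

```python
import random

def sample_subgraph(adj, rank, v, k):
    """Grow a k-node connected set inside G(v), starting from S_1 = {v}.

    adj: dict node -> set of neighbors; rank: dict node -> position in the
    order (u belongs to G(v) iff rank[u] >= rank[v]). At each step an edge
    is drawn uniformly at random from the cut between S_i and G(v) \ S_i.
    Returns the node set of the sampled graphlet, or None if the cut is
    empty (which can only happen when B(v) is empty).
    """
    S = {v}
    for _ in range(k - 1):
        cut = [(u, w) for u in S for w in adj[u]
               if rank[w] >= rank[v] and w not in S]
        if not cut:
            return None
        _, w = random.choice(cut)
        S.add(w)
    return frozenset(S)
```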
2.1  The ε-uniform algorithm

In this section we describe eps-u-sampler and sketch the proof of:

Theorem 2. The preprocessing of eps-u-sampler(G, ε) uses time $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{\frac{2}{k-1}}\, n \log n\big)$ and space $O(n)$. With high probability, thereafter eps-u-sampler(G, ε) draws $k$-graphlets independently and $\varepsilon$-uniformly at random from $G$ in $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{8+\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$ expected time per sample.

Appendix C gives the full proof and the pseudocode. The algorithm has two main technical ingredients:

1. a sketching routine that computes an ordering of $G$ such that every $G(v)$ has degrees balanced within $\varepsilon^{\frac{1}{k-1}}\, k^{-O(k)}$ factors and "loses" only a fraction $O(\varepsilon)$ of the graphlets (Lemma 2 below);

2. a sampling routine that, exploiting the approximate degree balance, estimates the cut sizes well enough to behave $\varepsilon$-closely to u-sampler conditioned on accepting the graphlet.

Algorithm 1  eps-u-sampler(G, ε) (informal version)
  function preprocess( ):
      compute ≺ and {a_v}_{v∈V} using bucket-sketch(G, ε)
      let p(v) = a_v / Σ_{u∈V} a_u for each v ∈ V
  function sample( ):
      while true do:
          draw a bucket B(v) from the distribution p
          g = eps-sample-subgraph(seed = v, tolerance = ε)
          p̂_v(g) = eps-compute-p(graphlet = g, tolerance = ε)
          with probability ε / (p(v) · p̂_v(g) · Σ_{u∈V} a_u): return g

Note that this is not simply a matter of estimating some quantities (e.g., the cut sizes) by sampling. Consider for example the random cut-edge sampling described above. If for all $u \in S_i$ we had a multiplicative $(1 \pm \varepsilon)$ estimate of the cut of $u$, then eps-u-sampler would behave essentially as u-sampler and we would be done. It is not hard to see, however, that obtaining such estimates requires examining $\Omega(\Delta)$ edges, since $u$ might have degree $\Delta$ but only one edge in the cut. A similar obstacle holds for the ordering of $G$ and for computing $p_v(g)$. Thus, we need a more refined approach.

Sketching the graph order. Let us start with the graph ordering. Recall that we want an order $\prec$ over $V$ such that $d(v | G(v))$ is approximately the maximum degree of $G(v)$. The trivial degree ordering does not work. This can be seen when $G$ is the union of a complete bipartite graph $K_{\Delta, \delta}$ and a clique on $\delta$ nodes. The degree ordering sets, in order, the right nodes $R$ of $K_{\Delta, \delta}$ (those of degree $\Delta$), then the left nodes $L$ of $K_{\Delta, \delta}$ (those of degree $\delta$), and then the clique nodes $C$. Thus, $R \prec L \prec C$. Now for any $v \in L$ we have $d(v | G(v)) = 0$, yet the first node $u \in C$ has $d(u | G(v)) = \delta - 1 > d(v | G(v))$. As a second try, we could estimate $d(v | G(v))$ for each $v$ by sampling the edges of $v$ and checking how many of them fall after $v$ in the order. If $d(v | G(v))$ is relatively large, say $d(v | G(v)) \ge \varepsilon d_v$, then we keep bucket $B(v)$, else we completely ignore it. This guarantees $d(v | G(v)) \ge \varepsilon \max_{u \succ v} d(u | G(v))$ whenever $B(v)$ is kept. At the same time, by a counting argument, $B(v)$ contains at most roughly $(\varepsilon d_v)\, d_v^{k-2} = \varepsilon\, d_v^{k-1}$ graphlets if it is ignored, yet $v$ in $G$ is part of roughly $d_v^{k-1}$ graphlets (the stars centered in it). Thus, the ignored buckets hold only a fraction $\varepsilon$ of all graphlets and we are still fine. The problem is that now the cuts of different $S_i$ can differ by a factor $\varepsilon$ depending on the nodes of $G(v)$ they contain. By equation (2), this means that for two different graphlets $g, g'$ in $B(v)$ we can have $\frac{p_v(g)}{p_v(g')} \simeq \varepsilon^{k}$. This makes the number of rejection trials grow to $(\frac{1}{\varepsilon})^{k}$.

We prove that we can do better, as follows. We take each node $v \in V$ in descending degree order and test how many edges $v$ has towards the nodes below it in the current order; in other words, we estimate $d(v | G(v))$.
To this end we draw roughly $(\frac{k}{\varepsilon})^{\frac{2}{k-1}} \log n$ neighbors of $v$ uniformly at random. If enough neighbors are below $v$, then we estimate $d(v | G(v)) \simeq d_v$ and $v$ is not moved. Else, we estimate $d(v | G(v)) \lesssim \varepsilon^{\frac{1}{k-1}}\, d_v$ and $v$ is accordingly moved down to its "correct" position, updating the order. This is our sketching routine, bucket-sketch; see below. We prove:

Lemma 2 (simplified version). bucket-sketch(G, ε) runs in time $k^{O(k)}\, O\big((1/\varepsilon)^{\frac{2}{k-1}}\, n \log n\big)$, and with high probability returns an order $\prec$ over $V$ and estimates $\{a_v\}_{v \in V}$ such that:

1. if $a_v > 0$, then $d(v | G(v)) \ge k^{-O(k)}\, \varepsilon^{\frac{1}{k-1}}\, d(u | G(v))$ for all $u \succ v$
2. if $a_v > 0$, then $\varepsilon\, k^{-O(k)}\, |B(v)| \le a_v \le \frac{1}{\varepsilon}\, k^{O(k)}\, |B(v)|$

3. $\sum_{v : a_v = 0} |B(v)| \le \varepsilon \sum_{v \in V} |B(v)|$

Algorithm 2  bucket-sketch(G, ε)
  let η = (1/(3(k−1)²)) (ε/k)^{1/(k−1)} and h = Θ(η^{-2} log n)
  init s_v = d_v for all v ∈ V                          ▷ any-time upper bound on d(v|G(v))
  init ≺ = the order over V by nonincreasing s_v        ▷ ≺ will respect s_v at any time
  for each v ∈ V in nonincreasing order of degree do    ▷ each v is processed only once
      sample h neighbors x_1, ..., x_h of v u.a.r.
      let X = Σ_{j=1}^h I{x_j ≻ v}
      if X ≥ ηh then                                    ▷ d(v|G(v)) ≳ ε^{1/(k-1)} d_v
          let a_v = (d_v)^{k−1}
      else                                              ▷ d(v|G(v)) ≪ ε^{1/(k-1)} d_v
          let a_v = 0 and s_v = 3η d_v
          update ≺ according to s_v                     ▷ move v to its correct position
  for each v with d_v ≤ k²/η and a_v = 0 do
      check if B(v) ≠ ∅ via a BFS in G(v)
      if so then compute d(v|G(v)) and let a_v = d(v|G(v))^{k−1}
  return the order ≺ and the estimates {a_v}_{v∈V}

where $B(\cdot), G(\cdot)$ are meant as induced by the returned order $\prec$. Properties (1) and (2) guarantee the desired balance in the degrees of $G(v)$ and in the bucket size estimates (we lose only a factor $\varepsilon$ against u-sampler). These two properties hold for buckets with $a_v > 0$; buckets with $a_v = 0$ do not guarantee them, because $d(v | G(v)) \ll \varepsilon^{\frac{1}{k-1}} d_v$ and so we could not estimate $d(v | G(v))$ accurately. Property (3), however, says we can ignore these buckets, since they account for only an $\varepsilon$ fraction of the whole graphlet distribution. Note that we can precompute the distribution $p$ over $V$ using time and space $O(n)$ by the alias method, so that drawing a bucket takes constant time [22].
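Since the bucket distribution $p$ is drawn from once per rejection trial, the constant-time lookup matters. For completeness, here is a compact version of Vose's alias method [22] (our implementation, not the paper's):

```python
import random

def build_alias(weights):
    """Vose's alias method: O(n) preprocessing, O(1) per sample.
    Returns (prob, alias) tables for the distribution weights / sum(weights)."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob, alias = [0.0] * n, [0] * n
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]        # l donates mass to fill s's cell
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                 # leftovers have (numerically) mass 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """One O(1) draw from the preprocessed distribution."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```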
Coupling with the uniform sampler. We now move to the sampling, with the following strategy. First, we imagine running the sampling phase of u-sampler on our sketched preprocessing output $(\prec, \{a_v\}_{v \in V})$; so for simplicity we can imagine the adjacency lists of $G$ are sorted. First, we choose a bucket $B(v)$ with probability $p(v)$. Then, starting with $S_1 = \{v\}$, we run the random edge-sampling procedure of u-sampler above. Suppose we are drawing a random edge in the cut of $S_i$. Property (1) of Lemma 2 ensures that the cut of $S_i$ is between $d(v | G(v))$ and $k^{O(k)}\, \varepsilon^{-\frac{1}{k-1}}\, d(v | G(v))$, up to $k^{O(k)}$ factors. This implies that for any two graphlets $g, g'$ we have $p_v(g) \le \frac{1}{\varepsilon}\, p_v(g')$ up to $k^{O(k)}$ factors. Therefore we have lost only a factor $\varepsilon$ compared to the distribution used by u-sampler. Moreover, by property (2) we lose another factor $\varepsilon$ in the distribution of the $a_v$. It can be shown that this has the following consequence:
Claim 1. If we replace the sampling phase of eps-u-sampler with that of u-sampler, then the output graphlets are uniform over $\cup_{v : a_v > 0} B(v)$, and the probability of accepting a graphlet in any given trial is at least $\varepsilon^2\, k^{-O(k)}$.

Therefore, in terms of rejection trials, the loss of efficiency due to the sketched preprocessing is essentially $\varepsilon^2$. It remains to (i) achieve $\varepsilon$-distance of the output from the uniform distribution over $\cup_{v : a_v > 0} B(v)$, and (ii) bound the time spent in a single trial.

Achieving ε-uniformity. Recall the two key steps of the sampling in u-sampler: sampling a graphlet $g$ from $B(v)$ according to $p_v$, and computing the probability $p_v(g)$ (and then accepting with probability $\propto \frac{1}{p_v(g)}$). For the first step, we cannot achieve $p_v$ anymore; more precisely, without sorted adjacency lists we would need to examine all edges of $S_i$, which could be $\Theta(\Delta)$ many. We can however draw from a distribution $\delta$-close to $p_v$, for some small $\delta$ (unrelated to the minimum degree of $G$), by estimating the cut size $\varphi_i(u)$ of each $u \in S_i$. The reason why we use a new parameter $\delta$ will be clear soon. We claim:
Claim 2. By sampling $\simeq k^{O(1)}\, (\frac{1}{\delta})^2\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}} \log\frac{1}{\varepsilon}$ edges of each $u \in S_i$ we obtain an approximation of $\varphi_i(u)$ within an additive term $\frac{\delta}{k}\, \varphi_i$, where $\varphi_i = \sum_{z \in S_i} \varphi_i(z)$.

To see why this holds, recall property (1) of Lemma 2. The cut $\varphi_i$ of $S_i$ satisfies $d(v | G(v)) \le k^{O(k)} \varphi_i$ and $\varphi_i \le k^{O(k)}\, (\frac{1}{\varepsilon})^{\frac{1}{k-1}}\, d(v | G(v))$. In addition, we can prove $d_u \le k^{O(k)}\, (\frac{1}{\varepsilon})^{\frac{1}{k-1}}\, d_v$ (this is shown in the full version of Lemma 2). Hence, for every $u \in S_i$, the ratio $\varphi_i / d_u$ is at least $\mu = k^{-O(k)}\, \varepsilon^{\frac{2}{k-1}}$, i.e., every sampled edge of $u$ has probability at least $\mu$ of being in the cut. For a $\frac{\delta}{k}\varphi_i$-additive approximation we therefore need $\simeq (\frac{k}{\delta\mu})^2 = k^{O(k)}\, (\frac{1}{\delta})^2\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}}$ samples, as the claim says. (The additional factor $\log\frac{1}{\varepsilon}$ guarantees that this happens with probability $1 - \mathrm{poly}(\varepsilon)$, so we can absorb the approximation failure in our $\varepsilon$-uniformity bound.) Now, since we have at most $k$ nodes in $S_i$, and for each of them we have a $\frac{\delta}{k}\varphi_i$-additive approximation, it follows that we can draw an edge $\frac{\delta}{k}$-uniformly from the cut of $S_i$. This implies a coupling. That is, we can assume u-sampler and eps-u-sampler choose the same edge in the cut of $S_i$ with probability at least $1 - \frac{\delta}{k}$. By a union bound over all $i$, they choose the same graphlet with probability $1 - \delta$. Thus:

Lemma 3 (simplified version). After the preprocessing of eps-u-sampler, the distributions of the subgraphs sampled by eps-u-sampler and u-sampler before rejection are $\delta$-close.

This gives a bound on the statistical distance between eps-u-sampler and u-sampler up to the sampling step. We still have to perform the rejection and ensure that the distribution conditioned on accepting the graphlet is $\varepsilon$-uniform. To this end, recall from Claim 1 that the probability of acceptance is $\simeq \varepsilon^2$. Thus, to ensure $\varepsilon$-uniformity of the accepted graphlets, we shall ensure $\varepsilon^3$-uniformity of the sampled graphlets (before the rejection trial happens). Hence, we run the approximate sampling above with $\delta = \varepsilon^3$. This brings the number of samples to $\simeq (\frac{1}{\varepsilon})^{6}\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}} \log\frac{1}{\varepsilon}$.

This essentially settles the running time of a single round of sampling. It remains to compute the probability $p_v(g)$ that the algorithm samples $g$. Note that this probability is not the same as that of u-sampler. However, we can show that we can obtain a $(1 \pm \delta)$-multiplicative estimate:

Lemma 4 (simplified version). In time $k^{O(k)}\, O\big((\frac{1}{\delta})^2\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$ we can compute, with probability $1 - \mathrm{poly}(\varepsilon)$, an estimate $\hat p_v(g) \in (1 \pm \delta)\, p_v(g)$ of $p_v(g)$.

Now, we accept the graphlet with probability $\propto \frac{1}{\hat p_v(g)}$. By the lemma, this is $\delta$-close to the probability with which u-sampler accepts it. Once again we can couple the two algorithms so that (conditionally on having sampled the same graphlet) they agree on acceptance with probability $1 - \delta$. Finally, we set $\delta = \varepsilon^3$. We get the same running time as for the sampling, $(\frac{1}{\varepsilon})^{6}\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}} \log\frac{1}{\varepsilon}$.

We can now conclude the sampling. As said, each trial accepts the sampled graphlet with probability at least $\varepsilon^2\, k^{-O(k)}$. Thus, we need $k^{O(k)}\, (\frac{1}{\varepsilon})^2$ trials in expectation. Each trial has (expected) running time $k^{O(k)}\, O\big((\frac{1}{\varepsilon})^{6}\, (\frac{1}{\varepsilon})^{\frac{4}{k-1}} \log\frac{1}{\varepsilon}\big)$. This gives a total expected sampling time of:

$$k^{O(k)}\Big(\frac{1}{\varepsilon}\Big)^{2} \cdot k^{O(k)}\, O\Big(\Big(\frac{1}{\varepsilon}\Big)^{6}\Big(\frac{1}{\varepsilon}\Big)^{\frac{4}{k-1}}\log\frac{1}{\varepsilon}\Big) \;=\; k^{O(k)}\, O\Big(\Big(\frac{1}{\varepsilon}\Big)^{8+\frac{4}{k-1}}\log\frac{1}{\varepsilon}\Big) \qquad (3)$$

This completes the overview of eps-u-sampler. For complete formal proofs of our claims see Appendix C.
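The cut estimator at the heart of Claim 2 is itself just neighbor sampling. The sketch below is our code (names are ours): it samples $h$ random neighbors of $u$, counts those falling in $G(v) \setminus S_i$, rescales by $d_u$, and truncates small estimates to zero, mirroring the logic of eps-sample-subgraph; `threshold` stands for the truncation parameter of the pseudocode.

```python
import random

def estimate_cut(adj_list, rank, v, S, u, h, threshold):
    """Estimate phi_i(u) = |{ w in N(u) : rank[w] >= rank[v], w not in S }|.

    adj_list[u] is a list of u's neighbors, so one can be sampled in O(1).
    Samples h neighbors i.i.d. uniformly at random; by Hoeffding's
    inequality the rescaled count d_u * X / h concentrates around phi_i(u).
    Estimates whose counts fall below threshold * h are truncated to 0.
    """
    du = len(adj_list[u])
    X = 0
    for _ in range(h):
        w = random.choice(adj_list[u])
        if rank[w] >= rank[v] and w not in S:
            X += 1
    return du * X / h if X >= threshold * h else 0.0
```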
3  Overview of the MCMC results

We sketch the proofs of our results for the random walk graphlet sampling technique (full proofs in Appendix D). Recall that the input graph $G$ is assumed to be connected (otherwise the walk is not ergodic).
For all $k \ge 2$, the graph $G_k = (\mathcal{V}_k, \mathcal{E}_k)$, usually called the $k$-state graph or $k$-graphlet graph of $G$, is defined as follows: $\mathcal{V}_k$ is the set of all $k$-node induced connected subgraphs of $G$, and $\mathcal{E}_k$ contains an edge for every pair $g, g' \in \mathcal{V}_k$ such that $g \cap g' \in \mathcal{V}_{k-1}$. It is not hard to see that $G_k$ is connected. Recall that $t_\varepsilon$ denotes the mixing time of the lazy walk on a graph, and recall:

Theorem 3 (shortened version). $t_\varepsilon(G_k) = k^{O(k)}\, O\big(t(G)\,(\frac{\Delta}{\delta})^{k-1} \log\frac{n}{\varepsilon}\big)$.

To prove the theorem, first recall that $t_\varepsilon(G_k)$ is within $O(\log\frac{n}{\varepsilon})$ factors of the relaxation time $\tau(G_k)$ of the walk (see the preliminaries). Therefore the heart of the proof deals with bounding $\tau(G_k)$, and in particular proving:

Lemma 5. $\tau(G_k) = O\big(\tau(G_{k-1})\, \mathrm{poly}(k)\, \frac{\Delta}{\delta}\big)$.

The main difficulty in proving Lemma 5 is comparing random walks over completely different graphs; for example, a coupling does not work. In [1], this is done by relating directly the conductances of $G_k$ and $G$ via an analysis of the cuts. Instead, here we use the line graph $L(G_{k-1})$, that is, the graph of adjacencies between the edges of $G_{k-1}$, for brevity denoted $L_{k-1}$. Consider indeed any two adjacent $(k-1)$-graphlets $h, h' \in G_{k-1}$. Clearly, $g = h \cup h'$ is a $k$-graphlet of $G$ and thus a node of $G_k$. On the other hand, $\{h, h'\}$ is an edge of $G_{k-1}$ and thus a node of $L_{k-1}$. In fact, $G_k$ can be obtained from $L_{k-1}$ by collapsing together all states corresponding to the same graphlet $g$ of $G_k$, and rescaling edge weights by $\mathrm{poly}(k)$ factors. By standard Markov chain comparison results, this implies $\tau(G_k) = O(\mathrm{poly}(k)\, \tau(L_{k-1}))$. This is the first step of the proof. To obtain Lemma 5, it remains to show that $\tau(L_{k-1}) = O(\frac{\Delta}{\delta}\, \tau(G_{k-1}))$. This is the second step of the proof. More precisely, in two steps we prove:

$$\tau(G_k) \;\le\; \mathrm{poly}(k) \cdot \tau(L_{k-1}) \;\le\; \mathrm{poly}(k) \cdot 20\,\frac{\Delta}{\delta}\, \tau(G_{k-1}) \qquad (4)$$
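For intuition (and for testing mixing-time claims on toy inputs), the $k$-state graph can be built by brute force. The sketch below is our code, exponential in $k$ and meant only for tiny graphs; it follows the definition above, requiring the intersection of two adjacent graphlets to itself be a connected $(k-1)$-node set.

```python
from itertools import combinations

def graphlet_graph(adj, k):
    """Brute-force construction of G_k: nodes are connected induced k-node
    subgraphs of G, and {g, h} is an edge iff g and h share k-1 nodes and
    that shared set is itself connected. Only for small sanity checks."""
    def connected(S):
        S = set(S)
        stack, seen = [next(iter(S))], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            stack.extend(adj[x] & S)
        return seen == S
    nodes = [frozenset(S) for S in combinations(adj, k) if connected(S)]
    edges = [(g, h) for g, h in combinations(nodes, 2)
             if len(g & h) == k - 1 and connected(g & h)]
    return nodes, edges
```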
Intermediate result: relaxation time of the line graph. For the second step of the proof, as anticipated, we show the following result, which might be of independent interest:
Lemma 1 (shortened version). Any $G$ with $|V(G)| \ge 2$ satisfies $\tau(L(G)) \le 20\,\frac{\Delta}{\delta}\,\tau(G)$.

The idea is to simulate the walk on $L(G)$ over the 1-subdivision of $G$. This is the graph $S$ obtained by replacing each edge of $G$ with two consecutive edges, where the middle node represents the original edge. The relaxation time of $S$ is essentially the same as that of $G$. We then alter the weights of the edges of $S$ (originally unweighted, that is, with unitary weights) by factors in $[\delta, \Delta]$. We can show that the resulting random walk, observed over the states representing the original edges, is exactly the random walk on $L(G)$. Since the weights are in $[1, \Delta/\delta]$, by standard Markov chain comparison results $\tau(L(G)) = O(\frac{\Delta}{\delta}\, \tau(S))$, and the hidden constant is at most 20.

References

[1] Matteo Agostini, Marco Bressan, and Shahrzad Haddadan. Mixing time bounds for graphlet random walks. Information Processing Letters, 152:105851, 2019.

[2] David Aldous and James Fill. Reversible Markov Chains and Random Walks on Graphs. 1995.
[3] N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and S. C. Sahinalp. Biomolecular network motif counting and discovery by color coding. Bioinformatics, 24(13):i241–i249, Jul 2008.

[4] Noga Alon, Raphael Yuster, and Uri Zwick. Color-coding. J. ACM, 42(4):844–856, 1995.

[5] Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. GUISE: Uniform sampling of graphlets for large graph analysis. In Proc. of IEEE ICDM 2012, pages 91–100, 2012.

[6] Anthony Bonato, David F. Gleich, Myunghwan Kim, Dieter Mitsche, Paweł Prałat, Yanhua Tian, and Stephen J. Young. Dimensionality of social networks using motifs and eigenvalues. PLoS ONE, 9(9):e106052, 2014.

[7] Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Counting graphlets: Space vs time. In Proc. of ACM WSDM, pages 557–566, 2017.

[8] Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. Motif counting beyond five nodes. ACM TKDD, 12(4), 2018.

[9] Marco Bressan, Stefano Leucci, and Alessandro Panconesi. Motivo: Fast motif counting via succinct color coding and adaptive sampling. Proc. VLDB Endow., 12(11):1651–1663, July 2019.

[10] Jin Chen, Wynne Hsu, Mong Li Lee, and See-Kiong Ng. NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proc. of ACM KDD, pages 106–115, 2006.

[11] Xiaowei Chen, Yongkun Li, Pinghui Wang, and John C. S. Lui. A general framework for estimating graphlet statistics via random walk. Proc. VLDB Endow., 10(3):253–264, November 2016.

[12] Guyue Han and Harish Sethu. Waddling random walk: Fast and accurate mining of motif statistics in large graphs. In Proc. of IEEE ICDM, pages 181–190, 2016.

[13] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.

[14] Pan Li, Hoang Dau, Gregory Puleo, and Olgica Milenkovic. Motif clustering and overlapping clustering for social network analysis. In Proc. of IEEE INFOCOM, pages 1–9, 2017.

[15] Ryuta Matsuno and Aristides Gionis. Improved mixing time for k-subgraph sampling. In Proc. of SIAM SDM, pages 568–576, 2020.

[16] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, 2002.

[17] Kirill Paramonov, Dmitry Shemetov, and James Sharpnack. Estimating graphlet statistics via lifting. In Proc. of ACM KDD, pages 587–595, 2019.

[18] Nataša Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177–e183, 2007.

[19] Tanay Kumar Saha and Mohammad Al Hasan. Finding network motifs using MCMC sampling. In Proc. of CompleNet, pages 13–24. Springer International Publishing, 2015.

[20] Charalampos E. Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. Scalable motif-aware graph clustering. In Proc. of WWW, pages 1451–1460, 2017.

[21] Johan Ugander, Lars Backstrom, and Jon Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proc. of WWW, pages 1307–1318, 2013.

[22] M. D. Vose. A linear algorithm for generating random numbers with a given distribution. IEEE Transactions on Software Engineering, 17(9):972–975, Sep 1991.

[23] Pinghui Wang, John C. S. Lui, Bruno Ribeiro, Don Towsley, Junzhou Zhao, and Xiaohong Guan. Efficiently estimating motif statistics of large networks. ACM TKDD, 9(2):8:1–8:27, 2014.
A  ε-uniform sampling via color coding

We show how to use the color coding algorithm of [7] in a black-box fashion to perform $\varepsilon$-uniform sampling from $G$. The overhead in the preprocessing time and space is $2^{O(k)}\, O(\log\frac{1}{\varepsilon})$, and the overhead in the sampling time is $2^{O(k)}\, O((\log\frac{1}{\varepsilon})^2)$.

First, we perform $\ell = O(e^k \log\frac{1}{\varepsilon})$ independent colorings of $G$ and run the preprocessing phase of the algorithm of [7] on each of them, storing the resulting count table (this is a table produced by the dynamic program of the algorithm). This gives a $O(e^k \log\frac{1}{\varepsilon})$ overhead for the preprocessing phase in terms of both time and space. In a single run, any given graphlet $g$ has probability $\frac{k!}{k^k} \ge e^{-k}$ of becoming colorful, i.e., of being colored with $k$ distinct colors. Thus, with $O(e^k \log\frac{1}{\varepsilon})$ independent runs, every graphlet of $G$ is colorful with probability $1 - \mathrm{poly}(\varepsilon)$ in at least one run, and thus appears in the respective count table. Now, as shown in [7], for each run $i = 1, \ldots, \ell$, with $k^{O(k)}\, O(\frac{1}{\varepsilon^2})$ samples one can estimate the total number of graphlets $N_i$ detected in that run within a multiplicative $(1 \pm \varepsilon)$ factor. (This requires estimating the average number of spanning trees per graphlet in that run, and multiplying by the number of detected trees, which is known.) In time $k^{O(k)}\, O(\frac{1}{\varepsilon^2} \log\frac{1}{\varepsilon})$, we do so for all runs with probability $1 - \mathrm{poly}(\varepsilon)$. Now, we choose a run $i \in [\ell]$ at random with probability proportional to (the estimate of) $N_i$. Then, we draw a graphlet from that run uniformly at random using the algorithm. This yields a graphlet drawn uniformly at random from the multiset union of all runs. Thus, the probability that a specific graphlet $g$ is sampled is proportional to the number of runs where $g$ appears. This is just the number of colorings $\ell(g)$ in which $g$ is colorful, which we can compute by looking at the colors assigned to $g$ by each one of the $\ell$ runs, in time $\ell = O(e^k \log\frac{1}{\varepsilon})$. Then, we accept $g$ with probability $\frac{1}{\ell(g)}$. Therefore we need at most $\ell = O(e^k \log\frac{1}{\varepsilon})$ trials in expectation before a graphlet is accepted. This gives an overhead of $O((e^k \log\frac{1}{\varepsilon})^2)$ in the sampling phase. This construction can be derandomized using an $(n, k)$-family of perfect hash functions of size $\ell = 2^{O(k)} \log n$, see [4]. This derandomization would increase the time and space of the preprocessing by a factor $\log n$, but we would still need to estimate the number of graphlets in each run, so the final distribution would still be non-uniform.
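The "sample from the union, then reject by multiplicity" step at the end of this construction is a generic trick worth isolating. Below is a schematic Python version (our code: `run_sizes[i]` plays the role of $N_i$, and `multiplicity(g)` returns $\ell(g)$, the number of runs containing $g$):

```python
import random

def sample_from_union(run_sizes, sample_from_run, multiplicity):
    """Uniform sampling from a union of overlapping sets ("runs").

    run_sizes[i]: (estimate of) the number of elements in run i
    sample_from_run(i): returns a uniform element of run i
    multiplicity(g): number of runs containing g (at least 1)

    Picking a run proportionally to its size and then a uniform element of
    it biases toward elements that appear in many runs; accepting with
    probability 1/multiplicity(g) removes exactly that bias, so each g is
    returned with probability 1/sum(run_sizes) per trial.
    """
    while True:
        i = random.choices(range(len(run_sizes)), weights=run_sizes)[0]
        g = sample_from_run(i)
        if random.random() < 1.0 / multiplicity(g):
            return g
```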
B  Appendix for Section 2

Lemma 6.
Let $G = (V, E)$ be any graph, and for any $v \in V$ let $N_v$ be the number of $k$-graphlets of $G$ containing $v$. If $N_v > 0$, then:

$$N_v \;\ge\; \frac{(d_v)^{k-1}}{(k-1)^{k-1}} \;=\; (d_v)^{k-1}\, k^{-O(k)} \qquad (5)$$

Moreover, if $d_u \le \Delta$ for all $u \in G$, then:

$$N_v \;\le\; (k-1)!\, \Delta^{k-1} \;=\; \Delta^{k-1}\, k^{O(k)} \qquad (6)$$
Proof. For the lower bound: if $d_v \le k-1$, then $\frac{(d_v)^{k-1}}{(k-1)^{k-1}} \le 1$, so if $N_v \ge 1$ then $N_v \ge \frac{(d_v)^{k-1}}{(k-1)^{k-1}}$. If instead $d_v > k-1$, then $N_v \ge \binom{d_v}{k-1}$, since any set of nodes formed by $v$ and $k-1$ of its neighbors induces a connected subgraph. But then $N_v \ge \binom{d_v}{k-1} \ge \frac{(d_v)^{k-1}}{(k-1)^{k-1}}$, since $\binom{a}{b} \ge \frac{a^b}{b^b}$ for all integers $b \in [a]$.

For the upper bound, note that we can construct a connected subgraph on $k$ nodes containing $v$ by starting with $S_1 = \{v\}$ and, at every step $i = 1, \ldots, k-1$, adding to $S_i$ a neighbor of $S_i$ in $G \setminus S_i$. Since each $u \in G$ has degree at most $\Delta$, the set $S_i$ has at most $i\Delta$ neighbors. Thus the total number of choices is at most $\prod_{i=1}^{k-1} i\Delta = (k-1)!\, \Delta^{k-1}$. ∎
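Lemma 6 is easy to sanity-check numerically. The brute-force counter below is our code (exponential in $k$, only for small graphs): it computes $N_v$ so one can compare it against the bounds $(d_v)^{k-1}/(k-1)^{k-1}$ and $(k-1)!\,\Delta^{k-1}$.

```python
from itertools import combinations

def count_graphlets_at(adj, v, k):
    """N_v = number of connected induced k-node subgraphs of G containing v.
    Brute force over all k-subsets containing v; adj maps node -> neighbor set."""
    def connected(S):
        S = set(S)
        stack, seen = [v], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            stack.extend(adj[x] & S)
        return seen == S
    others = [u for u in adj if u != v]
    return sum(1 for T in combinations(others, k - 1) if connected(T + (v,)))
```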
B.1  Pseudocode

We give our uniform algorithm u-sampler(G) and the routines used by it.

Algorithm 3  u-sampler(G)
  (≺, {a_v}_{v∈V}) = preprocess(G)
  let a = Σ_{v∈V} a_v
  let p(v) = a_v / a for each v ∈ V
  let β_k(G) = 1 / (a · k! · (3k)^k)
  function sample( ):
      while true do:
          draw v from the distribution p
          S = sample-subgraph(G, v)
          p(S) = compute-p(G, S)
          with probability β_k(G) / (p(v) · p(S)): return S

Algorithm 4  preprocess(G)
  compute ≺ by repeatedly removing the highest-degree node from G
  for each v ∈ V do:
      sort the adjacency list of v according to ≺
      compute d(v|G(v))
      check if B(v) = ∅ via a BFS in G(v)
      if B(v) ≠ ∅ then let a_v = d(v|G(v))^{k−1} else let a_v = 0
  return the order ≺ and the estimates {a_v}_{v∈V}

Algorithm 5  sample-subgraph(G, v)
  S_1 = {v}
  for i = 1, ..., k−1 do:
      for u ∈ S_i do:
          φ_i(u) = d(u|G(v)) − (degree of u in G[S_i])   ▷ cut between u and G(v) \ S_i
      draw u with probability φ_i(u) / Σ_{z∈S_i} φ_i(z)
      draw u′ u.a.r. from the neighbors of u in G(v) \ S_i
      S_{i+1} = S_i ∪ {u′}
  return S_k

Algorithm 6  compute-p(G, S = {v, u_1, ..., u_{k−1}})
  p = 0
  for each permutation σ = (σ_1, ..., σ_k) of v, u_1, ..., u_{k−1} such that σ_1 = v do:
      p_σ = 1
      for each i = 1, ..., k−1 do:
          S_i = {σ_1, ..., σ_i}
          d(σ_{i+1} | S_i ∪ σ_{i+1}) = number of neighbors of σ_{i+1} in S_i
          φ_i = Σ_{u∈S_i} [d(u|G(v)) − (degree of u in G[S_i])]   ▷ cut between S_i and G(v) \ S_i
          p_σ = p_σ · d(σ_{i+1} | S_i ∪ σ_{i+1}) / φ_i
      p = p + p_σ
  return p

B.2  Proof of Theorem 1
We go through a series of lemmas.
Lemma 7. preprocess(G) runs in time $O(m \log \Delta)$. The output order $\prec$ satisfies $d(v | G(v)) \ge d(u | G(v))$ for all $v \prec u$. The output estimates $a_v > 0$ satisfy $\frac{a_v}{|B(v)|} \in [k^{-O(k)}, k^{O(k)}]$.

Proof. For the running time, the greedy removal can be implemented in time $O(m)$ by bucketing the nodes by degree. For each bucket we process each node in turn, updating its neighbors (moving them to the bucket ahead) if they have not been processed yet. This takes time $O(m)$ in total. Sorting the adjacency lists takes time $\sum_{v \in V} O(d_v \log d_v) = O(m \log \Delta)$. The BFS takes time $O(k^2 \log \Delta)$ since, for a generic node $u$, its adjacency list restricted to $G(v)$ is just the tail of the adjacency list of $u$ formed by the positions $\succeq v$, which can be found in time $O(\log \Delta)$. The claim on the $a_v$ follows by Lemma 6, since in $G(v)$ node $v$ has degree $d(v | G(v))$, and as noted above this is the maximum degree of $G(v)$ as well. ∎

Lemma 8. sample-subgraph(G, v) runs in time $O(k^3 \log \Delta)$. For any $g = G[S] \in B(v)$, the probability $p(S)$ that sample-subgraph(G, v) returns $S$ is in $[k^{-O(k)}, k^{O(k)}] \cdot d(v | G(v))^{-(k-1)}$.

Proof.
Running time. We show that one run of the outer cycle takes time $O(k^2 \log \Delta)$. Computing $\varphi_i(u)$ takes time $O(k \log \Delta)$. This holds since in $O(\log \Delta)$ we locate the position of $v$ in the adjacency list of $u$, which subtracted from $d_u$ yields $d(u | G(v))$; and for each $u' \in S_i$, in $O(\log \Delta)$ we check whether $u' \sim u$. Thus the cycle over $u \in S_i$ takes $O(k^2 \log \Delta)$ in total. Drawing $u$ takes $O(k)$. Finally, drawing $u'$ takes $O(k)$ as well. To see this, note that if $u$ had no neighbors in $S_i$ then we could just draw a node from the last $d(u | G(v))$ elements of the adjacency list of $u$. If $u$ has neighbors in $S_i$, we still know the (at most $k$) disjoint sublists of the adjacency list containing the neighbors in the cut. Thus we can draw a uniform integer $j \in [\varphi_i(u)]$ and select the $j$-th neighbor of $u$ in $G(v) \setminus S_i$ in time $O(k)$.

Probability. Consider any $S$ such that $g = G[S] \in B(v)$. By construction, $S$ is a $k$-node subset such that $v \in S$ and $G[S]$ is connected. We compute an upper bound and a lower bound on the probability $p(S)$ that the algorithm returns $S$. For the upper bound, there are at most $(k-1)!$ sequences through which the algorithm can build $S$. Let $v, u_1, \ldots, u_{k-1}$ be any such sequence, and consider the generic subset $S_i$. Let $\varphi_i(u)$ be the size of the cut between $u \in S_i$ and $G(v) \setminus S_i$, and let $\varphi_i = \sum_{u \in S_i} \varphi_i(u)$. By construction, $S_{i+1}$ is obtained by adding to $S_i$ the endpoint $u_{i+1}$ of an edge chosen uniformly at random in the cut between $S_i$ and $G(v) \setminus S_i$. Thus, for any $u' \in G(v) \setminus S_i$, $P(u_{i+1} = u') = \frac{d(u' | S_i \cup u')}{\varphi_i}$, where $d(u' | S_i \cup u')$ is the number of neighbors of $u'$ in $S_i$. Now, on the one hand, $d(u' | S_i \cup u') \le i$. On the other hand:

$$\varphi_i \;\ge\; \frac{d(v | G(v))}{2i} \qquad (7)$$

To see this, note that $\varphi_i \ge 1$ for all $i = 1, \ldots, k-1$, since $G[S_k]$ is connected. Also, $\varphi_1 = d(v | G(v))$ since $S_1 = \{v\}$. Now, if $\varphi_1 \le 2i$ then $\varphi_i \ge 1 \ge \frac{\varphi_1}{2i}$. If instead $\varphi_1 \ge 2i$, then, since the degree of $v$ in $S_i$ is at most $i - 1$, the cut of $S_i$ still contains at least $\varphi_1 - |S_i| + 1 \ge \varphi_1 - (i-1)$ edges. Therefore $\varphi_i \ge \varphi_1 - (i-1) \ge \varphi_1 - \frac{\varphi_1}{2} \ge \frac{\varphi_1}{2i}$. It follows that $P(u_{i+1} = u') = \frac{d(u' | S_i \cup u')}{\varphi_i} \le \frac{2i^2}{d(v | G(v))}$. Thus the probability that sample-subgraph(G, v) follows the particular sequence $v, u_1, \ldots, u_{k-1}$ is at most $\prod_{i=1}^{k-1} \frac{2i^2}{d(v | G(v))} = k^{O(k)}\, d(v | G(v))^{-(k-1)}$. Since there are at most $(k-1)!$ sequences building $S$, we get $p(S) = k^{O(k)}\, d(v | G(v))^{-(k-1)}$. For the lower bound, since $G[S]$ is connected, at least one sequence $v, u_1, \ldots, u_{k-1}$ exists such that $d(u_{i+1} | S_i \cup u_{i+1}) \ge 1$ for all $i = 1, \ldots, k-1$. However, $\varphi_i \le i\, d(v | G(v))$, since $d(v | G(v))$ is the maximum degree of $G(v)$. So the sequence has probability at least $\prod_{i=1}^{k-1} \frac{1}{i\, d(v | G(v))} = k^{-O(k)}\, d(v | G(v))^{-(k-1)}$. ∎

Lemma 9. compute-p(G, S = {v, u_1, ..., u_{k−1}}) runs in time $k^{O(k)}\, O(\log \Delta)$ and outputs the probability $p(S)$ that sample-subgraph(G, v) returns $S$.

Proof. The proof is essentially the same as that of Lemma 8. ∎
Lemma 10. sample( ) has expected running time $k^{O(k)}\, O(\log \Delta)$ and returns graphlets uniformly at random from $G$.

Proof. First, let us check that $\frac{\beta_k(G)}{p(v)\, p(S)} \le 1$ for all $v$ and all $S$, so that the acceptance probability in u-sampler(G) is well defined. We have:

$$\frac{\beta_k(G)}{p(v)\, p(S)} \;=\; \frac{1}{k!\,(3k)^k\, a} \cdot \frac{a}{a_v\, p(S)} \;=\; \frac{1}{k!\,(3k)^k\, a_v\, p(S)} \qquad (8)$$

By Lemma 8, $p(S) \ge \frac{1}{k!\,(3k)^k}\, d(v | G(v))^{-(k-1)} = \frac{1}{k!\,(3k)^k\, a_v}$, hence the expression is bounded by 1, as desired. Conditioned on drawing $S$, the probability that the algorithm stops is $\frac{\beta_k(G)}{p(v)\, p(S)}$. By Lemma 8, $p(S) = k^{-O(k)}\, \frac{1}{a_v}$, so $\frac{\beta_k(G)}{p(v)\, p(S)} = k^{-O(k)}$. Therefore each invocation of sample( ) terminates within $k^{O(k)}$ rounds in expectation. Finally, by Lemma 8 and Lemma 9 each round takes $k^{O(k)}\, O(\log \Delta)$, thus the expected total running time is $k^{O(k)}\, O(\log \Delta)$ too. The uniformity of the samples follows by construction: the algorithm draws $S$ from $B(v)$ with probability $p(v)\, p(S)$ and, conditioned on this event, outputs $S$ with probability $\frac{\beta_k(G)}{p(v)\, p(S)}$. Thus, for any given execution of the loop, the probability that $S$ is returned is $\beta_k(G)$, which is the same for every graphlet. ∎

C  Appendix for Section 2.1
C.1  Pseudocode

We give the pseudocode of our ε-uniform algorithm eps-u-sampler(G, ε); the routines invoked by the algorithm are provided below. Finally, we provide eps-u-sampler-full(G, ε), which we use in the analysis by coupling it with eps-u-sampler(G, ε).

Algorithm 7  eps-u-sampler(G, ε)
  let ε₁ = ε₂ = ε / (32 (k!)² (3k)^k)
  let η as in bucket-sketch(G, ε₁/2)
  (≺, {a_v}_{v∈V}) = bucket-sketch(G, ε₁/2)
  let a = Σ_{v∈V} a_v
  let p(v) = a_v / a for each v ∈ V
  let β_k(G) = (ε₁/2) / (a · k! · (3k)^k)
  function ε-sample( ):
      while true do:
          draw v from the distribution p
          S = eps-sample-subgraph(G, v, ε₂, η)
          p̂(S) = eps-compute-p(S, G, ε₂, η)
          with probability min(1, β_k(G) / (p(v) · p̂(S))): return S

Algorithm 8  bucket-sketch(G, ε)
  let η = (1/(3(k−1)²)) (ε/k)^{1/(k−1)} and h = Θ(η^{-2} log n)
  init s_v = d_v for all v ∈ V                          ▷ any-time upper bound on d(v|G(v))
  init ≺ = the order over V by nonincreasing s_v        ▷ s_v > s_u ⇒ v ≺ u
  for each v ∈ V in nonincreasing order of degree do    ▷ each v is processed only once
      sample h neighbors x_1, ..., x_h of v u.a.r.
      let X = Σ_{j=1}^h I{x_j ≻ v}
      if X ≥ ηh then
          let a_v = (d_v)^{k−1}
      else
          let a_v = 0 and s_v = 3η d_v
          move v to its position in ≺ according to s_v  ▷ s_v > s_u ⇒ v ≺ u
  for each v with d_v ≤ k²/η and a_v = 0 do
      check if B(v) ≠ ∅ via a BFS in G(v)
      if so then compute d(v|G(v)) and let a_v = d(v|G(v))^{k−1}
  return the order ≺ and the estimates {a_v}_{v∈V}

Algorithm 9  eps-sample-subgraph(G, v, ε, η)
  let h = Θ( (k⁸ / (ε² η⁴)) log(1/ε) )
  S_1 = {v}
  for i = 1, ..., k−1 do:
      for each u ∈ S_i do:
          sample h neighbors x_1, ..., x_h of u i.i.d. u.a.r.
          let X = Σ_{j=1}^h I{x_j ≻ v ∧ x_j ∉ S_i}
          if X ≥ (εη²/k⁴) h then let φ̂_i(u) = (d_u/h) X else let φ̂_i(u) = 0
      draw u with probability φ̂_i(u) / Σ_{z∈S_i} φ̂_i(z)   (if all φ̂_i(u) = 0 then fail)
      repeat: draw u′ u.a.r. from the adjacency list of u, until u′ ≻ v ∧ u′ ∉ S_i
      let S_{i+1} = S_i ∪ {u′}
  return S_k

Algorithm 10  eps-compute-p(G, S = {v, u_1, ..., u_{k−1}}, ε, η)
  for each u ∈ S do: sample h = Θ( (k⁸/(ε²η⁴)) log(1/ε) ) neighbors x_{u,1}, ..., x_{u,h} of u i.i.d. u.a.r.
  p̂ = 0
  for each permutation σ = (σ_1, ..., σ_k) of v, u_1, ..., u_{k−1} such that σ_1 = v do:
      p̂_σ = 1
      for each i = 1, ..., k−1 do:
          S_i = {σ_1, ..., σ_i}
          c_i = number of neighbors of σ_{i+1} in S_i
          φ̂_i = Σ_{u∈S_i} (d_u/h) Σ_{j=1}^h I{x_{u,j} ≻ v ∧ x_{u,j} ∉ S_i}   ▷ cut size estimate
          p̂_σ = p̂_σ · c_i / φ̂_i
      p̂ = p̂ + p̂_σ
  return p̂

Algorithm 11  eps-u-sampler-full(G, ε)
  let η as in bucket-sketch(G, ε)
  (≺, {a_v}_{v∈V}) = bucket-sketch(G, ε)
  let a = Σ_{v∈V} a_v
  let p(v) = a_v / a for each v ∈ V
  let β_k(G) = ε / (a · k! · (3k)^k)
  sort the adjacency lists of G according to ≺
  function sample( ):
      while true do:
          draw v from the distribution p
          S = sample-subgraph(G, v)
          p(S) = compute-p(G, S)
          with probability β_k(G)/(p(v) p(S)): return S
          else continue

C.2  Proofs for bucket-sketch(G, ε)

In what follows, $\eta = \frac{1}{3(k-1)^2}\big(\frac{\varepsilon}{k}\big)^{\frac{1}{k-1}}$, as defined in bucket-sketch.

Lemma 11.
With high probability the output of bucket-sketch(G, ε) satisfies:

1. if $v \prec u$, then $d_v \ge 3\eta\, d_u$

2. if $a_v > 0$, then $d(v | G(v)) \ge \frac{\eta}{k^2}\, d_v \ge \frac{\eta}{k^2}\, d(u | G(v))$ for all $u \succ v$

3. if $a_v > 0$, then $\frac{a_v}{|B(v)|} \in \big[\varepsilon\, k^{-O(k)},\; \frac{1}{\varepsilon}\, k^{O(k)}\big]$

4. $\sum_{v : a_v = 0} |B(v)| \le \varepsilon \sum_{v \in V} |B(v)|$
Proof. First, we prove a set of observations that are used repeatedly in the proof. We also need to establish some notation to describe the algorithm along the rounds. We denote by:

- $t = 1, \ldots, n$ the generic round of the first loop
- $\prec_t$ the order $\prec$ at the start of round $t$
- $G_t(v) = G[\{u \succeq_t v\}]$ the subgraph induced by $v$ and the nodes below it at time $t$
- $d(u | G_t(v))$ the degree of $u$ in $G_t(v)$
- $t_v$ the round where $v$ is processed
- $s^*_v$ the final value of $s_v$ (note that $3\eta\, d_v \le s^*_v \le d_v$)
- $X_j = I\{x_j \succ v\}$ and $X = \sum_{j=1}^h X_j$, in a generic round

We denote by $\prec_n$ the output order (formally it would be $\prec_{n+1}$, but clearly this equals $\prec_n$), and by $G_n(\cdot)$ the subgraphs induced in $G$ under the order $\prec_n$. By $a_v$ we always mean the value at output time, unless otherwise specified.

Observation 1. If $u \prec_{t_u} v$ then $G_n(v) \subseteq G_{t_u}(u)$ and $d(u | G_n(v)) \le d(u | G_{t_u}(u))$.

Proof. Note that no node preceding $u$ at time $t_u$ is ever moved past $v$ after round $t_u$. Therefore $G_n(v)$ contains a subset of the nodes $\{z \succeq_{t_u} u\}$. But by definition $G_{t_u}(u) = G[\{z \succeq_{t_u} u\}]$. Thus $G_n(v) \subseteq G_{t_u}(u)$. By monotonicity of $d(u | \cdot)$, it follows that $d(u | G_n(v)) \le d(u | G_{t_u}(u))$. ∎

Observation 2. For all $v$ and all $t \ge t_v$ we have $d(v | G_{t_v}(v)) \ge d(v | G_t(v))$, with equality if $a_v > 0$.

Proof. By definition, $G_{t_v}(v) = G[\{u \succeq_{t_v} v\}]$. Thus, every $u \prec_{t_v} v$ is not in $G_{t_v}(v)$. Any such $u$, however, has been processed before $v$, and cannot be moved past $v$ after round $t_v$. Hence $G_{t_v}(v) \supseteq G_t(v)$ for all $t \ge t_v$. The claim then follows by monotonicity of $d(v | \cdot)$, and by noting that if $a_v > 0$ then $v$ is never moved by the algorithm, and thus $G_t(v) = G_{t_v}(v)$ for all $t \ge t_v$. ∎

Observation 3. At each round, conditioned on past events, with high probability $|X - \mathbb{E} X| \le \frac{\eta h}{2}$.

Proof. Consider round $t_v$. Conditioned on past events, the $X_j$ are independent binary random variables. Therefore, by Hoeffding's inequality:

$$P\Big(|X - \mathbb{E} X| > \frac{h\eta}{2}\Big) \;<\; 2\, e^{-h\eta^2/2} \;=\; e^{-\Theta(\log n)} \qquad (9)$$

regardless of $\mathbb{E} X$, since $h = \Theta(\eta^{-2} \log n)$. ∎

Observation 4. With high probability, $d(v | G_t(v)) \le s^*_v$ for every $v$ at any time in any round $t$.

Proof. If $s_v$ is never updated, then by definition $s^*_v = s_v = d_v$, and obviously $d_v \ge d(v | G_t(v))$. Suppose instead that $s_v$ is updated at round $t_v$, so $s^*_v = 3\eta\, d_v$. By Observation 2, $d(v | G_t(v)) \le d(v | G_{t_v}(v))$. Now we show that with high probability $d(v | G_{t_v}(v)) \le 3\eta\, d_v$. Suppose indeed $d(v | G_{t_v}(v)) > 3\eta\, d_v$. Note that $\mathbb{E} X_j = \frac{d(v | G_{t_v}(v))}{d_v}$ for all $j$. Therefore, if $d(v | G_{t_v}(v)) > 3\eta\, d_v$, then $\mathbb{E} X > 3\eta h$. Now, the algorithm updates $s_v$ only if $X < \eta h$. This implies the event $X < \mathbb{E} X - 2\eta h$, which by Observation 3 fails with high probability. Thus with high probability $d(v | G_{t_v}(v)) \le 3\eta\, d_v$. ∎

Observation 5. If round $t_v$ of the first loop sets $a_v > 0$, then w.h.p. $d(v | G_{t_v}(v)) \ge \frac{\eta}{2}\, d_v$; if it sets $a_v = 0$, then w.h.p. $d(v | G_{t_v}(v)) \le 3\eta\, d_v$. If the second loop sets $a_v > 0$, then $d(v | G_{t_v}(v)) \ge \frac{\eta}{k^2}\, d_v$ deterministically.

Proof. The first claim has the same proof as Observation 4: if $d(v | G_{t_v}(v)) < \frac{\eta}{2}\, d_v$ then $\mathbb{E} X < \frac{\eta h}{2}$, so $a_v > 0$ implies $X \ge \eta h$ and thus $X > \mathbb{E} X + \frac{\eta h}{2}$. Similarly, if $d(v | G_{t_v}(v)) > 3\eta\, d_v$ then $\mathbb{E} X > 3\eta h$, so setting $a_v = 0$ implies $X < \eta h$, which means $X < \mathbb{E} X - 2\eta h$. Both events fail with high probability by Observation 3. The second claim holds deterministically since, in the second loop, $d_v \le k^2/\eta$, and if $a_v > 0$ is set then $B(v) \ne \emptyset$, which implies $d(v | G_{t_v}(v)) \ge 1 \ge \frac{\eta}{k^2}\, d_v$. ∎

Observation 6. With high probability, for all $v$ and all $u \succ_n v$ we have $d(u | G_n(v)) \le s^*_v$.

Proof. When $u$ is processed in round $t_u$, either $v \prec_{t_u} u$ or $u \prec_{t_u} v$. If $v \prec_{t_u} u$, then $v$ was processed before $u$. Thus at the beginning of the round $s^*_v \ge s_u$, since we assumed $v \prec_{t_u} u$. But at the beginning of the round $s_u = d_u$, since $s_u$ has not been updated yet. Thus $s^*_v \ge d_u$, and clearly $d_u \ge d(u | G_t(v))$ for all $t$, in particular for $t = n$. Therefore $s^*_v \ge d(u | G_n(v))$. Suppose instead $u \prec_{t_u} v$. Then by Observation 1, $d(u | G_n(v)) \le d(u | G_{t_u}(u))$, and by Observation 4, w.h.p. $d(u | G_{t_u}(u)) \le s^*_u$. But since by hypothesis $u \succ_n v$, we have $s^*_u \le s^*_v$. Therefore $d(u | G_n(v)) \le s^*_v$. ∎

We can finally prove Lemma 11 (which implies Lemma 2).

Proof of (1) (if $v \prec_n u$, then $d_v \ge 3\eta\, d_u$). Simply note that $d_v \ge s^*_v \ge s^*_u \ge 3\eta\, d_u$, where $s^*_v \ge s^*_u$ holds by construction of the order $\prec_n$.

Proof of (2) (if $a_v > 0$, then $d(v | G_n(v)) \ge \frac{\eta}{k^2}\, d_v \ge \frac{\eta}{k^2}\, d(u | G_n(v))$ for all $u \succ_n v$). Since $a_v > 0$, by Observation 2 $d(v | G_n(v)) = d(v | G_{t_v}(v))$, and by Observation 5 $d(v | G_{t_v}(v)) \ge \frac{\eta}{k^2}\, d_v$. Thus $d(v | G_n(v)) \ge \frac{\eta}{k^2}\, d_v$, and the first claim is proven. The second claim follows from the first one by proving that $d_v \ge d(u | G_n(v))$. If $d_v \ge d_u$, then obviously $d_v \ge d(u | G_n(v))$ as well. Suppose instead $d_v < d_u$. Then $u$ is processed before $v$ and $u \prec_{t_u} v$, so by Observations 1 and 4, w.h.p. $s^*_u \ge d(u | G_n(v))$; and since $u \succ_n v$ we have $d_v \ge s^*_v \ge s^*_u \ge d(u | G_n(v))$.

Proof of (3) (if $a_v > 0$ then $\frac{a_v}{|B(v)|} \in [\varepsilon\, k^{-O(k)}, \frac{1}{\varepsilon}\, k^{O(k)}]$). First, we observe that if $a_v > 0$ then $|B(v)| \ge 1$. Indeed, if $d_v \le k^2/\eta$ then $v$ is processed in the second loop, which deterministically sets $a_v > 0$ only if $|B(v)| \ge 1$. If instead $d_v > k^2/\eta$, then $v$ is processed only in the first loop, and:

$$d(v | G_n(v)) \;=\; d(v | G_{t_v}(v)) \;\ge\; \frac{\eta}{2}\, d_v \;\ge\; \frac{\eta}{2} \cdot \frac{k^2}{\eta} \;\ge\; k \qquad (10)$$

where, since $a_v > 0$, we used Observations 2 and 5. Since $d(v | G_n(v)) \ge k$, then $G_n(v)$ contains a $k$-star centered in $v$ and so $|B(v)| \ge 1$. To lighten the notation, define $d^*_v = d(v | G_n(v))$ and $\Delta^*_v = \max_{u \in G_n(v)} d(u | G_n(v))$. Lemma 6 applied to $G_n(v)$ yields:

$$k^{-O(k)}\, (d^*_v)^{k-1} \;\le\; |B(v)| \;\le\; k^{O(k)}\, (\Delta^*_v)^{k-1} \qquad (11)$$

We now show that w.h.p.:

$$\varepsilon\, k^{-O(k)}\, (\Delta^*_v)^{k-1} \;\le\; a_v \;\le\; \frac{1}{\varepsilon}\, k^{O(k)}\, (d^*_v)^{k-1} \qquad (12)$$

which implies our claim. For the upper bound, note that by construction $a_v \le (d_v)^{k-1}$, and that by point (2) of this lemma $d_v \le \frac{k^2}{\eta}\, d^*_v$. Substituting $\eta$ we obtain:

$$a_v \;\le\; (d_v)^{k-1} \;\le\; (d^*_v)^{k-1} \Big(\frac{k^2}{\eta}\Big)^{k-1} \;=\; \frac{1}{\varepsilon}\, k^{O(k)}\, (d^*_v)^{k-1} \qquad (13)$$

For the lower bound, note that since $a_v > 0$, we have $a_v \ge (d^*_v)^{k-1}$ regardless of where $a_v$ is set. But point (2) of this lemma gives $d^*_v = d(v | G_n(v)) \ge \frac{\eta}{k^2} \cdot d(u | G_n(v))$ for all $u \succ_n v$. Thus $\Delta^*_v = \max_{u \in G_n(v)} d(u | G_n(v)) \le \frac{k^2}{\eta}\, d^*_v$. Therefore:

$$(\Delta^*_v)^{k-1} \;\le\; \Big(\frac{k^2}{\eta}\, d^*_v\Big)^{k-1} \;=\; \frac{1}{\varepsilon}\, k^{O(k)}\, (d^*_v)^{k-1} \;\le\; \frac{1}{\varepsilon}\, k^{O(k)}\, a_v \qquad (14)$$

which completes the claim.

Proof of (4) ($\sum_{v : a_v = 0} |B(v)| \le \varepsilon \sum_{v \in V} |B(v)|$). Consider any $v$ such that $a_v = 0$. First, observe that the only nonzero terms in the left summation have $d_v \ge k^2/\eta$. Indeed, if $|B(v)| > 0$ and $d_v < k^2/\eta$, then the second loop of the algorithm would detect $|B(v)| > 0$ and set $a_v > 0$; thus $v$ would not appear in the summation. Now we prove the bound by charging every graphlet in $\cup_{v : a_v = 0} B(v)$ against the graphlets of $G$, showing that every $v$ in the left summation is responsible for at most an $\varepsilon$ fraction of the graphlets it contributes to the right summation. On the one hand, if $d_v \ge k^2/\eta$ then $d_v > k - 1$, so in $G$ there are at least $\binom{d_v}{k-1}$ stars formed by $v$ and $k-1$ of its neighbors. We charge a $1/k$-th of each star to $v$ (each star can be charged to at most $k$ nodes). Therefore $v$ contributes at least $\frac{1}{k}\binom{d_v}{k-1}$ to the right summation. On the other hand, by Observations 4 and 6, the maximum degree of $G_n(v)$ is at most $s^*_v$, which equals $3\eta\, d_v$ since we know $a_v = 0$ was set in the first loop. Lemma 6 then gives $|B(v)| \le ((k-1)\, s^*_v)^{k-1} = ((k-1)\, 3\eta)^{k-1}\, (d_v)^{k-1}$. Thus, $v$ contributes at most $((k-1)\, 3\eta)^{k-1}\, (d_v)^{k-1}$ to the left summation. By taking the ratio, using the fact that $\binom{d_v}{k-1} \ge \frac{(d_v)^{k-1}}{(k-1)^{k-1}}$, and substituting $\eta$, we obtain:

$$\frac{\sum_{v : a_v = 0} |B(v)|}{\sum_v |B(v)|} \;\le\; \frac{((k-1)\, 3\eta)^{k-1}\, (d_v)^{k-1}}{\frac{1}{k}\, \frac{(d_v)^{k-1}}{(k-1)^{k-1}}} \;=\; (3\eta)^{k-1}\, k\, (k-1)^{2(k-1)} \;=\; \varepsilon \qquad (15)$$

The proof is complete. ∎

Lemma 12. bucket-sketch(G, ε) runs in time $k^{O(k)}\, O\big((1/\varepsilon)^{\frac{2}{k-1}}\, n \log n\big)$.

Proof. The initialisation takes time $O(n \log n)$ because of the sorting. In the first loop, each iteration takes time $O(h) = O(\eta^{-2} \log n)$ for the sampling and the summation. In the second loop, each iteration takes time $O(k^3/\eta^2)$. To see this, note that the BFS visits the edges of at most $k$ nodes of $G_n(v)$. Now, if $u \in G_n(v)$ then $s^*_u \le s^*_v \le d_v \le k^2/\eta$. On the other hand, $s^*_u \ge 3\eta\, d_u$. It follows that $3\eta\, d_u \le k^2/\eta$, that is, $d_u \le \frac{k^2}{3\eta^2}$. Therefore the total cost is at most $O(k \cdot \frac{k^2}{\eta^2}) = O(k^3/\eta^2)$. The total running time is therefore dominated by $O(\eta^{-2}\, n \log n)$. Replacing $\eta = \frac{1}{3(k-1)^2}(\varepsilon/k)^{\frac{1}{k-1}}$ yields the bound of the lemma. ∎

C.3  Proofs for eps-sample-subgraph(G, v, ε, η)

Lemma 3.
Suppose after running bucket-sketch ( G, ε ) the high-probability claims of Lemma 2hold. If we run sample-subgraph ( G, v ) and eps - sample-subgraph ( G, v, ε, η ) with ε ≤ ε , thetwo output distributions are ε -close.Proof. Consider first the execution of bucket-sketch ( G, ε ). We let η = ( / ε ) k − k −O ( k ) bethe variable defined in there, and similarly η be the one defined in a hypothetical execution of bucket-sketch ( G, ε ). Since ε ≤ ε then obviously η ≤ η .Now we consider sample-subgraph ( G, v ) and eps - sample-subgraph ( G, v, ε, η ), on the same G and with the same preprocessing output ( ≺ , { a v } v ∈ V ) of bucket-sketch ( G, ε ). Let q ( · ) bethe distribution of the output of eps - sample-subgraph ( G, v, ε, η ) and p ( · ) the distribution of theoutput of sample-subgraph ( G, v ). In the same way we denote by q and p the distributions overthe nodes and edges used by the two algorithm (which one applies is clear from the context). Theproof has two parts. First we show that, at each step i , the distribution q of the edge chosen bythe algorithm is εk − -close to p . Then, by a coupling argument and a union bound we show p and q are ε -close over S k . To obtain ε -closeness, we show ε / -closeness conditioned on some event thathas probability 1 − poly( ε ). Instead of ε / we use just ε , and one can compensate by increasing h by constant factors.Let us start with the distribution of the edges. Suppose eps - sample-subgraph is processingnode u ∈ S i at line 4. We denote by ϕ i ( u ) the cut size of u , and by ϕ i = P u ∈ S i ϕ i ( u ) the total cutsize of S i (both cuts meant towards G ( v ) \ S i ). Recall from the proof of Lemma 8 that: ϕ i ≥ ϕ i (16)Let us go back to the algorithm and the generic round when u ∈ S i is processed. For each j let X j = I { x j ≻ v ∧ x j / ∈ S i } , so that X = P hj =1 X j . Clearly, E [ d u h X ] = ϕ i ( u ). By Hoeffding’sinequality, for any δ > P (cid:18)(cid:12)(cid:12)(cid:12) d u h X − E d u h X (cid:12)(cid:12)(cid:12) ≥ δϕ (cid:19) = P (cid:18) | X − E X | ≥ δϕ hd u (cid:19) ≤ (cid:18) − δ ϕ hd u (cid:19) (17)Now recall points (1) and (2) of Lemma 2, and note that ϕ = d ( v | G ( v )). The lemma then saysthat ϕ ≥ η k d v and d v ≥ η k d u for all u ∈ G ( v ). Therefore d u ≤ ( k / η ) ϕ , and since η ≤ η , also d u ≤ ( k / η ) ϕ . Now we set: δ = ε k (18)which gives:2 exp (cid:18) − δ ϕ hd u (cid:19) ≤ (cid:18) − ε / k ) ϕ h ( k / η ) ϕ (cid:19) ≤ (cid:18) − ε η h k (cid:19) (19)Since h = Θ (cid:0) k ε η log / ε (cid:1) , we obtain exp( − Θ(log / ε )) = ( / ε ) Θ(1) . Therefore (cid:12)(cid:12) d u h X − E d u h X (cid:12)(cid:12) < δϕ for all u ∈ S i with probability 1 − poly( ε ).Now, we want to bound the deviation of the estimate b ϕ i ( u ). If eps - sample-subgraph sets b ϕ i ( u ) = d u h X , then the concentration result above automatically implies | b ϕ i ( u ) − ϕ i ( u ) | ≤ δϕ .Suppose instead eps - sample-subgraph sets b ϕ i ( u ) = 0. If ϕ i ( u ) ≤ δϕ then obviously | b ϕ i ( u ) − i ( u ) | ≤ δϕ and we are again fine. Thus, the only remaining bad event is that b ϕ i ( u ) = 0 when ϕ i ( u ) > δϕ and. Recall from above that E X = h ϕ i ( u ) d u and d u ≤ ( k / η ) ϕ . We have: E X = h ϕ i ( u ) d u > h δϕ d u ≥ h δ ( η / k ) d u d u = h ε k η k = h εη k (20)Now b ϕ i ( u ) = 0 only if X < h εη k , i.e., X < E X − t with t = h εη k . By Hoeffding’s inequality: P (cid:18) X < E X − h εη k (cid:19) < exp − h εη k ) h ! 
= exp (cid:18) − h ε η k (cid:19) (21)Since h = Θ (cid:0) k ε η log / ε (cid:1) , the probability drops again to poly( ε ). Hence with probability 1 − poly( ε )we have | b ϕ i ( u ) − ϕ i ( u ) | ≤ δϕ for all u ∈ S i .Let us now analyse the distribution of the edges. Suppose the algorithm is going to expand S i .Let q ( { u, u ′ } ) be the probability that edge { u, u ′ } is selected. We show that q ( {· , ·} ) is ε / k -uniformover the cut between S i and G v \ S i . Let q ( u ) denote the probability that the algorithm selects u ∈ S i at line 8, and let q ( u ′ | u ) = q ( { u, u ′ }| u ) be the probability that u ′ is chosen at line 10 giventhat u is chosen at line 8. Note that q ( u ′ | u ) = ϕ i ( u ) , since once u is selected the algorithm keepsdrawing edges incident to u uniformly until one in the cut is found. Let C be any subset of the cutbetween S i and G v \ S i . Note that tvd( p, q ) ≤ max C ( q ( C ) − p ( C )), since the family of all possible C is exactly the space of events. Let C u be the subset of edges of C containing u . Then: q ( C ) − p ( C ) = X u ∈ S i ( q ( C u ) − p ( C u )) (22)= X u ∈ S i (cid:16) X { u,u ′ }∈ C u q ( u ) q ( { u, u ′ }| u ) − X { u,u ′ }∈ C u p ( u ) p ( { u, u ′ }| u ) (cid:17) (23)= X u ∈ S i (cid:16) X { u,u ′ }∈ C u q ( u ) 1 ϕ i ( u ) − X { u,u ′ }∈ C u p ( u ) 1 ϕ i ( u ) (cid:17) (24)= X u ∈ S i ϕ i ( u ) X { u,u ′ }∈ C u ( q ( u ) − p ( u )) (25)= X u ∈ S i | C u | ϕ i ( u ) ( q ( u ) − p ( u )) (26)= X u ∈ S i | C u | ϕ i ( u ) (cid:16) b ϕ i ( u ) P z ∈ S i b ϕ i ( z ) − ϕ i ( u ) P z ∈ S i ϕ i ( z ) (cid:17) (27) ≤ X u ∈ S i (cid:16) b ϕ i ( u ) P z ∈ S i b ϕ i ( z ) − ϕ i ( u ) P z ∈ S i ϕ i ( z ) (cid:17) (28)Now recall that | b ϕ i ( u ) − ϕ i ( u ) | ≤ δϕ for all u ∈ S i . Therefore P z ∈ S i b ϕ i ( z ) ≥ ( P z ∈ S i ϕ i ( z )) − kδϕ ≥ − kδ ) P z ∈ S i ϕ i ( z ). Moreover, kδ ≤ thus − kδ ≤
2. This gives: q ( C ) − p ( C ) ≤ X u ∈ S i (cid:16) ϕ i ( u ) + δϕ (1 − kδ ) P z ∈ S i ϕ i ( z ) − ϕ i ( u ) P z ∈ S i ϕ i ( z ) (cid:17) (29) ≤ P z ∈ S i ϕ i ( z ) X u ∈ S i (cid:16) (1 + 2 kδ ) (cid:0) ϕ i ( u ) + δϕ ) − ϕ i ( u ) (cid:17) (30)= 1 P z ∈ S i ϕ i ( z ) X u ∈ S i (cid:16) kδϕ i ( u ) + δ (1 + 2 kδ ) ϕ (cid:17) (31)= 2 kδ P u ∈ S i ϕ i ( u ) P z ∈ S i ϕ i ( z ) + δ (1 + 2 kδ ) ik ϕ P z ∈ S i ϕ i ( z ) (32) ≤ kδ (33) ≤ εk (34)Therefore for each i with probability 1 − poly( ε ) the distributions of the two algorithms for thechoice of the edge in the cut of S i satisfy tvd( q, p ) ≤ εk . By increasing h by a O (log k ) factor wecan take a union bound on all i so that with probability 1 − poly( ε ) the claim holds simultaneouslyfor all i .We can now show that the distributions of the two algorithms over their outputs satisfytvd( p, q ) ≤ ε . To this end we couple the two processes. Let S i and R i the random sets heldrespectively by eps - sample-subgraph ( G, v, ε, η ) and sample-subgraph ( G, v ) at step i . For ev-ery i = 1 , . . . , k − e S i and e R i be the edges selected by the two processes at step i , so that S i +1 = S i ∪ e S i and R i +1 = R i ∪ e R i . Clearly, at the beginning S = R = { v } . Now suppose S i = R i for some i ≥
1. Then, as shown above, the distribution of S i +1 and R i +1 are εk -close. Thusthere exists a coupling of the two processes such that P ( e S i = e R i ) ≤ εk for all i . This implies that P ( e S k = e R k ) ≤ P k − i =1 P ( e S i = e R i ) ≤ ε . Thus the output distributions satisfy tvd( q, p ) ≤ ε . Lemma 12. eps - sample-subgraph ( G, v, ε, η ) has expected running time k O (1) O (cid:0) ( ε η ) log / ε (cid:1) .Proof. At the beginning, the algorithm deterministically takes h samples for ≤ k times, spendingin total time O ( h poly( k )). We show this dominates the expected time of the trials at lines 9–11as well. Let T be the number of trials (the number of times lines 9–11 are executed). If u is thenode chosen at line 8, then the algorithm returns after d u ϕ i ( u ) trials in expectation. Clearly if u hasnonzero probability of being chosen then b ϕ i ( u ) >
0. Taking the expectation over the b ϕ i ( u ) yields: E T = X u ∈ S i P ( u chosen) E [ T | u chosen] ≤ X u ∈ S i P ( b ϕ i ( u ) > d u ϕ i ( u ) (35)Now, E X = h ϕ i ( u ) d u and b ϕ i ( u ) > X > h εη k . By Markov’s inequality: P ( X > < h ϕ i ( u ) d u h εη k = ϕ i ( u ) d u k εη (36)Therefore: P ( b ϕ i ( u ) > d u ϕ i ( u ) ≤ k εη = O ( h ) (37)Summing over all terms in E T and over all rounds shows that the expected number of trials overthe whole algorithm is O ( h poly( k )). Finally, note that each single trial takes time O ( k ); just checkif u ′ satisfies u ′ ≻ v and has no neighbors in S i . Substituting h = Θ (cid:0) k log / ε ε η (cid:1) yields the bound.23 .4 Proofs for eps-compute-p ( G, S, ε, η ) Lemma 13. eps - compute-p ( G, S, ε, η ) runs in deterministic time k O (1) O ( ε η log / ε ) .Proof. The running time is dominated by line 9, where we sum over at most kh terms. This isrepeated a total of at most k ! times (for each permutation on k − k − h = Θ( k ε η log / ε ) yields the bound. Lemma 4.
Suppose after running bucket-sketch ( G, ε ) the high-probability claims of Lemma 2hold. Consider any k -node set S reachable from v in G ( v ) . Then, for any ε ≤ ε , with probability − poly( ε ) the output b p ( S ) of eps - compute-p ( G, S, ε, η ) and the output p ( S ) of compute-p ( G, S )satisfy b p ( S ) ∈ (1 ± ε ) p ( S ) where η is the value bucket-sketch ( G, ε ) would use.Proof.
We compare eps - compute-p ( G, S, ε, η ) against compute-p ( G, S ) (Algorithm 6). We let η = ( / ε ) k − k −O ( k ) be the value used by bucket-sketch ( G, ε ). Clearly η ≤ η . Consider anysingle iteration at line 4. Let ϕ i denote the size of the cut between S i and G v \ S i , and let p σ bethe value computed by compute-p ( S, G ). This is the value b p σ would take if we had b ϕ i = ϕ i for all i . We now show that b ϕ i ∈ (1 ± ε k ) ϕ i for all i with probability 1 − poly( ε ). If this is the case, then: b p σ p σ = Q k − i =1 c i / b ϕ i Q k − i =1 c i /ϕ i = Q k − i =1 ϕ i Q k − i =1 b ϕ i = k − Y i =1 ϕ i b ϕ i ∈ (cid:16) ± ε k (cid:17) k − (38)This implies that b p σ ∈ (1 ± ε ) p σ . Indeed, on one side, (1 − ε k ) k − ≥ − ε k ( k − ≥ − ε . On theother side, for all r > < x < k − we have (1 − x ) r ≤ rx − ( r − x ; plugging in r = k − x = ε k yields (1 + ε k ) k − ≤ ε .Thus, we show that w.h.p. b ϕ i ∈ (1 ± ε k ) ϕ i . Consider again any single execution of line 4 andlet X u,j = I { x u,j ≻ v ∧ x u,j / ∈ S i } . Clearly E [ X u,j ] = ϕ i ( u ) d u , so E b ϕ i = ϕ i . Now let X = hd v b ϕ i . Recallfrom the proof of Lemma 3 that ϕ i ≥ ϕ k − and d u ≤ ( k / η ) ϕ ≤ ( k / η ) ϕ . This yields: E X = hd v ϕ i ≥ hd v ϕ k − > hd v η k d v = hη k (39)Therefore: P (cid:16) | b ϕ i − E b ϕ i | > ε k E b ϕ i (cid:17) = P (cid:16) | X − E X | > ε k E X (cid:17) < P (cid:16) | X − E X | > εhη k (cid:17) (40)By Hoeffding’s inequality: P (cid:16) | X − E X | > εhη k (cid:17) ≤ exp (cid:16) − εhη k ) h (cid:17) = exp (cid:16) − h ε η k (cid:17) (41)Since the algorithm uses h = Θ( k ε η log / ε ), the probability drops below − k / poly( ε ) . The algorithmuses the same random variables to estimate the cuts of at most 2 k subsets; a union bound on thosesubsets completes the proof. 24 .5 Proofs for eps-u-sampler ( G, ε ) Lemma 14.
The preprocessing phase of eps - u-sampler ( G, ε ) runs deterministically in time k O ( k ) O (cid:0) ( / ε ) n log n (cid:1) .Proof. The runtime of the preprocessing is dominated by bucket-sketch ( G, ε / ) and the boundfollows from Lemma 2. Lemma 15. In eps - u-sampler ( G, ε ) , suppose after running bucket-sketch ( G, ε / ) the high-probability claims of Lemma 2 hold. Then each invocation of ε - sample ( ) returns a graphlet inde-pendently and ε -uniformly at random from G .Proof. We will couple eps - u-sampler ( G, ε ) and eps - u-sampler - full ( G, ε ′ / ). With these inputs,the preprocessing phases of the two algorithms are identical (except that eps - u-sampler - full also sorts the adjacency lists), thus we can couple them to behave identically. In particular weassume they obtain the same order ≺ , the same bucket distribution p : V → [0 , β k ( G ). Now let H = v ∈ V : a v > H for heavy buckets). If the high-probability claims of Lemma 2 hold, then H contains a fraction (1 − ε / ) of all graphlets of G . ByLemma 17, sample of eps - u-sampler - full returns graphlets uniformly at random from H . Thusto prove the ε -uniformity of ε - sample , we show the distribution of its returned graphlets is ε / -close to that of sample in eps - u-sampler - full . By the triangle inequality this implies the claim.Formally, if U V and U H are the uniform distributions respectively over ∪ v ∈ V B ( v ) and P v ∈ H B ( v ),and q is the distribution given by ε - sample , then:tvd( q, U V ) ≤ tvd( q, U H ) + tvd( U H , U V ) ≤ ε / + ε / (42)where the bound on the first term is the one we will prove, and the bound on the second termfollows as said from Lemma 2 and Lemma 17.Let us then consider one single iteration of sample and ε - sample . We couple the two algorithmsalong the way. First, since the bucket distribution p is identical, we can couple them so they choosethe same bucket B ( v ) (lines 10 of sample and 10 of ε - sample ). From now on we consider v asfixed and all probability distributions are meant as conditioned on v being the chosen node. Wedenote by S P the random set of nodes drawn by sample at line 11, and by S Q the one drawn by ε - sample at line 11. Now, the two algorithms invoke respectively sample-subgraph ( G, v ) and eps - sample-subgraph ( G, v, ε , η ). Since ε ≤ ε , by Lemma 3 we have tvd( S P , S Q ) ≤ ε . Hencewe can couple the two algorithms so that P ( S Q = S P ) ≤ ε . Now let X P be the indicator randomvariable of the event that sample accepts S P and returns is (see line 13) and by X Q the indicatorrandom variable of the event that ε - sample accepts S Q and returns it (see line 13). We definethe outcome of sample as the pair ( S P , X P ), and that of ε - sample as the pair ( S Q , X Q ). Let D P and D Q be the distributions of ( S P , X P ) and ( S Q , X Q ) respectively. We want to show that the twodistributions are ε / -close conditioned on the graphlets being accepted, that is, we want:tvd( D P ( ·| X P = 1) , D Q ( ·| X Q = 1)) ≤ ε (43)Ignoring for a moment the k O ( − k ) factors, the strategy is the following. The probability that thealgorithms accept the sampled graphlet is Ω( ε ). Therefore, to make the conditional distributions O ( ε )-close, we can make the unconditional distributions O ( ε )-close. Obviously we have to take careof the fact that the two algorithms can disagree on both the sampled graphlet and its acceptance.Before continuing let X ∨ = max( X P , X Q ). This indicates the event that at least one of thealgorithms accepted its graphlet. 
Clearly P ( X ∨ = 1) ≥ P ( X P = 1). By Lemma 17, since eps - u-sampler - full is invoked with ε / , at any given iteration we have P ( X P = 1) ≥ ε k −O ( k ) . Now, by25he triangle inequality:tvd( D P ( ·| X P = 1) , D Q ( ·| X Q = 1)) ≤ tvd( D P ( ·| X P = 1) , D P ( ·| X ∨ = 1)) (44)+tvd( D P ( ·| X ∨ = 1) , D Q ( ·| X ∨ = 1))+tvd( D Q ( ·| X Q = 1) , D Q ( ·| X ∨ = 1))By the coupling, the middle term satisfies:tvd( D P ( ·| X ∨ = 1) , D Q ( ·| X ∨ = 1)) ≤ P ( S Q = S P | X ∨ = 1) ≤ P ( S Q = S P ) P ( X ∨ = 1) ≤ ε ε k O ( k ) (45)where we used the bounds on P ( S Q = S P ) and P ( X ∨ = 1) from above.We bound similarly the sum of the other two terms. For the first term note that:tvd( D P ( ·| X P = 1) , D P ( ·| X ∨ = 1)) ≤ P ( X P = 0 | X ∨ = 1) (46)This is true since D P ( ·| X P = 1) is just D P ( ·| X ∨ = 1) conditioned on X P = 1, an event which hasprobability 1 − P ( X P = 0 | X ∨ = 1). Symmetrically, for the last term tvd( D Q ( ·| X Q = 1) , D Q ( ·| X ∨ =1)) ≤ P ( X Q = 0 | X ∨ = 1). The sum of the two terms is therefore bounded by: P ( X P = 0 | X ∨ = 1) + P ( X Q = 0 | X ∨ = 1) = P ( X P = X Q | X ∨ = 1) ≤ P ( X P = X Q ) P ( X ∨ = 1) (47)The denominator is at least ε k −O ( k ) , see above. For the numerator, P ( X Q = X P ) ≤ P ( S Q = S P ) + P ( X Q = X P | S Q = S P ) (48)As said, P ( S Q = S P ) ≤ ε . For the second term, we again couple X Q and X P so that: P ( X Q = X P | S Q = S P ) = (cid:12)(cid:12) P ( X P = 1 | S Q = S P ) − P ( X Q = 1 | S Q = S P ) (cid:12)(cid:12) (49) ≤ (cid:12)(cid:12)(cid:12)(cid:12) − P ( X Q = 1 | S Q = S P ) P ( X P = 1 | S Q = S P ) (cid:12)(cid:12)(cid:12)(cid:12) (50)Now, eps - u-sampler ( G, ε ) and eps - u-sampler - full ( G, ε ′ / ) by construction have respectively P ( X P = 1 | S Q = S P ) = β k ( G ) p ( v ) p ( S ) and P ( X Q = 1 | S Q = S P ) = min (cid:0) , β k ( G ) p ( v ) b p ( S ) (cid:1) . Note that the mincan only bring the ratio above closer to 1, since P ( X P = 1 | S Q = S P ) ≤
1. Therefore: P ( X Q = X P | S Q = S P ) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − β k ( G ) p ( v ) p ( S ) min (cid:0) , β k ( G ) p ( v ) b p ( S ) (cid:1) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12) − p ( S ) b p ( S ) (cid:12)(cid:12)(cid:12) (51)Now, by Lemma 4, with probability 1 − poly( ε ) we have b p ( S ) ∈ [1 ± ε ] p ( S ). Conditioning on thisevent, we have | − p ( S ) b p ( S ) | ≤ ε − ε = O ( ε ). Otherwise, P ( X Q = X P | S Q = S P ) ≤
1. Thus: P ( X Q = X P | S Q = S P ) ≤ (1 − poly( ε )) O ( ε ) + poly( ε ) = O ( ε ) (52)which drops below ε by adjusting the constants. Applying these two bounds to the right-handside of (48), we obtain: P ( X Q = X P ) ≤ ε + ε (53)Therefore the sum of the first and third term of (44) is at most: ε ε k O ( k ) + ( ε + ε ) 1 ε k O ( k ) = ε + ε ε k O ( k ) (54)Since eps - u-sampler ( G, ε ) defines ε = ε = ε k −O ( k ) , the bound drops below ε / , as desired. Thisconcludes the proof. 26 emma 16. In eps - u-sampler ( G, ε ) , suppose after running bucket-sketch ( G, ε / ) the high-probability claims of Lemma 2 hold. Then every invocation of ε - sample ( ) has expected runningtime k O ( k ) O (cid:16) ( / ε ) k − log / ε (cid:17) .Proof. As shown above, P ( X Q = X P ) ≤ ε +2 ε which implies P ( X Q = 1) ≥ P ( X P = 1) − ( ε +2 ε ).Recall however that P ( X P = 1) = ε k −O ( k ) , while ε = ε = ε k −O ( k ) . Thus, scaling ε = ε by k −O ( k ) factors if needed, we can ensure that P ( X Q = 1) ≥ P ( X P = 1). The expected number ofrounds is then at most k O ( k ) ( / ε ) . Each round is dominated by eps - sample-subgraph ( G, v, ε , η )and eps - compute-p ( S, G, ε , η ), which by Lemma 3 and Lemma 4 have expected running time k O (1) O ( ε η log / ε ). Note that these bounds holds even conditioned on past events. Recall that ε = ε k −O ( k ) and η = ε k − k −O ( k ) , since this is the η defined by bucket-sketch ( G, ε / ). Multiplyingthe expected rounds bound by the per-round time, the total expected running time is at most k O ( k ) ( / ε ) · k O ( k ) O (cid:16) ( / ε ) k − log / ε (cid:17) = k O ( k ) O (cid:16) ( / ε ) k − log / ε (cid:17) (55)This concludes the proof. C.6 Proofs for eps-u-sampler-full ( G, ε ) Lemma 17. In eps - u-sampler - full ( G, ε ) , suppose after running bucket-sketch ( G, ε ) thehigh-probability claims of Lemma 2 hold. Then the following holds. First, at any given itera-tion sample ( ) accepts the sampled graphlet with probability ε k O ( − k ) . Second, sample ( ) returnsgraphlets uniformly at random from ∪ a v > B ( v ) .Proof. First, note that sample ( ) is the same of u-sampler . Therefore Lemma 10 implies the claimon the distribution of the returned graphlets, provided that the rejection probability is well-defined,that is, β k ( G ) p ( v ) p ( S ) ≤ p ( v ) > , p ( S ) >
0. By substituting β k ( G ) and p ( v ): β k ( G ) p ( v ) p ( S ) = ε k O ( − k ) a v p ( S ) ≤ ε k O ( − k ) d ( v | G ( v )) − ( k − p ( S ) (56)where we used the fact that when p ( v ) >
0, i.e. a v >
0, then a v ≥ d ( v | G ( v )) k − by constructionof bucket-sketch ( G, ε ). Thus we need p ( S ) ≥ ε k O ( − k ) d ( v | G ( v )) − ( k − . To this end we adaptthe lower bound on p ( S ) of Lemma 8. In particular, since G [ S ] is connected then at least onesequence v, u , . . . , u k exists such that ϕ i ≥ i = 1 , . . . , k −
1. Now, by Lemma 2, all u ≻ v satisfy d ( u | G ( v )) ≥ kη · d ( v | G ( v )), i.e., the degree of G ( v ) is bounded by kη d ( v | G ( v )). It follows that ϕ i ≤ i kη d ( v | G ( v )) for all i , so as desired the sequence probability is at least k − Y i =1 i ( k / η ) k − d ( v | G ( v )) − = εk O ( − k ) d ( v | G ( v )) − ( k − (57)Now we bound the acceptance probability from below. To this end recall the upper boundon p ( S ) of Lemma 8, which gives p ( S ) ≤ k O ( k ) d ( v | G ( v )) − ( k − regardless of the degrees of theother nodes in G ( v ). Now, since a v > d ( v | G ( v )) k − ≥ εk O ( − k ) ( d v ) k − , so p ( S ) ≤ k O ( k ) 1 ε d − ( k − v . On the other hand a v ≤ d ( v ) k − by construction of bucket-sketch ( G, ε ).Therefore: β k ( G ) p ( v ) p ( S ) = ε k O ( − k ) a v p ( S ) ≥ εk O ( − k ) d ( v ) k − k O ( k ) 1 ε d − ( k − v = ε k O ( − k ) (58)as claimed. 27 Proofs of the MCMC results
In this section we prove the results of Section 3: Theorem 3, Lemma 5, Lemma 1 (Appendix D.1)and Theorem 2 (Appendix D.4). We employ standard Markov chain results reported in Appendix E.
D.1 Proof of Theorem 3, Lemma 5, Lemma 1
As anticipated, Theorem 3 derives from Lemma 5. The derivation is as follows. First, a recursiveapplication of Lemma 5 yields τ ( G k ) = k O ( k ) O τ ( G ) (cid:18) ∆ δ (cid:19) k − ! (59)Now recall (93) from Appendix E. On the one hand we have τ ( G ) = O ( t ( G )). On the other hand,we have t ε ( G k ) = O ( τ ( G k ) log( επ ∗ ) where π ∗ = min g ∈V k π ( g ) is the smallest mass of any k -graphletin G k in the stationary distribution π . Therefore, t ε ( G k ) = k O ( k ) (cid:18) t ( G ) (cid:16) ∆ δ (cid:17) k − log 1 επ ∗ (cid:19) (60)Now π ( g ) = d g P g ′∈V k d g ′ where d g is the degree of g in G k . Since |V k | ≤ (cid:0) nk (cid:1) and each g ∈ G k hasdegree at most poly( n ), then π ∗ = Ω (cid:0) n ) (cid:1) and log επ ∗ = O (log nε ). The bound of Theorem 3follows.In the rest of the section we prove Lemma 5, that is, τ ( G k ) = O (cid:18) poly( k ) · ∆ δ · τ ( G k − ) (cid:19) (61)We denote a generic node of G k as g , and a generic node of G k − as u, v , or z (no confusion withthe nodes of G should arise). We always denote by d u the degree of u in the original simple graph(without self-loops or weights), and the same for d g . We let L = L ( G k − ) denote the line graph of G k − . The node set of L is V ( L ) = { x uv : { u, v } ∈ E ( G k − ), and its edges are all the pairs in theform { x uv , x uz } with u = z , representing edges { u, v } and { u, z } adjacent in G k − ). The lazy walkon L is defined by the following weighting: w ( x uv , x uz ) = 1 u = z (62) w ( x uv , x uv ) = d u + d v − d u is as usual the degree of u in G k − . See Figure 1 below. u zv d u d v d z G k − x uv x uz d u + d v − d u + d z − L ( G k − ) Figure 1: A path of 3 graphlets in G k − , and how it appears in L ( G k − ). Both graphs are weightedso to yield a lazy random walk. 28 .2 From G k − to L ( G k − ) The first step in the proof of Lemma 5 is proving Lemma 1, here recalled:
Lemma 1 (reformulated with G = G k ) . If |E k | ≥ , then: τ ( L ( G k − )) ≤
20 ∆ δ τ ( G k − ) (64) where τ ( · ) is the relaxation time of the lazy walk. To prove the result we build an auxiliary weighted graph S ′ and show that τ ( S ′ ) ≤ δ τ ( G k − )and τ ( L ( G k − )) ≤ τ ( S ′ ). To begin, let S be the 1-subdivision of G k − . This is the graph obtainedby replacing each edge { u, v } ∈ G k − with two consecutive edges { u, x uv } , { x uv , v } where x uv is anew node representing { u, v } . We make S lazy by adding loops and assigning the following weights: w S ( u, u ) = d u u ∈ G k − (65) w S ( u, x uv ) = 1 { u, v } ∈ G k − (66) w S ( x uv , x uv ) = 2 { u, v } ∈ G k − (67)The graph S ′ is the same as S but with the following weights: w S ′ ( u, u ) = d u u ∈ G k − (68) w S ′ ( u, x uv ) = d u { u, v } ∈ G k − (69) w S ′ ( x uv , x uv ) = d u + d v { u, v } ∈ G k − (70)The reader may refer to Figure 2 below. u vd u d v G k − u x uv vd u d v
21 1 S u x uv vd u d v d u + d v d u d v S ′ Figure 2:
Left: a pair of ( k − u, v forming an edge in G k − . Middle: how { u, v } appears in S ,the 1-subdivision of G . Right: the reweighting given by S ′ . We now show:
Lemma 18. τ ( S ′ ) ≤ δ τ ( G k − ) .Proof. First, note that min xy ∈S w S ( x,y ) w S′ ( x,y ) ≥ / ∆ and max x ∈S w S ( x ) w S′ ( x ) ≤ / δ . By Lemma 26 this implies γ ( S ′ ) ≥ δ ∆ γ ( S ), or equivalently τ ( S ′ ) ≤ ∆ δ τ ( S ). Thus, we need only to show that τ ( S ) ≤ τ ( G k − ),or equivalently, γ ( G k − ) ≤ γ ( S ). We do so by comparing the numerators and denominators of (95)in Lemma 25 for S and G k − .In the remainder of the proof we write G for G k − and uv for x uv . Consider the walk on S andlet π S be its stationary distribution. Let f S ∈ R V ( S ) be the choice of f that attains the minimumin (95) under π = π S . We will show that there exists f G ∈ R V ( G ) such that: E P G ,π G ( f G )Var π G ( f G ) ≤ E P S ,π S ( f S )Var π S ( f S ) (71)29y Lemma 25 this implies our claim, since the left-hand side of (71) bounds γ ( G ) from above andthe right-hand side equals 4 γ ( S ). Now, first, note that for all u ∈ G we have π S ( u ) = π G ( u ) forall u ∈ G (the weight of u is the same in G and S , but the total sum of weights in S is / that of G ). Similar calculations show that for all uv ∈ G we have π S ( uv ) = d u π G ( u ), where d u continuesto denote the original degree of u in G . Third, observe that since f S attains the minimum in (95)then necessarily f S ( uv ) = f S ( u )+ f S ( v )2 for all uv ∈ G . We define f G as the restriction of f S on V ( G ).Now we relate the numerator of (71) for S and for G . To begin, note that: E P S ,π S ( f S ) = X uv ∈G (cid:16) ( f S ( u ) − f S ( uv )) Q S ( u, uv ) + ( f S ( v ) − f S ( uv )) Q S ( u, uv ) (cid:17) (72)Observe that Q S ( u, uv ) = Q S ( v, uv ) = π S ( u ) d u , and as noted above, f S ( uv ) = f S ( u )+ f S ( v )2 , thus( f S ( u ) − f S ( uv )) = ( f S ( v ) − f S ( uv )) = ( f S ( u ) − f S ( v )). Recalling that π S ( u ) = π G ( u ), we have: E P S ,π S ( f S ) = 12 X uv ∈G (cid:16) f S ( u ) − f S ( v ) (cid:17) π S ( u ) 12 d u = 13 X uv ∈G (cid:16) f S ( u ) − f S ( v ) (cid:17) π G ( u ) 12 d u (73)On the other hand, since by construction f G ( u ) = f S ( u ) and since Q G ( u, v ) = π G ( u ) d u : E P G ,π G ( f G ) = X uv ∈G (cid:16) f G ( u ) − f G ( v ) (cid:17) Q G ( u, v ) = X uv ∈G (cid:16) f S ( u ) − f S ( v ) (cid:17) π G ( u ) 12 d u (74)Comparing (73) and (74) shows that E P G ,π G ( f G ) = 3 E P S ,π S ( f S ).We turn to the denominator of (71). First, we have:Var π S ( f S ) = X u ∈G π S ( u ) f S ( u ) + X uv ∈G π S ( uv ) f S ( uv ) (75)Since π S ( u ) = π G ( u ) and f S ( u ) = f G ( u ), the first term equals Var π G ( f G ). We now show thatthe second term equals Var π G ( f G ) as well. By convexity of the square, we have: X uv ∈G π S ( uv ) f S ( uv ) = X uv ∈G π S ( uv ) (cid:16) f S ( u ) + f S ( v )2 (cid:17) ≤ X uv ∈G π S ( uv ) 12 (cid:0) f S ( u ) + f S ( v ) (cid:1) (76)Each summand in the right-hand side charges π S ( uv ) f S ( u ) to u . Therefore, X uv ∈G π S ( uv ) f S ( uv ) = X u ∈G π S ( uv ) d u f S ( u ) = 23 X u ∈G π G ( u ) f S ( u ) (77)where the last equality comes from π S ( uv ) = d u π G ( u ). Since f S ( u ) = f S ( u ), the last term equalsagain Var π G ( f G ). Therefore Var π G ( f G ) ≥ Var π S ( f S ). By combining our two bounds, we obtain: E P G ,π G ( f G )Var π G ( f G ) ≤ E P S ,π S ( f S ) Var π S ( f S ) = 4 E P S ,π S ( f S )Var π S ( f S ) (78)which shows that γ ( G ) ≤ γ ( S ), completing the proof.We conclude by relating the relaxation time of S ′ to that of L ( G k − ). Formally: Lemma 19. 
If | V ( G k ) | > then τ ( L ( G k − )) ≤ τ ( S ′ ) . roof. Let X = { X t } t ≥ be the walk on S ′ , and let Y = X [ A ] be the chain induced by X on thesubset of states A = { x uv : { u, v } ∈ E ( G k − ) } (Definition 5). By Lemma 28, τ ( Y ) ≤ τ ( S ′ ). Nowwe want to prove that τ ( L ( G k − )) ≤ τ ( Y ). To this end, we show that Y is exactly the randomwalk on L if we just alter the weights of L as follows: w ′ L ( x uv , x uz ) = 1 u = z (79) w ′ L ( x uv , x uv ) = d u + d v + 2 (80)To see this, let us compute the transition probabilities of Y from the generic state x uv of S ′ . Thereader may refer to Figure 3 below. First, if Y t = x uv then we can assume X s = x uv for some s = s ( t ). From x uv , the two possible transitions are Y t +1 = x uv and Y t +1 = x uz for some z = v .The first transition, Y t +1 = x uv , requires exactly one of these three disjoint events occurs:1. X s +1 = x uv X s +1 = . . . = X s ′ − = u and X s ′ = x uv for some s ′ ≥ s + 23. the same as (2) but with v in place of u .The probability of (1) is by construction of the loop weights. The probability of (2) is the productof P ( X s +1 = u | X s = x uv ) = d u d u + d v ) times P ( X s ′ = x uv | X s ′ − = u ) = d u , since X leaves u withprobability 1, in which case it moves to x uv with probability d u . Thus, the probability of (2) is d u + d v ) , and by symmetry the same is for (3). Therefore: P ( Y t +1 = x uv | Y t = x uv ) = 12 + 1 d u + d v = d u + d v + 22( d u + d v ) (81)The second transition, Y t +1 = x ux , is the same as event (2) above, only with X s ′ = x uz instead of X s ′ = x uv . But conditioned on X s ′ − = u the two events have the same probability, therefore: P ( Y t +1 = x uz | Y t = x uv ) = 12( d u + d v ) (82)Thus the probabilities are proportional to 1 and d u + d v + 2, as the weighting w ′ L says.We can now conclude the proof. Consider the generic node x uv . If d u + d v = 2, then G k − is a single edge, and | V ( G k ) | = 1. Thus if | V ( G k ) | > d u + d v ≥
3. But then d u + d v +2 d u + d v − ≤ − = 5. Therefore w S ≤ w ′ L ≤ w S ′ , and Lemma 26 yields τ ( L ) ≤ τ ( S ′ ). x uz u x uv vd u d u d v d u + d v d u + d z d u d v d v S ′ x uv x uz d u + d v + 2 d u + d z + 21 L ′ Figure 3:
Left: the graph S ′ described above. Right: the reweighted line graph L ′ obtained by weightingthe loops of L ( G k − ) with ( d u + d v + 2) instead of ( d u + d v − S ′ onlyon the states x · , we see exactly the walk over L ′ . .3 From L ( G k − ) to G k We now prove that τ ( G k ) ≤ O k ( τ ( L ( G k − ))). To this end, we first obtain from L ( G k − ) a newweighted graph L N by repeatedly collapsing subsets of nodes. Standard results guarantee that τ ( L N ) ≤ τ ( L ( G k − )), and we will be left with proving that τ ( G k ) ≤ O k ( τ ( L N )).Let us begin the construction. For any g ∈ V ( G k ) let H ( g ) = { x uv ∈ V ( L ) : g = u ∪ v } . Notethat { H ( g ) } g ∈ V ( G k ) is a partition of V ( L ) into equivalence classes. Now let V ( G k ) = { g , . . . , g N } ,and let L = L ( G k − ). For each i = 1 , . . . , N we define L i by taking L i − and collapsing H ( g i ).Formally, we let L i = ( V ( L i ) , E ( L i ) , w i ), where V ( L i ) = V ( L i − ) \ H ( g i ) ∪ { a i } with a i being anew state representing H ( g i ), and: w i ( x, x ′ ) = w i − ( x, y ) x = a i , x ′ = a i (83) w i ( x, a i ) = X x ′ ∈ a i w i − ( x, x ′ ) x = a i (84) w i ( a i , a i ) = X x ∈ a i X x ′ ∈ a i w i − ( x, x ′ ) (85)Now consider the walk on L i . This is the collapsed version of the walk L i − with respectto the set of states A C = H ( g i ), see Definition 4. Therefore by Lemma 27 the spectral gaps ofthe two walks satisfy γ ( L i ) ≥ γ ( L i − ), and the relaxation times satisfy τ ( L i ) ≤ τ ( L i − ). Thus τ ( L N ) ≤ τ ( L ) = τ ( L ( G k − )). Now we prove: Lemma 20. τ ( G k ) ≤ O (poly( k ) τ ( L N )) .Proof. We show that the walk on L N is the lazy walk on G k up to a reweighting of the edges bymultiplicative factors O k (1). By Lemma 26 this implies the thesis. In particular we show that,if G k is taken in its lazy version (with loops accounting for half of the node weight), then (1) V ( L N ) = V ( G k ), (2) E ( L N ) = E ( G k ), (3) w N / w G k = Θ k (1). We denote the generic state a i ∈ L N simply as g , meaning that a i represents H ( g ). Proof that V ( L N ) = V ( G k ) . Suppose g ∈ V ( L N ). Then by construction g = u ∪ v for sometwo u, v that form an edge in G k − . This implies that g has k nodes and is connected, hence g is a k -graphlet and g ∈ V ( G k ). Conversely, suppose g ∈ V ( G k ). Since g is connected it has a spanningtree T (a subgraph of G ). Let a, b be two distinct leaves of T and let g ′ = g \ { a } and g ′′ = g \ { b } .Then g ′ , g ′′ are connected and have k − V ( G k − ), Moreover | g ′ ∩ g ′′ | = k − { g ′ , g ′′ } ∈ E ( G k − ). Thus { g ′ , g ′′ } ∈ L ( G k − ) and as a consequence g ∈ V ( L N ). Therefore V ( L N ) = V ( G k − ). Proof that E ( L N ) = E ( G k ) . First, both L N and the lazy version of G k have a loop at eachnode ( L N inherits from L a positive self-transition probability at each node). Consider then anon-loop edge { g ′ , g ′′ } ∈ E ( L N ). By construction of L N we must have g ′ = u ∪ v and g ′′ = u ∪ z ,with { u, v } , { u, z } ∈ E ( G k − ) and u, v, z ∈ V ( G k − ) all distinct. This implies g ′ ∩ g ′′ = u and so { g ′ , g ′′ } ∈ E ( G k ). It follows that E ( L N ) ⊆ E ( G k ). Consider now a non-loop edge { g ′ , g ′′ } ∈ E ( G k ).First, let u = g ′ ∩ g ′′ ; note that by hypothesis u is connected and | u | = k −
1, so u ∈ V ( G k − ). Nowlet { a ′ } = g ′ \ g ′′ and let b ′ be any neighbor of a in u . Choose any spanning tree T ′ of u rootedat b ′ , and let c ′ = b ′ be any leaf of T ′ (such a leaf must exist since | g | ≥ | u | ≥ v = g ′ \ { c ′ } . Note that by construction (1) v is connected and has size k −
1, (2) u ∩ v isconnected and has size k −
2, and (3) u ∪ v = g ′ . Therefore v ∈ V ( G k − ) and { u, v } ∈ E ( G k − ). Asymmetric construction using g ′′ and u yields z such that z ∈ V ( G k − ) and { u, z } ∈ E ( G k − ) and u ∪ z = g ′′ . Now, by construction, { u, z } and { u, v } give two adjacent states x uv , x uz ∈ V ( L ). Butsince u ∪ v = g ′ and u ∪ z = g ′′ , then x uv ∈ H ( g ) and x uz ∈ H ( g ′′ ). This implies that { g ′ , g ′′ } appears as an edge in E ( L N ). Therefore E ( G k ) ⊆ E ( L N ). We conclude that E ( G k ) = E ( L N ).32 roof that w N w G k = O (poly( k )) . Recall that every state a ∈ L N corresponds to a subset H ( g )in the partition of L . The weight w N ( a, a ′ ) of a non-loop edge in L N is exactly the size of the cutbetween the corresponding sets H ( g ) , H ( g ′ ) in L , where we know from above that a ∼ a ′ if andonly if g ∼ g ′ . Consider then a non-loop edge { g, g ′ } ∈ G k . By construction w G k ( g, g ′ ) = 1. Now,there are at most (cid:0) k (cid:1) distinct pairs of ( k − u, v ∈ G k − such that u ∪ v = g . Thus, H ( g ) ≤ (cid:0) k (cid:1) . The same holds for g ′ . Therefore the cut between H ( g ) and H ( g ′ ) in L ( G k − ) has sizebetween 1 and (cid:0) k (cid:1) . It follows that 1 ≤ w N ( a,a ′ ) w G k ( g,g ′ ) ≤ (cid:0) k (cid:1) where a, a ′ represent g, g ′ .A similar argument holds for the loops. First, recall that w G k ( g ) = d g by the lazy weighting.Consider then any non-loop edge { g, g ′ } ∈ G k . Note that { g, g ′ } determines u = g ∩ g ′ ∈ V ( G k − )univocally. Moreover, there exist some v, z ∈ G k − such that u ∪ v = g and u ∪ z = g ′ and that { x uv , x uz } is an edge in L ; and note that there are at most k distinct v and at most k distinct z satisfying these properties. Therefore, every { g, g ′ } can be mapped to a set of between 1 to k edges in L , such that every edge in the set is in the cut between H ( g ) and H ( g ′ ). Furthermore,note that different g ′ are mapped to disjoint sets, since any edge { x uv , x uz } identifies univocally g = u ∪ v and g ′ = u ∪ z . It follows that the cut of H ( g ) is at least d g and at most k d g . Since thecut has at least one edge, and H ( g ) has at most (cid:0) k (cid:1) internal edges, then the total weight of H ( g )is between 1 and poly( k ) times the cut. This is also w N ( a ), the weight of the state a representing g in L N . The claim follows by noting that by construction w N ( a ) ≤ w N ( a, a ) ≤ w N ( a ). D.4 ε -uniform sampling via random walks We give the proof behind mc-sampler , showing:
Theorem 4 (alternative version) . For any graph G , any k ≥ , and any ε > , one can sample k -graphlets independently and ε -uniformly from V k in k O ( k ) O (cid:0) t ε ( G ) (cid:0) ∆ δ (cid:1) k − log nε (cid:1) expected timeper sample.Proof of Theorem 4. Consider G k − = ( V k − , E k − ). By construction, every edge yields a k -graphlet g = u ∪ v . Recall from Lemma 22 the set T ( g ) and that T ( g ) ≤ k . Now suppose we draw froma O ( εk )-uniform distribution over E k − and accept the sampled edge { u, v } with probability T ( g ) .Since the acceptance probability is > k , we obtain an ε -uniform distribution of accepted graphlets.Now, By Lemma 21, we can achieve the O ( εk )-uniform distribution over E k − if we achieve an O ( εk )-uniform distribution over V k − . Thus we just need run the walk over G k − for t ε k ( G k − )steps where ε k = Θ( εk ). By Theorem 3, we have t ε k ( G k − ) = kok O (cid:0) t ε ( G ) (cid:0) ∆ δ (cid:1) k − log nε (cid:1) . (The k factor of ε k is absorbed by k O ( k ) ). Finally, by Lemma 23, each step takes O (poly( k )) time inexpectation. This completes the proof. Lemma 21.
Consider the lazy random walk { X t } t ≥ over G = ( V , E ) and let Y t = { X t , X t +1 } be therandom (unordered) edge crossed the walk traverses between t and t + 1 . Let π t be the distributionof X t over V and σ t the distribution of Y t over E , and let π be the stationary distribution of X . If π t is ε -close to π , then σ t is ε -uniform.Proof. Recall that π ( x ) ∝ d x for all x ∈ V . Let X ∗ be a random variable with distribution π ,and let Y ∗ = { X ∗ , Z } where Z is a neighbor of X ∗ chosen uniformly at random. Clearly Y ∗ has uniform distribution by construction. Since tvd( π t , π ) ≤ ε , we can couple X t and X ∗ so that P ( X t = X ∗ ) ≤ ε . If X t = X ∗ , we can then couple Y t and Y π by letting Y t = Y . This shows that P ( Y ∗ = Y t ) ≤ ε , thus Y t is ε -uniform. 33 emma 22. Consider the graphs G k = ( V k , E k ) and G k − = ( V k − , E k − ) . For every g ∈ G k define T ( g ) = {{ u, v } ∈ E k − : u ∪ v = g } . Then | T ( g ) | ≤ (cid:0) k (cid:1) , and given g we can compute | T ( g ) | in time O (poly( k )) .Proof. Every { u, v } ∈ T ( g ) satisfies: (i) u = g \ { x } and v = g \ { y } for some x, y ∈ g , and (ii) u , v ,and u ∩ v are connected. Thus given g we can just enumerate all (cid:0) k (cid:1) pairs of nodes in g and countwhich ones have u , v , and u ∩ v connected. This gives the bound on | T ( g ) | too. Lemma 23.
Any single step of the lazy walk over G k can be simulated in O (poly( k )) expected time.Proof. To decide whether to follow the loop we just toss a fair coin. Let us now see how totransition to a neighbouring graphlet uniformly at random. Let g = ( V g , E g ) ∈ V k be the currentnode of the walk and let N ( g ) be the neighbourhood of g . Note that N ( g ) can be partitioned as N ( g ) = ∪ x ∈ V g N x ( g ) where N x ( g ) = { g ′ ∼ g : x / ∈ g ′ } . Thus we first select x and then draw from N x ( g ). For each x ∈ V g , if g \ x is connected we proceed (otherwise N x ( g ) = ∅ by definition ofthe chain). For every y ∈ g \ x let ϕ g ( y ) be the degree of y in g , which we can compute in O ( k ).Now, each neighbor y ′ of y in V \ g gives a graphlet g ′ = g ∪ y ′ \ x adjacent to g . Thus we let ϕ ( y ) = d y − ϕ g ( y ), which is the degree of y in V \ g . Similarly let ϕ ( g \ x ) = P y ∈ g \ x ϕ ( y ); this isthe cut size between g \ x and V \ g . Finally, we let ϕ ( g ) = P x ∈ g ϕ ( g \ x ).Now, we select x ∈ g with probability ϕ ( g \ x ) ϕ ( g ) . Then, we select y ∈ g \ x with probability ϕ ( y ) ϕ ( g \ x ) .Finally, we select one of the ϕ ( y ) neighbors y ′ ∼ y in V \ g uniformly at random. To do this we justsample y ′ uniformly at random from the neighbors of y until y ′ / ∈ g . This requires ≤ k expectedtrials, since y has at most k − g and at least 1 neighbor in V \ g .Now consider any g ′ ∼ g . Note that G ′ is identified univocally by the pair ( x, y ′ ) where x = g \ g ′ and y ′ = g ′ \ g . The probability that the process above selects ( x, y ′ ) is: P ( x, y ′ ) = P ( x ) X y ∈ g \ xϕ x ( y ) > P ( y | x ) P ( y ′ | x, y ) (86)= ϕ ( g \ x ) ϕ ( g ) · X y ∈ g \ xϕ x ( y ) > ϕ ( y ) ϕ ( g \ x ) · I { y ′ ∼ y } ϕ ( y ) (87)= 1 ϕ g (cid:12)(cid:12)(cid:8) y ∈ g \ x : ϕ x ( y ) > , y ′ ∼ y (cid:9)(cid:12)(cid:12) (88)Once we draw y ′ , we compute α ( y ′ ) = (cid:12)(cid:12)(cid:8) y ∈ g \ x : ϕ x ( y ) > , y ′ ∼ y (cid:9)(cid:12)(cid:12) . Note that α ( y ′ ) can becomputed in time O (poly( k )), and that 1 ≤ α ( y ′ ) ≤ k −
1. Finally, we apply rejection, accepting y ′ with probability k − α ( y ′ ) ∈ [ k , x, y ′ ) is therefore uniform.The expected number of rejection trials is O ( k ), and so is the expected running time of the entireprocess. 34 Technical background for Markov chain analysis
We recall spectral results on Markov chains used in this work. As anticipated, we consider anergodic time-reversible Markov chain { X t } t ≥ over a finite state space V with transition matrix P . The chain is given by the simple random walk over a weighted graph G = ( V , E , w ). We let π t be the distribution of X t , and π = lim t →∞ π t be the unique limit distribution. We denote by π ∗ = min g ∈V π ( g ) the smallest mass of any state according to π . E.1 Mixing time, conductance, spectral gap
Given two distributions π, σ over some domain X , the total variation distance between π and σ is:tvd( π, σ ) = max A ⊆V { π ( A ) − σ ( A ) } (89)If π is the uniform distribution and tvd( π, σ ) ≤ ε , then we say σ is ε -uniform. The ε -mixing timeof the chain X is: t ε ( X ) = min { t : ∀ X ∈ V : ∀ t ≥ t : tvd( π t , π ) ≤ ε } (90)In this work we bound t ε ( X ) in two ways. The first is via the conductance of G . For any subsetof states U ⊆ V , the volume of U is vol( U ) = P u ∈ U w ( u ). The cut of U is C ( U ) = { e = { u, u ′ } ∈E : u ∈ U, u ′ ∈ V \ U } , and its weight is c ( U ) = P e ∈ C ( U ) w ( e ). Then: Definition 1 (Conductance) . The conductance of U is Φ( U ) = c ( U ) / vol( U ) . The conductance of G is Φ( G ) = min { Φ( U ) : U ⊂ V , vol( U ) ≤ vol( V ) } . A classical result (see e.g.[13]) states that:14Φ ≤ t ε ( X ) ≤ log (cid:16) επ ∗ (cid:17) (91)A second way to bound t ε is via spectral gaps and relaxation times: Definition 2 (Spectral gap) . The spectral gap of the chain X is γ = 1 − λ ∗ , where: λ ∗ = max n | λ | : λ is an eigenvalue of P, λ = 1 o (92) The relaxation time of the chain is τ = γ . Then (see again [13]):( τ −
1) log (cid:16) ε (cid:17) ≤ t ε ( X ) ≤ τ log (cid:16) επ ∗ (cid:17) (93) E.2 Dirichlet forms
For any two states x, y ∈ V we denote the transition rate of { x, y } by Q ( x, y ) = π ( x ) P ( x, y ). Forany function f : V → R we let Var π f = E π ( f − E π f ) . Definition 3 (Dirichlet form; see [13], § . Let P be a reversible transition matrix on a statespace V with stationary distribution π . Let f : V → R be any function. Then the Dirichlet formassociated to P, π, f is: E P,π ( f ) = 12 X x,y ∈V (cid:0) f ( x ) − f ( y ) (cid:1) Q ( x, y ) (94)35he Dirichlet form characterises the spectral gap as follows: Lemma 24 (see [13], Lemma 13.12) . The spectral gap satisfies: γ = min f ∈ R V Var π ( f ) =0 E P,π ( f )Var π ( f ) (95) E.3 Markov chain comparison theorems
We report several comparison theorems for spectral gaps of related/transformed chains.
Direct comparison.Lemma 25 ([13], Lemma 13.18) . Let P and ˜ P be reversible transition matrices with stationarydistributions π and ˜ π , respectively. If E ˜ P , ˜ π ( f ) ≤ α E P,π ( f ) for all functions f , then ˜ γ ≤ (cid:16) max x ∈V π ( x )˜ π ( x ) (cid:17) αγ (96) Lemma 26 ([2], Lemma 3.29) . Consider an undirected graph G = ( V , E ) possibly with loops. Let w and w ′ be two weightings over E and let γ and γ ′ be the spectral gaps of the corresponding randomwalks. Then: γ ′ ≥ γ · min e ∈E ( w ( e ) /w ′ ( e ))max v ∈V ( w ( v ) /w ′ ( v )) (97) Collapsed chains (see [2], § Let A ⊂ V and let A C = V \ A (note that A C = ∅ ). The collapsed chain X ∗ hasstate space A ∪ { a } where a is a new state representing A C , and transition matrix given by: P ∗ ( u, v ) = P ( u, v ) u, v ∈ A (98) P ∗ ( u, a ) = X v ∈ A C P ( u, v ) u ∈ A (99) P ∗ ( a, v ) = 1 π ( A C ) X u ∈ A C π ( u ) P ( u, v ) v ∈ A (100) P ∗ ( a, a ) = 1 π ( A C ) X u ∈ A C X v ∈ A C π ( u ) P ( u, v ) (101) Lemma 27.
The collapsed chain X ∗ satisfies γ ( X ∗ ) ≥ γ ( X ) . Induced chains (see [13], Theorem 13.20).Definition 5.
Given a reversible chain { X t } t ≥ on a state space V , consider an arbitrary nonempty A ⊆ V . Let τ + A = min { t ≥ X t ∈ A } (when X ∈ A this is the first return time of A ). Theinduced chain on A is the chain with state space A and transition probabilities: P A ( x, y ) = P ( X τ + A = y | X = x ) ∀ x, y ∈ A (102) Lemma 28.
Consider a reversible chain on V with stationary distribution π and spectral gap γ .Let A ⊆ V be nonempty and let γ A be the spectral gap for the chain induced on A . Then γ A ≥ γ ..