Cut Sparsification of the Clique Beyond the Ramanujan Bound: A Separation of Cut Versus Spectral Sparsification
aa r X i v : . [ c s . D S ] A ug Cut Sparsification of the Clique Beyond the Ramanujan Bound
Antares ChenUniversity of Chicago Jonathan ShiBocconi University Luca TrevisanBocconi UniversityAugust 14, 2020
Abstract
We prove that a random d -regular graph, with high probability, is a cut sparsifier of the clique withapproximation error at most (cid:16) q π + o n,d (1) (cid:17) / √ d , where q π = 1 . . . . and o n,d (1) denotes an errorterm that depends on n and d and goes to zero if we first take the limit n → ∞ and then the limit d → ∞ .This is established by analyzing linear-size cuts using techniques of Jagannath and Sen [JS17] derivedfrom ideas from statistical physics and analyzing small cuts via martingale inequalities.We also prove that every spectral sparsifier of the clique having average degree d and a certainhigh “pseudo-girth” property has an approximation error that is at least the “Ramanujan bound” (2 − o n,d (1)) / √ d , which is met by d -regular Ramanujan graphs, generalizing a lower bound of Srivastava andTrevisan [ST18].Together, these results imply a separation between spectral sparsification and cut sparsification. If G is a random log n -regular graph on n vertices, we show that, with high probability, G admits a (weightedsubgraph) cut sparsifier of average degree d and approximation error at most (1 . . . . + o n,d (1)) / √ d ,while every (weighted subgraph) spectral sparsifier of G having average degree d has approximation errorat least (2 − o n,d (1)) / √ d . If G = ( V, E G , w G ) is a, possibly weighted, undirected graph, a cut sparsifier of G with error ǫ is a weightedgraph H = ( V, E H , w H ) over the same vertex set of G and such that ∀ S ⊆ V (1 − ǫ ) cut G ( S ) ≤ cut H ( S ) ≤ (1 + ǫ ) cut G ( S ) (1)where cut G ( S ) denotes the number of edges in G with one endpoint in S and one endpoint in V − S , orthe total weight of such edges in the case of weighted graphs. This definition is due to Benczur and Karger[BK96].Spielman and Teng [ST11] introduced the stronger definition of spectral sparsification . A weighted graph H = ( V, E H , w H ) is a spectral sparsifier of G = ( V, E G , w G ) with error ǫ if ∀ x ∈ R V (1 − ǫ ) x T L G x ≤ x T L H x ≤ (1 + ǫ ) x T L G x (2)where L G is the Laplacian matrix of the graph G . If A G is the adjacency matrix of G and D G is the diagonalmatrix of weighted degrees, then the Laplacian matrix is L G = D G − A G and it has the property that, forevery vector x ∈ R V , x T L G x = X ( u,v ) ∈ E G w u,v · ( x u − x v ) The definition of spectral sparsifier is stronger than the definition of cut sparsifier because, if x = S is the0/1 indicator vector of a set S , then we have x T L G x = cut G ( S ) . So we see that the definition in (1) isequivalent to the specialization of the definition of (2) to the case of Boolean vectors x ∈ { , } V .1n all the known constructions of sparsifiers, the edge set E H of the sparsifier is a subset of the edge set E G of the graph G . We will take this condition to be part of the definition of sparsifier.A cut sparsifier H of a graph G has, approximately, the same cut structure of G , so that, if we areinterested in approximately solving a problem involving cuts or flows in G , we may instead solve the problemon H and be guaranteed that an approximate solution computed for H is also an approximate solution for G . As the name suggests, for every graph G it is possible to find a cut sparsifier H of G which is very sparse,and running an algorithm on a sparse graph yields a faster running time than running it on G , if G is notsparse itself.A spectral sparsifier H of G has all the properties of a cut sparsifier, and, furthermore, it can be substitutedfor G in some additional applications. 
For example, if we approximately solve the Laplacian linear system L H x = b , and H is a good spectral sparsifier of G , then the resulting solution will also be an approximatesolution to the Laplacian linear system L G x = b . If the matrix L H is sparser than the matrix L G , solving L H x = b will be faster than solving L G x = b .Benczur and Karger [BK96] showed that, for every graph G , a cut sparsifier with error ǫ having O ( ǫ − n log n ) edges can be computed in nearly linear time. Spielman and Teng [ST11] proved that a spectral sparsifierwith error ǫ having O ( ǫ − n (log n ) O (1) ) edges can be computed in nearly linear time. Spielman and Srivas-tava [SS11] improved the number of edges that suffice to construct a spectral sparsifier to O ( ǫ − n log n ) , andBatson, Spielman and Srivastava [BSS09] reduced it to O ( ǫ − n ) . Up to the constant in the big-Oh notation,the O ( ǫ − n ) bound is best possible, because every ǫ cut sparsifier of the clique (and, for a stronger reason,every ǫ spectral sparsifier of the clique) requires Ω( ǫ − n ) edges [ACK + O ( ǫ − n ) edges running in nearly quadratic time [AZLO15] and nearly linear time [LS17].In this paper we focus on the combinatorial problem of understanding the minimum number of edgesthat suffice to achieve cut and spectral sparsification, regardless of the efficiency of the construction. Inparticular, we aim to understand the best possible constant in the Θ( ǫ − n ) bound mentioned above.Currently, the construction (or even non-constructive existence proof) of cut sparsifiers for general graphswith the smallest number of edges is the one due to Batson, Spielman and Srivastava, which also achievesspectral sparsification with the same parameters. In particular, prior to this work, there was no evidencethat cut sparsification is “easier” than spectral sparsification, in the sense of requiring a smaller number ofedges. In this paper we show that random log n -regular graphs, with high probability, can be cut-sparsifiedwith better parameters than they can be spectrally-sparsified, if one requires the sparsifier to use a subset ofthe edges of the graph to be sparisified. Under a conjecture of Srivastava and Trevisan, the same separationwould apply to sparsifiers of the clique.In the following, instead of referring to the number of edges in the sparsifier as a function of the errorparameter ǫ and of the number of vertices n , it will be cleaner to refer to the error parameter ǫ as a functionof the average degree d of the sparsifier (that is, we call dn/ the number of edges of the sparsifier).The construction of Batson, Spielman and Srivastava achieves error (2 √ / √ d with a sparsifier of averagedegree d , for general graphs. Batson, Spielman and Srivastava also show that every sparsifier of the cliqueof average degree d has error at least / √ d . Srivastava and Trevisan [ST18] prove that every sparsifier ofthe clique of average degree d and girth ω n (1) (that is, with girth that grows with the number of edges)that spectrally sparsifies the clique has error at least (2 − o n,d (1)) / √ d . An appropriately scaled d -regularRamanujan graph is a spectral sparsifier of the clique with error (2 + o n,d (1)) / √ d , so we will refer to / √ d as the Ramanujan bound for sparsification. Srivastava and Trevisan conjecture that the Ramanujan boundis best possible for all graphs that sparsify the clique.
Conjecture 1 (Srivastava and Trevisan) . Every family of weighted graphs of average degree d that are ǫ spectral sparsifiers of the clique satisfy ǫ > (2 − o d (1)) / √ d . .1 Our Results Our main result is that it is possible to do better than the Ramanujan bound for cut sparsification of theclique.In the following, we use G reg n,d to denote the distribution over random d -regular multigraphs on n verticescreated by taking the disjoint union of d random perfect matchings. We will always assume that n is even. Theorem 2 (Main) . With − o n (1) probability, a random regular graph drawn from G reg n,d , in which all edgesare weighted ( n − /d , is a (cid:16) q π + o n,d (1) (cid:17) / √ d cut sparsifier of the clique, where q π = 1 . ... Together with Conjecture 1, the above theorem gives a conditional separation between the error-densitytradeoffs of cut sparsification versus spectral sparsification of the clique.In order to achieve an unconditional separation, we prove a generalization of the result of Srivastava andTrevisan to families of graphs that satisfy a property that is weaker than the property of having large girth(it is enough that most vertices, rather than all vertices, see no cycles within a certain distance) which wethen use to prove the following result.
Theorem 3. If G is a random regular graph drawn from G reg n, log n , then the following happens with highprobability over the choices of G : for all graphs H of average degree d which are weighted edge-subgraphs of G , if H is an ǫ -spectral sparsifier of G then ǫ ≥ (2 − o n,d (1)) / √ d . Using the fact that a random log n -regular graph is, with high probability, a O (1 / √ log n ) spectral spar-sifier of the clique, and that a random log n -regular graph contains a random d -regular graph as a subgraph,we have our separation. Theorem 4.
Let G be a random regular graph drawn from G reg n, log n . Then with probability − o n (1) over thechoice of G the following happens for every d :1. There is a weighted subgraph H of G with dn/ edges such that H is an ǫ cut sparsifier of G with ǫ ≤ (1 . ... + o n,d (1)) / √ d ;2. For every weighted subgraph H of G with dn/ edges, if H is an ǫ spectral sparsifier of G then ǫ ≥ (2 − o n,d (1)) / √ d . Our main result, Theorem 2, is established by analyzing cuts of linear size using rigorous techniques thathave been derived from statistical physics [JS17] and by analyzing sublinear size cuts using martingaleconcentration bounds.For a fixed set S of k = αn ≤ n/ vertices, the average number of edges that leave S in a random d -regular graph is dn − · k · ( n − k ) and we are interested in showing that for every such set the deviationfrom the expectation is at most ǫ dn − · k · ( n − k ) , for ǫ ≤ . .../ √ d . One approach is to set up a martingale and apply an Azuma-like inequality. In this approach, it is betterto study the deviation from the expectation of the number of edges that are entirely contained in S . Thisis because, in a regular graph, the deviation from the expectation of the number of edges crossing thecut ( S, V − S ) is entirely determined by the deviation from the expectation of the number of edges entirelycontained in S , and the latter can be written as a sum of fewer random variables (that is, (cid:0) k (cid:1) versus k · ( n − k ) ),especially for small k . After setting up the appropriate Doob martingale, we can prove that the probabilitythat the cut ( S, V − S ) deviates from the expectation by more than . .../ √ d times the expectation is atmost e − Ω( n ) if k ≥ Ω( n/ √ d ) and at most e − Ω( dk log( n/dk )) for k ≤ O ( n/ √ d ) . In particular, there is an α > such that for all k ≤ α n the probability of having a large deviation is much smaller than / (cid:0) nk (cid:1) , in a waythat enables a union bound. These calculations are carried out in Section 3.3nfortunately, such “first moment” calculations cannot be pushed all the way to α = 1 / . This isbecause our calculations with deviation bounds and union bounds are equivalent to estimating the averagenumber of cuts that have a relative error bigger than . .../ √ d , with the goal of showing that such averagenumber is much smaller than one. Unfortunately, the average number of balanced cuts that have a relativeerror bigger than / √ d is bigger than one, so we cannot hope to get a separation from the spectral boundswith first moment calculations. We then turn to techniques derived from statistical physics in order to analyze large cuts. To illustrate thisapproach, consider the classical problem of bounding the typical value of the max cut optimum in Erdős-Rényi random graphs G n, / , up to o ( n . ) error terms. This is equivalent to the problem of understandingthe typical value of max σ ∈{± } n σ T M σ (3)where M is a random symmetric matrix with independent uniform ± entries off the diagonal and zerodiagonal.A first step is to prove, by an interpolation argument, that, up to lower order o ( n . ) additive error, theoptimum of (3) is the same as the optimum of max σ ∈{± } n σ T W σ (4)where W is a Wigner matrix, a random symmetric matrix with zero diagonal and independent and standardnormally distributed off-diagonal entries.Finding the optimum of (4) up to an additive error o ( n . 
) is a standard problem in statistical physics: itis the problem of determining the zero-temperature free energy of a spin-glass model called the Sherrington-Kirkpatrick model, or SK model for short.Parisi [Par80] defined a family of differential equations, and presented a heuristic argument accordingto which the infimum of the solutions of those differential equations, would give the free energy of the SKmodel. That infimum is now called the Parisi formula. Parisi’s approach was extremely influential andwidely generalized. Guerra [Gue03] rigorously proved that a solution to each of the differential equationsgives an upper bound on the free energy, and, in a monumental work, Talagrand [Tal06] rigorously provedthe stronger claim that the Parisi formula is equal to the free energy of the SK model. Talagrand’s work wasfurther generalized by Panchenko [Pan14].Dembo, Montanari and Sen [DMS +
17] proved an interpolation result showing that the solution to (4)can also be used to bound the max cut in random sparse graphs of constant average degree d , including bothrandom d -regular graphs G reg n,d and Erdős-Rényi random graphs G n,d/n . Jagannath and Sen [JS17] provedinterpolation theorems for the problem of determining the max cut out of sets of size αn , for fixed constant α , in G n,d/n and in G reg n,d graph, and they proved that the two models have different asymptotic bounds when < α < / .In particular, to find the maximum (and the minimum) over all sets S of cardinality αn of cut G ( S ) in arandom d -regular graph, Jagannath and Sen prove that one has to study max σ ∈ S n ( α ) σ T Π T W Π σ (5)where S n ( α ) is the subset of vectors σ ∈ {± } n that contain exactly αn ones, and Π = I − n J is the matrixthat projects on the space orthogonal to (1 , , . . . , . The restriction to S n ( α ) models the restriction to cuts ( S, V − S ) where | S | = αn , and the projection defines a matrix Π T W Π such that all rows and all columnssum to zero, in analogy to the fact that, in a regular graph, all rows and all columns of the adjacency matrixhave the same sum.Jagganath and Sen also define a Parisi-type family of differential equations and they rigorously provethat a solution to any of those equations provides an upper bound to (5). Since their goal is to compare cuts4n regular graphs to cuts in Erdős-Rényi graphs, rather than bounding cut sizes in random regular graphs,they do not provide solutions to their Parisi-type equations. In Section 2 we compute the replica-symmetric solution and get an explicit bound.From the bound, we get that, for every fixed α , with high probability, sets of size αn in a random d -regulargraph satisfy the definition of ǫ cut sparsification of the clique with ǫ ≤ r π + o n,d (1) ! · √ d = 1 . . . . + o n,d (1) √ d A tight upper bound on ǫ , which would come from an exact solution of (5), is likely to be / √ d times thevalue of the Parisi formula evaluated at zero temperature and no external field (approximately . / √ d [CR02]), although we have not attempted to prove this. As discussed above, we established that a random d -regular graph is an ǫ cut sparsifier with ǫ ≤ (1 . ... + o (1)) / √ d . Under Conjecture 1, this gives a conditional separation between the error-vs-density tradeoff forcut sparsification of the clique compared to spectral sparsification of the clique.If we consider sparsifiers that are weighted edge-subgraphs of the graph to sparsify, we can obtain anunconditional separation if we can find a random family of graphs that: • Contain random d -regular graphs as edge-induced subgraphs • Are, with high probability, o n,d (1 / √ d ) spectral sparsifiers of the clique • Are such that, with high probability, no weighted edge-induced subgraph of average degree d can be aspectral sparsifier of the clique with error smaller than (2 + o n,d (1)) / √ d Then, if we consider the graphs G n in this family, and let H n be a random d -regular graph containedin G n , we have that the following happens with high probability: H n is a graph of average degree d that isa (1 . ... + o n,d (1)) / √ d cut sparsifier of the clique and also a (1 . ... 
+ o n,d (1)) / √ d cut sparisifier of G n ,but every edge-weighted subgraph of G n of average degree d which is an ǫ spectral sparsifier of G n is alsoan ǫ + o n,d (1 / √ d ) spectral sparsifier of the clique and hence satisfies ǫ ≥ (2 − o n,d (1)) / √ d .Srivastava and Trevisan prove that the Ramunajan bound is optimal for families of graphs of growinggirth, but it is not possible to use this property in the above plan because a family of random graphs cannot,with high probability, both contain random d -regular graphs and have large girth. To overcome this difficulty,we generalize the result of Srivastava and Trevisan to graphs that have a large “pseudo-girth” g , that is, thatare such that for a − o (1) fraction of the nodes v there is no cycle in the ball centered at v of radius g/ . We define test vectors for every graph, and show that, if the graph satisfies the pseudogirth conditionwith g = d / , then the test vectors show that, if the graph is an ǫ spectral sparsifier of the clique then ǫ ≥ / √ d − O ( d − / ) − o n (1) .The pseudogirth condition is satisfied by several families of random regular graphs and Erdős-Rényirandom graphs. In particular, random ∆ n -regular graphs, for any choice of the degree ∆ n in the range d . ≤ ∆ n ≤ n / d satisfies the three conditions above and can be used to establish the separation. Forconcreteness, we have stated our result for ∆ n = log n . The notions of cut sparsifier and of spectral sparsifier of the clique are interesting generalizations of the notionof expander graph, and they allow graphs that are possibly weighted and irregular. As with expander graphs,it seems worthwhile to study sparsifiers as fundamental combinatorial objects, beyond their applications tothe design of efficient graph algorithms.A proof of Conjecture 1 would give us a significant generalization of the Alon-Boppana theorem, and itwould be a very interesting result. 5t is plausible that the clique is the hardest graph to sparsify, both for cut sparsification and for spectralsparsification. This would mean that the error in the construction of Batson, Spielman and Srivastava can beimproved from √ / √ d to / √ d , up to lower order terms, and that there is a construction (or perhaps a non-constructive existence proof) of cut sparsifiers of general graphs with error smaller than . / √ d , up to lowererror terms. At present, unfortunately, there is no promising approach to construct (or non-constructivelyprove existence) of cut sparsifiers of general graphs error below / √ d , or even below √ / √ d . We show that random regular graphs are good cut sparsifiers of the clique over cuts with vertex set S oflinear size, so that | S | = αn for constant α . Theorem 5 (Linear Set Regime for Cut Sparsification) . For every constant α ∈ (0 , , almost always over H ∼ G reg n,d , (cid:12)(cid:12)(cid:12)(cid:12) cut H ( S ) E S ′ ∈S α cut H ( S ′ ) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ √ d r π + o n,d (1) ! , where S α = { S ⊆ V | | S | = αn } and q π = 1 . . . . . First we refer to a lemma showing that the maximum cut with relative cut volume α concentrates aroundits expectation, so that we reduce the problem to understanding the expected value of the maximum cut. Wealso state its version for minimum cuts, derived by flipping signs and using sign symmetries in the statementand proof of the lemma, in accordance with [JS17, Remark 1]. Lemma 6 (Lemma 2.1 of [JS17]) . 
Pr H ∼G reg n,d "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) max S ∈S α n cut H ( S ) − E H ′ ∼G reg n,d (cid:20) max S ′ ∈S α n cut H ′ ( S ′ ) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ε ≤ e − nε /d , Pr H ∼G reg n,d "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) min S ∈S α n cut H ( S ) − E H ′ ∼G reg n,d (cid:20) min S ′ ∈S α n cut H ′ ( S ′ ) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ε ≤ e − nε /d . As discussed in Section 1.2.2, we now invoke techniques of statistical mechanics developed in the studyof spin glasses, specifically the SK model and its generalizations.After the Parisi formula was proven to solve the SK model, Denbo, Montanari, and Sen [DMS + cut( S ) where | S | is a constanttimes n , as we study here, relating these problems to a generalization of the SK model.The SK model has internal energy σ T W σ/ √ n for W ∈ R n × n a symmetric Wigner matrix with stan-dard Gaussian entries on the off-diagonals and zero on the diagonals, to be optimized over configurations σ ∈ {± } n . The generalization studies the optimization problem with the same matrix W and the sameconfiguration space {± } n but with internal energy H (1) W ( σ ) = 1 √ n σ T Π W Π σ, where Π is the orthogonal projection away from the all-ones vector. In this model, finding the extremalcuts of a given relative vertex density α corresponds to optimizing that energy over the restricted set ofconfigurations S n ( α ) = ( σ ∈ {± } n : X i σ i = n (2 α − ) . This definition corresponds to that used in [JS17], and is larger by a factor of than a convention used in some other places.
6e may formulate this equivalently as optimizing H (0) W ( σ ) = 1 √ n σ T W σ over a different alphabet σ ∈ {± − (2 α − } , with graph cuts of relative vertex density α corresponding tothe set of configurations A n ( T ( α ) , ε n ) = ( σ ∈ {± − (2 α − } : (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X i σ i − T ( α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < ε n ) , with T ( α ) = 4 α (1 − α ) and setting ε n = 0 to achieve the equivalence.Finally, Jagannath and Sen [JS17] used an analytical annealing approach to solve this generalized model,yielding the generalization of the Parisi formula stated here: Definition 7.
Let a ν be a measure over [0 , T ] of the form ν = m ( t )d t + cδ T with m ( t ) non-negative, non-decreasing, and everywhere right-continuous with left limits (cadlag), where d t is the uniform measure and δ T is the Dirac delta function at t = T . Then for λ ∈ R and T ( α ) = 4 α (1 − α ) , we define the ground stateenergy functional P T ( α ) ( ν, λ ) = u ν,λ (0 , − λT ( α ) − Z T ( α )0 s d ν ( s ) where u ν,λ is the solution to the differential equation with boundary condition ∂u∂t + 2 ∂ u∂x + m ( t ) (cid:18) ∂u∂x (cid:19) = 0 , ( t, x ) ∈ [0 , T ( α )) × R ,u ( x, T ( α )) = max ζ ∈{± − M } ζx + ( λ + 2 c ) ζ , where M = 2 α − . This definition reduces to the original Parisi formula at zero temperature and external field in the casethat T = 1 and M = 0 and when the infimum over ν is taken.This generalized Parisi formula relates to average extremal cuts on random regular graphs in the followingway. Theorem 8 (Combination of Theorem 1.2 and Lemma 2.2 of [JS17]) . Let T ( α ) = 4 α (1 − α ) . As n → ∞ , E H ∼G reg n,d max S ∈S α | cut H ( S ) − dα (1 − α ) | ≤ √ d n inf ν,λ P T ( α ) ( ν, λ ) + o d (1 / √ d ) . Proof.
By [JS17, Lemma 2.2], almost always over the randomness of W as n → ∞ , E H ′ (cid:20) max S ′ ∈S α n cut H ′ ( S ′ ) (cid:21) = dα (1 − α ) + 14 √ d E W (cid:20) max σ ∈ S n ( α ) n H (1) W ( σ ) (cid:21) + o d ( √ d ) . As alluded to in [JS17, Remark 1], Lemma 2.2 of [JS17] holds also for minimum cuts: this requires onlychanging some signs and invoking a few instances of sign-flip symmetry in the proof. E H ′ (cid:20) min S ′ ∈S α n cut H ′ ( S ′ ) (cid:21) = dα (1 − α ) − √ d E W (cid:20) max σ ∈ S n ( α ) n H (1) W ( σ ) (cid:21) − o d ( √ d ) . By the equivalence described earlier in this section and the fact that A n ( T ( α ) , ⊆ A n ( T ( α ) , ε n ) for anysequence of ε n > , max σ ∈ S n ( α ) n H (1) W ( σ ) ≤ max σ ∈ A n ( T ( α ) ,ε n ) n H (0) W ( σ )
7y [JS17, Theorem 1.2], for ε n → slowly enough as n → ∞ , it holds almost surely over W that lim n →∞ max σ ∈ A n ( T ( α ) ,ε n ) n H (0) W ( σ ) = inf ν,λ P T ( α ) ( ν, λ ) . Combining the above equations yields the theorem statement.It is not yet known how to efficiently compute the exact value of the Parisi formula or its generalization.We circumvent this issue by providing an upper bound, by choosing a particularly simple measure ν to boundthe infimum inf ν,λ P T ( α ) ( ν, λ ) . Specifically, the choice of ν = cδ T with m ( t ) = 0 is known as the replica-symmetric ansatz [Mal19, Chapter 2], corresponding to the first of Parisi’s original sequence of estimates. Lemma 9. inf ν,λ P T ( α ) ( ν, λ ) ≤ p α (1 − α ) · √ π e − (erf − (2 α − , where erf is the Gauss error function erf( x ) = √ π R x − x e − x d x .Proof. First we express R T s d ν ( s ) = cT + R T tm ( t )d t and reparameterize ˆ λ = λ + 2 c so that we can write inf ν,λ P T ( α ) ( ν, λ ) = inf ν, ˆ λ ˆ u ν, ˆ λ (0 , − ˆ λT − Z T tm ( t )d t where ˆ u ν, ˆ λ is the solution to ∂u∂t + 2 ∂ u∂x + m ( t ) (cid:18) ∂u∂x (cid:19) = 0 , ( t, x ) ∈ [0 , T ) × R ,u ( x, T ) = max ζ ∈{± − M } ζx + ˆ λζ , with M = 2 α − .By taking ν ( t ) = cδ T so that m ( t ) = 0 , we can upper-bound the infimum over ν , so that inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ ˆ u ˆ λ (0 , − ˆ λT and ˆ u ˆ λ is the solution to ∂u∂t + 2 ∂ u∂x = 0 , ( t, x ) ∈ [0 , T ) × R ,u ( x, T ) = max ζ ∈{± − M } ζx + ˆ λζ . By reparameterizing t as − t here, we can see that u ( x, is simply the result of evolving u ( x, T ) according tothe heat equation with diffusivity constant for a time of T . Evolution of the heat equation with diffusivity k over a time of T is equivalent to convolution with the Gaussian heat kernel exp( − x / (4 kT )) / √ πkT [Eva10,Chapter 2.3], so ˆ u ˆ λ ( x,
0) = 1 √ πT Z ∞−∞ e − z / (8 T ) (cid:18) max ζ ∈{± − M } ζ ( z + x ) + ˆ λζ (cid:19) d z. Thus inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:18) max ζ ∈{± − M } ζz + ˆ λζ (cid:19) d z − ˆ λT. max ζ ∈{± − M } ζz + ˆ λζ = max (cid:16) − z − M z + ˆ λ (1 + 2 M + M ) , z − M z + ˆ λ (1 − M + M ) (cid:17) = − M z + ˆ λ (1 + M ) + max( − z + 2 M ˆ λ, z − M ˆ λ )= − M z + ˆ λ (1 + M ) + | z − M ˆ λ | , so that inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:16) − M z + ˆ λ (1 + M ) + (cid:12)(cid:12)(cid:12) z − M ˆ λ (cid:12)(cid:12)(cid:12)(cid:17) d z − ˆ λT. Partially evaluating the integral using the facts that a Gaussian probability density function integrates to 1and, by oddness of the integrand, R ∞−∞ ze − z / (8 T ) d z = 0 , inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:12)(cid:12)(cid:12) z − M ˆ λ (cid:12)(cid:12)(cid:12) d z + ˆ λ (1 − T + M ) . Employing a change of variables z → √ T z to write the integral in terms of the normal Gaussian probabilitydensity φ ( z ) = √ π e − z / and also applying the identity − T = 1 + 4 α − α = M , inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ Z ∞−∞ φ ( z ) (cid:12)(cid:12)(cid:12) √ T z − M ˆ λ (cid:12)(cid:12)(cid:12) d z + 2ˆ λM . Focusing now on the integral, Z ∞−∞ φ ( z ) (cid:12)(cid:12)(cid:12) √ T z − M ˆ λ (cid:12)(cid:12)(cid:12) d z = Z ∞ M ˆ λ/ √ T φ ( z ) (cid:16) √ T z − M ˆ λ (cid:17) d z + Z M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z + 2 M ˆ λ (cid:17) d z = Z − M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z − M ˆ λ (cid:17) d z + Z M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z + 2 M ˆ λ (cid:17) d z, where we negated and flipped the limits of the first integral, which is equivalent to negating the odd partof the integrand while preserving the even part. Continuing to integrate, letting Φ( z ) denote the Gaussiancumulative density function, = Z − M ˆ λ/ √ T −∞ − √ T z φ ( z )d z + Z M ˆ λ/ √ T −∞ − √ T z φ ( z )d z + 2 M ˆ λ Z M ˆ λ/ √ T − M ˆ λ/ √ T φ ( z )d z, = h √ T φ ( z ) i − M ˆ λ/ √ T −∞ + h √ T φ ( z ) i M ˆ λ/ √ T −∞ + 2 M ˆ λ (cid:16) Φ( M ˆ λ/ √ T ) − Φ( − M ˆ λ/ √ T ) (cid:17) = 4 √ T φ ( M ˆ λ/ √ T ) + 2 M ˆ λ erf( M ˆ λ/ √ T ) , where we used evenness of φ and the fact that Φ( x ) − Φ( − x ) = erf( x/ √ in the last step. So, putting thisevaluation of the integral into our previous expression, inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ T φ ( M ˆ λ/ √ T ) + 2 M ˆ λ erf( M ˆ λ/ √ T ) + 2ˆ λM . By finding the critical point of this expression with respect to ˆ λ , we find a value of ˆ λ = −√ T erf − ( M ) /M .Using this value for ˆ λ , inf ν,λ P T ( α ) ( ν, λ ) ≤ √ T φ ( −√ − ( M )) − M ˆ λM + 2ˆ λM = 4 √ T φ ( √ − ( M )) .
9e calculate the largest concrete value attained by the upper bound of the preceding lemma:
Lemma 10.
For all α ∈ (0 , , inf ν,λ P T ( α ) ( ν, λ )4 α (1 − α ) ≤ r π = 1 . ... Proof.
By Lemma 9, for α ∈ (0 , , inf ν,λ P T ( α ) ( ν, λ )4 α (1 − α ) ≤ p α (1 − α ) · √ π e − (erf − (2 α − := f ( α ) . Evaluated at α = 1 / , this is equal to p /π , so we just need to show that the upper bound f ( α ) ismaximized at α = 1 / .First we reparameterize g ( M ) = f ( α ) with M = 2 α − so that g ( M ) = 1 p π (1 − M ) e − (erf − ( M )) and we want to show that g is maximized at . Using the product rule to take the derivative of g , since ddM √ − M = M (1 − M ) / and ddM e − (erf − ( M )) = −√ π erf − ( M ) , g ′ ( M ) = 1 p π (1 − M ) M e − (erf − ( M )) − M − √ π erf − ( M ) ! . We take another monotonic reparameterization, introducing erf( x ) for M : g ′ (erf( x )) = 1 p π (1 − erf( x ) ) erf( x ) e − x − erf( x ) − √ πx ! . By Polya [P + erf( x ) < √ − e − x /π so that − erf( x ) ≥ e − x /π , so that, for x < when erf( x ) < , g ′ (erf( x )) ≥ p π (1 − erf( x ) ) (cid:16) erf( x ) e (4 /π − x − √ πx (cid:17) . And by Neuman [Neu13], erf( x ) ≥ x √ π e − x / , so when x < , g ′ (erf( x )) ≥ p π (1 − erf( x ) ) (cid:18) x √ π e (4 /π − / x − √ πx (cid:19) . And as e (4 /π − / x ≤ , this makes it clear that g ′ (erf( x )) is positive when x is negative, which means that g is increasing on the negative part of its domain, which by evenness of g means that g is maximized at . This gives us all the ingredients necessary to prove the main theorem of this section. Proof of Theorem 5.
Combine Lemma 6, Corollary 8, and Lemma 10 with the fact that E S ′ ∈S α cut H ( S ′ ) = ndα (1 − α ) . 10 Analysis for small cuts
In this section, we demonstrate that the number of edges crossing a cut ( S, V − S ) deviates no more fromits expectation than by a . √ d factor with high probability when | S | is small. Theorem 11 (Small Set Regime for Cut Sparsification) . There exists sufficiently large n ≥ and constant d ≥ such that, for any S ⊂ V where | S | = α n and α ≤ , a sample H ∼ G reg n,d admits with probability (1 − o n (1)) (cid:12)(cid:12)(cid:12)(cid:12) cut H ( S ) E H ∼G reg n,d [cut H ( S )] − (cid:12)(cid:12)(cid:12)(cid:12) ≤ . √ d Our analysis will require the use of a Doob martingale.
Definition 12.
Given random variables A and ( Z ℓ ) Nℓ =1 sampled from a common probability space, theirassociated Doob martingale is given by random variables ( X ℓ ) Nℓ =0 where X = E [ A ] and X ℓ = E [ A | Z , . . . , Z ℓ ] We note that ( Z ℓ ) is often called the filtration that ( X ℓ ) is defined with respect to. For a Doob martingale ( X ℓ ) Nℓ =0 , we denote its martingale difference sequence by ( Y ℓ ) where Y ℓ = X ℓ − X ℓ − and its quadraticcharacteristic sequence by ( h X i ℓ ) where h X i ℓ = ℓ X r =1 E (cid:2) Y r | Z . . . Z r − (cid:3) As mentioned previously, the small cuts analysis will quantify the number of edges contained entirelywithin a cut and use the fact that, in a regular graph, the number of edges across a cut is uniquely determinedby the number of edges within the cut. For a graph H , we will denote e H ( S ) by the number of edges e ∈ E H with both endpoints contained within S ⊆ V . When H is sampled from a distribution, it is understood that e H ( S ) is a random variable. Consider H a random regular graph drawn from G reg n,d . Enumerate its vertices by i ∈ [ n ] , and its constituentmatchings by m ∈ [ d ] . For S ⊂ V of size | S | = k , we will assume without loss of generality that S = { , . . . , k } .Next, consider the sequence of matching-vertex pairs (cid:0) ( m ℓ , i ℓ ) (cid:1) Nℓ =1 enumerating each ( m, i ) ∈ [ d ] × [ k − where N = d · ( k − . Let us now define the sequence of random variables ( Z ℓ ) Nℓ =1 where Z ℓ = Z ( m ℓ ,i ℓ ) ∈ V is the vertex that matching m ℓ matches i ℓ ∈ V to in H . Note that e ( S ) = N X ℓ =1 { Z ℓ ∈ [ k ] and Z ℓ > i ℓ } We now construct the Doob martingale on e ( S ) using ( Z ℓ ) as a filtration. The matched edge-vertex revealmartingale ( X ℓ ) Nℓ =0 is given by X ℓ = E [ e ( S ) | Z , . . . , Z ℓ ] . One should think of this martingale as countingthe number of edges contained within S . As an increasing number of Z ℓ are conditioned on, informationregarding what edges exist in H is revealed in an ordered way. The order in which an edge is revealed isgiven by the enumeration of the vertices adjacent to the edge, and the matching the edge belonged to when H was first sampled from d random matchings. Additionally, notice that vertex k is excluded from such pairs ( m ℓ , i ℓ ) . This is because m ℓ can only match k to i ℓ < k for the edge to be contained in S . Consequently,revealing edges adjacent to { , . . . , k − } suffices to uniquely determine e ( S ) .Our analysis of ( X ℓ ) will now proceed as follows. We first determine bounds on the martingale differenceand quadratic characteristic of ( X ℓ ) . These bounds are then used by a standard martingale concentrationresult to argue that the number of edges contained within S cannot deviate far from its expectation. Finally,we complete the proof of Theorem 11 by using the fact that concentration in the number of edges within S immediately implies concentration in the number of edges in cut H ( S ) when H is a random d regular graph.11 .2 Properties of the Martingale To bound the martingale difference and quadratic characteristic of ( X ℓ ) , we examine how e ( S ) behaves asan increasing number of Z ℓ are conditioned on. We say that { z , . . . , z ℓ } ⊆ [ n ] is a valid realization of Z ℓ ifthere exists a d regular graph H such that each ( i ℓ , z ℓ ) ∈ E H . When z , . . . , z ℓ are deterministically provided,we can define the following quantities.1. a ℓ = a ℓ ( z , . . . , z ℓ ) is the number of remaining vertices in S that remain unmatched as a function of z , . . . , z ℓ . 
We denote a = | S | = k .2. b ℓ = b ℓ ( z , . . . , z ℓ ) is the number of remaining vertices in V that remain unmatched as a function of z , . . . , z ℓ . We denote b = | V | = n .We will also consider a ℓ ( z , . . . , z ℓ − , Z ℓ ) and b ℓ ( z , . . . , z ℓ − , Z ℓ ) where Z ℓ is sampled according to thefiltration specified in X ℓ . In this case, a ℓ and b ℓ are random variables distributed according to that of therandom variable Z ℓ . When z , . . . , z ℓ are a valid realization, we can demonstrate a bound on the ratio a ℓ b ℓ . Lemma 13.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , and N = d · ( k − .For any ≤ ℓ ≤ N and valid realization z , . . . , z ℓ , it happens that a ℓ b ℓ ≤ kn Proof.
We proceed via induction on ℓ . For the base case, ℓ = 0 implies we have a b = kn . Let us now assumethe lemma holds for ℓ − . Notice that any choice of z ℓ admits one of three cases.1. z ℓ ∈ [ k ] and z ℓ > i ℓ . This corresponds to z ℓ revealing the existence of an edge not previously known tobe in S when considering only z , . . . , z ℓ − . Hence a ℓ = a ℓ − − and b ℓ = b ℓ − − and a ℓ b ℓ = a ℓ − − b ℓ − − ≤ a ℓ − b ℓ − ≤ kn with the last inequality following by the inductive hypothesis.2. z ℓ ∈ [ k ] however z ℓ < i ℓ . This corresponds to i ℓ having already been matched to j ∈ [ k ] as revealed by z j for j < ℓ . Thus, a ℓ = a ℓ − and b ℓ = b ℓ − and the inductive hypothesis is maintained.3. z ℓ / ∈ [ k ] however z ℓ > i ℓ . This corresponds to m ℓ matching i ℓ to a vertex not in S . Thus a ℓ = a ℓ − and b ℓ = b ℓ − and so a ℓ b ℓ = a ℓ − − b ℓ − − ≤ a ℓ − − b ℓ − − n/k = a ℓ − − k/n · n/kb ℓ − − n/k < nk where the second inequality follows as k ≤ n and the last inequality follows by the following principle: pq < r implies p − rwq − w < r for all p, q, r, w ∈ Z ≥ and we choose p = a ℓ − , q = b ℓ − , r = kn , and w = nk .In all cases, we have that the lemma holds for ℓ , thus completing the induction.We now bound the martingale difference of ( X ℓ ) . Lemma 14.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , and N = d · ( k − .Then Y ℓ associated with ( X ℓ ) Nℓ =0 admits | Y ℓ | ≤ for all i ∈ [ N ] .Proof. As the d constituent matchings of H are sampled independently and uniformly at random, it sufficesto assume d = 1 , and hence N = k − . Now let φ ( a, b ) be the expected number of edges contained inside asubset of a vertices in a uniformly sampled perfect matching on b vertices. φ ( a, b ) is the quantity φ ( a, b ) = (cid:18) a (cid:19) · b − ℓ , we begin by fixing a valid realization of random variables Z = z , . . . , Z ℓ = z ℓ and observethat X ℓ − can be computed as X ℓ − = E [ e ( S ) | Z = z , . . . , Z ℓ − = z ℓ − ]= E (cid:20) N X r =1 { Z r ∈ [ k ] and Z r > r } (cid:12)(cid:12)(cid:12)(cid:12) Z = z , . . . , Z ℓ − = z ℓ − (cid:21) = ℓ − X r =1 { z r ∈ [ k ] and z r > r } + φ ( a ℓ − , b ℓ − ) where we have used linearity of expectations to separate terms of e ( S ) that have been conditioned to be z r , and those that remain random. X ℓ is similarly given by the following. X ℓ = ℓ X r =1 { z r ∈ [ k ] and z r > r } + φ ( a ℓ , b ℓ ) We can now compute Y ℓ as Y ℓ = X ℓ − X ℓ − = { z ℓ ∈ [ k ] and z ℓ > ℓ } + (cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1) Let us denote w ℓ = { z ℓ ∈ [ k ] and z ℓ > ℓ } . It is either the case that w ℓ = 1 or w ℓ = 0 . Assuming w ℓ = 1 ,we first demonstrate that Y ℓ ≤ . In this case vertex ℓ is adjacent to z ℓ ∈ S . Consequently, a ℓ = a ℓ − − and b ℓ = b ℓ − − and we have Y ℓ = w ℓ + (cid:0) φ ( a ℓ − − , b ℓ − − − φ ( a ℓ − , b ℓ − ) (cid:1) = 1 + (cid:18) a ℓ − − (cid:19) · b ℓ − − − (cid:18) a ℓ − (cid:19) · b ℓ − −
1= 1 + (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≤ a ℓ − − b ℓ − − − a ℓ − − b ℓ − −
1= 1 as required. Completing the analysis for w ℓ = 1 , we demonstrate that Y ℓ ≥ . Y ℓ = 1 + (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≥ − a ℓ − b ℓ − ≥ − kn The last inequality follows from an application of Lemma 13. Suppose now that w ℓ = 0 . Since ( X ℓ ) Nℓ =1 is a Doob martingale, E [ Y ℓ ] = 0 for all ℓ . This implies that Y ℓ < < since in fact Y ℓ > whenever w ℓ = 1 .All that remains to demonstrate is that Y ℓ > − . Observe that w ℓ = 0 implies one of two cases.1. z ℓ ∈ [ k ] however z ℓ < ℓ . Then a ℓ = a ℓ − and b ℓ = b ℓ − implying Y ℓ = 0 .2. z ℓ / ∈ [ k ] however z ℓ > ℓ . Then a ℓ = a ℓ − − and b ℓ = b ℓ − − . We then compute Y ℓ as Y ℓ = w ℓ + (cid:0) φ ( a ℓ − − , b ℓ − − − φ ( a ℓ − , b ℓ − ) (cid:1) (cid:18) a ℓ − − (cid:19) · b ℓ − − − (cid:18) a ℓ − (cid:19) · b ℓ − − (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≥ − a ℓ − b ℓ − ≥ − kn where the last inequality follows from Lemma 13.In both cases, Y ℓ > − since k ≤ n , thus completing the proof.Lemma 14 precisely computes how X ℓ behaves as ℓ increases. If it is revealed that m ℓ matches Z ℓ to i ℓ < Z ℓ (thus within S ), then X ℓ increases by some amount in the interval [1 − kn , . Otherwise X ℓ decreasesby an amount in [ − kn , . Using this enables us to bound the quadratic characteristic, and understand howthe variance of e ( S ) accumulates as subsequent Z ℓ are conditioned on. Lemma 15.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , N = d · ( k − . For ( X ℓ ) Nℓ =0 , we have h X i N ≤ k ( k − dn − k with probability 1.Proof. It is sufficient to demonstrate h X i ℓ − h X i ℓ − ≤ kn − k for all ℓ ∈ [ N ] as we would have h X i N = N X ℓ =2 (cid:0) h X i ℓ − h X i ℓ − (cid:1) ≤ N · kn − k ≤ k ( k − dn − k Assume without loss of generality that d = 1 and fix ℓ along with a valid realization Z = z , . . . , Z ℓ − = z ℓ − . One can calculate the following fact h X i ℓ − h X i ℓ − = Var[ { Z ℓ ∈ [ k ] and Z ℓ > ℓ } = 1] Denote the indicator random variable W ℓ = { Z ℓ ∈ [ k ] and Z ℓ > ℓ } . To bound the variance of theindicator, we seek to determine Pr[ W ℓ = 1] with randomness taken over choice of Z ℓ . Recall that Y ℓ = W ℓ + (cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1) Note Y ℓ is a random quantity since W ℓ , a ℓ = a ℓ ( z , . . . , z ℓ − , Z ℓ ) , and b ℓ = b ℓ ( z , . . . , z ℓ − , Z ℓ ) eachdepend on a sample Z ℓ . It remains however that E [ Y ℓ ] = 0 implying (cid:2) W ℓ = 1 (cid:3) + E (cid:2)(cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1)(cid:3) and hence Pr (cid:2) W ℓ = 1 (cid:3) = E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) (cid:3) Let us condition the expectation as follows. Pr (cid:2) W ℓ = 1 (cid:3) = Pr[ W ℓ = 0] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) + Pr[ W ℓ = 1] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) ≤ E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) + Pr[ W ℓ = 1] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) Implying Pr (cid:2) W ℓ = 1 (cid:3) ≤ E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) − E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) Y ℓ ≥ − kn if W ℓ = 1 , while Y ℓ ≥ − kn if W ℓ = 0 . This means E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) ≤ kn E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) ≥ kn and thus we have Pr (cid:2) W ℓ = 1 (cid:3) ≤ kn − k Finally, as W ℓ is an indicator random variable, its variance is at most that given by a Bernoulli randomvariable with success probability kn − k . We conclude with h X i ℓ − h X i ℓ − = Var[ W ℓ ] ≤ Pr[ W ℓ = 1] ≤ kn − k as required. We now determine how ( X ℓ ) concentrates. In [FGL12], the following Azuma-like inequality is proven formartingales. Theorem 16 (Remark 2.1 combined with equations (11) and (13) of [FGL12]) . Let ( X ℓ ) Nℓ =0 be a martingalewith martingale differences ( Y ℓ ) satisfying Y ℓ ≤ for all ≤ ℓ ≤ N . For every ≤ x ≤ N and ν ≥ , wehave Pr h | X N − X | ≥ x and h X i N ≤ ν i ≤ · (cid:18) ν x + ν (cid:19) x + ν e x The concentration inequalities of [FGL12] are one-sided inequalities as they are stated for supermartin-gales. We use the double-sided version, incurring an additional factor of 2 after taking a union bound withthe negative of ( X ℓ ) . We start with a generic application of Theorem 16 to fit our setting. Lemma 17.
For H a random regular graph drawn from G reg n,d , S ⊆ V such that | S | = k < n , and δ > , wehave the following. Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:26) − E [ e ( S )] · (cid:20) ( δ + C ) · ln (cid:18) δC + 1 (cid:19) − δ (cid:21)(cid:27) where C = n − n − k Proof.
Let x and ν be given by the following. x = δ · E [ e ( S )] = δ · (cid:18) k (cid:19) dn − ν = k ( k − dn − k = (cid:18) kn (cid:19) dn − · n − n − k By Lemma 15, we have that h X i N ≤ ν with probability one. Hence Pr h | X − X N | ≥ x and h X i n i = Pr h | X − X N | ≥ x i = Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i Applying Theorem 16 for the choice of x, ν above then concludes with the required bound.15s mentioned previously, the purpose of choosing to study edges contained entirely in a set S is becausethe number of edges contained entirely within S can be written as a sum of fewer indicator random variablesthan the number of edges crossing the cut ( S, V − S ) . The difference between (cid:0) k (cid:1) and k ( n − k ) is notnegligible (in particular for k small) and we take advantage by further splitting our analysis of small cutsdepending on the size of k .A critical point is that one can apply tighter approximations of the exponentiated term in Lemma 17depending on the size of k . When k ≥ O ( n/ √ d ) , applying Lemma 30 yields tighter concentration, while itis better to approximate via Lemma 31 when k ≤ O ( n/ √ d ) . We further remark that though we study thenumber of edges contained entirely in S , justifying that H cut sparsifies G still requires us to compute thedeviation of the number of edges crossing ( S, V − S ) . Consequently, a nk − term will appear in our choiceof δ due to scaling between edges contained within S and crossing ( S, V − S ) . Let us now summarize theconcentration bounds we use in each case via the following lemma. Lemma 18.
There exists a sufficiently large choice of n ≥ and d ≥ constant such that given a randomdraw H ∼ G reg n,d and any S ⊂ V such that | S | = k where ≤ k ≤ n , the following statements hold1. If δC < , then Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:18) − · k dn · δ (cid:19) (6)
2. If δC ≥ , then Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:18) − · k dn · δ ln δ (cid:19) (7) where C = n − n − k Proof.
When δC < , we can apply Lemma 30 to approximate ( δ + C ) · ln (cid:0) δC + 1 (cid:1) in Lemma 17 as follows Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:26) − E [ e ( S )] · (cid:20) ( δ + C ) · ln (cid:18) δC + 1 (cid:19) − δ (cid:21)(cid:27) ≤ (cid:26) − E [ e ( S )] · (cid:20) δ + δ C − δ (cid:21)(cid:27) = 2 exp (cid:26) − E [ e ( S )] · δ C (cid:27) Expanding C and the expectation, we derive exp (cid:26) − E [ e ( S )] · δ C (cid:27) = exp (cid:26) − (cid:18) k (cid:19) · dn − · δ · n − k n − (cid:27) = exp (cid:26) − k dn · δ · · (cid:18) − k (cid:19) · (cid:18) n − (cid:19) · (cid:18) − kn (cid:19)(cid:27) Noticing that with large enough n , and as k ≤ n , we have that exp (cid:26) − k dn · δ · · (cid:18) − k (cid:19) · (cid:18) n − (cid:19) · (cid:18) − kn (cid:19)(cid:27) ≤ exp (cid:26) − k dn · δ · . · · (cid:27) ≤ exp (cid:18) − · k dn · δ (cid:19)
16s required. If δC ≥ , then we can apply Lemma 31 to approximate ( δ + C ) · ln (cid:0) δC + 1 (cid:1) as follows Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − E [ e ( S )]2 C · δ ln δ (cid:19) Expanding C and the expectation, we derive exp (cid:18) − E [ e ( S )]2 C · δ ln δ (cid:19) = exp (cid:26) − (cid:18) k (cid:19) · dn − · n − k n − · δ ln δ · (cid:27) = exp (cid:26) − k dn · δ ln δ · (cid:18) k − k (cid:19) · (cid:18) nn − (cid:19) · (cid:18) − kn (cid:19) · (cid:27) Since ≤ k ≤ n , and for large enough n , we have that exp (cid:26) − k dn · δ ln δ · (cid:18) k − k (cid:19) · (cid:18) nn − (cid:19) · (cid:18) − kn (cid:19) · (cid:27) ≤ exp (cid:26) − k dn · δ ln δ · · · (cid:27) = exp (cid:18) − · k dn · δ ln δ (cid:19) as required.We now compute the probability that the number of edges contained within S deviates far from itsexpectation. We remark that our choice of C and δ = (cid:0) nk − (cid:1) · . √ d imply that δC grows approximately as nk √ d . In the subsequent proof of Lemma 19, the case of δC < is analogous to when k ≥ Ω( n/ √ d ) while δC ≤ corresponds to k ≤ O ( n/ √ d ) . Lemma 19.
There exists a sufficiently large choice of n ≥ and d ≥ constant such that for H ∼ G reg n,d andany S ⊂ V such that | S | = k where ≤ k ≤ n , we have Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) nk (cid:19) − . where δ = (cid:0) nk − (cid:1) · . √ d Proof.
With C = n − n − k , suppose δC < , expanding δ in the bound given by equation 6, we have Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − · k dn · δ (cid:19) = 2 exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . d (cid:27) We now demonstrate how to upper bound this quantity by (cid:0) nk (cid:1) − . . It is equivalent to demonstrate (cid:18) nk (cid:19) . ≤ exp (cid:26) · k dn · (cid:18) nk − (cid:19) · . d (cid:27) Taking the natural logarithm of both sides, and performing a change of variables α = kn , we have that . · ln (cid:18)(cid:18) nαn (cid:19)(cid:19) ≤ · . · n · α (cid:18) α − (cid:19) As ln (cid:0)(cid:0) nαn (cid:1)(cid:1) ≤ n · H ( α ) where H denotes the binary entropy function, it is sufficient to demonstrate H ( α ) ≤ . · (1 − α ) α ≤ . Now suppose δC ≥ . Expanding δ in equation 7, we have the following Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − · k dn · δ ln δ (cid:19) = 2 exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . √ d · ln (cid:18)(cid:18) nk − (cid:19) · . √ d (cid:19)(cid:27) Notice that since k ≤ n , we have that − kn ≥ meaning the expression can be upper bounded by exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . √ d · ln (cid:18)(cid:18) nk − (cid:19) · . √ d (cid:19)(cid:27) ≤ exp (cid:26) − · · . · k √ d · ln (cid:18) . nk √ d · (cid:19)(cid:27) We next claim the following intermediate upper bound. exp (cid:26) − · · . · k √ d · ln (cid:18) . nk √ d · (cid:19)(cid:27) ≤ (cid:18) n k √ d · e . · · (cid:19) − . It is again equivalent to demonstrate the following . · ln (cid:18)(cid:18) n k √ d · e . · · (cid:19)(cid:19) ≤ · · . · k √ d · ln (cid:18) . nk √ d · (cid:19) However, because ln (cid:0) nk (cid:1) ≤ k ln (cid:0) enk (cid:1) , it suffices to demonstrate . · k √ d . · · e · ln (cid:18) · . nk √ d · (cid:19) ≤ · · . · k √ d · ln (cid:18) . nk √ d · (cid:19) which is equivalent to ln (cid:18) · . nk √ d · (cid:19) ≤ · (cid:18) (cid:19) · . · . · e · ln (cid:18) . nk √ d · (cid:19) Because lower bounds the constant on the right hand side, it is enough to show ln (cid:18) · . nk √ d · (cid:19) ≤ · ln (cid:18) . nk √ d · (cid:19) As . nk √ d · ≥ δ ≥ C ≥ , we can choose a large enough n such that the above holds. Finally, we show (cid:18) n k √ d · e . · · (cid:19) − . ≤ (cid:18) nk (cid:19) − . by choosing a d large enough since δC ≥ . A choice of d ≥ (cid:0) . · · e (cid:1) suffices. Finishing the analysis of the small cuts regime, we now show that the number of edges crossing ( S, V − S ) deviates no more from its expectation than by a . √ d factor with high probability.18 roof of Theorem 11. Denote | S | = k . If k = 1 , then cut H ( S ) = d for any random d regular graph H thus cut H ( S ) − E [cut H ( S )] = 0 . Now consider any ≤ k ≤ n . Because H is d regular, we have cut H ( S ) = kd − · e H ( S ) Thus the event {| cut H ( S ) − E [cut H ( S )] | ≥ . √ d · E [cut H ( S )] } occurs if and only if (cid:12)(cid:12) cut H ( S ) − E [cut H ( S )] (cid:12)(cid:12) ≥ . √ d · E [cut H ( S )] (cid:12)(cid:12) kd − · e H ( S ) − E [ kd − · e H ( S )] (cid:12)(cid:12) ≥ . √ d · E [ kd − · e H ( S )] (cid:12)(cid:12) E [ e H ( S )] − e H ( S ) (cid:12)(cid:12) ≥ . √ d · E [ · e H ( S )] · (cid:18) kd E [ e H ( S )] − (cid:19)(cid:12)(cid:12) e H ( S ) − E [ e H ( S )] (cid:12)(cid:12) ≥ . √ d · E [ · e H ( S )] · (cid:18) n − k − − (cid:19) Now, n − k − − nk − (cid:0) k − (cid:1) ≥ nk − since k ≥ . 
The probability of the above occurring is at most Pr (cid:20)(cid:12)(cid:12) cut H ( S ) − E [cut( S )] (cid:12)(cid:12) ≥ . √ d · E [ · cut H ( S )] (cid:21) ≤ Pr (cid:20)(cid:12)(cid:12) e H ( S ) − E [ e H ( S )] (cid:12)(cid:12) ≥ . √ d · (cid:18) nk − (cid:19) · E [ · e H ( S )] (cid:21) Applying Lemma 19 using δ = . √ d · (cid:0) nk − (cid:1) implies that the right hand side is at most o n,d (cid:0)(cid:0) nk (cid:1) − (cid:1) .Performing a union bound over at most (cid:0) nk (cid:1) cuts of size k then completes the proof. In the following, if H = ( V, E H , w H ) is an undirected weighted graph and v ∈ V is a vertex, we call the combinatorial degree of v the number of edges incident on v , and we call the weighted degree of v the sumof the weights of the edges incident on v . A random walk in a graph is a process in which we move amongthe vertices of a graph and, at every step, we move from the current node u to a neighbor v of the u withprobability proportional to the weight of the edge ( u, v ) . A non-backtracking random walk is like the aboveprocess except that if at a certain step we move from u to v , then at the subsequent step it is not allowed togo from v to u . For example, a non-backtracking walk in a cycle is a process that, after the first step, movesdeterministically around the cycle, either always clockwise or always counterclockwise.In this section we prove the following result. Theorem 20 (Lower Bound for Spectral Sparsification) . Let H = ( V, E, w ) be a weighted graph on n verticesand with dn/ edges, so that H has average combinatorial degree d . Let ¯ K n be a clique on V with every edgeweighted /n . Suppose that H is an ǫ spectral sparsifier of ¯ K n and make the following definition: • Let B be a bound such that for every vertex v ∈ V , at most B vertices of H are reachable from v viapaths of combinatorial length at most g ; • Call V ′ the set of vertices r such that the subgraph induced by the vertices at combinatorial distance atmost g from r contains no cycle. Call V ′′ the set of vertices r such that the same property holds forthe vertices at combinatorial distance at most g . Call n − F the cardinality of V ′′ .Then ǫ ≥ √ d − O (cid:18) g √ d + gd + g ( B + F ) n (cid:19) For example, if the girth of the graph is at least g + 1 and the graph has maximum degree ∆ , then B ≤ ∆ g +1 and F = 0 . If g = d / , and B and F are of size o ( n ) , then the bound on ǫ is ǫ ≥ √ d − O ( d − / ) − o (1) .19 .1 The Test Vectors The condition that H is an ǫ spectral sparsifier of L ¯ K n can be written as ∀ x ∈ R V (1 − ǫ ) x T L ¯ K n x ≤ x T L H ≤ (1 + ǫ ) x T L ¯ K n x which can be written as ∀ x ∈ R V (1 − ǫ ) ( x · x T ) • L ¯ K n ≤ ( x · x T ) • L H ≤ (1 + ǫ ) ( x · x T ) • L ¯ K n where A • B = P i,j A i,j B i,j is the Frobenius inner product between real-valued square matrices. The abovecondition is equivalent to ∀ X (cid:23) (1 − ǫ ) X • L ¯ K n ≤ X • L H ≤ (1 + ǫ ) X • L ¯ K n because all positive semidefinite matrices X (cid:23) are convex combinations of rank-1 symmetric matrices ofthe form xx T . We will then be looking for positive semidefinite matrices X for which X • L ¯ K n is noticeablydifferent from X • L H . This approach is equivalent to the approach of considering probability distributionsover test vectors x which is taken in [ST18].As in [ST18], we will make a number of assumptions on the structure of H . 
Such assumptions can bemade without loss of generality, because if they fail then there are simple proofs (given in [ST18] of theconclusion of Theorem 20). The assumptions are the following:1. Every vertex of H has combinatorial degree at least d/ ;2. Every vertex of H has weighted degree between − / √ d and √ d ;3. Every edge ( u, v ) of H has weight at most / √ d .Under all the above assumptions, we will construct two PSD matrices X and Y such that ǫ − ǫ ≥ X • L H X • L ¯ K n · Y • L ¯ K n Y • L H ≥ √ d + O (cid:18) g √ d + gd + g ( B + F ) n (cid:19) (8)which will imply the conclusion of the Theorem.For every two vertices r and v , let Pr[ r nb → ℓ v ] be the probability that a non-backtracking ℓ -step randomwalk in H (performed by following edges with probability proportional to their weight) reaches v in the laststep. For every r , define the vectors f r , h r as f r ( v ) = g X ℓ =0 ( − ℓ q Pr[ r nb → ℓ v ] h r ( v ) = g X ℓ =0 q Pr[ r nb → ℓ v ] Our two PSD matrices are X = X r ∈ V f r f Tr , Y = X r ∈ V h r h Tr To understand the intuition of the above definition, if r ∈ V ′ then, for every v , there can be at mostone way to reach v from r with a non-backtracking walk of length ≤ g , because otherwise we would see acycle in the subgraph induced by the nodes at distance ≤ g from v contradicting the definition of V ′ . Thismeans that f r ( v ) = h r ( v ) = 0 if v is at distance more than g from r . If v is at distance ℓ ≤ g from r then f r ( v ) = ( − ℓ q Pr[ r nb → ℓ v ] and h r ( v ) = q Pr[ r nb → ℓ v ] , so that, in particular, f r ( v ) = h r ( v ) .We collect some properties that will be useful 20 act 21. If r ∈ V ′ , then || f r || = || h r || = g + 1 .Proof. If r ∈ V ′ , then for every v , we have f r ( v ) = h r ( v ) = g X ℓ =0 Pr[ r nb → ℓ v ] and | f r || = || h r || = X v g X ℓ =0 Pr[ r nb → ℓ v ] = g + 1 because, for every fixed ℓ , we have X v Pr[ r nb → ℓ v ] = 1 Fact 22. If r V ′ , then ≤ || f r || ≤ ( g + 1) ≤ || h r || ≤ ( g + 1) Proof.
We have || h r || = X v g X ℓ =0 h r ( v ) ! ≤ X v ( g + 1) · g X ℓ =0 h r ( v ) = ( g + 1) where we used Cauchy-Schwarz. The same calculation applies to f r . Fact 23. − Fn ≤ I • Xn · ( g + 1) ≤ gFn and − Fn ≤ I • Yn · ( g + 1) ≤ gFn Proof.
We have
$$I\bullet Y = \sum_r \|h_r\|^2 \ge \sum_{r\in V'} \|h_r\|^2 \ge (n-F)\cdot(g+1)$$
$$I\bullet Y = \sum_r \|h_r\|^2 = \sum_{r\in V'} \|h_r\|^2 + \sum_{r\notin V'} \|h_r\|^2 \le (n-F)\cdot(g+1) + F\cdot(g+1)^2$$
and the same calculation applies to the $f_r$. □

Fact 24. $0 \le Y\bullet J \le (g+1)^2 Bn$.

Proof.
$$Y\bullet J = \sum_r \langle h_r, \mathbf 1\rangle^2 = \sum_r \left(\sum_v \sum_{\ell=0}^{g} \sqrt{\Pr[r \xrightarrow[nb]{\ell} v]}\right)^2 \le (g+1)B \sum_r \sum_v \sum_{\ell=0}^{g} \Pr[r \xrightarrow[nb]{\ell} v] = (g+1)^2 Bn$$
where we used the Cauchy–Schwarz inequality and the fact that, for every $r$, there are at most $B$ vertices $v$ that are reachable with non-zero probability from $r$ using walks of length $\le g$, so that the inner double sum has at most $(g+1)B$ non-zero terms. □

4.2 Outline of the Proof

We will show that:
1. both $X\bullet L_{\bar K_n}$ and $Y\bullet L_{\bar K_n}$ are $(1\pm o(1))\cdot n(g+1)$;
2. $Y\bullet D_H \le (1+o(1))\cdot n(g+1)$;
3. $X\bullet L_H \,/\, Y\bullet L_H \ge 1 + (1-o(1))\cdot (Y-X)\bullet A_H \,/\, Y\bullet D_H$;
4. $(Y-X)\bullet A_H \ge (1-o(1))\cdot 4ng/\sqrt d$.

So that:
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\ge\; 1 + \frac{4 - o(1)}{\sqrt d}$$
as in (8), where the "$o(1)$" notation refers to terms that are at most an absolute constant times $1/d^{1/4}$, provided that $g = d^{1/4}$ and $B$ and $F$ are at most $n/d$.

The claims 1, 2 and 3 above are proved using simple properties of the functions $f_r$ and $h_r$ mentioned above, and the crux of the argument is the fourth claim, which will follow by showing that
$$(Y-X)\bullet A_H \;\ge\; (1-o(1))\cdot 4g \sum_{a,b} w_{a,b}^{3/2}$$
where $w_{a,b}$ is the weight of the edge $(a,b)$ in the graph $H$ and the sum ranges over ordered pairs. We know that $\sum_{a,b} w_{a,b} = (1\pm o(1))\, n$ and that there are $dn$ pairs $a,b$ such that the edge $(a,b)$ has non-zero weight. The convexity of the function $x\mapsto x^{3/2}$ can then be used to deduce that the expression is minimized when all the non-zero weights are the same:
$$\sum_{a,b} w_{a,b}^{3/2} \;\ge\; dn\cdot\left(\frac{\sum_{a,b} w_{a,b}}{dn}\right)^{3/2}$$
from which the fourth claim above will follow. We will now prove the four claims that we made in the previous section.
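As a quick numeric sanity check of this convexity step (a throwaway script of ours, with arbitrary weights, not part of the paper's argument):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 1000
m = d * n                               # number of (ordered) weighted pairs
w = rng.exponential(1.0 / d, size=m)    # arbitrary non-negative weights
lhs = np.sum(w ** 1.5)
rhs = m * (np.sum(w) / m) ** 1.5        # value when all the weights are equal
assert lhs >= rhs - 1e-12               # Jensen: x -> x^{3/2} is convex
print(lhs, rhs)
```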
Lemma 25.
$$\frac{Y\bullet L_{\bar K_n}}{X\bullet L_{\bar K_n}} \;\ge\; 1 - O\left(\frac{g(F+B)}{n}\right)$$

Proof.
Recall that $L_{\bar K_n} = I - \frac 1n J$, where $J = \mathbf 1\cdot\mathbf 1^T$ is the matrix that is one everywhere. This means that
$$Y\bullet L_{\bar K_n} = Y\bullet I - \frac 1n\, Y\bullet J \;\ge\; n(g+1)\cdot\left(1 - \frac Fn - \frac{(g+1)B}{n}\right)$$
$$X\bullet L_{\bar K_n} = X\bullet I - \frac 1n\, X\bullet J \;\le\; X\bullet I \;\le\; n(g+1)\cdot\left(1 + \frac{gF}{n}\right)$$
where we used Facts 23 and 24, together with $X\bullet J = \sum_r \langle f_r,\mathbf 1\rangle^2 \ge 0$. □

Lemma 26.
$$\left(1 - \frac{2}{\sqrt d}\right)\cdot\left(1 - \frac Fn\right) \;\le\; \frac{Y\bullet D_H}{n(g+1)} \;\le\; \left(1 + \frac{2}{\sqrt d}\right)\cdot\left(1 + \frac{gF}{n}\right)$$

Proof. Follows from the previous bounds on $Y\bullet I$ and from the fact that the degree condition on $H$ can be expressed as
$$\left(1 - \frac{2}{\sqrt d}\right)\cdot I \;\preceq\; D_H \;\preceq\; \left(1 + \frac{2}{\sqrt d}\right)\cdot I \qquad\Box$$

Lemma 27.
$$\frac{X\bullet L_H}{Y\bullet L_H} \;\ge\; 1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - O\left(\frac{gF}{n}\right)$$

Proof.
$$\frac{X\bullet L_H}{Y\bullet L_H} - 1 = \frac{(X-Y)\bullet L_H}{Y\bullet L_H} = \frac{(Y-X)\bullet A_H - (Y-X)\bullet D_H}{Y\bullet D_H - Y\bullet A_H} \;\ge\; \frac{(Y-X)\bullet A_H - (Y-X)\bullet D_H}{Y\bullet D_H}$$
where the last step uses $Y\bullet A_H = \sum_r h_r^T A_H h_r \ge 0$ (the $h_r$ and $A_H$ are entrywise non-negative) and the fact that the numerator is non-negative in our setting, by Lemma 28 below and the bound on $(Y-X)\bullet D_H$ that follows. So
$$\frac{X\bullet L_H}{Y\bullet L_H} \;\ge\; 1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - \frac{(Y-X)\bullet D_H}{Y\bullet D_H}$$
We have
$$(Y-X)\bullet D_H = \sum_r \sum_v w(v)\cdot\big(h_r(v)^2 - f_r(v)^2\big) = \sum_{r\notin V'} \sum_v w(v)\cdot\big(h_r(v)^2 - f_r(v)^2\big) \le \sum_{r\notin V'} \sum_v w(v)\, h_r(v)^2 \le \left(1 + \frac{2}{\sqrt d}\right)\cdot F\cdot(g+1)^2$$
where we used the facts, proved above, that $h_r(v)^2 = f_r(v)^2$ when $r\in V'$, that all weighted degrees are at most $1 + 2/\sqrt d$, and that $\|h_r\|^2 \le (g+1)^2$. On the other hand,
$$Y\bullet D_H \;\ge\; \left(1 - \frac{2}{\sqrt d}\right)\cdot Y\bullet I \;\ge\; \left(1 - \frac{2}{\sqrt d}\right)\left(1 - \frac Fn\right) n(g+1)$$
so the ratio of the two quantities is $O(gF/n)$. □

Lemma 28 (Main).
$$(Y-X)\bullet A_H \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\cdot \frac{4gn}{\sqrt d}$$

Proof. Finally we come to the main argument. We have
$$(Y-X)\bullet A_H = \sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) + \sum_{r\notin V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big)$$
where
$$\sum_{r\notin V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; -\sum_{r\notin V'} f_r^T A_H f_r \;\ge\; -\sum_{r\notin V'} \|f_r\|^2\cdot\|A_H\| \;\ge\; -F\,(g+1)^2\left(1 + \frac{2}{\sqrt d}\right)$$
so that it remains to study $\sum_{r\in V'} h_r^T A_H h_r - f_r^T A_H f_r$. We have
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) = \sum_{r\in V'} \sum_{a,b} w_{a,b}\cdot\big(h_r(a)h_r(b) - f_r(a)f_r(b)\big) = 2\sum_{r\in V'} \sum_{a,b} w_{a,b}\, h_r(a)h_r(b)$$
To motivate the last step, we note that $w_{a,b}\,h_r(a)h_r(b)$ is non-zero iff there is some $\ell \le g-1$ such that $a$ is at distance $\ell$ from $r$ and $b$ is at distance $\ell+1$ from $r$ (or vice versa), and so
$$h_r(a)h_r(b) = \sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]}$$
and
$$f_r(a)f_r(b) = (-1)^\ell\sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\cdot(-1)^{\ell+1}\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]} = -\,h_r(a)h_r(b)$$
Let us call $T_r$ the BFS tree rooted at $r$ and of depth $g$, and assume that its edges are directed from parent to child. Then we can rewrite
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) = 4\sum_{r\in V'} \sum_{(a,b)\in T_r} w_{a,b}\, h_r(a) h_r(b) = 4\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]}$$
$$\ge\; 4\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}\,\sqrt{\Pr[r \xrightarrow{\ell} a]}\,\sqrt{\Pr[r \xrightarrow{\ell+1} b]}$$
where $\Pr[u \xrightarrow{t} v]$ denotes the probability that a $t$-step standard random walk (in which edges are followed with probability proportional to their weight) started at $u$ ends at $v$. We have used the fact that, if there is a unique shortest path from $u$ to $v$ and the length of such path is $t$, then $\Pr[u \xrightarrow[nb]{t} v] \ge \Pr[u \xrightarrow{t} v]$.

Another observation is that, in the particular circumstance in which $r\in V'$, $(a,b)\in T_r$, and $a$ has distance $\ell$ from $r$ and $b$ has distance $\ell+1$ from $r$, we have
$$\Pr[r \xrightarrow{\ell+1} b] = \Pr[r \xrightarrow{\ell} a]\cdot\frac{w_{a,b}}{w(a)}$$
and, together with our assumptions on the degrees of the vertices,
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; 4\left(1 - O\left(\frac{1}{\sqrt d}\right)\right) \sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}^{3/2}\,\Pr[r \xrightarrow{\ell} a]$$
where $\mathrm{dist}(r,a)$ is the length (number of edges) of a shortest path from $r$ to $a$. Now let us consider the above inner summation over all pairs $a,b$ such that $a$ is at distance $\ell$ from $r$ and the edge $(a,b)$ exists in $T_r$, meaning that $b$ is farther from $r$ than $a$ is. If $p$ is the predecessor of $a$ in the unique path of length $\ell$ from $r$ to $a$, then we have
$$\sum_{b\ne p} w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}}\right)\right)\sum_b w_{a,b}^{3/2}$$
because $\sum_b w_{a,b}^{3/2} \ge \Omega(1/\sqrt d)$ and $w_{a,p}^{3/2} \le O(1/d^{3/4})$. We can thus conclude that
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; 4\left(1 - O\left(\frac{1}{d^{1/4}}\right)\right) \sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{a:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2}$$
The next observation is that if $a$ is in $V''$ and $r$ is at distance $\le g-1$ from $a$, then $r$ is in $V'$, so we have
$$\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{a:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2} \;\ge\; \sum_{a\in V''} \sum_{\ell=0}^{g-1} \sum_{r:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2}$$
$$\ge\; \left(1 - O\left(\frac{g}{\sqrt d}\right)\right) \sum_{a\in V''} \sum_{\ell=0}^{g-1} \sum_{r:\, \mathrm{dist}(a,r)=\ell} \Pr[a \xrightarrow{\ell} r] \sum_b w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{g}{\sqrt d}\right)\right)\cdot g\cdot \sum_{a\in V''} \sum_b w_{a,b}^{3/2}$$
where we used the reversibility of the random walk (we have $w(r)\Pr[r\xrightarrow{\ell}a] = w(a)\Pr[a\xrightarrow{\ell}r]$, and the weighted degrees are $1\pm 2/\sqrt d$) and the fact that a standard walk started at $a\in V''$ ends at distance exactly $\ell \le g-1$ unless it backtracks at some step, which happens with probability at most $O(g/\sqrt d)$ overall. By the convexity argument mentioned above,
$$\sum_{a,b} w_{a,b}^{3/2} \;\ge\; dn\left(\frac{\sum_{a,b} w_{a,b}}{dn}\right)^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{\sqrt d}\right)\right)\frac{n}{\sqrt d}$$
and, since $\sum_b w_{a,b}^{3/2} \le \big(\sum_b w_{a,b}\big)^{3/2} \le O(1)$ for every $a$,
$$\sum_{a\notin V''} \sum_b w_{a,b}^{3/2} \le O(F) \qquad\text{so that}\qquad \sum_{a\in V''} \sum_b w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{\sqrt d} + \frac{F\sqrt d}{n}\right)\right)\frac{n}{\sqrt d}$$
Putting everything together,
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{F\sqrt d}{n}\right)\right)\frac{4gn}{\sqrt d}$$
and, since the contribution of the vertices $r\notin V'$ is at least $-F(g+1)^2(1+2/\sqrt d) = -O(gF\sqrt d/n)\cdot 4gn/\sqrt d$,
$$(Y-X)\bullet A_H \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\frac{4gn}{\sqrt d} \qquad\Box$$
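To see the quantities of Lemma 28 in action, here is a small numeric illustration, ours and not the paper's. It reuses the hypothetical `nb_walk_probs` and `test_vectors` sketches from above and `networkx` for graph generation; for $g=1$ and uniform edge weights $1/d$ the two printed numbers essentially coincide, while larger $g$ brings in the error terms of the lemma:

```python
import math
import networkx as nx
import numpy as np

n, d, g = 400, 6, 1
G = nx.random_regular_graph(d, n, seed=1)
adj = {u: {v: 1.0 / d for v in G[u]} for u in G}   # weighted degrees are all 1
X, Y = test_vectors(adj, sorted(G), g)             # sketch above
A = np.zeros((n, n))
for u, v in G.edges():
    A[u, v] = A[v, u] = 1.0 / d                    # weighted adjacency matrix A_H
print(np.sum((Y - X) * A),                         # (Y - X) . A_H
      4 * g * n / math.sqrt(d))                    # the lemma's leading term
```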
We can now prove Theorem 20.

Proof of Theorem 20.
Given a graph $H$ that satisfies the assumptions of the theorem and that is an $\epsilon$ spectral sparsifier of $\bar K_n$, define PSD matrices $X$ and $Y$ as in Section 4.1. We have
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\le\; \frac{1+\epsilon}{1-\epsilon} \;\le\; 1 + 2\epsilon + O(\epsilon^2)$$
If $\epsilon > 2/\sqrt d$ there is nothing else left to prove. If $\epsilon \le 2/\sqrt d$, then
$$\epsilon \;\ge\; \frac 12\cdot\left(\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} - 1\right) - O\left(\frac 1d\right)$$
Now, from Lemma 25 and Lemma 27, we have
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\ge\; \left(1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - O\left(\frac{gF}{n}\right)\right)\cdot\left(1 - O\left(\frac{g(F+B)}{n}\right)\right)$$
From Lemma 26 and Lemma 28 we have
$$\frac{(Y-X)\bullet A_H}{Y\bullet D_H} \;\ge\; \frac{\left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\cdot\frac{4gn}{\sqrt d}}{n(g+1)\cdot\left(1 + \frac{2}{\sqrt d}\right)\cdot\left(1 + \frac{gF}{n}\right)} \;\ge\; \frac{4}{\sqrt d}\cdot\left(1 - O\left(\frac 1g + \frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)$$
Putting everything together,
$$\epsilon \;\ge\; \frac{2}{\sqrt d}\cdot\left(1 - O\left(\frac 1g + \frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{g\sqrt d\,(F+B)}{n}\right)\right)$$
which gives the bound in the statement, since $\frac{1}{d^{1/4}} \le \frac 1g + \frac{g}{\sqrt d}$ by the AM-GM inequality. □

We will now show a separation between cut and spectral sparsification of random $\log n$-regular graphs. First, we demonstrate that a random $\log n$-regular graph satisfies the "pseudo-girth" conditions required by Theorem 20.

Theorem 29. If $G$ is a random regular graph drawn from $G^{reg}_{n,\log n}$, and $g$ is a fixed constant, then the following occur:
1. with probability 1, for every vertex $v$ of $G$, the number of vertices of $G$ reachable from $v$ via paths of length at most $g$ is $O((\log n)^g)$;
2. if we call $V''$ the set of vertices $v$ such that there is no cycle in the subgraph of $G$ induced by the vertices at distance at most $g$ from $v$, then, with probability $1 - o_n(1)$ over the choice of $G$, $|V''| \ge n - O((\log n)^{2g+1})$.

Proof.
The first property immediately follows from the fact that the combinatorial degree is at most $\log n$, so at most $\sum_{i\le g}(\log n)^i = O((\log n)^g)$ vertices are within distance $g$ of any vertex.

For the second part, fix a vertex $v$ and consider the probability, over the choice of $G$, that $v\notin V''$. By the principle of deferred decisions, we first generate the $\log n$ neighbors of $v$, then the additional neighbors of those neighbors, and so on, up to distance $g$. Every time we make a decision about how to match a particular vertex $x$ in one of the $\log n$ matchings, the probability of hitting a previously seen vertex is at most $O((\log n)^g/n)$; since we make at most $O((\log n)^g)$ such decisions, the probability that we create a cycle is at most $O((\log n)^{2g}/n)$. Thus the expected number of vertices outside $V''$ is $O((\log n)^{2g})$, and the conclusion of the theorem follows by applying Markov's inequality. □
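The second property can also be observed empirically. The following throwaway script is ours (it uses `networkx`, and the parameters are arbitrary); it measures the fraction of vertices with acyclic radius-$g$ balls in a sampled roughly $\log n$-regular graph:

```python
import math
import networkx as nx

def acyclic_ball_fraction(G, g):
    """Fraction of vertices whose radius-g ball induces no cycle."""
    good = 0
    for v in G:
        ball = nx.ego_graph(G, v, radius=g)
        # the ball is connected, so it is acyclic iff it is a tree
        good += ball.number_of_edges() == ball.number_of_nodes() - 1
    return good / G.number_of_nodes()

n = 20000
d = round(math.log(n))
d += (n * d) % 2                  # n*d must be even for a d-regular graph to exist
G = nx.random_regular_graph(d, n, seed=0)
print(acyclic_ball_fraction(G, g=2))   # tends to 1 as n grows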
We now prove the separation between cut and spectral sparsification. We restate Theorem 4 from the introduction.

Theorem 4. Let $G$ be a random regular graph drawn from $G^{reg}_{n,\log n}$. Then, with probability $1 - o_n(1)$ over the choice of $G$, the following happens for every $d$:
1. there is a weighted subgraph $H$ of $G$ with $dn/2$ edges such that $H$ is an $\epsilon$ cut sparsifier of $G$ with $\epsilon \le (q_\pi + o_{n,d}(1))/\sqrt d$;
2. for every weighted subgraph $H$ of $G$ with $dn/2$ edges, if $H$ is an $\epsilon$ spectral sparsifier of $G$ then $\epsilon \ge (2 - o_{n,d}(1))/\sqrt d$.

Proof. Let us fix $d$. If $G$ is a random $\log n$-regular graph drawn from $G^{reg}_{n,\log n}$ then, for every fixed $d$, there is a $1 - o_n(1)$ probability that there are $o_n(n)$ nodes that see a cycle within distance $2d^{1/4}$ and that there are $o_n(n)$ nodes in the ball of radius $d^{1/4}$ around each node. Note that the above properties also hold for any edge-subgraph $H$ of $G$.

From Theorem 20 we have that, with $1 - o_n(1)$ probability over the choice of $G$, if a weighted edge-induced subgraph $H$ of $G$ of average degree $d$ is an $\epsilon$-spectral sparsifier of the clique, then $\epsilon \ge (2 - O(d^{-1/4}) - o_n(1))/\sqrt d$. From [Bor19] we have that, with $1 - o_n(1)$ probability, the graph $G$ is an $O(1/\sqrt{\log n})$ spectral sparsifier (and also cut sparsifier) of the clique, so if a weighted edge-induced subgraph $H$ of $G$ of average degree $d$ is an $\epsilon$-spectral sparsifier of $G$, then it is an $(\epsilon + o_n(1))$-spectral sparsifier of the clique, and again $\epsilon \ge (2 - O(d^{-1/4}) - o_n(1))/\sqrt d$.

Since we constructed $G$ as the union of $\log n$ random matchings, $G$ contains, for large enough $n$, a random $d$-regular graph from $G^{reg}_{n,d}$ as an edge-induced subgraph (for example, consider the first $d$ of the $\log n$ matchings used to construct $G$). We can deduce from Theorem 2 that, with $1 - o_n(1)$ probability, $G$ contains as a weighted edge-induced subgraph a graph $H$ that has average degree $d$ and is a $(q_\pi + o_{n,d}(1))/\sqrt d$ cut sparsifier of the clique. We conclude that, with $1 - o_n(1)$ probability over the choice of $G$, there is a weighted edge-induced subgraph $H$ of $G$ such that $H$ has average degree $d$ and is a $(q_\pi + o_{n,d}(1))/\sqrt d$ cut sparsifier of $G$. □

Acknowledgements

The work of JS and LT on this project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 834861). Most of this work was done while AC was a visiting student at Bocconi University. The authors would like to thank Andrea Montanari for pointing us to [JS17], and the Physics and Machine Learning group at Bocconi, particularly Enrico Malatesta, for patiently explaining the Parisi equations and the replica method to us.

References
[ACK+16] Alexandr Andoni, Jiecao Chen, Robert Krauthgamer, Bo Qin, David P. Woodruff, and Qin Zhang. On sketching quadratic forms. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 311–319, 2016.

[AZLO15] Zeyuan Allen Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond matrix multiplicative updates. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pages 237–245, 2015.

[BK96] András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22–24, 1996, pages 47–55, 1996.

[Bor19] Charles Bordenave. A new proof of Friedman's second eigenvalue theorem and its extension to random lifts. Annales scientifiques de l'École normale supérieure, 2019.

[BSS09] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. In Proceedings of the 41st ACM Symposium on Theory of Computing, pages 255–262, 2009.

[CR02] A. Crisanti and T. Rizzo. Analysis of the ∞-replica symmetry breaking solution of the Sherrington-Kirkpatrick model. Physical Review E, 65:046137, April 2002.

[DMS+17] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, 2017.

[Eva10] Lawrence C. Evans. Partial Differential Equations. American Mathematical Society, Providence, R.I., 2010.

[FGL12] Xiequan Fan, Ion Grama, and Quansheng Liu. Hoeffding's inequality for supermartingales. Stochastic Processes and their Applications, 122(10):3545–3559, 2012.

[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in Mathematical Physics, 233(1):1–12, 2003.

[JS17] Aukosh Jagannath and Subhabrata Sen. On the unbalanced cut problem and the generalized Sherrington-Kirkpatrick model. arXiv preprint arXiv:1707.09042, 2017.

[LS17] Yin Tat Lee and He Sun. An SDP-based algorithm for linear-sized spectral sparsification. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 678–687, 2017.

[Mal19] Enrico M. Malatesta. Random Combinatorial Optimization Problems: Mean Field and Finite-Dimensional Results. PhD thesis, Università degli Studi di Milano, 2019.

[Neu13] Edward Neuman. Inequalities and bounds for the incomplete gamma function. Results in Mathematics, 63(3-4):1209–1214, 2013.

[P+45] George Pólya. Remarks on computing the probability integral in one and two dimensions. In Proceedings of the 1st Berkeley Symposium on Mathematical Statistics and Probability, pages 63–78, 1945.

[Pan14] Dmitry Panchenko. The Parisi formula for mixed p-spin models. Annals of Probability, 42(3):946–958, 2014.

[Par80] G. Parisi. A sequence of approximated solutions to the S-K model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115–L121, April 1980.

[Sen18] Subhabrata Sen. Optimization on sparse random hypergraphs and spin glasses. Random Structures & Algorithms, 53(3):504–536, 2018.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[ST11] Daniel Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.

[ST18] Nikhil Srivastava and Luca Trevisan. An Alon-Boppana type bound for weighted graphs and lowerbounds for spectral sparsification. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 1306–1315, 2018.

[Tal06] Michel Talagrand. The Parisi formula. Annals of Mathematics, pages 221–263, 2006.
Analytic Inequalities
In this section, we prove two analytic inequalities required by our martingale concentration analysis. The first lemma is used to approximate the exponent in the concentration bound provided by Lemma 17 in the case where $\delta/C$ is small.

Lemma 30.
For any $\delta, C \ge 0$ with $\delta \le C$, we have that
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) \;\ge\; \delta + \frac{\delta^2}{3C}$$

Proof.
Proceed by expanding $\delta\ln\left(\frac\delta C + 1\right)$ via its Taylor series, which converges since $\delta/C \le 1$:
$$\delta\ln\left(\frac\delta C + 1\right) = \delta\cdot\left(\sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^t\, t}\right) = \sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^{t+1}}{C^t\, t}$$
Similarly, for $C\ln\left(\frac\delta C + 1\right)$ we have
$$C\cdot\ln\left(\frac\delta C + 1\right) = C\cdot\left(\sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^t\, t}\right) = \sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^{t-1}\, t} = \delta + \sum_{t=1}^\infty \frac{(-1)^{t}\,\delta^{t+1}}{C^t\,(t+1)}$$
Combining the two expansions, we derive
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) = \delta + \sum_{t=1}^\infty (-1)^{t+1}\left(\frac{\delta^{t+1}}{C^t}\right)\left(\frac 1t - \frac 1{t+1}\right) \;\ge\; \delta + \frac{\delta^2}{3C}$$
where the last inequality holds because the series is alternating with terms of decreasing magnitude, so it is at least its first term minus its second, namely
$$\frac{\delta^2}{2C} - \frac{\delta^3}{6C^2} \;\ge\; \frac{\delta^2}{2C}\left(1 - \frac{\delta}{3C}\right) \;\ge\; \frac{\delta^2}{3C} \qquad\Box$$

The second lemma is used to approximate the exponent in Lemma 17 when $\delta/C$ is large.

Lemma 31.
For any $\delta \ge C \ge 4$, we have
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta \;\ge\; \frac{\delta\ln\delta}{C}$$

Proof.
Denote $f(C,\delta) = \left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta - \frac{\delta\ln\delta}{C}$. It suffices to demonstrate that $f(C,\delta) \ge 0$ for all $\delta \ge C \ge 4$. To see this, first note that $f(C,\delta) \ge 0$ for all $\delta = C \ge 4$, as in that case we have
$$f(C,\delta) = \left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta - \frac{\delta\ln\delta}{C} = 2\delta\ln 2 - \delta - \ln\delta \ge 0$$
which is true for any $\delta \ge 4$. We next compute $\frac{\partial f}{\partial\delta}$ as follows:
$$\frac{\partial f}{\partial\delta} = \ln\left(\frac\delta C + 1\right) - \frac 1C\left(\ln\delta + 1\right) = \ln\left(\frac{\delta/C + 1}{\delta^{1/C}}\right) - \frac 1C$$
If we can show that $\frac{\partial f}{\partial\delta} \ge 0$ for all $\delta \ge C \ge 4$, then we would have that $f$ is non-negative along $\delta = C$, and non-decreasing along the positive $\delta$ direction past the $\delta = C$ line. It must then be that $f$ is non-negative for all $\delta \ge C \ge 4$. Towards this, observe that it is equivalent to demonstrate
$$\left(\frac\delta C + 1\right)^C \ge \delta e$$
Setting $g(C,\delta) = \left(\frac\delta C + 1\right)^C - \delta e$, we notice that for all $\delta = C \ge 4$ we have
$$g(C,\delta) = \left(\frac\delta C + 1\right)^C - \delta e = 2^\delta - \delta e \ge 0$$
for $\delta \ge 4$. Meanwhile, observe that, for $\delta \ge C \ge 4$,
$$\frac{\partial g}{\partial\delta} = \left(\frac\delta C + 1\right)^{C-1} - e \;\ge\; 2^{C-1} - e \;\ge\; 0$$
Consequently $g(C,\delta) \ge 0$, implying $\left(\frac\delta C + 1\right)^C \ge \delta e$, implying $\frac{\partial f}{\partial\delta} \ge 0$. □
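Both inequalities are easy to spot-check numerically. The following throwaway script is ours, with the grid ranges chosen arbitrarily:

```python
import math

def lhs(delta, C):
    return (delta + C) * math.log(delta / C + 1)

# Lemma 30: for 0 <= delta <= C, (delta+C)ln(delta/C+1) >= delta + delta^2/(3C)
for C in [0.5, 1.0, 3.0, 10.0]:
    for k in range(1, 101):
        delta = C * k / 100.0
        assert lhs(delta, C) >= delta + delta**2 / (3 * C) - 1e-12

# Lemma 31: for delta >= C >= 4, (delta+C)ln(delta/C+1) - delta >= delta*ln(delta)/C
for C in [4.0, 5.0, 10.0]:
    for k in range(0, 200):
        delta = C + 0.5 * k
        assert lhs(delta, C) - delta >= delta * math.log(delta) / C - 1e-12

print("both inequalities hold on the tested grid")
```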