Cut Sparsification of the Clique Beyond the Ramanujan Bound: A Separation of Cut Versus Spectral Sparsification
aa r X i v : . [ c s . D S ] A ug Cut Sparsification of the Clique Beyond the Ramanujan Bound
Antares ChenUniversity of Chicago Jonathan ShiBocconi University Luca TrevisanBocconi UniversityAugust 14, 2020
Abstract
We prove that a random d -regular graph, with high probability, is a cut sparsifier of the clique withapproximation error at most (cid:16) q π + o n,d (1) (cid:17) / √ d , where q π = 1 . . . . and o n,d (1) denotes an errorterm that depends on n and d and goes to zero if we first take the limit n → ∞ and then the limit d → ∞ .This is established by analyzing linear-size cuts using techniques of Jagannath and Sen [JS17] derivedfrom ideas from statistical physics and analyzing small cuts via martingale inequalities.We also prove that every spectral sparsifier of the clique having average degree d and a certainhigh “pseudo-girth” property has an approximation error that is at least the “Ramanujan bound” (2 − o n,d (1)) / √ d , which is met by d -regular Ramanujan graphs, generalizing a lower bound of Srivastava andTrevisan [ST18].Together, these results imply a separation between spectral sparsification and cut sparsification. If G is a random log n -regular graph on n vertices, we show that, with high probability, G admits a (weightedsubgraph) cut sparsifier of average degree d and approximation error at most (1 . . . . + o n,d (1)) / √ d ,while every (weighted subgraph) spectral sparsifier of G having average degree d has approximation errorat least (2 − o n,d (1)) / √ d . If G = ( V, E G , w G ) is a, possibly weighted, undirected graph, a cut sparsifier of G with error ǫ is a weightedgraph H = ( V, E H , w H ) over the same vertex set of G and such that ∀ S ⊆ V (1 − ǫ ) cut G ( S ) ≤ cut H ( S ) ≤ (1 + ǫ ) cut G ( S ) (1)where cut G ( S ) denotes the number of edges in G with one endpoint in S and one endpoint in V − S , orthe total weight of such edges in the case of weighted graphs. This definition is due to Benczur and Karger[BK96].Spielman and Teng [ST11] introduced the stronger definition of spectral sparsification . A weighted graph H = ( V, E H , w H ) is a spectral sparsifier of G = ( V, E G , w G ) with error ǫ if ∀ x ∈ R V (1 − ǫ ) x T L G x ≤ x T L H x ≤ (1 + ǫ ) x T L G x (2)where L G is the Laplacian matrix of the graph G . If A G is the adjacency matrix of G and D G is the diagonalmatrix of weighted degrees, then the Laplacian matrix is L G = D G − A G and it has the property that, forevery vector x ∈ R V , x T L G x = X ( u,v ) ∈ E G w u,v · ( x u − x v ) The definition of spectral sparsifier is stronger than the definition of cut sparsifier because, if x = S is the0/1 indicator vector of a set S , then we have x T L G x = cut G ( S ) . So we see that the definition in (1) isequivalent to the specialization of the definition of (2) to the case of Boolean vectors x ∈ { , } V .1n all the known constructions of sparsifiers, the edge set E H of the sparsifier is a subset of the edge set E G of the graph G . We will take this condition to be part of the definition of sparsifier.A cut sparsifier H of a graph G has, approximately, the same cut structure of G , so that, if we areinterested in approximately solving a problem involving cuts or flows in G , we may instead solve the problemon H and be guaranteed that an approximate solution computed for H is also an approximate solution for G . As the name suggests, for every graph G it is possible to find a cut sparsifier H of G which is very sparse,and running an algorithm on a sparse graph yields a faster running time than running it on G , if G is notsparse itself.A spectral sparsifier H of G has all the properties of a cut sparsifier, and, furthermore, it can be substitutedfor G in some additional applications. 
For example, if we approximately solve the Laplacian linear system L H x = b , and H is a good spectral sparsifier of G , then the resulting solution will also be an approximatesolution to the Laplacian linear system L G x = b . If the matrix L H is sparser than the matrix L G , solving L H x = b will be faster than solving L G x = b .Benczur and Karger [BK96] showed that, for every graph G , a cut sparsifier with error ǫ having O ( ǫ − n log n ) edges can be computed in nearly linear time. Spielman and Teng [ST11] proved that a spectral sparsifierwith error ǫ having O ( ǫ − n (log n ) O (1) ) edges can be computed in nearly linear time. Spielman and Srivas-tava [SS11] improved the number of edges that suffice to construct a spectral sparsifier to O ( ǫ − n log n ) , andBatson, Spielman and Srivastava [BSS09] reduced it to O ( ǫ − n ) . Up to the constant in the big-Oh notation,the O ( ǫ − n ) bound is best possible, because every ǫ cut sparsifier of the clique (and, for a stronger reason,every ǫ spectral sparsifier of the clique) requires Ω( ǫ − n ) edges [ACK + O ( ǫ − n ) edges running in nearly quadratic time [AZLO15] and nearly linear time [LS17].In this paper we focus on the combinatorial problem of understanding the minimum number of edgesthat suffice to achieve cut and spectral sparsification, regardless of the efficiency of the construction. Inparticular, we aim to understand the best possible constant in the Θ( ǫ − n ) bound mentioned above.Currently, the construction (or even non-constructive existence proof) of cut sparsifiers for general graphswith the smallest number of edges is the one due to Batson, Spielman and Srivastava, which also achievesspectral sparsification with the same parameters. In particular, prior to this work, there was no evidencethat cut sparsification is “easier” than spectral sparsification, in the sense of requiring a smaller number ofedges. In this paper we show that random log n -regular graphs, with high probability, can be cut-sparsifiedwith better parameters than they can be spectrally-sparsified, if one requires the sparsifier to use a subset ofthe edges of the graph to be sparisified. Under a conjecture of Srivastava and Trevisan, the same separationwould apply to sparsifiers of the clique.In the following, instead of referring to the number of edges in the sparsifier as a function of the errorparameter ǫ and of the number of vertices n , it will be cleaner to refer to the error parameter ǫ as a functionof the average degree d of the sparsifier (that is, we call dn/ the number of edges of the sparsifier).The construction of Batson, Spielman and Srivastava achieves error (2 √ / √ d with a sparsifier of averagedegree d , for general graphs. Batson, Spielman and Srivastava also show that every sparsifier of the cliqueof average degree d has error at least / √ d . Srivastava and Trevisan [ST18] prove that every sparsifier ofthe clique of average degree d and girth ω n (1) (that is, with girth that grows with the number of edges)that spectrally sparsifies the clique has error at least (2 − o n,d (1)) / √ d . An appropriately scaled d -regularRamanujan graph is a spectral sparsifier of the clique with error (2 + o n,d (1)) / √ d , so we will refer to / √ d as the Ramanujan bound for sparsification. Srivastava and Trevisan conjecture that the Ramanujan boundis best possible for all graphs that sparsify the clique.
Conjecture 1 (Srivastava and Trevisan) . Every family of weighted graphs of average degree d that are ǫ spectral sparsifiers of the clique satisfy ǫ > (2 − o d (1)) / √ d . .1 Our Results Our main result is that it is possible to do better than the Ramanujan bound for cut sparsification of theclique.In the following, we use G reg n,d to denote the distribution over random d -regular multigraphs on n verticescreated by taking the disjoint union of d random perfect matchings. We will always assume that n is even. Theorem 2 (Main) . With − o n (1) probability, a random regular graph drawn from G reg n,d , in which all edgesare weighted ( n − /d , is a (cid:16) q π + o n,d (1) (cid:17) / √ d cut sparsifier of the clique, where q π = 1 . ... Together with Conjecture 1, the above theorem gives a conditional separation between the error-densitytradeoffs of cut sparsification versus spectral sparsification of the clique.In order to achieve an unconditional separation, we prove a generalization of the result of Srivastava andTrevisan to families of graphs that satisfy a property that is weaker than the property of having large girth(it is enough that most vertices, rather than all vertices, see no cycles within a certain distance) which wethen use to prove the following result.
Theorem 3. If G is a random regular graph drawn from G reg n, log n , then the following happens with highprobability over the choices of G : for all graphs H of average degree d which are weighted edge-subgraphs of G , if H is an ǫ -spectral sparsifier of G then ǫ ≥ (2 − o n,d (1)) / √ d . Using the fact that a random log n -regular graph is, with high probability, a O (1 / √ log n ) spectral spar-sifier of the clique, and that a random log n -regular graph contains a random d -regular graph as a subgraph,we have our separation. Theorem 4.
Let G be a random regular graph drawn from G reg n, log n . Then with probability − o n (1) over thechoice of G the following happens for every d :1. There is a weighted subgraph H of G with dn/ edges such that H is an ǫ cut sparsifier of G with ǫ ≤ (1 . ... + o n,d (1)) / √ d ;2. For every weighted subgraph H of G with dn/ edges, if H is an ǫ spectral sparsifier of G then ǫ ≥ (2 − o n,d (1)) / √ d . Our main result, Theorem 2, is established by analyzing cuts of linear size using rigorous techniques thathave been derived from statistical physics [JS17] and by analyzing sublinear size cuts using martingaleconcentration bounds.For a fixed set S of k = αn ≤ n/ vertices, the average number of edges that leave S in a random d -regular graph is dn − · k · ( n − k ) and we are interested in showing that for every such set the deviationfrom the expectation is at most ǫ dn − · k · ( n − k ) , for ǫ ≤ . .../ √ d . One approach is to set up a martingale and apply an Azuma-like inequality. In this approach, it is betterto study the deviation from the expectation of the number of edges that are entirely contained in S . Thisis because, in a regular graph, the deviation from the expectation of the number of edges crossing thecut ( S, V − S ) is entirely determined by the deviation from the expectation of the number of edges entirelycontained in S , and the latter can be written as a sum of fewer random variables (that is, (cid:0) k (cid:1) versus k · ( n − k ) ),especially for small k . After setting up the appropriate Doob martingale, we can prove that the probabilitythat the cut ( S, V − S ) deviates from the expectation by more than . .../ √ d times the expectation is atmost e − Ω( n ) if k ≥ Ω( n/ √ d ) and at most e − Ω( dk log( n/dk )) for k ≤ O ( n/ √ d ) . In particular, there is an α > such that for all k ≤ α n the probability of having a large deviation is much smaller than / (cid:0) nk (cid:1) , in a waythat enables a union bound. These calculations are carried out in Section 3.3nfortunately, such “first moment” calculations cannot be pushed all the way to α = 1 / . This isbecause our calculations with deviation bounds and union bounds are equivalent to estimating the averagenumber of cuts that have a relative error bigger than . .../ √ d , with the goal of showing that such averagenumber is much smaller than one. Unfortunately, the average number of balanced cuts that have a relativeerror bigger than / √ d is bigger than one, so we cannot hope to get a separation from the spectral boundswith first moment calculations. We then turn to techniques derived from statistical physics in order to analyze large cuts. To illustrate thisapproach, consider the classical problem of bounding the typical value of the max cut optimum in Erdős-Rényi random graphs G n, / , up to o ( n . ) error terms. This is equivalent to the problem of understandingthe typical value of max σ ∈{± } n σ T M σ (3)where M is a random symmetric matrix with independent uniform ± entries off the diagonal and zerodiagonal.A first step is to prove, by an interpolation argument, that, up to lower order o ( n . ) additive error, theoptimum of (3) is the same as the optimum of max σ ∈{± } n σ T W σ (4)where W is a Wigner matrix, a random symmetric matrix with zero diagonal and independent and standardnormally distributed off-diagonal entries.Finding the optimum of (4) up to an additive error o ( n . 
) is a standard problem in statistical physics: itis the problem of determining the zero-temperature free energy of a spin-glass model called the Sherrington-Kirkpatrick model, or SK model for short.Parisi [Par80] defined a family of differential equations, and presented a heuristic argument accordingto which the infimum of the solutions of those differential equations, would give the free energy of the SKmodel. That infimum is now called the Parisi formula. Parisi’s approach was extremely influential andwidely generalized. Guerra [Gue03] rigorously proved that a solution to each of the differential equationsgives an upper bound on the free energy, and, in a monumental work, Talagrand [Tal06] rigorously provedthe stronger claim that the Parisi formula is equal to the free energy of the SK model. Talagrand’s work wasfurther generalized by Panchenko [Pan14].Dembo, Montanari and Sen [DMS +
17] proved an interpolation result showing that the solution to (4)can also be used to bound the max cut in random sparse graphs of constant average degree d , including bothrandom d -regular graphs G reg n,d and Erdős-Rényi random graphs G n,d/n . Jagannath and Sen [JS17] provedinterpolation theorems for the problem of determining the max cut out of sets of size αn , for fixed constant α , in G n,d/n and in G reg n,d graph, and they proved that the two models have different asymptotic bounds when < α < / .In particular, to find the maximum (and the minimum) over all sets S of cardinality αn of cut G ( S ) in arandom d -regular graph, Jagannath and Sen prove that one has to study max σ ∈ S n ( α ) σ T Π T W Π σ (5)where S n ( α ) is the subset of vectors σ ∈ {± } n that contain exactly αn ones, and Π = I − n J is the matrixthat projects on the space orthogonal to (1 , , . . . , . The restriction to S n ( α ) models the restriction to cuts ( S, V − S ) where | S | = αn , and the projection defines a matrix Π T W Π such that all rows and all columnssum to zero, in analogy to the fact that, in a regular graph, all rows and all columns of the adjacency matrixhave the same sum.Jagganath and Sen also define a Parisi-type family of differential equations and they rigorously provethat a solution to any of those equations provides an upper bound to (5). Since their goal is to compare cuts4n regular graphs to cuts in Erdős-Rényi graphs, rather than bounding cut sizes in random regular graphs,they do not provide solutions to their Parisi-type equations. In Section 2 we compute the replica-symmetric solution and get an explicit bound.From the bound, we get that, for every fixed α , with high probability, sets of size αn in a random d -regulargraph satisfy the definition of ǫ cut sparsification of the clique with ǫ ≤ r π + o n,d (1) ! · √ d = 1 . . . . + o n,d (1) √ d A tight upper bound on ǫ , which would come from an exact solution of (5), is likely to be / √ d times thevalue of the Parisi formula evaluated at zero temperature and no external field (approximately . / √ d [CR02]), although we have not attempted to prove this. As discussed above, we established that a random d -regular graph is an ǫ cut sparsifier with ǫ ≤ (1 . ... + o (1)) / √ d . Under Conjecture 1, this gives a conditional separation between the error-vs-density tradeoff forcut sparsification of the clique compared to spectral sparsification of the clique.If we consider sparsifiers that are weighted edge-subgraphs of the graph to sparsify, we can obtain anunconditional separation if we can find a random family of graphs that: • Contain random d -regular graphs as edge-induced subgraphs • Are, with high probability, o n,d (1 / √ d ) spectral sparsifiers of the clique • Are such that, with high probability, no weighted edge-induced subgraph of average degree d can be aspectral sparsifier of the clique with error smaller than (2 + o n,d (1)) / √ d Then, if we consider the graphs G n in this family, and let H n be a random d -regular graph containedin G n , we have that the following happens with high probability: H n is a graph of average degree d that isa (1 . ... + o n,d (1)) / √ d cut sparsifier of the clique and also a (1 . ... 
+ o n,d (1)) / √ d cut sparisifier of G n ,but every edge-weighted subgraph of G n of average degree d which is an ǫ spectral sparsifier of G n is alsoan ǫ + o n,d (1 / √ d ) spectral sparsifier of the clique and hence satisfies ǫ ≥ (2 − o n,d (1)) / √ d .Srivastava and Trevisan prove that the Ramunajan bound is optimal for families of graphs of growinggirth, but it is not possible to use this property in the above plan because a family of random graphs cannot,with high probability, both contain random d -regular graphs and have large girth. To overcome this difficulty,we generalize the result of Srivastava and Trevisan to graphs that have a large “pseudo-girth” g , that is, thatare such that for a − o (1) fraction of the nodes v there is no cycle in the ball centered at v of radius g/ . We define test vectors for every graph, and show that, if the graph satisfies the pseudogirth conditionwith g = d / , then the test vectors show that, if the graph is an ǫ spectral sparsifier of the clique then ǫ ≥ / √ d − O ( d − / ) − o n (1) .The pseudogirth condition is satisfied by several families of random regular graphs and Erdős-Rényirandom graphs. In particular, random ∆ n -regular graphs, for any choice of the degree ∆ n in the range d . ≤ ∆ n ≤ n / d satisfies the three conditions above and can be used to establish the separation. Forconcreteness, we have stated our result for ∆ n = log n . The notions of cut sparsifier and of spectral sparsifier of the clique are interesting generalizations of the notionof expander graph, and they allow graphs that are possibly weighted and irregular. As with expander graphs,it seems worthwhile to study sparsifiers as fundamental combinatorial objects, beyond their applications tothe design of efficient graph algorithms.A proof of Conjecture 1 would give us a significant generalization of the Alon-Boppana theorem, and itwould be a very interesting result. 5t is plausible that the clique is the hardest graph to sparsify, both for cut sparsification and for spectralsparsification. This would mean that the error in the construction of Batson, Spielman and Srivastava can beimproved from √ / √ d to / √ d , up to lower order terms, and that there is a construction (or perhaps a non-constructive existence proof) of cut sparsifiers of general graphs with error smaller than . / √ d , up to lowererror terms. At present, unfortunately, there is no promising approach to construct (or non-constructivelyprove existence) of cut sparsifiers of general graphs error below / √ d , or even below √ / √ d . We show that random regular graphs are good cut sparsifiers of the clique over cuts with vertex set S oflinear size, so that | S | = αn for constant α . Theorem 5 (Linear Set Regime for Cut Sparsification) . For every constant α ∈ (0 , , almost always over H ∼ G reg n,d , (cid:12)(cid:12)(cid:12)(cid:12) cut H ( S ) E S ′ ∈S α cut H ( S ′ ) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ √ d r π + o n,d (1) ! , where S α = { S ⊆ V | | S | = αn } and q π = 1 . . . . . First we refer to a lemma showing that the maximum cut with relative cut volume α concentrates aroundits expectation, so that we reduce the problem to understanding the expected value of the maximum cut. Wealso state its version for minimum cuts, derived by flipping signs and using sign symmetries in the statementand proof of the lemma, in accordance with [JS17, Remark 1]. Lemma 6 (Lemma 2.1 of [JS17]) . 
Pr H ∼G reg n,d "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) max S ∈S α n cut H ( S ) − E H ′ ∼G reg n,d (cid:20) max S ′ ∈S α n cut H ′ ( S ′ ) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ε ≤ e − nε /d , Pr H ∼G reg n,d "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) min S ∈S α n cut H ( S ) − E H ′ ∼G reg n,d (cid:20) min S ′ ∈S α n cut H ′ ( S ′ ) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > ε ≤ e − nε /d . As discussed in Section 1.2.2, we now invoke techniques of statistical mechanics developed in the studyof spin glasses, specifically the SK model and its generalizations.After the Parisi formula was proven to solve the SK model, Denbo, Montanari, and Sen [DMS + cut( S ) where | S | is a constanttimes n , as we study here, relating these problems to a generalization of the SK model.The SK model has internal energy σ T W σ/ √ n for W ∈ R n × n a symmetric Wigner matrix with stan-dard Gaussian entries on the off-diagonals and zero on the diagonals, to be optimized over configurations σ ∈ {± } n . The generalization studies the optimization problem with the same matrix W and the sameconfiguration space {± } n but with internal energy H (1) W ( σ ) = 1 √ n σ T Π W Π σ, where Π is the orthogonal projection away from the all-ones vector. In this model, finding the extremalcuts of a given relative vertex density α corresponds to optimizing that energy over the restricted set ofconfigurations S n ( α ) = ( σ ∈ {± } n : X i σ i = n (2 α − ) . This definition corresponds to that used in [JS17], and is larger by a factor of than a convention used in some other places.
6e may formulate this equivalently as optimizing H (0) W ( σ ) = 1 √ n σ T W σ over a different alphabet σ ∈ {± − (2 α − } , with graph cuts of relative vertex density α corresponding tothe set of configurations A n ( T ( α ) , ε n ) = ( σ ∈ {± − (2 α − } : (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)X i σ i − T ( α ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < ε n ) , with T ( α ) = 4 α (1 − α ) and setting ε n = 0 to achieve the equivalence.Finally, Jagannath and Sen [JS17] used an analytical annealing approach to solve this generalized model,yielding the generalization of the Parisi formula stated here: Definition 7.
Let a ν be a measure over [0 , T ] of the form ν = m ( t )d t + cδ T with m ( t ) non-negative, non-decreasing, and everywhere right-continuous with left limits (cadlag), where d t is the uniform measure and δ T is the Dirac delta function at t = T . Then for λ ∈ R and T ( α ) = 4 α (1 − α ) , we define the ground stateenergy functional P T ( α ) ( ν, λ ) = u ν,λ (0 , − λT ( α ) − Z T ( α )0 s d ν ( s ) where u ν,λ is the solution to the differential equation with boundary condition ∂u∂t + 2 ∂ u∂x + m ( t ) (cid:18) ∂u∂x (cid:19) = 0 , ( t, x ) ∈ [0 , T ( α )) × R ,u ( x, T ( α )) = max ζ ∈{± − M } ζx + ( λ + 2 c ) ζ , where M = 2 α − . This definition reduces to the original Parisi formula at zero temperature and external field in the casethat T = 1 and M = 0 and when the infimum over ν is taken.This generalized Parisi formula relates to average extremal cuts on random regular graphs in the followingway. Theorem 8 (Combination of Theorem 1.2 and Lemma 2.2 of [JS17]) . Let T ( α ) = 4 α (1 − α ) . As n → ∞ , E H ∼G reg n,d max S ∈S α | cut H ( S ) − dα (1 − α ) | ≤ √ d n inf ν,λ P T ( α ) ( ν, λ ) + o d (1 / √ d ) . Proof.
By [JS17, Lemma 2.2], almost always over the randomness of W as n → ∞ , E H ′ (cid:20) max S ′ ∈S α n cut H ′ ( S ′ ) (cid:21) = dα (1 − α ) + 14 √ d E W (cid:20) max σ ∈ S n ( α ) n H (1) W ( σ ) (cid:21) + o d ( √ d ) . As alluded to in [JS17, Remark 1], Lemma 2.2 of [JS17] holds also for minimum cuts: this requires onlychanging some signs and invoking a few instances of sign-flip symmetry in the proof. E H ′ (cid:20) min S ′ ∈S α n cut H ′ ( S ′ ) (cid:21) = dα (1 − α ) − √ d E W (cid:20) max σ ∈ S n ( α ) n H (1) W ( σ ) (cid:21) − o d ( √ d ) . By the equivalence described earlier in this section and the fact that A n ( T ( α ) , ⊆ A n ( T ( α ) , ε n ) for anysequence of ε n > , max σ ∈ S n ( α ) n H (1) W ( σ ) ≤ max σ ∈ A n ( T ( α ) ,ε n ) n H (0) W ( σ )
7y [JS17, Theorem 1.2], for ε n → slowly enough as n → ∞ , it holds almost surely over W that lim n →∞ max σ ∈ A n ( T ( α ) ,ε n ) n H (0) W ( σ ) = inf ν,λ P T ( α ) ( ν, λ ) . Combining the above equations yields the theorem statement.It is not yet known how to efficiently compute the exact value of the Parisi formula or its generalization.We circumvent this issue by providing an upper bound, by choosing a particularly simple measure ν to boundthe infimum inf ν,λ P T ( α ) ( ν, λ ) . Specifically, the choice of ν = cδ T with m ( t ) = 0 is known as the replica-symmetric ansatz [Mal19, Chapter 2], corresponding to the first of Parisi’s original sequence of estimates. Lemma 9. inf ν,λ P T ( α ) ( ν, λ ) ≤ p α (1 − α ) · √ π e − (erf − (2 α − , where erf is the Gauss error function erf( x ) = √ π R x − x e − x d x .Proof. First we express R T s d ν ( s ) = cT + R T tm ( t )d t and reparameterize ˆ λ = λ + 2 c so that we can write inf ν,λ P T ( α ) ( ν, λ ) = inf ν, ˆ λ ˆ u ν, ˆ λ (0 , − ˆ λT − Z T tm ( t )d t where ˆ u ν, ˆ λ is the solution to ∂u∂t + 2 ∂ u∂x + m ( t ) (cid:18) ∂u∂x (cid:19) = 0 , ( t, x ) ∈ [0 , T ) × R ,u ( x, T ) = max ζ ∈{± − M } ζx + ˆ λζ , with M = 2 α − .By taking ν ( t ) = cδ T so that m ( t ) = 0 , we can upper-bound the infimum over ν , so that inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ ˆ u ˆ λ (0 , − ˆ λT and ˆ u ˆ λ is the solution to ∂u∂t + 2 ∂ u∂x = 0 , ( t, x ) ∈ [0 , T ) × R ,u ( x, T ) = max ζ ∈{± − M } ζx + ˆ λζ . By reparameterizing t as − t here, we can see that u ( x, is simply the result of evolving u ( x, T ) according tothe heat equation with diffusivity constant for a time of T . Evolution of the heat equation with diffusivity k over a time of T is equivalent to convolution with the Gaussian heat kernel exp( − x / (4 kT )) / √ πkT [Eva10,Chapter 2.3], so ˆ u ˆ λ ( x,
0) = 1 √ πT Z ∞−∞ e − z / (8 T ) (cid:18) max ζ ∈{± − M } ζ ( z + x ) + ˆ λζ (cid:19) d z. Thus inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:18) max ζ ∈{± − M } ζz + ˆ λζ (cid:19) d z − ˆ λT. max ζ ∈{± − M } ζz + ˆ λζ = max (cid:16) − z − M z + ˆ λ (1 + 2 M + M ) , z − M z + ˆ λ (1 − M + M ) (cid:17) = − M z + ˆ λ (1 + M ) + max( − z + 2 M ˆ λ, z − M ˆ λ )= − M z + ˆ λ (1 + M ) + | z − M ˆ λ | , so that inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:16) − M z + ˆ λ (1 + M ) + (cid:12)(cid:12)(cid:12) z − M ˆ λ (cid:12)(cid:12)(cid:12)(cid:17) d z − ˆ λT. Partially evaluating the integral using the facts that a Gaussian probability density function integrates to 1and, by oddness of the integrand, R ∞−∞ ze − z / (8 T ) d z = 0 , inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ πT Z ∞−∞ e − z / (8 T ) (cid:12)(cid:12)(cid:12) z − M ˆ λ (cid:12)(cid:12)(cid:12) d z + ˆ λ (1 − T + M ) . Employing a change of variables z → √ T z to write the integral in terms of the normal Gaussian probabilitydensity φ ( z ) = √ π e − z / and also applying the identity − T = 1 + 4 α − α = M , inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ Z ∞−∞ φ ( z ) (cid:12)(cid:12)(cid:12) √ T z − M ˆ λ (cid:12)(cid:12)(cid:12) d z + 2ˆ λM . Focusing now on the integral, Z ∞−∞ φ ( z ) (cid:12)(cid:12)(cid:12) √ T z − M ˆ λ (cid:12)(cid:12)(cid:12) d z = Z ∞ M ˆ λ/ √ T φ ( z ) (cid:16) √ T z − M ˆ λ (cid:17) d z + Z M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z + 2 M ˆ λ (cid:17) d z = Z − M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z − M ˆ λ (cid:17) d z + Z M ˆ λ/ √ T −∞ φ ( z ) (cid:16) − √ T z + 2 M ˆ λ (cid:17) d z, where we negated and flipped the limits of the first integral, which is equivalent to negating the odd partof the integrand while preserving the even part. Continuing to integrate, letting Φ( z ) denote the Gaussiancumulative density function, = Z − M ˆ λ/ √ T −∞ − √ T z φ ( z )d z + Z M ˆ λ/ √ T −∞ − √ T z φ ( z )d z + 2 M ˆ λ Z M ˆ λ/ √ T − M ˆ λ/ √ T φ ( z )d z, = h √ T φ ( z ) i − M ˆ λ/ √ T −∞ + h √ T φ ( z ) i M ˆ λ/ √ T −∞ + 2 M ˆ λ (cid:16) Φ( M ˆ λ/ √ T ) − Φ( − M ˆ λ/ √ T ) (cid:17) = 4 √ T φ ( M ˆ λ/ √ T ) + 2 M ˆ λ erf( M ˆ λ/ √ T ) , where we used evenness of φ and the fact that Φ( x ) − Φ( − x ) = erf( x/ √ in the last step. So, putting thisevaluation of the integral into our previous expression, inf ν,λ P T ( α ) ( ν, λ ) ≤ inf ˆ λ √ T φ ( M ˆ λ/ √ T ) + 2 M ˆ λ erf( M ˆ λ/ √ T ) + 2ˆ λM . By finding the critical point of this expression with respect to ˆ λ , we find a value of ˆ λ = −√ T erf − ( M ) /M .Using this value for ˆ λ , inf ν,λ P T ( α ) ( ν, λ ) ≤ √ T φ ( −√ − ( M )) − M ˆ λM + 2ˆ λM = 4 √ T φ ( √ − ( M )) .
9e calculate the largest concrete value attained by the upper bound of the preceding lemma:
Lemma 10.
For all α ∈ (0 , , inf ν,λ P T ( α ) ( ν, λ )4 α (1 − α ) ≤ r π = 1 . ... Proof.
By Lemma 9, for α ∈ (0 , , inf ν,λ P T ( α ) ( ν, λ )4 α (1 − α ) ≤ p α (1 − α ) · √ π e − (erf − (2 α − := f ( α ) . Evaluated at α = 1 / , this is equal to p /π , so we just need to show that the upper bound f ( α ) ismaximized at α = 1 / .First we reparameterize g ( M ) = f ( α ) with M = 2 α − so that g ( M ) = 1 p π (1 − M ) e − (erf − ( M )) and we want to show that g is maximized at . Using the product rule to take the derivative of g , since ddM √ − M = M (1 − M ) / and ddM e − (erf − ( M )) = −√ π erf − ( M ) , g ′ ( M ) = 1 p π (1 − M ) M e − (erf − ( M )) − M − √ π erf − ( M ) ! . We take another monotonic reparameterization, introducing erf( x ) for M : g ′ (erf( x )) = 1 p π (1 − erf( x ) ) erf( x ) e − x − erf( x ) − √ πx ! . By Polya [P + erf( x ) < √ − e − x /π so that − erf( x ) ≥ e − x /π , so that, for x < when erf( x ) < , g ′ (erf( x )) ≥ p π (1 − erf( x ) ) (cid:16) erf( x ) e (4 /π − x − √ πx (cid:17) . And by Neuman [Neu13], erf( x ) ≥ x √ π e − x / , so when x < , g ′ (erf( x )) ≥ p π (1 − erf( x ) ) (cid:18) x √ π e (4 /π − / x − √ πx (cid:19) . And as e (4 /π − / x ≤ , this makes it clear that g ′ (erf( x )) is positive when x is negative, which means that g is increasing on the negative part of its domain, which by evenness of g means that g is maximized at . This gives us all the ingredients necessary to prove the main theorem of this section. Proof of Theorem 5.
Combine Lemma 6, Corollary 8, and Lemma 10 with the fact that E S ′ ∈S α cut H ( S ′ ) = ndα (1 − α ) . 10 Analysis for small cuts
In this section, we demonstrate that the number of edges crossing a cut ( S, V − S ) deviates no more fromits expectation than by a . √ d factor with high probability when | S | is small. Theorem 11 (Small Set Regime for Cut Sparsification) . There exists sufficiently large n ≥ and constant d ≥ such that, for any S ⊂ V where | S | = α n and α ≤ , a sample H ∼ G reg n,d admits with probability (1 − o n (1)) (cid:12)(cid:12)(cid:12)(cid:12) cut H ( S ) E H ∼G reg n,d [cut H ( S )] − (cid:12)(cid:12)(cid:12)(cid:12) ≤ . √ d Our analysis will require the use of a Doob martingale.
Definition 12.
Given random variables A and ( Z ℓ ) Nℓ =1 sampled from a common probability space, theirassociated Doob martingale is given by random variables ( X ℓ ) Nℓ =0 where X = E [ A ] and X ℓ = E [ A | Z , . . . , Z ℓ ] We note that ( Z ℓ ) is often called the filtration that ( X ℓ ) is defined with respect to. For a Doob martingale ( X ℓ ) Nℓ =0 , we denote its martingale difference sequence by ( Y ℓ ) where Y ℓ = X ℓ − X ℓ − and its quadraticcharacteristic sequence by ( h X i ℓ ) where h X i ℓ = ℓ X r =1 E (cid:2) Y r | Z . . . Z r − (cid:3) As mentioned previously, the small cuts analysis will quantify the number of edges contained entirelywithin a cut and use the fact that, in a regular graph, the number of edges across a cut is uniquely determinedby the number of edges within the cut. For a graph H , we will denote e H ( S ) by the number of edges e ∈ E H with both endpoints contained within S ⊆ V . When H is sampled from a distribution, it is understood that e H ( S ) is a random variable. Consider H a random regular graph drawn from G reg n,d . Enumerate its vertices by i ∈ [ n ] , and its constituentmatchings by m ∈ [ d ] . For S ⊂ V of size | S | = k , we will assume without loss of generality that S = { , . . . , k } .Next, consider the sequence of matching-vertex pairs (cid:0) ( m ℓ , i ℓ ) (cid:1) Nℓ =1 enumerating each ( m, i ) ∈ [ d ] × [ k − where N = d · ( k − . Let us now define the sequence of random variables ( Z ℓ ) Nℓ =1 where Z ℓ = Z ( m ℓ ,i ℓ ) ∈ V is the vertex that matching m ℓ matches i ℓ ∈ V to in H . Note that e ( S ) = N X ℓ =1 { Z ℓ ∈ [ k ] and Z ℓ > i ℓ } We now construct the Doob martingale on e ( S ) using ( Z ℓ ) as a filtration. The matched edge-vertex revealmartingale ( X ℓ ) Nℓ =0 is given by X ℓ = E [ e ( S ) | Z , . . . , Z ℓ ] . One should think of this martingale as countingthe number of edges contained within S . As an increasing number of Z ℓ are conditioned on, informationregarding what edges exist in H is revealed in an ordered way. The order in which an edge is revealed isgiven by the enumeration of the vertices adjacent to the edge, and the matching the edge belonged to when H was first sampled from d random matchings. Additionally, notice that vertex k is excluded from such pairs ( m ℓ , i ℓ ) . This is because m ℓ can only match k to i ℓ < k for the edge to be contained in S . Consequently,revealing edges adjacent to { , . . . , k − } suffices to uniquely determine e ( S ) .Our analysis of ( X ℓ ) will now proceed as follows. We first determine bounds on the martingale differenceand quadratic characteristic of ( X ℓ ) . These bounds are then used by a standard martingale concentrationresult to argue that the number of edges contained within S cannot deviate far from its expectation. Finally,we complete the proof of Theorem 11 by using the fact that concentration in the number of edges within S immediately implies concentration in the number of edges in cut H ( S ) when H is a random d regular graph.11 .2 Properties of the Martingale To bound the martingale difference and quadratic characteristic of ( X ℓ ) , we examine how e ( S ) behaves asan increasing number of Z ℓ are conditioned on. We say that { z , . . . , z ℓ } ⊆ [ n ] is a valid realization of Z ℓ ifthere exists a d regular graph H such that each ( i ℓ , z ℓ ) ∈ E H . When z , . . . , z ℓ are deterministically provided,we can define the following quantities.1. a ℓ = a ℓ ( z , . . . , z ℓ ) is the number of remaining vertices in S that remain unmatched as a function of z , . . . , z ℓ . 
We denote a = | S | = k .2. b ℓ = b ℓ ( z , . . . , z ℓ ) is the number of remaining vertices in V that remain unmatched as a function of z , . . . , z ℓ . We denote b = | V | = n .We will also consider a ℓ ( z , . . . , z ℓ − , Z ℓ ) and b ℓ ( z , . . . , z ℓ − , Z ℓ ) where Z ℓ is sampled according to thefiltration specified in X ℓ . In this case, a ℓ and b ℓ are random variables distributed according to that of therandom variable Z ℓ . When z , . . . , z ℓ are a valid realization, we can demonstrate a bound on the ratio a ℓ b ℓ . Lemma 13.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , and N = d · ( k − .For any ≤ ℓ ≤ N and valid realization z , . . . , z ℓ , it happens that a ℓ b ℓ ≤ kn Proof.
We proceed via induction on ℓ . For the base case, ℓ = 0 implies we have a b = kn . Let us now assumethe lemma holds for ℓ − . Notice that any choice of z ℓ admits one of three cases.1. z ℓ ∈ [ k ] and z ℓ > i ℓ . This corresponds to z ℓ revealing the existence of an edge not previously known tobe in S when considering only z , . . . , z ℓ − . Hence a ℓ = a ℓ − − and b ℓ = b ℓ − − and a ℓ b ℓ = a ℓ − − b ℓ − − ≤ a ℓ − b ℓ − ≤ kn with the last inequality following by the inductive hypothesis.2. z ℓ ∈ [ k ] however z ℓ < i ℓ . This corresponds to i ℓ having already been matched to j ∈ [ k ] as revealed by z j for j < ℓ . Thus, a ℓ = a ℓ − and b ℓ = b ℓ − and the inductive hypothesis is maintained.3. z ℓ / ∈ [ k ] however z ℓ > i ℓ . This corresponds to m ℓ matching i ℓ to a vertex not in S . Thus a ℓ = a ℓ − and b ℓ = b ℓ − and so a ℓ b ℓ = a ℓ − − b ℓ − − ≤ a ℓ − − b ℓ − − n/k = a ℓ − − k/n · n/kb ℓ − − n/k < nk where the second inequality follows as k ≤ n and the last inequality follows by the following principle: pq < r implies p − rwq − w < r for all p, q, r, w ∈ Z ≥ and we choose p = a ℓ − , q = b ℓ − , r = kn , and w = nk .In all cases, we have that the lemma holds for ℓ , thus completing the induction.We now bound the martingale difference of ( X ℓ ) . Lemma 14.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , and N = d · ( k − .Then Y ℓ associated with ( X ℓ ) Nℓ =0 admits | Y ℓ | ≤ for all i ∈ [ N ] .Proof. As the d constituent matchings of H are sampled independently and uniformly at random, it sufficesto assume d = 1 , and hence N = k − . Now let φ ( a, b ) be the expected number of edges contained inside asubset of a vertices in a uniformly sampled perfect matching on b vertices. φ ( a, b ) is the quantity φ ( a, b ) = (cid:18) a (cid:19) · b − ℓ , we begin by fixing a valid realization of random variables Z = z , . . . , Z ℓ = z ℓ and observethat X ℓ − can be computed as X ℓ − = E [ e ( S ) | Z = z , . . . , Z ℓ − = z ℓ − ]= E (cid:20) N X r =1 { Z r ∈ [ k ] and Z r > r } (cid:12)(cid:12)(cid:12)(cid:12) Z = z , . . . , Z ℓ − = z ℓ − (cid:21) = ℓ − X r =1 { z r ∈ [ k ] and z r > r } + φ ( a ℓ − , b ℓ − ) where we have used linearity of expectations to separate terms of e ( S ) that have been conditioned to be z r , and those that remain random. X ℓ is similarly given by the following. X ℓ = ℓ X r =1 { z r ∈ [ k ] and z r > r } + φ ( a ℓ , b ℓ ) We can now compute Y ℓ as Y ℓ = X ℓ − X ℓ − = { z ℓ ∈ [ k ] and z ℓ > ℓ } + (cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1) Let us denote w ℓ = { z ℓ ∈ [ k ] and z ℓ > ℓ } . It is either the case that w ℓ = 1 or w ℓ = 0 . Assuming w ℓ = 1 ,we first demonstrate that Y ℓ ≤ . In this case vertex ℓ is adjacent to z ℓ ∈ S . Consequently, a ℓ = a ℓ − − and b ℓ = b ℓ − − and we have Y ℓ = w ℓ + (cid:0) φ ( a ℓ − − , b ℓ − − − φ ( a ℓ − , b ℓ − ) (cid:1) = 1 + (cid:18) a ℓ − − (cid:19) · b ℓ − − − (cid:18) a ℓ − (cid:19) · b ℓ − −
1= 1 + (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≤ a ℓ − − b ℓ − − − a ℓ − − b ℓ − −
1= 1 as required. Completing the analysis for w ℓ = 1 , we demonstrate that Y ℓ ≥ . Y ℓ = 1 + (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≥ − a ℓ − b ℓ − ≥ − kn The last inequality follows from an application of Lemma 13. Suppose now that w ℓ = 0 . Since ( X ℓ ) Nℓ =1 is a Doob martingale, E [ Y ℓ ] = 0 for all ℓ . This implies that Y ℓ < < since in fact Y ℓ > whenever w ℓ = 1 .All that remains to demonstrate is that Y ℓ > − . Observe that w ℓ = 0 implies one of two cases.1. z ℓ ∈ [ k ] however z ℓ < ℓ . Then a ℓ = a ℓ − and b ℓ = b ℓ − implying Y ℓ = 0 .2. z ℓ / ∈ [ k ] however z ℓ > ℓ . Then a ℓ = a ℓ − − and b ℓ = b ℓ − − . We then compute Y ℓ as Y ℓ = w ℓ + (cid:0) φ ( a ℓ − − , b ℓ − − − φ ( a ℓ − , b ℓ − ) (cid:1) (cid:18) a ℓ − − (cid:19) · b ℓ − − − (cid:18) a ℓ − (cid:19) · b ℓ − − (cid:18) a ℓ − − (cid:19) · (cid:18) b ℓ − − − b ℓ − − (cid:19) − a ℓ − − b ℓ − − ≥ − a ℓ − b ℓ − ≥ − kn where the last inequality follows from Lemma 13.In both cases, Y ℓ > − since k ≤ n , thus completing the proof.Lemma 14 precisely computes how X ℓ behaves as ℓ increases. If it is revealed that m ℓ matches Z ℓ to i ℓ < Z ℓ (thus within S ), then X ℓ increases by some amount in the interval [1 − kn , . Otherwise X ℓ decreasesby an amount in [ − kn , . Using this enables us to bound the quadratic characteristic, and understand howthe variance of e ( S ) accumulates as subsequent Z ℓ are conditioned on. Lemma 15.
Let H ∼ G reg n,d be a random regular graph, S ⊆ V such that | S | = k < n , N = d · ( k − . For ( X ℓ ) Nℓ =0 , we have h X i N ≤ k ( k − dn − k with probability 1.Proof. It is sufficient to demonstrate h X i ℓ − h X i ℓ − ≤ kn − k for all ℓ ∈ [ N ] as we would have h X i N = N X ℓ =2 (cid:0) h X i ℓ − h X i ℓ − (cid:1) ≤ N · kn − k ≤ k ( k − dn − k Assume without loss of generality that d = 1 and fix ℓ along with a valid realization Z = z , . . . , Z ℓ − = z ℓ − . One can calculate the following fact h X i ℓ − h X i ℓ − = Var[ { Z ℓ ∈ [ k ] and Z ℓ > ℓ } = 1] Denote the indicator random variable W ℓ = { Z ℓ ∈ [ k ] and Z ℓ > ℓ } . To bound the variance of theindicator, we seek to determine Pr[ W ℓ = 1] with randomness taken over choice of Z ℓ . Recall that Y ℓ = W ℓ + (cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1) Note Y ℓ is a random quantity since W ℓ , a ℓ = a ℓ ( z , . . . , z ℓ − , Z ℓ ) , and b ℓ = b ℓ ( z , . . . , z ℓ − , Z ℓ ) eachdepend on a sample Z ℓ . It remains however that E [ Y ℓ ] = 0 implying (cid:2) W ℓ = 1 (cid:3) + E (cid:2)(cid:0) φ ( a ℓ , b ℓ ) − φ ( a ℓ − , b ℓ − ) (cid:1)(cid:3) and hence Pr (cid:2) W ℓ = 1 (cid:3) = E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) (cid:3) Let us condition the expectation as follows. Pr (cid:2) W ℓ = 1 (cid:3) = Pr[ W ℓ = 0] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) + Pr[ W ℓ = 1] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) ≤ E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) + Pr[ W ℓ = 1] · E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) Implying Pr (cid:2) W ℓ = 1 (cid:3) ≤ E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) − E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) Y ℓ ≥ − kn if W ℓ = 1 , while Y ℓ ≥ − kn if W ℓ = 0 . This means E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 0 (cid:3) ≤ kn E (cid:2) φ ( a ℓ − , b ℓ − ) − φ ( a ℓ , b ℓ ) | W ℓ = 1 (cid:3) ≥ kn and thus we have Pr (cid:2) W ℓ = 1 (cid:3) ≤ kn − k Finally, as W ℓ is an indicator random variable, its variance is at most that given by a Bernoulli randomvariable with success probability kn − k . We conclude with h X i ℓ − h X i ℓ − = Var[ W ℓ ] ≤ Pr[ W ℓ = 1] ≤ kn − k as required. We now determine how ( X ℓ ) concentrates. In [FGL12], the following Azuma-like inequality is proven formartingales. Theorem 16 (Remark 2.1 combined with equations (11) and (13) of [FGL12]) . Let ( X ℓ ) Nℓ =0 be a martingalewith martingale differences ( Y ℓ ) satisfying Y ℓ ≤ for all ≤ ℓ ≤ N . For every ≤ x ≤ N and ν ≥ , wehave Pr h | X N − X | ≥ x and h X i N ≤ ν i ≤ · (cid:18) ν x + ν (cid:19) x + ν e x The concentration inequalities of [FGL12] are one-sided inequalities as they are stated for supermartin-gales. We use the double-sided version, incurring an additional factor of 2 after taking a union bound withthe negative of ( X ℓ ) . We start with a generic application of Theorem 16 to fit our setting. Lemma 17.
For H a random regular graph drawn from G reg n,d , S ⊆ V such that | S | = k < n , and δ > , wehave the following. Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:26) − E [ e ( S )] · (cid:20) ( δ + C ) · ln (cid:18) δC + 1 (cid:19) − δ (cid:21)(cid:27) where C = n − n − k Proof.
Let x and ν be given by the following. x = δ · E [ e ( S )] = δ · (cid:18) k (cid:19) dn − ν = k ( k − dn − k = (cid:18) kn (cid:19) dn − · n − n − k By Lemma 15, we have that h X i N ≤ ν with probability one. Hence Pr h | X − X N | ≥ x and h X i n i = Pr h | X − X N | ≥ x i = Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i Applying Theorem 16 for the choice of x, ν above then concludes with the required bound.15s mentioned previously, the purpose of choosing to study edges contained entirely in a set S is becausethe number of edges contained entirely within S can be written as a sum of fewer indicator random variablesthan the number of edges crossing the cut ( S, V − S ) . The difference between (cid:0) k (cid:1) and k ( n − k ) is notnegligible (in particular for k small) and we take advantage by further splitting our analysis of small cutsdepending on the size of k .A critical point is that one can apply tighter approximations of the exponentiated term in Lemma 17depending on the size of k . When k ≥ O ( n/ √ d ) , applying Lemma 30 yields tighter concentration, while itis better to approximate via Lemma 31 when k ≤ O ( n/ √ d ) . We further remark that though we study thenumber of edges contained entirely in S , justifying that H cut sparsifies G still requires us to compute thedeviation of the number of edges crossing ( S, V − S ) . Consequently, a nk − term will appear in our choiceof δ due to scaling between edges contained within S and crossing ( S, V − S ) . Let us now summarize theconcentration bounds we use in each case via the following lemma. Lemma 18.
There exists a sufficiently large choice of n ≥ and d ≥ constant such that given a randomdraw H ∼ G reg n,d and any S ⊂ V such that | S | = k where ≤ k ≤ n , the following statements hold1. If δC < , then Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:18) − · k dn · δ (cid:19) (6)
2. If δC ≥ , then Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ · E [ e ( S )] i ≤ (cid:18) − · k dn · δ ln δ (cid:19) (7) where C = n − n − k Proof.
When δC < , we can apply Lemma 30 to approximate ( δ + C ) · ln (cid:0) δC + 1 (cid:1) in Lemma 17 as follows Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:26) − E [ e ( S )] · (cid:20) ( δ + C ) · ln (cid:18) δC + 1 (cid:19) − δ (cid:21)(cid:27) ≤ (cid:26) − E [ e ( S )] · (cid:20) δ + δ C − δ (cid:21)(cid:27) = 2 exp (cid:26) − E [ e ( S )] · δ C (cid:27) Expanding C and the expectation, we derive exp (cid:26) − E [ e ( S )] · δ C (cid:27) = exp (cid:26) − (cid:18) k (cid:19) · dn − · δ · n − k n − (cid:27) = exp (cid:26) − k dn · δ · · (cid:18) − k (cid:19) · (cid:18) n − (cid:19) · (cid:18) − kn (cid:19)(cid:27) Noticing that with large enough n , and as k ≤ n , we have that exp (cid:26) − k dn · δ · · (cid:18) − k (cid:19) · (cid:18) n − (cid:19) · (cid:18) − kn (cid:19)(cid:27) ≤ exp (cid:26) − k dn · δ · . · · (cid:27) ≤ exp (cid:18) − · k dn · δ (cid:19)
16s required. If δC ≥ , then we can apply Lemma 31 to approximate ( δ + C ) · ln (cid:0) δC + 1 (cid:1) as follows Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − E [ e ( S )]2 C · δ ln δ (cid:19) Expanding C and the expectation, we derive exp (cid:18) − E [ e ( S )]2 C · δ ln δ (cid:19) = exp (cid:26) − (cid:18) k (cid:19) · dn − · n − k n − · δ ln δ · (cid:27) = exp (cid:26) − k dn · δ ln δ · (cid:18) k − k (cid:19) · (cid:18) nn − (cid:19) · (cid:18) − kn (cid:19) · (cid:27) Since ≤ k ≤ n , and for large enough n , we have that exp (cid:26) − k dn · δ ln δ · (cid:18) k − k (cid:19) · (cid:18) nn − (cid:19) · (cid:18) − kn (cid:19) · (cid:27) ≤ exp (cid:26) − k dn · δ ln δ · · · (cid:27) = exp (cid:18) − · k dn · δ ln δ (cid:19) as required.We now compute the probability that the number of edges contained within S deviates far from itsexpectation. We remark that our choice of C and δ = (cid:0) nk − (cid:1) · . √ d imply that δC grows approximately as nk √ d . In the subsequent proof of Lemma 19, the case of δC < is analogous to when k ≥ Ω( n/ √ d ) while δC ≤ corresponds to k ≤ O ( n/ √ d ) . Lemma 19.
There exists a sufficiently large choice of n ≥ and d ≥ constant such that for H ∼ G reg n,d andany S ⊂ V such that | S | = k where ≤ k ≤ n , we have Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) nk (cid:19) − . where δ = (cid:0) nk − (cid:1) · . √ d Proof.
With C = n − n − k , suppose δC < , expanding δ in the bound given by equation 6, we have Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − · k dn · δ (cid:19) = 2 exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . d (cid:27) We now demonstrate how to upper bound this quantity by (cid:0) nk (cid:1) − . . It is equivalent to demonstrate (cid:18) nk (cid:19) . ≤ exp (cid:26) · k dn · (cid:18) nk − (cid:19) · . d (cid:27) Taking the natural logarithm of both sides, and performing a change of variables α = kn , we have that . · ln (cid:18)(cid:18) nαn (cid:19)(cid:19) ≤ · . · n · α (cid:18) α − (cid:19) As ln (cid:0)(cid:0) nαn (cid:1)(cid:1) ≤ n · H ( α ) where H denotes the binary entropy function, it is sufficient to demonstrate H ( α ) ≤ . · (1 − α ) α ≤ . Now suppose δC ≥ . Expanding δ in equation 7, we have the following Pr h(cid:12)(cid:12) e ( S ) − E [ e ( S )] (cid:12)(cid:12) ≥ δ E [ e ( S )] i ≤ (cid:18) − · k dn · δ ln δ (cid:19) = 2 exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . √ d · ln (cid:18)(cid:18) nk − (cid:19) · . √ d (cid:19)(cid:27) Notice that since k ≤ n , we have that − kn ≥ meaning the expression can be upper bounded by exp (cid:26) − · k dn · (cid:18) nk − (cid:19) · . √ d · ln (cid:18)(cid:18) nk − (cid:19) · . √ d (cid:19)(cid:27) ≤ exp (cid:26) − · · . · k √ d · ln (cid:18) . nk √ d · (cid:19)(cid:27) We next claim the following intermediate upper bound. exp (cid:26) − · · . · k √ d · ln (cid:18) . nk √ d · (cid:19)(cid:27) ≤ (cid:18) n k √ d · e . · · (cid:19) − . It is again equivalent to demonstrate the following . · ln (cid:18)(cid:18) n k √ d · e . · · (cid:19)(cid:19) ≤ · · . · k √ d · ln (cid:18) . nk √ d · (cid:19) However, because ln (cid:0) nk (cid:1) ≤ k ln (cid:0) enk (cid:1) , it suffices to demonstrate . · k √ d . · · e · ln (cid:18) · . nk √ d · (cid:19) ≤ · · . · k √ d · ln (cid:18) . nk √ d · (cid:19) which is equivalent to ln (cid:18) · . nk √ d · (cid:19) ≤ · (cid:18) (cid:19) · . · . · e · ln (cid:18) . nk √ d · (cid:19) Because lower bounds the constant on the right hand side, it is enough to show ln (cid:18) · . nk √ d · (cid:19) ≤ · ln (cid:18) . nk √ d · (cid:19) As . nk √ d · ≥ δ ≥ C ≥ , we can choose a large enough n such that the above holds. Finally, we show (cid:18) n k √ d · e . · · (cid:19) − . ≤ (cid:18) nk (cid:19) − . by choosing a d large enough since δC ≥ . A choice of d ≥ (cid:0) . · · e (cid:1) suffices. Finishing the analysis of the small cuts regime, we now show that the number of edges crossing ( S, V − S ) deviates no more from its expectation than by a . √ d factor with high probability.18 roof of Theorem 11. Denote | S | = k . If k = 1 , then cut H ( S ) = d for any random d regular graph H thus cut H ( S ) − E [cut H ( S )] = 0 . Now consider any ≤ k ≤ n . Because H is d regular, we have cut H ( S ) = kd − · e H ( S ) Thus the event {| cut H ( S ) − E [cut H ( S )] | ≥ . √ d · E [cut H ( S )] } occurs if and only if (cid:12)(cid:12) cut H ( S ) − E [cut H ( S )] (cid:12)(cid:12) ≥ . √ d · E [cut H ( S )] (cid:12)(cid:12) kd − · e H ( S ) − E [ kd − · e H ( S )] (cid:12)(cid:12) ≥ . √ d · E [ kd − · e H ( S )] (cid:12)(cid:12) E [ e H ( S )] − e H ( S ) (cid:12)(cid:12) ≥ . √ d · E [ · e H ( S )] · (cid:18) kd E [ e H ( S )] − (cid:19)(cid:12)(cid:12) e H ( S ) − E [ e H ( S )] (cid:12)(cid:12) ≥ . √ d · E [ · e H ( S )] · (cid:18) n − k − − (cid:19) Now, n − k − − nk − (cid:0) k − (cid:1) ≥ nk − since k ≥ . 
The probability of the above occurring is at most Pr (cid:20)(cid:12)(cid:12) cut H ( S ) − E [cut( S )] (cid:12)(cid:12) ≥ . √ d · E [ · cut H ( S )] (cid:21) ≤ Pr (cid:20)(cid:12)(cid:12) e H ( S ) − E [ e H ( S )] (cid:12)(cid:12) ≥ . √ d · (cid:18) nk − (cid:19) · E [ · e H ( S )] (cid:21) Applying Lemma 19 using δ = . √ d · (cid:0) nk − (cid:1) implies that the right hand side is at most o n,d (cid:0)(cid:0) nk (cid:1) − (cid:1) .Performing a union bound over at most (cid:0) nk (cid:1) cuts of size k then completes the proof. In the following, if H = ( V, E H , w H ) is an undirected weighted graph and v ∈ V is a vertex, we call the combinatorial degree of v the number of edges incident on v , and we call the weighted degree of v the sumof the weights of the edges incident on v . A random walk in a graph is a process in which we move amongthe vertices of a graph and, at every step, we move from the current node u to a neighbor v of the u withprobability proportional to the weight of the edge ( u, v ) . A non-backtracking random walk is like the aboveprocess except that if at a certain step we move from u to v , then at the subsequent step it is not allowed togo from v to u . For example, a non-backtracking walk in a cycle is a process that, after the first step, movesdeterministically around the cycle, either always clockwise or always counterclockwise.In this section we prove the following result. Theorem 20 (Lower Bound for Spectral Sparsification) . Let H = ( V, E, w ) be a weighted graph on n verticesand with dn/ edges, so that H has average combinatorial degree d . Let ¯ K n be a clique on V with every edgeweighted /n . Suppose that H is an ǫ spectral sparsifier of ¯ K n and make the following definition: • Let B be a bound such that for every vertex v ∈ V , at most B vertices of H are reachable from v viapaths of combinatorial length at most g ; • Call V ′ the set of vertices r such that the subgraph induced by the vertices at combinatorial distance atmost g from r contains no cycle. Call V ′′ the set of vertices r such that the same property holds forthe vertices at combinatorial distance at most g . Call n − F the cardinality of V ′′ .Then ǫ ≥ √ d − O (cid:18) g √ d + gd + g ( B + F ) n (cid:19) For example, if the girth of the graph is at least g + 1 and the graph has maximum degree ∆ , then B ≤ ∆ g +1 and F = 0 . If g = d / , and B and F are of size o ( n ) , then the bound on ǫ is ǫ ≥ √ d − O ( d − / ) − o (1) .19 .1 The Test Vectors The condition that H is an ǫ spectral sparsifier of L ¯ K n can be written as ∀ x ∈ R V (1 − ǫ ) x T L ¯ K n x ≤ x T L H ≤ (1 + ǫ ) x T L ¯ K n x which can be written as ∀ x ∈ R V (1 − ǫ ) ( x · x T ) • L ¯ K n ≤ ( x · x T ) • L H ≤ (1 + ǫ ) ( x · x T ) • L ¯ K n where A • B = P i,j A i,j B i,j is the Frobenius inner product between real-valued square matrices. The abovecondition is equivalent to ∀ X (cid:23) (1 − ǫ ) X • L ¯ K n ≤ X • L H ≤ (1 + ǫ ) X • L ¯ K n because all positive semidefinite matrices X (cid:23) are convex combinations of rank-1 symmetric matrices ofthe form xx T . We will then be looking for positive semidefinite matrices X for which X • L ¯ K n is noticeablydifferent from X • L H . This approach is equivalent to the approach of considering probability distributionsover test vectors x which is taken in [ST18].As in [ST18], we will make a number of assumptions on the structure of H . 
Such assumptions can bemade without loss of generality, because if they fail then there are simple proofs (given in [ST18] of theconclusion of Theorem 20). The assumptions are the following:1. Every vertex of H has combinatorial degree at least d/ ;2. Every vertex of H has weighted degree between − / √ d and √ d ;3. Every edge ( u, v ) of H has weight at most / √ d .Under all the above assumptions, we will construct two PSD matrices X and Y such that ǫ − ǫ ≥ X • L H X • L ¯ K n · Y • L ¯ K n Y • L H ≥ √ d + O (cid:18) g √ d + gd + g ( B + F ) n (cid:19) (8)which will imply the conclusion of the Theorem.For every two vertices r and v , let Pr[ r nb → ℓ v ] be the probability that a non-backtracking ℓ -step randomwalk in H (performed by following edges with probability proportional to their weight) reaches v in the laststep. For every r , define the vectors f r , h r as f r ( v ) = g X ℓ =0 ( − ℓ q Pr[ r nb → ℓ v ] h r ( v ) = g X ℓ =0 q Pr[ r nb → ℓ v ] Our two PSD matrices are X = X r ∈ V f r f Tr , Y = X r ∈ V h r h Tr To understand the intuition of the above definition, if r ∈ V ′ then, for every v , there can be at mostone way to reach v from r with a non-backtracking walk of length ≤ g , because otherwise we would see acycle in the subgraph induced by the nodes at distance ≤ g from v contradicting the definition of V ′ . Thismeans that f r ( v ) = h r ( v ) = 0 if v is at distance more than g from r . If v is at distance ℓ ≤ g from r then f r ( v ) = ( − ℓ q Pr[ r nb → ℓ v ] and h r ( v ) = q Pr[ r nb → ℓ v ] , so that, in particular, f r ( v ) = h r ( v ) .We collect some properties that will be useful 20 act 21. If r ∈ V ′ , then || f r || = || h r || = g + 1 .Proof. If r ∈ V ′ , then for every v , we have f r ( v ) = h r ( v ) = g X ℓ =0 Pr[ r nb → ℓ v ] and | f r || = || h r || = X v g X ℓ =0 Pr[ r nb → ℓ v ] = g + 1 because, for every fixed ℓ , we have X v Pr[ r nb → ℓ v ] = 1 Fact 22. If r V ′ , then ≤ || f r || ≤ ( g + 1) ≤ || h r || ≤ ( g + 1) Proof.
We have || h r || = X v g X ℓ =0 h r ( v ) ! ≤ X v ( g + 1) · g X ℓ =0 h r ( v ) = ( g + 1) where we used Cauchy-Schwarz. The same calculation applies to f r . Fact 23. − Fn ≤ I • Xn · ( g + 1) ≤ gFn and − Fn ≤ I • Yn · ( g + 1) ≤ gFn Proof.
We have
$$I\bullet Y = \sum_r \|h_r\|^2 \ge \sum_{r\in V'} \|h_r\|^2 \ge (n-F)\cdot(g+1)$$
$$I\bullet Y = \sum_r \|h_r\|^2 = \sum_{r\in V'} \|h_r\|^2 + \sum_{r\notin V'} \|h_r\|^2 \le (n-F)\cdot(g+1) + F\cdot(g+1)^2$$
and the same calculation applies to the $f_r$. □

Fact 24. $0 \le Y\bullet J \le (g+1)^2 Bn$.

Proof.
$$Y\bullet J = \sum_r \langle h_r, \mathbf 1\rangle^2 = \sum_r \left(\sum_v \sum_{\ell=0}^{g} \sqrt{\Pr[r \xrightarrow[nb]{\ell} v]}\right)^2 \le (g+1)B \sum_r \sum_v \sum_{\ell=0}^{g} \Pr[r \xrightarrow[nb]{\ell} v] = (g+1)^2 Bn$$
where we used the Cauchy–Schwarz inequality and the fact that, for every $r$, there are at most $B$ vertices $v$ that are reachable with non-zero probability from $r$ using walks of length $\le g$, so that the inner double sum has at most $(g+1)B$ non-zero terms. □

4.2 Outline of the Proof

We will show that:
1. both $X\bullet L_{\bar K_n}$ and $Y\bullet L_{\bar K_n}$ are $(1\pm o(1))\cdot n(g+1)$;
2. $Y\bullet D_H \le (1+o(1))\cdot n(g+1)$;
3. $X\bullet L_H \,/\, Y\bullet L_H \ge 1 + (1-o(1))\cdot (Y-X)\bullet A_H \,/\, Y\bullet D_H$;
4. $(Y-X)\bullet A_H \ge (1-o(1))\cdot 4ng/\sqrt d$.

So that:
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\ge\; 1 + \frac{4 - o(1)}{\sqrt d}$$
as in (8), where the "$o(1)$" notation refers to terms that are at most an absolute constant times $1/d^{1/4}$, provided that $g = d^{1/4}$ and $B$ and $F$ are at most $n/d$.

The claims 1, 2 and 3 above are proved using simple properties of the functions $f_r$ and $h_r$ mentioned above, and the crux of the argument is the fourth claim, which will follow by showing that
$$(Y-X)\bullet A_H \;\ge\; (1-o(1))\cdot 4g \sum_{a,b} w_{a,b}^{3/2}$$
where $w_{a,b}$ is the weight of the edge $(a,b)$ in the graph $H$ and the sum ranges over ordered pairs. We know that $\sum_{a,b} w_{a,b} = (1\pm o(1))\, n$ and that there are $dn$ pairs $a,b$ such that the edge $(a,b)$ has non-zero weight. The convexity of the function $x\mapsto x^{3/2}$ can then be used to deduce that the expression is minimized when all the non-zero weights are the same:
$$\sum_{a,b} w_{a,b}^{3/2} \;\ge\; dn\cdot\left(\frac{\sum_{a,b} w_{a,b}}{dn}\right)^{3/2}$$
from which the fourth claim above will follow. We will now prove the four claims that we made in the previous section.
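As a quick numeric sanity check of this convexity step (a throwaway script of ours, with arbitrary weights, not part of the paper's argument):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 1000
m = d * n                               # number of (ordered) weighted pairs
w = rng.exponential(1.0 / d, size=m)    # arbitrary non-negative weights
lhs = np.sum(w ** 1.5)
rhs = m * (np.sum(w) / m) ** 1.5        # value when all the weights are equal
assert lhs >= rhs - 1e-12               # Jensen: x -> x^{3/2} is convex
print(lhs, rhs)
```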
Lemma 25.
$$\frac{Y\bullet L_{\bar K_n}}{X\bullet L_{\bar K_n}} \;\ge\; 1 - O\left(\frac{g(F+B)}{n}\right)$$

Proof.
Recall that $L_{\bar K_n} = I - \frac 1n J$, where $J = \mathbf 1\cdot\mathbf 1^T$ is the matrix that is one everywhere. This means that
$$Y\bullet L_{\bar K_n} = Y\bullet I - \frac 1n\, Y\bullet J \;\ge\; n(g+1)\cdot\left(1 - \frac Fn - \frac{(g+1)B}{n}\right)$$
$$X\bullet L_{\bar K_n} = X\bullet I - \frac 1n\, X\bullet J \;\le\; X\bullet I \;\le\; n(g+1)\cdot\left(1 + \frac{gF}{n}\right)$$
where we used Facts 23 and 24, together with $X\bullet J = \sum_r \langle f_r,\mathbf 1\rangle^2 \ge 0$. □

Lemma 26.
$$\left(1 - \frac{2}{\sqrt d}\right)\cdot\left(1 - \frac Fn\right) \;\le\; \frac{Y\bullet D_H}{n(g+1)} \;\le\; \left(1 + \frac{2}{\sqrt d}\right)\cdot\left(1 + \frac{gF}{n}\right)$$

Proof. Follows from the previous bounds on $Y\bullet I$ and from the fact that the degree condition on $H$ can be expressed as
$$\left(1 - \frac{2}{\sqrt d}\right)\cdot I \;\preceq\; D_H \;\preceq\; \left(1 + \frac{2}{\sqrt d}\right)\cdot I \qquad\Box$$

Lemma 27.
$$\frac{X\bullet L_H}{Y\bullet L_H} \;\ge\; 1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - O\left(\frac{gF}{n}\right)$$

Proof.
$$\frac{X\bullet L_H}{Y\bullet L_H} - 1 = \frac{(X-Y)\bullet L_H}{Y\bullet L_H} = \frac{(Y-X)\bullet A_H - (Y-X)\bullet D_H}{Y\bullet D_H - Y\bullet A_H} \;\ge\; \frac{(Y-X)\bullet A_H - (Y-X)\bullet D_H}{Y\bullet D_H}$$
where the last step uses $Y\bullet A_H = \sum_r h_r^T A_H h_r \ge 0$ (the $h_r$ and $A_H$ are entrywise non-negative) and the fact that the numerator is non-negative in our setting, by Lemma 28 below and the bound on $(Y-X)\bullet D_H$ that follows. So
$$\frac{X\bullet L_H}{Y\bullet L_H} \;\ge\; 1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - \frac{(Y-X)\bullet D_H}{Y\bullet D_H}$$
We have
$$(Y-X)\bullet D_H = \sum_r \sum_v w(v)\cdot\big(h_r(v)^2 - f_r(v)^2\big) = \sum_{r\notin V'} \sum_v w(v)\cdot\big(h_r(v)^2 - f_r(v)^2\big) \le \sum_{r\notin V'} \sum_v w(v)\, h_r(v)^2 \le \left(1 + \frac{2}{\sqrt d}\right)\cdot F\cdot(g+1)^2$$
where we used the facts, proved above, that $h_r(v)^2 = f_r(v)^2$ when $r\in V'$, that all weighted degrees are at most $1 + 2/\sqrt d$, and that $\|h_r\|^2 \le (g+1)^2$. On the other hand,
$$Y\bullet D_H \;\ge\; \left(1 - \frac{2}{\sqrt d}\right)\cdot Y\bullet I \;\ge\; \left(1 - \frac{2}{\sqrt d}\right)\left(1 - \frac Fn\right) n(g+1)$$
so the ratio of the two quantities is $O(gF/n)$. □

Lemma 28 (Main).
$$(Y-X)\bullet A_H \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\cdot \frac{4gn}{\sqrt d}$$

Proof. Finally we come to the main argument. We have
$$(Y-X)\bullet A_H = \sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) + \sum_{r\notin V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big)$$
where
$$\sum_{r\notin V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; -\sum_{r\notin V'} f_r^T A_H f_r \;\ge\; -\sum_{r\notin V'} \|f_r\|^2\cdot\|A_H\| \;\ge\; -F\,(g+1)^2\left(1 + \frac{2}{\sqrt d}\right)$$
so that it remains to study $\sum_{r\in V'} h_r^T A_H h_r - f_r^T A_H f_r$. We have
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) = \sum_{r\in V'} \sum_{a,b} w_{a,b}\cdot\big(h_r(a)h_r(b) - f_r(a)f_r(b)\big) = 2\sum_{r\in V'} \sum_{a,b} w_{a,b}\, h_r(a)h_r(b)$$
To motivate the last step, we note that $w_{a,b}\,h_r(a)h_r(b)$ is non-zero iff there is some $\ell \le g-1$ such that $a$ is at distance $\ell$ from $r$ and $b$ is at distance $\ell+1$ from $r$ (or vice versa), and so
$$h_r(a)h_r(b) = \sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]}$$
and
$$f_r(a)f_r(b) = (-1)^\ell\sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\cdot(-1)^{\ell+1}\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]} = -\,h_r(a)h_r(b)$$
Let us call $T_r$ the BFS tree rooted at $r$ and of depth $g$, and assume that its edges are directed from parent to child. Then we can rewrite
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) = 4\sum_{r\in V'} \sum_{(a,b)\in T_r} w_{a,b}\, h_r(a) h_r(b) = 4\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell} a]}\,\sqrt{\Pr[r \xrightarrow[nb]{\ell+1} b]}$$
$$\ge\; 4\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}\,\sqrt{\Pr[r \xrightarrow{\ell} a]}\,\sqrt{\Pr[r \xrightarrow{\ell+1} b]}$$
where $\Pr[u \xrightarrow{t} v]$ denotes the probability that a $t$-step standard random walk (in which edges are followed with probability proportional to their weight) started at $u$ ends at $v$. We have used the fact that, if there is a unique shortest path from $u$ to $v$ and the length of such path is $t$, then $\Pr[u \xrightarrow[nb]{t} v] \ge \Pr[u \xrightarrow{t} v]$.

Another observation is that, in the particular circumstance in which $r\in V'$, $(a,b)\in T_r$, and $a$ has distance $\ell$ from $r$ and $b$ has distance $\ell+1$ from $r$, we have
$$\Pr[r \xrightarrow{\ell+1} b] = \Pr[r \xrightarrow{\ell} a]\cdot\frac{w_{a,b}}{w(a)}$$
and, together with our assumptions on the degrees of the vertices,
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; 4\left(1 - O\left(\frac{1}{\sqrt d}\right)\right) \sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{\substack{(a,b)\in T_r:\\ \mathrm{dist}(r,a)=\ell}} w_{a,b}^{3/2}\,\Pr[r \xrightarrow{\ell} a]$$
where $\mathrm{dist}(r,a)$ is the length (number of edges) of a shortest path from $r$ to $a$. Now let us consider the above inner summation over all pairs $a,b$ such that $a$ is at distance $\ell$ from $r$ and the edge $(a,b)$ exists in $T_r$, meaning that $b$ is farther from $r$ than $a$ is. If $p$ is the predecessor of $a$ in the unique path of length $\ell$ from $r$ to $a$, then we have
$$\sum_{b\ne p} w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}}\right)\right)\sum_b w_{a,b}^{3/2}$$
because $\sum_b w_{a,b}^{3/2} \ge \Omega(1/\sqrt d)$ and $w_{a,p}^{3/2} \le O(1/d^{3/4})$. We can thus conclude that
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; 4\left(1 - O\left(\frac{1}{d^{1/4}}\right)\right) \sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{a:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2}$$
The next observation is that if $a$ is in $V''$ and $r$ is at distance $\le g-1$ from $a$, then $r$ is in $V'$, so we have
$$\sum_{r\in V'} \sum_{\ell=0}^{g-1} \sum_{a:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2} \;\ge\; \sum_{a\in V''} \sum_{\ell=0}^{g-1} \sum_{r:\, \mathrm{dist}(a,r)=\ell} \Pr[r \xrightarrow{\ell} a] \sum_b w_{a,b}^{3/2}$$
$$\ge\; \left(1 - O\left(\frac{g}{\sqrt d}\right)\right) \sum_{a\in V''} \sum_{\ell=0}^{g-1} \sum_{r:\, \mathrm{dist}(a,r)=\ell} \Pr[a \xrightarrow{\ell} r] \sum_b w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{g}{\sqrt d}\right)\right)\cdot g\cdot \sum_{a\in V''} \sum_b w_{a,b}^{3/2}$$
where we used the reversibility of the random walk (we have $w(r)\Pr[r\xrightarrow{\ell}a] = w(a)\Pr[a\xrightarrow{\ell}r]$, and the weighted degrees are $1\pm 2/\sqrt d$) and the fact that a standard walk started at $a\in V''$ ends at distance exactly $\ell \le g-1$ unless it backtracks at some step, which happens with probability at most $O(g/\sqrt d)$ overall. By the convexity argument mentioned above,
$$\sum_{a,b} w_{a,b}^{3/2} \;\ge\; dn\left(\frac{\sum_{a,b} w_{a,b}}{dn}\right)^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{\sqrt d}\right)\right)\frac{n}{\sqrt d}$$
and, since $\sum_b w_{a,b}^{3/2} \le \big(\sum_b w_{a,b}\big)^{3/2} \le O(1)$ for every $a$,
$$\sum_{a\notin V''} \sum_b w_{a,b}^{3/2} \le O(F) \qquad\text{so that}\qquad \sum_{a\in V''} \sum_b w_{a,b}^{3/2} \;\ge\; \left(1 - O\left(\frac{1}{\sqrt d} + \frac{F\sqrt d}{n}\right)\right)\frac{n}{\sqrt d}$$
Putting everything together,
$$\sum_{r\in V'} \big(h_r^T A_H h_r - f_r^T A_H f_r\big) \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{F\sqrt d}{n}\right)\right)\frac{4gn}{\sqrt d}$$
and, since the contribution of the vertices $r\notin V'$ is at least $-F(g+1)^2(1+2/\sqrt d) = -O(gF\sqrt d/n)\cdot 4gn/\sqrt d$,
$$(Y-X)\bullet A_H \;\ge\; \left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\frac{4gn}{\sqrt d} \qquad\Box$$
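To see the quantities of Lemma 28 in action, here is a small numeric illustration, ours and not the paper's. It reuses the hypothetical `nb_walk_probs` and `test_vectors` sketches from above and `networkx` for graph generation; for $g=1$ and uniform edge weights $1/d$ the two printed numbers essentially coincide, while larger $g$ brings in the error terms of the lemma:

```python
import math
import networkx as nx
import numpy as np

n, d, g = 400, 6, 1
G = nx.random_regular_graph(d, n, seed=1)
adj = {u: {v: 1.0 / d for v in G[u]} for u in G}   # weighted degrees are all 1
X, Y = test_vectors(adj, sorted(G), g)             # sketch above
A = np.zeros((n, n))
for u, v in G.edges():
    A[u, v] = A[v, u] = 1.0 / d                    # weighted adjacency matrix A_H
print(np.sum((Y - X) * A),                         # (Y - X) . A_H
      4 * g * n / math.sqrt(d))                    # the lemma's leading term
```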
We can now prove Theorem 20.

Proof of Theorem 20.
Given a graph $H$ that satisfies the assumptions of the theorem and that is an $\epsilon$ spectral sparsifier of $\bar K_n$, define PSD matrices $X$ and $Y$ as in Section 4.1. We have
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\le\; \frac{1+\epsilon}{1-\epsilon} \;\le\; 1 + 2\epsilon + O(\epsilon^2)$$
If $\epsilon > 2/\sqrt d$ there is nothing else left to prove. If $\epsilon \le 2/\sqrt d$, then
$$\epsilon \;\ge\; \frac 12\cdot\left(\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} - 1\right) - O\left(\frac 1d\right)$$
Now, from Lemma 25 and Lemma 27, we have
$$\frac{X\bullet L_H}{X\bullet L_{\bar K_n}}\cdot\frac{Y\bullet L_{\bar K_n}}{Y\bullet L_H} \;\ge\; \left(1 + \frac{(Y-X)\bullet A_H}{Y\bullet D_H} - O\left(\frac{gF}{n}\right)\right)\cdot\left(1 - O\left(\frac{g(F+B)}{n}\right)\right)$$
From Lemma 26 and Lemma 28 we have
$$\frac{(Y-X)\bullet A_H}{Y\bullet D_H} \;\ge\; \frac{\left(1 - O\left(\frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)\cdot\frac{4gn}{\sqrt d}}{n(g+1)\cdot\left(1 + \frac{2}{\sqrt d}\right)\cdot\left(1 + \frac{gF}{n}\right)} \;\ge\; \frac{4}{\sqrt d}\cdot\left(1 - O\left(\frac 1g + \frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{gF\sqrt d}{n}\right)\right)$$
Putting everything together,
$$\epsilon \;\ge\; \frac{2}{\sqrt d}\cdot\left(1 - O\left(\frac 1g + \frac{1}{d^{1/4}} + \frac{g}{\sqrt d} + \frac{g\sqrt d\,(F+B)}{n}\right)\right)$$
which gives the bound in the statement, since $\frac{1}{d^{1/4}} \le \frac 1g + \frac{g}{\sqrt d}$ by the AM-GM inequality. □

We will now show a separation between cut and spectral sparsification of random $\log n$-regular graphs. First, we demonstrate that a random $\log n$-regular graph satisfies the "pseudo-girth" conditions required by Theorem 20.

Theorem 29. If $G$ is a random regular graph drawn from $G^{reg}_{n,\log n}$, and $g$ is a fixed constant, then the following occur:
1. with probability 1, for every vertex $v$ of $G$, the number of vertices of $G$ reachable from $v$ via paths of length at most $g$ is $O((\log n)^g)$;
2. if we call $V''$ the set of vertices $v$ such that there is no cycle in the subgraph of $G$ induced by the vertices at distance at most $g$ from $v$, then, with probability $1 - o_n(1)$ over the choice of $G$, $|V''| \ge n - O((\log n)^{2g+1})$.

Proof.
The first property immediately follows from the fact that the combinatorial degree is at most $\log n$, so at most $\sum_{i\le g}(\log n)^i = O((\log n)^g)$ vertices are within distance $g$ of any vertex.

For the second part, fix a vertex $v$ and consider the probability, over the choice of $G$, that $v\notin V''$. By the principle of deferred decisions, we first generate the $\log n$ neighbors of $v$, then the additional neighbors of those neighbors, and so on, up to distance $g$. Every time we make a decision about how to match a particular vertex $x$ in one of the $\log n$ matchings, the probability of hitting a previously seen vertex is at most $O((\log n)^g/n)$; since we make at most $O((\log n)^g)$ such decisions, the probability that we create a cycle is at most $O((\log n)^{2g}/n)$. Thus the expected number of vertices outside $V''$ is $O((\log n)^{2g})$, and the conclusion of the theorem follows by applying Markov's inequality. □
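The second property can also be observed empirically. The following throwaway script is ours (it uses `networkx`, and the parameters are arbitrary); it measures the fraction of vertices with acyclic radius-$g$ balls in a sampled roughly $\log n$-regular graph:

```python
import math
import networkx as nx

def acyclic_ball_fraction(G, g):
    """Fraction of vertices whose radius-g ball induces no cycle."""
    good = 0
    for v in G:
        ball = nx.ego_graph(G, v, radius=g)
        # the ball is connected, so it is acyclic iff it is a tree
        good += ball.number_of_edges() == ball.number_of_nodes() - 1
    return good / G.number_of_nodes()

n = 20000
d = round(math.log(n))
d += (n * d) % 2                  # n*d must be even for a d-regular graph to exist
G = nx.random_regular_graph(d, n, seed=0)
print(acyclic_ball_fraction(G, g=2))   # tends to 1 as n grows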
We now prove the separation between cut and spectral sparsification. We restate Theorem 4 from the introduction.

Theorem 4. Let $G$ be a random regular graph drawn from $G^{reg}_{n,\log n}$. Then, with probability $1 - o_n(1)$ over the choice of $G$, the following happens for every $d$:
1. there is a weighted subgraph $H$ of $G$ with $dn/2$ edges such that $H$ is an $\epsilon$ cut sparsifier of $G$ with $\epsilon \le (q_\pi + o_{n,d}(1))/\sqrt d$;
2. for every weighted subgraph $H$ of $G$ with $dn/2$ edges, if $H$ is an $\epsilon$ spectral sparsifier of $G$ then $\epsilon \ge (2 - o_{n,d}(1))/\sqrt d$.

Proof. Let us fix $d$. If $G$ is a random $\log n$-regular graph drawn from $G^{reg}_{n,\log n}$ then, for every fixed $d$, there is a $1 - o_n(1)$ probability that there are $o_n(n)$ nodes that see a cycle within distance $2d^{1/4}$ and that there are $o_n(n)$ nodes in the ball of radius $d^{1/4}$ around each node. Note that the above properties also hold for any edge-subgraph $H$ of $G$.

From Theorem 20 we have that, with $1 - o_n(1)$ probability over the choice of $G$, if a weighted edge-induced subgraph $H$ of $G$ of average degree $d$ is an $\epsilon$-spectral sparsifier of the clique, then $\epsilon \ge (2 - O(d^{-1/4}) - o_n(1))/\sqrt d$. From [Bor19] we have that, with $1 - o_n(1)$ probability, the graph $G$ is an $O(1/\sqrt{\log n})$ spectral sparsifier (and also cut sparsifier) of the clique, so if a weighted edge-induced subgraph $H$ of $G$ of average degree $d$ is an $\epsilon$-spectral sparsifier of $G$, then it is an $(\epsilon + o_n(1))$-spectral sparsifier of the clique, and again $\epsilon \ge (2 - O(d^{-1/4}) - o_n(1))/\sqrt d$.

Since we constructed $G$ as the union of $\log n$ random matchings, $G$ contains, for large enough $n$, a random $d$-regular graph from $G^{reg}_{n,d}$ as an edge-induced subgraph (for example, consider the first $d$ of the $\log n$ matchings used to construct $G$). We can deduce from Theorem 2 that, with $1 - o_n(1)$ probability, $G$ contains as a weighted edge-induced subgraph a graph $H$ that has average degree $d$ and is a $(q_\pi + o_{n,d}(1))/\sqrt d$ cut sparsifier of the clique. We conclude that, with $1 - o_n(1)$ probability over the choice of $G$, there is a weighted edge-induced subgraph $H$ of $G$ such that $H$ has average degree $d$ and is a $(q_\pi + o_{n,d}(1))/\sqrt d$ cut sparsifier of $G$. □

Acknowledgements

The work of JS and LT on this project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 834861). Most of this work was done while AC was a visiting student at Bocconi University. The authors would like to thank Andrea Montanari for pointing us to [JS17], and the Physics and Machine Learning group at Bocconi, particularly Enrico Malatesta, for patiently explaining the Parisi equations and the replica method to us.

References
[ACK+16] Alexandr Andoni, Jiecao Chen, Robert Krauthgamer, Bo Qin, David P. Woodruff, and Qin Zhang. On sketching quadratic forms. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 311–319, 2016.

[AZLO15] Zeyuan Allen Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond matrix multiplicative updates. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pages 237–245, 2015.

[BK96] András A. Benczúr and David R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22–24, 1996, pages 47–55, 1996.

[Bor19] Charles Bordenave. A new proof of Friedman's second eigenvalue theorem and its extension to random lifts. Annales scientifiques de l'École normale supérieure, 2019.

[BSS09] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. In Proceedings of the 41st ACM Symposium on Theory of Computing, pages 255–262, 2009.

[CR02] A. Crisanti and T. Rizzo. Analysis of the ∞-replica symmetry breaking solution of the Sherrington-Kirkpatrick model. Physical Review E, 65:046137, April 2002.

[DMS+17] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, 2017.

[Eva10] Lawrence C. Evans. Partial Differential Equations. American Mathematical Society, Providence, R.I., 2010.

[FGL12] Xiequan Fan, Ion Grama, and Quansheng Liu. Hoeffding's inequality for supermartingales. Stochastic Processes and their Applications, 122(10):3545–3559, 2012.

[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in Mathematical Physics, 233(1):1–12, 2003.

[JS17] Aukosh Jagannath and Subhabrata Sen. On the unbalanced cut problem and the generalized Sherrington-Kirkpatrick model. arXiv preprint arXiv:1707.09042, 2017.

[LS17] Yin Tat Lee and He Sun. An SDP-based algorithm for linear-sized spectral sparsification. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 678–687, 2017.

[Mal19] Enrico M. Malatesta. Random Combinatorial Optimization Problems: Mean Field and Finite-Dimensional Results. PhD thesis, Università degli Studi di Milano, 2019.

[Neu13] Edward Neuman. Inequalities and bounds for the incomplete gamma function. Results in Mathematics, 63(3-4):1209–1214, 2013.

[P+45] George Pólya. Remarks on computing the probability integral in one and two dimensions. In Proceedings of the 1st Berkeley Symposium on Mathematical Statistics and Probability, pages 63–78, 1945.

[Pan14] Dmitry Panchenko. The Parisi formula for mixed p-spin models. Annals of Probability, 42(3):946–958, 2014.

[Par80] G. Parisi. A sequence of approximated solutions to the S-K model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115–L121, April 1980.

[Sen18] Subhabrata Sen. Optimization on sparse random hypergraphs and spin glasses. Random Structures & Algorithms, 53(3):504–536, 2018.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913–1926, 2011.

[ST11] Daniel Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011.

[ST18] Nikhil Srivastava and Luca Trevisan. An Alon-Boppana type bound for weighted graphs and lowerbounds for spectral sparsification. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 1306–1315, 2018.

[Tal06] Michel Talagrand. The Parisi formula. Annals of Mathematics, pages 221–263, 2006.
Analytic Inequalities
In this section, we prove two analytic inequalities required by our martingale concentration analysis. The first lemma is used to approximate the exponent in the concentration bound provided by Lemma 17 in the case where $\delta/C$ is small.

Lemma 30.
For any $\delta, C \ge 0$ with $\delta \le C$, we have that
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) \;\ge\; \delta + \frac{\delta^2}{3C}$$

Proof.
Proceed by expanding $\delta\ln\left(\frac\delta C + 1\right)$ via its Taylor series, which converges since $\delta/C \le 1$:
$$\delta\ln\left(\frac\delta C + 1\right) = \delta\cdot\left(\sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^t\, t}\right) = \sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^{t+1}}{C^t\, t}$$
Similarly, for $C\ln\left(\frac\delta C + 1\right)$ we have
$$C\cdot\ln\left(\frac\delta C + 1\right) = C\cdot\left(\sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^t\, t}\right) = \sum_{t=1}^\infty \frac{(-1)^{t+1}\,\delta^t}{C^{t-1}\, t} = \delta + \sum_{t=1}^\infty \frac{(-1)^{t}\,\delta^{t+1}}{C^t\,(t+1)}$$
Combining the two expansions, we derive
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) = \delta + \sum_{t=1}^\infty (-1)^{t+1}\left(\frac{\delta^{t+1}}{C^t}\right)\left(\frac 1t - \frac 1{t+1}\right) \;\ge\; \delta + \frac{\delta^2}{3C}$$
where the last inequality holds because the series is alternating with terms of decreasing magnitude, so it is at least its first term minus its second, namely
$$\frac{\delta^2}{2C} - \frac{\delta^3}{6C^2} \;\ge\; \frac{\delta^2}{2C}\left(1 - \frac{\delta}{3C}\right) \;\ge\; \frac{\delta^2}{3C} \qquad\Box$$

The second lemma is used to approximate the exponent in Lemma 17 when $\delta/C$ is large.

Lemma 31.
For any $\delta \ge C \ge 4$, we have
$$\left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta \;\ge\; \frac{\delta\ln\delta}{C}$$

Proof.
Denote $f(C,\delta) = \left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta - \frac{\delta\ln\delta}{C}$. It suffices to demonstrate that $f(C,\delta) \ge 0$ for all $\delta \ge C \ge 4$. To see this, first note that $f(C,\delta) \ge 0$ for all $\delta = C \ge 4$, as in that case we have
$$f(C,\delta) = \left(\delta + C\right)\ln\left(\frac\delta C + 1\right) - \delta - \frac{\delta\ln\delta}{C} = 2\delta\ln 2 - \delta - \ln\delta \ge 0$$
which is true for any $\delta \ge 4$. We next compute $\frac{\partial f}{\partial\delta}$ as follows:
$$\frac{\partial f}{\partial\delta} = \ln\left(\frac\delta C + 1\right) - \frac 1C\left(\ln\delta + 1\right) = \ln\left(\frac{\delta/C + 1}{\delta^{1/C}}\right) - \frac 1C$$
If we can show that $\frac{\partial f}{\partial\delta} \ge 0$ for all $\delta \ge C \ge 4$, then we would have that $f$ is non-negative along $\delta = C$, and non-decreasing along the positive $\delta$ direction past the $\delta = C$ line. It must then be that $f$ is non-negative for all $\delta \ge C \ge 4$. Towards this, observe that it is equivalent to demonstrate
$$\left(\frac\delta C + 1\right)^C \ge \delta e$$
Setting $g(C,\delta) = \left(\frac\delta C + 1\right)^C - \delta e$, we notice that for all $\delta = C \ge 4$ we have
$$g(C,\delta) = \left(\frac\delta C + 1\right)^C - \delta e = 2^\delta - \delta e \ge 0$$
for $\delta \ge 4$. Meanwhile, observe that, for $\delta \ge C \ge 4$,
$$\frac{\partial g}{\partial\delta} = \left(\frac\delta C + 1\right)^{C-1} - e \;\ge\; 2^{C-1} - e \;\ge\; 0$$
Consequently $g(C,\delta) \ge 0$, implying $\left(\frac\delta C + 1\right)^C \ge \delta e$, implying $\frac{\partial f}{\partial\delta} \ge 0$. □
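Both inequalities are easy to spot-check numerically. The following throwaway script is ours, with the grid ranges chosen arbitrarily:

```python
import math

def lhs(delta, C):
    return (delta + C) * math.log(delta / C + 1)

# Lemma 30: for 0 <= delta <= C, (delta+C)ln(delta/C+1) >= delta + delta^2/(3C)
for C in [0.5, 1.0, 3.0, 10.0]:
    for k in range(1, 101):
        delta = C * k / 100.0
        assert lhs(delta, C) >= delta + delta**2 / (3 * C) - 1e-12

# Lemma 31: for delta >= C >= 4, (delta+C)ln(delta/C+1) - delta >= delta*ln(delta)/C
for C in [4.0, 5.0, 10.0]:
    for k in range(0, 200):
        delta = C + 0.5 * k
        assert lhs(delta, C) - delta >= delta * math.log(delta) / C - 1e-12

print("both inequalities hold on the tested grid")
```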