The Spectral Norm of Random Lifts of Matrices
Afonso S. Bandeira∗ and Yunzi Ding†

Department of Mathematics, ETH Zurich, Switzerland
Department of Mathematics, Courant Institute of Mathematical Sciences, New York University, USA
Abstract
We study the spectral norm of matrix random lifts $A^{(k,\pi)}$ for a given $n \times n$ matrix $A$ and $k \ge 2$, which is a random symmetric $kn \times kn$ matrix whose $k \times k$ blocks are obtained by multiplying $A_{ij}$ by a $k \times k$ matrix drawn independently from a distribution $\pi$ supported on $k \times k$ matrices with spectral norm at most 1. Assuming that $\mathbb{E}_\pi X = 0$, we prove that
\[
\mathbb{E}\big\|A^{(k,\pi)}\big\| \lesssim \max_i \sqrt{\sum_j A_{ij}^2} + \max_{ij} |A_{ij}| \sqrt{\log(kn)}.
\]
This result can be viewed as an extension of existing spectral bounds on random matrices with independent entries, providing further instances where the multiplicative $\sqrt{\log n}$ factor in the Non-Commutative Khintchine inequality can be removed. We also show an application to random $k$-lifts of graphs (each vertex of the graph is replaced with $k$ vertices, and each edge is replaced with a random bipartite matching between the two sets of $k$ vertices each). We prove an upper bound of $2(1+\epsilon)\sqrt{\Delta} + O(\sqrt{\log(kn)})$ on the new eigenvalues for random $k$-lifts of a fixed $G = (V, E)$ with $|V| = n$ and maximum degree $\Delta$, compared to the previous result of $O(\sqrt{\Delta \log(kn)})$ by Oliveira [Oli09] and the recent breakthrough by Bordenave and Collins [BC19], which gives $2\sqrt{\Delta - 1} + o(1)$ as $k \to \infty$ for $\Delta$-regular base graphs $G$.

∗Email: [email protected]. Part of this work was done while ASB was with the Department of Mathematics at the Courant Institute of Mathematical Sciences, and the Center for Data Science, at New York University; and partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation.
†Email: [email protected]. Partially supported by NSF grant DMS-1712730.
1 Introduction
The Non-Commutative Khintchine (NCK) inequality, originally introduced by Lust-Piquard and Pisier [Pis03], is one of the simplest tools for understanding the spectrum of matrix series, namely
\[
X = \sum_{i=1}^N \gamma_i A_i, \tag{1}
\]
where $A_i$ ($i = 1, \dots, N$) are $n \times n$ real symmetric matrices and $\gamma_i$ ($i = 1, \dots, N$) are i.i.d. random variables, usually assumed Gaussian or Rademacher. The inequality is stated as follows.

Theorem 1.1 (Non-Commutative Khintchine (NCK) inequality). Let $A_1, A_2, \dots, A_N$ be $n \times n$ symmetric matrices and $\gamma_1, \gamma_2, \dots, \gamma_N$ be i.i.d. $N(0,1)$ random variables. Then
\[
\mathbb{E}\Bigg\|\sum_{i=1}^N \gamma_i A_i\Bigg\| \le \sigma \sqrt{2\log(2n)}, \qquad \text{where } \sigma := \Bigg\|\sum_{i=1}^N A_i^2\Bigg\|^{1/2}. \tag{2}
\]

The NCK inequality and other phenomena of matrix concentration have been proven under various settings and extensively studied in [Oli10, Tro12, Tro15]. One particularly important application of matrix concentration is to the spectra of random matrices with independent entries. These random matrices can be represented as matrix series upon a direct entry-wise decomposition, as we show below.

The study of random matrices with independent entries traces back to the seminal work of Wigner [Wig58]. For Wigner matrices (real symmetric or Hermitian random matrices with independent mean-zero, unit-variance entries), a long line of work has established a comprehensive understanding of their spectral properties over the past decades (see, for example, [FK81, BY88, AGZ10, Tao12]). One of the most important results is the Wigner semicircle law: for an $n \times n$ Wigner matrix $X$, $\mathbb{E}\|X\|/\sqrt{n} \to 2$, and the empirical spectral distribution of $X/\sqrt{n}$ converges to the semicircle density $\frac{1}{2\pi}\sqrt{4 - x^2}\,\mathbf{1}_{\{-2 \le x \le 2\}}$ as $n \to \infty$. Random matrices with different variances on each of the independent entries, for instance real symmetric $X \in \mathbb{R}^{n \times n}$ with $X_{ij} \sim N(0, b_{ij}^2)$ for $1 \le i \le j \le n$, have also been studied [DS01, RV10, Ver10].
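As a quick numerical illustration (our own sketch, not part of the paper's argument), one can sample a Gaussian matrix series and compare its typical spectral norm with the NCK bound $\sigma\sqrt{2\log(2n)}$; the dimensions and trial counts below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 30

# Random symmetric coefficient matrices A_1, ..., A_N.
As = [(B + B.T) / 2 for B in rng.standard_normal((N, n, n))]

# NCK parameter: sigma = || sum_i A_i^2 ||^{1/2}.
sigma = np.linalg.norm(sum(A @ A for A in As), 2) ** 0.5

# Empirical average of ||X|| for X = sum_i gamma_i A_i over a few draws.
norms = []
for _ in range(20):
    g = rng.standard_normal(N)
    X = sum(gi * A for gi, A in zip(g, As))
    norms.append(np.linalg.norm(X, 2))

nck_bound = sigma * np.sqrt(2 * np.log(2 * n))
assert np.mean(norms) <= nck_bound  # the NCK bound holds, with room to spare
```

The slack observed here reflects exactly the $\sqrt{\log n}$-type looseness of the NCK bound discussed in this paper.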
With the NCK inequality, the following estimate can be obtained:
\[
\mathbb{E}\|X\| \lesssim \sigma \sqrt{\log n}, \qquad \text{where } \sigma := \max_i \sqrt{\sum_j b_{ij}^2}. \tag{3}
\]
Here the definition of $\sigma$ is consistent with (2) upon writing $X$ as the matrix series
\[
X = \sum_{i=1}^n \gamma_{ii} b_{ii} E_{ii} + \sum_{1 \le i < j \le n} \gamma_{ij} b_{ij} (E_{ij} + E_{ji}), \tag{4}
\]
where $E_{ij}$ denotes the $n \times n$ matrix whose $(i,j)$ entry is 1 and all other entries are 0. Bandeira and van Handel [BvH16] later showed that the multiplicative $\sqrt{\log n}$ factor in (3) can be removed up to an additive term:
\[
\mathbb{E}\|X\| \lesssim \sigma + \sigma_* \sqrt{\log n}, \qquad \text{where } \sigma_* := \max_{ij} |b_{ij}|. \tag{5}
\]
In this paper we extend this phenomenon to block lifts of a base matrix.

Definition 1.2 (Matrix random lift). Let $A$ be a symmetric $n \times n$ base matrix with zero diagonal, and let $\pi$ be a distribution on $k \times k$ matrices. Draw $\{\Pi_{ij}\}_{1 \le i < j \le n}$ i.i.d. from $\pi$ and define the random symmetric $kn \times kn$ matrix
\[
A^{(k,\pi)} = \sum_{1 \le i < j \le n} A_{ij} \big(E_{ij} \otimes \Pi_{ij} + E_{ji} \otimes \Pi_{ij}^{\top}\big). \tag{8}
\]

Our main result is the following.

Theorem 1.3. Let $A$ be a symmetric $n \times n$ matrix ($n \ge 2$) with zero diagonal entries. Suppose $\pi$ is a measure supported on $k \times k$ matrices with spectral norm at most 1, which satisfies $\mathbb{E}_\pi X = 0_k$. Then there exists a universal constant $C$, such that for any $\epsilon \in (0, 1/2)$,
\[
\mathbb{E}\big\|A^{(k,\pi)}\big\| \le 2(1+\epsilon)\sigma + \frac{C}{\sqrt{\log(1+\epsilon)}}\,\sigma_* \sqrt{\log(kn)}, \tag{9}
\]
where
\[
\sigma := \max_i \sqrt{\sum_j A_{ij}^2}, \qquad \sigma_* := \max_{ij} |A_{ij}|.
\]

Note that Definition 1.2 and Theorem 1.3 only apply to base matrices with zero diagonal entries. For any symmetric base matrix with possibly non-zero diagonal entries, in the lifting process we draw $\{\Pi_{ii}\}_{1 \le i \le n}$ i.i.d. from another distribution $\pi'$ on $k \times k$ matrices with spectral norm at most 1 and $\mathbb{E}_{\pi'} X = 0_k$, and the same result can be easily adapted.

Remark 1.4. Upon taking $A = \{b_{ij}\}$, $k = 1$ and $\pi = \mathrm{Uniform}\{\pm 1\}$, Definition 1.2 and Theorem 1.3 include as a special case the real symmetric random matrix $X \in \mathbb{R}^{n \times n}$ with $X_{ij} = \epsilon_{ij} b_{ij}$, where $\{b_{ij}\}$ are given and $\epsilon_{ij}$ are independent Rademacher random variables for $1 \le i \le j \le n$, that is, $\mathbb{P}[\epsilon_{ij} = \pm 1] = 1/2$. Since [BvH16] showed that the bound $O(\sigma + \sigma_* \sqrt{\log n})$ captures the optimal scaling of $\mathbb{E}\|X\|$ with respect to $\sigma$ and $\sigma_* \sqrt{\log n}$ and is in general unimprovable, this implies the same for our bound (9) on $\mathbb{E}\|A^{(k,\pi)}\|$. However, Theorem 1.3 does not directly imply the bound (5), since Gaussian random variables are not compactly supported.

Besides the $k = 1$ case, Theorem 1.3 is also interesting for natural choices such as $\pi$ being the Haar measure on the orthogonal group $O(k)$ or the special orthogonal group $SO(k)$.
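To make the lift construction concrete, here is a small simulation (our own sketch; the helper names are ours) that builds $A^{(k,\pi)}$ with $\pi$ the Haar measure on $O(k)$, one of the natural choices mentioned above, and checks the norm against the shape of the bound (9), with loose illustrative constants in place of the theorem's:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_orthogonal(k, rng):
    # Haar-distributed orthogonal matrix via QR with the standard sign fix;
    # the Haar measure on O(k) is symmetric under Q -> -Q, so E[Q] = 0.
    Q, R = np.linalg.qr(rng.standard_normal((k, k)))
    return Q * np.sign(np.diag(R))

def random_lift(A, k, rng):
    # Blocks (i, j) and (j, i) carry A_ij * Pi_ij and A_ij * Pi_ij^T.
    n = A.shape[0]
    M = np.zeros((n * k, n * k))
    for i in range(n):
        for j in range(i + 1, n):
            Pi = haar_orthogonal(k, rng)
            M[i*k:(i+1)*k, j*k:(j+1)*k] = A[i, j] * Pi
            M[j*k:(j+1)*k, i*k:(i+1)*k] = A[i, j] * Pi.T
    return M

n, k = 20, 5
A = np.triu(rng.standard_normal((n, n)), 1)
A = A + A.T                                   # symmetric, zero diagonal
sigma = np.sqrt((A ** 2).sum(axis=1).max())   # max row L2 norm
sigma_star = np.abs(A).max()                  # max absolute entry

M = random_lift(A, k, rng)
assert np.allclose(M, M.T)
# Theorem 1.3 predicts roughly 2*sigma + C*sigma_star*sqrt(log(kn));
# the factor 3 below is a loose illustrative constant, not the theorem's.
assert np.linalg.norm(M, 2) <= 3 * sigma + 3 * sigma_star * np.sqrt(np.log(k * n))
```

In practice the observed norm sits near $2\sigma$, well inside the bound, consistent with the leading constant 2 in (9).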
One particular application is an estimate on the spectrum of random lifts of graphs, which we discuss below.

1.4 Application: random lifts of graphs

Given an undirected graph $G = (V, E)$ and an integer $k \ge 2$, the random $k$-lift of $G$, denoted $G^{(k)}$, is obtained by replacing each vertex $v \in V$ by $k$ new vertices, and each edge $e = (v_1, v_2)$ by a random $k \times k$ bipartite matching between the $k$ new vertices corresponding to $v_1$ and those corresponding to $v_2$. Here "random" refers to a uniform choice among all $k!$ possible bipartite matchings. We denote by $A$ and $A^{(k)}$ the adjacency matrices of $G$ and $G^{(k)}$, respectively.

Previous studies of $k$-lifts of graphs, under the setting of fixed $G$ and $k \to \infty$, have revealed many properties of the resulting random graph, such as connectivity [AL02], chromatic number [ALM02], edge expansion [AL06] and the existence of perfect matchings [LR05]. The spectrum of random $k$-lifts, namely the new spectrum introduced in the lifting process,
\[
\max_{\eta \in \mathrm{spec}(A^{(k)}) \setminus \mathrm{spec}(A)} |\eta| = \big\|A^{(k)} - \mathbb{E} A^{(k)}\big\|, \tag{10}
\]
was studied by Friedman [Fri03] via the trace method, who showed that with a random $d$-regular graph as the base graph, as $k \to \infty$, (10) is $O(d^{3/4})$ with high probability. He also conjectured the tight bound $2\sqrt{d-1} + o(1)$. The high-probability upper bound on (10) was improved by Linial and Puder to $O(d^{2/3})$ in [LP10], then by Lubetzky, Sudakov and Vu [LSV11] to $O(\sqrt{d} \log d)$ in the case that the second eigenvalue of the base graph is $O(\sqrt{d})$.
Later, Addario-Berry and Griffiths [ABG10] and Puder [Pud15] proved that (10) is $O(\sqrt{d})$, the latter giving $2\sqrt{d-1} + O(1)$ as an upper bound. Since then, various extensions or alternative proofs of the $O(\sqrt{d})$-scale bound on (10) (some under slightly different settings) have been carried out with different combinatorial and probabilistic techniques, for example in [FK14, BLM15, Bor15, ACKM17].

We should note that the above line of work adopts the asymptotic regime $k \to \infty$ and the setting that the base graph is taken randomly over all $d$-regular graphs on $n$ vertices. In fact, in the case that the base graph is a fixed $d$-regular graph and $k \in \mathbb{N}_+$ is fixed, (10) is not always upper bounded by $O(\sqrt{d})$. As a counterexample (see [BvH16], Remark 4.8), consider $G$ the union of $n/s$ cliques of $s$ vertices each, with no edges between different cliques; here $s = \lceil \sqrt{\log n} \rceil$, and we assume for simplicity that $n/s$ is an integer. Seginer [Seg00] showed that $\mathbb{E}\|A^{(2)} - \mathbb{E} A^{(2)}\|$ is of order $\sqrt{\log n}$, whereas the $O(\sqrt{d})$ bound would incorrectly predict that the left-hand side is $O(\log^{1/4} n)$.

Another line of work considers a fixed base graph $G$ with maximum degree $\Delta$, without assuming any randomness of the base graph. Making use of matrix concentration, Oliveira [Oli09] obtained a high-probability upper bound of $O(\sqrt{\Delta \log(kn)})$ on (10). The most recent advancement by Bordenave and Collins [BC19] considered the $k$-lifts problem under a much more general framework, and proved that (10) is $2\sqrt{d-1} + o(1)$ for any $d$-regular base graph $G$ as $k \to \infty$, finally settling Friedman's conjecture even without assuming randomness of the base graph.

In what follows, we improve the bound in [Oli09] by removing the multiplicative factor $\sqrt{\log(kn)}$, replacing it with an additive term. We also improve the constant factor in front of $\sqrt{\Delta}$ down to 2, in consistency with Friedman's theorem and [BC19].
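For concreteness, the $k$-lift construction described above can be simulated directly (a sketch of ours; the cycle base graph is just an example), evaluating the quantity (10) as $\|A^{(k)} - \mathbb{E} A^{(k)}\|$ with $\mathbb{E} A^{(k)} = A \otimes J_k/k$:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_k_lift(A, k, rng):
    # Each edge (u, v) of the base graph becomes a uniformly random
    # bipartite perfect matching, i.e. a random k x k permutation block.
    n = A.shape[0]
    L = np.zeros((n * k, n * k))
    for u in range(n):
        for v in range(u + 1, n):
            if A[u, v]:
                P = np.eye(k)[rng.permutation(k)]
                L[u*k:(u+1)*k, v*k:(v+1)*k] = P
                L[v*k:(v+1)*k, u*k:(u+1)*k] = P.T
    return L

# Base graph: the n-cycle, so Delta = 2 and every lift is again 2-regular.
n, k = 30, 4
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1

L = random_k_lift(A, k, rng)
assert (L.sum(axis=1) == 2).all()          # the lift is 2-regular

# E A^(k) = A kron (J_k / k); the new spectrum is || L - E L ||.
EL = np.kron(A, np.ones((k, k)) / k)
new_part = np.linalg.norm(L - EL, 2)
assert new_part <= 2 + 1e-8                # trivially at most ||L|| = 2 here
```

Since the lift of a cycle is again 2-regular, the new spectrum is trivially at most 2 in this toy case; the content of the bounds discussed here is the behavior $2(1+\epsilon)\sqrt{\Delta} + O(\sqrt{\log(kn)})$ for general base graphs.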
Though our bound is weaker than [BC19] by an additive $\sqrt{\log(kn)}$ term in the large-$k$ limit, we use only a slight modification of the moment method, compared to the sophisticated combinatorial technique in [BC19].

Theorem 1.5. Let $A$ be the adjacency matrix of $G = (V, E)$ with $|V| = n$ and $\mathrm{maxdeg}(G) = \Delta$, and let $A^{(k)}$ be the corresponding random $k$-lift. Then there exists a universal constant $C$, such that for any $\epsilon \in (0, 1/2)$,
\[
\mathbb{E}\big\|A^{(k)} - \mathbb{E} A^{(k)}\big\| \le 2(1+\epsilon)\sqrt{\Delta} + \frac{C}{\sqrt{\log(1+\epsilon)}}\sqrt{\log(kn)}. \tag{11}
\]

Our bound is essentially $2(1+\epsilon)\sqrt{\Delta}$ as long as $\Delta \gg \log(kn)$, i.e. the base graph $G$ is not too sparse. The proof of Theorem 1.5 will follow from our main result, Theorem 1.3.

Notation. In this paper, for positive quantities $A$ and $B$, $A \lesssim B$ and $A \gtrsim B$ respectively mean $A \le CB$ and $A \ge CB$ for some absolute positive constant $C$. For $x \in \mathbb{R}$, $\lceil x \rceil$ denotes the minimum integer that is larger than or equal to $x$.

2 Proofs

In this section, we carry out the proofs of Theorems 1.3 and 1.5. We begin with the following comparison argument, which links $A^{(k,\pi)}$ to an auxiliary Wigner-type matrix. This argument is a modification of Proposition 2.1 in [BvH16], and the auxiliary matrix $Y_r$ is the same as in the proof of Theorem 4.8 in [LvHY18].

Proposition 2.1. Let $Y_r$ be the $r \times r$ symmetric random matrix with zero diagonal and
\[
Y_{ij} = \begin{cases} \dfrac{\sqrt{5}+1}{2} & \text{w.p. } \dfrac{5-\sqrt{5}}{10}, \\[4pt] -\dfrac{\sqrt{5}-1}{2} & \text{w.p. } \dfrac{5+\sqrt{5}}{10}, \end{cases}
\]
independently for all $1 \le i < j \le r$, so that $\mathbb{E} Y_{ij} = 0$, $\mathbb{E} Y_{ij}^2 = 1$ and $\mathbb{E}[Y_{ij}^m] \ge 1$ for every $m \ge 2$. Under the setting of Theorem 1.3, suppose $\sigma_* \le 1$. Then for every $p \in \mathbb{N}_+$ there holds
\[
\mathbb{E}\,\mathrm{Tr}\big[(A^{(k,\pi)})^{2p}\big] \le \frac{kn}{\lceil \sigma^2 \rceil + 2p}\,\mathbb{E}\,\mathrm{Tr}\big[Y_{\lceil \sigma^2 \rceil + 2p}^{2p}\big].
\]

To carry out the proof of Proposition 2.1, we start with a set of standard notations adopted from [FK81] and [BvH16]. Following the representation (8), a direct expansion of $(A^{(k,\pi)})^{2p}$ yields
\[
\mathbb{E}\,\mathrm{Tr}\big[(A^{(k,\pi)})^{2p}\big] = \sum_{u_1, \dots, u_{2p} \in [n]} \prod_{j=1}^{2p} A_{u_j u_{j+1}} \cdot \mathbb{E}\,\mathrm{Tr}\Bigg[\prod_{j=1}^{2p} \Pi_{u_j u_{j+1}}\Bigg], \tag{12}
\]
with the cyclic convention $u_{2p+1} := u_1$. Let $G_n = ([n], E_n)$ be the complete graph on $n$ points.
A cycle $u_1 \to u_2 \to \cdots \to u_{2p} \to u_1$ of length $2p$, where $u_i \in [n]$ for all $1 \le i \le 2p$ ($u_{2p+1} := u_1$), is identified with $u = (u_1, \dots, u_{2p}) \in [n]^{2p}$. Since $\mathbb{E}[\Pi_{ij}] = 0$ for any $1 \le i, j \le n$, in the sum of (12) we only need to consider cycles in which each edge appears at least twice.

We call the shape of a cycle $u$, denoted $s(u)$, the relabeling of the vertices in the order of their first appearance. For example, the shape of $u = (4, 8, 2, 8, 4, 2, 8, 2)$ is $(1, 2, 3, 2, 1, 3, 2, 3)$. Define
\[
\mathcal{S}_p := \{s(u) : u \text{ is a cycle of length } 2p \text{ in which each edge appears at least twice}\}.
\]
For convenience, we also define the set of cycles with fixed shape and starting point:
\[
\Gamma_{s, v} := \{u \in [n]^{2p} : s(u) = s,\ u_1 = v\}.
\]
The span of a shape $s$, denoted $m(s)$, is the largest index in its representation, which is also the number of distinct vertices visited by any cycle of shape $s$. A direct observation is that $m(s) \le p + 1$ for any $s \in \mathcal{S}_p$.

Proof of Proposition 2.1. Following the expansion (12), we have
\[
\begin{aligned}
\mathbb{E}\,\mathrm{Tr}\big[(A^{(k,\pi)})^{2p}\big] &= \sum_{s \in \mathcal{S}_p} \sum_{u_1 \in [n]} \sum_{u \in \Gamma_{s, u_1}} \prod_{j=1}^{2p} A_{u_j u_{j+1}}\, \mathbb{E}\,\mathrm{Tr}\Bigg[\prod_{j=1}^{2p} \Pi_{u_j u_{j+1}}\Bigg] \\
&\le k \sum_{s \in \mathcal{S}_p} \sum_{u_1 \in [n]} \sum_{u \in \Gamma_{s, u_1}} \prod_{j=1}^{2p} |A_{u_j u_{j+1}}| \\
&\le kn \sum_{s \in \mathcal{S}_p} \sigma^{2(m(s)-1)},
\end{aligned} \tag{13}
\]
where the first inequality follows from
\[
\Bigg|\mathrm{Tr}\Bigg[\prod_{j=1}^{2p} \Pi_{u_j u_{j+1}}\Bigg]\Bigg| \le k\,\Bigg\|\prod_{j=1}^{2p} \Pi_{u_j u_{j+1}}\Bigg\| \le k,
\]
since each $\|\Pi_{ij}\| \le 1$; and the second inequality owes to the fact that, under $\sigma_* \le 1$, for any $u_1 \in [n]$ and $s \in \mathcal{S}_p$, Lemma 2.5 in [BvH16] gives
\[
\sum_{u \in \Gamma_{s, u_1}} \prod_{j=1}^{2p} |A_{u_j u_{j+1}}| \le \sigma^{2(m(s)-1)}.
\]
Meanwhile, for any positive integer $r > 2p$, for the auxiliary random matrix $Y_r$ we have
\[
\begin{aligned}
\mathbb{E}\,\mathrm{Tr}\big[Y_r^{2p}\big] &= \sum_{u \in [r]^{2p}} \mathbb{E}\Bigg[\prod_{j=1}^{2p} Y_{u_j u_{j+1}}\Bigg] \ge \sum_{s \in \mathcal{S}_p} \big|\{u \in [r]^{2p} : s(u) = s\}\big| \cdot \mathbb{E}\Bigg[\prod_{j=1}^{2p} Y_{u_j u_{j+1}}\Bigg] \\
&\ge \sum_{s \in \mathcal{S}_p} r(r-1)\cdots(r-m(s)+1),
\end{aligned}
\]
where in the first line the expectation is taken over any representative cycle $u$ of shape $s$; the first inequality holds because every summand is nonnegative (by independence, each expectation factors over distinct edges into moments $\mathbb{E}[Y_{ij}^m] \ge 0$), and the second uses $\mathbb{E}[Y_{ij}^m] \ge 1$ for $m \ge 2$. Now choosing $r = \lceil \sigma^2 \rceil + 2p$ and noting that $m(s) \le p + 1$ for all $s \in \mathcal{S}_p$, so that $r - m(s) + 1 \ge \lceil \sigma^2 \rceil + p \ge \sigma^2$, we have
\[
\mathbb{E}\,\mathrm{Tr}\big[Y_{\lceil \sigma^2 \rceil + 2p}^{2p}\big] \ge (\lceil \sigma^2 \rceil + 2p) \sum_{s \in \mathcal{S}_p} \sigma^{2(m(s)-1)}.
\]
(14)

Comparing (13) with (14) yields the result. ∎

The following lemma gives an upper bound on the moments of the auxiliary random matrix $Y_r$.

Lemma 2.2. For $Y_r$ defined in Proposition 2.1, there exists an absolute constant $C$, such that for any positive integer $p \ge 1$ there holds
\[
\mathbb{E}\,\mathrm{Tr}\big[Y_r^{2p}\big] \le r\,(2\sqrt{r} + C\sqrt{p})^{2p}. \tag{15}
\]

Proof. Since $\mathbb{E}\,\mathrm{Tr}[Y_r^{2p}] \le r\,\mathbb{E}[\|Y_r\|^{2p}]$, we only need to show that there exists an absolute constant $C$, such that for $p \ge 1$,
\[
\mathbb{E}\big[\|Y_r\|^{2p}\big] \le (2\sqrt{r} + C\sqrt{p})^{2p}. \tag{16}
\]
The proof of (16) is contained in the proof of Theorem 4.8 in [LvHY18], so we do not repeat it here. The main steps of the proof are a norm bound for Wigner matrices with non-symmetrically distributed entries, followed by Talagrand's concentration inequality. ∎

Proof of Theorem 1.3. By homogeneity we may rescale $A$ by $\sigma_*$, so it suffices to treat the case $\sigma_* \le 1$. By Proposition 2.1 and Lemma 2.2, we know that for any positive integer $p \ge 1$,
\[
\mathbb{E}\big\|A^{(k,\pi)}\big\| \le \Big(\mathbb{E}\,\mathrm{Tr}\big[(A^{(k,\pi)})^{2p}\big]\Big)^{1/2p} \le \left(\frac{kn}{\lceil \sigma^2 \rceil + 2p}\,\mathbb{E}\,\mathrm{Tr}\big[Y_{\lceil \sigma^2 \rceil + 2p}^{2p}\big]\right)^{1/2p} \le (kn)^{1/2p}\Big(2\sqrt{\lceil \sigma^2 \rceil + 2p} + C\sqrt{p}\Big).
\]
If $kn \ge 3$, then for $\alpha \ge 1$ and $p = \lceil \alpha \log(kn) \rceil \ge 1$,
\[
\begin{aligned}
\mathbb{E}\big\|A^{(k,\pi)}\big\| &\le e^{1/(2\alpha)}\Big(2\sqrt{\lceil \sigma^2 \rceil + 2\lceil \alpha \log(kn) \rceil} + C\sqrt{\lceil \alpha \log(kn) \rceil}\Big) \\
&\le e^{1/(2\alpha)}\Big(2\sigma + 2\sqrt{2\alpha \log(kn) + 3} + C\sqrt{\alpha \log(kn) + 1}\Big).
\end{aligned}
\]
Denote $e^{1/(2\alpha)} = 1 + \epsilon$, i.e. $\alpha = \frac{1}{2\log(1+\epsilon)}$. Since $\epsilon \le 1/2$ we have $\alpha \ge 1$, and since $kn \ge 3$ this gives $\alpha \log(kn) \ge 1$; hence $2\alpha\log(kn) + 3 \le 5\alpha\log(kn)$ and $\alpha\log(kn) + 1 \le 2\alpha\log(kn)$, so
\[
\mathbb{E}\big\|A^{(k,\pi)}\big\| \le 2(1+\epsilon)\sigma + (1+\epsilon)\big(2\sqrt{5} + \sqrt{2}\,C\big)\sqrt{\frac{\log(kn)}{2\log(1+\epsilon)}},
\]
which is (9) after renaming the universal constant. The remaining case $kn < 3$ forces $n = 2$ and $k = 1$. In this case $\pi$ is supported on $[-1, 1]$ and $A$ has zero diagonal, so
\[
\mathbb{E}\big\|A^{(k,\pi)}\big\| \le \mathbb{E}\big\|A^{(k,\pi)}\big\|_F \le \sqrt{2}\,\sigma_* \le C\sigma_*\sqrt{\log(kn)}
\]
for $C$ large enough, since $\log(kn) = \log 2$. ∎

The spectral bound for random $k$-lifts of graphs follows as an immediate corollary.

Proof of Theorem 1.5. Denote by $\mathrm{Perm}(k)$ the collection of all $k \times k$ permutation matrices, and let
\[
\mathcal{G}_k := \Big\{\Pi - \tfrac{1}{k} J_k : \Pi \in \mathrm{Perm}(k)\Big\},
\]
where $J_k$ is the $k \times k$ matrix with all entries 1.
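As a quick numerical check of the centering step (our own sketch), the elements of $\mathcal{G}_k$ indeed have spectral norm at most 1 and average to $0_k$:

```python
import numpy as np

rng = np.random.default_rng(3)
k, trials = 6, 2000
J = np.ones((k, k))

empirical_mean = np.zeros((k, k))
for _ in range(trials):
    Pi = np.eye(k)[rng.permutation(k)]     # uniform k x k permutation matrix
    X = Pi - J / k                         # element of G_k
    assert np.linalg.norm(X, 2) <= 1 + 1e-9
    empirical_mean += X / trials

# E Pi = J_k / k under the uniform measure, so E X = 0_k.
assert np.abs(empirical_mean).max() < 0.1
```

The norm bound is in fact an equality: $\Pi - J_k/k$ vanishes on the all-ones vector and acts as the orthogonal matrix $\Pi$ on its complement.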
It is easy to verify that $\|X\| \le 1$ for every $X \in \mathcal{G}_k$ and that $\mathbb{E}_{\mathrm{Unif}(\mathcal{G}_k)} X = 0_k$. Moreover, the adjacency matrix $A$ has $\sigma \le \sqrt{\Delta}$ and $\sigma_* \le 1$. Hence Theorem 1.3 gives
\[
\mathbb{E}\big\|A^{(k)} - \mathbb{E} A^{(k)}\big\| = \mathbb{E}\big\|A^{(k, \mathrm{Unif}(\mathcal{G}_k))}\big\| \le 2(1+\epsilon)\sqrt{\Delta} + \frac{C}{\sqrt{\log(1+\epsilon)}}\sqrt{\log(kn)}. \qquad \blacksquare
\]

Remark 2.3. In the above proof, we applied Theorem 1.3 with $\pi = \mathrm{Unif}(\mathcal{G}_k)$, where $\mathcal{G}_k$ is the centered version of $\mathrm{Perm}(k)$. One may expect that, under the setting of Theorem 1.3 without assuming $\mathbb{E}_\pi X = 0_k$, there still holds
\[
\mathbb{E}\big\|A^{(k,\pi)} - \mathbb{E} A^{(k,\pi)}\big\| \le C\Big(\sigma + \sigma_* \sqrt{\log(kn)}\Big). \tag{17}
\]
Though we do not have a counterexample to (17), we must point out that (17) only follows from Theorem 1.3 when $\|X - \mathbb{E}_\pi X\| \le 1$ for every $X \in \mathrm{supp}(\pi)$.

We note that the proof of Theorem 1.3 does not exploit any potential structure of the "lifting matrices" $\Pi_{ij}$. In fact, this may explain why Theorem 1.5 is worse than the result in [BC19] by an additive $\sqrt{\log(kn)}$ term in the large-$k$ limit for $d$-regular base graphs. One may be able to obtain a stronger result, for instance $\mathbb{E}\|A^{(k,\pi)} - \mathbb{E} A^{(k,\pi)}\| \le 2\sqrt{\Delta - 1} + o_k(1)$, with a more careful analysis taking into account that the $\Pi_{ij}$ are permutation matrices.

Acknowledgments

We are grateful to Ramon van Handel for comments on an early version of the paper, in particular for pointing us to the latest results on graph $k$-lifts in [BC19], for directing us to the proof of Theorem 4.8 in [LvHY18], which allowed us to improve the constant factor in front of $\sigma$ in Theorem 1.3 to 2, and for making us aware of recent efforts to improve the NCK inequality under more general settings. We would also like to thank Jiedong Jiang, Eyal Lubetzky, Ruedi Suter and Joel Tropp for helpful discussions.

References

[ABG10] Louigi Addario-Berry and Simon Griffiths. The spectrum of random lifts. arXiv preprint arXiv:1012.4097, 2010.

[ACKM17] Naman Agarwal, Karthekeyan Chandrasekaran, Alexandra Kolla, and Vivek Madan. On the expansion of group-based lifts. In Approximation, Randomization, and Combinatorial Optimization.
Algorithms and Techniques (APPROX/RANDOM 2017). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2017.

[AGZ10] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An Introduction to Random Matrices, volume 118. Cambridge University Press, 2010.

[AL02] Alon Amit and Nathan Linial. Random graph coverings I: General theory and graph connectivity. Combinatorica, 22(1):1–18, 2002.

[AL06] Alon Amit and Nathan Linial. Random lifts of graphs: edge expansion. Combinatorics, Probability and Computing, 15(3):317–332, 2006.

[ALM02] Alon Amit, Nathan Linial, and Jiří Matoušek. Random lifts of graphs: independence and chromatic number. Random Structures & Algorithms, 20(1):1–22, 2002.

[Ban15] Afonso S. Bandeira. Ten lectures and forty-two open problems in the mathematics of data science. Lecture Notes, 2015.

[BC19] Charles Bordenave and Benoît Collins. Eigenvalues of random lifts and polynomials of random permutation matrices. Annals of Mathematics, 190(3):811–875, 2019.

[BLM15] Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 1347–1357. IEEE, 2015.

[Bor15] Charles Bordenave. A new proof of Friedman's second eigenvalue theorem and its extension to random lifts. arXiv preprint arXiv:1502.04482, 2015.

[BvH16] Afonso S. Bandeira and Ramon van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44(4):2479–2506, 2016.

[BY88] Zhi-Dong Bai and Yong-Qua Yin. Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix. The Annals of Probability, pages 1729–1741, 1988.

[DS01] Kenneth R. Davidson and Stanislaw J. Szarek. Local operator theory, random matrices and Banach spaces. In Handbook of the Geometry of Banach Spaces, Vol. I, pages 317–366. North-Holland, 2001.

[FK81] Zoltán Füredi and János Komlós. The eigenvalues of random symmetric matrices.
Combinatorica, 1(3):233–241, 1981.

[FK14] Joel Friedman and David-Emmanuel Kohler. The relativized second eigenvalue conjecture of Alon. arXiv preprint arXiv:1403.3462, 2014.

[Fri03] Joel Friedman. Relative expanders or weakly relatively Ramanujan graphs. Duke Mathematical Journal, 118(1):19–35, 2003.

[LP10] Nati Linial and Doron Puder. Word maps and spectra of random graph lifts. Random Structures & Algorithms, 37(1):100–135, 2010.

[LR05] Nathan Linial and Eyal Rozenman. Random lifts of graphs: perfect matchings. Combinatorica, 25(4):407–424, 2005.

[LSV11] Eyal Lubetzky, Benny Sudakov, and Van Vu. Spectra of lifted Ramanujan graphs. Advances in Mathematics, 227(4):1612–1645, 2011.

[LvHY18] Rafał Latała, Ramon van Handel, and Pierre Youssef. The dimension-free structure of nonhomogeneous random matrices. Inventiones Mathematicae, 214(3):1031–1080, 2018.

[Oli09] Roberto Imbuzeiro Oliveira. The spectrum of random k-lifts of large graphs (with possibly large k). arXiv preprint arXiv:0911.4741, 2009.

[Oli10] Roberto Imbuzeiro Oliveira. Sums of random Hermitian matrices and an inequality by Rudelson. Electronic Communications in Probability, 15:203–212, 2010.

[Pis03] Gilles Pisier. Introduction to Operator Space Theory, volume 294. Cambridge University Press, 2003.

[Pud15] Doron Puder. Expansion of random graphs: New proofs, new results. Inventiones Mathematicae, 201(3):845–908, 2015.

[RV10] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), pages 1576–1602. World Scientific, 2010.

[Seg00] Yoav Seginer. The expected norm of random matrices. Combinatorics, Probability and Computing, 9(2):149–166, 2000.

[Tao12] Terence Tao. Topics in Random Matrix Theory, volume 132. American Mathematical Society, 2012.

[Tro12] Joel A. Tropp.
User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

[Tro15] Joel A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1–230, 2015.

[Tro18] Joel A. Tropp. Second-order matrix concentration inequalities. Applied and Computational Harmonic Analysis, 44(3):700–736, 2018.

[Ver10] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.

[vH17] Ramon van Handel. On the spectral norm of Gaussian random matrices. Transactions of the American Mathematical Society, 369(11):8161–8178, 2017.

[Wig58] Eugene P. Wigner. On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67(2):325–327, 1958.