Bounds on the Spectral Sparsification of Symmetric and Off-Diagonal Nonnegative Real Matrices
Sergio Mercado† Marcos Villagra‡

∗ The authors acknowledge the support of CONACyT research grants POSG17-62, PINV15-706 and PINV15-208.
† Email: [email protected]. Affiliation: Facultad Politécnica, Universidad Nacional de Asunción, Campus Universitario, San Lorenzo C.P. 111421, Paraguay. This author is supported by a CONACyT scholarship for graduate studies.
‡ Email: [email protected]. Affiliation: Departamento de Matemáticas, Universidad Nacional de Asunción, Campus Universitario, San Lorenzo C.P. 111421, Paraguay.

Abstract
We say that a square real matrix $M$ is off-diagonal nonnegative if and only if all entries outside its diagonal are nonnegative real numbers. In this note we show that for any off-diagonal nonnegative symmetric matrix $M$, there exists an off-diagonal nonnegative symmetric matrix $\widehat{M}$ which is sparse and close in spectrum to $M$.

Keywords. spectral sparsification, symmetric matrices, nonnegative matrices, spectral graph theory
MSC Class.
1 Introduction

The run-time of many important algorithms in mathematics and computer science depends on how "sparse" the input data is. One such example is Lanczos's algorithm [7], which can be used to compute a set of eigenvalues and eigenvectors of a matrix of size $n$ in $O(n^3)$ arithmetic operations in the worst case. If the average number of non-zero entries per row in an input matrix to Lanczos's algorithm is bounded by a constant, then its running time can be bounded by $O(n^2)$ arithmetic operations.

In this note, we show how to construct sparse matrices from a certain class of symmetric real matrices and present some potential applications with interesting research directions.
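The effect of sparsity on eigenvalue computations is easy to see in practice. Below is a minimal sketch (our illustration, not part of the original note) using SciPy's `eigsh`, a Lanczos-type eigensolver, on a symmetric matrix with a constant average number of non-zeros per row.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000

# A sparse symmetric matrix with roughly 10 non-zero entries per row.
A = sp.random(n, n, density=10 / n, format="csr", random_state=0)
A = (A + A.T) / 2  # symmetrize

# A Lanczos-type iteration (SciPy's eigsh) touches only the non-zero
# entries in each matrix-vector product, so it benefits directly
# from sparsity in the input matrix.
vals = eigsh(A, k=6, which="LA", return_eigenvectors=False)
print("six largest eigenvalues:", np.sort(vals)[::-1])
```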
1.1 Results

The main approach of this work is to borrow some ideas from spectral graph theory in order to construct sparse matrices that are close in spectrum to what we call off-diagonal nonnegative matrices.

Definition 1.
A real square matrix $M$ is off-diagonal nonnegative (or simply an ODN matrix) if for all $i, j = 1, \dots, n$ with $i \neq j$ we have that its entries $(M)_{ij} \geq 0$.

Note that the diagonal elements of an ODN matrix can be any real number and only its off-diagonal elements are nonnegative real numbers.

Let $M$ be a symmetric ODN matrix. Define two matrices $A_M$ and $D_M$ as follows: (i) $(A_M)_{ij} = (M)_{ij}$ if $i \neq j$ and $(A_M)_{ij} = 0$ if $i = j$, and (ii) $(D_M)_{ij} = \sum_{k \leq n,\, k \neq i} (M)_{ik}$ if $j = i$, and $(D_M)_{ij} = 0$ if $j \neq i$. Define a third matrix $L_M = D_M - A_M$. Note that the matrices $A_M$ and $L_M$, as constructed, can be respectively interpreted as the adjacency and Laplacian matrices of some graph.

Let us denote by $\Delta_M$ and $\delta_M$ the largest and smallest element in the diagonal of $M$, respectively. Now we are ready to state our main result.
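For concreteness, the following sketch (ours; the matrix is a made-up example) builds $A_M$, $D_M$ and $L_M$ from a small ODN matrix:

```python
import numpy as np

# A small ODN matrix: off-diagonal entries are nonnegative,
# diagonal entries are arbitrary reals.
M = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])

A_M = M - np.diag(np.diag(M))   # (A_M)_ij = (M)_ij off the diagonal, 0 on it
D_M = np.diag(A_M.sum(axis=1))  # (D_M)_ii = sum of the off-diagonal entries of row i
L_M = D_M - A_M                 # Laplacian of the underlying graph

Delta_M, delta_M = np.diag(M).max(), np.diag(M).min()
print("rho(L_M) =", np.abs(np.linalg.eigvalsh(L_M)).max())
print("Delta_M - delta_M =", Delta_M - delta_M)
```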
Theorem 1. Let $M \in \mathbb{R}^{n \times n}$ be an ODN symmetric matrix with $m$ non-zero off-diagonal entries and a set of eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$ with a respective set of eigenvectors $x_1, x_2, \dots, x_n$. For any $\epsilon$ such that $0 < \epsilon \leq 1/2$, there exists an ODN symmetric matrix $\widehat{M} \in \mathbb{R}^{n \times n}$ with $O(n/\epsilon^2)$ non-zero entries, with a set of eigenvalues $\widehat{\lambda}_1 \geq \cdots \geq \widehat{\lambda}_n$ with a respective set of eigenvectors $\widehat{x}_1, \dots, \widehat{x}_n$, such that
$$|\lambda_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M) + \frac{\Delta_M - \delta_M}{2}.$$
Furthermore, if $\theta_i$ is the angle between the subspaces spanned by the eigenvectors $x_i$ and $\widehat{x}_i$, then
$$\sin\theta_i \leq \frac{\epsilon\sqrt{n}\,\rho(L_M) + (\Delta_M - \delta_M)/2}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}}.$$

Note that if all diagonal entries in $M$ are close in value, then $(\Delta_M - \delta_M)/2 \ll \epsilon\sqrt{n}\,\rho(L_M)$.
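To see what the guarantee of Theorem 1 costs in concrete terms, this sketch (ours; `theorem1_bound` is a hypothetical helper name) evaluates the right-hand side of the eigenvalue bound for a small ODN matrix and a given $\epsilon$:

```python
import numpy as np

def theorem1_bound(M, eps):
    """Right-hand side of Theorem 1's eigenvalue bound:
    eps * sqrt(n) * rho(L_M) + (Delta_M - delta_M) / 2."""
    n = M.shape[0]
    A = M - np.diag(np.diag(M))
    L = np.diag(A.sum(axis=1)) - A
    rho = np.abs(np.linalg.eigvalsh(L)).max()
    diag = np.diag(M)
    return eps * np.sqrt(n) * rho + (diag.max() - diag.min()) / 2

M = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])
print(theorem1_bound(M, eps=0.25))
```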
Theorem 1 is comparable to a result of Zouzias [9]: given $0 < \epsilon < 1$, for any self-adjoint matrix $A$ of size $n$ that is also $\theta$-symmetric diagonally dominant, there exists a matrix $\widehat{A}$ with at most $O(n\theta \log n/\epsilon^2)$ non-zero entries such that $\|A - \widehat{A}\| \leq \epsilon\|A\|$. Informally, a matrix $A$ is $\theta$-symmetric diagonally dominant if $\|A\|_\infty = O(\sqrt{\theta})$. Therefore, the approximation factor in the result of Zouzias [9] has a dependency on the entry with the largest absolute value in the matrix $A$. Theorem 1 eliminates that dependency at the expense of further restrictions on the matrix we want to approximate. The "$\log n$" factor can be dropped using new spectral sparsification techniques like that of Lee and Sun [4].

1.2 Some Applications of Theorem 1

Let $A \in \mathbb{R}^{n \times n}$ be an ODN matrix. We say that a quadratic form $Q(x) = x^T A x$ is ODN if its matrix $A$ is ODN. Every quadratic form has a diagonal form $Q(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$, where each $\lambda_i$ is an eigenvalue of $A$. Furthermore, Sylvester's Law of Inertia tells us that the number of positive and negative coefficients in any diagonal form of $Q$ is an invariant of $Q$.

Let $\widehat{Q}(x) = x^T \widehat{A} x$, where $\widehat{A}$ is a matrix obtained from $A$ by means of Theorem 1. If $\widehat{Q}(x) = \widehat{\lambda}_1 x_1^2 + \cdots + \widehat{\lambda}_n x_n^2$, where $\widehat{\lambda}_i$ is an eigenvalue of $\widehat{A}$, we have that $|Q(x) - \widehat{Q}(x)| \leq \epsilon\sqrt{n}\,\rho(L_A) + (\Delta_A - \delta_A)/2$ for every unit vector $x$ and $\epsilon$ sufficiently small. Thus, if we are interested in the optimization of $Q(x)$, we can use $\widehat{A}$ as the input of a quadratic optimization solver instead of $A$, and it will result in a solution with the guarantees mentioned in the previous sentence. State-of-the-art quadratic optimization solvers can exploit the sparsity of an input matrix, which is especially important in the case of non-convex problems.

Optimization of quadratic forms is an NP-hard optimization problem, even with binary variables. In fact, a single negative eigenvalue suffices to make the problem NP-hard [6]. It is also closely related to the optimization of the Ising model in statistical mechanics. We do not know, however, whether the optimization of ODN quadratic forms is NP-hard, and we leave this as an open problem.
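The diagonal form and Sylvester's Law of Inertia are easy to verify numerically; a minimal sketch (ours, with a made-up matrix $A$):

```python
import numpy as np

A = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])
lam, V = np.linalg.eigh(A)           # A = V diag(lam) V^T

x = np.array([1.0, -2.0, 0.5])
y = V.T @ x                          # change of variables to the eigenbasis
Q_direct = x @ A @ x
Q_diagonal = np.sum(lam * y**2)      # diagonal form: sum_i lam_i * y_i^2
print(np.isclose(Q_direct, Q_diagonal))   # True

# Sylvester's Law of Inertia: the eigenvalue signs give the number of
# positive and negative coefficients in any diagonal form of Q.
print("inertia:", (int(np.sum(lam > 0)), int(np.sum(lam < 0))))
```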
Principal Component Analysis, or PCA, is a method to reduce the dimensionality of data [3]. Given a set of data with correlated attributes, PCA reduces the number of attributes while preserving, as best as possible, the variance of the data. PCA constructs a new set of non-correlated variables known as principal components.

Let $x$ be a vector of $n$ random variables. The first principal component is defined as $z_1 = v_1^T x$, where $v_1 \in \mathbb{R}^n$ is a vector that maximizes the variance of $z_1$, denoted $Var(z_1)$. The $i$-th principal component is the variable $z_i = v_i^T x$, with $v_i \in \mathbb{R}^n$, such that $z_1, z_2, \dots, z_i$ are pairwise uncorrelated and $Var(z_i)$ is maximum. It is known that if $S$ is the covariance matrix of $x$, with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$, then $v_i$ is the eigenvector of $S$ corresponding to $\lambda_i$ and $Var(z_i) = \lambda_i$.

Another approach to PCA is to use a correlation matrix $M$ instead of the covariance matrix $S$. The matrix $M$ is the covariance matrix of a vector $x$ that is normalized by subtracting its mean from each entry and then dividing by its standard deviation. If $X$ is a data matrix of size $m \times n$, with no loss of generality, suppose that $M = (1/n) X^T X$. Suppose further that $M$ is ODN. By Theorem 1 we can obtain a sparse matrix $\widehat{M}$ that is close in spectrum to $M$. Let $z_i$ and $\widehat{z}_i$ be the $i$-th principal components of $M$ and $\widehat{M}$, respectively. Since all diagonal entries of a correlation matrix are equal, the term $(\Delta_M - \delta_M)/2$ from Theorem 1 vanishes here. Thus, for the first principal component we have
$$Var(z_1) - \epsilon\sqrt{n}\,\rho(L_M) \leq Var(\widehat{z}_1) \leq Var(z_1) + \epsilon\sqrt{n}\,\rho(L_M).$$
In general, for the first $p$ principal components we have that
$$\sum_{i=1}^{p} Var(z_i) - p\,\epsilon\sqrt{n}\,\rho(L_M) \leq \sum_{i=1}^{p} Var(\widehat{z}_i) \leq \sum_{i=1}^{p} Var(z_i) + p\,\epsilon\sqrt{n}\,\rho(L_M).$$
This loss in precision is compensated by a gain in speed in the computation of eigenvalues, which is a crucial step in PCA. For example, if we use Lanczos's algorithm [7], which is sensitive to the sparsity of its input matrix, we can compute eigenvalues and an orthonormal set of eigenvectors using $O(n^2/\epsilon^2)$ arithmetic operations, if we assume $O(1/\epsilon^2)$ non-zero entries per row on average.
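The following sketch (ours; random data) carries out this version of PCA and reads the variances of the principal components off the eigenvalues of $M$. Note that random data need not produce an ODN matrix, so the ODN hypothesis of Theorem 1 is an extra assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 500, 40, 5

# Random data; real use of Theorem 1 additionally requires the
# correlation-style matrix M to be ODN, which random data need not satisfy.
X = rng.normal(size=(m, n))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize each attribute
M = (X.T @ X) / n                          # as in the text, M = (1/n) X^T X

# v_i is the eigenvector of M for lambda_i, and Var(z_i) = lambda_i.
lam, V = np.linalg.eigh(M)
lam, V = lam[::-1], V[:, ::-1]             # descending order, as in the text
print("variances of the first", p, "principal components:", lam[:p])
```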
The remainder of this note is organized as follows. Section 2 presents the notation on linear algebra used throughout this work and briefly reviews the main concepts and techniques of spectral sparsification of graphs. Section 3 presents some technical lemmas that are necessary for our proof of Theorem 1. Finally, Section 4 presents a full proof of Theorem 1.

2 Preliminaries

In this section we introduce the notation used throughout this work and present some basic facts from linear algebra and spectral sparsification of graphs. In the entirety of this work we use $\mathbb{R}$ to denote the set of real numbers and $\mathbb{R}^+$ to denote the set of positive real numbers.

Let $M$ be a real matrix. We use $(M)_{ij}$ to denote the element in the $i$-th row and $j$-th column of $M$. Let $\|M\|$, $\|M\|_\infty$ and $\|M\|_1$ denote the 2-norm, the $\infty$-norm and the 1-norm, respectively. The spectral radius of $M$ is denoted $\rho(M)$. Recall that if $M$ is symmetric, then $\|M\| = \rho(M)$.

The following theorem characterizes the eigenvalues of real symmetric matrices.

Theorem (Courant-Fischer). Let $M \in \mathbb{R}^{n \times n}$ be a symmetric matrix with eigenvalues $\alpha_1 \geq \cdots \geq \alpha_n$. Then
$$\alpha_k = \max_{\substack{S \subset \mathbb{R}^n \\ \dim(S) = k}}\; \min_{\substack{x \in S \\ x \neq 0}} \frac{x^T M x}{x^T x} = \min_{\substack{T \subset \mathbb{R}^n \\ \dim(T) = n - k + 1}}\; \max_{\substack{x \in T \\ x \neq 0}} \frac{x^T M x}{x^T x},$$
where the maximization and minimization are over subspaces $S$ and $T$ of $\mathbb{R}^n$.

The following theorem due to Davis and Kahan [2] is important in perturbation theory. The simplified version of the theorem we present here is by Yu, Wang and Samworth [5].

Theorem (Davis-Kahan [2]). Let $A$ and $B$ be symmetric matrices, and $R = A - B$. Let $\alpha_1 \geq \cdots \geq \alpha_n$ be the eigenvalues of $A$ with corresponding eigenvectors $a_1, a_2, \dots, a_n$, and let $\beta_1 \geq \cdots \geq \beta_n$ be the eigenvalues of $B$ with corresponding eigenvectors $b_1, b_2, \dots, b_n$. Let $\theta_i$ be the angle between $a_i$ and $b_i$. Then
$$\sin\theta_i \leq \frac{\|R\|}{\min\{|\beta_{i-1} - \alpha_i|,\, |\beta_{i+1} - \alpha_i|\}}.$$
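A numerical illustration (ours) of the Davis-Kahan bound for a randomly perturbed symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
R = 1e-3 * rng.normal(size=(n, n)); R = (R + R.T) / 2
B = A + R   # the theorem is stated with R = A - B; the norm is the same

alpha, a_vecs = np.linalg.eigh(A)
beta,  b_vecs = np.linalg.eigh(B)
alpha, a_vecs = alpha[::-1], a_vecs[:, ::-1]   # descending, as in the text
beta,  b_vecs = beta[::-1],  b_vecs[:, ::-1]
normR = np.linalg.norm(R, 2)

for i in range(n):
    cos = abs(a_vecs[:, i] @ b_vecs[:, i])     # abs() removes the sign ambiguity
    sin = np.sqrt(max(0.0, 1.0 - cos**2))
    up   = abs(beta[i - 1] - alpha[i]) if i > 0     else np.inf
    down = abs(beta[i + 1] - alpha[i]) if i < n - 1 else np.inf
    print(sin <= normR / min(up, down))        # True for every i
```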
The last theorem of this section gives a bound between the eigenvalues of two symmetric matrices $A$ and $B$ in terms of the norm $\|A - B\|$; see the textbook of Bhatia [10], page 63, for a proof.

Theorem (Weyl's Perturbation Theorem). Let $A, B \in \mathbb{R}^{n \times n}$ be symmetric matrices such that $\alpha_1, \alpha_2, \dots, \alpha_n$ and $\beta_1, \beta_2, \dots, \beta_n$ are the eigenvalues of $A$ and $B$, respectively. Then, for all $i = 1, \dots, n$,
$$|\alpha_i - \beta_i| \leq \|A - B\|.$$

Let $G = (V, E, w)$ be a simple undirected graph with vertices $v_1, v_2, \dots, v_n \in V$. An edge in $E$ between vertices $v_i$ and $v_j$ is denoted $ij$. Each edge $ij$ has a weight assigned to it according to a weight function $w: E \to \mathbb{R}^+ \cup \{0\}$. As a short-hand we use $w_{ij} = w(ij)$. The adjacency matrix $A_G$ of $G$ is defined as $(A_G)_{ij} = w_{ij}$ if $i \neq j$, and $(A_G)_{ij} = 0$ if $i = j$. We also define the degree matrix $D_G$ of $G$ as $(D_G)_{ij} = \sum_{k=1}^{n} w_{ik}$ if $j = i$, and $(D_G)_{ij} = 0$ otherwise. The Laplacian matrix $L_G$ of $G$ is defined as $L_G = D_G - A_G$.

Spectral sparsification is a method introduced by Spielman and Teng [1] that is used to construct a sparse graph $\widehat{G}$ from any given graph $G$ such that the spectra of their Laplacian matrices are "close." For any $\epsilon \in (0, 1)$, we say that $\widehat{G}$ is an $\epsilon$-spectral sparsifier of $G$ if for every $x \in \mathbb{R}^n$ it holds that
$$(1 - \epsilon)\, x^T L_G\, x \leq x^T L_{\widehat{G}}\, x \leq (1 + \epsilon)\, x^T L_G\, x. \qquad (1)$$
It is clear that if the eigenvalues of $L_{\widehat{G}}$ are $\widehat{\mu}_1 \geq \cdots \geq \widehat{\mu}_n$, then by Eq. (1) and the Courant-Fischer theorem $(1 - \epsilon)\mu_i \leq \widehat{\mu}_i \leq (1 + \epsilon)\mu_i$ for all $i = 1, \dots, n$.

For any two square matrices $A, B$ we write $A \preceq B$ whenever $B - A$ is positive semidefinite. Thus, we can succinctly write Eq. (1) as
$$(1 - \epsilon) L_G \preceq L_{\widehat{G}} \preceq (1 + \epsilon) L_G. \qquad (2)$$

Spielman and Teng [1] proved that every graph with positive weights has an $\epsilon$-spectral sparsifier whose size is close to linear in the number of vertices. The following theorem is currently the best construction of spectral sparsifiers.

Theorem (Lee-Sun [4]). Given any integer $q \geq 10$ and $0 < \epsilon \leq 1/2$, let $G = (V, E, w)$ be an undirected and weighted graph with $n$ vertices and $m$ edges. Then there exists a $(1 + \epsilon)$-spectral sparsifier of $G$ with $O(qn/\epsilon^2)$ edges.
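Condition (1) can be checked numerically. The sketch below (ours; `best_epsilon` is a hypothetical helper) computes the smallest $\epsilon$ for which Eq. (1) holds for two given Laplacians, assuming $G$ is connected, via a generalized eigenproblem restricted to the orthogonal complement of the all-ones vector:

```python
import numpy as np
from scipy.linalg import eigh, null_space

def best_epsilon(L_G, L_Ghat):
    """Smallest eps with (1-eps) x^T L_G x <= x^T L_Ghat x <= (1+eps) x^T L_G x.
    Assumes G is connected and both matrices are graph Laplacians, so their
    common kernel is span{1} and it suffices to test x orthogonal to 1."""
    n = L_G.shape[0]
    ones = np.ones((n, 1)) / np.sqrt(n)
    Q = null_space(ones.T)                 # orthonormal basis of the complement of 1
    gamma = eigh(Q.T @ L_Ghat @ Q, Q.T @ L_G @ Q, eigvals_only=True)
    return np.abs(gamma - 1).max()

# Example: a path on 3 vertices against the same graph scaled by 1.1.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
print(best_epsilon(L, 1.1 * L))            # approximately 0.1
```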
3 Technical Lemmas

In this section we show some technical lemmas that will help us in proving Theorem 1.
Lemma 1.
Let $G$ be a graph on $n$ vertices and let $\widehat{G}$ be an $\epsilon$-spectral sparsifier of $G$. Let $\mu_1, \mu_2, \dots, \mu_n$ be the eigenvalues of $L_G$ with respective eigenvectors $x_1, x_2, \dots, x_n$ and let $\widehat{\mu}_1, \widehat{\mu}_2, \dots, \widehat{\mu}_n$ be the eigenvalues of $L_{\widehat{G}}$ with respective eigenvectors $\widehat{x}_1, \widehat{x}_2, \dots, \widehat{x}_n$. Then

1. $\|L_G - L_{\widehat{G}}\| \leq \epsilon\,\rho(L_G)$, and

2. if $\theta_i$ is the angle between $x_i$ and $\widehat{x}_i$, then
$$\sin\theta_i \leq \frac{\epsilon\,\rho(L_G)}{\min\{|\mu_i - \widehat{\mu}_{i-1}|,\, |\mu_i - \widehat{\mu}_{i+1}|\}},$$
where we assume $|\mu_i - \widehat{\mu}_{i \pm 1}| \neq 0$ for all $i = 1, \dots, n$.

Proof. Since $\widehat{G}$ is an $\epsilon$-spectral sparsifier of $G$ we have that
$$(1 - \epsilon) L_G \preceq L_{\widehat{G}} \preceq (1 + \epsilon) L_G,$$
which implies
$$L_G - L_{\widehat{G}} \preceq \epsilon L_G. \qquad (3)$$
Note that $\delta_i$ is an eigenvalue of $L_G - L_{\widehat{G}}$ if and only if $-\delta_i$ is an eigenvalue of $L_{\widehat{G}} - L_G$. With no loss of generality suppose that $\rho(L_G - L_{\widehat{G}})$ coincides with the largest eigenvalue of $L_G - L_{\widehat{G}}$, and let $z$ be a normalized eigenvector associated to $\rho(L_G - L_{\widehat{G}})$. Then
$$\|L_G - L_{\widehat{G}}\| = \rho(L_G - L_{\widehat{G}}) \qquad (L_G - L_{\widehat{G}} \text{ is symmetric})$$
$$= z^T (L_G - L_{\widehat{G}})\, z$$
$$\leq z^T (\epsilon L_G)\, z \qquad (\text{from Eq. (3)})$$
$$\leq \rho(\epsilon L_G) = \epsilon\,\|L_G\| = \epsilon\,\rho(L_G).$$
The second part of the lemma is implied by the Davis-Kahan theorem and the first part of this lemma, thus completing the proof.
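Part 1 of Lemma 1 can be sanity-checked with a synthetic sparsifier: rescaling each eigenvalue of $L_G$ by a factor in $[1 - \epsilon, 1 + \epsilon]$ keeps the eigenvectors fixed, so the sandwich (2) holds by construction. A sketch (ours; the rescaled matrix is dense, so this only exercises the norm bound, not sparsity):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.2

# Laplacian of a cycle on n vertices.
n = 8
P = np.roll(np.eye(n), 1, axis=1)
L_G = 2 * np.eye(n) - P - P.T

# Rescale each eigenvalue by a factor in [1-eps, 1+eps]; since the
# eigenbasis is unchanged, (1-eps) L_G <= L_Ghat <= (1+eps) L_G holds.
mu, V = np.linalg.eigh(L_G)
eta = rng.uniform(-eps, eps, size=n)
L_Ghat = V @ np.diag((1 + eta) * mu) @ V.T

lhs = np.linalg.norm(L_G - L_Ghat, 2)
rhs = eps * np.abs(mu).max()            # eps * rho(L_G)
print(lhs <= rhs + 1e-12)               # True
```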
Lemma 2.
Let $L_G = D_G - A_G$ and $L_H = D_H - A_H$ be the Laplacian matrices of graphs $G$ and $H$, respectively. Then
$$\|A_G - A_H\| \leq \sqrt{n}\, \|L_G - L_H\|.$$

Proof. The matrix $A_G - A_H$ is symmetric, and hence $\|A_G - A_H\|_\infty = \|A_G - A_H\|_1$. Then, using the inequality $\|A\|^2 \leq \|A\|_1 \cdot \|A\|_\infty$, valid for any matrix $A$ [8, Th. 2.11-5], we have that $\|A_G - A_H\| \leq \|A_G - A_H\|_\infty$. On the other hand,
$$\|L_G - L_H\|_\infty = \|A_G - A_H\|_\infty + \max_{i \leq n} \big| (D_G)_{ii} - (D_H)_{ii} \big|,$$
and then
$$\|A_G - A_H\|_\infty \leq \|L_G - L_H\|_\infty \leq \sqrt{n}\, \|L_G - L_H\|,$$
and the lemma thus follows.
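A quick numerical check of Lemma 2 (ours; random weighted graphs):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def random_adjacency(n):
    W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), k=1)
    W *= rng.random(size=(n, n)) < 0.5        # drop about half of the edges
    return W + W.T

A_G, A_H = random_adjacency(n), random_adjacency(n)
L_G, L_H = laplacian(A_G), laplacian(A_H)

lhs = np.linalg.norm(A_G - A_H, 2)
rhs = np.sqrt(n) * np.linalg.norm(L_G - L_H, 2)
print(lhs <= rhs)   # True, as Lemma 2 guarantees
```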
4 Proof of Theorem 1

Let $M$ be an ODN symmetric matrix and let $\overline{M}$ be another matrix defined as
$$(\overline{M})_{ij} = \begin{cases} (M)_{ij} & \text{if } i \neq j \\ d & \text{if } i = j, \end{cases}$$
where $d = (\Delta_M + \delta_M)/2$. Then we have that $\|M - \overline{M}\| = \max\{\Delta_M - d,\, d - \delta_M\} = (\Delta_M - \delta_M)/2$,
and thus the chosen $d$ is the value that minimizes the norm $\|M - \overline{M}\|$.

Now let $\lambda_1 \geq \cdots \geq \lambda_n$ and $\overline{\lambda}_1 \geq \cdots \geq \overline{\lambda}_n$ be the eigenvalues of $M$ and $\overline{M}$, respectively. By our definition of $\overline{M}$ and Weyl's Perturbation Theorem we have that
$$|\lambda_i - \overline{\lambda}_i| \leq \frac{\Delta_M - \delta_M}{2}. \qquad (4)$$

Recall that if we define two matrices $A_M$ and $D_M$, where $(A_M)_{ij} = (M)_{ij}$ if $i \neq j$ and $(A_M)_{ij} = 0$ for $i = j$, and $(D_M)_{ij} = \sum_{k \leq n,\, k \neq i} (M)_{ik}$ if $j = i$ and $(D_M)_{ij} = 0$ if $j \neq i$, then we can see $A_M$ and $D_M$ as the adjacency and degree matrices of some graph $G_M$. Consequently we have a Laplacian matrix $L_M = D_M - A_M$. Thus, notice that $L_M = L_{\overline{M}}$, where analogously we define $L_{\overline{M}} = D_{\overline{M}} - A_{\overline{M}}$.

By the Lee-Sun theorem we know that given $\epsilon$ with $0 < \epsilon \leq 1/2$, there exists a matrix $\widehat{L}_M$ with $O(n/\epsilon^2)$ non-zero entries that is an $\epsilon$-spectral sparsifier of $L_M$. Then by Lemma 1 we have that
$$\|L_M - \widehat{L}_M\| \leq \epsilon\,\rho(L_M),$$
and hence, by Lemma 2, it follows that
$$\|A_M - \widehat{A}_M\| \leq \epsilon\sqrt{n}\,\rho(L_M).$$
If we let $\widehat{M} = \widehat{A}_M + dI$, then from the last inequality we have
$$\|\overline{M} - \widehat{M}\| \leq \epsilon\sqrt{n}\,\rho(L_M).$$
Thus, if $\widehat{\lambda}_1 \geq \widehat{\lambda}_2 \geq \cdots \geq \widehat{\lambda}_n$ are the eigenvalues of $\widehat{M}$, by Weyl's Perturbation Theorem we can see that
$$|\overline{\lambda}_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M). \qquad (5)$$
Combining Eqs. (4) and (5),
$$|\lambda_i - \widehat{\lambda}_i| = |\lambda_i - \overline{\lambda}_i + \overline{\lambda}_i - \widehat{\lambda}_i| \leq |\lambda_i - \overline{\lambda}_i| + |\overline{\lambda}_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M) + \frac{\Delta_M - \delta_M}{2}.$$
The fact that $\widehat{M}$ has $O(n/\epsilon^2)$ non-zero elements follows directly from the Lee-Sun theorem. Finally, the last part of the theorem is obtained by means of the Davis-Kahan theorem:
$$\sin\theta_i \leq \frac{\|M - \widehat{M}\|}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}} \leq \frac{\epsilon\sqrt{n}\,\rho(L_M) + (\Delta_M - \delta_M)/2}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}}.$$
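The proof is constructive, and the construction can be put end to end. The sketch below (ours) follows the proof but substitutes a simple effective-resistance sampling scheme for the Lee-Sun sparsifier, so it only yields an $\epsilon$-spectral sparsifier with high probability rather than deterministically; all helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(5)

def sparsify_by_effective_resistance(A, q):
    """A stand-in for the Lee-Sun sparsifier: sample q edges with
    probability proportional to w_e * R_e (effective resistance),
    reweighting so the Laplacian is preserved in expectation."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    i, j = np.triu_indices(n, k=1)
    w = A[i, j]
    mask = w > 0
    i, j, w = i[mask], j[mask], w[mask]
    R = Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]     # effective resistances
    p = w * R
    p = p / p.sum()
    A_hat = np.zeros_like(A)
    for e in rng.choice(len(w), size=q, p=p):
        A_hat[i[e], j[e]] += w[e] / (q * p[e])
    return A_hat + A_hat.T

# A made-up ODN matrix: nonnegative off-diagonal, arbitrary real diagonal.
n = 12
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), k=1)
M = W + W.T + np.diag(rng.uniform(-1.0, 1.0, size=n))

d = (np.diag(M).max() + np.diag(M).min()) / 2   # d = (Delta_M + delta_M)/2
A_M = M - np.diag(np.diag(M))
A_hat = sparsify_by_effective_resistance(A_M, q=300)
M_hat = A_hat + d * np.eye(n)                   # the matrix of Theorem 1

lam = np.linalg.eigvalsh(M)[::-1]
lam_hat = np.linalg.eigvalsh(M_hat)[::-1]
print("max |lambda_i - lambda_hat_i| =", np.abs(lam - lam_hat).max())
```

For large $n$ one would take $q$ much smaller than the number of edges, which is where the sparsity of $\widehat{M}$ comes from.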
References

[1] Spielman, D.A. & Teng, S.H. (2011) Spectral sparsification of graphs.
SIAM Journal on Computing, 40(4): 981-1025. https://doi.org/10.1137/08074489X

[2] Davis, C. & Kahan, W.M. (1970) The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1): 1-46. https://doi.org/10.1137/0707001

[3] Jolliffe, I.T. (2002)
Principal Component Analysis, 2nd Edition. Springer. https://doi.org/10.1007/978-1-4757-1904-8_1

[4] Lee, Y.T. & Sun, H. (2018) Constructing linear-sized spectral sparsification in almost-linear time.
SIAM Journal on Computing, 47(6): 2315-2336. https://doi.org/10.1137/16M1061850

[5] Yu, Y., Wang, T. & Samworth, R.J. (2015) A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102(2): 315-323. https://doi.org/10.1093/biomet/asv008

[6] Pardalos, P. & Vavasis, S.A. (1991) Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1: 15-22. https://doi.org/10.1007/BF00120662

[7] Lanczos, C. (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4): 255-282.

[8] Stewart, G. & Sun, J.G. (1990) Matrix Perturbation Theory. Academic Press, San Diego.

[9] Zouzias, A. (2012) A matrix hyperbolic cosine algorithm and applications. In Proceedings of the 39th International Colloquium on Automata, Languages, and Programming (ICALP), pp. 846-858. https://doi.org/10.1007/978-3-642-31594-7_71

[10] Bhatia, R. (1997) Matrix Analysis. Springer, New York.