Bounds on the Spectral Sparsification of Symmetric and Off-Diagonal Nonnegative Real Matrices
Sergio Mercado† Marcos Villagra‡

∗ The authors acknowledge the support of CONACyT research grants POSG17-62, PINV15-706 and PINV15-208.
† Email: [email protected]. Affiliation: Facultad Politécnica, Universidad Nacional de Asunción, Campus Universitario, San Lorenzo C.P. 111421, Paraguay. This author is supported by a CONACyT scholarship for graduate studies.
‡ Email: [email protected]. Affiliation: Departamento de Matemáticas, Universidad Nacional de Asunción, Campus Universitario, San Lorenzo C.P. 111421, Paraguay.

Abstract
We say that a square real matrix $M$ is off-diagonal nonnegative if and only if all entries outside its diagonal are nonnegative real numbers. In this note we show that for any off-diagonal nonnegative symmetric matrix $M$, there exists an off-diagonal nonnegative symmetric matrix $\widehat{M}$ which is sparse and close in spectrum to $M$.

Keywords. spectral sparsification, symmetric matrices, nonnegative matrices, spectral graph theory
MSC Class.
1 Introduction

The run-time of many important algorithms in mathematics and computer science depends on how "sparse" the input data is. One such example is Lanczos's algorithm [7], which can be used to compute a set of eigenvalues and eigenvectors of a matrix of size $n$ in $O(n^3)$ arithmetic operations in the worst case. If the average number of non-zero entries per row in an input matrix to Lanczos's algorithm is bounded by a constant, then its running time can be bounded by $O(n^2)$ arithmetic operations.

In this note, we show how to construct sparse matrices from a certain class of symmetric real matrices and present some potential applications with interesting research directions.
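The effect of sparsity on eigenvalue computations is easy to see in practice. Below is a minimal sketch (our illustration, not part of the original note) using SciPy's `eigsh`, a Lanczos-type eigensolver, on a symmetric matrix with a constant average number of non-zeros per row.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000

# A sparse symmetric matrix with roughly 10 non-zero entries per row.
A = sp.random(n, n, density=10 / n, format="csr", random_state=0)
A = (A + A.T) / 2  # symmetrize

# A Lanczos-type iteration (SciPy's eigsh) touches only the non-zero
# entries in each matrix-vector product, so it benefits directly
# from sparsity in the input matrix.
vals = eigsh(A, k=6, which="LA", return_eigenvectors=False)
print("six largest eigenvalues:", np.sort(vals)[::-1])
```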
1.1 Results

The main approach of this work is to borrow some ideas from spectral graph theory in order to construct sparse matrices that are close in spectrum to what we call off-diagonal nonnegative matrices.

Definition 1.
A real square matrix $M$ is off-diagonal nonnegative (or simply an ODN matrix) if for all $i, j = 1, \dots, n$ with $i \neq j$ we have that its entries $(M)_{ij} \geq 0$.

Note that the diagonal elements of an ODN matrix can be any real number and only its off-diagonal elements are nonnegative real numbers.

Let $M$ be a symmetric ODN matrix. Define two matrices $A_M$ and $D_M$ as follows: (i) $(A_M)_{ij} = (M)_{ij}$ if $i \neq j$ and $(A_M)_{ij} = 0$ if $i = j$, and (ii) $(D_M)_{ij} = \sum_{k \leq n,\, k \neq i} (M)_{ik}$ if $j = i$, and $(D_M)_{ij} = 0$ if $j \neq i$. Define a third matrix $L_M = D_M - A_M$. Note that the matrices $A_M$ and $L_M$, as constructed, can be respectively interpreted as the adjacency and Laplacian matrices of some graph.

Let us denote by $\Delta_M$ and $\delta_M$ the largest and smallest element in the diagonal of $M$, respectively. Now we are ready to state our main result.
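For concreteness, the following sketch (ours; the matrix is a made-up example) builds $A_M$, $D_M$ and $L_M$ from a small ODN matrix:

```python
import numpy as np

# A small ODN matrix: off-diagonal entries are nonnegative,
# diagonal entries are arbitrary reals.
M = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])

A_M = M - np.diag(np.diag(M))   # (A_M)_ij = (M)_ij off the diagonal, 0 on it
D_M = np.diag(A_M.sum(axis=1))  # (D_M)_ii = sum of the off-diagonal entries of row i
L_M = D_M - A_M                 # Laplacian of the underlying graph

Delta_M, delta_M = np.diag(M).max(), np.diag(M).min()
print("rho(L_M) =", np.abs(np.linalg.eigvalsh(L_M)).max())
print("Delta_M - delta_M =", Delta_M - delta_M)
```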
Theorem 1. Let $M \in \mathbb{R}^{n \times n}$ be an ODN symmetric matrix with $m$ non-zero off-diagonal entries and a set of eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$ with a respective set of eigenvectors $x_1, x_2, \dots, x_n$. For any $\epsilon$ such that $0 < \epsilon \leq 1/2$, there exists an ODN symmetric matrix $\widehat{M} \in \mathbb{R}^{n \times n}$ with $O(n/\epsilon^2)$ non-zero entries, with a set of eigenvalues $\widehat{\lambda}_1 \geq \cdots \geq \widehat{\lambda}_n$ with a respective set of eigenvectors $\widehat{x}_1, \dots, \widehat{x}_n$, such that
$$|\lambda_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M) + \frac{\Delta_M - \delta_M}{2}.$$
Furthermore, if $\theta_i$ is the angle between the subspaces spanned by the eigenvectors $x_i$ and $\widehat{x}_i$, then
$$\sin\theta_i \leq \frac{\epsilon\sqrt{n}\,\rho(L_M) + (\Delta_M - \delta_M)/2}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}}.$$

Note that if all diagonal entries in $M$ are close in value, then $(\Delta_M - \delta_M)/2 \ll \epsilon\sqrt{n}\,\rho(L_M)$.
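To see what the guarantee of Theorem 1 costs in concrete terms, this sketch (ours; `theorem1_bound` is a hypothetical helper name) evaluates the right-hand side of the eigenvalue bound for a small ODN matrix and a given $\epsilon$:

```python
import numpy as np

def theorem1_bound(M, eps):
    """Right-hand side of Theorem 1's eigenvalue bound:
    eps * sqrt(n) * rho(L_M) + (Delta_M - delta_M) / 2."""
    n = M.shape[0]
    A = M - np.diag(np.diag(M))
    L = np.diag(A.sum(axis=1)) - A
    rho = np.abs(np.linalg.eigvalsh(L)).max()
    diag = np.diag(M)
    return eps * np.sqrt(n) * rho + (diag.max() - diag.min()) / 2

M = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])
print(theorem1_bound(M, eps=0.25))
```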
Theorem 1 is comparable to a result of Zouzias [9]: given $0 < \epsilon < 1$, for any self-adjoint matrix $A$ of size $n$ that is also $\theta$-symmetric diagonally dominant, there exists a matrix $\widehat{A}$ with at most $O(n\theta \log n/\epsilon^2)$ non-zero entries such that $\|A - \widehat{A}\| \leq \epsilon\|A\|$. Informally, a matrix $A$ is $\theta$-symmetric diagonally dominant if $\|A\|_\infty = O(\sqrt{\theta})$. Therefore, the approximation factor in the result of Zouzias [9] has a dependency on the entry with the largest absolute value in the matrix $A$. Theorem 1 eliminates that dependency at the expense of further restrictions on the matrix we want to approximate. The "$\log n$" factor can be dropped using new spectral sparsification techniques like that of Lee and Sun [4].

1.2 Some Applications of Theorem 1

Let $A \in \mathbb{R}^{n \times n}$ be an ODN matrix. We say that a quadratic form $Q(x) = x^T A x$ is ODN if its matrix $A$ is ODN. Every quadratic form has a diagonal form $Q(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$, where each $\lambda_i$ is an eigenvalue of $A$. Furthermore, Sylvester's Law of Inertia tells us that the number of positive and negative coefficients in any diagonal form of $Q$ is an invariant of $Q$.

Let $\widehat{Q}(x) = x^T \widehat{A} x$, where $\widehat{A}$ is a matrix obtained from $A$ by means of Theorem 1. If $\widehat{Q}(x) = \widehat{\lambda}_1 x_1^2 + \cdots + \widehat{\lambda}_n x_n^2$, where $\widehat{\lambda}_i$ is an eigenvalue of $\widehat{A}$, we have that $|Q(x) - \widehat{Q}(x)| \leq \epsilon\sqrt{n}\,\rho(L_A) + (\Delta_A - \delta_A)/2$ for every unit vector $x$ and $\epsilon$ sufficiently small. Thus, if we are interested in the optimization of $Q(x)$, we can use $\widehat{A}$ as the input of a quadratic optimization solver instead of $A$, and it will result in a solution with the guarantees mentioned in the previous sentence. State-of-the-art quadratic optimization solvers can exploit the sparsity of an input matrix, which is especially important in the case of non-convex problems.

Optimization of quadratic forms is an NP-hard optimization problem, even with binary variables. In fact, a single negative eigenvalue suffices to make the problem NP-hard [6]. It is also closely related to the optimization of the Ising model in statistical mechanics. We do not know, however, whether the optimization of ODN quadratic forms is NP-hard, and we leave this as an open problem.
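The diagonal form and Sylvester's Law of Inertia are easy to verify numerically; a minimal sketch (ours, with a made-up matrix $A$):

```python
import numpy as np

A = np.array([[-1.0, 2.0, 0.0],
              [ 2.0, 5.0, 3.0],
              [ 0.0, 3.0, 0.5]])
lam, V = np.linalg.eigh(A)           # A = V diag(lam) V^T

x = np.array([1.0, -2.0, 0.5])
y = V.T @ x                          # change of variables to the eigenbasis
Q_direct = x @ A @ x
Q_diagonal = np.sum(lam * y**2)      # diagonal form: sum_i lam_i * y_i^2
print(np.isclose(Q_direct, Q_diagonal))   # True

# Sylvester's Law of Inertia: the eigenvalue signs give the number of
# positive and negative coefficients in any diagonal form of Q.
print("inertia:", (int(np.sum(lam > 0)), int(np.sum(lam < 0))))
```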
Principal Component Analysis, or PCA, is a method to reduce the dimensionality of data [3]. Given a set of data with correlated attributes, PCA reduces the number of attributes while preserving, as best as possible, the variance of the data. PCA constructs a new set of non-correlated variables known as principal components.

Let $x$ be a vector of $n$ random variables. The first principal component is defined as $z_1 = v_1^T x$, where $v_1 \in \mathbb{R}^n$ is a vector that maximizes the variance of $z_1$, denoted $Var(z_1)$. The $i$-th principal component is the variable $z_i = v_i^T x$, with $v_i \in \mathbb{R}^n$, such that $z_1, z_2, \dots, z_i$ are pairwise uncorrelated and $Var(z_i)$ is maximum. It is known that if $S$ is the covariance matrix of $x$, with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$, then $v_i$ is the eigenvector of $S$ corresponding to $\lambda_i$ and $Var(z_i) = \lambda_i$.

Another approach to PCA is to use a correlation matrix $M$ instead of the covariance matrix $S$. The matrix $M$ is the covariance matrix of a vector $x$ that is normalized by subtracting its mean from each entry and then dividing by its standard deviation. If $X$ is a data matrix of size $m \times n$, with no loss of generality, suppose that $M = (1/n) X^T X$. Suppose further that $M$ is ODN. By Theorem 1 we can obtain a sparse matrix $\widehat{M}$ that is close in spectrum to $M$. Let $z_i$ and $\widehat{z}_i$ be the $i$-th principal components of $M$ and $\widehat{M}$, respectively. Since all diagonal entries of a correlation matrix are equal, the term $(\Delta_M - \delta_M)/2$ from Theorem 1 vanishes here. Thus, for the first principal component we have
$$Var(z_1) - \epsilon\sqrt{n}\,\rho(L_M) \leq Var(\widehat{z}_1) \leq Var(z_1) + \epsilon\sqrt{n}\,\rho(L_M).$$
In general, for the first $p$ principal components we have that
$$\sum_{i=1}^{p} Var(z_i) - p\,\epsilon\sqrt{n}\,\rho(L_M) \leq \sum_{i=1}^{p} Var(\widehat{z}_i) \leq \sum_{i=1}^{p} Var(z_i) + p\,\epsilon\sqrt{n}\,\rho(L_M).$$
This loss in precision is compensated by a gain in speed in the computation of eigenvalues, which is a crucial step in PCA. For example, if we use Lanczos's algorithm [7], which is sensitive to the sparsity of its input matrix, we can compute eigenvalues and an orthonormal set of eigenvectors using $O(n^2/\epsilon^2)$ arithmetic operations, if we assume $O(1/\epsilon^2)$ non-zero entries per row on average.
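The following sketch (ours; random data) carries out this version of PCA and reads the variances of the principal components off the eigenvalues of $M$. Note that random data need not produce an ODN matrix, so the ODN hypothesis of Theorem 1 is an extra assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 500, 40, 5

# Random data; real use of Theorem 1 additionally requires the
# correlation-style matrix M to be ODN, which random data need not satisfy.
X = rng.normal(size=(m, n))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize each attribute
M = (X.T @ X) / n                          # as in the text, M = (1/n) X^T X

# v_i is the eigenvector of M for lambda_i, and Var(z_i) = lambda_i.
lam, V = np.linalg.eigh(M)
lam, V = lam[::-1], V[:, ::-1]             # descending order, as in the text
print("variances of the first", p, "principal components:", lam[:p])
```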
The remainder of this note is organized as follows. Section 2 presents the notation on linear algebra used throughout this work and briefly reviews the main concepts and techniques of spectral sparsification of graphs. Section 3 presents some technical lemmas that are necessary for our proof of Theorem 1. Finally, Section 4 presents a full proof of Theorem 1.

2 Preliminaries

In this section we introduce the notation used throughout this work and present some basic facts from linear algebra and spectral sparsification of graphs. In the entirety of this work we use $\mathbb{R}$ to denote the set of real numbers and $\mathbb{R}^+$ to denote the set of positive real numbers.

Let $M$ be a real matrix. We use $(M)_{ij}$ to denote the element in the $i$-th row and $j$-th column of $M$. Let $\|M\|$, $\|M\|_\infty$ and $\|M\|_1$ denote the 2-norm, the $\infty$-norm and the 1-norm, respectively. The spectral radius of $M$ is denoted $\rho(M)$. Recall that if $M$ is symmetric, then $\|M\| = \rho(M)$.

The following theorem characterizes the eigenvalues of real symmetric matrices.

Theorem (Courant-Fischer). Let $M \in \mathbb{R}^{n \times n}$ be a symmetric matrix with eigenvalues $\alpha_1 \geq \cdots \geq \alpha_n$. Then
$$\alpha_k = \max_{\substack{S \subset \mathbb{R}^n \\ \dim(S) = k}}\; \min_{\substack{x \in S \\ x \neq 0}} \frac{x^T M x}{x^T x} = \min_{\substack{T \subset \mathbb{R}^n \\ \dim(T) = n - k + 1}}\; \max_{\substack{x \in T \\ x \neq 0}} \frac{x^T M x}{x^T x},$$
where the maximization and minimization are over subspaces $S$ and $T$ of $\mathbb{R}^n$.

The following theorem due to Davis and Kahan [2] is important in perturbation theory. The simplified version of the theorem we present here is by Yu, Wang and Samworth [5].

Theorem (Davis-Kahan [2]). Let $A$ and $B$ be symmetric matrices, and $R = A - B$. Let $\alpha_1 \geq \cdots \geq \alpha_n$ be the eigenvalues of $A$ with corresponding eigenvectors $a_1, a_2, \dots, a_n$, and let $\beta_1 \geq \cdots \geq \beta_n$ be the eigenvalues of $B$ with corresponding eigenvectors $b_1, b_2, \dots, b_n$. Let $\theta_i$ be the angle between $a_i$ and $b_i$. Then
$$\sin\theta_i \leq \frac{\|R\|}{\min\{|\beta_{i-1} - \alpha_i|,\, |\beta_{i+1} - \alpha_i|\}}.$$
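A numerical illustration (ours) of the Davis-Kahan bound for a randomly perturbed symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
R = 1e-3 * rng.normal(size=(n, n)); R = (R + R.T) / 2
B = A + R   # the theorem is stated with R = A - B; the norm is the same

alpha, a_vecs = np.linalg.eigh(A)
beta,  b_vecs = np.linalg.eigh(B)
alpha, a_vecs = alpha[::-1], a_vecs[:, ::-1]   # descending, as in the text
beta,  b_vecs = beta[::-1],  b_vecs[:, ::-1]
normR = np.linalg.norm(R, 2)

for i in range(n):
    cos = abs(a_vecs[:, i] @ b_vecs[:, i])     # abs() removes the sign ambiguity
    sin = np.sqrt(max(0.0, 1.0 - cos**2))
    up   = abs(beta[i - 1] - alpha[i]) if i > 0     else np.inf
    down = abs(beta[i + 1] - alpha[i]) if i < n - 1 else np.inf
    print(sin <= normR / min(up, down))        # True for every i
```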
The last theorem of this section gives a bound between the eigenvalues of two symmetric matrices $A$ and $B$ in terms of the norm $\|A - B\|$; see the textbook of Bhatia [10], page 63, for a proof.

Theorem (Weyl's Perturbation Theorem). Let $A, B \in \mathbb{R}^{n \times n}$ be symmetric matrices such that $\alpha_1, \alpha_2, \dots, \alpha_n$ and $\beta_1, \beta_2, \dots, \beta_n$ are the eigenvalues of $A$ and $B$, respectively. Then, for all $i = 1, \dots, n$,
$$|\alpha_i - \beta_i| \leq \|A - B\|.$$

Let $G = (V, E, w)$ be a simple undirected graph with vertices $v_1, v_2, \dots, v_n \in V$. An edge in $E$ between vertices $v_i$ and $v_j$ is denoted $ij$. Each edge $ij$ has a weight assigned to it according to a weight function $w: E \to \mathbb{R}^+ \cup \{0\}$. As a short-hand we use $w_{ij} = w(ij)$. The adjacency matrix $A_G$ of $G$ is defined as $(A_G)_{ij} = w_{ij}$ if $i \neq j$, and $(A_G)_{ij} = 0$ if $i = j$. We also define the degree matrix $D_G$ of $G$ as $(D_G)_{ij} = \sum_{k=1}^{n} w_{ik}$ if $j = i$, and $(D_G)_{ij} = 0$ otherwise. The Laplacian matrix $L_G$ of $G$ is defined as $L_G = D_G - A_G$.

Spectral sparsification is a method introduced by Spielman and Teng [1] that is used to construct a sparse graph $\widehat{G}$ from any given graph $G$ such that the spectra of their Laplacian matrices are "close." For any $\epsilon \in (0, 1)$, we say that $\widehat{G}$ is an $\epsilon$-spectral sparsifier of $G$ if for every $x \in \mathbb{R}^n$ it holds that
$$(1 - \epsilon)\, x^T L_G\, x \leq x^T L_{\widehat{G}}\, x \leq (1 + \epsilon)\, x^T L_G\, x. \qquad (1)$$
It is clear that if the eigenvalues of $L_{\widehat{G}}$ are $\widehat{\mu}_1 \geq \cdots \geq \widehat{\mu}_n$, then by Eq. (1) and the Courant-Fischer theorem $(1 - \epsilon)\mu_i \leq \widehat{\mu}_i \leq (1 + \epsilon)\mu_i$ for all $i = 1, \dots, n$.

For any two square matrices $A, B$ we write $A \preceq B$ whenever $B - A$ is positive semidefinite. Thus, we can succinctly write Eq. (1) as
$$(1 - \epsilon) L_G \preceq L_{\widehat{G}} \preceq (1 + \epsilon) L_G. \qquad (2)$$

Spielman and Teng [1] proved that every graph with positive weights has an $\epsilon$-spectral sparsifier whose size is close to linear in the number of vertices. The following theorem is currently the best construction of spectral sparsifiers.

Theorem (Lee-Sun [4]). Given any integer $q \geq 10$ and $0 < \epsilon \leq 1/2$, let $G = (V, E, w)$ be an undirected and weighted graph with $n$ vertices and $m$ edges. Then there exists a $(1 + \epsilon)$-spectral sparsifier of $G$ with $O(qn/\epsilon^2)$ edges.
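Condition (1) can be checked numerically. The sketch below (ours; `best_epsilon` is a hypothetical helper) computes the smallest $\epsilon$ for which Eq. (1) holds for two given Laplacians, assuming $G$ is connected, via a generalized eigenproblem restricted to the orthogonal complement of the all-ones vector:

```python
import numpy as np
from scipy.linalg import eigh, null_space

def best_epsilon(L_G, L_Ghat):
    """Smallest eps with (1-eps) x^T L_G x <= x^T L_Ghat x <= (1+eps) x^T L_G x.
    Assumes G is connected and both matrices are graph Laplacians, so their
    common kernel is span{1} and it suffices to test x orthogonal to 1."""
    n = L_G.shape[0]
    ones = np.ones((n, 1)) / np.sqrt(n)
    Q = null_space(ones.T)                 # orthonormal basis of the complement of 1
    gamma = eigh(Q.T @ L_Ghat @ Q, Q.T @ L_G @ Q, eigvals_only=True)
    return np.abs(gamma - 1).max()

# Example: a path on 3 vertices against the same graph scaled by 1.1.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
print(best_epsilon(L, 1.1 * L))            # approximately 0.1
```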
3 Technical Lemmas

In this section we show some technical lemmas that will help us in proving Theorem 1.
Lemma 1.
Let $G$ be a graph on $n$ vertices and let $\widehat{G}$ be an $\epsilon$-spectral sparsifier of $G$. Let $\mu_1, \mu_2, \dots, \mu_n$ be the eigenvalues of $L_G$ with respective eigenvectors $x_1, x_2, \dots, x_n$ and let $\widehat{\mu}_1, \widehat{\mu}_2, \dots, \widehat{\mu}_n$ be the eigenvalues of $L_{\widehat{G}}$ with respective eigenvectors $\widehat{x}_1, \widehat{x}_2, \dots, \widehat{x}_n$. Then

1. $\|L_G - L_{\widehat{G}}\| \leq \epsilon\,\rho(L_G)$, and

2. if $\theta_i$ is the angle between $x_i$ and $\widehat{x}_i$, then
$$\sin\theta_i \leq \frac{\epsilon\,\rho(L_G)}{\min\{|\mu_i - \widehat{\mu}_{i-1}|,\, |\mu_i - \widehat{\mu}_{i+1}|\}},$$
where we assume $|\mu_i - \widehat{\mu}_{i \pm 1}| \neq 0$ for all $i = 1, \dots, n$.

Proof. Since $\widehat{G}$ is an $\epsilon$-spectral sparsifier of $G$ we have that
$$(1 - \epsilon) L_G \preceq L_{\widehat{G}} \preceq (1 + \epsilon) L_G,$$
which implies
$$L_G - L_{\widehat{G}} \preceq \epsilon L_G. \qquad (3)$$
Note that $\delta_i$ is an eigenvalue of $L_G - L_{\widehat{G}}$ if and only if $-\delta_i$ is an eigenvalue of $L_{\widehat{G}} - L_G$. With no loss of generality suppose that $\rho(L_G - L_{\widehat{G}})$ coincides with the largest eigenvalue of $L_G - L_{\widehat{G}}$, and let $z$ be a normalized eigenvector associated to $\rho(L_G - L_{\widehat{G}})$. Then
$$\|L_G - L_{\widehat{G}}\| = \rho(L_G - L_{\widehat{G}}) \qquad (L_G - L_{\widehat{G}} \text{ is symmetric})$$
$$= z^T (L_G - L_{\widehat{G}})\, z$$
$$\leq z^T (\epsilon L_G)\, z \qquad (\text{from Eq. (3)})$$
$$\leq \rho(\epsilon L_G) = \epsilon\,\|L_G\| = \epsilon\,\rho(L_G).$$
The second part of the lemma is implied by the Davis-Kahan theorem and the first part of this lemma, thus completing the proof.
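Part 1 of Lemma 1 can be sanity-checked with a synthetic sparsifier: rescaling each eigenvalue of $L_G$ by a factor in $[1 - \epsilon, 1 + \epsilon]$ keeps the eigenvectors fixed, so the sandwich (2) holds by construction. A sketch (ours; the rescaled matrix is dense, so this only exercises the norm bound, not sparsity):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.2

# Laplacian of a cycle on n vertices.
n = 8
P = np.roll(np.eye(n), 1, axis=1)
L_G = 2 * np.eye(n) - P - P.T

# Rescale each eigenvalue by a factor in [1-eps, 1+eps]; since the
# eigenbasis is unchanged, (1-eps) L_G <= L_Ghat <= (1+eps) L_G holds.
mu, V = np.linalg.eigh(L_G)
eta = rng.uniform(-eps, eps, size=n)
L_Ghat = V @ np.diag((1 + eta) * mu) @ V.T

lhs = np.linalg.norm(L_G - L_Ghat, 2)
rhs = eps * np.abs(mu).max()            # eps * rho(L_G)
print(lhs <= rhs + 1e-12)               # True
```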
Lemma 2.
Let $L_G = D_G - A_G$ and $L_H = D_H - A_H$ be the Laplacian matrices of graphs $G$ and $H$, respectively. Then
$$\|A_G - A_H\| \leq \sqrt{n}\, \|L_G - L_H\|.$$

Proof. The matrix $A_G - A_H$ is symmetric, and hence $\|A_G - A_H\|_\infty = \|A_G - A_H\|_1$. Then, using the inequality $\|A\|^2 \leq \|A\|_1 \cdot \|A\|_\infty$, valid for any matrix $A$ [8, Th. 2.11-5], we have that $\|A_G - A_H\| \leq \|A_G - A_H\|_\infty$. On the other hand,
$$\|L_G - L_H\|_\infty = \|A_G - A_H\|_\infty + \max_{i \leq n} \big| (D_G)_{ii} - (D_H)_{ii} \big|,$$
and then
$$\|A_G - A_H\|_\infty \leq \|L_G - L_H\|_\infty \leq \sqrt{n}\, \|L_G - L_H\|,$$
and the lemma thus follows.
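A quick numerical check of Lemma 2 (ours; random weighted graphs):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def random_adjacency(n):
    W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), k=1)
    W *= rng.random(size=(n, n)) < 0.5        # drop about half of the edges
    return W + W.T

A_G, A_H = random_adjacency(n), random_adjacency(n)
L_G, L_H = laplacian(A_G), laplacian(A_H)

lhs = np.linalg.norm(A_G - A_H, 2)
rhs = np.sqrt(n) * np.linalg.norm(L_G - L_H, 2)
print(lhs <= rhs)   # True, as Lemma 2 guarantees
```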
4 Proof of Theorem 1

Let $M$ be an ODN symmetric matrix and let $\overline{M}$ be another matrix defined as
$$(\overline{M})_{ij} = \begin{cases} (M)_{ij} & \text{if } i \neq j \\ d & \text{if } i = j, \end{cases}$$
where $d = (\Delta_M + \delta_M)/2$. Then we have that $\|M - \overline{M}\| = \max\{\Delta_M - d,\, d - \delta_M\} = (\Delta_M - \delta_M)/2$,
and thus the chosen $d$ is the value that minimizes the norm $\|M - \overline{M}\|$.

Now let $\lambda_1 \geq \cdots \geq \lambda_n$ and $\overline{\lambda}_1 \geq \cdots \geq \overline{\lambda}_n$ be the eigenvalues of $M$ and $\overline{M}$, respectively. By our definition of $\overline{M}$ and Weyl's Perturbation Theorem we have that
$$|\lambda_i - \overline{\lambda}_i| \leq \frac{\Delta_M - \delta_M}{2}. \qquad (4)$$

Recall that if we define two matrices $A_M$ and $D_M$, where $(A_M)_{ij} = (M)_{ij}$ if $i \neq j$ and $(A_M)_{ij} = 0$ for $i = j$, and $(D_M)_{ij} = \sum_{k \leq n,\, k \neq i} (M)_{ik}$ if $j = i$ and $(D_M)_{ij} = 0$ if $j \neq i$, then we can see $A_M$ and $D_M$ as the adjacency and degree matrices of some graph $G_M$. Consequently we have a Laplacian matrix $L_M = D_M - A_M$. Thus, notice that $L_M = L_{\overline{M}}$, where analogously we define $L_{\overline{M}} = D_{\overline{M}} - A_{\overline{M}}$.

By the Lee-Sun theorem we know that given $\epsilon$ with $0 < \epsilon \leq 1/2$, there exists a matrix $\widehat{L}_M$ with $O(n/\epsilon^2)$ non-zero entries that is an $\epsilon$-spectral sparsifier of $L_M$. Then by Lemma 1 we have that
$$\|L_M - \widehat{L}_M\| \leq \epsilon\,\rho(L_M),$$
and hence, by Lemma 2, it follows that
$$\|A_M - \widehat{A}_M\| \leq \epsilon\sqrt{n}\,\rho(L_M).$$
If we let $\widehat{M} = \widehat{A}_M + dI$, then from the last inequality we have
$$\|\overline{M} - \widehat{M}\| \leq \epsilon\sqrt{n}\,\rho(L_M).$$
Thus, if $\widehat{\lambda}_1 \geq \widehat{\lambda}_2 \geq \cdots \geq \widehat{\lambda}_n$ are the eigenvalues of $\widehat{M}$, by Weyl's Perturbation Theorem we can see that
$$|\overline{\lambda}_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M). \qquad (5)$$
Combining Eqs. (4) and (5),
$$|\lambda_i - \widehat{\lambda}_i| = |\lambda_i - \overline{\lambda}_i + \overline{\lambda}_i - \widehat{\lambda}_i| \leq |\lambda_i - \overline{\lambda}_i| + |\overline{\lambda}_i - \widehat{\lambda}_i| \leq \epsilon\sqrt{n}\,\rho(L_M) + \frac{\Delta_M - \delta_M}{2}.$$
The fact that $\widehat{M}$ has $O(n/\epsilon^2)$ non-zero elements follows directly from the Lee-Sun theorem. Finally, the last part of the theorem is obtained by means of the Davis-Kahan theorem:
$$\sin\theta_i \leq \frac{\|M - \widehat{M}\|}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}} \leq \frac{\epsilon\sqrt{n}\,\rho(L_M) + (\Delta_M - \delta_M)/2}{\min\{|\widehat{\lambda}_{i-1} - \lambda_i|,\, |\lambda_i - \widehat{\lambda}_{i+1}|\}}.$$
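The proof is constructive, and the construction can be put end to end. The sketch below (ours) follows the proof but substitutes a simple effective-resistance sampling scheme for the Lee-Sun sparsifier, so it only yields an $\epsilon$-spectral sparsifier with high probability rather than deterministically; all helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(5)

def sparsify_by_effective_resistance(A, q):
    """A stand-in for the Lee-Sun sparsifier: sample q edges with
    probability proportional to w_e * R_e (effective resistance),
    reweighting so the Laplacian is preserved in expectation."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    i, j = np.triu_indices(n, k=1)
    w = A[i, j]
    mask = w > 0
    i, j, w = i[mask], j[mask], w[mask]
    R = Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]     # effective resistances
    p = w * R
    p = p / p.sum()
    A_hat = np.zeros_like(A)
    for e in rng.choice(len(w), size=q, p=p):
        A_hat[i[e], j[e]] += w[e] / (q * p[e])
    return A_hat + A_hat.T

# A made-up ODN matrix: nonnegative off-diagonal, arbitrary real diagonal.
n = 12
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), k=1)
M = W + W.T + np.diag(rng.uniform(-1.0, 1.0, size=n))

d = (np.diag(M).max() + np.diag(M).min()) / 2   # d = (Delta_M + delta_M)/2
A_M = M - np.diag(np.diag(M))
A_hat = sparsify_by_effective_resistance(A_M, q=300)
M_hat = A_hat + d * np.eye(n)                   # the matrix of Theorem 1

lam = np.linalg.eigvalsh(M)[::-1]
lam_hat = np.linalg.eigvalsh(M_hat)[::-1]
print("max |lambda_i - lambda_hat_i| =", np.abs(lam - lam_hat).max())
```

For large $n$ one would take $q$ much smaller than the number of edges, which is where the sparsity of $\widehat{M}$ comes from.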
References

[1] Spielman, D.A. & Teng, S.H. (2011) Spectral sparsification of graphs.
SIAM Journal on Computing, 40(4): 981-1025. https://doi.org/10.1137/08074489X

[2] Davis, C. & Kahan, W.M. (1970) The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1): 1-46. https://doi.org/10.1137/0707001

[3] Jolliffe, I.T. (2002)
Principal Component Analysis, 2nd Edition. Springer. https://doi.org/10.1007/978-1-4757-1904-8_1

[4] Lee, Y.T. & Sun, H. (2018) Constructing linear-sized spectral sparsification in almost-linear time.
SIAM Journal on Computing, 47(6): 2315-2336. https://doi.org/10.1137/16M1061850

[5] Yu, Y., Wang, T. & Samworth, R.J. (2015) A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102(2): 315-323. https://doi.org/10.1093/biomet/asv008

[6] Pardalos, P. & Vavasis, S.A. (1991) Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1: 15-22. https://doi.org/10.1007/BF00120662

[7] Lanczos, C. (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4): 255-282.

[8] Stewart, G. & Sun, J.G. (1990) Matrix Perturbation Theory. Academic Press, San Diego.

[9] Zouzias, A. (2012) A matrix hyperbolic cosine algorithm and applications. In Proceedings of the 39th International Colloquium on Automata, Languages, and Programming (ICALP), pp. 846-858. https://doi.org/10.1007/978-3-642-31594-7_71

[10] Bhatia, R. (1997) Matrix Analysis. Springer, New York.