Graph partitioning using matrix differential equations
Eleonora Andreotti, Dominik Edelmann, Nicola Guglielmi, Christian Lubich
Abstract.
Given a connected undirected weighted graph, we are concerned with problems related to partitioning the graph. First we look for the closest disconnected graph (the minimum cut problem), here with respect to the Euclidean norm. We are interested in constrained minimum cut problems, where the constraints include cardinality or membership requirements; these lead to NP-hard combinatorial optimization problems. Furthermore, we are interested in ambiguity issues, that is, in the robustness of clustering algorithms that are based on Fiedler spectral partitioning. The above-mentioned problems are restated as matrix nearness problems for the weight matrix of the graph. A key element in the solution of these matrix nearness problems is the use of a constrained gradient system of matrix differential equations.
Key words.
Constrained minimum cut; spectral graph partitioning; algebraic connectivity; Fiedler vector; matrix nearness problem; constrained gradient flow; matrix differential equation
AMS subject classifications.
1. Introduction.
In this paper we present a novel approach to partitioning a connected weighted undirected graph. We consider the Frobenius-norm minimum cut problem and allow for constraints such as prescribing the minimum cardinality of connected components or assigning a priori selected vertices to a component. We use spectral graph theory as pioneered by Fiedler [6]; see also the monograph by Chung [3] and the introductory articles [16, 18]. We formulate and use a gradient system of matrix differential equations to drive the smallest nonzero eigenvalue of the graph Laplacian to zero. Once this eigenvalue becomes zero, the graph is disconnected and the corresponding eigenvector indicates the membership of vertices to the connected components. This approach can be extended to other partitioning problems beyond the constrained minimum cut problems considered here.

The approach of this paper takes basic ideas and techniques of recent algorithms for eigenvalue optimization via differential equations, as given for example in [9, 8, 11, 10], to another application area. A common feature is a two-level procedure, where on the inner level a gradient flow drives perturbations to the original matrix of a fixed size into a (local) minimum of a functional that depends on eigenvalues and possibly eigenvectors, and in an outer iteration the perturbation size is determined such that the functional becomes zero. As with the previous algorithms cited above, the algorithms presented here cannot guarantee to find the global minimum of a non-smooth, non-convex optimization problem, or of an NP-hard combinatorial optimization problem. There are cases where our algorithm can get stuck in a local minimum, and we will present a contrived example where this happens.
Even with this caveat, the presented algorithm performs remarkably well in the examples from the literature on which we have tested it.

As opposed to combinatorial algorithms, the algorithm presented here modifies all weights of the graph as it proceeds, and only in the end arrives at the cut and the unchanged remaining weights.

The proposed algorithm is an iterative algorithm, where in each step the second eigenvalue and the associated eigenvector of the Laplacian of a graph with perturbed weights are computed.

∗ Dipartimento di Ingegneria Scienze Informatiche e Matematica, Università degli Studi di L'Aquila, Via Vetoio - Loc. Coppito, I-67010 L'Aquila and Gran Sasso Science Institute, L'Aquila, Italy. Email: [email protected]
† Dipartimento di Ingegneria Scienze Informatiche e Matematica, Università degli Studi di L'Aquila, Via Vetoio - Loc. Coppito, I-67010 L'Aquila, Italy. Email: [email protected]
‡ Mathematisches Institut, Universität Tübingen, Auf der Morgenstelle 10, D-72076 Tübingen, Germany. Email: [email protected], [email protected]
In the cardinality- or membership-constrained cases, additionally a linear system with an extended shifted Laplacian is solved in each step. For a large sparse connected graph (where the number of edges leaving any vertex is moderately bounded), these computations can be done in a complexity that is linear in the number of vertices. In the known (unconstrained) minimum cut algorithms, the computational complexity is at least quadratic [17]. It is thus conceivable that for large sparse connected graphs, the proposed iterative algorithm can favorably compete with the classical unconstrained minimum cut algorithms. In constrained cases, it appears that the computational complexity is even more favorable in comparison with the existing heuristic combinatorial algorithms as proposed in [2]. However, as of now no detailed comparisons of the relative merits of the conceptually and algorithmically fundamentally different approaches have been made.

In Section 2 we formulate the Frobenius-norm minimum cut problem and its cardinality- and membership-constrained variants. This is stated as a matrix nearness problem where it is asked how far, with respect to the Frobenius norm, the weight matrix of the given graph is from that of some disconnected graph which should possibly satisfy additional constraints. We give basic notation and recall Fiedler's theorem on graph connectivity. We also formulate an ambiguity problem where it is asked how far the given weight matrix is from the weight matrix of a graph for which the second and third eigenvalues of the graph Laplacian coalesce and for which therefore graph partitioning based on the Fiedler vector (the eigenvector to the second eigenvalue) becomes ambiguous.

In Section 3 we describe the two-level approach to the unconstrained Frobenius-norm minimum cut problem. This is the central section of the paper, where the basic approach is developed.

In Section 4 we extend the approach to the cardinality- and membership-constrained minimum cut problems, and in Section 5 we extend it to the ambiguity problem.

In Section 6 we describe algorithmic aspects such as the discretization of the norm- and inequality-constrained gradient flow, the choice of initial values, and stopping criteria. In particular, since it is known beforehand that the weights of the cut graph are either zero or those of the original graph, the iteration need not be carried out to full convergence.

Section 7 shows numerical results of the proposed algorithm for some graphs taken from the literature.
2. Preparations and problem formulation.

2.1. The Frobenius-norm minimum cut problem.
Consider a graph with vertex set $V = \{1, \dots, n\}$ and edge set $E \subset V \times V$. We assume that the graph is undirected: with $(i,j) \in E$, also $(j,i) \in E$. With the undirected graph we associate weights $w_{ij}$ for $(i,j) \in E$, such that $w_{ij} = w_{ji} \ge 0$ for all $(i,j) \in E$. The graph is connected if for all $i, j \in V$ there is a path $(i_0,i_1), (i_1,i_2), \dots, (i_{\ell-1},i_\ell) \in E$ of arbitrary length $\ell$, such that $i = i_0$, $j = i_\ell$ and $w_{i_{k-1},i_k} > 0$ for $k = 1, \dots, \ell$.

The problem considered in this paper is the following: Given a connected weighted undirected graph with weights $w_{ij}$, we aim to find a disconnected weighted undirected graph with the same edge set $E$ and modified weights $\widehat w_{ij}$ such that
$$ \sum_{(i,j)\in E} (\widehat w_{ij} - w_{ij})^2 \quad \text{is minimized.} \qquad (2.1) $$
The solution to this matrix nearness problem is the same as that of finding a cut $C$, i.e., a set of edges that yield a disconnected graph when they are removed from $E$, where the cut $C$ is such that
$$ \sum_{(i,j)\in C} w_{ij}^2 \quad \text{is minimized.} $$
If $w_{ij}$ instead of $w_{ij}^2$ appears in the above sum, this becomes the classical minimum cut problem, for which algorithms with complexity $O(|V|^2 \log|V| + |V|\cdot|E|)$ exist; see Stoer & Wagner [17] and references therein.

The above problem will further be considered with additional constraints. In particular, we consider the following cases:
• Membership constraint:
It is required that a given set of vertices $V_+ \subset V$ is in one connected component and another given set of vertices $V_- \subset V$ is in the other connected component.
• Cardinality constraint:
It is required that each of the connected components has a prescribed minimum number $n_0$ of vertices.
It is known that cardinality constraints make the problem NP-hard [1, 2].

Setting $w_{ij} = 0$ for $(i,j) \notin E$, we have the symmetric weight matrix $W = (w_{ij}) \in \mathbb{R}^{n\times n}$. The degrees $d_i = \sum_{j=1}^n w_{ij}$ are collected in the diagonal matrix $D = \mathrm{diag}(d_i) = \mathrm{diag}(W\mathbf{1})$, where $\mathbf{1} := (1,\dots,1)^T \in \mathbb{R}^n$. The Laplacian matrix $L = \mathrm{Lap}(W)$ is defined by
$$ L = D - W, \quad\text{i.e.,}\quad \mathrm{Lap}(W) = \mathrm{diag}(W\mathbf{1}) - W. $$
We note that by the Gershgorin circle theorem, all eigenvalues of $L$ are nonnegative, and $L\mathbf{1} = 0$, so that $\lambda_1 = 0$ is the smallest eigenvalue of $L$. Remarkably, the connectivity of the graph is characterized by the second smallest eigenvalue of $L$.

Theorem 2.1 (
M. Fiedler [6]). Let $W \in \mathbb{R}^{n\times n}$ be the weight matrix of an undirected graph and $L$ the corresponding Laplacian matrix. Let $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ be the eigenvalues of $L$. Then, the graph is disconnected if and only if $\lambda_2 = 0$. Moreover, if $\lambda_2 = 0 < \lambda_3$, then the entries of the corresponding eigenvector orthogonal to $\mathbf{1}$ assume only two different values, of different sign, which mark the membership to the two connected components.

Because of this result, the second smallest eigenvalue $\lambda_2$ of $L$ is called the algebraic connectivity of $W$. If $\lambda_2$ is a simple eigenvalue, then the corresponding eigenvector is known as the Fiedler vector.

Based on Theorem 2.1, a common and computationally inexpensive strategy for partitioning a graph is to compute the Fiedler vector and to partition the graph according to the values of its entries. This becomes unreliable when a small perturbation of the weights yields a coalescence of the eigenvalues $\lambda_2$ and $\lambda_3$. It is then interesting to know the distance of the given weight matrix from the set of weight matrices with $\lambda_2 = \lambda_3$.
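To make the partitioning strategy just described concrete, the following NumPy sketch (an illustration added here; the helper names `laplacian` and `fiedler_partition` are ours, not part of the paper's algorithmic development) partitions a small test graph by the sign pattern of the Fiedler vector:

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian Lap(W) = diag(W 1) - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def fiedler_partition(W):
    """Return lambda_2 and a sign-based vertex partition from the Fiedler vector."""
    lam, X = np.linalg.eigh(laplacian(W))   # eigenvalues in ascending order
    return lam[1], X[:, 1] >= 0

# two triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3)
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
lam2, side = fiedler_partition(W)
# lam2 is small but positive (the graph is connected, though only weakly);
# the sign pattern of the Fiedler vector separates the two triangles
```

For this weakly coupled example, $\lambda_2$ is small (the Rayleigh quotient of the balanced $\pm$-vector is bounded by the cut weight) and the sign pattern recovers the two triangles.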
3. Two-level method for the Frobenius-norm minimum cut problem.

3.1. Two-level formulation.
Our approach can be summarized as follows:
1. Given $\varepsilon > 0$, we look for a symmetric matrix $E = (e_{ij}) \in \mathbb{R}^{n\times n}$ with the same sparsity pattern as $W$ (i.e., $e_{ij} = 0$ if $w_{ij} = 0$), of unit Frobenius norm, with $W + \varepsilon E \ge 0$, such that the second smallest eigenvalue of $\mathrm{Lap}(W + \varepsilon E)$ is minimal. The obtained minimizer is denoted by $E(\varepsilon)$.
2. We look for the smallest value of $\varepsilon$ such that the second smallest eigenvalue of $\mathrm{Lap}(W + \varepsilon E(\varepsilon))$ equals 0.
In order to compute $E(\varepsilon)$ for a given $\varepsilon > 0$, we make use of a constrained gradient system for the functional
$$ F_\varepsilon(E) = \lambda_2\big(\mathrm{Lap}(W + \varepsilon E)\big), \qquad (3.1) $$
under the constraints of unit Frobenius norm and $W + \varepsilon E \ge 0$ on $E$.

In the outer iteration we compute the optimal $\varepsilon$, denoted $\varepsilon^\star$, by a combined Newton-bisection method.

The algorithm computes a partition of the graph as provided by the Fiedler vector corresponding to the weight matrix $W + \varepsilon^\star E(\varepsilon^\star)$. This is not guaranteed to yield a global optimum for the Frobenius-norm minimum cut problem, since the gradient flow might converge only to a local minimum. In any case, it provides an upper bound for the distance problem (2.1).

3.2. Constrained gradient flow for the functional $F_\varepsilon$.

3.2.1. Eigenvalue derivatives. We will use the following standard perturbation result for eigenvalues; see, e.g., [12, Section II.1.1]. Here and in the following, we denote $\dot{\phantom{x}} = d/dt$.

Lemma 3.1.
Consider the differentiable symmetric $n \times n$ matrix valued function $C(t)$ for $t$ in a neighborhood of $0$. Let $\lambda(t)$ be an eigenvalue of $C(t)$ converging to a simple eigenvalue $\lambda_0$ of $C_0 = C(0)$ as $t \to 0$. Let $x_0$ be the associated eigenvector, with $\|x_0\| = 1$. Then $\lambda(t)$ is differentiable near $t = 0$ with
$$ \dot\lambda(0) = x_0^T \dot C(0)\, x_0. $$

3.2.2. The gradient of $F_\varepsilon$. We denote by $\|\cdot\| = \|\cdot\|_F$ the Frobenius norm on $\mathbb{R}^{n\times n}$ and by $\langle X, Y\rangle = \mathrm{trace}(X^T Y)$ the corresponding inner product. We return to the situation of the previous section. For a set of edges $E$, we define $P_E$ as the orthogonal projection from $\mathbb{R}^{n\times n}$ onto the sparsity pattern determined by $E$: for $A = (a_{ij})$,
$$ P_E(A)\big|_{ij} := \begin{cases} a_{ij}, & \text{if } (i,j) \in E, \\ 0, & \text{otherwise.} \end{cases} $$
For a fixed given weight matrix $W$ and for $\varepsilon > 0$, we call a matrix $E = (e_{ij}) \in \mathbb{R}^{n\times n}$ $\varepsilon$-feasible if the following conditions are satisfied:
(i) $E$ is of unit Frobenius norm.
(ii) $E$ is symmetric.
(iii) $E = P_E(E)$.
(iv) $W + \varepsilon E \ge 0$.
Consider a differentiable path $E(t)$ of $\varepsilon$-feasible matrices, and denote the corresponding Laplacian matrix by $L(t) = \mathrm{Lap}(W + \varepsilon E(t))$ and by $\lambda(t)$ the second smallest eigenvalue of $L(t)$. Lemma 3.1 applied to the Laplacian matrix $L(t)$ yields (omitting the argument $t$)
$$ \dot\lambda = x^T \dot L x = \langle xx^T, \dot L\rangle, \qquad (3.2) $$
where $x(t)$ is a corresponding eigenvector of unit Euclidean norm. Next we rearrange (3.2) to an equation $\dot\lambda = \varepsilon\,\langle G_\varepsilon(E), \dot E\rangle$ with an appropriate matrix-valued function $G_\varepsilon$, which is the gradient of $F_\varepsilon$ in the space of symmetric matrices with sparsity pattern $E$. In the following, $\mathrm{Sym}(A) = \frac12(A + A^T)$ denotes the symmetric part of a square matrix $A$, and we write $x^2 = (x_i^2) \in \mathbb{R}^n$ for the vector of squares of the entries of $x = (x_i) \in \mathbb{R}^n$.

Lemma 3.2.
In the above situation we have $\dot\lambda = \varepsilon\,\langle G_\varepsilon(E), \dot E\rangle$, where
$$ G_\varepsilon(E) = P_E\big(\mathrm{Sym}(x^2\mathbf{1}^T) - xx^T\big) \qquad (3.3) $$
is symmetric and has the sparsity pattern determined by the set of edges $E$.

Proof. We note that
$$ \dot L = \mathrm{Lap}\Big(\frac{d}{dt}\big(W + \varepsilon E(t)\big)\Big) = \varepsilon\,\mathrm{Lap}(\dot E) = \varepsilon\,\big(\mathrm{diag}(\dot E\mathbf{1}) - \dot E\big). \qquad (3.4) $$
Combining (3.2) and (3.4), we obtain
$$ \dot\lambda = \varepsilon\,\Big(\langle xx^T, \mathrm{diag}(\dot E\mathbf{1})\rangle - \langle xx^T, \dot E\rangle\Big). \qquad (3.5) $$
The second term is already in the desired form. We obtain for the first term
$$ \langle xx^T, \mathrm{diag}(\dot E\mathbf{1})\rangle = \sum_{i=1}^n x_i^2\,(\dot E\mathbf{1})_i = \sum_{i=1}^n \sum_{j=1}^n x_i^2\,\dot e_{ij} = \langle x^2\mathbf{1}^T, \dot E\rangle. \qquad (3.6) $$
This yields $\dot\lambda = \varepsilon\,\langle x^2\mathbf{1}^T - xx^T, \dot E\rangle$. Since $\dot E$ and $xx^T$ are symmetric, this can be rewritten as $\dot\lambda = \varepsilon\,\langle \mathrm{Sym}(x^2\mathbf{1}^T) - xx^T, \dot E\rangle$. We then have $\dot\lambda = \varepsilon\,\langle G_\varepsilon(E), \dot E\rangle$ with $G_\varepsilon(E) = P_E(\mathrm{Sym}(x^2\mathbf{1}^T) - xx^T)$. This is in the desired form: $G_\varepsilon(E)$ is symmetric and has the sparsity pattern $E$.

Since $E(t)$ is of unit Frobenius norm by condition (i), we have
$$ 0 = \tfrac12 \frac{d}{dt}\|E(t)\|^2 = \langle E(t), \dot E(t)\rangle. $$
Condition (iv) requires that $\dot e_{ij} \ge 0$ for $(i,j) \in E_0$, where $E_0 = E_0(\varepsilon E)$ is the set of cut edges defined by
$$ E_0 := \{(i,j) \in E : w_{ij} + \varepsilon e_{ij} = 0\}. $$
Conditions (ii) and (iii) are satisfied if the same holds for $\dot E$. These four conditions are in fact also sufficient for a matrix to be the time derivative of a path of $\varepsilon$-feasible matrices. Hence, for every $\varepsilon$-feasible matrix $E$, a matrix $Z = (z_{ij}) \in \mathbb{R}^{n\times n}$ is the derivative at $t = 0$ of some path of $\varepsilon$-feasible matrices starting at $E$ if and only if the following four conditions are satisfied:
(i') $\langle E, Z\rangle = 0$.
(ii') $Z$ is symmetric.
(iii') $Z = P_E(Z)$.
(iv') $P_{E_0}(Z) \ge 0$, i.e., $z_{ij} \ge 0$ for $(i,j) \in E_0$.
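The gradient formula of Lemma 3.2 lends itself to a numerical check: in the following NumPy sketch (ours, for illustration, under the assumption that $\lambda_2$ is a simple eigenvalue of the perturbed Laplacian), a central finite difference of $\lambda_2$ along an admissible direction $\dot E$ agrees with $\varepsilon\,\langle G_\varepsilon(E), \dot E\rangle$:

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def lambda2(W):
    return np.linalg.eigh(laplacian(W))[0][1]

def gradient(W, eps, E, pattern):
    """Free gradient G_eps(E) = P_E(Sym(x^2 1^T) - x x^T) of Lemma 3.2,
    with x a unit eigenvector to the second smallest Laplacian eigenvalue."""
    lam, X = np.linalg.eigh(laplacian(W + eps * E))
    x = X[:, 1]
    A = np.outer(x**2, np.ones(len(x))) - np.outer(x, x)
    return pattern * 0.5 * (A + A.T)        # Sym(.) followed by the projection P_E

# random symmetric test graph (illustrative; full off-diagonal sparsity pattern)
rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
pattern = (W > 0).astype(float)
E = rng.standard_normal((n, n)); E = pattern * (E + E.T) / 2
E /= np.linalg.norm(E)
Edot = rng.standard_normal((n, n)); Edot = pattern * (Edot + Edot.T) / 2

eps, h = 0.1, 1e-6
G = gradient(W, eps, E, pattern)
# central finite difference of lambda2 along Edot, to be compared with eps*<G, Edot>
fd = (lambda2(W + eps * (E + h * Edot)) - lambda2(W + eps * (E - h * Edot))) / (2 * h)
```

The agreement holds because $\dot E$ is symmetric with the sparsity pattern of $W$, so neither $\mathrm{Sym}$ nor $P_E$ changes the inner product.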
To determine the admissible direction $\dot E$ of steepest descent from $E$, we therefore consider the following optimization problem for $G = G_\varepsilon(E)$:
$$ \min_Z\; \langle G, Z\rangle \quad\text{subject to (i')-(iv') and } \langle Z, Z\rangle = 1. \qquad (3.7) $$
The additional constraint $\|Z\| = 1$ just normalizes the descent direction. Problem (3.7) has a quadratic constraint. We now formulate a quadratic optimization problem with linear constraints, which is equivalent in the sense that it yields the same descent direction, provided that a strict descent direction exists, i.e., one satisfying $\langle G, Z\rangle < 0$. For $\langle G, Z\rangle < 0$, there exists a scaling factor $\alpha > 0$ such that $\langle G, \alpha Z\rangle = -1$. Consider the following problem:
$$ \min_Z\; \langle Z, Z\rangle \quad\text{subject to (i')-(iv') and } \langle G, Z\rangle = -1. \qquad (3.8) $$
Both optimization problems yield the same Karush-Kuhn-Tucker (KKT) conditions (apart from the normalization). Since the objective function $\langle Z, Z\rangle$ of problem (3.8) is convex and all constraints are linear, the KKT conditions are not only necessary but also sufficient conditions ([7, Theorem 9.4.1]), that is, a KKT point is already a solution of the optimization problem.

The solution of (3.8) satisfies the KKT conditions
$$ Z = -G - \kappa E + \sum_{(i,j)\in E_0} \mu_{ij}\, e_i e_j^T, \qquad (3.9a) $$
$$ \mu_{ij} z_{ij} = 0 \quad\text{for all } (i,j) \in E_0, \qquad (3.9b) $$
$$ \mu_{ij} \ge 0 \quad\text{for all } (i,j) \in E_0. \qquad (3.9c) $$
In addition, there are the conditions (i')-(iv'). It can be shown (see [5, Lemma 3.2.1]) that the symmetry and sparsity conditions (ii') and (iii') need not be imposed, but are consequences of conditions (ii) and (iii) on $E$ and the corresponding properties of $G = G_\varepsilon(E)$.

The gradient flow of $F_\varepsilon$ under the constraints (i)-(iv) is the system of differential equations
$$ \dot E(t) = Z(t), \qquad (3.10) $$
where $Z(t)$ solves the KKT system (3.9) with $G = G_\varepsilon(E(t))$ under the constraints (i')-(iv') with the set of edges $E_0(t) = E_0(\varepsilon E(t))$.

Lemma 3.3.
On an interval where $E_0(t)$ does not change, the gradient system becomes, with $P_+ = P_{E\setminus E_0}$ and omitting the ubiquitous argument $t$,
$$ \dot E = -P_+ G_\varepsilon(E) - \kappa P_+ E \quad\text{with}\quad \kappa = \frac{\langle -G_\varepsilon(E), P_+ E\rangle}{\|P_+ E\|^2}. \qquad (3.11) $$
Proof. The positive Lagrange multipliers $\mu_{ij} > 0$ arise only for cut edges $(i,j) \in E_0$, for which they enforce $\dot e_{ij} = 0$. With $G = G_\varepsilon(E)$, the gradient system therefore reads
$$ \dot E = P_+(-G - \kappa E), \qquad (3.12) $$
where $\kappa$ is determined from the constraint $\langle E, \dot E\rangle = 0$. We then have
$$ 0 = \langle E, \dot E\rangle = \langle E, P_+(-G - \kappa E)\rangle = -\langle P_+ E, G\rangle - \kappa\,\langle P_+ E, P_+ E\rangle, $$
which yields the stated formula for $\kappa$.

The set of cut edges $E_0$ changes at events where an edge attains $w_{ij} + \varepsilon e_{ij} = 0$, and among them further at those edges where the sign of $-g_{ij} - \kappa e_{ij}$ changes. When the active set is changed, then also $\kappa$ changes in a discontinuous way. Let $\kappa_-$ and $\kappa_+$ be the values of $\kappa$ before and after the event of discontinuity, respectively. Then one has generically $g_{ij} + \kappa_- e_{ij} > 0$ for the edge $(i,j)$ entering $E_0$, but the sign of $g_{ij} + \kappa_+ e_{ij}$ may be negative or positive. In the first case, $(i,j)$ leaves $E_0$. In the latter case, only a generalized solution in the Filippov sense exists, which keeps $(i,j) \in E_0$, i.e., $w_{ij} + \varepsilon e_{ij} = 0$. This is enforced until $g_{ij} + \kappa_+ e_{ij}$ changes sign. From a practical perspective, it appears reasonable just to keep $(i,j) \in E_0$ for all future time once it has entered $E_0$, which means that a cut of an edge is made irreversible.

The following monotonicity result follows directly from the construction of the gradient system.
Theorem 3.4.
Let $E(t)$ of unit Frobenius norm satisfy the differential equation (3.12) with $G_\varepsilon(E)$ of (3.3). Then, the second smallest eigenvalue $\lambda_2(t)$ of the Laplacian matrix $\mathrm{Lap}(W + \varepsilon E(t))$ decreases monotonically with $t$: $\dot\lambda_2(t) \le 0$.

Equilibrium points of (3.12) are characterized as follows.
Theorem 3.5.
The following statements are equivalent along solutions of (3.12):
1. $\dot\lambda_2 = 0$.
2. $\dot E = 0$.
3. $P_+ E$ is a real multiple of $P_+ G_\varepsilon(E)$.
Proof. Using Lemma 3.2 and (3.12) we obtain, with $G = G_\varepsilon(E)$,
$$ \frac1\varepsilon\,\dot\lambda_2 = \langle G, \dot E\rangle = \langle G, -P_+ G - \kappa P_+ E\rangle = -\|P_+ G\|^2 + \frac{\langle P_+ G, P_+ E\rangle^2}{\|P_+ E\|^2}. $$
With the strong form of the Cauchy-Schwarz inequality, the result follows.
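On a fixed active set, the projected system (3.11)/(3.12) is straightforward to realize. The sketch below (an illustration with hypothetical names and data; it omits the active-set events discussed above) computes the right-hand side and confirms that the norm constraint $\langle E, \dot E\rangle = 0$ holds up to roundoff:

```python
import numpy as np

def constrained_direction(G, E, mask):
    """Right-hand side of (3.11) on a fixed active set:
    Edot = -P+ G - kappa P+ E, with kappa chosen so that <E, Edot> = 0.
    The 0/1 matrix `mask` marks the free (non-cut) edges and plays the role of P+."""
    Gp, Ep = mask * G, mask * E
    kappa = -np.sum(Gp * Ep) / np.sum(Ep * Ep)
    return -Gp - kappa * Ep

# small symmetric test data (illustrative only, not an actual gradient)
rng = np.random.default_rng(1)
n = 5
G = rng.standard_normal((n, n)); G = (G + G.T) / 2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E /= np.linalg.norm(E)
mask = np.ones((n, n)); mask[0, 1] = mask[1, 0] = 0.0   # one frozen "cut" edge
Edot = constrained_direction(G, E, mask)
# <E, Edot> = 0 up to roundoff, so the flow preserves the unit Frobenius norm,
# and Edot vanishes on the frozen edge
```

Since the mask is symmetric and idempotent, the returned direction is symmetric and satisfies $\langle E, \dot E\rangle = 0$ exactly in exact arithmetic.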
3.3. The outer iteration. Let $E(\varepsilon)$ denote the minimizer of the functional $F_\varepsilon$. In general we expect that for a given perturbation size $\varepsilon < \varepsilon^\star$, the eigenvalue $\lambda_2(W + \varepsilon E(\varepsilon)) > 0$. Generically, $f(\varepsilon) = F_\varepsilon(E(\varepsilon))$ is a piecewise smooth function of $\varepsilon$ and we can exploit its regularity to obtain a fast iterative method to converge to $\varepsilon^\star$ from the left. Otherwise we can use a bisection technique to approach $\varepsilon^\star$.

The following result provides an inexpensive formula for the computation of the derivative of $f(\varepsilon) = F_\varepsilon(E(\varepsilon))$, which will be useful in the construction of the outer iteration of the method.

Assumption 3.1.
We assume that the second smallest eigenvalue of
$\mathrm{Lap}(W + \varepsilon E(\varepsilon))$ is simple. Moreover, $E(\varepsilon)$ is assumed to be a smooth function of $\varepsilon$ in some interval, and the set of zero-weight edges $E_0$ related to $E(\varepsilon)$ is independent of $\varepsilon$ in the interval. We denote again $P_+ = P_{E\setminus E_0}$. We then have the following result.

Lemma 3.6.
Under Assumption 3.1, the function $f(\varepsilon) = F_\varepsilon(E(\varepsilon))$ is differentiable and its derivative equals (with $' = d/d\varepsilon$)
$$ f'(\varepsilon) = -\|P_+ G_\varepsilon(E(\varepsilon))\|\,\|P_+ E(\varepsilon)\| - \varepsilon^{-2}\,\frac{\|P_+ G_\varepsilon(E(\varepsilon))\|}{\|P_+ E(\varepsilon)\|}\,\|P_{E_0} W\|^2. \qquad (3.13) $$
Proof. Under Assumption 3.1, the projection $P_+$ remains constant near $\varepsilon$. This means that we are effectively working on a reduced set $\widehat E = E \setminus E_0$ of edges. We set $G_+(\varepsilon) = P_+ G_\varepsilon(E(\varepsilon))$ and decompose
$$ \varepsilon E(\varepsilon) = \varepsilon P_+ E(\varepsilon) + R, \quad\text{where}\quad R = (I - P_+)\,\varepsilon E(\varepsilon) = -(I - P_+) W = -P_{E_0} W, $$
since $w_{ij} + \varepsilon e_{ij}(\varepsilon) = 0$ for all $(i,j) \in E_0$. In particular, $R$ is independent of $\varepsilon$. Differentiating $f(\varepsilon) = F_\varepsilon(E(\varepsilon))$ with respect to $\varepsilon$ we obtain
$$ f'(\varepsilon) = \big\langle G_\varepsilon(E(\varepsilon)),\, P_+ E(\varepsilon) + \varepsilon P_+ E'(\varepsilon)\big\rangle = \big\langle G_+(\varepsilon),\, P_+ E(\varepsilon) + \varepsilon P_+ E'(\varepsilon)\big\rangle. \qquad (3.14) $$
The conservation of $\|E(\varepsilon)\| = 1$ and of $\|R\|$ for all $\varepsilon$ implies
$$ \langle P_+ E(\varepsilon), P_+ E'(\varepsilon)\rangle = \tfrac12\frac{d}{d\varepsilon}\|P_+ E(\varepsilon)\|^2 = \tfrac12\frac{d}{d\varepsilon}\big(1 - \varepsilon^{-2}\|R\|^2\big) = \varepsilon^{-3}\|R\|^2. $$
Now we use the property of minimizers as stated by Theorem 3.5,
$$ \frac{G_+(\varepsilon)}{\|G_+(\varepsilon)\|} = \pm\,\frac{P_+ E(\varepsilon)}{\|P_+ E(\varepsilon)\|}, $$
which gives us
$$ \big\langle G_+(\varepsilon), P_+ E(\varepsilon)\big\rangle = \pm\,\|G_+(\varepsilon)\|\,\|P_+ E(\varepsilon)\| $$
and
$$ \big\langle G_+(\varepsilon), \varepsilon P_+ E'(\varepsilon)\big\rangle = \pm\,\varepsilon\,\frac{\|G_+(\varepsilon)\|}{\|P_+ E(\varepsilon)\|}\,\langle P_+ E(\varepsilon), P_+ E'(\varepsilon)\rangle = \pm\,\varepsilon^{-2}\,\frac{\|G_+(\varepsilon)\|}{\|P_+ E(\varepsilon)\|}\,\|R\|^2.
$$
From (3.14) we thus obtain the stated formula, since $f'(\varepsilon) \le 0$.

For $\varepsilon = \varepsilon_k < \varepsilon^\star$, we make use of the standard Newton iteration
$$ \varepsilon_{k+1} = \varepsilon_k - \frac{f(\varepsilon_k)}{f'(\varepsilon_k)}. \qquad (3.15) $$
In a practical algorithm it is useful to couple the Newton iteration (3.15) with a bisection technique. To do this we adopt a tolerance tol which allows us to distinguish whether $\varepsilon < \varepsilon^\star$, in which case we may use the derivative formula and perform the Newton step, or $\varepsilon > \varepsilon^\star$, so that we have to make use of bisection. The method is formulated in Algorithm 1.
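The outer iteration can be pictured on a scalar model problem. The following sketch (ours; `f` is a cheap stand-in for $f(\varepsilon) = F_\varepsilon(E(\varepsilon))$, whose inner-iteration evaluation is omitted) mimics the safeguarded Newton-bisection of Algorithm 1:

```python
import numpy as np

def newton_bisection(f, fprime, eps0, eps_lb, eps_ub, tol=1e-10, kmax=100):
    """Safeguarded Newton-bisection in the spirit of Algorithm 1:
    Newton steps while still left of eps_star (f above tol), bisection once f
    has dropped below tol or a Newton step leaves the current bracket."""
    eps = eps0
    for _ in range(kmax):
        if f(eps) < tol:                    # (numerically) at or beyond eps_star
            eps_ub = min(eps_ub, eps)
            eps_new = 0.5 * (eps_lb + eps_ub)
        else:                               # still left of eps_star: Newton step
            eps_lb = max(eps_lb, eps)
            eps_new = eps - f(eps) / fprime(eps)
            if not (eps_lb < eps_new < eps_ub):
                eps_new = 0.5 * (eps_lb + eps_ub)
        if eps_ub - eps_lb < tol:
            return eps_new
        eps = eps_new
    return eps

# model function vanishing quadratically at eps_star = 2 (a stand-in for f)
f = lambda e: max(2.0 - e, 0.0) ** 2
fp = lambda e: -2.0 * max(2.0 - e, 0.0)
eps_star = newton_bisection(f, fp, 0.5, 0.0, 4.0)
```

On this model, Newton from the left converges linearly (the root is a double zero), and the tolerance-based switch to bisection takes over near $\varepsilon^\star$, just as described above for $f(\varepsilon)$.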
4. The two-level method for the membership- and cardinality-constrained minimum cut problems.

4.1. Functional for the membership-constrained minimum cut problem.
Our approach to the membership problem is the same two-level procedure as for the unconstrained minimum cut problem, except that the functional (3.1) is replaced by the following functional: For $\varepsilon > 0$ and a matrix $E$ of unit Frobenius norm, let $x = (x_i) \in \mathbb{R}^n$ be the eigenvector to the second smallest eigenvalue $\lambda_2$ of $\mathrm{Lap}(W + \varepsilon E)$. Let $V_-$ and $V_+$ be the sets of indices whose membership to different components of the cut graph is prescribed. Let $x^- = (x_i^-)$ with $x_i^- = \min(x_i, 0)$ and $x^+ = (x_i^+)$ with $x_i^+ = \max(x_i, 0)$ collect the negative and positive components of $x$, respectively. Let $n_-$ and $n_+$ be the numbers of negative and nonnegative components of $x$, respectively. We denote the averages of $x^-$ and $x^+$ by
$$ \langle x^-\rangle = \frac{1}{n_-}\sum_{i=1}^n x_i^-, \qquad \langle x^+\rangle = \frac{1}{n_+}\sum_{i=1}^n x_i^+. $$

Algorithm 1:
Newton-bisection method for distance approximation
Data:
Matrix $W$, $k_{\max}$ (max number of iterations), tol (tolerance), $\varepsilon_0$, $\varepsilon_{lb}$ and $\varepsilon_{ub}$ (starting values for the lower and upper bounds for $\varepsilon^\star$)
Result: $\varepsilon^\star$ (upper bound for the distance), $E(\varepsilon^\star)$
begin
    Compute $E(\varepsilon_0)$ by the inner iteration
    Set $k = 0$
    while $k \le k_{\max}$ do
        if $f(\varepsilon_k) <$ tol then
            Set $\varepsilon_{ub} = \min(\varepsilon_{ub}, \varepsilon_k)$
            Set $\varepsilon_{k+1} = (\varepsilon_{lb} + \varepsilon_{ub})/2$ (bisection step)
        else
            Set $\varepsilon_{lb} = \max(\varepsilon_{lb}, \varepsilon_k)$
            Compute $f(\varepsilon_k)$ and $f'(\varepsilon_k)$
            Compute $\varepsilon_{k+1} = \varepsilon_k - f(\varepsilon_k)/f'(\varepsilon_k)$ (Newton step)
            if $\varepsilon_{k+1} \notin (\varepsilon_{lb}, \varepsilon_{ub})$ then
                Set $\varepsilon_{k+1} = (\varepsilon_{lb} + \varepsilon_{ub})/2$
        if $k = k_{\max}$ or $\varepsilon_{ub} - \varepsilon_{lb} <$ tol then
            Return $\varepsilon_{k+1}$ and the interval $[\varepsilon_{lb}, \varepsilon_{ub}]$
            Stop
        else
            Set $k = k + 1$
            Compute $E(\varepsilon_k)$ by the inner iteration
    Return $\varepsilon^\star = \varepsilon_k$

Motivated by the special form of the eigenvectors as given in the Fiedler theorem (Theorem 2.1), we consider the functional
$$ F_\varepsilon(E) = \lambda_2(\mathrm{Lap}(W + \varepsilon E)) + \frac\alpha2 \sum_{i\in V_-} (x_i - \langle x^-\rangle)^2 + \frac\alpha2 \sum_{i\in V_+} (x_i - \langle x^+\rangle)^2, \qquad (4.1) $$
where $\alpha > 0$ is a penalization parameter, and the sign of $x$ is such that $F_\varepsilon(E)$ takes the smaller of the two possible values. This functional is to be minimized under the inequality constraints $W + \varepsilon E \ge$
$0$, the norm constraint $\|E\| = 1$, and the symmetry and the sparsity pattern of $E$.

4.2. Functional for the cardinality-constrained minimum cut problem. For the cardinality-constrained problem we use the same functional $F_\varepsilon$, except that the sets $V_-$ and $V_+$ are not given a priori, but are chosen depending on $E$ in the following way: $V_-$ and $V_+$ collect the indices of the smallest and largest $n_0$ components of the eigenvector $x$, respectively, augmented by those indices for which the components of $x$ do not differ by more than a threshold $\delta$ from the average of the smallest and largest $n_0$ components, respectively.

4.3. Constrained gradient flow for $F_\varepsilon$.

4.3.1. Eigenvector derivatives. We use the following lemma.
Lemma 4.1. [15, Corollary 4]
Consider the differentiable $n \times n$ symmetric-matrix valued function $C(t)$ for $t$ in a neighbourhood of $0$, let $\lambda(t)$ be a simple eigenvalue of $C(t)$ and let $x(t)$ be the associated eigenvector normalized such that $\|x(t)\| = 1$. Moreover, let $M(t) = C(t) - \lambda(t) I$ and let $M(t)^\dagger$ be the Moore-Penrose pseudoinverse of $M(t)$. Then, the derivative of the eigenvector is given by
$$ \dot x(t) = -M(t)^\dagger \dot M(t)\, x(t). \qquad (4.2) $$
We remark that in [15] this is formulated with the group inverse, which in the symmetric case is the same as the Moore-Penrose pseudoinverse.

4.3.2. The gradient of $F_\varepsilon$. Consider a differentiable path $E(t)$ of $\varepsilon$-feasible matrices, and denote the corresponding Laplacian matrix by $L(t) = \mathrm{Lap}(W + \varepsilon E(t))$, by $\lambda_2(t)$ the second smallest eigenvalue of $L(t)$, and by $x(t)$ the associated eigenvector. We set
$$ \mathbf{1}^- = (\mathbf{1}^-_i) \in \mathbb{R}^n \ \text{ with } \ \mathbf{1}^-_i = \begin{cases} 1 & \text{if } x_i < 0, \\ 0 & \text{else,} \end{cases} \qquad \mathbf{1}^+ = (\mathbf{1}^+_i) \in \mathbb{R}^n \ \text{ with } \ \mathbf{1}^+_i = \begin{cases} 1 & \text{if } x_i \ge 0, \\ 0 & \text{else,} \end{cases} $$
and, with $e_i$ denoting the $i$th standard unit vector,
$$ v = v^+ + v^- \quad\text{with}\quad v^\pm = -\sum_{i\in V_\pm} (x_i - \langle x^\pm\rangle)\Big(e_i - \frac{1}{n_\pm}\,\mathbf{1}^\pm\Big). $$
We define
$$ z = (L - \lambda_2 I)^\dagger v, $$
which is computed as the solution of the linear system
$$ \begin{pmatrix} L - \lambda_2 I & x \\ x^T & 0 \end{pmatrix} \begin{pmatrix} z \\ \mu \end{pmatrix} = \begin{pmatrix} v \\ 0 \end{pmatrix}. \qquad (4.3) $$
We denote by $x \bullet y = (x_i y_i)$ the vector obtained by componentwise multiplication of the entries of $x$ and $y$. We then have the following result.

Lemma 4.2.
In the above situation we have
$$ \frac{d}{dt} F_\varepsilon(E) = \varepsilon\,\langle G_\varepsilon(E), \dot E\rangle, \qquad (4.4) $$
where
$$ G_\varepsilon(E) = P_E\Big(\mathrm{Sym}\big((x\bullet(x + \alpha z))\,\mathbf{1}^T - x(x + \alpha z)^T\big)\Big) \qquad (4.5) $$
is symmetric and has the sparsity pattern determined by the set of edges $E$.
Proof. We have
$$ \frac{d}{dt}\,\frac12\sum_{i\in V_-} (x_i - \langle x^-\rangle)^2 = \sum_{i\in V_-} (x_i - \langle x^-\rangle)\Big(\dot x_i - \frac{d}{dt}\langle x^-\rangle\Big) $$
and similarly for the sum over $V_+$. With $K = (L - \lambda_2 I)^\dagger$ we obtain from Lemma 4.1 that
$$ \dot x_i = -e_i^T K \dot L x, \qquad \frac{d}{dt}\langle x^\pm\rangle = -\frac{1}{n_\pm}\,\mathbf{1}^{\pm\,T} K \dot L x, $$
and hence
$$ \frac{d}{dt}\Big(\frac12\sum_{i\in V_-} (x_i - \langle x^-\rangle)^2 + \frac12\sum_{i\in V_+} (x_i - \langle x^+\rangle)^2\Big) = v^T K \dot L x = \langle K v x^T, \dot L\rangle = \langle z x^T, \dot L\rangle. $$
Using the expression for $\dot L$ given in (3.4) and proceeding as in the proof of Lemma 3.2 gives the result.

The computational cost of computing $G_\varepsilon(E)$ lies in computing the second eigenvalue and its eigenvector and in solving the linear system (4.3). For a sparse weight matrix, these computations have a complexity that is linear in the number of vertices. With this gradient $G_\varepsilon(E)$, the further procedure is now exactly the same as in Section 3.
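The system (4.3) is a bordered symmetric linear system. A minimal NumPy sketch (ours, assuming $\lambda_2$ is simple with unit eigenvector $x$; for large sparse graphs one would use a sparse solver instead of the dense factorization shown here) solves it and recovers $z = (L - \lambda_2 I)^\dagger v$:

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def solve_bordered(L, lam2, x, v):
    """Solve the bordered system (4.3): returns z = (L - lam2*I)^+ v,
    assuming lam2 is a simple eigenvalue of L with unit eigenvector x."""
    n = len(x)
    M = np.block([[L - lam2 * np.eye(n), x[:, None]],
                  [x[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(M, np.concatenate([v, [0.0]]))
    return sol[:n]                  # sol[n] is the Lagrange multiplier mu

# two triangles joined by a weak edge (illustrative test graph)
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
L = laplacian(W)
lam, X = np.linalg.eigh(L)
z = solve_bordered(L, lam[1], X[:, 1], np.arange(6.0))
# z is orthogonal to the eigenvector and agrees with the pseudoinverse solution
```

The border row enforces $x^T z = 0$, and the border column absorbs the component of $v$ along $x$, which is exactly the action of the pseudoinverse.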
5. The two-level method for the ambiguity problem.

5.1. Two-level formulation.
For the ambiguity problem we proceed similarly as in Section 3.1.
1. Given $\varepsilon > 0$, we look for a symmetric matrix $E \in \mathbb{R}^{n\times n}$ with the same sparsity pattern as $W$ (i.e., $e_{ij} = 0$ if $w_{ij} = 0$), of unit Frobenius norm, with $W + \varepsilon E \ge 0$, such that the difference of the third and second smallest eigenvalues of $\mathrm{Lap}(W + \varepsilon E)$ is minimized. The obtained minimizer is denoted by $E(\varepsilon)$.
2. We look for the smallest value of $\varepsilon$ such that the second and third eigenvalues of $\mathrm{Lap}(W + \varepsilon E(\varepsilon))$ coalesce.
In order to compute $E(\varepsilon)$ for a given $\varepsilon > 0$, we make use of a constrained gradient system for the functional
$$ F_\varepsilon(E) = \lambda_3\big(\mathrm{Lap}(W + \varepsilon E)\big) - \lambda_2\big(\mathrm{Lap}(W + \varepsilon E)\big), \qquad (5.1) $$
under the inequality constraints $W + \varepsilon E \ge 0$, the norm constraint $\|E\| = 1$ and the symmetry and the sparsity pattern of $E$.

In the outer iteration we compute the optimal $\varepsilon$, denoted $\varepsilon^\star$, by a combined Newton-bisection method as in Section 3.

5.2. Constrained gradient flow for $F_\varepsilon$. Consider a regular path $E(t)$ of $\varepsilon$-feasible matrices, and denote the corresponding Laplacian matrix by $L(t) = \mathrm{Lap}(W + \varepsilon E(t))$ and by $\lambda_2(t)$ and $\lambda_3(t)$ the second and third smallest eigenvalues of $L(t)$, respectively. We denote by $x(t)$ and $y(t)$ the corresponding eigenvectors of unit Euclidean norm.

Lemma 5.1.
In the above situation we have
$$ \frac{d}{dt} F_\varepsilon(E) = \varepsilon\,\langle G_\varepsilon(E), \dot E\rangle, \quad\text{where}\quad G_\varepsilon(E) = -P_E\big(\mathrm{Sym}(x^2\mathbf{1}^T) - xx^T - \mathrm{Sym}(y^2\mathbf{1}^T) + yy^T\big) $$
is symmetric and has the sparsity pattern determined by the set of edges $E$.

With this gradient we then proceed further as in Section 3.
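As in Section 3, the gradient of the spectral gap can be validated by finite differences. The sketch below (ours, for illustration, assuming $\lambda_2$ and $\lambda_3$ are both simple eigenvalues of the perturbed Laplacian) does so on a random test graph:

```python
import numpy as np

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

def gap_gradient(W, eps, E, pattern):
    """Free gradient of F_eps(E) = lambda3 - lambda2 from Lemma 5.1:
    G = -P_E(Sym(x^2 1^T) - x x^T - Sym(y^2 1^T) + y y^T)."""
    lam, X = np.linalg.eigh(laplacian(W + eps * E))
    x, y = X[:, 1], X[:, 2]
    one = np.ones(len(x))
    A = (np.outer(x**2, one) - np.outer(x, x)
         - np.outer(y**2, one) + np.outer(y, y))
    return -pattern * 0.5 * (A + A.T)

def gap(M):
    lam = np.linalg.eigh(laplacian(M))[0]
    return lam[2] - lam[1]

# finite-difference check on a random symmetric test graph (illustrative)
rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
pattern = (W > 0).astype(float)
E = rng.standard_normal((n, n)); E = pattern * (E + E.T) / 2
E /= np.linalg.norm(E)
Edot = rng.standard_normal((n, n)); Edot = pattern * (Edot + Edot.T) / 2

eps, h = 0.1, 1e-6
G = gap_gradient(W, eps, E, pattern)
fd = (gap(W + eps * (E + h * Edot)) - gap(W + eps * (E - h * Edot))) / (2 * h)
# fd agrees with eps*<G, Edot> to finite-difference accuracy
```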
6. Algorithmic aspects.

6.1. Discretizing the constrained gradient flow.
We use a modified explicit Euler method for the approximate integration of the differential equation (3.10). For a given $\varepsilon > 0$, a step-size $h > 0$ and the $\varepsilon$-feasible perturbation matrix $E^n$ of the $n$th time step, we compute $E^{n+1}$ as follows. We compute $G_\varepsilon(E^n) = (g^n_{ij})$ and define $\widetilde E^{n+1} = (\tilde e^{n+1}_{ij})$ by setting
$$ \tilde e^{n+1}_{ij} = e^n_{ij} - h g^n_{ij} \quad\text{if}\quad w_{ij} + \varepsilon\big(e^n_{ij} - h g^n_{ij}\big) \ge 0, $$
and else
$$ \tilde e^{n+1}_{ij} = e^n_{ij} - \theta h g^n_{ij} \quad\text{with } \theta \in [0,1) \text{ such that } w_{ij} + \varepsilon\big(e^n_{ij} - \theta h g^n_{ij}\big) = 0, $$
that is, with $\theta = (w_{ij} + \varepsilon e^n_{ij})/(\varepsilon h g^n_{ij})$. We would ideally take the new perturbation matrix $E^{n+1}$ such that
$$ \|E^{n+1} - \widetilde E^{n+1}\| \to \min \quad\text{subject to}\quad \|E^{n+1}\| = 1 \ \text{ and } \ W + \varepsilon E^{n+1} \ge 0. $$
We approximate this optimization problem by treating the two constraints one after the other in an alternating way. Let $E_0$ be the set of edges $(i,j) \in E$ for which $w_{ij} + \varepsilon \tilde e^{n+1}_{ij} = 0$ (cut edges), and let $P_0 = P_{E_0}$ and $P_+ = P_{E\setminus E_0}$ be the complementary projections as defined in Section 3.2.2. We first normalize by choosing $\rho > 0$ such that
$$ \widehat E^{n+1} = P_0 \widetilde E^{n+1} + \rho\, P_+ \widetilde E^{n+1} $$
has unit Frobenius norm, i.e.,
$$ \rho = \frac{\sqrt{1 - \|P_0 \widetilde E^{n+1}\|^2}}{\|P_+ \widetilde E^{n+1}\|}. $$
(In case $\|P_0 \widetilde E^{n+1}\|$ is larger than 1 or very close to 1, we replace $P_0 \widetilde E^{n+1}$ by $P_0 E^n$ in the two lines above.) We denote by $E_-$ the set of inadmissible edges $(i,j)$ for which $w_{ij} + \varepsilon \hat e^{n+1}_{ij} < 0$. We reset the entries $\hat e^{n+1}_{ij}$ to
$$ e^{n+1}_{ij} = -\frac{w_{ij}}{\varepsilon} \quad\text{for } (i,j) \in E_-, $$
augment $E_0 := E_0 \cup E_-$ and consider the updated projection $P_0 = P_{E_0}$. We then normalize the so obtained matrix $E^{n+1}$ in the same way as above by leaving the entries of $P_0 E^{n+1}$ unchanged, reset the entries for inadmissible edges, normalize, and so on. As there are only finitely many edges, this iteration terminates after finitely many steps (typically after the first step). Finally, we have obtained an $\varepsilon$-feasible perturbation matrix $E^{n+1}$.

6.2. Step-size selection. The step-size $h$ can, for example, be selected by the following adaptive algorithm. Here the objective is to reduce the function $F_\varepsilon$, not to follow accurately a trajectory of the constrained gradient differential equation. Let $F^n = F_\varepsilon(E^n)$.
In order to stop the integration when $F^n$ has approximately reached a stationary value, we use a criterion of the following type: integrate until
$$
F^n - F^{n+1} \le \beta h F^n + \delta \qquad\text{or}\qquad F^n \le \mathrm{tol},
$$
where tol is a given small tolerance parameter and $\beta$ and $\delta$ are further parameters. We had good experience with the choice $\beta = 10 \cdot \mathrm{tol}$ and $\delta = \mathrm{tol}/\varepsilon$.

When we change to a new value of $\varepsilon$ in the outer iteration, we need an initial value for the constrained gradient flow. A first idea might be to take the terminal perturbation matrix $\widetilde E = E(\varepsilon_{\rm old})$ as the initial value, but usually this does not satisfy the nonnegativity constraint $W + \varepsilon E \ge 0$ for $\varepsilon \ge \varepsilon_{\rm old}$. We therefore modify $\widetilde E$ to $E$ by solving approximately
$$
\|E - \widetilde E\| \to \min \qquad\text{subject to}\quad \|E\| = 1 \ \text{ and } \ W + \varepsilon E \ge 0,
$$
alternating between normalization and enforcing the nonnegativity constraints as in Section 6.1, but this time beginning with the empty set $\mathcal{E}^0 = \emptyset$.
Algorithm 2: Step-size selection
Data: matrix $E^n$ and step-size $h_{n-1}$
Result: matrix $E^{n+1}$ and step-size $h_n$
begin
    Initialize the step-size by the previous step-size, $h = h_{n-1}$.
    Compute $E^{n+1}(h)$ and its function value $F_\varepsilon(E^{n+1}(h))$ with the step-size $h$.
    if $F_\varepsilon(E^{n+1}(h)) \ge F_\varepsilon(E^n)$ then
        halve the step-size, $h := h/2$, and repeat
    else if $h = h_{n-1}$ then
        compute $E^{n+1}(2h)$ and its function value $F_\varepsilon(E^{n+1}(2h))$ with the step-size $2h$;
        if $F_\varepsilon(E^{n+1}(2h)) \le F_\varepsilon(E^{n+1}(h))$ then
            double the step-size, $h := 2h$
    Set $h_n = h$ and $E^{n+1} = E^{n+1}(h_n)$.
Return $E^{n+1}$ and $h_n$.

Choice of the initial $\varepsilon$ and the initial perturbation matrix. While one might just start with a random perturbation, a more educated guess starts from the normalized free gradient $E = -G_\varepsilon(0)/\|G_\varepsilon(0)\|$ and determines the initial $\varepsilon$ as the largest number $\varepsilon$ such that $W + \varepsilon E \ge 0$.

In the exact solution $W^\star = W + \varepsilon^\star E^\star$ of the constrained minimum cut problem, the entries of $W^\star$ are either zero or those of $W$. To decide about the cut, it is therefore not necessary to iterate towards $W^\star$ with very high accuracy. Instead, the cut can be inferred earlier from a moderately accurate approximation $W + \varepsilon E(\varepsilon)$, for example using the following criterion with a small threshold parameter $\vartheta > 0$: for all $(i,j) \in \mathcal{E}$, either $w_{ij} + \varepsilon e_{ij} \le \vartheta w_{ij}$ or $|\varepsilon e_{ij}| \le \vartheta w_{ij}$. In the first case one would then cut, setting $w^\star_{ij} = 0$, and in the second case one would leave the weight unchanged, $w^\star_{ij} = w_{ij}$. It can finally be checked whether the cut graph so obtained is indeed disconnected, by computing $\lambda_2(\mathrm{Lap}(W^\star))$. In the ambiguity problem, by contrast, such a shortcut is not feasible.
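The rounding criterion and the final disconnection check can be sketched as follows. This is a hedged illustration: the function names and the threshold value `theta = 0.05` are ours, not taken from the paper.

```python
import numpy as np

def lambda2(W):
    """Second smallest eigenvalue of the graph Laplacian Lap(W)."""
    L = np.diag(W.sum(axis=1)) - W
    return np.sort(np.linalg.eigvalsh(L))[1]

def round_cut(W, E, eps, theta=0.05):
    """Infer the cut from a moderately accurate W + eps*E: weights with
    w_ij + eps*e_ij <= theta*w_ij are cut to zero; all other weights are
    restored to w_ij (the criterion presumes that each remaining edge
    satisfies |eps*e_ij| <= theta*w_ij)."""
    mask = W > 0
    Wp = W + eps * E
    return np.where(mask & (Wp <= theta * W), 0.0, W)
```

After rounding, `lambda2(round_cut(...))` being (numerically) zero confirms that the cut graph is indeed disconnected.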
7. Numerical examples.
We consider a few illustrative examples for both the cardinality and the membership constraints. At the end we also consider graphs that illustrate the ambiguity problem.

The standard Fiedler spectral partitioning algorithm, to which we refer below, is simply based on the signs of the components of the eigenvector of $\mathrm{Lap}(W)$ associated with $\lambda_2$. Example 1 (
Zachary’s karate club). This weighted graph of 34 vertices describes the relationships between the 34 members of a karate club (for a detailed description see [19]). Using Fiedler spectral partitioning we obtain two connected components of 16 and 18 vertices, as shown in Figure 7.1a. According to the standard Fiedler partitioning, the first component is led by the vertex labeled 1, while the second one is led by the vertex labeled 34.

We next consider the following constraints:
(i) a cardinality constraint with threshold $\bar n = 17$ vertices;
(ii) a membership constraint;
(iii) both constraints.

In more detail:
(i) Asking for a cardinality constraint with $\bar n = 17$ vertices in each component, with the weight $\alpha = 3$ in (4.1), we obtain the approximately computed distance $\varepsilon^\star$ shown in Table 7.1. With respect to the standard partition obtained from the Fiedler eigenvector of $\mathrm{Lap}(W)$, the vertex that changes partition is the vertex labeled 9.

Table 7.1: Computed values of $\varepsilon_k$, $f(\varepsilon_k) = F_{\varepsilon_k}(E(\varepsilon_k))$ for Example 1, (i)

k   ε_k                  f(ε_k)
0   1.355198757424337    5.000000000000000
1   1.142857142857148    10.036313891705188
2   0.000001319577846    10.036313891705202

(ii) In this second case we consider the membership constraint: we require vertices 1 and 34 to be in different partitions. Moreover, we consider four different cases: vertex 9 in the same connected component as vertex 1; vertex 32 in the same component as vertex 1; vertex 14 in the same component as vertex 34; or vertex 20 in the same component as vertex 34. For these four cases, Table 7.2 reports the values of $\varepsilon^\star$ for the functional $F_\varepsilon$ of (4.1) with $\alpha = 3$, computed with a small tolerance tol. We can see from Table 7.2 that vertex 9 is the easiest to move to the other connected component; on the other hand, vertex 20 turns out to be the most difficult.
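For reference, the standard Fiedler spectral partitioning used as the baseline in this example can be sketched in a few lines. This is a hedged illustration: `fiedler_partition` is our name, `W` is any symmetric nonnegative weight matrix, and zero components of the Fiedler vector are assigned to the first part.

```python
import numpy as np

def fiedler_partition(W):
    """Partition the vertices of the graph with weight matrix W by the
    signs of the components of the eigenvector of Lap(W) associated
    with the second smallest eigenvalue lambda_2 (the Fiedler vector)."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    v = vecs[:, 1]                # Fiedler vector
    return np.flatnonzero(v >= 0), np.flatnonzero(v < 0)
```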
(a) Zachary’s karate club colored by standard Fiedler partitioning. (b) Zachary’s karate club, membership constraint: the vertices required to change partition with respect to the clustering shown in (a) are highlighted.

Fig. 7.1: Example 1: Zachary’s karate club

Table 7.2: Computed values of $\varepsilon^\star$ for Example 1, (ii)

(iii) Finally we consider both constraints: we ask for a cardinality constraint with threshold $\bar n = 17$ vertices and require that vertex 9 is in the same connected component as vertex 34. Table 7.3 shows the values of $\varepsilon_k$, $f(\varepsilon_k) = F_{\varepsilon_k}(E(\varepsilon_k))$ computed with the weights $\alpha_c = 3$, $\alpha_m = 10$ in the functional $F_\varepsilon$ that combines the cardinality and membership functionals. We find that vertex 10 additionally changes its connected component in order to satisfy the cardinality constraint.

Table 7.3: Computed values of ε_k, f(ε_k) for Example 1, (iii)

k   ε_k                  f(ε_k)
0   1.401034325554263    5.000000000000000
1   1.279411764675930    10.206652233031356
2   0.000000000000002    10.206652233031399

Example 2 (
A misbehavior of the algorithm). We present an example where the algorithm fails. Consider an unweighted graph with $N$ vertices such that each vertex $i = 2, \ldots, N-1$ is connected to vertex $i+1$ and each vertex $i = 2, \ldots, N-2$ is also connected to vertex $i+2$, i.e., $w_{i,i+1} = w_{i,i+2} = 1$. The first vertex is connected to the second one, i.e., $w_{1,2} = 1$, but not to the third one.

Fig. 7.2: Graph with 8 vertices.

It is clear that the minimum cut is obtained by removing the edge $(1,2)$. The algorithm works correctly when $N = 8$ but fails for larger $N$: for $N = 20$ the resulting partition is $\{1, 2, \ldots, 10\} \cup \{11, 12, \ldots, 20\}$ instead of the correct partition $\{1\} \cup \{2, 3, \ldots, 20\}$. If we impose a cardinality constraint with $\bar n = 10$ we get the same solution. Example 3 (
Books about US Politics). This graph is a network of 105 vertices, each one representing a book about US politics sold in 2004 by an online bookseller [14]. Two books are linked if they were purchased by the same person, and the books are colored red, blue or green based on book-buying data (see Figure 7.3b); these data determine the grouping and coloring of the vertices. The vertices in Figure 7.3a are instead colored by Fiedler spectral partitioning; by this partitioning we obtain two connected components of 52 and 53 vertices.
(a) Books about US Politics colored by the Fiedler eigenvector. (b) Books about US Politics colored red, blue or green, based on book-buying data.
Fig. 7.3: Example 3: Books about US Politics

We ask for the membership constraints: for each vertex in the green group of Figure 7.3b we compute the distance (see Table 7.4) for it to belong to the blue group ($\varepsilon^\star_b$) or to the red group ($\varepsilon^\star_r$). The results reported in the table are obtained with the weight $\alpha = 1$.

Table 7.4: Computed values of $\varepsilon^\star_b$ and $\varepsilon^\star_r$ for each vertex, for Example 3

Example 4 (
Les Misérables). Figure 7.4 shows the graph of character co-occurrence in Les Misérables [13]. This graph consists of 77 vertices (representing characters). According to the Fiedler partitioning, 22 of these belong to one part and the remaining 55 to the other. Asking for a partitioning of the graph with the cardinality constraint with threshold $\bar n = 35$, we obtain the result shown in Figure 7.4. Example 5 (
Planted Partition Model — ambiguity problem). We consider here a class of graphs for which we investigate the distance to ambiguity, that is, to a coalescence of the second and third eigenvalues of the associated Laplacian matrix. We consider Planted Partition Models [4], which constitute a special case of stochastic block models, a commonly used generative model for social and biological networks. The probability matrix consists of a constant value $p_{\mathrm{in}}$ on the diagonal and a different constant value $p_{\mathrm{out}}$ off the diagonal; the number of vertices is $n = 40$, and there are 4 communities of 10 vertices each. Figure 7.5 shows the sparsity patterns of two Planted Partition Models, on the left with $p_{\mathrm{out}} = 0.2$ and on the right with $p_{\mathrm{out}} = 0.1$. Table 7.5 shows a comparison among various values of $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$; these results are obtained with a small tolerance tol.

Fig. 7.5: Example 5: Planted Partition Model (sparsity patterns for the two parameter choices)

The results show that relatively small perturbations may yield an ambiguity in the Fiedler partitioning for these graphs.
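A planted partition weight matrix of the kind used in this example can be sampled as follows. This is a hedged sketch: the function name is ours, and the parameter values `p_in = 0.7`, `p_out = 0.1` are illustrative defaults, not the values of Table 7.5.

```python
import numpy as np

def planted_partition(n_comm=4, size=10, p_in=0.7, p_out=0.1, seed=0):
    """Sample the adjacency matrix of a planted partition model:
    n_comm communities of `size` vertices each; an edge appears with
    probability p_in within a community and p_out across communities."""
    rng = np.random.default_rng(seed)
    n = n_comm * size
    labels = np.repeat(np.arange(n_comm), size)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = np.triu((rng.random((n, n)) < P).astype(float), 1)  # sample upper triangle
    return A + A.T, labels                                  # symmetric, zero diagonal
```

The second and third eigenvalues of the Laplacian of such a matrix can then be perturbed towards coalescence exactly as in the ambiguity experiments above.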
Table 7.5: Example 5: Computed values of $\varepsilon^\star$ for various values of $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$

Acknowledgments.
The authors thank Daniel Kressner (EPFL, Lausanne) for interesting discussions during an Oberwolfach meeting and Armando Bazzani (University of Bologna, Italy) for stimulating discussions. Part of this work was developed during visits to the Gran Sasso Science Institute in L'Aquila and to the University of Tübingen. The authors thank both institutions for their very kind hospitality. N. Guglielmi thanks the Italian M.I.U.R. and the INdAM GNCS for financial support, and also the Center of Excellence DEWS.
REFERENCES

[1] M. Bruglieri, M. Ehrgott, H. W. Hamacher, and F. Maffioli. An annotated bibliography of combinatorial optimization problems with fixed cardinality constraints. Discrete Appl. Math., 154(9):1344–1357, 2006.
[2] M. Bruglieri, F. Maffioli, and M. Ehrgott. Cardinality constrained minimum cut problems: complexity and algorithms. Discrete Appl. Math., 137(3):311–341, 2004.
[3] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[4] A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms, 18(2):116–140, 2001.
[5] D. Edelmann. Graph partitioning using differential equations. Master thesis, Univ. Tübingen, 2017.
[6] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Math. J., 23(98):298–305, 1973.
[7] R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, 2013.
[8] N. Guglielmi, D. Kressner, and C. Lubich. Low rank differential equations for Hamiltonian matrix nearness problems. Numer. Math., 129:279–319, 2015.
[9] N. Guglielmi and C. Lubich. Differential equations for roaming pseudospectra: paths to extremal points and boundary tracking. SIAM J. Numer. Anal., 49:1194–1209, 2011.
[10] N. Guglielmi and C. Lubich. Matrix stabilization using differential equations. SIAM J. Numer. Anal., 2018. In press.
[11] N. Guglielmi, C. Lubich, and V. Mehrmann. On the nearest singular matrix pencil. SIAM J. Matrix Anal. Appl., 38:776–806, 2017.
[12] T. Kato. Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1995.
[13] D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley Professional, 1st edition, 2009.
[14] V. Krebs. Books about US politics. 2004.
[15] C. D. Meyer and G. W. Stewart. Derivatives and perturbations of eigenvectors. SIAM J. Numer. Anal., 25:679–691, 1988.
[16] D. Spielman. Spectral graph theory and its applications. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 29–38, 2007.
[17] M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44(4):585–591, 1997.
[18] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[19] W. W. Zachary. An information flow model for conflict and fission in small groups. J. Anthropol. Res., 33(4):452–473, 1977.