Nonlinear Eigenproblems in Data Analysis - Balanced Graph Cuts and the RatioDCA-Prox
Leonardo Jost, Simon Setzer and Matthias Hein
Abstract
It has been recently shown that a large class of balanced graph cuts allows for an exact relaxation into a nonlinear eigenproblem. We briefly review some of these results and propose a family of algorithms to compute nonlinear eigenvectors which encompasses previous work as special cases. We provide a detailed analysis of the properties and the convergence behavior of these algorithms and then discuss their application in the area of balanced graph cuts.
Key words:
Clustering, Graphs, Hypergraphs, Balanced graph cuts, Differences of convex functions, Ratios of convex functions, Nonlinear eigenproblem, Nonconvex optimization, Lovasz extension
Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany. E-mail: [email protected], [email protected], [email protected]

Spectral clustering is one of the standard methods for graph-based clustering [1]. It is based on the spectral relaxation of the so-called normalized cut, which is one of the most popular criteria for balanced graph cuts. While the spectral relaxation is known to be loose [2], tighter relaxations based on the graph $p$-Laplacian have been proposed in [3]. Exact relaxations for the Cheeger cut based on the nonlinear eigenproblem of the graph 1-Laplacian have been proposed in [4, 5]. In [6] the general balanced graph cut problem of an undirected, weighted graph $(V,E)$ is considered. Let $n = |V|$ and denote the weight matrix of the graph by $W = (w_{ij})_{i,j=1}^n$; then the general balanced graph cut criterion can be written as
$$\arg\min_{A \subset V} \frac{\operatorname{cut}(A, \overline{A})}{\hat S(A)},$$
where $\overline{A} = V \setminus A$, $\operatorname{cut}(A, \overline{A}) = \sum_{i \in A,\, j \in \overline{A}} w_{ij}$, and $\hat S : 2^V \to \mathbb{R}_+$ is a symmetric and non-negative balancing function. Exact relaxations of such balanced graph cuts and relations to corresponding nonlinear eigenproblems are discussed in [6] and are briefly reviewed in Section 2. A further generalization to hypergraphs has been established in [7].

There exist different approaches to minimize the exact continuous relaxations. However, in all cases the problem boils down to the minimization of a ratio of a convex and a difference of convex functions. The two lines of work of [8, 9] and [5, 6] have developed different algorithms for this problem, which have been compared in [8]. We show that both types of algorithms are special cases of our new algorithm RatioDCA-prox introduced in Section 3.1. We provide a unified analysis of the properties and the convergence behavior of RatioDCA-prox.
Moreover, in Section 4 we prove stronger convergence results when the RatioDCA-prox is applied to the balanced graph cut problem or, more generally, to problems where one minimizes non-negative ratios of Lovasz extensions of set functions. Further, we discuss the choice of the relaxation of the balancing function in [6] and show that from a theoretical perspective the Lovasz extension is optimal, which is supported by the numerical results in Section 5.

A key element for the exact continuous relaxation of balanced graph cuts is the Lovasz extension of a function on the power set $2^V$ to $\mathbb{R}^V$.

Definition 1. Let $\hat S : 2^V \to \mathbb{R}$ be a set function with $\hat S(\emptyset) = 0$. Let $f \in \mathbb{R}^V$, let $V$ be ordered such that $f_1 \le f_2 \le \dots \le f_n$, and define $C_i = \{ j \in V \mid j > i \}$. Then the Lovasz extension $S : \mathbb{R}^V \to \mathbb{R}$ of $\hat S$ is given by
$$S(f) = \sum_{i=1}^{n} f_i \big( \hat S(C_{i-1}) - \hat S(C_i) \big) = \sum_{i=1}^{n-1} \hat S(C_i)\,(f_{i+1} - f_i) + f_1\,\hat S(V).$$

Note that for the characteristic function $\mathbf{1}_C$ of a set $C \subset V$ we have $S(\mathbf{1}_C) = \hat S(C)$. The Lovasz extension is convex if and only if $\hat S$ is submodular [10], and every Lovasz extension can be written as a difference of convex functions [6]. Moreover, the Lovasz extension of a symmetric set function is positively one-homogeneous and preserves non-negativity, that is, $S(f) \ge 0$ for all $f \in \mathbb{R}^V$ if $\hat S(A) \ge 0$ for all $A \subset V$. It is well known, see e.g. [7], that the Lovasz extension of the submodular cut function, $\hat R(A) = \operatorname{cut}(A, \overline{A})$, yields the total variation on a graph,
$$R(f) = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |f_i - f_j|. \qquad (1)$$

A function $A : \mathbb{R}^n \to \mathbb{R}$ is (positively) $p$-homogeneous if $A(\nu x) = \nu^p A(x)$ for all $\nu \ge 0$.

Theorem 1 shows exact continuous relaxations of balanced graph cuts [6]. A more general version for the class of constrained fractional set programs is given in [11].
Theorem 1.
Let $G = (V,E)$ be an undirected, weighted graph, let $S : \mathbb{R}^V \to \mathbb{R}$, and let $\hat S : 2^V \to \mathbb{R}$ be symmetric with $\hat S(\emptyset) = 0$. Then
$$\min_{f \in \mathbb{R}^V} \frac{\frac{1}{2}\sum_{i,j=1}^{n} w_{ij}|f_i - f_j|}{S(f)} = \min_{A \subset V} \frac{\operatorname{cut}(A, \overline{A})}{\hat S(A)}$$
if either one of the following two conditions holds:
1. $S$ is one-homogeneous, even, convex, $S(f + \alpha \mathbf{1}) = S(f)$ for all $f \in \mathbb{R}^V$, $\alpha \in \mathbb{R}$, and $\hat S$ is defined as $\hat S(A) := S(\mathbf{1}_A)$ for all $A \subset V$.
2. $S$ is the Lovasz extension of the non-negative, symmetric set function $\hat S$ with $\hat S(\emptyset) = 0$.

Let $f \in \mathbb{R}^V$ and denote $C_t := \{ i \in V \mid f_i > t \}$; then it holds under both conditions that
$$\min_{t \in \mathbb{R}} \frac{\operatorname{cut}(C_t, \overline{C_t})}{\hat S(C_t)} \le \frac{\frac{1}{2}\sum_{i,j=1}^{n} w_{ij}|f_i - f_j|}{S(f)}.$$

We observe that the exact continuous relaxation corresponds to the minimization of a ratio of non-negative, one-homogeneous functions, where the numerator is convex and the denominator can be written as a difference of convex functions.
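Definition 1 and the thresholding bound of Theorem 1 are easy to check numerically. The following sketch is our own illustration, not code from the paper; the small graph and all helper names are made up. It computes the Lovasz extension directly from the definition and verifies, for the ratio cut balancing function $\hat S(C) = |C||\overline{C}|$, that some level set of a real-valued $f$ achieves a balanced cut at least as good as the continuous ratio:

```python
import numpy as np

# Small symmetric weight matrix (illustrative values only).
W = np.array([[0. , 1. , 1. , 0.1],
              [1. , 0. , 1. , 0.1],
              [1. , 1. , 0. , 0.1],
              [0.1, 0.1, 0.1, 0. ]])
n = W.shape[0]

def cut(A):                       # cut(A, A-bar) = sum of weights leaving A
    A = np.asarray(A, bool)
    return W[A][:, ~A].sum()

def hatS(A):                      # ratio cut balancing function |A| * |A-bar|
    A = np.asarray(A, bool)
    return A.sum() * (~A).sum()

def lovasz(hat, f):               # Lovasz extension of a set function (Definition 1)
    idx = np.argsort(f)           # order V so that f_1 <= ... <= f_n
    fs = f[idx]
    val = fs[0] * hat(np.ones(n, bool))        # f_1 * hatS(V) term
    for i in range(n - 1):
        Ci = np.zeros(n, bool)
        Ci[idx[i + 1:]] = True                 # C_i: the n-i-1 largest entries
        val += hat(Ci) * (fs[i + 1] - fs[i])
    return val

R = lambda f: lovasz(cut, f)      # Lovasz extension of the cut = total variation (1)
S = lambda f: lovasz(hatS, f)

# On characteristic vectors the extension recovers the set function.
A = np.array([True, True, True, False])
assert np.isclose(R(A.astype(float)), cut(A))
assert np.isclose(S(A.astype(float)), hatS(A))

# Optimal thresholding: the best level set C_t = {i | f_i > t} is at least
# as good as the continuous ratio F(f) = R(f)/S(f).
f = np.array([0.9, 0.7, 0.8, -0.2])
Ff = R(f) / S(f)
best = min(cut(f > t) / hatS(f > t) for t in f[f < f.max()])
assert best <= Ff + 1e-12
```

Note that `lovasz(cut, f)` agrees with the explicit total variation formula (1), including the factor $\tfrac{1}{2}$, since every edge is counted twice in the double sum.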
We consider in this paper continuous optimization problems of the form
$$\min_{f \in \mathbb{R}^V} F(f), \quad \text{where } F(f) = \frac{R(f)}{S(f)} = \frac{R_1(f) - R_2(f)}{S_1(f) - S_2(f)}, \qquad (2)$$
where $R_1, R_2, S_1, S_2$ are convex and one-homogeneous and $R(f) = R_1(f) - R_2(f)$ and $S(f) = S_1(f) - S_2(f)$ are non-negative. Thus we are minimizing a non-negative ratio of d.c. (difference of convex) functions. As discussed above, the exact continuous relaxation of Theorem 1 leads exactly to such a problem, where $R(f) = R_1(f) = \frac{1}{2}\sum_{i,j=1}^n w_{ij}|f_i - f_j|$. Different choices of balancing functions lead to different functions $S$.

While [5, 9, 8] consider only algorithms for the minimization of ratios of convex functions, in [6] the RatioDCA has been proposed for the minimization of problems of type (2). The generalized version RatioDCA-prox is a family of algorithms which contains the work of [5, 6, 9, 8] as special cases and allows us to treat the minimization problem (2) in a unified manner.
The RatioDCA-prox algorithm for the minimization of (2) is given in Algorithm 1. In each step one has to solve the convex optimization problem
$$\min_{G(u) \le 1} F^{c_k}_{f^k}(u), \qquad (3)$$
which we denote as the inner problem in the following, with
$$F^{c_k}_{f^k}(u) := R_1(u) - \langle u, r_2(f^k)\rangle + \lambda^k \big( S_2(u) - \langle u, s_1(f^k)\rangle \big) - c_k \langle u, g(f^k)\rangle$$
and $c_k \ge 0$. As the constraint set we can choose any set containing a neighborhood of $0$ such that the inner problem is bounded from below, i.e. any nonnegative convex $p$-homogeneous ($p \ge 1$) function $G$. Although a slightly more general formulation is possible, we choose the constraint set to be compact, i.e. $G(f) = 0 \Leftrightarrow f = 0$. Moreover, $s_1(f^k) \in \partial S_1(f^k)$, $r_2(f^k) \in \partial R_2(f^k)$, $g(f^k) \in \partial G(f^k)$, where $\partial S_1, \partial R_2, \partial G$ are the subdifferentials. Note that for any $p$-homogeneous function $A$ we have the generalized Euler identity [12, Theorem 2.1], that is, $\langle f, a(f)\rangle = p\,A(f)$ for all $a(f) \in \partial A(f)$. Clearly, $F^{c_k}_{f^k}$ is also one-homogeneous, and with the Euler identity we get $F^{c_k}_{f^k}(f^k) = -c_k\, p\, G(f^k) \le 0$.

Algorithm 1 RatioDCA-prox – Minimization of a ratio of non-negative, one-homogeneous d.c. functions
1: Initialization: $f^0$ = random with $G(f^0) = 1$, $\lambda^0 = F(f^0)$
2: repeat
3: find $s_1(f^k) \in \partial S_1(f^k)$, $r_2(f^k) \in \partial R_2(f^k)$, $g(f^k) \in \partial G(f^k)$
4: find $f^{k+1} \in \arg\min_{G(u)\le 1} F^{c_k}_{f^k}(u)$
5: $\lambda^{k+1} = F(f^{k+1})$
6: until $f^{k+1} \in \arg\min_{G(u)\le 1} F^{c_{k+1}}_{f^{k+1}}(u)$

The difference to the RatioDCA in [6] is the additional proximal term $-c_k\langle u, g(f^k)\rangle$ in $F^{c_k}_{f^k}(u)$ and the choice of $G$. It is interesting to note that this term can be derived by applying the RatioDCA to a different d.c. decomposition of $F$. Let us write $F$ as
$$F = \frac{R'_1 - R'_2}{S'_1 - S'_2} = \frac{(R_1 + c_R G) - (R_2 + c_R G)}{(S_1 + c_S G) - (S_2 + c_S G)} \qquad (4)$$
with arbitrary $c_R, c_S \ge 0$. If we now define $c_k := c_R + \lambda^k c_S$, the function to be minimized in the inner problem of the RatioDCA reads
$$F'_{f^k}(u) = R'_1(u) - \langle u, r'_2(f^k)\rangle + \lambda^k \big( S'_2(u) - \langle u, s'_1(f^k)\rangle \big) = F^{c_k}_{f^k}(u) + c_k\, G(u),$$
which is not necessarily one-homogeneous anymore. The following lemma implies that the minimizers of the inner problem of RatioDCA-prox and of RatioDCA applied to the d.c. decomposition (4) can be chosen to be the same.

Lemma 1.
0. If we now define c k : = c R + l k c S , the function to be mini-mized in the inner problem of the RatioDCA reads alanced Graph Cuts and the RatioDCA-Prox 5 F ′ f k ( u ) = R ′ ( u ) − D u , r ′ ( f k ) E + l k (cid:16) S ′ ( u ) − D u , s ′ ( f k ) E(cid:17) = F c k f k ( u ) + c k G ( u ) , which is not necessarily one-homogeneous anymore. The following lemma impliesthat the minimizers of the inner problem of RatioDCA-prox and of RatioDCA ap-plied to the d.c.-decomposition (4) can be chosen to be the same. Lemma 1.
For G ( f k ) = we have arg min G ( u ) ≤ F ′ f k ( u ) ⊇ arg min G ( u ) ≤ F c k f k ( u ) . Moreover,1. if p > , c k > then arg min u F ′ f k ( u ) ⊇ n · arg min G ( u ) ≤ F c k f k ( u ) for some n ≥ ,2. if f k ∈ arg min G ( u ) ≤ F c k f k ( u ) then arg min u F ′ f k ( u ) ⊇ arg min G ( u ) ≤ F c k f k ( u ) .Proof. For fixed x ≥ F c k f k that any mini-mizer of arg min G ( u )= x F ′ f k ( u ) is a multiple of one f k + ∈ arg min G ( u ) ≤ F c k f k ( u ) , so let us look at n f k + with G ( f k + ) =
1. We get from the homogeneity of F c k f k and G for n > ¶¶n (cid:0) F ′ f k ( n f k + ) (cid:1) = F c k f k ( f k + ) + c k p n p − ≤ c k p ( n p − − ) , which is non-positive for n ∈ ( , ] and with F ′ f k ( ) = ≥ F ′ f k ( f k ) = c k ( − p ) it follows that a minimum is attained at n ≥
1. If p > , c k > F ′ f k exists and by the previous arguments is attained at multiples of f k + ∈ arg min G ( u ) ≤ F c k f k ( u ) . If f k ∈ arg min G ( u ) ≤ F c k f k ( u ) then also the global optimum of F ′ f k exists and the claim follows since n = F ′ f k ( n f k ) = − n c k p + n p c k . ⊓⊔ Note that G ( f k ) = F c k f k that G ( f k ) = k . The following lemma verifies the intuition that thestrength of the proximal term of RatioDCA-prox controls in some sense how nearsuccessive iterates are. Lemma 2.
Let $f^{k+1}_1 \in \arg\min_{G(u)\le 1} F^{c_k}_{f^k}(u)$ and $f^{k+1}_2 \in \arg\min_{G(u)\le 1} F^{d_k}_{f^k}(u)$. If $c_k \le d_k$, then $\langle f^{k+1}_1, g(f^k)\rangle \le \langle f^{k+1}_2, g(f^k)\rangle$.

Proof. This follows from
$$F^{d_k}_{f^k}(f^{k+1}_2) \le F^{d_k}_{f^k}(f^{k+1}_1) = F^{c_k}_{f^k}(f^{k+1}_1) + (c_k - d_k)\big\langle f^{k+1}_1, g(f^k)\big\rangle \le F^{c_k}_{f^k}(f^{k+1}_2) + (c_k - d_k)\big\langle f^{k+1}_1, g(f^k)\big\rangle = F^{d_k}_{f^k}(f^{k+1}_2) + (d_k - c_k)\big\langle f^{k+1}_2, g(f^k)\big\rangle + (c_k - d_k)\big\langle f^{k+1}_1, g(f^k)\big\rangle,$$
so that $(d_k - c_k)\big( \langle f^{k+1}_2, g(f^k)\rangle - \langle f^{k+1}_1, g(f^k)\rangle \big) \ge 0$. ⊓⊔
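Algorithm 1 can be sketched in a few lines. The following Python toy is our own illustration, not the paper's implementation (which solves the inner problem with a proper convex solver, PDHG): the 4-vertex graph and all names are invented, the inner problem is "solved" by brute force over a discretized constraint set, and we use the valid choice $G(u) = \|u\|_\infty$ applied to the ratio Cheeger cut, where $R_2 = S_2 = 0$:

```python
import itertools
import numpy as np

# Toy graph (made-up weights): two strongly tied pairs {0,1} and {2,3}, weakly linked.
W = np.array([[0. , 2. , 0.1, 0. ],
              [2. , 0. , 0. , 0.1],
              [0.1, 0. , 0. , 2. ],
              [0. , 0.1, 2. , 0. ]])
n = W.shape[0]

R1 = lambda u: 0.5 * np.sum(W * np.abs(u[:, None] - u[None, :]))  # total variation
S1 = lambda u: np.sum(np.abs(u - np.median(u)))                   # Cheeger-type balance
F  = lambda u: R1(u) / S1(u)                                      # ratio to minimize

def s1_sub(u):
    # A subgradient of S1 at u: sign pattern, balanced on median ties so <v, 1> = 0.
    m = np.median(u)
    v = np.sign(u - m)
    ties = np.isclose(u, m)
    if ties.any() and (~ties).any():
        v[ties] = -v[~ties].sum() / ties.sum()
    return v

def g_sub(u):
    # A subgradient of G(u) = ||u||_inf: a signed unit vector at one maximal coordinate.
    i = int(np.argmax(np.abs(u)))
    g = np.zeros(n)
    g[i] = np.sign(u[i])
    return g

# Brute-force stand-in for the inner solver: the inner objective is one-homogeneous,
# so its minimum over {G(u) <= 1} is attained on the boundary ||u||_inf = 1, which we
# discretize. Constant vectors (S1 = 0) are dropped; they never give descent (Lemma 4).
grid = np.array(list(itertools.product(np.linspace(-1, 1, 9), repeat=n)))
grid = grid[np.isclose(np.abs(grid).max(axis=1), 1.0)]
grid = grid[[S1(u) > 0 for u in grid]]

def ratio_dca_prox(f, c=0.5, iters=15):
    vals = [F(f)]
    for _ in range(iters):
        lam, s, g = vals[-1], s1_sub(f), g_sub(f)
        # inner problem (3) with R2 = S2 = 0:  R1(u) - lam*<u, s> - c*<u, g>
        objs = [R1(u) - lam * (u @ s) - c * (u @ g) for u in grid]
        f_new = grid[int(np.argmin(objs))]
        if np.allclose(f_new, f):
            break
        f = f_new
        vals.append(F(f))
    return f, vals

f_star, vals = ratio_dca_prox(np.array([1.0, 0.25, -0.25, -1.0]))
# The sequence F(f^k) is monotonically non-increasing (Proposition 1 below).
assert all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
```

Because the starting point lies on the discretized boundary, every iterate does too, and exact subgradients are used, so the descent argument of Proposition 1 applies verbatim; the final assertion checks exactly that.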
Remark 1.
As all proofs can be split up into the individual steps, we may choose different functions $G$ in every step of the algorithm. Moreover, it will not be necessary that $f^{k+1}$ is an exact minimizer of the inner problem; we will only use that $F^{c_k}_{f^k}(f^{k+1}) < F^{c_k}_{f^k}(f^k)$.

It is easy to see that for $c_k = 0$ and $G = \|\cdot\|_2$ we get the RatioDCA [6] as a special case of the RatioDCA-prox. Moreover, Lemma 1 shows that the RatioDCA-prox corresponds to the RatioDCA with a general constraint set for the d.c. decomposition of the ratio $F$ given in (4).

If we apply RatioDCA-prox to the ratio cut problem, where $\hat S(C) = |C||\overline{C}|$, then $R(u) = R_1(u) = \frac{1}{2}\sum_{i,j=1}^n w_{ij}|u_i - u_j|$, and [9] chose $S(u) = S_1(u) = \|u - \operatorname{mean}(u)\|_1$. The following lemma shows that for a particular choice of $G$ and $c_k$, RatioDCA-prox and Algorithm 1 of [9], which for $v^k \in \partial S(\tilde f^k)$ computes the iterates
$$h^{k+1} = \arg\min_u \Big\{ \frac{1}{2}\sum_{i,j} w_{ij}|u_i - u_j| + \frac{\lambda^k}{2c} \big\| u - \big(\tilde f^k + c\, v^k\big) \big\|_2^2 \Big\}, \qquad \tilde f^{k+1} = h^{k+1} / \big\|h^{k+1}\big\|_2,$$
produce the same sequence if given the same initialization.

Lemma 3.
If $f^0 = \tilde f^0$, $\operatorname{mean}(f^0) = 0$, $c > 0$, and one uses the same subgradients in each step, then, for the sequence $\tilde f^k$ produced by Algorithm 1 of [9] and $f^k$ produced by RatioDCA-prox with $c_k = \lambda^k/c$ and $G(u) = \frac{1}{2}\|u\|_2^2$, we have $\tilde f^k = f^k$ for all $k$.

Proof. If $f^k = \tilde f^k$, we choose $v^k := s_1(f^k) = s(\tilde f^k) = \big(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\big)\operatorname{sign}\big(f^k - \operatorname{mean}(f^k)\,\mathbf{1}\big)$. For RatioDCA-prox we get $f^{k+1}$ by $f^{k+1} = \arg\min_{\|u\|_2 \le 1} F^{c_k}_{f^k}(u)$, and for Algorithm 1 of [9]
$$h^{k+1} = \arg\min_u \Big\{ R_1(u) + \frac{\lambda^k}{c}\Big( \frac{1}{2}\|u\|_2^2 - \langle u, f^k\rangle - \langle u, c\,v^k\rangle \Big) \Big\} = \arg\min_u \Big\{ F^{c_k}_{f^k}(u) + \frac{\lambda^k}{2c}\|u\|_2^2 \Big\}.$$
Finally, $\tilde f^{k+1} = h^{k+1}/\|h^{k+1}\|_2$, and an application of Lemma 1 then shows that $\tilde f^{k+1} = f^{k+1}$. As $\|\cdot\|_2^2$ is strictly convex, the minimizers are unique. ⊓⊔

Analogously, the algorithm presented in [8] is a special case of RatioDCA-prox applied to the ratio Cheeger cut, where $R(u) = R_1(u) = \frac{1}{2}\sum_{i,j} w_{ij}|u_i - u_j|$ and $S(u) = S_1(u) = \sum_i |u_i - \operatorname{median}(u)|$.

In this section we show that the sequence $F(f^k)$ produced by RatioDCA-prox is monotonically decreasing, similar to the RatioDCA of [6], and, additionally, we prove a convergence property which generalizes the results of [8, 9].

Proposition 1.
For every nonnegative sequence $c_k$, any sequence $f^k$ produced by RatioDCA-prox satisfies $F(f^{k+1}) < F(f^k)$ for all $k \ge 0$, or the sequence terminates. Moreover, we get that $c_k \langle f^{k+1} - f^k, g(f^k)\rangle \to 0$.

Proof. If the sequence does not terminate, then $F^{c_k}_{f^k}(f^{k+1}) < F^{c_k}_{f^k}(f^k)$, and it follows that
$$R(f^{k+1}) - \lambda^k S(f^{k+1}) - c_k\big\langle f^{k+1}, g(f^k)\big\rangle \le F^{c_k}_{f^k}(f^{k+1}) < F^{c_k}_{f^k}(f^k) = -c_k\big\langle f^k, g(f^k)\big\rangle,$$
where we used that for any one-homogeneous convex function $A$ we have, for all $f, g \in \mathbb{R}^V$ and all $a \in \partial A(g)$, $A(f) \ge A(g) + \langle f - g, a\rangle = \langle f, a\rangle$. Adding $c_k\langle f^{k+1}, g(f^k)\rangle$ gives
$$R(f^{k+1}) - \lambda^k S(f^{k+1}) < c_k\big\langle f^{k+1}, g(f^k)\big\rangle - c_k\big\langle f^k, g(f^k)\big\rangle \le c_k\big( G(f^{k+1}) - G(f^k) \big) \le 0, \qquad (5)$$
where we used that $G$ is convex and $G(f^{k+1}) \le 1 = G(f^k)$. Dividing (5) by $S(f^{k+1})$ gives $F(f^{k+1}) < F(f^k)$. As the sequence $F(f^k)$ is bounded from below and monotonically decreasing, and thus converging, and $S(f^{k+1})$ is bounded on the constraint set, we get the convergence result from
$$\big(\lambda^{k+1} - \lambda^k\big)\, S(f^{k+1}) \le c_k\big\langle f^{k+1} - f^k, g(f^k)\big\rangle \le 0. \qquad \text{⊓⊔}$$

If we choose $G(u) = \frac{1}{2}\|u\|_2^2$, we get $g(f^k) = f^k$, and if $c_k$ is bounded from below, $\|f^{k+1} - f^k\|_2 \to 0$. The following proposition extends this to strictly convex $G$.

Proposition 2.
If $G$ is strictly convex and $c_k \ge \gamma > 0$ for all $k$, then any sequence $f^k$ produced by RatioDCA-prox fulfills $\|f^{k+1} - f^k\| \to 0$.

Proof. As in the proof of Proposition 1, we have $\langle g(f^k), f^{k+1} - f^k\rangle \le 0$ and $G(f^{k+1}) = G(f^k) = 1$. Suppose $f^{k+1} \in G_\varepsilon := \{ u \mid G(u) = 1,\ \|u - f^k\| \ge \varepsilon \}$. If $\langle g(f^k), f^{k+1} - f^k\rangle = 0$, then the first-order condition yields, for $0 < t < 1$,
$$G\big(f^k + t(f^{k+1} - f^k)\big) \ge G(f^k) + \big\langle g(f^k), t(f^{k+1} - f^k)\big\rangle = G(f^k) = 1,$$
which is a contradiction to the strict convexity of $G$, as for $0 < t < 1$
$$G\big(f^k + t(f^{k+1} - f^k)\big) < (1-t)\,G(f^k) + t\,G(f^{k+1}) = 1.$$
Thus, with the compactness of $G_\varepsilon$, we get
$$\big\langle g(f^k), f^{k+1} - f^k\big\rangle \le \max_{u \in G_\varepsilon} \big\langle g(f^k), u - f^k\big\rangle =: \delta < 0.$$
However, with $c_k \ge \gamma > 0$ for all $k$, this contradicts for $k$ large enough the result $c_k\langle f^{k+1} - f^k, g(f^k)\rangle \to 0$ as $k \to \infty$ of Proposition 1. Thus, under the stated conditions, $\|f^{k+1} - f^k\| \to 0$ as $k \to \infty$. ⊓⊔

While the previous result does not establish convergence of the sequence, it establishes that the set of accumulation points has to be connected. As we are interested in minimizing the ratio $F$, we want to find vectors $f$ with $S(f) \neq 0$.

Lemma 4.
If $S(f^0) \neq 0$, then every vector in the sequence $f^k$ produced by RatioDCA-prox fulfills $S(f^k) \neq 0$.

Proof. As $R$ and $S$ are one-homogeneous and $G(f^k) = 1$, we have for any vector $h$ with $S(h) = 0$ and $G(h) \le 1$
$$F^{c_k}_{f^k}(h) \ge R_1(h) - R_2(h) + \lambda^k\big( S_2(h) - S_1(h) \big) - c_k\big\langle h, g(f^k)\big\rangle \ge R(h) - c_k\big\langle f^k, g(f^k)\big\rangle \ge -c_k\big\langle f^k, g(f^k)\big\rangle = F^{c_k}_{f^k}(f^k),$$
where we have used that $\langle g(f^k), h\rangle \le G(h) - G(f^k) + \langle f^k, g(f^k)\rangle \le \langle f^k, g(f^k)\rangle$. Thus no vector $h$ with $S(h) = 0$ yields descent in the inner problem; further, if $f^k$ is a minimizer, then the algorithm terminates. ⊓⊔

While the iterates $f^k$, and thus the final result, of RatioDCA and RatioDCA-prox differ in general, the following lemma shows that termination of RatioDCA implies termination of RatioDCA-prox, and under some conditions also the reverse implication holds true. Thus switching from RatioDCA to RatioDCA-prox at termination does not allow to get further descent.

Lemma 5.
Let $f^k$ with $\|f^k\|_2 = 1$, let $\bar f^k = f^k / G(f^k)^{1/p}$, $c_k \ge 0$, and let $s_1(\bar f^k) = s_1(f^k)$, $r_2(\bar f^k) = r_2(f^k)$ be chosen as in the algorithm RatioDCA-prox. Define
$$\Omega_1 = \arg\min_{G(u)\le 1} F^{c_k}_{\bar f^k}(u), \qquad \Omega_2 = \arg\min_{\|u\|_2 \le 1} F^{0}_{f^k}(u).$$
Then the following implications hold:
1. If $f^k \in \Omega_2$, then $\bar f^k \in \Omega_1$.
2. If $\bar f^k \in \Omega_1$ and either $\partial G(\bar f^k) = \{g(\bar f^k)\}$ or $c_k = 0$, then $f^k \in \Omega_2$.

Proof. If $f^k \in \Omega_2$, then $F^0_{f^k}(f^k) = 0$. As $F^0_{f^k}$ is one-homogeneous, $f^k$ is also a global minimizer, and thus for all $u \in \mathbb{R}^V$ with $G(u) \le 1$
$$F^{c_k}_{\bar f^k}(u) = F^0_{f^k}(u) - c_k\big\langle g(\bar f^k), u\big\rangle \ge -c_k\big\langle g(\bar f^k), u\big\rangle \ge -c_k\, p,$$
where the last inequality follows from the convexity of $G$ and the Euler identity. As $\langle g(\bar f^k), \bar f^k\rangle = p$ and hence $F^{c_k}_{\bar f^k}(\bar f^k) = -c_k\, p$, $\bar f^k$ is a minimizer, which proves the first part.

On the other hand, if $\bar f^k \in \arg\min_{G(u)\le 1} F^{c_k}_{\bar f^k}(u)$, then by Lemma 1 also $\bar f^k \in \arg\min_u \big\{ F^{c_k}_{\bar f^k}(u) + c_k G(u) \big\}$. $\bar f^k$ being a global minimizer implies
$$0 \in \partial\big( F^{c_k}_{\bar f^k} + c_k G \big)(\bar f^k) = \partial F^0_{f^k}(\bar f^k) - c_k\, g(\bar f^k) + c_k\, \partial G(\bar f^k) = \partial F^0_{f^k}(\bar f^k),$$
where we used that by assumption $c_k\big( g(\bar f^k) - \partial G(\bar f^k) \big) = 0$. Thus $\bar f^k$ is also a minimizer of $F^0_{f^k}$, and the result follows from the one-homogeneity of $F^0_{f^k}$, as $F^0_{f^k}(f^k) = G(f^k)^{1/p}\, F^0_{f^k}(\bar f^k)$. ⊓⊔

The sequence $F(f^k)$ is not only monotonically decreasing; we also show now that the sequence $f^k$ converges to a generalized nonlinear eigenvector as introduced in [5].

Theorem 2.
Each cluster point $f^*$ of the sequence $f^k$ produced by RatioDCA-prox fulfills, for some $c^* \ge 0$ and with $\lambda^* = \frac{R(f^*)}{S(f^*)} \in \big[0, F(f^0)\big]$,
$$0 \in \partial\big( R_1(f^*) + c^* G(f^*) \big) - \partial\big( R_2(f^*) + c^* G(f^*) \big) - \lambda^*\big( \partial S_1(f^*) - \partial S_2(f^*) \big).$$
If for every $f$ with $G(f) = 1$ the subdifferential $\partial G(f)$ contains a single element, or $c_k = 0$ for all $k$, then $f^*$ is an eigenvector with eigenvalue $\lambda^*$ in the sense that it fulfills
$$0 \in \partial R_1(f^*) - \partial R_2(f^*) - \lambda^*\big( \partial S_1(f^*) - \partial S_2(f^*) \big). \qquad (6)$$

Proof. By Proposition 1 the sequence $F(f^k)$ is monotonically decreasing. By assumption, $S = S_1 - S_2$ and $R = R_1 - R_2$ are nonnegative, and hence $F$ is bounded below by zero. Thus we have convergence towards a limit $\lambda^* = \lim_{k\to\infty} F(f^k)$. Note that $f^k$ is contained in a compact set, which implies that there exists a subsequence $f^{k_j}$ converging to some element $f^*$. As the sequence $F(f^{k_j})$ is a subsequence of a convergent sequence, it has to converge towards the same limit, hence also $\lim_{j\to\infty} F(f^{k_j}) = \lambda^*$.

Assume now that for all $c \ge 0$ we have $\min_{G(u)\le 1} F^c_{f^*}(u) < F^c_{f^*}(f^*)$. Then by Proposition 1 any vector $f^{(c)} \in \arg\min_{G(u)\le 1} F^c_{f^*}(u)$ satisfies
$$F\big(f^{(c)}\big) < \lambda^* = F(f^*),$$
which is a contradiction to the fact that the sequence $F(f^k)$ has converged to $\lambda^*$. Thus there exists $c^*$ such that $f^* \in \arg\min_{G(u)\le 1} F^{c^*}_{f^*}(u)$, and by Lemma 1 then $f^* \in \arg\min_u \big\{ F^{c^*}_{f^*}(u) + c^* G(u) \big\}$, and we get
$$0 \in \partial R_1(f^*) - r_2(f^*) + \lambda^*\big( \partial S_2(f^*) - s_1(f^*) \big) - c^*\, g(f^*) + c^*\, \partial G(f^*).$$
If $c_k = 0$ for all $k$, then we only need to consider $c^* = 0$. In this case, or if $\partial G(f^*) = \{g(f^*)\}$, it follows that
$$0 \in \partial R_1(f^*) - r_2(f^*) + \lambda^*\big( \partial S_2(f^*) - s_1(f^*) \big),$$
which then implies that $f^*$ is an eigenvector of $F$ with eigenvalue $\lambda^*$. ⊓⊔

Remark 2. (6) is a necessary condition for $f^*$ being a critical point of $F$. If $R_2$, $S_1$ are continuously differentiable at $f^*$, it is also sufficient. The necessity of (6) follows from [13, Proposition 2.3.14]. If $R_2$, $S_1$ are continuously differentiable at $f^*$, then we get from [13, Propositions 2.3.6 and 2.3.14] that $0 \in \partial F(f^*)$, and $f^*$ is a critical point of $F$.

A large class of combinatorial problems [6, 11] allows for an exact continuous relaxation which results in a minimization problem of a non-negative ratio of Lovasz extensions as introduced in Section 1. In this paper we restrict ourselves to balanced graph cuts, even though most statements can be immediately generalized to the class of problems considered in [11].
We first collect some important properties of Lovasz extensions before we prove stronger results for the RatioDCA-prox when applied to minimize a non-negative ratio of Lovasz extensions.
The following lemma is a reformulation of [10, Proposition 4.2(c)] for our purposes:
Lemma 6.
Let $\hat S$ be a submodular set function with $\hat S(\emptyset) = \hat S(V) = 0$. If $S$ is the Lovasz extension of $\hat S$, then $\langle \partial S(f), \mathbf{1}_{C_i}\rangle = S(\mathbf{1}_{C_i}) = \hat S(C_i)$ for all sets $C_i = \{ j \in V \mid f_j > f_i \}$.

Proof. Let w.l.o.g. $f$ be in increasing order, $f_1 \le f_2 \le \dots \le f_n$. With $f = \sum_{i=1}^{n-1} \mathbf{1}_{C_i}(f_{i+1} - f_i) + \mathbf{1}_V \cdot f_1$ we get
$$\sum_{i=1}^{n-1} \hat S(C_i)\,(f_{i+1} - f_i) = S(f) = \langle \partial S(f), f\rangle = \sum_{i=1}^{n-1} \langle \partial S(f), \mathbf{1}_{C_i}\rangle\,(f_{i+1} - f_i).$$
Since $\hat S$ is submodular, $S$ is convex and thus $\langle \partial S(f), \mathbf{1}_{C_i}\rangle \le S(\mathbf{1}_{C_i}) = \hat S(C_i)$; but because $f_{i+1} - f_i \ge 0$, equality must hold in every summand. ⊓⊔

More generally, this also holds if $\hat S$ is not submodular:

Lemma 7.
Let $\hat S$ be a set function with $\hat S(\emptyset) = \hat S(V) = 0$. If $S$ is the Lovasz extension of $\hat S$, then $\langle \partial S(f), \mathbf{1}_{C_i}\rangle = \hat S(C_i)$ for all sets $C_i = \{ j \in V \mid f_j > f_i \}$.

Proof. $\hat S$ can be written as the difference of two submodular set functions, $\hat S = \hat S_1 - \hat S_2$, and the Lovasz extension $S$ of $\hat S$ is the difference of the corresponding Lovasz extensions $S_1$ and $S_2$. We get $\partial S(f) \subseteq \partial S_1(f) - \partial S_2(f)$ [13, Propositions 2.3.1 and 2.3.3], and both $S_1$ and $S_2$ fulfill the conditions of Lemma 6. Thus
$$\langle \partial S(f), \mathbf{1}_{C_i}\rangle \subseteq \langle \partial S_1(f) - \partial S_2(f), \mathbf{1}_{C_i}\rangle = \langle \partial S_1(f), \mathbf{1}_{C_i}\rangle - \langle \partial S_2(f), \mathbf{1}_{C_i}\rangle = S_1(\mathbf{1}_{C_i}) - S_2(\mathbf{1}_{C_i}) = S(\mathbf{1}_{C_i}) = \hat S(C_i),$$
and the claim follows since $\partial S(f)$ is nonempty [13, Proposition 2.1.2]. ⊓⊔

Also, Lovasz extensions are maximal in the considered class of functions:
Lemma 8.
Let $\hat S$ be a symmetric set function with $\hat S(\emptyset) = 0$, let $S_L$ be its Lovasz extension, and let $S$ be any extension fulfilling the properties of Theorem 1, that is, $S$ is one-homogeneous, even, convex, $S(f + \alpha\mathbf{1}) = S(f)$ for all $f \in \mathbb{R}^V$, $\alpha \in \mathbb{R}$, and $\hat S(A) := S(\mathbf{1}_A)$ for all $A \subset V$. Then $S_L(f) \ge S(f)$ for all $f \in \mathbb{R}^V$.

Proof.
By Lemma 7 and using the convexity and one-homogeneity of $S$ we get
$$S_L(f) = \sum_{i=1}^{n-1} \hat S(C_i)\,(f_{i+1} - f_i) = \sum_{i=1}^{n-1} S(\mathbf{1}_{C_i})\,(f_{i+1} - f_i) \ge \sum_{i=1}^{n-1} \langle \partial S(f), \mathbf{1}_{C_i}\rangle\,(f_{i+1} - f_i) = \langle \partial S(f), f\rangle = S(f). \qquad \text{⊓⊔}$$
By [6, Lemma 3.1] any function S fulfilling the properties of the lemmacan be rewritten by S ( f ) = sup u ∈ U h u , f i where U ⊂ R n is a closed symmetric con-vex set and h u , i = u ∈ U . The previous lemma implies that for a givenset function ˆ S ( C ) the set U is maximal for the Lovasz extension S L . In turn thisimplies that the subdifferential of S L is maximal everywhere and thus should beused in the RatioDCA-prox. In [6, 9] the authors use for the balancing functionˆ S ( C ) = | C || C | instead of the Lovasz extension S L ( f ) = (cid:229) ni , j = | f i − f j | the convexfunction S ( f ) = k f − mean f k which fulfills the properties of the previous lemma.In Section 5 we show that using the Lovasz extension leads almost always to betterbalanced graph cuts. Applied to balanced graph cuts we can show the following “improvement theorem”generalizing the result of [6] for our algorithm. It implies that we can use the resultof any other graph partitioning method as initialization and in particular, we canalways improve the result of spectral clustering.
Theorem 3.
Let $(A, \overline{A})$ be a given partition of $V$ and let $S : \mathbb{R}^V \to \mathbb{R}_+$ satisfy one of the conditions stated in Theorem 1. If one uses as initialization of RatioDCA-prox $f^0 = \mathbf{1}_A$, then either the algorithm terminates after one step or it yields an $f^1$ which after optimal thresholding as in Theorem 1 gives a partition $(B, \overline{B})$ which satisfies
$$\frac{\operatorname{cut}(B, \overline{B})}{\hat S(B)} < \frac{\operatorname{cut}(A, \overline{A})}{\hat S(A)}.$$

Proof.
This follows in the same way from Proposition 1 as in [6, Theorem 4.2]. ⊓⊔

In the case that we have Lovasz extensions, we can show that accumulation points are directly related to the optimal sets:
Theorem 4.
If $R_2$ and $S_1$ are Lovasz extensions of the corresponding set functions, then every accumulation point $f^*$ of RatioDCA-prox with $c_k = 0$ fulfills $F(f^*) = F(\mathbf{1}_{C^*})$, where $C^*$ is the set we get from optimal thresholding of $f^*$. If also $R_1$ and $S_2$ are Lovasz extensions, then $f^* = \sum_{i=1}^m \alpha_i \mathbf{1}_{C_i} + \beta\,\mathbf{1}_V$ with $\alpha_i > 0$, $C_i = \{ j \in V \mid f^*_j > f^*_i \}$, $\beta \in \mathbb{R}$, and
$$\frac{\hat R(C_i)}{\hat S(C_i)} = \lambda^* = \frac{R(f^*)}{S(f^*)}, \qquad i = 1, \dots, m.$$
If $\lambda^*$ is only attained for one set $C^*$, then $f^* = \mathbf{1}_{C^*}$ is the only accumulation point.

Proof. In the proof of Theorem 2 it has been shown that from $f^*$ no further descent is possible. Assume $F(f^*) > F(\mathbf{1}_{C^*})$. Then
$$F^0_{f^*}(\mathbf{1}_{C^*}) = R_1(\mathbf{1}_{C^*}) - \big\langle r_2(f^*), \mathbf{1}_{C^*}\big\rangle + \lambda^*\big( S_2(\mathbf{1}_{C^*}) - \big\langle s_1(f^*), \mathbf{1}_{C^*}\big\rangle \big) = R(\mathbf{1}_{C^*}) - \lambda^*\, S(\mathbf{1}_{C^*}) < R(\mathbf{1}_{C^*}) - F(\mathbf{1}_{C^*})\, S(\mathbf{1}_{C^*}) = 0 = F^0_{f^*}(f^*),$$
which leads to a contradiction. Thus the first claim follows from Theorem 1. If also $R_1$ and $S_2$ are Lovasz extensions, then for $f^* = \sum_{i=1}^{n-1} \alpha_i \mathbf{1}_{C_i} + \mathbf{1}_V \cdot \min_j f^*_j$ we get, by Lemma 6 and the definition of the Lovasz extension,
$$0 = F^0_{f^*}(f^*) = \sum_{i=1}^{n-1} \alpha_i\, F^0_{f^*}(\mathbf{1}_{C_i}),$$
and if for one $\alpha_i > 0$ we had $\frac{\hat R(C_i)}{\hat S(C_i)} > \lambda^*$, then $F^0_{f^*}(\mathbf{1}_{C_i}) > 0$ while $F^0_{f^*}(\mathbf{1}_{C_j}) \ge 0$ for all $j$, which again is a contradiction. ⊓⊔
By Lemma 5 this also holds for c k > G is differentiable at the bound-ary.If we have Lovasz extensions we can also use the reduced version of the RatioDCA-prox with c k = c k ≥ g > k by Proposition 2 but an even stronger property such as finite convergence can onlybe proven when c k = Theorem 5.
Let $c_k = 0$ and let $S_1$, $R_2$ be Lovasz extensions in the RatioDCA-prox. Further, let $C^*_k$ be the set obtained by optimal thresholding of $f^k$. If in step 5 of RatioDCA-prox we choose $\lambda^k = F(\mathbf{1}_{C^*_k})$, and in step 4 we choose $f^{k+1} = \mathbf{1}^* := \mathbf{1}_{C^*_k}/G(\mathbf{1}_{C^*_k})^{1/p}$ whenever $\mathbf{1}^* \in \arg\min_{G(u)\le 1} F^{c_k}_{f^k}$, then the RatioDCA-prox terminates in finitely many steps.

Proof. With $c_k = 0$ and since $R_1$, $S_2$ are convex and one-homogeneous, we get
$$R(f^{k+1}) - F(\mathbf{1}_{C^*_k})\, S(f^{k+1}) \le F^{c_k}_{f^k}(f^{k+1}) \le F^{c_k}_{f^k}(\mathbf{1}^*) = \frac{R_1(\mathbf{1}_{C^*_k}) - \big\langle r_2(f^k), \mathbf{1}_{C^*_k}\big\rangle + F(\mathbf{1}_{C^*_k})\big( S_2(\mathbf{1}_{C^*_k}) - \big\langle s_1(f^k), \mathbf{1}_{C^*_k}\big\rangle \big)}{G(\mathbf{1}_{C^*_k})^{1/p}} = \frac{R(\mathbf{1}_{C^*_k}) - F(\mathbf{1}_{C^*_k})\, S(\mathbf{1}_{C^*_k})}{G(\mathbf{1}_{C^*_k})^{1/p}} = 0.$$
Hence $F(\mathbf{1}_{C^*_{k+1}}) \le F(f^{k+1}) \le F(\mathbf{1}_{C^*_k})$, and equality in the second inequality only holds if $f^{k+1} = \mathbf{1}^*$; but then in the next step we either get strict improvement or the sequence terminates. As there are only finitely many different cuts, RatioDCA-prox has to terminate in finitely many steps. ⊓⊔

The convex inner problem in Equation (3) is solved using the primal-dual hybrid gradient method (PDHG) as in [6]. In the first iterations the problem is not solved to high accuracy, as all results in this paper only rely on the fact that either the algorithm terminates or $F^{c_k}_{f^k}(f^{k+1}) < F^{c_k}_{f^k}(f^k)$.

First, we study the influence of different values of $c_k$ in the RatioDCA-prox algorithm. We choose $G = \|\cdot\|_2$ and different values for $c_k$. We compare the algorithms on the wing graph from [14] (62032 vertices, 243088 edges) and a graph built from the two-moons dataset (2000 vertices, 33466 edges) as described in [3].

Table 1
Displayed are the averages of all, of the 10 best, and of the best cuts for different values of $c_k = c\,\lambda^k$ on wing (top) and two-moons (bottom).

In Table 1 we report the resulting ratio Cheeger cuts (RCC) for ten different choices of $c_k = c\cdot\lambda^k$ in RatioDCA-prox. In all cases we use one initialization with the second eigenvector of the standard graph Laplacian and 99 initializations with random vectors, which are the same for all algorithms. As one is interested in the best result and how often this can be achieved, we report the best, the average, and the top-10 performance. For both graphs there is no clear trend that a particular choice of the proximal term improves or worsens the results compared to $c_k = 0$.

In previous work [6, 9] on the ratio cut with the balancing function $\hat S(C) = |C||\overline{C}|$, not the Lovasz extension $S_L(f) = \frac{1}{2}\sum_{i,j=1}^n |f_i - f_j|$ has been used but the function $S(f) = \|f - \operatorname{mean}(f)\|_1$. As discussed in Section 4, this should lead to worse performance in the algorithm, as the subdifferential of $S_L$ is maximal. In Table 2 we compare both extensions using the RatioDCA-prox with $c_k = 0$ and $G(u) = \|u\|_2$ on seven different graphs [14]. One initialization is done with the second eigenvector of the standard graph Laplacian, and the same 10 random initializations are used for both extensions.

Table 2
For each graph it is shown how many times, over the 11 initializations, the RatioDCA-prox with the Lovasz extension performs better/equal/worse than the previously used continuous extension; additionally, the ratio of the best solutions of Lovasz vs. continuous extension is shown (a value $< 1$ means the Lovasz extension found the better cut).

While the differences in the best found cut are minor, using the Lovasz extension for the balancing function leads consistently to better results.
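The structural results behind these experiments, the subgradient identity of Lemmas 6/7 and the maximality of the Lovasz extension (Lemma 8), can be verified numerically. The sketch below is our own illustration, not code from the paper; the continuous extension of [6, 9] is rescaled by $n/2$ so that both extensions agree on characteristic vectors, which puts Lemma 8 directly in force:

```python
import numpy as np

n = 6

def hatS(A):                     # ratio cut balancing function |C| * |C-bar|
    k = int(np.count_nonzero(A))
    return k * (n - k)

def lovasz(f):                   # Lovasz extension S_L of hatS (Definition 1)
    idx = np.argsort(f)
    fs = f[idx]
    val = 0.0                    # hatS(V) = 0, so the f_1 * hatS(V) term vanishes
    for i in range(n - 1):
        Ci = np.zeros(n, bool)
        Ci[idx[i + 1:]] = True
        val += hatS(Ci) * (fs[i + 1] - fs[i])
    return val

def lovasz_subgrad(f):
    # In the order sorting f increasingly, the subgradient entries are
    # hatS(C_{i-1}) - hatS(C_i), cf. Definition 1 and Lemma 6.
    idx = np.argsort(f)
    g = np.zeros(n)
    for k in range(n):
        Ck_minus, Ck = np.zeros(n, bool), np.zeros(n, bool)
        Ck_minus[idx[k:]] = True
        Ck[idx[k + 1:]] = True
        g[idx[k]] = hatS(Ck_minus) - hatS(Ck)
    return g

rng = np.random.default_rng(1)
f = rng.standard_normal(n)
g = lovasz_subgrad(f)
assert np.isclose(g @ f, lovasz(f))            # Euler identity: <f, s(f)> = S_L(f)
for t in f:                                    # Lemmas 6/7: <dS_L(f), 1_{C_t}> = hatS(C_t)
    C = (f > t).astype(float)
    if C.any():
        assert np.isclose(g @ C, hatS(C > 0))

# Lemma 8: S_L dominates the continuous extension ||f - mean(f)||_1 of [6, 9],
# rescaled by n/2 so that both agree on characteristic vectors.
S_cont = lambda f: (n / 2.0) * np.sum(np.abs(f - f.mean()))
A = np.zeros(n); A[:2] = 1.0
assert np.isclose(S_cont(A), hatS(A > 0)) and np.isclose(lovasz(A), hatS(A > 0))
for _ in range(200):
    f = rng.standard_normal(n)
    assert lovasz(f) >= S_cont(f) - 1e-10
```

The telescoping sum in `lovasz_subgrad` also makes the thresholding argument of Theorem 4 concrete: summing the subgradient entries over any level set of $f$ recovers exactly the value of the balancing function on that set.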
References
1. U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, 395 (2007)
2. S. Guattery, G.L. Miller, On the quality of spectral separators, SIAM Journal on Matrix Analysis and Applications, 701 (1998)
3. T. Bühler, M. Hein, Spectral clustering based on the graph p-Laplacian, in Proceedings of the 26th International Conference on Machine Learning (ICML) (2009), pp. 81–88
4. A. Szlam, X. Bresson, Total variation and Cheeger cuts, in Proceedings of the 27th International Conference on Machine Learning (ICML) (2010), pp. 1039–1046
5. M. Hein, T. Bühler, An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA, in Advances in Neural Information Processing Systems 23 (NIPS) (2010), pp. 847–855
6. M. Hein, S. Setzer, Beyond spectral clustering – tight relaxations of balanced graph cuts, in Advances in Neural Information Processing Systems 24 (NIPS) (2011), pp. 2366–2374
7. M. Hein, S. Setzer, L. Jost, S. Rangapuram, The total variation on hypergraphs – learning on hypergraphs revisited, in Advances in Neural Information Processing Systems 26 (NIPS) (2013), pp. 2427–2435
8. X. Bresson, T. Laurent, D. Uminsky, J.H. von Brecht, Convergence and energy landscape for Cheeger cut clustering, in Advances in Neural Information Processing Systems 25 (NIPS) (2012), pp. 1394–1402
9. X. Bresson, T. Laurent, D. Uminsky, J.H. von Brecht, Convergence of a steepest descent algorithm for ratio cut clustering (2012). ArXiv:1204.6545v1
10. F. Bach, Learning with Submodular Functions: A Convex Optimization Perspective, Foundations and Trends in Machine Learning (2008)
11. T. Bühler, S. Rangapuram, S. Setzer, M. Hein, Constrained fractional set programs and their application in local clustering and community detection, in Proceedings of the 30th International Conference on Machine Learning (ICML) (2013), pp. 624–632
12. F. Yang, Z. Wei, Generalized Euler identity for subdifferentials of homogeneous functions and applications, Journal of Mathematical Analysis and Applications, 516 (2008)
13. F. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983)
14. C. Walshaw, The graph partitioning archive (2004). Staffweb.cms.gre.ac.uk/∼∼