Continuous-Domain Assignment Flows
F. SAVARINO and C. SCHNÖRR

Image and Pattern Analysis Group, Heidelberg University, Heidelberg, Germany
email: [email protected], [email protected]
URL: https://ipa.math.uni-heidelberg.de

Abstract. Assignment flows denote a class of dynamical models for contextual data labeling (classification) on graphs. We derive a novel parametrization of assignment flows that reveals how the underlying information geometry induces two processes, for assignment regularization and for gradually enforcing unambiguous decisions, respectively, that seamlessly interact when solving for the flow. Our result enables us to characterize the dominant part of the assignment flow as a Riemannian gradient flow with respect to the underlying information geometry. We consider a continuous-domain formulation of the corresponding potential and develop a novel algorithm that solves a sequence of linear elliptic PDEs subject to a simple convex constraint. Our result provides a basis for addressing learning problems by controlling such PDEs in future work.
Key Words: image labeling, image segmentation, information geometry, replicator equation, evolutionary dynamics, assignment flow.
Contents
1. Introduction
2. Preliminaries
3. A Novel Representation of the Assignment Flow
4. Continuous-Domain Variational Approach
5. Conclusion
References

1. Introduction
Deep networks are omnipresent in many disciplines due to their unprecedented predictive power and the availability of easy-to-use training software. However, this rapid development during recent years has not improved our mathematical understanding to the same degree, so far [Ela17]. The 'black box' behaviour of deep networks, systematic failures [ARP+], the lack of performance guarantees, and the limited reproducibility of results raise doubts as to whether a purely data-driven approach can meet the high expectations of some of its most passionate proponents [CDH16]. A 'mathematics of deep networks', therefore, has become a focal point of research. Initiated perhaps by [HZRS16] and mathematically substantiated and promoted by, e.g., [HR17, E17], the attempt to understand deep network architectures as discretized realizations of dynamical systems has become a fruitful line of research. Adopting this viewpoint, we introduced a dynamical system, called the assignment flow, for contextual data classification and image labeling on graphs [ÅPSS17]. We refer to [Sch19] for a review of recent work including parameter estimation (learning) [HSPS19], adaptation of data prototypes during assignment [ZZPS19a], and learning prototypes from low-rank data representations and self-assignment [ZZPS19b].

Two key properties of the assignment flow are smoothness and the gradual enforcement of unambiguous classification within a single process, solely induced by adopting an elementary statistical manifold as state space, which is natural for classification tasks, and the corresponding information geometry [AN00]. This differs from traditional variational approaches to image labeling [LS11, CCP12] that enjoy convexity but are inherently nonsmooth and require postprocessing to achieve unambiguous decisions. We regard nonsmoothness as a major barrier to the design of hierarchical architectures for data classification.

The assignment flow combines by composition (rather than by addition) separate local processes at each vertex of the underlying graph and nonlocal regularization. Each local process for label assignment is governed by an ODE, the replicator equation [HS03, San10], whereas regularization is accomplished by nonlocal geometric averaging of the evolving assignments. It is well known [HS03] that if the affinity measure which defines the replicator equation, and hence governs label selection, can be derived as the gradient of a potential, then the replicator equation is just the corresponding Riemannian gradient flow induced by the Fisher-Rao metric. For the affinity measure resulting from the geometric regularization of assignments performed by the assignment flow, however, the (non-)existence of a corresponding potential is not immediate.
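The fact cited from [HS03] above — a replicator equation whose affinity is the gradient of a potential is the Fisher-Rao gradient flow of that potential — can be illustrated with a minimal numerical sketch. The linear potential $F(p) = \langle a, p\rangle$ and all numbers below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Potential F(p) = <a, p> (linear fitness, toy choice); its replicator flow
# p_dot = p * a - <p, a> p is the Fisher-Rao gradient ascent flow of F.
a = np.array([0.3, 1.0, 0.2, 0.6])
F = lambda p: a @ p

p = np.full(4, 0.25)                      # start at the barycenter of the simplex
h = 0.1
values = [F(p)]
for _ in range(300):
    p = p + h * (p * a - (p @ a) * p)     # explicit Euler step of the replicator ODE
    values.append(F(p))

assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(values, values[1:]))  # F ascends
assert np.isclose(p.sum(), 1)             # the simplex constraint is preserved
assert p.argmax() == a.argmax()           # the flow selects the fittest label
```

The potential increases monotonically along the discrete flow, and the iterate converges to the simplex vertex of the largest affinity, i.e. the flow gradually enforces an unambiguous decision.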
Contribution and Organization. The objective of this paper is to clarify this situation. After collecting background material in Section 2, we prove that no potential exists that would characterize the assignment flow as a Riemannian gradient flow (Section 3.1). Next, we provide a novel parametrization of the assignment flow by separating a dominant component of the flow, called the S-flow, that completely determines the remaining component and hence essentially characterizes the assignment flow (Section 3.2). The S-flow does correspond to a potential, under an additional symmetry assumption on the weights that parametrize the regularization properties of the assignment flow through (weighted) geometric averaging. This potential can be decomposed into two components that make explicit the two interacting processes mentioned above: regularization of label assignments and the gradual enforcement of unambiguous decisions. We point out again that this is a direct consequence of the 'spherical geometry' (positive curvature) underlying the assignment flow.

We study the corresponding continuous-domain variational formulation in Section 4. We prove well-posedness, which is not immediate due to nonconvexity, and we propose an algorithm that computes a locally optimal assignment by solving a sequence of simple linear PDEs with changing right-hand sides, subject to a simple convex constraint. A numerical example demonstrates that our PDE-based approach reproduces results obtained by solving the original formulation of the assignment flow using completely different numerical techniques [ZSPS19]. We hope that the simplicity of our PDE approach and the direct connection to a smooth geometric setting will stimulate future work on learning from an optimal control point of view [EHL19, LT19]. We conclude with a formal derivation of a PDE that characterizes global minimizers of the nonconvex objective function (Section 4.4) and by outlining future research in Section 5.
2. Preliminaries

We denote the standard basis of $\mathbb{R}^n$ by
\[ \mathcal{B}_n := \{e_1,\dots,e_n\}. \tag{2.1} \]
$|\cdot|$ applied to a finite set denotes its cardinality, i.e. $|\mathcal{B}_n| = n$. We set $[n] = \{1,2,\dots,n\}$ for $n \in \mathbb{N}$ and $\mathbb{1}_n = (1,1,\dots,1)^\top \in \mathbb{R}^n$. The symbols
\[ \mathcal{I} = [n], \qquad \mathcal{J} = [c], \qquad n, c \in \mathbb{N}, \tag{2.2} \]
will specifically index data points and classes (labels), respectively. $\|\cdot\|$ denotes the Euclidean vector norm and the Frobenius matrix norm induced by the inner product $\|A\| = \langle A, A\rangle^{1/2} = \operatorname{tr}(A^\top A)^{1/2}$. All other norms will be indicated by a corresponding subscript. For a given matrix $A \in \mathbb{R}^{n\times c}$, $A_i$, $i \in [n]$, denote the row vectors, $A^j$, $j \in [c]$, denote the column vectors, and $A^\top \in \mathbb{R}^{c\times n}$ the transposed matrix. $\mathcal{S}^n_+$ denotes the set of all symmetric $n\times n$ matrices with nonnegative entries.
\[ \Delta_n = \{p \in \mathbb{R}^n_+ : \langle \mathbb{1}_n, p\rangle = 1\} \tag{2.3} \]
denotes the probability simplex. There will be no danger of confusing it with the Laplacian differential operator $\Delta$ that we use without subscript. For strictly positive vectors $p > 0$, we compactly denote componentwise subdivision by $\frac{v}{p}$. Likewise, we set $pv = (p_1 v_1,\dots,p_n v_n)^\top$. The exponential function applies componentwise to vectors (and similarly for $\log$) and will always be denoted by $e^v = (e^{v_1},\dots,e^{v_n})^\top$, in order not to confuse it with the exponential maps (2.18). Strong and weak convergence of a sequence $(f_n)$ is written as $f_n \to f$ and $f_n \rightharpoonup f$, respectively. $\psi_S$ denotes the indicator function of some set $S$: $\psi_S(i) = 1$ if $i \in S$ and $\psi_S(i) = 0$ otherwise. $\delta_C$ denotes the indicator function from the optimization point of view: $\delta_C(f) = 0$ if $f \in C$ and $\delta_C(f) = +\infty$ otherwise.

We sketch the assignment flow as introduced by [ÅPSS17] and refer to the recent survey [Sch19] for more background and a review of recent related work.
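The geometric setting sketched next rests on a few linear-algebra building blocks: the orthogonal projection onto the tangent space and the map $R_p$ defined in (2.14), (2.16) below. A minimal, hedged Python sketch of these objects and their identity (2.17) (all array sizes are arbitrary illustrative choices):

```python
import numpy as np

c = 4
one = np.ones(c)
barycenter = one / c                          # the uniform distribution on c labels

# orthogonal projection onto the tangent space T_0 = {v : <1, v> = 0}, cf. (2.14)
Pi0 = np.eye(c) - np.outer(one, barycenter)

def R(p):
    """Linear map R_p = Diag(p) - p p^T, cf. (2.16)."""
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
v = rng.normal(size=c)
p = rng.random(c); p /= p.sum()               # a strictly positive point of the simplex

assert abs((Pi0 @ v).sum()) < 1e-12           # Pi0 v lies in T_0
assert np.allclose(R(p) @ one, 0)             # R_p annihilates constant vectors
assert np.allclose(R(p), R(p) @ Pi0)          # R_p = R_p Pi0, cf. (2.17)
assert np.allclose(R(p), Pi0 @ R(p))          # R_p = Pi0 R_p
```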
Assignment Manifold. Let $(\mathcal{F}, d_{\mathcal{F}})$ be a metric space and
\[ \mathcal{F}_n = \{f_i \in \mathcal{F} : i \in \mathcal{I}\}, \qquad |\mathcal{I}| = n, \tag{2.4} \]
given data. Assume that a predefined set of prototypes
\[ \mathcal{F}_* = \{f^*_j \in \mathcal{F} : j \in \mathcal{J}\}, \qquad |\mathcal{J}| = c, \tag{2.5} \]
is given. Data labeling denotes the assignments
\[ j \to i, \qquad f^*_j \to f_i, \tag{2.6} \]
of a single prototype $f^*_j \in \mathcal{F}_*$ to each data point $f_i \in \mathcal{F}_n$. The set $\mathcal{I}$ is assumed to form the vertex set of an undirected graph $G = (\mathcal{I}, \mathcal{E})$ which defines a relation $\mathcal{E} \subset \mathcal{I} \times \mathcal{I}$ and neighborhoods
\[ \mathcal{N}_i = \{k \in \mathcal{I} : ik \in \mathcal{E}\} \cup \{i\}, \tag{2.7} \]
where $ik$ is a shorthand for the unordered pair (edge) $(i,k) = (k,i)$. We require these neighborhoods to satisfy the relations
\[ k \in \mathcal{N}_i \;\Leftrightarrow\; i \in \mathcal{N}_k, \qquad \forall i,k \in \mathcal{I}. \tag{2.8} \]
The assignments (labelings) (2.6) are represented by matrices in the set
\[ \mathcal{W}_* = \{W \in \{0,1\}^{n\times c} : W\mathbb{1}_c = \mathbb{1}_n\} \tag{2.9} \]
with unit vectors $W_i$, $i \in \mathcal{I}$, called assignment vectors, as row vectors. These assignment vectors are computed by numerically integrating the assignment flow (2.28) below, in the following elementary geometric setting. The integrality constraint of (2.9) is relaxed and vectors
\[ W_i = (W_{i1},\dots,W_{ic})^\top \in \mathcal{S}, \qquad i \in \mathcal{I}, \tag{2.10} \]
that we still call assignment vectors, are considered on the elementary Riemannian manifold $(\mathcal{S}, g)$,
\[ \mathcal{S} = \{p \in \Delta_c : p > 0\}, \tag{2.11} \]
with barycenter
\[ \mathbb{1}_{\mathcal{S}} = \tfrac{1}{c}\mathbb{1}_c \in \mathcal{S}, \tag{2.12} \]
tangent space
\[ T_0 = \{v \in \mathbb{R}^c : \langle \mathbb{1}_c, v\rangle = 0\}, \tag{2.13} \]
tangent bundle $T\mathcal{S} = \mathcal{S} \times T_0$, orthogonal projection
\[ \Pi_0 : \mathbb{R}^c \to T_0, \qquad \Pi_0 = I_c - \mathbb{1}_c \mathbb{1}_{\mathcal{S}}^\top, \tag{2.14} \]
and the Fisher-Rao metric
\[ g_p(u,v) = \sum_{j\in\mathcal{J}} \frac{u_j v_j}{p_j}, \qquad p \in \mathcal{S}, \quad u,v \in T_0. \tag{2.15} \]
Based on the linear map
\[ R_p : \mathbb{R}^c \to T_0, \qquad R_p = \operatorname{Diag}(p) - p p^\top, \qquad p \in \mathcal{S}, \tag{2.16} \]
satisfying
\[ R_p = R_p \Pi_0 = \Pi_0 R_p, \tag{2.17} \]
exponential maps and their inverses are defined as
\[ \operatorname{Exp} : \mathcal{S} \times T_0 \to \mathcal{S}, \qquad (p,v) \mapsto \operatorname{Exp}_p(v) = \frac{p\, e^{v/p}}{\langle p, e^{v/p}\rangle}, \tag{2.18a} \]
\[ \operatorname{Exp}^{-1}_p : \mathcal{S} \to T_0, \qquad q \mapsto \operatorname{Exp}^{-1}_p(q) = R_p \log\frac{q}{p}, \tag{2.18b} \]
\[ \exp_p : T_0 \to \mathcal{S}, \qquad \exp_p = \operatorname{Exp}_p \circ R_p, \tag{2.18c} \]
\[ \exp^{-1}_p : \mathcal{S} \to T_0, \qquad \exp^{-1}_p(q) = \Pi_0 \log\frac{q}{p}. \tag{2.18d} \]
Applying the map $\exp_p$ to a vector in $\mathbb{R}^c = T_0 \oplus \mathbb{R}\mathbb{1}$ does not depend on the constant component of the argument, due to (2.17).

Remark 2.1. The map $\operatorname{Exp}$ corresponds to the e-connection of information geometry [AN00], rather than to the exponential map of the Riemannian connection. Accordingly, the affine geodesics (2.18a) are not length-minimizing. But they provide a close approximation [ÅPSS17, Prop. 3] and are more convenient for numerical computations.

The assignment manifold is defined as
\[ (\mathcal{W}, g), \qquad \mathcal{W} = \mathcal{S} \times \dots \times \mathcal{S} \qquad (n = |\mathcal{I}| \text{ factors}). \tag{2.19} \]
We identify $\mathcal{W}$ with the embedding into $\mathbb{R}^{n\times c}$,
\[ \mathcal{W} = \{W \in \mathbb{R}^{n\times c} : W\mathbb{1}_c = \mathbb{1}_n \ \text{and}\ W_{ij} > 0 \ \text{for all } i \in [n],\ j \in [c]\}. \tag{2.20} \]
Thus, points $W \in \mathcal{W}$ are row-stochastic matrices $W \in \mathbb{R}^{n\times c}$ with row vectors $W_i \in \mathcal{S}$, $i \in \mathcal{I}$, that represent the assignments (2.6) for every $i \in \mathcal{I}$. We set
\[ \mathcal{T}_0 := T_0 \times \dots \times T_0 \qquad (n = |\mathcal{I}| \text{ factors}). \tag{2.21} \]
Due to (2.20), the tangent space $\mathcal{T}_0$ can be identified with
\[ \mathcal{T}_0 = \{V \in \mathbb{R}^{n\times c} : V\mathbb{1}_c = 0\}. \tag{2.22} \]
Thus, $V_i \in T_0$ for all row vectors of $V \in \mathbb{R}^{n\times c}$ and $i \in \mathcal{I}$. All mappings defined above factorize in a natural way and apply row-wise, e.g. $\operatorname{Exp}_W = (\operatorname{Exp}_{W_1},\dots,\operatorname{Exp}_{W_n})$ etc.

Assignment Flow.
Based on (2.4) and (2.5), the distance vector field
\[ D_{\mathcal{F};i} = \big(d_{\mathcal{F}}(f_i, f^*_1),\dots,d_{\mathcal{F}}(f_i, f^*_c)\big)^\top, \qquad i \in \mathcal{I}, \tag{2.23} \]
is well-defined. These vectors are collected as row vectors of the distance matrix
\[ D_{\mathcal{F}} \in \mathbb{R}^{n\times c}_+. \tag{2.24} \]
The likelihood map and the likelihood vectors, respectively, are defined as
\[ L_i : \mathcal{S} \to \mathcal{S}, \qquad L_i(W_i) = \exp_{W_i}\Big(-\frac{1}{\rho} D_{\mathcal{F};i}\Big) = \frac{W_i\, e^{-\frac{1}{\rho} D_{\mathcal{F};i}}}{\big\langle W_i, e^{-\frac{1}{\rho} D_{\mathcal{F};i}}\big\rangle}, \qquad i \in \mathcal{I}, \tag{2.25} \]
where the scaling parameter $\rho > 0$ is used for normalizing the a priori unknown scale of the components of $D_{\mathcal{F};i}$ that depends on the specific application at hand.

A key component of the assignment flow is the interaction of the likelihood vectors through geometric averaging within the local neighborhoods (2.7). Specifically, using weights $\omega_{ik} > 0$ for all $k \in \mathcal{N}_i$, $i \in \mathcal{I}$, with
\[ \sum_{k\in\mathcal{N}_i} \omega_{ik} = 1, \tag{2.26} \]
the similarity map and the similarity vectors, respectively, are defined as
\[ S_i : \mathcal{W} \to \mathcal{S}, \qquad S_i(W) = \operatorname{Exp}_{W_i}\Big(\sum_{k\in\mathcal{N}_i} \omega_{ik} \operatorname{Exp}^{-1}_{W_i}\big(L_k(W_k)\big)\Big), \qquad i \in \mathcal{I}. \tag{2.27} \]
If $\operatorname{Exp}_{W_i}$ were the exponential map of the Riemannian (Levi-Civita) connection, then the argument inside the brackets on the right-hand side would just be the negative Riemannian gradient with respect to $W_i$ of the center-of-mass objective function comprising the points $L_k$, $k \in \mathcal{N}_i$, i.e. the weighted sum of the squared Riemannian distances between $W_i$ and $L_k$ [Jos17, Lemma 6.9.4]. In view of Remark 2.1, this interpretation is only approximately true mathematically, but still correct informally: $S_i(W)$ moves $W_i$ towards the geometric mean of the likelihood vectors $L_k$, $k \in \mathcal{N}_i$. Since $\operatorname{Exp}_{W_i}(0) = W_i$, this mean is equal to $W_i$ if the aforementioned gradient vanishes.

The assignment flow is induced by the locally coupled system of nonlinear ODEs
\[ \dot W = R_W S(W), \qquad W(0) = \mathbb{1}_{\mathcal{W}}, \tag{2.28a} \]
\[ \dot W_i = R_{W_i} S_i(W), \qquad W_i(0) = \mathbb{1}_{\mathcal{S}}, \qquad i \in \mathcal{I}, \tag{2.28b} \]
where $\mathbb{1}_{\mathcal{W}} \in \mathcal{W}$ denotes the barycenter of the assignment manifold (2.19). The solution curve $W(t) \in \mathcal{W}$ is numerically computed by geometric integration [ZSPS19] and determines a labeling $W(T) \in \mathcal{W}_*$ for sufficiently large $T$ after a trivial rounding operation.
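As a concrete illustration of the maps (2.25)–(2.28), the following hedged Python sketch integrates the assignment flow with a plain explicit Euler scheme on a toy problem. The distance matrix, neighborhood weights, and step size are all illustrative assumptions; the paper's own experiments use geometric integration [ZSPS19] instead:

```python
import numpy as np

def expmap(p, v):
    """exp_p(v) = p e^v / <p, e^v>  (lifting map (2.18c); closed form of Lemma 3.1)."""
    q = p * np.exp(v - v.max())          # constant shift leaves the value unchanged
    return q / q.sum()

def expmap_inv(p, q):
    """exp_p^{-1}(q) = Pi_0 log(q/p), cf. (2.18d)."""
    u = np.log(q / p)
    return u - u.mean()

rng = np.random.default_rng(1)
n, c, rho, h = 6, 3, 1.0, 0.1            # nodes, labels, scale, Euler step (toy values)
D = rng.random((n, c))                   # illustrative distance matrix D_F
# uniform weights on cyclic neighborhoods {i-1, i, i+1}
Omega = sum(np.roll(np.eye(n), k, axis=1) for k in (-1, 0, 1)) / 3.0

W = np.full((n, c), 1.0 / c)             # W(0): barycenter, cf. (2.28)
for _ in range(100):
    L = np.array([expmap(W[i], -D[i] / rho) for i in range(n)])   # likelihood (2.25)
    # similarity (2.27), written via exp/exp^{-1} to absorb the map R_{W_i}
    S = np.array([
        expmap(W[i], Omega[i] @ np.array([expmap_inv(W[i], L[k]) for k in range(n)]))
        for i in range(n)
    ])
    # explicit Euler step of the replicator system (2.28): W_i += h R_{W_i} S_i
    W = W + h * (W * S - (W * S).sum(1, keepdims=True) * W)

assert np.allclose(W.sum(1), 1) and (W > 0).all()   # W(t) stays on the manifold
labeling = W.argmax(1)                               # trivial rounding yields labels
```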
We record background material that will be used in Section 4.
Sobolev Spaces
We list a few basic definitions and fix the corresponding notation [Zie89, ABM14]. Throughout this section, $\Omega \subset \mathbb{R}^d$ denotes an open bounded domain. We denote the inner product and the norm of functions $f, g \in L^2(\Omega)$ by
\[ (f,g)_{\Omega} = \int_{\Omega} f g \, dx, \qquad \|f\|_{\Omega} = (f,f)^{1/2}_{\Omega}. \tag{2.29} \]
Functions $f_1$ and $f_2$ are equivalent and identified whenever they merely differ pointwise on a Lebesgue-negligible set of measure zero; $f_1$ and $f_2$ then are said to be equal a.e. (almost everywhere). $H^1(\Omega) = W^{1,2}(\Omega)$ denotes the Sobolev space of functions $f$ with square-integrable weak derivatives $D^\alpha f$ up to order one. $H^1(\Omega)$ is a Hilbert space with inner product and norm denoted by
\[ (f,g)_1 = \sum_{|\alpha|\le 1} (D^\alpha f, D^\alpha g)_{\Omega}, \qquad \|f\|_1 = \Big(\sum_{|\alpha|\le 1} \|D^\alpha f\|^2_{\Omega}\Big)^{1/2}. \tag{2.30} \]

Lemma 2.2 ([Zie89, Cor. 2.1.9]). If $\Omega$ is connected, $u \in H^1(\Omega)$ and $Du = 0$ a.e. on $\Omega$, then $u$ is equivalent to a constant function on $\Omega$.

The closure in $H^1(\Omega)$ of the set of test functions $C^\infty_c(\Omega)$ that are compactly supported in $\Omega$ is the Sobolev space
\[ H^1_0(\Omega) = \overline{C^\infty_c(\Omega)} \subset H^1(\Omega). \tag{2.31} \]
It contains all functions in $H^1(\Omega)$ whose boundary values on $\partial\Omega$ (in the sense of traces) vanish. The space $H^1(\Omega;\mathbb{R}^c)$, $1 \le c \in \mathbb{N}$, contains vector-valued functions $f$ whose component functions $f_i$, $i \in [c]$, are in $H^1(\Omega)$. For notational efficiency, we denote the norm of $f \in H^1(\Omega;\mathbb{R}^c)$ by
\[ \|f\|_1 = \Big(\sum_{i\in[c]} \|f_i\|^2_1\Big)^{1/2}, \tag{2.32} \]
as in the scalar case (2.30). It will be clear from the context whether $f$ is scalar- or vector-valued. The compactness theorem of Rellich-Kondrachov [ABM14, Thm. 5.3.3] says that the canonical embedding
\[ H^1(\Omega) \hookrightarrow L^2(\Omega) \tag{2.33} \]
is compact, i.e. every bounded subset of $H^1(\Omega)$ is relatively compact in $L^2(\Omega)$. This extends to the vector-valued case
\[ H^1(\Omega;\mathbb{R}^c) \hookrightarrow L^2(\Omega;\mathbb{R}^c), \tag{2.34} \]
since $H^1(\Omega;\mathbb{R}^c)$ is isomorphic to $H^1(\Omega) \times \dots \times H^1(\Omega)$, and likewise for $L^2(\Omega;\mathbb{R}^c)$. The dual space of $H^1_0(\Omega)$ is commonly denoted by $H^{-1}(\Omega) = \big(H^1_0(\Omega)\big)'$. Accordingly, we set $H^{-1}(\Omega;\mathbb{R}^c) = \big(H^1_0(\Omega;\mathbb{R}^c)\big)'$.

Weak Convergence Properties, Variational Inequalities
We list a few further basic facts [Zei85, Prop. 38.2], [ABM14, Prop. 2.4.6].

Proposition 2.3. The following assertions hold in a Banach space $X$.
(a) A closed convex subset $C \subset X$ is weakly closed, i.e. if a sequence $(f_n)_{n\in\mathbb{N}} \subset C$ converges weakly to $f$, then $f \in C$.
(b) If $X$ is reflexive (in particular, if $X$ is a Hilbert space), then every bounded sequence in $X$ has a weakly convergent subsequence.
(c) If $f_n$ converges weakly to $f$, then $(f_n)_{n\in\mathbb{N}}$ is bounded and
\[ \|f\|_X \le \liminf_{n\to\infty} \|f_n\|_X. \tag{2.35} \]
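Proposition 2.3 can be made concrete in the sequence space $\ell^2$, where the standard basis vectors converge weakly to zero without converging strongly. A small numeric illustration (the truncation length is an arbitrary choice):

```python
import numpy as np

# In l2, the basis vectors e_n satisfy <e_n, y> = y_n -> 0 for every fixed
# square-summable y, so e_n converges weakly to 0; yet ||e_n|| = 1 for all n,
# so there is no strong convergence. Proposition 2.3(c) is reflected in
# ||0|| = 0 <= liminf ||e_n|| = 1, cf. (2.35).
N = 10_000
y = 1.0 / np.arange(1, N + 1)            # a fixed element of l2 (truncated)

def pairing(n):
    """<e_n, y> = y_n, computed without forming e_n."""
    return y[n]

e = np.zeros(N); e[17] = 1.0             # one representative basis vector
assert abs(pairing(9_999)) < 1e-3        # weak convergence to 0
assert np.linalg.norm(e) == 1.0          # unit norm: no strong convergence
```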
The following theorem states conditions under which minimizers of a functional satisfy a corresponding variational inequality.

Theorem 2.4 ([Zei85, Thm. 46.A(a)]). Let $F : C \to \mathbb{R}$ be a functional on the convex nonempty set $C$ of a real locally convex space $X$, and let $b \in X'$ be a given element. Suppose the Gateaux derivative $F'$ exists on $C$. Then any solution $f$ of
\[ \min_{f\in C} \big\{F(f) - \langle b, f\rangle_{X'\times X}\big\} \tag{2.36} \]
satisfies the variational inequality
\[ \langle F'(f) - b,\, h - f\rangle_{X'\times X} \ge 0 \qquad \text{for all } h \in C. \tag{2.37} \]

3. A Novel Representation of the Assignment Flow

Let $J : \mathcal{W} \to \mathbb{R}$ be a smooth function on the assignment manifold (2.19) and denote the Riemannian gradient of $J$ at $W \in \mathcal{W}$ induced by the Fisher-Rao metric (2.15) by $\operatorname{grad} J(W) \in \mathcal{T}_0$. In view of the embedding (2.20), we can also compute the Euclidean gradient of $J$, denoted by $\partial J(W) \in \mathbb{R}^{n\times c}$. These two gradients are related by [ÅPSS17, Prop. 1]
\[ \operatorname{grad} J(W) = R_W \partial J(W), \qquad W \in \mathcal{W}, \tag{3.1} \]
where $R_W : \mathbb{R}^{n\times c} \to \mathcal{T}_0$ is the product map obtained by applying $R_{W_i}$ from (2.16) to every row vector indexed by $i \in \mathcal{I}$. This relation raises a natural question: Is there a potential $J$ such that the assignment flow (2.28) is a Riemannian gradient descent flow with respect to $J$, i.e. does $R_W S(W) = -\operatorname{grad} J(W)$ hold?

We next show that such a potential does not exist in general (Section 3.1). However, in Section 3.2, we derive a novel representation by decoupling the assignment flow into two separate flows, where one flow steers the other and in this sense dominates the assignment flow. Under the additional assumption that the weights $\omega_{ij}$ of the similarity map $S(W)$ in (2.27) are symmetric, we show that the dominating flow is a Riemannian gradient flow induced by a potential. This result is the basis for the continuous-domain formulation of the assignment flow studied in the subsequent sections.

We next show (Theorem 3.4) that, under some mild assumptions on $D_{\mathcal{F}}$ (2.24) which are always fulfilled in practice, no potential $J$ exists that induces the assignment flow.
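Relation (3.1) can be checked numerically: for a toy potential, the Fisher-Rao inner product of $R_W \partial J(W)$ with an arbitrary tangent vector reproduces the directional derivative of $J$. The quadratic potential and all sizes below are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 5, 3
Omega = rng.random((n, n)); Omega = (Omega + Omega.T) / 2   # symmetric toy matrix

J  = lambda W: -0.5 * np.sum(W * (Omega @ W))               # J(W) = -1/2 <W, Omega W>
dJ = lambda W: -(Omega @ W)                                 # its Euclidean gradient

W = rng.random((n, c)); W /= W.sum(1, keepdims=True)        # a point of the manifold
V = rng.normal(size=(n, c)); V -= V.mean(1, keepdims=True)  # tangent direction, V 1 = 0

# Riemannian gradient via (3.1): row-wise R_{W_i} applied to the Euclidean gradient
G = W * dJ(W) - (W * dJ(W)).sum(1, keepdims=True) * W

# Fisher-Rao inner product g_W(G, V) = sum_ij G_ij V_ij / W_ij ...
lhs = np.sum(G * V / W)
# ... equals the directional derivative dJ(W)[V], here via central differences
h = 1e-6
rhs = (J(W + h * V) - J(W - h * V)) / (2 * h)
assert abs(lhs - rhs) < 1e-6
```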
In order to prove this result, we first derive some properties of the mapping $\exp$ given by (2.18c), as well as explicit expressions for the differential $dS(W)$ of the similarity map (2.27) and its transpose $dS(W)^\top$ with respect to the standard Euclidean structure on $\mathbb{R}^{n\times c}$.

Lemma 3.1. The following properties hold for $\exp_p$ and its inverse (2.18c), (2.18d).
(1) For every $p \in \mathcal{S}$ the map $\exp_p : \mathbb{R}^c \to \mathcal{S}$ can be expressed by
\[ v \mapsto \exp_p(v) = \frac{p\, e^v}{\langle p, e^v\rangle}. \tag{3.2} \]
Its restriction to $T_0$, $\exp_p : T_0 \to \mathcal{S}$, is a diffeomorphism. The differentials of $\exp_p$ and $\exp^{-1}_p$ at $v \in T_0$ and $q \in \mathcal{S}$, respectively, are given by
\[ d\exp_p(v)[u] = R_{\exp_p(v)}[u] \quad \text{and} \quad d\exp^{-1}_p(q)[u] = \Pi_0\Big[\frac{u}{q}\Big] \qquad \text{for all } u \in T_0. \tag{3.3} \]
(2) Let $p, q \in \mathcal{S}$. Then $\operatorname{Exp}^{-1}_p(q) = R_p \exp^{-1}_p(q)$.
(3) Let $q \in \mathcal{S}$. If the linear map $R_q$ from (2.16) is restricted to $T_0$, then $R_q : T_0 \to T_0$ is a linear isomorphism with inverse given by $(R_q|_{T_0})^{-1}(u) = \Pi_0\big[\frac{u}{q}\big]$ for all $u \in T_0$.
(4) If $\mathbb{R}^c$ is viewed as an abelian group, then $\exp : \mathbb{R}^c \times \mathcal{S} \to \mathcal{S}$ given by $(v,p) \mapsto \exp_p(v)$ defines a Lie-group action, i.e.
\[ \exp_p(v+u) = \exp_{\exp_p(u)}(v) \quad \text{and} \quad \exp_p(0) = p \qquad \text{for all } v, u \in T_0 \text{ and } p \in \mathcal{S}. \tag{3.4} \]
Furthermore, the following identities hold for all $p, q, a \in \mathcal{S}$ and $v \in \mathbb{R}^c$:
\[ \exp_p(v) = \exp_q\big(v + \exp^{-1}_q(p)\big), \tag{3.5a} \]
\[ \exp^{-1}_q(p) = -\exp^{-1}_p(q), \tag{3.5b} \]
\[ \exp^{-1}_q(a) = \exp^{-1}_p(a) - \exp^{-1}_p(q). \tag{3.5c} \]

Proof. (1): We have $\operatorname{Exp}_p(v + \lambda p) = \operatorname{Exp}_p(v)$ for every $p \in \mathcal{S}$, $v \in T_0$ and $\lambda \in \mathbb{R}$, as a simple computation using definition (2.18a) of $\operatorname{Exp}_p$ directly shows. Therefore, for every $v \in T_0$,
\[ \exp_p(v) = \operatorname{Exp}_p(R_p v) = \operatorname{Exp}_p\big(pv - \langle v,p\rangle p\big) = \operatorname{Exp}_p(pv) = \frac{p\, e^v}{\langle p, e^v\rangle}. \tag{3.6} \]
If we restrict $\exp_p$ to $T_0$, then an inverse is explicitly given by (2.18d). The differentials (3.3) result from a standard computation.
(2): The formula is a direct consequence of the formulas for $\operatorname{Exp}^{-1}_p$ and $\exp^{-1}_p$ given in (2.18b) and (2.18d), together with the fact (2.17).
(3): Fix any $p \in \mathcal{S}$ and set $v_q := \exp^{-1}_p(q)$ for $q \in \mathcal{S}$. Since $\exp_p : T_0 \to \mathcal{S}$ is a diffeomorphism, the differential $d\exp_p(v_q) : T_0 \to T_0$ is an isomorphism. By (3.3), we have $R_q[u] = R_{\exp_p(v_q)}[u] = d\exp_p(v_q)[u]$ for all $u \in T_0$, showing that $R_q$ is an isomorphism with the corresponding inverse.
(4): Properties (3.4) defining the group action are directly verified using (3.2). Now, suppose $p, q, a \in \mathcal{S}$ and $v \in \mathbb{R}^c$ are arbitrary. Since $\exp_q : T_0 \to \mathcal{S}$ is a diffeomorphism, we have $p = \exp_q\big(\exp^{-1}_q(p)\big)$ and, by the group action property,
\[ \exp_p(v) = \exp_{\exp_q(\exp^{-1}_q(p))}(v) = \exp_q\big(v + \exp^{-1}_q(p)\big), \tag{3.7} \]
which proves (3.5a). To show (3.5b), set $v_a := \exp^{-1}_p(a)$ and substitute this vector into (3.5a). Applying $\exp^{-1}_q$ to both sides then gives
\[ \exp^{-1}_q(a) = \exp^{-1}_q\big(\exp_p(v_a)\big) = v_a + \exp^{-1}_q(p) = \exp^{-1}_p(a) + \exp^{-1}_q(p). \tag{3.8} \]
Setting $a = q$ in this equation, we obtain (3.5b) from
\[ 0 = \exp^{-1}_q(q) = \exp^{-1}_p(q) + \exp^{-1}_q(p). \tag{3.9} \]
Using $\exp^{-1}_q(p) = -\exp^{-1}_p(q)$ in (3.8) yields (3.5c).
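The group-action identities (3.4) and (3.5) of Lemma 3.1 are easy to validate numerically; a hedged sketch with randomly chosen points (all values illustrative):

```python
import numpy as np

def expmap(p, v):
    """exp_p(v) = p e^v / <p, e^v>, cf. (3.2)."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

def expmap_inv(p, q):
    """exp_p^{-1}(q) = Pi_0 log(q/p), cf. (2.18d)."""
    u = np.log(q / p)
    return u - u.mean()

rng = np.random.default_rng(3)
c = 5
rand_point = lambda: (lambda x: x / x.sum())(rng.random(c))
p, q, a = rand_point(), rand_point(), rand_point()
u, v = rng.normal(size=c), rng.normal(size=c)

# group action (3.4): exp_p(v + u) = exp_{exp_p(u)}(v)
assert np.allclose(expmap(p, v + u), expmap(expmap(p, u), v))
# (3.5a): exp_p(v) = exp_q(v + exp_q^{-1}(p))
assert np.allclose(expmap(p, v), expmap(q, v + expmap_inv(q, p)))
# (3.5b): exp_q^{-1}(p) = -exp_p^{-1}(q)
assert np.allclose(expmap_inv(q, p), -expmap_inv(p, q))
# (3.5c): exp_q^{-1}(a) = exp_p^{-1}(a) - exp_p^{-1}(q)
assert np.allclose(expmap_inv(q, a), expmap_inv(p, a) - expmap_inv(p, q))
```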
Lemma 3.2. The $i$-th component of the similarity map $S(W)$ defined by (2.27) can equivalently be expressed as
\[ S_i(W) = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}_i} \omega_{ij}\Big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big)\Big) \qquad \text{for all } i \in \mathcal{I} \text{ and } W \in \mathcal{W}. \tag{3.10} \]

Proof. Consider the expression $\operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big)$ in the sum of the definition (2.27) of $S_i(W)$. Using (3.2) and (3.5a), the likelihood (2.25) can be expressed as
\[ L_j(W_j) = \exp_{W_j}\Big(-\frac{1}{\rho} D_{\mathcal{F};j}\Big) = \exp_{W_i}\Big(\exp^{-1}_{W_i}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big). \tag{3.11} \]
In the following, we set
\[ V_k = \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_k) \qquad \text{for all } k \in \mathcal{I}. \tag{3.12} \]
With this and (3.5c), we have
\[ \exp^{-1}_{W_i}(W_j) = \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_i) = V_j - V_i. \tag{3.13} \]
The two previous identities and Lemma 3.1(2) give
\[ \operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big) = R_{W_i}\Big[\exp^{-1}_{W_i}\big(L_j(W_j)\big)\Big] = R_{W_i}\Big[\exp^{-1}_{W_i}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big] \tag{3.14a} \]
\[ = R_{W_i}\Big[V_j - V_i - \frac{1}{\rho} D_{\mathcal{F};j}\Big]. \tag{3.14b} \]
The sum over the neighboring nodes $\mathcal{N}_i$ in the definition (2.27) of $S_i(W)$ can therefore be rewritten as
\[ \sum_{j\in\mathcal{N}_i} \omega_{ij} \operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big) = \sum_{j\in\mathcal{N}_i} \omega_{ij} R_{W_i}\Big[V_j - V_i - \frac{1}{\rho} D_{\mathcal{F};j}\Big] \tag{3.15a} \]
\[ = R_{W_i}\Big[-V_i + \sum_{j\in\mathcal{N}_i} \omega_{ij}\Big(V_j - \frac{1}{\rho} D_{\mathcal{F};j}\Big)\Big], \tag{3.15b} \]
where we used $\sum_{j\in\mathcal{N}_i} \omega_{ij} = 1$ for the last equation. Setting $Y_i := \sum_{j\in\mathcal{N}_i} \omega_{ij}\big(V_j - \frac{1}{\rho} D_{\mathcal{F};j}\big)$, we then have
\[ S_i(W) = \operatorname{Exp}_{W_i}\big(R_{W_i}[-V_i + Y_i]\big) = \exp_{W_i}\big(-V_i + Y_i\big) = \exp_{\mathbb{1}_{\mathcal{S}}}\big(Y_i\big), \tag{3.16} \]
where the last equality again follows from (3.5a) together with the definition (3.12) of $V_i$.

Lemma 3.3.
The $i$-th component of the differential of the similarity map $S(W) \in \mathcal{W}$ is given by
\[ dS_i(W)[X] = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big] \qquad \text{for all } X \in \mathcal{T}_0 \text{ and } i \in \mathcal{I}. \tag{3.17} \]
Furthermore, the $i$-th component of the adjoint differential $dS(W)^\top : \mathcal{T}_0 \to \mathcal{T}_0$ with respect to the standard Euclidean inner product on $\mathcal{T}_0 \subset \mathbb{R}^{n\times c}$ is given by
\[ dS_i(W)^\top[X] = \sum_{j\in\mathcal{N}_i} \omega_{ji}\, \Pi_0\Big[\frac{R_{S_j(W)} X_j}{W_i}\Big] \qquad \text{for every } X \in \mathcal{T}_0 \text{ and } i \in \mathcal{I}. \tag{3.18} \]

Proof. Define the map $F_i : \mathcal{W} \to \mathbb{R}^c$ by $F_i(W) := \sum_{j\in\mathcal{N}_i} \omega_{ij}\big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\big)$ for all $W \in \mathcal{W}$. Let $\gamma : (-\varepsilon,\varepsilon) \to \mathcal{W}$ be a smooth curve with $\varepsilon > 0$, $\gamma(0) = W$ and $\dot\gamma(0) = X$. By (3.3), we then have
\[ dF_i(W)[X] = \frac{d}{dt} F_i(\gamma(t))\Big|_{t=0} = \sum_{j\in\mathcal{N}_i} \omega_{ij} \frac{d}{dt} \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(\gamma_j(t))\Big|_{t=0} = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, \Pi_0\Big[\frac{X_j}{W_j}\Big]. \tag{3.19} \]
Due to Lemma 3.2, we can express the $i$-th component of the similarity map as $S_i(W) = \exp_{\mathbb{1}_{\mathcal{S}}}\big(F_i(W)\big)$. Therefore, the differential of $S_i$ is given by
\[ dS_i(W)[X] = d\exp_{\mathbb{1}_{\mathcal{S}}}(F_i(W))\big[dF_i(W)[X]\big] = R_{\exp_{\mathbb{1}_{\mathcal{S}}}(F_i(W))}\big[dF_i(W)[X]\big] \tag{3.20a} \]
\[ = R_{S_i(W)}\Big[\sum_{j\in\mathcal{N}_i} \omega_{ij}\, \Pi_0\Big[\frac{X_j}{W_j}\Big]\Big] = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big], \tag{3.20b} \]
where we used $R_{S_i(W)}\Pi_0 = R_{S_i(W)}$ from (2.17) to obtain the last equation.
Now let $W \in \mathcal{W}$ and $X, Y \in \mathcal{T}_0$ be arbitrary. By the assumption (2.8) on the neighborhood structure (2.7), we have $j \in \mathcal{N}_i$ if and only if $i \in \mathcal{N}_j$, i.e. $\psi_{\mathcal{N}_i}(j) = \psi_{\mathcal{N}_j}(i)$. Since $R_{S_i(W)} \in \mathbb{R}^{c\times c}$ is a symmetric matrix, we obtain
\[ \big\langle dS(W)[X], Y\big\rangle = \sum_{i\in\mathcal{I}} \big\langle dS_i(W)[X], Y_i\big\rangle = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{N}_i} \omega_{ij} \Big\langle R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big], Y_i\Big\rangle \tag{3.21a} \]
\[ = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{I}} \psi_{\mathcal{N}_i}(j)\,\omega_{ij} \Big\langle \frac{X_j}{W_j}, R_{S_i(W)}[Y_i]\Big\rangle = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{I}} \psi_{\mathcal{N}_j}(i)\,\omega_{ij} \Big\langle X_j, \frac{R_{S_i(W)}[Y_i]}{W_j}\Big\rangle \tag{3.21b} \]
\[ = \sum_{j\in\mathcal{I}} \sum_{i\in\mathcal{N}_j} \omega_{ij} \Big\langle X_j, \Pi_0\Big[\frac{R_{S_i(W)}[Y_i]}{W_j}\Big]\Big\rangle = \sum_{j\in\mathcal{I}} \Big\langle X_j, \sum_{i\in\mathcal{N}_j} \omega_{ij}\, \Pi_0\Big[\frac{R_{S_i(W)}[Y_i]}{W_j}\Big]\Big\rangle. \tag{3.21c} \]
On the other hand, we have
\[ \big\langle dS(W)[X], Y\big\rangle = \big\langle X, dS(W)^\top[Y]\big\rangle = \sum_{j\in\mathcal{I}} \big\langle X_j, dS_j(W)^\top[Y]\big\rangle. \tag{3.22} \]
Because (3.21) and (3.22) hold for all $X, Y \in \mathcal{T}_0$, the formula for $dS_i(W)^\top[X]$ is proven.

Theorem 3.4.
Suppose $c \ge 3$ and there exists a node $i \in \mathcal{I}$ such that the distance vector $D_{\mathcal{F};i}$ is not constant: $D_{\mathcal{F};i} \notin \mathbb{R}\mathbb{1}$. Then no potential $J : \mathcal{W} \to \mathbb{R}$ exists satisfying $R_W S(W) = -\operatorname{grad} J(W)$, i.e. the assignment flow (2.28) is not a Riemannian gradient descent flow.

Proof. By (2.17), we have $R_W S(W) = R_W \Pi_0 S(W)$, and $R_W : \mathcal{T}_0 \to \mathcal{T}_0$ is a linear isomorphism (Lemma 3.1(3)). Therefore, the question of the existence of a potential $J : \mathcal{W} \to \mathbb{R}$ for the assignment flow (2.28) can be transferred to the Euclidean setting by applying $(R_W|_{\mathcal{T}_0})^{-1}$ to both sides of the equation $R_W S(W) = -\operatorname{grad} J(W)$, i.e.
\[ R_W S(W) = -\operatorname{grad} J(W) = -R_W \partial J(W) \quad\Leftrightarrow\quad \Pi_0 S(W) = -\Pi_0 \partial J(W) \in \mathcal{T}_0. \tag{3.23} \]
If such a potential $J$ exists, then the negative Hessian of $J$ satisfies
\[ -\Pi_0 \operatorname{Hess} J(W) = d\big(-\Pi_0 \partial J\big)(W) = d(\Pi_0 \circ S)(W) = \Pi_0\, dS(W) = dS(W), \tag{3.24} \]
where the last equation follows from $dS(W) : \mathcal{T}_0 \to \mathcal{T}_0$. Furthermore, $\operatorname{Hess} J(W)$, and therefore also $dS(W)$, must be symmetric with respect to the Euclidean scalar product on $\mathcal{T}_0$. Hence, in order to prove that a potential cannot exist, we show that $dS(W)$ fails to be symmetric at some point $W \in \mathcal{W}$. To this end, we construct $W \in \mathcal{W}$ and $X \in \mathcal{T}_0$ with $dS(W)[X] - dS(W)^\top[X] \ne 0$. It suffices to show
\[ dS_i(W)[X] - dS_i(W)^\top[X] \ne 0 \qquad \text{for some row index } i \in \mathcal{I}. \tag{3.25} \]
To simplify notation, we write $D_i$ instead of $D_{\mathcal{F};i}$ in the remainder of the proof. Due to the hypothesis, we have
\[ D_i = D_{\mathcal{F};i} \notin \mathbb{R}\mathbb{1}. \tag{3.26} \]
Let $k, l \in [c]$ be indices such that
\[ D_{ik} = \min_{r\in[c]} D_{ir} \qquad \text{and} \qquad D_{il} = \max_{r\in[c]} D_{ir}. \tag{3.27} \]
Relation (3.26) implies
\[ D_{ik} < D_{il} \qquad \text{and} \qquad e^{-\frac{1}{\rho} D_{ik}} > e^{-\frac{1}{\rho} D_{il}}. \tag{3.28} \]
Define
\[ u = e_k - e_l \in T_0, \qquad e_k, e_l \in \mathcal{B}_c. \tag{3.29} \]
Since $c \ge 3$, there is also a point $p \in \mathcal{S}$ with
\[ p \ne \mathbb{1}_{\mathcal{S}} \qquad \text{and} \qquad p_k = p_l, \tag{3.30} \]
e.g. by choosing $0 < \alpha < \frac{1}{c}$ and setting $p_k = p_l = \alpha$ and $p_r = \frac{1-2\alpha}{c-2}$ for $r \ne k, l$. With these choices, we define the point $W^p \in \mathcal{W}$,
\[ W^p_j = \exp_p\Big(\frac{1}{\rho} D_j\Big) \qquad \text{for all } j \in \mathcal{I}. \tag{3.31} \]
Also, set $v := \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(p)$. Then $W^p_j = \exp_{\mathbb{1}_{\mathcal{S}}}\big(v + \frac{1}{\rho} D_j\big)$ by (3.5a), and Lemma 3.2 implies
\[ S_i(W^p) = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}(i)} \omega_{ij}\Big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W^p_j) - \frac{1}{\rho} D_j\Big)\Big) \tag{3.32a} \]
\[ = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}(i)} \omega_{ij}\, v\Big) = \exp_{\mathbb{1}_{\mathcal{S}}}(v) = p \tag{3.32b} \]
for all $i \in \mathcal{I}$. Now, define $X^u \in \mathcal{T}_0$ with rows
\[ X^u_k = \begin{cases} u \in T_0, & \text{if } k = i, \\ 0, & \text{if } k \ne i. \end{cases} \tag{3.33} \]
Using the expressions for $dS_i(W^p)$ and $dS_i(W^p)^\top$ from Lemma 3.3, we obtain
\[ dS_i(W^p)[X^u] - dS_i(W^p)^\top[X^u] = \omega_{ii}\, R_{S_i(W^p)}\Big[\frac{X^u_i}{W^p_i}\Big] - \omega_{ii}\, \Pi_0\Big[\frac{R_{S_i(W^p)} X^u_i}{W^p_i}\Big] \tag{3.34a} \]
\[ \overset{(3.32)}{=} \omega_{ii}\, R_p\Big[\frac{u}{\exp_p(\frac{1}{\rho} D_i)}\Big] - \omega_{ii}\, \Pi_0\Big[\frac{R_p u}{\exp_p(\frac{1}{\rho} D_i)}\Big] \tag{3.34b} \]
\[ \overset{(3.2)}{=} \omega_{ii}\, \big\langle p, e^{\frac{1}{\rho} D_i}\big\rangle \Big( R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] - \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big]\Big). \tag{3.34c} \]
Since $\omega_{ii} \langle p, e^{\frac{1}{\rho} D_i}\rangle > 0$, we only have to check the expression inside the brackets. As for the first term, using (2.16), we have
\[ R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] = u\, e^{-\frac{1}{\rho} D_i} - \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle p. \tag{3.35a} \]
Setting $a := \big\langle e^{-\frac{1}{\rho} D_i}, \mathbb{1}_{\mathcal{S}}\big\rangle \mathbb{1}_c - e^{-\frac{1}{\rho} D_i}$, we obtain for the second term
\[ \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big] = \Pi_0\Big[e^{-\frac{1}{\rho} D_i} u - \langle u, p\rangle e^{-\frac{1}{\rho} D_i}\Big] = e^{-\frac{1}{\rho} D_i} u - \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle \mathbb{1}_{\mathcal{S}} + \langle u, p\rangle\, a. \tag{3.35b} \]
Thus, the term inside the brackets reads
\[ R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] - \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big] = -\big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle p + \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle \mathbb{1}_{\mathcal{S}} - \langle u, p\rangle\, a \tag{3.36a} \]
\[ = \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle\big(\mathbb{1}_{\mathcal{S}} - p\big) - \langle u, p\rangle\, a. \tag{3.36b} \]
Now, (3.29) and (3.30) imply
\[ \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle = e^{-\frac{1}{\rho} D_{ik}} - e^{-\frac{1}{\rho} D_{il}} > 0 \qquad \text{and} \qquad \langle u, p\rangle = p_k - p_l = 0, \tag{3.37} \]
such that we can conclude
\[ \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle\big(\mathbb{1}_{\mathcal{S}} - p\big) - \langle u, p\rangle\, a = \big(e^{-\frac{1}{\rho} D_{ik}} - e^{-\frac{1}{\rho} D_{il}}\big)\big(\mathbb{1}_{\mathcal{S}} - p\big) \ne 0. \tag{3.38} \]
This proves (3.25) and consequently the theorem.

Even though Theorem 3.4 says that no potential exists for the assignment flow in general, we reveal in this section a 'hidden' potential flow under an additional assumption. To this end, we decouple the assignment flow into two components and show that one component depends on the second one. The dominating second one, therefore, provides a new parametrization of the assignment flow. Assuming symmetry of the averaging matrix defined below by (3.39), the dominating flow becomes a Riemannian gradient descent flow. The corresponding potential, defined on a continuous domain, will be studied in subsequent sections.

For notational efficiency, we collect all weights (2.26) into the averaging matrix $\Omega^\omega \in \mathbb{R}^{n\times n}$ with
\[ \Omega^\omega_{ij} := \psi_{\mathcal{N}_i}(j)\, \omega_{ij} = \begin{cases} \omega_{ij} & \text{if } j \in \mathcal{N}_i, \\ 0 & \text{else}, \end{cases} \qquad i, j \in \mathcal{I}. \tag{3.39} \]
$\Omega^\omega$ encodes the spatial structure of the graph and the weights. For an arbitrary matrix $M \in \mathbb{R}^{n\times c}$, the average of its row vectors using the weights indexed by the neighborhood $\mathcal{N}_i$ is given by
\[ \sum_{k\in\mathcal{N}_i} \omega_{ik} M_k = \sum_{k\in\mathcal{I}} \Omega^\omega_{ik} M_k = M^\top \Omega^\omega_i. \tag{3.40} \]
Thus, all row vector averages are given as row vectors of the matrix $\Omega^\omega M$. We now introduce a new representation of the assignment flow.
Proposition 3.5
The assignment flow (2.28) is equivalent to the system ˙ W = R W S with W (0) = W (3.41 a ) ˙ S = R S [Ω S ] with S (0) = S ( W ) . (3.41 b ) Remark 3.6
We observe that the flow W ( t ) is completely determined by S ( t ) . In the following,we refer to the dominating part (3.41 b ) as the S -flow. Proof
Let W ( t ) be a solution of the assignment flow, i.e. ˙ W i = R W i S i ( W ) for all i ∈ I . Set S ( t ) := S ( W ( t )) . Then (3.41 a ) is immediate from the assumption on W . Using the expressionfor dS i ( W ) from Lemma 3.3 gives ˙ S i = ddt S ( W ) i = dS i ( W )[ ˙ W ] = (cid:88) j ∈N i ω ij R S i ( W ) (cid:104) ˙ W j W j (cid:105) . (3.42)Since W solves the assignment flow and R S i ( W ) = R S i ( W ) Π by (2.17) with ker(Π ) = R1 c ,it follows using the explicit expresssion (2.16) of R S i ( W ) that R S i ( W ) (cid:104) ˙ W j W j (cid:105) = R S i ( W ) (cid:104) R W j S j ( W ) W j (cid:105) = R S i ( W ) (cid:104) S j ( W ) − (cid:104) W j , S j ( W ) (cid:105) c (cid:105) (3.43 a ) = R S i ( W ) (cid:2) S j ( W ) (cid:3) . (3.43 b )Back-substitution of this identity into (3.42), pulling the linear map R S i ( W ) out of the sum andkeeping S i ( W ) = S i in mind, results in ˙ S i = R S i ( W ) (cid:2) S j ( W ) (cid:3) = R S i (cid:104) (cid:88) j ∈N i ω ij S j (cid:105) = R S i [ S (cid:62) Ω ωi ] for all i ∈ I . (3.44)Collecting these vectors as row vectors of the matrix ˙ S gives (3.41 b ). Remark 3.7
Henceforth, we write $S$ for the $S$-flow to stress the underlying connection to the assignment flow and to simplify the notation. We next show that the $S$-flow, which essentially determines the assignment flow (Remark 3.7), becomes a Riemannian descent flow under the additional assumption that the averaging matrix (3.39) is symmetric. Proposition 3.8
Suppose the weights defining the similarity map in (2.27) are symmetric, i.e. $(\Omega^\omega)^\top = \Omega^\omega$. Then the $S$-flow (3.41b) is a Riemannian gradient descent flow $\dot S = -\operatorname{grad} J(S)$, induced by the potential

$J(S) := -\tfrac{1}{2}\langle S, \Omega^\omega S\rangle$, $S \in \mathcal W$. (3.45)

Proof
Let $\gamma \colon (-\varepsilon, \varepsilon) \to \mathcal W$, $\varepsilon > 0$, be any smooth curve with $\dot\gamma(0) = V \in \mathbb R^{n\times c}$ and $\gamma(0) = S$. By the symmetry of $\Omega^\omega$, we have

$\langle \partial J(S), V\rangle = dJ(S)[V] = \frac{d}{dt} J(\gamma(t))\big|_{t=0} = -\langle \Omega^\omega S, V\rangle$ for all $V \in \mathbb R^{n\times c}$.

Therefore, $\partial J(S) = -\Omega^\omega S$. Thus, the Riemannian gradient is given by $\operatorname{grad} J(S) = R_S[\partial J(S)] = -R_S[\Omega^\omega S]$.

We define the matrix

$L_G = I_n - \Omega^\omega$, (3.46)

where $I_n \in \mathbb R^{n\times n}$ is the identity matrix. Since $I_n = \operatorname{Diag}(\Omega^\omega \mathbb 1_n)$ by (2.26) is the degree matrix of the symmetric averaging matrix $\Omega^\omega$, $L_G$ can be regarded as the Laplacian (matrix) of the underlying undirected weighted graph $G = (\mathcal V, \mathcal E)$. For the analysis of the $S$-flow it will be convenient to rewrite the potential (3.45) accordingly. Proposition 3.9
Under the assumption of Proposition 3.8, the potential (3.45) can be written in the form

$J(S) = \tfrac{1}{2}\langle S, L_G S\rangle - \tfrac{1}{2}\|S\|^2 = \tfrac{1}{4}\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\,\|S_i - S_j\|^2 - \tfrac{1}{2}\|S\|^2$. (3.47)

The matrix $L_G$ is symmetric, positive semidefinite and $L_G \mathbb 1_n = 0$. Proof
We have $J(S) = -\tfrac{1}{2}\big(\langle S, (\Omega^\omega - I_n) S\rangle + \langle S, S\rangle\big) = \tfrac{1}{2}\big(\langle S, L_G S\rangle - \|S\|^2\big)$. Thus, we focus on the sum of (3.47).

First, note that $\|S_j - S_i\|^2 = \langle S_j, S_j - S_i\rangle + \langle S_i, S_i - S_j\rangle$. Since $\psi_{\mathcal N_i}(j) = \psi_{\mathcal N_j}(i)$ and $\omega_{ij} = \omega_{ji}$ by assumption, we have

$\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_j, S_j - S_i\rangle = \sum_{i,j\in\mathcal I} \psi_{\mathcal N_i}(j)\,\omega_{ij}\langle S_j, S_j - S_i\rangle = \sum_{i,j\in\mathcal I} \psi_{\mathcal N_j}(i)\,\omega_{ji}\langle S_j, S_j - S_i\rangle$ (3.48a)
$= \sum_{j\in\mathcal I}\sum_{i\in\mathcal N_j} \omega_{ji}\langle S_j, S_j - S_i\rangle = \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle$, (3.48b)

where the last equality follows by renaming the indices $i$ and $j$. Thus, using (2.26),

$\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\|S_i - S_j\|^2 = \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_j, S_j - S_i\rangle + \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle$ (3.49a)
$= 2\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle = 2\sum_{i\in\mathcal I}\Big\langle S_i,\ S_i - \sum_{j\in\mathcal N_i}\omega_{ij} S_j\Big\rangle$ (3.49b)
$= 2\sum_{i\in\mathcal I}\langle S_i, (L_G S)_i\rangle = 2\,\langle S, L_G S\rangle$. (3.49c)

The properties of $L_G$ follow from the symmetry of $\Omega^\omega$, nonnegativity of the quadratic form (3.49) and definition (3.46).

4. Continuous-Domain Variational Approach

In this section, we study a continuous-domain variational formulation of the potential of Proposition 3.9. We confine ourselves to uniform weights (2.26) and neighborhoods (2.7) that only contain the nearest neighbors of each vertex $i$, such that $L_G$ becomes the discretized ordinary Laplacian. (For undirected graphs, the graph Laplacian is commonly defined via the weighted adjacency matrix with diagonal entries 0, whereas $\Omega^\omega_{ii} = \omega_{ii} > 0$. The diagonal entries do not affect the quadratic form (3.47), however.) As a result, we consider the problem to minimize the functional

$E_\alpha \colon H^1(\mathcal M; \mathbb R^c) \to \mathbb R$, (4.1a)
$E_\alpha(S) := \int_{\mathcal M} \|DS(x)\|^2 - \alpha\,\|S(x)\|^2 \, dx$, $\alpha > 0$. (4.1b)

Throughout this section, $\mathcal M \subset \mathbb R^2$ is a simply connected, bounded, open subset of the Euclidean plane. The parameter $\alpha$ controls the interaction between regularization and enforcing integrality when $S(x)$, $x \in \mathcal M$, is restricted to values in the probability simplex.

We prove well-posedness for vanishing (Section 4.1) and Dirichlet boundary conditions (Section 4.2), respectively, and specify explicitly the set of minimizers in the former case. The gradient descent flow corresponding to the latter case, initialized by means of given data and with parameter value $\alpha = 1$, may be seen as a continuous-domain extension of the assignment flow that is parametrized according to Proposition 3.5 and operates at the smallest spatial scale in terms of the size $|\mathcal N_i|$ of uniform neighborhoods (2.7) (in the discrete formulation (2.28): nearest-neighbor averaging). We illustrate this by a numerical example (Section 4.3), based on discretizing (4.1) and applying an algorithm that mimics the $S$-flow and converges to a local minimum of the nonconvex functional (4.1) by solving a sequence of convex programs.

We point out that $\mathcal M$ could be turned into a Riemannian manifold using a metric that reflects image features (edges etc.), as was proposed with the Laplace-Beltrami framework for image denoising [KMS00]. In this work we focus on the essential point, however, that distinguishes image denoising from image labeling, i.e. the interaction of the two terms of (4.1), which essentially is a consequence of the information geometry of the assignment manifold $\mathcal W$ (2.19).
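To make the interplay of Propositions 3.5 and 3.8 concrete before passing to the continuous domain, here is a minimal numerical sketch (my own illustration; the paper's numerics use geometric ODE integration instead of plain explicit Euler): the $S$-flow (3.41b) with the replicator map $R_S$ applied row-wise, along which the potential $J(S) = -\tfrac{1}{2}\langle S, \Omega^\omega S\rangle$ of Proposition 3.8 decreases for a symmetric averaging matrix.

```python
import numpy as np

def replicator(S, V):
    """Row-wise replicator map: (R_{S_i} V_i) = S_i * (V_i - <S_i, V_i> 1)."""
    return S * (V - np.sum(S * V, axis=1, keepdims=True))

def s_flow_euler(S0, Omega, h=0.05, steps=400):
    """Explicit Euler for dS/dt = R_S[Omega S], cf. (3.41b) -- only a sketch."""
    S = S0.copy()
    for _ in range(steps):
        S = S + h * replicator(S, Omega @ S)
    return S

n, c = 8, 3
# symmetric, row-stochastic averaging matrix: uniform weights on a cycle graph
Omega = (np.eye(n) + np.roll(np.eye(n), 1, 1) + np.roll(np.eye(n), -1, 1)) / 3

rng = np.random.default_rng(0)
S0 = rng.random((n, c)); S0 /= S0.sum(axis=1, keepdims=True)

J = lambda S: -0.5 * np.sum(S * (Omega @ S))    # potential (3.45)
S1 = s_flow_euler(S0, Omega)
assert np.allclose(S1.sum(axis=1), 1.0)          # rows stay in the simplex
assert J(S1) <= J(S0)                            # Riemannian descent of (3.45)
```

Note that each row of the replicator update sums to zero, so the Euler iterates stay on the simplex exactly; positivity is preserved for sufficiently small step size $h$.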
Based on (2.3) we define the closed convex set

$\mathcal D(\mathcal M) = \{ S \in H^1(\mathcal M; \mathbb R^c) \colon S(x) \in \Delta_c \ \text{a.e. in } \mathcal M \}$ (4.2)

and focus on the variational problem

$\inf_{S \in \mathcal D(\mathcal M)} E_\alpha(S)$, (4.3)

with $E_\alpha$ given by (4.1). $E_\alpha$ is smooth but nonconvex. We specify the set of minimizers (Prop. 4.2). Recall notation (2.1). Lemma 4.1
Let $p \in \Delta_c$. Then $\|p\| = 1$ if and only if $p \in B_c$. Proof
The 'if' statement is obvious. As for the 'only if', suppose $p \notin B_c$, i.e. $p_i < 1$ for all $i \in [c]$. Then $p_i^2 < p_i$ whenever $p_i > 0$, and hence $\|p\|^2 < \|p\|_1 = 1$. Proposition 4.2
The functional $E_\alpha \colon \mathcal D(\mathcal M) \to \mathbb R$ given by (4.1) is lower bounded,

$E_\alpha(S) \ge -\alpha\,\operatorname{Vol}(\mathcal M) > -\infty$, $\forall S \in \mathcal D(\mathcal M)$. (4.4)

This lower bound is attained at some point in

$\arg\min_{S \in \mathcal D(\mathcal M)} E_\alpha(S) = \begin{cases} \{S_{e_1}, \dots, S_{e_c}\}, & \text{if } \alpha > 0, \\ \{S_p \colon \mathcal M \to \Delta_c \colon p \in \Delta_c\}, & \text{if } \alpha = 0, \end{cases}$ (4.5)

where, for any $p \in \Delta_c$, $S_p$ denotes the constant vector field $x \mapsto S_p(x) = p$. Proof
Let $p \in \Delta_c$. Then $\|p\|^2 \le \|p\|_1 = 1$. It follows for $S \in \mathcal D(\mathcal M)$ that

$E_\alpha(S) \ge -\alpha\,\|S\|_{\mathcal M}^2 \ge -\alpha\,\|\mathbb 1\|_{\mathcal M}^2 = -\alpha\,\operatorname{Vol}(\mathcal M)$, (4.6)

which is (4.4).

We next show that the right-hand side of (4.5) specifies minimizers of $E_\alpha$. For any $p \in \Delta_c$, the constant vector field $S_p$ is contained in $\mathcal D(\mathcal M)$. Consider specifically $S_{e_i}$, $i \in [c]$. Since $\|S_{e_i}(x)\| = \|e_i\| = 1$ and $DS_{e_i} \equiv 0$, the lower bound is attained, $E_\alpha(S_{e_i}) = -\alpha\,\operatorname{Vol}(\mathcal M)$, and the functions $\{S_{e_1}, \dots, S_{e_c}\}$ minimize $E_\alpha$, for every $\alpha \ge 0$. If $\alpha = 0$, then the constant functions $S_p$ are minimizers as well, for any $p \in \Delta_c$, since then

$E_\alpha(S_p) = \|DS_p\|_{\mathcal M}^2 = 0 = -0 \cdot \operatorname{Vol}(\mathcal M)$. (4.7)

We conclude by showing that no minimizers other than (4.5) exist. Let $S^* \in \mathcal D(\mathcal M)$ be another minimizer of $E_\alpha$ with $E_\alpha(S^*) = -\alpha\,\operatorname{Vol}(\mathcal M)$. We distinguish the two cases $\alpha = 0$ and $\alpha > 0$.

If $\alpha = 0$, then $S^*$ satisfies (4.7) and $\|DS^*\|_{\mathcal M}^2 = 0$. Since $\|DS^*_{;i}\|_{\mathcal M}^2 \le \|DS^*\|_{\mathcal M}^2 = 0$ for every $i \in [c]$, $S^*$ is constant by Lemma 2.2, i.e. a $p \in \Delta_c$ exists such that $S^* = S_p$ a.e.

If $\alpha > 0$, then using the equation $E_\alpha(S^*) = -\alpha\,\operatorname{Vol}(\mathcal M)$ and $\|S^*(x)\| \le 1$ gives

$\alpha\,\operatorname{Vol}(\mathcal M) \le \|DS^*\|_{\mathcal M}^2 + \alpha\,\operatorname{Vol}(\mathcal M) = \|DS^*\|_{\mathcal M}^2 - E_\alpha(S^*) = \alpha\,\|S^*\|_{\mathcal M}^2$ (4.8a)
$\le \alpha\,\|\mathbb 1\|_{\mathcal M}^2 = \alpha\,\operatorname{Vol}(\mathcal M)$, (4.8b)

which shows $\|DS^*\|_{\mathcal M}^2 = 0$ and hence, by Lemma 2.2 again, $S^* = S_p$ for some $p \in \Delta_c$. The preceding inequalities also imply $\operatorname{Vol}(\mathcal M) = \|S^*\|_{\mathcal M}^2$, i.e. $\|S^*(x)\| = 1$ a.e. By Lemma 4.1, we conclude $S^* = S_p$ with $p \in B_c$, that is $S^* \in \{S_{e_1}, \dots, S_{e_c}\}$.

Proposition 4.2 highlights the effect of the concave term of the objective $E_\alpha$ (4.1): labelings are enforced in the absence of data.
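The minimizer characterization (4.5) is easy to observe in a discrete surrogate of $E_\alpha$: on a grid, replace $\int \|DS\|^2$ by squared forward differences and $\operatorname{Vol}(\mathcal M)$ by the number of grid points. The toy discretization below (my own illustration, not taken from the paper) confirms that constant vertex-valued fields attain the bound (4.4), while non-vertex constant fields do not for $\alpha > 0$:

```python
import numpy as np

def E_disc(S, alpha):
    """Discrete surrogate of (4.1) on a 1D grid (unit spacing):
    sum of squared forward differences minus alpha * sum_i ||S_i||^2."""
    return np.sum((S[1:] - S[:-1])**2) - alpha * np.sum(S**2)

n, c, alpha = 10, 3, 1.0
vol = n                                  # discrete 'volume' = number of points

e1 = np.zeros(c); e1[0] = 1.0            # a simplex vertex (unit label vector)
S_vertex = np.tile(e1, (n, 1))           # constant vertex-valued field
S_center = np.full((n, c), 1.0 / c)      # constant barycenter field

assert np.isclose(E_disc(S_vertex, alpha), -alpha * vol)  # bound (4.4) attained
assert E_disc(S_center, alpha) > E_disc(S_vertex, alpha)  # not minimal, alpha > 0
assert np.isclose(E_disc(S_center, 0.0), 0.0)             # any constant field minimizes for alpha = 0
```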
Below, the latter are taken into account (i) by imposing non-zero boundary conditions and (ii) by initializing a corresponding gradient flow (Section 4.3). In this section, we consider the case where boundary conditions are imposed by restricting the feasible set of problem (4.3) to

$\mathcal A_g(\mathcal M) = \{ S \in \mathcal D(\mathcal M) \colon S - g \in H^1_0(\mathcal M; \mathbb R^c) \} = \big(g + H^1_0(\mathcal M; \mathbb R^c)\big) \cap \mathcal D(\mathcal M)$ (4.9)

for some fixed $g$ that prescribes simplex-valued boundary values (in the trace sense). As the intersection of a closed affine subspace and a closed convex set, $\mathcal A_g(\mathcal M)$ is closed and convex.

Weak lower semicontinuity is a key property for proving the existence of minimizers. In the case of $E_\alpha$ (4.1) this is not immediate, due to the lack of convexity.
Proposition 4.3
The functional $E_\alpha$ given by (4.1) is weakly sequentially lower semicontinuous on $\mathcal A_g(\mathcal M)$, i.e. for any sequence $(S_n)_{n\in\mathbb N} \subset \mathcal A_g(\mathcal M)$ weakly converging to $S \in \mathcal A_g(\mathcal M)$, the inequality

$E_\alpha(S) \le \liminf_{n\to\infty} E_\alpha(S_n)$ (4.10)

holds. Proof
Let $S_n \rightharpoonup S$ converge weakly in $\mathcal A_g(\mathcal M) \subset H^1(\mathcal M; \mathbb R^c)$. Then, by Prop. 2.3(c),

$\|S\|_{H^1(\mathcal M)} \le \liminf_{n\to\infty} \|S_n\|_{H^1(\mathcal M)}$. (4.11)

Since $S, S_n \in \mathcal A_g(\mathcal M)$, we also have $(S_n - g) \rightharpoonup (S - g)$ in $H^1_0(\mathcal M; \mathbb R^c)$ by (4.9) and consequently $S_n \to S$ strongly in $L^2(\mathcal M; \mathbb R^c)$ due to (2.34). Taking into account (4.11) and $\liminf_{n\to\infty}\big(-(1+\alpha)\|S_n\|_{\mathcal M}^2\big) = -(1+\alpha)\lim_{n\to\infty}\|S_n\|_{\mathcal M}^2 = -(1+\alpha)\|S\|_{\mathcal M}^2$, we obtain

$E_\alpha(S) = \|S\|_{H^1(\mathcal M)}^2 - (1+\alpha)\|S\|_{\mathcal M}^2 \le \liminf_{n\to\infty} \|S_n\|_{H^1(\mathcal M)}^2 + \liminf_{n\to\infty}\big(-(1+\alpha)\|S_n\|_{\mathcal M}^2\big)$ (4.12a)
$\le \liminf_{n\to\infty} E_\alpha(S_n)$. (4.12b)

We are now prepared to show that $E_\alpha$ attains its minimal value on $\mathcal A_g(\mathcal M)$, following the basic proof pattern of [Zei85, Ch. 38]. Theorem 4.4
Let $E_\alpha$ be given by (4.1). There exists $S^* \in \mathcal A_g(\mathcal M)$ such that

$E_\alpha^* := E_\alpha(S^*) = \inf_{S \in \mathcal A_g(\mathcal M)} E_\alpha(S)$. (4.13)

Proof
Let $(S_n)_{n\in\mathbb N} \subset \mathcal A_g(\mathcal M)$ be a minimizing sequence such that

$\lim_{n\to\infty} E_\alpha(S_n) = E_\alpha^*$. (4.14)

Then there exists some sufficiently large $n_0 \in \mathbb N$ such that

$E_\alpha^* + 1 \ge E_\alpha(S_n) = \|S_n\|_{H^1(\mathcal M)}^2 - (1+\alpha)\|S_n\|_{\mathcal M}^2$, $\forall n \ge n_0$. (4.15)

Since $S_n(x) \in \Delta_c$ for a.e. $x \in \mathcal M$, we have $\|S_n\|_{\mathcal M}^2 \le \operatorname{Vol}(\mathcal M)$ and hence obtain

$\|S_n\|_{H^1(\mathcal M)}^2 \le E_\alpha^* + 1 + (1+\alpha)\|S_n\|_{\mathcal M}^2 \le E_\alpha^* + 1 + (1+\alpha)\operatorname{Vol}(\mathcal M)$, $\forall n \ge n_0$. (4.16)

Thus the sequence $(S_n)_{n\in\mathbb N} \subset H^1(\mathcal M; \mathbb R^c)$ is bounded and, by Prop. 2.3(c), we may extract a weakly converging subsequence $S_{n_k} \rightharpoonup S^* \in H^1(\mathcal M; \mathbb R^c)$. Since $\mathcal A_g(\mathcal M) \subset H^1(\mathcal M; \mathbb R^c)$ is closed and convex, Prop. 2.3(a) implies $S^* \in \mathcal A_g(\mathcal M)$. Consequently, by Prop. 4.3 and (4.14),

$E_\alpha(S^*) \le \liminf_{k\to\infty} E_\alpha(S_{n_k}) = \lim_{k\to\infty} E_\alpha(S_{n_k}) = E_\alpha^*$, (4.17)

which implies $E_\alpha(S^*) = E_\alpha^*$, i.e. $S^* \in \mathcal A_g(\mathcal M)$ minimizes $E_\alpha$.

We consider the variational problem (4.13),

$\inf_{S \in \mathcal A_g(\mathcal M)} \int_{\mathcal M} \|DS\|^2 - \alpha\|S\|^2 \, dx$, (4.18)

for some fixed $g$ specifying the boundary values $S|_{\partial\mathcal M} = g|_{\partial\mathcal M}$, and the problem to compute a local minimum numerically using an optimization scheme that mimics the $S$-flow of Proposition 3.5.

Based on (4.9), we rewrite the problem in the form

$\inf_{f \in H^1_0(\mathcal M; \mathbb R^c)} \big\{ \|D(g+f)\|_{\mathcal M}^2 - \alpha\|g+f\|_{\mathcal M}^2 + \delta_{\mathcal D(\mathcal M)}(g+f) \big\}$ (4.19a)
$= \inf_{f \in H^1_0(\mathcal M; \mathbb R^c)} \big\{ \|Df\|_{\mathcal M}^2 + 2\langle Dg, Df\rangle_{\mathcal M} - \alpha\big(\|f\|_{\mathcal M}^2 + 2\langle g, f\rangle_{\mathcal M}\big) + \delta_{\mathcal D(\mathcal M)}(g+f) \big\} + C$, (4.19b)

where the constant $C$ collects terms not depending on $f$. We discretize the problem as follows. $f$ becomes a vector $f \in \mathbb R^{cn}$ with $n = |\mathcal I|$ subvectors $f_i \in \mathbb R^c$, $i \in \mathcal I$, or alternatively with $c = |\mathcal J|$ subvectors $f^j$, $j \in \mathcal J$.
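The pieces of this discretization can be assembled into a small working sketch. The projection routine below is the standard sort-based Euclidean projection onto the probability simplex (a generic tool, not taken from the paper), and a plain projected-gradient loop serves as a simple stand-in for the convex programs solved at each step of the scheme described next; boundary rows are held fixed as Dirichlet data.

```python
import numpy as np

def project_simplex(V):
    """Row-wise Euclidean projection onto the probability simplex
    (standard sort-based algorithm)."""
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    k = np.arange(1, V.shape[1] + 1)
    rho = np.sum(U - css / k > 0, axis=1)
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def minimize_energy_1d(S0, alpha=1.0, lr=0.05, steps=1000):
    """Projected gradient on a 1D discretization of (4.18):
    E(S) = sum_i ||S_{i+1} - S_i||^2 - alpha * sum_i ||S_i||^2,
    with the first and last rows fixed (Dirichlet boundary data)."""
    S = S0.copy()
    for _ in range(steps):
        lap = np.zeros_like(S)
        lap[1:-1] = 2 * S[1:-1] - S[:-2] - S[2:]   # 1D discrete Laplacian
        grad = 2 * lap - 2 * alpha * S
        S[1:-1] = project_simplex(S[1:-1] - lr * grad[1:-1])
    return S

n, c = 12, 3
rng = np.random.default_rng(1)
S0 = rng.random((n, c)); S0 /= S0.sum(axis=1, keepdims=True)
S0[0] = S0[-1] = np.array([1.0, 0.0, 0.0])   # simplex-valued boundary data

S = minimize_energy_1d(S0)
assert np.allclose(S.sum(axis=1), 1.0) and (S >= 0).all()   # feasibility
assert np.allclose(S[0], S0[0]) and np.allclose(S[-1], S0[-1])
```

The concave $-\alpha\|S\|^2$ term pushes interior rows toward simplex vertices, while the Laplacian term couples neighboring rows; the simplex projection keeps the iterates feasible throughout.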
The inner product $\langle g, f\rangle_{\mathcal M}$ is replaced by $\langle g, f\rangle = \sum_{i\in[n]} \langle g_i, f_i\rangle = \sum_{j\in[c]} \langle g^j, f^j\rangle$. We keep the symbols $f, g$ for simplicity and indicate the discretized setting by the subscript $n$ as introduced next.

$D$ becomes a gradient matrix $D_n$ that estimates the gradient of each subvector $f^j$ separately, such that

$L_n f := D_n^\top D_n f$ (4.20)

is the basic discrete 5-point stencil Laplacian applied to each subvector $f^j$. The feasible set $\mathcal D(\mathcal M)$ is replaced by the closed convex set

$\mathcal D_n := \{ f \ge 0 \colon \langle \mathbb 1_c, f_i\rangle = 1,\ \forall i \in \mathcal I \}$. (4.21)

Thus the discretized problem reads

$\inf_f \big\{ \|D_n f\|^2 + 2\langle L_n g - \alpha g, f\rangle - \alpha\|f\|^2 + \delta_{\mathcal D_n}(g+f) \big\}$. (4.22)

Having computed a local minimum $f^*$, the corresponding local minimum of (4.18) is $S^* = g + f^*$. In order to compute $f^*$, we applied the proximal forward-backward scheme

$f^{(k+1)} = \arg\min_f \big\{ \|D_n f\|^2 + 2\big\langle L_n g - \alpha(g + f^{(k)}), f\big\rangle + \tfrac{1}{2\tau_k}\|f - f^{(k)}\|^2 + \delta_{\mathcal D_n}(g+f) \big\}$, $k \ge 0$, (4.23)

with proximal parameters $\tau_k$, $k \in \mathbb N$, and initialization $f^{(0)}_i$, $i \in \mathcal I$, specified further below. The iterative scheme (4.23) is a special case of the PALM algorithm [BST14, Sec. 3.7]. Ignoring the proximal term, each problem (4.23) amounts to solving $c$ (discretized) Dirichlet problems with the boundary values of $g^j$, $j \in [c]$, imposed, and with right-hand sides that change during the iteration since they depend on $f^{(k)}$. The solutions $(f^j)^{(k)}$, $j \in \mathcal J$, to these Dirichlet problems depend on each other, however, through the feasible set (4.21). At each iteration $k$, problem (4.23) can be solved by convex programming. The proximal parameters $\tau_k$ act as stepsizes such that the sequence $f^{(k)}$ does not approach a local minimum too rapidly. Then the interplay between the
linear form that adapts during the iteration and the regularizing effect of the Laplacians can find a labeling (partition) corresponding to a good local optimum.

As for $g$, we chose $g_i = L_i(S)$, $i \in \mathcal I$, at boundary vertices $i$ and $g_i = 0$ at every interior vertex $i$. Consequently, with the initialization $f^{(0)}_i = L_i(S)$, $i \in \mathcal I$, at interior vertices (the boundary values of $f$ are zero), the sequence $S^{(k)} = g + f^{(k)}$ mimics the $S$-flow of Proposition 3.5, where the given data also show up in the initialization $S^{(0)}$ only.

Figure 4.1 provides an illustration using an experiment adopted from [˚APSS17, Fig. 6], originally designed to evaluate the performance of geometric regularization of label assignments through the assignment flow in an unbiased way. Parameter values are specified in the caption. The result confirms that the continuous-domain formulations discussed above represent the assignment flow at the smallest spatial scale.

Figure 4.1. Evaluation of the numerical scheme (4.23) that mimics the $S$-flow of Proposition 3.5. Parameter values: $\alpha = 1$, $\tau_k = \tau = 10$, $\forall k$. Top, from left to right: ground truth, noisy input data $f^{(0)}$, iterate $f^{(100)}$ and $f^*$ resulting from $f^{(100)}$ by a trivial rounding step. $S^{(k)} = f^{(k)} + g$ differs from $f^{(k)}$ by the boundary values corresponding to the noisy input data. Inspecting the values of $f^{(100)}$ close to the boundary shows that the influence of boundary noise is minimal. Bottom, from left to right: the iterates $f^{(10)}, f^{(20)}, f^{(30)}, f^{(40)}$. Taking rounding into account as a post-processing step, the sequence $f^{(k)}$ quickly converges to a reasonable partition. About 50 more iterations are required to fix the values at merely a few hundred remaining pixels. Slight rounding of the geometry of the components of the partition, in comparison to ground truth, corresponds to using uniform weights (2.26) for the assignment flow.

Proposition 4.5 Let $S^*$ solve the variational problem (4.18).
Then $S^*$ satisfies the variational inequality

$\langle DS^*, DS - DS^*\rangle_{\mathcal M} - \alpha\langle S^*, S - S^*\rangle_{\mathcal M} \ge 0$, $\forall S \in \mathcal A_g(\mathcal M)$. (4.24)

Proof
Functional $E_\alpha$ given by (4.18) is Gâteaux differentiable with derivative

$\langle E_\alpha'(S^*), S\rangle_{H^{-1}(\mathcal M; \mathbb R^c) \times H^1_0(\mathcal M; \mathbb R^c)} = 2\big(\langle DS^*, DS\rangle_{\mathcal M} - \alpha\langle S^*, S\rangle_{\mathcal M}\big)$. (4.25)

The assertion follows from applying Theorem 2.4.

We conclude this section by deriving a PDE corresponding to (4.24) that a minimizer $S^*$ is supposed to satisfy in the weak sense. The derivation is formal in that we adopt the unrealistic regularity assumption

$S^* \in \mathcal A^2_g(\mathcal M)$, (4.26)

with $\mathcal A^2_g(\mathcal M)$ defined analogously to (4.9). While this will hold for the continuous-domain linear problems corresponding to (4.23) at each step $k$ of the iteration and for sufficiently smooth $\partial\mathcal M$, it will not hold in the limit $k \to \infty$, since we expect (and wish) $S^*$ to become discontinuous, contrary to the regularity assumption (4.26) and the continuity implied by the Sobolev embedding theorem for $\mathcal M \subset \mathbb R^d$ with $d = 2$. Nevertheless, since the PDE provides another interpretation of the assignment flow, we state it (see (4.32) below) and hope it will stimulate further research.

In view of assumption (4.26), set

$S^* = g + f^*$, $f^* \in H^1_0(\mathcal M; \mathbb R^c)$. (4.27)

Inserting $S^*$ and $S = g + h$, $h \in H^1_0(\mathcal M; \mathbb R^c)$, into (4.24) and integrating by parts gives

$\langle -\Delta S^* - \alpha S^*, h - f^*\rangle_{\mathcal M} \ge 0$, (4.28)

where $\Delta S^* = (\Delta S^*_{;1}, \dots, \Delta S^*_{;c})^\top$ applies componentwise. Using the shorthands

$\nu_\alpha(S^*) = -\Delta S^* - \alpha S^*$, (4.29a)
$\mu_\alpha(S^*) = \nu_\alpha(S^*) - \langle \nu_\alpha(S^*), S^*\rangle_{\mathbb R^c}\, \mathbb 1_c$, (4.29b)

where $\langle \nu_\alpha(S^*), S^*\rangle_{\mathbb R^c}$ denotes the function $x \mapsto \langle \nu_\alpha(S^*)(x), S^*(x)\rangle$, $x \in \mathcal M$, we have

$\langle \mu_\alpha(S^*), S^*\rangle_{\mathcal M} = 0$ (4.30a)

since $\langle \mathbb 1_c, S^*(x)\rangle = 1$ for a.e. $x$, and

$\langle \mu_\alpha(S^*), S\rangle_{\mathcal M} = \langle \nu_\alpha(S^*), h - f^*\rangle_{\mathcal M} \ge 0$, (4.30b)

which is (4.28). Since $S(x) \ge 0$ a.e. in $\mathcal M$ and may have arbitrary support, we deduce from the inequality $\langle \mu_\alpha(S^*), S\rangle_{\mathcal M} \ge 0$ and from the self-duality of the nonnegative orthant $\mathbb R^c_+$ that $\mu_\alpha(S^*) \ge 0$ a.e. in $\mathcal M$. Since also $S^* \ge 0$ a.e., this implies that equation (4.30a) holds pointwise a.e. in $\mathcal M$:

$\mu_\alpha(S^*)(x) \odot S^*(x) = \nu_\alpha(S^*)(x) \odot S^*(x) - \big\langle \nu_\alpha(S^*)(x), S^*(x)\big\rangle\, S^*(x) = 0$ a.e. in $\mathcal M$, (4.31)

with $\odot$ denoting the componentwise product. Substituting $\nu_\alpha(S^*)$, we deduce that a minimizer $S^* = g + f^*$ characterized by the variational inequality (4.24) weakly satisfies the PDE

$R_{S^*}(-\Delta S^* - \alpha S^*) = 0$, (4.32)

where $R_{S^*}$ defined by (2.16) applies $R_{S^*(x)}$ to the vector $(-\Delta S^* - \alpha S^*)(x)$ at every $x \in \mathcal M$.
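The PDE (4.32) can be checked pointwise for locally constant fields (my own illustration): if $S^* \equiv e_k$ on a region, then $\Delta S^* = 0$ there and the replicator operator annihilates $-\alpha S^*$, whereas a generic non-vertex, non-uniform simplex value leaves a residual.

```python
import numpy as np

def R(p, v):
    """Replicator map R_p v = p * (v - <p, v> 1), cf. (2.16)."""
    return p * (v - np.dot(p, v))

c, alpha = 4, 1.0

# At a locally constant labeling S* = e_k we have Delta S* = 0,
# so nu_alpha(S*) = -alpha * e_k and the residual of (4.32) vanishes:
e_k = np.zeros(c); e_k[2] = 1.0
assert np.allclose(R(e_k, -alpha * e_k), 0.0)

# A generic non-vertex, non-uniform simplex value leaves a residual:
p = np.array([0.5, 0.25, 0.25, 0.0])
assert not np.allclose(R(p, -alpha * p), 0.0)
```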
Remark 4.6 (Comments)
(1) We point out that computing a vector field $S^*$ satisfying (4.24) is difficult in practice, due to the nonconvexity of problem (4.18). On the other hand, the algorithm proposed in Section 4.3 and the result illustrated by Figure 4.1 show that good suboptima can be computed by merely solving a sequence of simple problems.
(2) As already pointed out at the beginning of this section, the derivation of the PDE (4.32) is merely a formal one, due to the unrealistic regularity assumption (4.26). In fact, since $\ker R_{S^*(x)} = \mathbb R \mathbb 1_c$, equation (4.32) says that $S^*$ is constant up to a set of measure zero. While the numerical result (Fig. 4.1) clearly reflects this, the discontinuity of $S^*$ conflicts with assumption (4.26).

5. Conclusion

We presented a novel parametrization of the assignment flow for contextual data classification on graphs. The dominating part of the flow admits an interpretation as a Riemannian gradient flow with respect to the underlying information geometry, unlike the original formulation of the assignment flow. A decomposition of the corresponding potential by means of a non-local graph Laplacian makes explicit the interaction of two processes: regularization of label assignments and gradual enforcement of unambiguous decisions. The assignment flow combines these aspects in a seamless way, unlike traditional approaches where solutions to convex relaxations require post-processing. It is remarkable that this behaviour is solely induced by the underlying information geometry.

We studied a continuous-domain variational formulation as counterpart of the discrete formulation restricted to a local discrete Laplacian (nearest-neighbor interaction). A numerical algorithm in terms of a sequence of simple linear elliptic problems reproduces results that were obtained with the original formulation of the assignment flow using completely different numerics (geometric ODE integration).
This illustrates the derived mathematical relations. We outline three attractive directions of further research.

• We clarified in Section 4 that the inherent smooth setting of the assignment flow (2.28) translates under suitable assumptions to the sequence of linear (discretized) elliptic PDE problems (4.23) together with a simple convex constraint. We did not touch on the limit problem, however. More mathematical work is required here, cf. Remark 4.6.
Since the assignment flow returns image partitions when applied to image features on a grid graph, the situation reminds us of the Mumford-Shah functional [
MS89 ] and its approximationby a sequence of Γ -converging smooth elliptic problems [ AT90 ]. Likewise, one may regardthe concave second term of (4.18) together with the convex constraint S ∈ A g as a vector-valued counterpart of the basic nonnegative double-well potential of scalar phase-field modelsfor binary segmentation [ Ste91, CT18 ]. In these works, too, nonsmooth limit cases resultfrom Γ -converging simpler problems. • Adopting the viewpoint of evolutionary dynamics [
HS03] on label assignment, the assignment flow may be characterized as spatially coupled replicator dynamics. To the best of our knowledge, our paper [˚APSS17] seems to be the first one that used information theory to formulate this spatial coupling. Some consequences of the geometry were elaborated in the present paper and discussed above.
Spatially coupled replicator dynamics have also been studied in other fields, e.g. physics [TC04, dB13] and applied mathematics [
NPB11, BPN14 ], including extensions to scenarios with an infinite number of strategies(as opposed to selecting from a finite set of labels) – see [
AFMS18] and references therein. In this context, our work might stimulate researchers working on spatially extended evolutionary dynamics in various scientific disciplines. In particular, generalizing our approach to continuous-domain integro-differential models that conform to the assignment flow with non-local interactions (i.e. with larger neighborhoods $|\mathcal N_i|$, $i \in \mathcal I$) and the underlying geometry seems attractive.
• Last but not least, our work may support a better understanding of learning with networks. Our preliminary work on learning the weights (2.26) using the linearized assignment flow [
HSPS19] on a single graph ('layer') revealed the model expressiveness of this limited scenario, on the one hand, and that subdividing complex learning tasks in this way avoids 'black box' behaviour, on the other hand. We hope that the continuous-domain perspective developed in this paper in terms of sequences of linear PDEs will support our further understanding of learning with hierarchical 'deeper' architectures.
Acknowledgements
Financial support by the German Science Foundation (DFG), grant GRK 1653, is gratefully acknowledged. This work has also been stimulated by the Heidelberg Excellence Cluster STRUCTURES, funded by the DFG under Germany's Excellence Strategy EXC-2181/1 - 390900948.
References [ABM14] H. Attouch, G. Buttazzo, and G. Michaille,
Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, 2nd ed., SIAM, 2014.
[AFMS18] L. Ambrosio, M. Fornasier, M. Morandotti, and G. Savaré,
Spatially InhomogeneousEvolutionary Games , CoRR abs/1805.04027 (2018).[AN00] S.-I. Amari and H. Nagaoka,
Methods of Information Geometry , Amer. Math. Soc.and Oxford Univ. Press, 2000.[˚APSS17] F. ˚Astr¨om, S. Petra, B. Schmitzer, and C. Schn¨orr,
Image Labeling by Assignment ,Journal of Mathematical Imaging and Vision (2017), no. 2, 211–238.[ARP +
19] V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen,
On Instabilitiesof Deep Learning in Image Reconstruction - Does AI Come at a Cost? , CoRRabs/1902.05300 (2019).[AT90] L. Ambrosio and V. M. Tortorelli,
Approximation of Functional Depending on Jumpsby Elliptic Functional via Γ -Convergence , Comm. Pure Appl. Math. (1990),no. 8, 999–1036.[BPN14] A. S. Bratus, V. P. Posvyanskii, and A. S. Novozhilov, Replicator equations andSpace , Math. Modelling Natural Phenomena (2014), no. 3, 47–67.[BST14] J. Bolte, S. Sabach, and M. Teboulle, Proximal Alternating Linearized Minimizationfor Nonconvex and Nonsmooth Problems , Math. Progr., Ser. A (2014), no. 1-2,459–494.[CCP12] A. Chambolle, D. Cremers, and T. Pock,
A Convex Approach to Minimal Partitions ,SIAM J. Imag. Sci. (2012), no. 4, 1113–1158.[CDH16] P. V. Coveney, E. R. Dougherty, and R. R. Highfield, Big Data Need Big Theory too ,Phil. Trans. R. Soc. Lond. A (2016), 20160153. F. Savarino and C. Schn¨orr [CT18] R. Cristoferi and M. Thorpe,
Large Data Limit for a Phase Transition Model with the p-Laplacian on Point Clouds, Europ. J. Appl. Math. (2018), 1–47.
[dB13] R. deForest and A. Belmonte, Spatial Pattern Dynamics due to the Fitness Gradient Flux in Evolutionary Games, Physical Review E (2013), no. 6, 062138.
[E17] Weinan E, A Proposal on Machine Learning via Dynamical Systems, Comm. Math. Statistics (2017), no. 1, 1–11.
[EHL19] W. E, J. Han, and Q. Li, A Mean-Field Optimal Control Formulation of Deep Learning, Res. Math. Sci. (2019), no. 10, 41 pages.
[Ela17] M. Elad, Deep, Deep Trouble: Deep Learning's Impact on Image Processing, Mathematics, and Humanity, SIAM News (2017).
[HR17] E. Haber and L. Ruthotto,
Stable Architectures for Deep Neural Networks, Inverse Problems (2017), no. 1, 014004.
[HS03] J. Hofbauer and K. Sigmund, Evolutionary Game Dynamics, Bull. Amer. Math. Soc. (2003), no. 4, 479–519.
[HSPS19] R. Hühnerbein, F. Savarino, S. Petra, and C. Schnörr, Learning Adaptive Regularization for Image Labeling Using Geometric Assignment, Proc. SSVM, Springer, 2019.
[HZRS16] K. He, X. Zhang, S. Ren, and J. Sun,
Deep Residual Learning for Image Recognition ,Proc. CVPR, 2016.[Jos17] J. Jost,
Riemannian Geometry and Geometric Analysis , 7th ed., Springer-VerlagBerlin Heidelberg, 2017.[KMS00] R. Kimmel, R. Malladi, and N. Sochen,
Images as Embedded Maps and MinimalSurfaces: Movies, Color, Texture, and Volumetric Images , Int. J. Comp. Vision (2000), no. 2, 111–129.[LS11] J. Lellmann and C. Schn¨orr, Continuous Multiclass Labeling Approaches and Algo-rithms , SIAM J. Imag. Sci. (2011), no. 4, 1049–1096.[LT19] G.-H. Liu and E. A. Theodorou, Deep Learning Theory Review: An Optimal Controland Dynamical Systems Perspective , CoRR abs/1908.10920 (2019).[MS89] D. Mumford and J. Shah,
Optimal Approximations by Piecewise Smooth Functions and Associated Variational Problems, Comm. Pure Appl. Math. (1989), 577–685.
[NPB11] A. S. Novozhilov, V. P. Posvyanskii, and A. S. Bratus, On the Reaction–Diffusion Replicator Systems: Spatial Patterns and Asymptotic Behaviour, Russ. J. Numer. Anal. Math. Modelling (2011), no. 6, 555–564.
[San10] W. H. Sandholm, Population Games and Evolutionary Dynamics, MIT Press, 2010.
[Sch19] C. Schnörr,
Assignment Flows , Variational Methods for Nonlinear Geometric Dataand Applications (P. Grohs, M. Holler, and A. Weinmann, eds.), Springer (inpress), 2019.[Ste91] P. Sternberg,
Vector-Valued Local Minimizers of Nonconvex Variational Problems ,Rocky-Mountain J. Math. (1991), no. 2, 799–807.[TC04] A. Traulsen and J. C. Claussen, Similarity-Based Cooperation and Spatial Segrega-tion , Phys. Rev. E (2004), no. 4, 046128.[Zei85] E. Zeidler, Nonlinear Functional Analysis and its Applications , vol. 3, Springer, 1985.[Zie89] W. P. Ziemer,
Weakly Differentiable Functions , Springer, 1989.[ZSPS19] A. Zeilmann, F. Savarino, S. Petra, and C. Schn¨orr,
Geometric Numerical Integrationof the Assignment Flow , CoRR abs/1810.06970, Inverse Problems: in press (2019).[ZZPS19a] A. Zern, M. Zisler, S. Petra, and C. Schn¨orr,
Unsupervised Assignment Flow: LabelLearning on Feature Manifolds by Spatially Regularized Geometric Assignment ,CoRR abs/1904.10863 (2019).[ZZPS19b] M. Zisler, A. Zern, S. Petra, and C. Schn¨orr,