Continuous-Domain Assignment Flows
F. SAVARINO and C. SCHNÖRR

Image and Pattern Analysis Group, Heidelberg University, Heidelberg, Germany
email: [email protected], [email protected]
URL: https://ipa.math.uni-heidelberg.de

Abstract. Assignment flows denote a class of dynamical models for contextual data labeling (classification) on graphs. We derive a novel parametrization of assignment flows that reveals how the underlying information geometry induces two processes, for assignment regularization and for gradually enforcing unambiguous decisions, respectively, that seamlessly interact when solving for the flow. Our result enables us to characterize the dominant part of the assignment flow as a Riemannian gradient flow with respect to the underlying information geometry. We consider a continuous-domain formulation of the corresponding potential and develop a novel algorithm that solves a sequence of linear elliptic PDEs subject to a simple convex constraint. Our result provides a basis for addressing learning problems by controlling such PDEs in future work.
Key Words: image labeling, image segmentation, information geometry, replicator equation, evolutionary dynamics, assignment flow.
Contents
1. Introduction
2. Preliminaries
3. A Novel Representation of the Assignment Flow
4. Continuous-Domain Variational Approach
5. Conclusion
References

1. Introduction
Deep networks are omnipresent in many disciplines due to their unprecedented predictive power and the availability of easy-to-use training software. However, this rapid development during recent years has not improved our mathematical understanding to the same degree, so far [Ela17]. The 'black box' behaviour of deep networks, systematic failures [ARP+], the lack of performance guarantees, and the limited reproducibility of results raise doubts as to whether a purely data-driven approach can meet the high expectations of some of its most passionate proponents [CDH16]. A 'mathematics of deep networks', therefore, has become a focal point of research. Initiated perhaps by [HZRS16] and mathematically substantiated and promoted by, e.g., [HR17, E17], the attempt to understand deep network architectures as discretized realizations of dynamical systems has become a fruitful line of research. Adopting this viewpoint, we introduced a dynamical system, called the assignment flow, for contextual data classification and image labeling on graphs [ÅPSS17]. We refer to [Sch19] for a review of recent work including parameter estimation (learning) [HSPS19], adaptation of data prototypes during assignment [ZZPS19a], and learning prototypes from low-rank data representations and self-assignment [ZZPS19b].

Two key properties of the assignment flow are smoothness and the gradual enforcement of unambiguous classification within a single process, solely induced by adopting an elementary statistical manifold as state space, which is natural for classification tasks, and the corresponding information geometry [AN00]. This differs from traditional variational approaches to image labeling [LS11, CCP12] that enjoy convexity but are inherently nonsmooth and require postprocessing to achieve unambiguous decisions. We regard nonsmoothness as a major barrier to the design of hierarchical architectures for data classification.

The assignment flow combines by composition (rather than by addition) separate local processes at each vertex of the underlying graph and nonlocal regularization. Each local process for label assignment is governed by an ODE, the replicator equation [HS03, San10], whereas regularization is accomplished by nonlocal geometric averaging of the evolving assignments. It is well known [HS03] that if the affinity measure which defines the replicator equation, and hence governs label selection, can be derived as the gradient of a potential, then the replicator equation is just the corresponding Riemannian gradient flow induced by the Fisher-Rao metric. For the affinity measure resulting from the geometric regularization of assignments performed by the assignment flow, however, the (non-)existence of a corresponding potential is not immediate.
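The fact cited from [HS03] above — a replicator equation whose affinity is the gradient of a potential is the Fisher-Rao gradient flow of that potential — can be illustrated with a minimal numerical sketch. The linear potential $F(p) = \langle a, p\rangle$ and all numbers below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Potential F(p) = <a, p> (linear fitness, toy choice); its replicator flow
# p_dot = p * a - <p, a> p is the Fisher-Rao gradient ascent flow of F.
a = np.array([0.3, 1.0, 0.2, 0.6])
F = lambda p: a @ p

p = np.full(4, 0.25)                      # start at the barycenter of the simplex
h = 0.1
values = [F(p)]
for _ in range(300):
    p = p + h * (p * a - (p @ a) * p)     # explicit Euler step of the replicator ODE
    values.append(F(p))

assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(values, values[1:]))  # F ascends
assert np.isclose(p.sum(), 1)             # the simplex constraint is preserved
assert p.argmax() == a.argmax()           # the flow selects the fittest label
```

The potential increases monotonically along the discrete flow, and the iterate converges to the simplex vertex of the largest affinity, i.e. the flow gradually enforces an unambiguous decision.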
Contribution and Organization. The objective of this paper is to clarify this situation. After collecting background material in Section 2, we prove that no potential exists that would characterize the assignment flow as a Riemannian gradient flow (Section 3.1). Next, we provide a novel parametrization of the assignment flow by separating a dominant component of the flow, called the S-flow, that completely determines the remaining component and hence essentially characterizes the assignment flow (Section 3.2). The S-flow does correspond to a potential, under an additional symmetry assumption on the weights that parametrize the regularization properties of the assignment flow through (weighted) geometric averaging. This potential can be decomposed into two components that make explicit the two interacting processes mentioned above: regularization of label assignments and the gradual enforcement of unambiguous decisions. We point out again that this is a direct consequence of the 'spherical geometry' (positive curvature) underlying the assignment flow.

We study the corresponding continuous-domain variational formulation in Section 4. We prove well-posedness, which is not immediate due to nonconvexity, and we propose an algorithm that computes a locally optimal assignment by solving a sequence of simple linear PDEs with changing right-hand sides, subject to a simple convex constraint. A numerical example demonstrates that our PDE-based approach reproduces results obtained by solving the original formulation of the assignment flow using completely different numerical techniques [ZSPS19]. We hope that the simplicity of our PDE approach and the direct connection to a smooth geometric setting will stimulate future work on learning from an optimal control point of view [EHL19, LT19]. We conclude with a formal derivation of a PDE that characterizes global minimizers of the nonconvex objective function (Section 4.4) and by outlining future research in Section 5.
2. Preliminaries

We denote the standard basis of $\mathbb{R}^n$ by
\[ \mathcal{B}_n := \{e_1,\dots,e_n\}. \tag{2.1} \]
$|\cdot|$ applied to a finite set denotes its cardinality, i.e. $|\mathcal{B}_n| = n$. We set $[n] = \{1,2,\dots,n\}$ for $n \in \mathbb{N}$ and $\mathbb{1}_n = (1,1,\dots,1)^\top \in \mathbb{R}^n$. The symbols
\[ \mathcal{I} = [n], \qquad \mathcal{J} = [c], \qquad n, c \in \mathbb{N}, \tag{2.2} \]
will specifically index data points and classes (labels), respectively. $\|\cdot\|$ denotes the Euclidean vector norm and the Frobenius matrix norm induced by the inner product $\|A\| = \langle A, A\rangle^{1/2} = \operatorname{tr}(A^\top A)^{1/2}$. All other norms will be indicated by a corresponding subscript. For a given matrix $A \in \mathbb{R}^{n\times c}$, $A_i$, $i \in [n]$, denote the row vectors, $A^j$, $j \in [c]$, denote the column vectors, and $A^\top \in \mathbb{R}^{c\times n}$ the transposed matrix. $\mathcal{S}^n_+$ denotes the set of all symmetric $n\times n$ matrices with nonnegative entries.
\[ \Delta_n = \{p \in \mathbb{R}^n_+ : \langle \mathbb{1}_n, p\rangle = 1\} \tag{2.3} \]
denotes the probability simplex. There will be no danger of confusing it with the Laplacian differential operator $\Delta$ that we use without subscript. For strictly positive vectors $p > 0$, we compactly denote componentwise subdivision by $\frac{v}{p}$. Likewise, we set $pv = (p_1 v_1,\dots,p_n v_n)^\top$. The exponential function applies componentwise to vectors (and similarly for $\log$) and will always be denoted by $e^v = (e^{v_1},\dots,e^{v_n})^\top$, in order not to confuse it with the exponential maps (2.18). Strong and weak convergence of a sequence $(f_n)$ is written as $f_n \to f$ and $f_n \rightharpoonup f$, respectively. $\psi_S$ denotes the indicator function of some set $S$: $\psi_S(i) = 1$ if $i \in S$ and $\psi_S(i) = 0$ otherwise. $\delta_C$ denotes the indicator function from the optimization point of view: $\delta_C(f) = 0$ if $f \in C$ and $\delta_C(f) = +\infty$ otherwise.

We sketch the assignment flow as introduced by [ÅPSS17] and refer to the recent survey [Sch19] for more background and a review of recent related work.
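The geometric setting sketched next rests on a few linear-algebra building blocks: the orthogonal projection onto the tangent space and the map $R_p$ defined in (2.14), (2.16) below. A minimal, hedged Python sketch of these objects and their identity (2.17) (all array sizes are arbitrary illustrative choices):

```python
import numpy as np

c = 4
one = np.ones(c)
barycenter = one / c                          # the uniform distribution on c labels

# orthogonal projection onto the tangent space T_0 = {v : <1, v> = 0}, cf. (2.14)
Pi0 = np.eye(c) - np.outer(one, barycenter)

def R(p):
    """Linear map R_p = Diag(p) - p p^T, cf. (2.16)."""
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
v = rng.normal(size=c)
p = rng.random(c); p /= p.sum()               # a strictly positive point of the simplex

assert abs((Pi0 @ v).sum()) < 1e-12           # Pi0 v lies in T_0
assert np.allclose(R(p) @ one, 0)             # R_p annihilates constant vectors
assert np.allclose(R(p), R(p) @ Pi0)          # R_p = R_p Pi0, cf. (2.17)
assert np.allclose(R(p), Pi0 @ R(p))          # R_p = Pi0 R_p
```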
Assignment Manifold. Let $(\mathcal{F}, d_{\mathcal{F}})$ be a metric space and
\[ \mathcal{F}_n = \{f_i \in \mathcal{F} : i \in \mathcal{I}\}, \qquad |\mathcal{I}| = n, \tag{2.4} \]
given data. Assume that a predefined set of prototypes
\[ \mathcal{F}_* = \{f^*_j \in \mathcal{F} : j \in \mathcal{J}\}, \qquad |\mathcal{J}| = c, \tag{2.5} \]
is given. Data labeling denotes the assignments
\[ j \to i, \qquad f^*_j \to f_i, \tag{2.6} \]
of a single prototype $f^*_j \in \mathcal{F}_*$ to each data point $f_i \in \mathcal{F}_n$. The set $\mathcal{I}$ is assumed to form the vertex set of an undirected graph $G = (\mathcal{I}, \mathcal{E})$ which defines a relation $\mathcal{E} \subset \mathcal{I} \times \mathcal{I}$ and neighborhoods
\[ \mathcal{N}_i = \{k \in \mathcal{I} : ik \in \mathcal{E}\} \cup \{i\}, \tag{2.7} \]
where $ik$ is a shorthand for the unordered pair (edge) $(i,k) = (k,i)$. We require these neighborhoods to satisfy the relations
\[ k \in \mathcal{N}_i \;\Leftrightarrow\; i \in \mathcal{N}_k, \qquad \forall i,k \in \mathcal{I}. \tag{2.8} \]
The assignments (labelings) (2.6) are represented by matrices in the set
\[ \mathcal{W}_* = \{W \in \{0,1\}^{n\times c} : W\mathbb{1}_c = \mathbb{1}_n\} \tag{2.9} \]
with unit vectors $W_i$, $i \in \mathcal{I}$, called assignment vectors, as row vectors. These assignment vectors are computed by numerically integrating the assignment flow (2.28) below, in the following elementary geometric setting. The integrality constraint of (2.9) is relaxed and vectors
\[ W_i = (W_{i1},\dots,W_{ic})^\top \in \mathcal{S}, \qquad i \in \mathcal{I}, \tag{2.10} \]
that we still call assignment vectors, are considered on the elementary Riemannian manifold $(\mathcal{S}, g)$,
\[ \mathcal{S} = \{p \in \Delta_c : p > 0\}, \tag{2.11} \]
with barycenter
\[ \mathbb{1}_{\mathcal{S}} = \tfrac{1}{c}\mathbb{1}_c \in \mathcal{S}, \tag{2.12} \]
tangent space
\[ T_0 = \{v \in \mathbb{R}^c : \langle \mathbb{1}_c, v\rangle = 0\}, \tag{2.13} \]
tangent bundle $T\mathcal{S} = \mathcal{S} \times T_0$, orthogonal projection
\[ \Pi_0 : \mathbb{R}^c \to T_0, \qquad \Pi_0 = I_c - \mathbb{1}_c \mathbb{1}_{\mathcal{S}}^\top, \tag{2.14} \]
and the Fisher-Rao metric
\[ g_p(u,v) = \sum_{j\in\mathcal{J}} \frac{u_j v_j}{p_j}, \qquad p \in \mathcal{S}, \quad u,v \in T_0. \tag{2.15} \]
Based on the linear map
\[ R_p : \mathbb{R}^c \to T_0, \qquad R_p = \operatorname{Diag}(p) - p p^\top, \qquad p \in \mathcal{S}, \tag{2.16} \]
satisfying
\[ R_p = R_p \Pi_0 = \Pi_0 R_p, \tag{2.17} \]
exponential maps and their inverses are defined as
\[ \operatorname{Exp} : \mathcal{S} \times T_0 \to \mathcal{S}, \qquad (p,v) \mapsto \operatorname{Exp}_p(v) = \frac{p\, e^{v/p}}{\langle p, e^{v/p}\rangle}, \tag{2.18a} \]
\[ \operatorname{Exp}^{-1}_p : \mathcal{S} \to T_0, \qquad q \mapsto \operatorname{Exp}^{-1}_p(q) = R_p \log\frac{q}{p}, \tag{2.18b} \]
\[ \exp_p : T_0 \to \mathcal{S}, \qquad \exp_p = \operatorname{Exp}_p \circ R_p, \tag{2.18c} \]
\[ \exp^{-1}_p : \mathcal{S} \to T_0, \qquad \exp^{-1}_p(q) = \Pi_0 \log\frac{q}{p}. \tag{2.18d} \]
Applying the map $\exp_p$ to a vector in $\mathbb{R}^c = T_0 \oplus \mathbb{R}\mathbb{1}$ does not depend on the constant component of the argument, due to (2.17).

Remark 2.1. The map $\operatorname{Exp}$ corresponds to the e-connection of information geometry [AN00], rather than to the exponential map of the Riemannian connection. Accordingly, the affine geodesics (2.18a) are not length-minimizing. But they provide a close approximation [ÅPSS17, Prop. 3] and are more convenient for numerical computations.

The assignment manifold is defined as
\[ (\mathcal{W}, g), \qquad \mathcal{W} = \mathcal{S} \times \dots \times \mathcal{S} \qquad (n = |\mathcal{I}| \text{ factors}). \tag{2.19} \]
We identify $\mathcal{W}$ with the embedding into $\mathbb{R}^{n\times c}$,
\[ \mathcal{W} = \{W \in \mathbb{R}^{n\times c} : W\mathbb{1}_c = \mathbb{1}_n \ \text{and}\ W_{ij} > 0 \ \text{for all } i \in [n],\ j \in [c]\}. \tag{2.20} \]
Thus, points $W \in \mathcal{W}$ are row-stochastic matrices $W \in \mathbb{R}^{n\times c}$ with row vectors $W_i \in \mathcal{S}$, $i \in \mathcal{I}$, that represent the assignments (2.6) for every $i \in \mathcal{I}$. We set
\[ \mathcal{T}_0 := T_0 \times \dots \times T_0 \qquad (n = |\mathcal{I}| \text{ factors}). \tag{2.21} \]
Due to (2.20), the tangent space $\mathcal{T}_0$ can be identified with
\[ \mathcal{T}_0 = \{V \in \mathbb{R}^{n\times c} : V\mathbb{1}_c = 0\}. \tag{2.22} \]
Thus, $V_i \in T_0$ for all row vectors of $V \in \mathbb{R}^{n\times c}$ and $i \in \mathcal{I}$. All mappings defined above factorize in a natural way and apply row-wise, e.g. $\operatorname{Exp}_W = (\operatorname{Exp}_{W_1},\dots,\operatorname{Exp}_{W_n})$ etc.

Assignment Flow.
Based on (2.4) and (2.5), the distance vector field
\[ D_{\mathcal{F};i} = \big(d_{\mathcal{F}}(f_i, f^*_1),\dots,d_{\mathcal{F}}(f_i, f^*_c)\big)^\top, \qquad i \in \mathcal{I}, \tag{2.23} \]
is well-defined. These vectors are collected as row vectors of the distance matrix
\[ D_{\mathcal{F}} \in \mathbb{R}^{n\times c}_+. \tag{2.24} \]
The likelihood map and the likelihood vectors, respectively, are defined as
\[ L_i : \mathcal{S} \to \mathcal{S}, \qquad L_i(W_i) = \exp_{W_i}\Big(-\frac{1}{\rho} D_{\mathcal{F};i}\Big) = \frac{W_i\, e^{-\frac{1}{\rho} D_{\mathcal{F};i}}}{\big\langle W_i, e^{-\frac{1}{\rho} D_{\mathcal{F};i}}\big\rangle}, \qquad i \in \mathcal{I}, \tag{2.25} \]
where the scaling parameter $\rho > 0$ is used for normalizing the a priori unknown scale of the components of $D_{\mathcal{F};i}$ that depends on the specific application at hand.

A key component of the assignment flow is the interaction of the likelihood vectors through geometric averaging within the local neighborhoods (2.7). Specifically, using weights $\omega_{ik} > 0$ for all $k \in \mathcal{N}_i$, $i \in \mathcal{I}$, with
\[ \sum_{k\in\mathcal{N}_i} \omega_{ik} = 1, \tag{2.26} \]
the similarity map and the similarity vectors, respectively, are defined as
\[ S_i : \mathcal{W} \to \mathcal{S}, \qquad S_i(W) = \operatorname{Exp}_{W_i}\Big(\sum_{k\in\mathcal{N}_i} \omega_{ik} \operatorname{Exp}^{-1}_{W_i}\big(L_k(W_k)\big)\Big), \qquad i \in \mathcal{I}. \tag{2.27} \]
If $\operatorname{Exp}_{W_i}$ were the exponential map of the Riemannian (Levi-Civita) connection, then the argument inside the brackets on the right-hand side would just be the negative Riemannian gradient with respect to $W_i$ of the center-of-mass objective function comprising the points $L_k$, $k \in \mathcal{N}_i$, i.e. the weighted sum of the squared Riemannian distances between $W_i$ and $L_k$ [Jos17, Lemma 6.9.4]. In view of Remark 2.1, this interpretation is only approximately true mathematically, but still correct informally: $S_i(W)$ moves $W_i$ towards the geometric mean of the likelihood vectors $L_k$, $k \in \mathcal{N}_i$. Since $\operatorname{Exp}_{W_i}(0) = W_i$, this mean is equal to $W_i$ if the aforementioned gradient vanishes.

The assignment flow is induced by the locally coupled system of nonlinear ODEs
\[ \dot W = R_W S(W), \qquad W(0) = \mathbb{1}_{\mathcal{W}}, \tag{2.28a} \]
\[ \dot W_i = R_{W_i} S_i(W), \qquad W_i(0) = \mathbb{1}_{\mathcal{S}}, \qquad i \in \mathcal{I}, \tag{2.28b} \]
where $\mathbb{1}_{\mathcal{W}} \in \mathcal{W}$ denotes the barycenter of the assignment manifold (2.19). The solution curve $W(t) \in \mathcal{W}$ is numerically computed by geometric integration [ZSPS19] and determines a labeling $W(T) \in \mathcal{W}_*$ for sufficiently large $T$ after a trivial rounding operation.
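As a concrete illustration of the maps (2.25)–(2.28), the following hedged Python sketch integrates the assignment flow with a plain explicit Euler scheme on a toy problem. The distance matrix, neighborhood weights, and step size are all illustrative assumptions; the paper's own experiments use geometric integration [ZSPS19] instead:

```python
import numpy as np

def expmap(p, v):
    """exp_p(v) = p e^v / <p, e^v>  (lifting map (2.18c); closed form of Lemma 3.1)."""
    q = p * np.exp(v - v.max())          # constant shift leaves the value unchanged
    return q / q.sum()

def expmap_inv(p, q):
    """exp_p^{-1}(q) = Pi_0 log(q/p), cf. (2.18d)."""
    u = np.log(q / p)
    return u - u.mean()

rng = np.random.default_rng(1)
n, c, rho, h = 6, 3, 1.0, 0.1            # nodes, labels, scale, Euler step (toy values)
D = rng.random((n, c))                   # illustrative distance matrix D_F
# uniform weights on cyclic neighborhoods {i-1, i, i+1}
Omega = sum(np.roll(np.eye(n), k, axis=1) for k in (-1, 0, 1)) / 3.0

W = np.full((n, c), 1.0 / c)             # W(0): barycenter, cf. (2.28)
for _ in range(100):
    L = np.array([expmap(W[i], -D[i] / rho) for i in range(n)])   # likelihood (2.25)
    # similarity (2.27), written via exp/exp^{-1} to absorb the map R_{W_i}
    S = np.array([
        expmap(W[i], Omega[i] @ np.array([expmap_inv(W[i], L[k]) for k in range(n)]))
        for i in range(n)
    ])
    # explicit Euler step of the replicator system (2.28): W_i += h R_{W_i} S_i
    W = W + h * (W * S - (W * S).sum(1, keepdims=True) * W)

assert np.allclose(W.sum(1), 1) and (W > 0).all()   # W(t) stays on the manifold
labeling = W.argmax(1)                               # trivial rounding yields labels
```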
We record background material that will be used in Section 4.
Sobolev Spaces
We list a few basic definitions and fix the corresponding notation [Zie89, ABM14]. Throughout this section, $\Omega \subset \mathbb{R}^d$ denotes an open bounded domain. We denote the inner product and the norm of functions $f, g \in L^2(\Omega)$ by
\[ (f,g)_{\Omega} = \int_{\Omega} f g \, dx, \qquad \|f\|_{\Omega} = (f,f)^{1/2}_{\Omega}. \tag{2.29} \]
Functions $f_1$ and $f_2$ are equivalent and identified whenever they merely differ pointwise on a Lebesgue-negligible set of measure zero; $f_1$ and $f_2$ then are said to be equal a.e. (almost everywhere). $H^1(\Omega) = W^{1,2}(\Omega)$ denotes the Sobolev space of functions $f$ with square-integrable weak derivatives $D^\alpha f$ up to order one. $H^1(\Omega)$ is a Hilbert space with inner product and norm denoted by
\[ (f,g)_1 = \sum_{|\alpha|\le 1} (D^\alpha f, D^\alpha g)_{\Omega}, \qquad \|f\|_1 = \Big(\sum_{|\alpha|\le 1} \|D^\alpha f\|^2_{\Omega}\Big)^{1/2}. \tag{2.30} \]

Lemma 2.2 ([Zie89, Cor. 2.1.9]). If $\Omega$ is connected, $u \in H^1(\Omega)$ and $Du = 0$ a.e. on $\Omega$, then $u$ is equivalent to a constant function on $\Omega$.

The closure in $H^1(\Omega)$ of the set of test functions $C^\infty_c(\Omega)$ that are compactly supported in $\Omega$ is the Sobolev space
\[ H^1_0(\Omega) = \overline{C^\infty_c(\Omega)} \subset H^1(\Omega). \tag{2.31} \]
It contains all functions in $H^1(\Omega)$ whose boundary values on $\partial\Omega$ (in the sense of traces) vanish. The space $H^1(\Omega;\mathbb{R}^c)$, $1 \le c \in \mathbb{N}$, contains vector-valued functions $f$ whose component functions $f_i$, $i \in [c]$, are in $H^1(\Omega)$. For notational efficiency, we denote the norm of $f \in H^1(\Omega;\mathbb{R}^c)$ by
\[ \|f\|_1 = \Big(\sum_{i\in[c]} \|f_i\|^2_1\Big)^{1/2}, \tag{2.32} \]
as in the scalar case (2.30). It will be clear from the context whether $f$ is scalar- or vector-valued. The compactness theorem of Rellich-Kondrachov [ABM14, Thm. 5.3.3] says that the canonical embedding
\[ H^1(\Omega) \hookrightarrow L^2(\Omega) \tag{2.33} \]
is compact, i.e. every bounded subset of $H^1(\Omega)$ is relatively compact in $L^2(\Omega)$. This extends to the vector-valued case
\[ H^1(\Omega;\mathbb{R}^c) \hookrightarrow L^2(\Omega;\mathbb{R}^c), \tag{2.34} \]
since $H^1(\Omega;\mathbb{R}^c)$ is isomorphic to $H^1(\Omega) \times \dots \times H^1(\Omega)$, and likewise for $L^2(\Omega;\mathbb{R}^c)$. The dual space of $H^1_0(\Omega)$ is commonly denoted by $H^{-1}(\Omega) = \big(H^1_0(\Omega)\big)'$. Accordingly, we set $H^{-1}(\Omega;\mathbb{R}^c) = \big(H^1_0(\Omega;\mathbb{R}^c)\big)'$.

Weak Convergence Properties, Variational Inequalities
We list a few further basic facts [Zei85, Prop. 38.2], [ABM14, Prop. 2.4.6].

Proposition 2.3. The following assertions hold in a Banach space $X$.
(a) A closed convex subset $C \subset X$ is weakly closed, i.e. if a sequence $(f_n)_{n\in\mathbb{N}} \subset C$ converges weakly to $f$, then $f \in C$.
(b) If $X$ is reflexive (in particular, if $X$ is a Hilbert space), then every bounded sequence in $X$ has a weakly convergent subsequence.
(c) If $f_n$ converges weakly to $f$, then $(f_n)_{n\in\mathbb{N}}$ is bounded and
\[ \|f\|_X \le \liminf_{n\to\infty} \|f_n\|_X. \tag{2.35} \]
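Proposition 2.3 can be made concrete in the sequence space $\ell^2$, where the standard basis vectors converge weakly to zero without converging strongly. A small numeric illustration (the truncation length is an arbitrary choice):

```python
import numpy as np

# In l2, the basis vectors e_n satisfy <e_n, y> = y_n -> 0 for every fixed
# square-summable y, so e_n converges weakly to 0; yet ||e_n|| = 1 for all n,
# so there is no strong convergence. Proposition 2.3(c) is reflected in
# ||0|| = 0 <= liminf ||e_n|| = 1, cf. (2.35).
N = 10_000
y = 1.0 / np.arange(1, N + 1)            # a fixed element of l2 (truncated)

def pairing(n):
    """<e_n, y> = y_n, computed without forming e_n."""
    return y[n]

e = np.zeros(N); e[17] = 1.0             # one representative basis vector
assert abs(pairing(9_999)) < 1e-3        # weak convergence to 0
assert np.linalg.norm(e) == 1.0          # unit norm: no strong convergence
```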
The following theorem states conditions under which minimizers of a functional satisfy a corresponding variational inequality.

Theorem 2.4 ([Zei85, Thm. 46.A(a)]). Let $F : C \to \mathbb{R}$ be a functional on the convex nonempty set $C$ of a real locally convex space $X$, and let $b \in X'$ be a given element. Suppose the Gateaux derivative $F'$ exists on $C$. Then any solution $f$ of
\[ \min_{f\in C} \big\{F(f) - \langle b, f\rangle_{X'\times X}\big\} \tag{2.36} \]
satisfies the variational inequality
\[ \langle F'(f) - b,\, h - f\rangle_{X'\times X} \ge 0 \qquad \text{for all } h \in C. \tag{2.37} \]

3. A Novel Representation of the Assignment Flow

Let $J : \mathcal{W} \to \mathbb{R}$ be a smooth function on the assignment manifold (2.19) and denote the Riemannian gradient of $J$ at $W \in \mathcal{W}$ induced by the Fisher-Rao metric (2.15) by $\operatorname{grad} J(W) \in \mathcal{T}_0$. In view of the embedding (2.20), we can also compute the Euclidean gradient of $J$, denoted by $\partial J(W) \in \mathbb{R}^{n\times c}$. These two gradients are related by [ÅPSS17, Prop. 1]
\[ \operatorname{grad} J(W) = R_W \partial J(W), \qquad W \in \mathcal{W}, \tag{3.1} \]
where $R_W : \mathbb{R}^{n\times c} \to \mathcal{T}_0$ is the product map obtained by applying $R_{W_i}$ from (2.16) to every row vector indexed by $i \in \mathcal{I}$. This relation raises a natural question: Is there a potential $J$ such that the assignment flow (2.28) is a Riemannian gradient descent flow with respect to $J$, i.e. does $R_W S(W) = -\operatorname{grad} J(W)$ hold?

We next show that such a potential does not exist in general (Section 3.1). However, in Section 3.2, we derive a novel representation by decoupling the assignment flow into two separate flows, where one flow steers the other and in this sense dominates the assignment flow. Under the additional assumption that the weights $\omega_{ij}$ of the similarity map $S(W)$ in (2.27) are symmetric, we show that the dominating flow is a Riemannian gradient flow induced by a potential. This result is the basis for the continuous-domain formulation of the assignment flow studied in the subsequent sections.

We next show (Theorem 3.4) that, under some mild assumptions on $D_{\mathcal{F}}$ (2.24) which are always fulfilled in practice, no potential $J$ exists that induces the assignment flow.
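Relation (3.1) can be checked numerically: for a toy potential, the Fisher-Rao inner product of $R_W \partial J(W)$ with an arbitrary tangent vector reproduces the directional derivative of $J$. The quadratic potential and all sizes below are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 5, 3
Omega = rng.random((n, n)); Omega = (Omega + Omega.T) / 2   # symmetric toy matrix

J  = lambda W: -0.5 * np.sum(W * (Omega @ W))               # J(W) = -1/2 <W, Omega W>
dJ = lambda W: -(Omega @ W)                                 # its Euclidean gradient

W = rng.random((n, c)); W /= W.sum(1, keepdims=True)        # a point of the manifold
V = rng.normal(size=(n, c)); V -= V.mean(1, keepdims=True)  # tangent direction, V 1 = 0

# Riemannian gradient via (3.1): row-wise R_{W_i} applied to the Euclidean gradient
G = W * dJ(W) - (W * dJ(W)).sum(1, keepdims=True) * W

# Fisher-Rao inner product g_W(G, V) = sum_ij G_ij V_ij / W_ij ...
lhs = np.sum(G * V / W)
# ... equals the directional derivative dJ(W)[V], here via central differences
h = 1e-6
rhs = (J(W + h * V) - J(W - h * V)) / (2 * h)
assert abs(lhs - rhs) < 1e-6
```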
In order to prove this result, we first derive some properties of the mapping $\exp$ given by (2.18c), as well as explicit expressions for the differential $dS(W)$ of the similarity map (2.27) and its transpose $dS(W)^\top$ with respect to the standard Euclidean structure on $\mathbb{R}^{n\times c}$.

Lemma 3.1. The following properties hold for $\exp_p$ and its inverse (2.18c), (2.18d).
(1) For every $p \in \mathcal{S}$ the map $\exp_p : \mathbb{R}^c \to \mathcal{S}$ can be expressed by
\[ v \mapsto \exp_p(v) = \frac{p\, e^v}{\langle p, e^v\rangle}. \tag{3.2} \]
Its restriction to $T_0$, $\exp_p : T_0 \to \mathcal{S}$, is a diffeomorphism. The differentials of $\exp_p$ and $\exp^{-1}_p$ at $v \in T_0$ and $q \in \mathcal{S}$, respectively, are given by
\[ d\exp_p(v)[u] = R_{\exp_p(v)}[u] \quad \text{and} \quad d\exp^{-1}_p(q)[u] = \Pi_0\Big[\frac{u}{q}\Big] \qquad \text{for all } u \in T_0. \tag{3.3} \]
(2) Let $p, q \in \mathcal{S}$. Then $\operatorname{Exp}^{-1}_p(q) = R_p \exp^{-1}_p(q)$.
(3) Let $q \in \mathcal{S}$. If the linear map $R_q$ from (2.16) is restricted to $T_0$, then $R_q : T_0 \to T_0$ is a linear isomorphism with inverse given by $(R_q|_{T_0})^{-1}(u) = \Pi_0\big[\frac{u}{q}\big]$ for all $u \in T_0$.
(4) If $\mathbb{R}^c$ is viewed as an abelian group, then $\exp : \mathbb{R}^c \times \mathcal{S} \to \mathcal{S}$ given by $(v,p) \mapsto \exp_p(v)$ defines a Lie-group action, i.e.
\[ \exp_p(v+u) = \exp_{\exp_p(u)}(v) \quad \text{and} \quad \exp_p(0) = p \qquad \text{for all } v, u \in T_0 \text{ and } p \in \mathcal{S}. \tag{3.4} \]
Furthermore, the following identities hold for all $p, q, a \in \mathcal{S}$ and $v \in \mathbb{R}^c$:
\[ \exp_p(v) = \exp_q\big(v + \exp^{-1}_q(p)\big), \tag{3.5a} \]
\[ \exp^{-1}_q(p) = -\exp^{-1}_p(q), \tag{3.5b} \]
\[ \exp^{-1}_q(a) = \exp^{-1}_p(a) - \exp^{-1}_p(q). \tag{3.5c} \]

Proof. (1): We have $\operatorname{Exp}_p(v + \lambda p) = \operatorname{Exp}_p(v)$ for every $p \in \mathcal{S}$, $v \in T_0$ and $\lambda \in \mathbb{R}$, as a simple computation using definition (2.18a) of $\operatorname{Exp}_p$ directly shows. Therefore, for every $v \in T_0$,
\[ \exp_p(v) = \operatorname{Exp}_p(R_p v) = \operatorname{Exp}_p\big(pv - \langle v,p\rangle p\big) = \operatorname{Exp}_p(pv) = \frac{p\, e^v}{\langle p, e^v\rangle}. \tag{3.6} \]
If we restrict $\exp_p$ to $T_0$, then an inverse is explicitly given by (2.18d). The differentials (3.3) result from a standard computation.
(2): The formula is a direct consequence of the formulas for $\operatorname{Exp}^{-1}_p$ and $\exp^{-1}_p$ given in (2.18b) and (2.18d), together with the fact (2.17).
(3): Fix any $p \in \mathcal{S}$ and set $v_q := \exp^{-1}_p(q)$ for $q \in \mathcal{S}$. Since $\exp_p : T_0 \to \mathcal{S}$ is a diffeomorphism, the differential $d\exp_p(v_q) : T_0 \to T_0$ is an isomorphism. By (3.3), we have $R_q[u] = R_{\exp_p(v_q)}[u] = d\exp_p(v_q)[u]$ for all $u \in T_0$, showing that $R_q$ is an isomorphism with the corresponding inverse.
(4): Properties (3.4) defining the group action are directly verified using (3.2). Now, suppose $p, q, a \in \mathcal{S}$ and $v \in \mathbb{R}^c$ are arbitrary. Since $\exp_q : T_0 \to \mathcal{S}$ is a diffeomorphism, we have $p = \exp_q\big(\exp^{-1}_q(p)\big)$ and, by the group action property,
\[ \exp_p(v) = \exp_{\exp_q(\exp^{-1}_q(p))}(v) = \exp_q\big(v + \exp^{-1}_q(p)\big), \tag{3.7} \]
which proves (3.5a). To show (3.5b), set $v_a := \exp^{-1}_p(a)$ and substitute this vector into (3.5a). Applying $\exp^{-1}_q$ to both sides then gives
\[ \exp^{-1}_q(a) = \exp^{-1}_q\big(\exp_p(v_a)\big) = v_a + \exp^{-1}_q(p) = \exp^{-1}_p(a) + \exp^{-1}_q(p). \tag{3.8} \]
Setting $a = q$ in this equation, we obtain (3.5b) from
\[ 0 = \exp^{-1}_q(q) = \exp^{-1}_p(q) + \exp^{-1}_q(p). \tag{3.9} \]
Using $\exp^{-1}_q(p) = -\exp^{-1}_p(q)$ in (3.8) yields (3.5c).
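The group-action identities (3.4) and (3.5) of Lemma 3.1 are easy to validate numerically; a hedged sketch with randomly chosen points (all values illustrative):

```python
import numpy as np

def expmap(p, v):
    """exp_p(v) = p e^v / <p, e^v>, cf. (3.2)."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

def expmap_inv(p, q):
    """exp_p^{-1}(q) = Pi_0 log(q/p), cf. (2.18d)."""
    u = np.log(q / p)
    return u - u.mean()

rng = np.random.default_rng(3)
c = 5
rand_point = lambda: (lambda x: x / x.sum())(rng.random(c))
p, q, a = rand_point(), rand_point(), rand_point()
u, v = rng.normal(size=c), rng.normal(size=c)

# group action (3.4): exp_p(v + u) = exp_{exp_p(u)}(v)
assert np.allclose(expmap(p, v + u), expmap(expmap(p, u), v))
# (3.5a): exp_p(v) = exp_q(v + exp_q^{-1}(p))
assert np.allclose(expmap(p, v), expmap(q, v + expmap_inv(q, p)))
# (3.5b): exp_q^{-1}(p) = -exp_p^{-1}(q)
assert np.allclose(expmap_inv(q, p), -expmap_inv(p, q))
# (3.5c): exp_q^{-1}(a) = exp_p^{-1}(a) - exp_p^{-1}(q)
assert np.allclose(expmap_inv(q, a), expmap_inv(p, a) - expmap_inv(p, q))
```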
Lemma 3.2. The $i$-th component of the similarity map $S(W)$ defined by (2.27) can equivalently be expressed as
\[ S_i(W) = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}_i} \omega_{ij}\Big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big)\Big) \qquad \text{for all } i \in \mathcal{I} \text{ and } W \in \mathcal{W}. \tag{3.10} \]

Proof. Consider the expression $\operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big)$ in the sum of the definition (2.27) of $S_i(W)$. Using (3.2) and (3.5a), the likelihood (2.25) can be expressed as
\[ L_j(W_j) = \exp_{W_j}\Big(-\frac{1}{\rho} D_{\mathcal{F};j}\Big) = \exp_{W_i}\Big(\exp^{-1}_{W_i}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big). \tag{3.11} \]
In the following, we set
\[ V_k = \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_k) \qquad \text{for all } k \in \mathcal{I}. \tag{3.12} \]
With this and (3.5c), we have
\[ \exp^{-1}_{W_i}(W_j) = \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_i) = V_j - V_i. \tag{3.13} \]
The two previous identities and Lemma 3.1(2) give
\[ \operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big) = R_{W_i}\Big[\exp^{-1}_{W_i}\big(L_j(W_j)\big)\Big] = R_{W_i}\Big[\exp^{-1}_{W_i}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\Big] \tag{3.14a} \]
\[ = R_{W_i}\Big[V_j - V_i - \frac{1}{\rho} D_{\mathcal{F};j}\Big]. \tag{3.14b} \]
The sum over the neighboring nodes $\mathcal{N}_i$ in the definition (2.27) of $S_i(W)$ can therefore be rewritten as
\[ \sum_{j\in\mathcal{N}_i} \omega_{ij} \operatorname{Exp}^{-1}_{W_i}\big(L_j(W_j)\big) = \sum_{j\in\mathcal{N}_i} \omega_{ij} R_{W_i}\Big[V_j - V_i - \frac{1}{\rho} D_{\mathcal{F};j}\Big] \tag{3.15a} \]
\[ = R_{W_i}\Big[-V_i + \sum_{j\in\mathcal{N}_i} \omega_{ij}\Big(V_j - \frac{1}{\rho} D_{\mathcal{F};j}\Big)\Big], \tag{3.15b} \]
where we used $\sum_{j\in\mathcal{N}_i} \omega_{ij} = 1$ for the last equation. Setting $Y_i := \sum_{j\in\mathcal{N}_i} \omega_{ij}\big(V_j - \frac{1}{\rho} D_{\mathcal{F};j}\big)$, we then have
\[ S_i(W) = \operatorname{Exp}_{W_i}\big(R_{W_i}[-V_i + Y_i]\big) = \exp_{W_i}\big(-V_i + Y_i\big) = \exp_{\mathbb{1}_{\mathcal{S}}}\big(Y_i\big), \tag{3.16} \]
where the last equality again follows from (3.5a) together with the definition (3.12) of $V_i$.

Lemma 3.3.
The $i$-th component of the differential of the similarity map $S(W) \in \mathcal{W}$ is given by
\[ dS_i(W)[X] = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big] \qquad \text{for all } X \in \mathcal{T}_0 \text{ and } i \in \mathcal{I}. \tag{3.17} \]
Furthermore, the $i$-th component of the adjoint differential $dS(W)^\top : \mathcal{T}_0 \to \mathcal{T}_0$ with respect to the standard Euclidean inner product on $\mathcal{T}_0 \subset \mathbb{R}^{n\times c}$ is given by
\[ dS_i(W)^\top[X] = \sum_{j\in\mathcal{N}_i} \omega_{ji}\, \Pi_0\Big[\frac{R_{S_j(W)} X_j}{W_i}\Big] \qquad \text{for every } X \in \mathcal{T}_0 \text{ and } i \in \mathcal{I}. \tag{3.18} \]

Proof. Define the map $F_i : \mathcal{W} \to \mathbb{R}^c$ by $F_i(W) := \sum_{j\in\mathcal{N}_i} \omega_{ij}\big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W_j) - \frac{1}{\rho} D_{\mathcal{F};j}\big)$ for all $W \in \mathcal{W}$. Let $\gamma : (-\varepsilon,\varepsilon) \to \mathcal{W}$ be a smooth curve with $\varepsilon > 0$, $\gamma(0) = W$ and $\dot\gamma(0) = X$. By (3.3), we then have
\[ dF_i(W)[X] = \frac{d}{dt} F_i(\gamma(t))\Big|_{t=0} = \sum_{j\in\mathcal{N}_i} \omega_{ij} \frac{d}{dt} \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(\gamma_j(t))\Big|_{t=0} = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, \Pi_0\Big[\frac{X_j}{W_j}\Big]. \tag{3.19} \]
Due to Lemma 3.2, we can express the $i$-th component of the similarity map as $S_i(W) = \exp_{\mathbb{1}_{\mathcal{S}}}\big(F_i(W)\big)$. Therefore, the differential of $S_i$ is given by
\[ dS_i(W)[X] = d\exp_{\mathbb{1}_{\mathcal{S}}}(F_i(W))\big[dF_i(W)[X]\big] = R_{\exp_{\mathbb{1}_{\mathcal{S}}}(F_i(W))}\big[dF_i(W)[X]\big] \tag{3.20a} \]
\[ = R_{S_i(W)}\Big[\sum_{j\in\mathcal{N}_i} \omega_{ij}\, \Pi_0\Big[\frac{X_j}{W_j}\Big]\Big] = \sum_{j\in\mathcal{N}_i} \omega_{ij}\, R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big], \tag{3.20b} \]
where we used $R_{S_i(W)}\Pi_0 = R_{S_i(W)}$ from (2.17) to obtain the last equation.
Now let $W \in \mathcal{W}$ and $X, Y \in \mathcal{T}_0$ be arbitrary. By the assumption (2.8) on the neighborhood structure (2.7), we have $j \in \mathcal{N}_i$ if and only if $i \in \mathcal{N}_j$, i.e. $\psi_{\mathcal{N}_i}(j) = \psi_{\mathcal{N}_j}(i)$. Since $R_{S_i(W)} \in \mathbb{R}^{c\times c}$ is a symmetric matrix, we obtain
\[ \big\langle dS(W)[X], Y\big\rangle = \sum_{i\in\mathcal{I}} \big\langle dS_i(W)[X], Y_i\big\rangle = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{N}_i} \omega_{ij} \Big\langle R_{S_i(W)}\Big[\frac{X_j}{W_j}\Big], Y_i\Big\rangle \tag{3.21a} \]
\[ = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{I}} \psi_{\mathcal{N}_i}(j)\,\omega_{ij} \Big\langle \frac{X_j}{W_j}, R_{S_i(W)}[Y_i]\Big\rangle = \sum_{i\in\mathcal{I}} \sum_{j\in\mathcal{I}} \psi_{\mathcal{N}_j}(i)\,\omega_{ij} \Big\langle X_j, \frac{R_{S_i(W)}[Y_i]}{W_j}\Big\rangle \tag{3.21b} \]
\[ = \sum_{j\in\mathcal{I}} \sum_{i\in\mathcal{N}_j} \omega_{ij} \Big\langle X_j, \Pi_0\Big[\frac{R_{S_i(W)}[Y_i]}{W_j}\Big]\Big\rangle = \sum_{j\in\mathcal{I}} \Big\langle X_j, \sum_{i\in\mathcal{N}_j} \omega_{ij}\, \Pi_0\Big[\frac{R_{S_i(W)}[Y_i]}{W_j}\Big]\Big\rangle. \tag{3.21c} \]
On the other hand, we have
\[ \big\langle dS(W)[X], Y\big\rangle = \big\langle X, dS(W)^\top[Y]\big\rangle = \sum_{j\in\mathcal{I}} \big\langle X_j, dS_j(W)^\top[Y]\big\rangle. \tag{3.22} \]
Because (3.21) and (3.22) hold for all $X, Y \in \mathcal{T}_0$, the formula for $dS_i(W)^\top[X]$ is proven.

Theorem 3.4.
Suppose $c \ge 3$ and there exists a node $i \in \mathcal{I}$ such that the distance vector $D_{\mathcal{F};i}$ is not constant: $D_{\mathcal{F};i} \notin \mathbb{R}\mathbb{1}$. Then no potential $J : \mathcal{W} \to \mathbb{R}$ exists satisfying $R_W S(W) = -\operatorname{grad} J(W)$, i.e. the assignment flow (2.28) is not a Riemannian gradient descent flow.

Proof. By (2.17), we have $R_W S(W) = R_W \Pi_0 S(W)$, and $R_W : \mathcal{T}_0 \to \mathcal{T}_0$ is a linear isomorphism (Lemma 3.1(3)). Therefore, the question of the existence of a potential $J : \mathcal{W} \to \mathbb{R}$ for the assignment flow (2.28) can be transferred to the Euclidean setting by applying $(R_W|_{\mathcal{T}_0})^{-1}$ to both sides of the equation $R_W S(W) = -\operatorname{grad} J(W)$, i.e.
\[ R_W S(W) = -\operatorname{grad} J(W) = -R_W \partial J(W) \quad\Leftrightarrow\quad \Pi_0 S(W) = -\Pi_0 \partial J(W) \in \mathcal{T}_0. \tag{3.23} \]
If such a potential $J$ exists, then the negative Hessian of $J$ satisfies
\[ -\Pi_0 \operatorname{Hess} J(W) = d\big(-\Pi_0 \partial J\big)(W) = d(\Pi_0 \circ S)(W) = \Pi_0\, dS(W) = dS(W), \tag{3.24} \]
where the last equation follows from $dS(W) : \mathcal{T}_0 \to \mathcal{T}_0$. Furthermore, $\operatorname{Hess} J(W)$, and therefore also $dS(W)$, must be symmetric with respect to the Euclidean scalar product on $\mathcal{T}_0$. Hence, in order to prove that a potential cannot exist, we show that $dS(W)$ fails to be symmetric at some point $W \in \mathcal{W}$. To this end, we construct $W \in \mathcal{W}$ and $X \in \mathcal{T}_0$ with $dS(W)[X] - dS(W)^\top[X] \ne 0$. It suffices to show
\[ dS_i(W)[X] - dS_i(W)^\top[X] \ne 0 \qquad \text{for some row index } i \in \mathcal{I}. \tag{3.25} \]
To simplify notation, we write $D_i$ instead of $D_{\mathcal{F};i}$ in the remainder of the proof. Due to the hypothesis, we have
\[ D_i = D_{\mathcal{F};i} \notin \mathbb{R}\mathbb{1}. \tag{3.26} \]
Let $k, l \in [c]$ be indices such that
\[ D_{ik} = \min_{r\in[c]} D_{ir} \qquad \text{and} \qquad D_{il} = \max_{r\in[c]} D_{ir}. \tag{3.27} \]
Relation (3.26) implies
\[ D_{ik} < D_{il} \qquad \text{and} \qquad e^{-\frac{1}{\rho} D_{ik}} > e^{-\frac{1}{\rho} D_{il}}. \tag{3.28} \]
Define
\[ u = e_k - e_l \in T_0, \qquad e_k, e_l \in \mathcal{B}_c. \tag{3.29} \]
Since $c \ge 3$, there is also a point $p \in \mathcal{S}$ with
\[ p \ne \mathbb{1}_{\mathcal{S}} \qquad \text{and} \qquad p_k = p_l, \tag{3.30} \]
e.g. by choosing $0 < \alpha < \frac{1}{c}$ and setting $p_k = p_l = \alpha$ and $p_r = \frac{1-2\alpha}{c-2}$ for $r \ne k, l$. With these choices, we define the point $W^p \in \mathcal{W}$,
\[ W^p_j = \exp_p\Big(\frac{1}{\rho} D_j\Big) \qquad \text{for all } j \in \mathcal{I}. \tag{3.31} \]
Also, set $v := \exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(p)$. Then $W^p_j = \exp_{\mathbb{1}_{\mathcal{S}}}\big(v + \frac{1}{\rho} D_j\big)$ by (3.5a), and Lemma 3.2 implies
\[ S_i(W^p) = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}(i)} \omega_{ij}\Big(\exp^{-1}_{\mathbb{1}_{\mathcal{S}}}(W^p_j) - \frac{1}{\rho} D_j\Big)\Big) \tag{3.32a} \]
\[ = \exp_{\mathbb{1}_{\mathcal{S}}}\Big(\sum_{j\in\mathcal{N}(i)} \omega_{ij}\, v\Big) = \exp_{\mathbb{1}_{\mathcal{S}}}(v) = p \tag{3.32b} \]
for all $i \in \mathcal{I}$. Now, define $X^u \in \mathcal{T}_0$ with rows
\[ X^u_k = \begin{cases} u \in T_0, & \text{if } k = i, \\ 0, & \text{if } k \ne i. \end{cases} \tag{3.33} \]
Using the expressions for $dS_i(W^p)$ and $dS_i(W^p)^\top$ from Lemma 3.3, we obtain
\[ dS_i(W^p)[X^u] - dS_i(W^p)^\top[X^u] = \omega_{ii}\, R_{S_i(W^p)}\Big[\frac{X^u_i}{W^p_i}\Big] - \omega_{ii}\, \Pi_0\Big[\frac{R_{S_i(W^p)} X^u_i}{W^p_i}\Big] \tag{3.34a} \]
\[ \overset{(3.32)}{=} \omega_{ii}\, R_p\Big[\frac{u}{\exp_p(\frac{1}{\rho} D_i)}\Big] - \omega_{ii}\, \Pi_0\Big[\frac{R_p u}{\exp_p(\frac{1}{\rho} D_i)}\Big] \tag{3.34b} \]
\[ \overset{(3.2)}{=} \omega_{ii}\, \big\langle p, e^{\frac{1}{\rho} D_i}\big\rangle \Big( R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] - \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big]\Big). \tag{3.34c} \]
Since $\omega_{ii} \langle p, e^{\frac{1}{\rho} D_i}\rangle > 0$, we only have to check the expression inside the brackets. As for the first term, using (2.16), we have
\[ R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] = u\, e^{-\frac{1}{\rho} D_i} - \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle p. \tag{3.35a} \]
Setting $a := \big\langle e^{-\frac{1}{\rho} D_i}, \mathbb{1}_{\mathcal{S}}\big\rangle \mathbb{1}_c - e^{-\frac{1}{\rho} D_i}$, we obtain for the second term
\[ \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big] = \Pi_0\Big[e^{-\frac{1}{\rho} D_i} u - \langle u, p\rangle e^{-\frac{1}{\rho} D_i}\Big] = e^{-\frac{1}{\rho} D_i} u - \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle \mathbb{1}_{\mathcal{S}} + \langle u, p\rangle\, a. \tag{3.35b} \]
Thus, the term inside the brackets reads
\[ R_p\Big[\frac{u}{p}\, e^{-\frac{1}{\rho} D_i}\Big] - \Pi_0\Big[\frac{e^{-\frac{1}{\rho} D_i}}{p}\, R_p u\Big] = -\big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle p + \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle \mathbb{1}_{\mathcal{S}} - \langle u, p\rangle\, a \tag{3.36a} \]
\[ = \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle\big(\mathbb{1}_{\mathcal{S}} - p\big) - \langle u, p\rangle\, a. \tag{3.36b} \]
Now, (3.29) and (3.30) imply
\[ \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle = e^{-\frac{1}{\rho} D_{ik}} - e^{-\frac{1}{\rho} D_{il}} > 0 \qquad \text{and} \qquad \langle u, p\rangle = p_k - p_l = 0, \tag{3.37} \]
such that we can conclude
\[ \big\langle u, e^{-\frac{1}{\rho} D_i}\big\rangle\big(\mathbb{1}_{\mathcal{S}} - p\big) - \langle u, p\rangle\, a = \big(e^{-\frac{1}{\rho} D_{ik}} - e^{-\frac{1}{\rho} D_{il}}\big)\big(\mathbb{1}_{\mathcal{S}} - p\big) \ne 0. \tag{3.38} \]
This proves (3.25) and consequently the theorem.

Even though Theorem 3.4 says that no potential exists for the assignment flow in general, we reveal in this section a 'hidden' potential flow under an additional assumption. To this end, we decouple the assignment flow into two components and show that one component depends on the second one. The dominating second one, therefore, provides a new parametrization of the assignment flow. Assuming symmetry of the averaging matrix defined below by (3.39), the dominating flow becomes a Riemannian gradient descent flow. The corresponding potential, defined on a continuous domain, will be studied in subsequent sections.

For notational efficiency, we collect all weights (2.26) into the averaging matrix $\Omega^\omega \in \mathbb{R}^{n\times n}$ with
\[ \Omega^\omega_{ij} := \psi_{\mathcal{N}_i}(j)\, \omega_{ij} = \begin{cases} \omega_{ij} & \text{if } j \in \mathcal{N}_i, \\ 0 & \text{else}, \end{cases} \qquad i, j \in \mathcal{I}. \tag{3.39} \]
$\Omega^\omega$ encodes the spatial structure of the graph and the weights. For an arbitrary matrix $M \in \mathbb{R}^{n\times c}$, the average of its row vectors using the weights indexed by the neighborhood $\mathcal{N}_i$ is given by
\[ \sum_{k\in\mathcal{N}_i} \omega_{ik} M_k = \sum_{k\in\mathcal{I}} \Omega^\omega_{ik} M_k = M^\top \Omega^\omega_i. \tag{3.40} \]
Thus, all row vector averages are given as row vectors of the matrix $\Omega^\omega M$. We now introduce a new representation of the assignment flow.
Proposition 3.5
The assignment flow (2.28) is equivalent to the system ˙ W = R W S with W (0) = W (3.41 a ) ˙ S = R S [Ω S ] with S (0) = S ( W ) . (3.41 b ) Remark 3.6
We observe that the flow W ( t ) is completely determined by S ( t ) . In the following,we refer to the dominating part (3.41 b ) as the S -flow. Proof
Let W ( t ) be a solution of the assignment flow, i.e. ˙ W i = R W i S i ( W ) for all i ∈ I . Set S ( t ) := S ( W ( t )) . Then (3.41 a ) is immediate from the assumption on W . Using the expressionfor dS i ( W ) from Lemma 3.3 gives ˙ S i = ddt S ( W ) i = dS i ( W )[ ˙ W ] = (cid:88) j ∈N i ω ij R S i ( W ) (cid:104) ˙ W j W j (cid:105) . (3.42)Since W solves the assignment flow and R S i ( W ) = R S i ( W ) Π by (2.17) with ker(Π ) = R1 c ,it follows using the explicit expresssion (2.16) of R S i ( W ) that R S i ( W ) (cid:104) ˙ W j W j (cid:105) = R S i ( W ) (cid:104) R W j S j ( W ) W j (cid:105) = R S i ( W ) (cid:104) S j ( W ) − (cid:104) W j , S j ( W ) (cid:105) c (cid:105) (3.43 a ) = R S i ( W ) (cid:2) S j ( W ) (cid:3) . (3.43 b )Back-substitution of this identity into (3.42), pulling the linear map R S i ( W ) out of the sum andkeeping S i ( W ) = S i in mind, results in ˙ S i = R S i ( W ) (cid:2) S j ( W ) (cid:3) = R S i (cid:104) (cid:88) j ∈N i ω ij S j (cid:105) = R S i [ S (cid:62) Ω ωi ] for all i ∈ I . (3.44)Collecting these vectors as row vectors of the matrix ˙ S gives (3.41 b ). Remark 3.7
Henceforth, we write $S$ for the $S$-flow to stress the underlying connection to the assignment flow and to simplify the notation. We next show that the $S$-flow, which essentially determines the assignment flow (Remark 3.7), becomes a Riemannian descent flow under the additional assumption that the averaging matrix (3.39) is symmetric. Proposition 3.8
Suppose the weights defining the similarity map in (2.27) are symmetric, i.e. $(\Omega^\omega)^\top = \Omega^\omega$. Then the $S$-flow (3.41b) is a Riemannian gradient descent flow $\dot S = -\operatorname{grad} J(S)$, induced by the potential

$J(S) := -\tfrac{1}{2}\langle S, \Omega^\omega S\rangle$, $S \in \mathcal W$. (3.45)

Proof
Let $\gamma \colon (-\varepsilon, \varepsilon) \to \mathcal W$, $\varepsilon > 0$, be any smooth curve with $\dot\gamma(0) = V \in \mathbb R^{n\times c}$ and $\gamma(0) = S$. By the symmetry of $\Omega^\omega$, we have

$\langle \partial J(S), V\rangle = dJ(S)[V] = \frac{d}{dt} J(\gamma(t))\big|_{t=0} = -\langle \Omega^\omega S, V\rangle$ for all $V \in \mathbb R^{n\times c}$.

Therefore, $\partial J(S) = -\Omega^\omega S$. Thus, the Riemannian gradient is given by $\operatorname{grad} J(S) = R_S[\partial J(S)] = -R_S[\Omega^\omega S]$.

We define the matrix

$L_G = I_n - \Omega^\omega$, (3.46)

where $I_n \in \mathbb R^{n\times n}$ is the identity matrix. Since $I_n = \operatorname{Diag}(\Omega^\omega \mathbb 1_n)$ by (2.26) is the degree matrix of the symmetric averaging matrix $\Omega^\omega$, $L_G$ can be regarded as the Laplacian (matrix) of the underlying undirected weighted graph $G = (\mathcal V, \mathcal E)$. For the analysis of the $S$-flow it will be convenient to rewrite the potential (3.45) accordingly. Proposition 3.9
Under the assumption of Proposition 3.8, the potential (3.45) can be written in the form

$J(S) = \tfrac{1}{2}\langle S, L_G S\rangle - \tfrac{1}{2}\|S\|^2 = \tfrac{1}{4}\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\,\|S_i - S_j\|^2 - \tfrac{1}{2}\|S\|^2$. (3.47)

The matrix $L_G$ is symmetric, positive semidefinite and $L_G \mathbb 1_n = 0$. Proof
We have $J(S) = -\tfrac{1}{2}\big(\langle S, (\Omega^\omega - I_n) S\rangle + \langle S, S\rangle\big) = \tfrac{1}{2}\big(\langle S, L_G S\rangle - \|S\|^2\big)$. Thus, we focus on the sum of (3.47).

First, note that $\|S_j - S_i\|^2 = \langle S_j, S_j - S_i\rangle + \langle S_i, S_i - S_j\rangle$. Since $\psi_{\mathcal N_i}(j) = \psi_{\mathcal N_j}(i)$ and $\omega_{ij} = \omega_{ji}$ by assumption, we have

$\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_j, S_j - S_i\rangle = \sum_{i,j\in\mathcal I} \psi_{\mathcal N_i}(j)\,\omega_{ij}\langle S_j, S_j - S_i\rangle = \sum_{i,j\in\mathcal I} \psi_{\mathcal N_j}(i)\,\omega_{ji}\langle S_j, S_j - S_i\rangle$ (3.48a)
$= \sum_{j\in\mathcal I}\sum_{i\in\mathcal N_j} \omega_{ji}\langle S_j, S_j - S_i\rangle = \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle$, (3.48b)

where the last equality follows by renaming the indices $i$ and $j$. Thus, using (2.26),

$\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\|S_i - S_j\|^2 = \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_j, S_j - S_i\rangle + \sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle$ (3.49a)
$= 2\sum_{i\in\mathcal I}\sum_{j\in\mathcal N_i} \omega_{ij}\langle S_i, S_i - S_j\rangle = 2\sum_{i\in\mathcal I}\Big\langle S_i,\ S_i - \sum_{j\in\mathcal N_i}\omega_{ij} S_j\Big\rangle$ (3.49b)
$= 2\sum_{i\in\mathcal I}\langle S_i, (L_G S)_i\rangle = 2\,\langle S, L_G S\rangle$. (3.49c)

The properties of $L_G$ follow from the symmetry of $\Omega^\omega$, nonnegativity of the quadratic form (3.49) and definition (3.46).

4. Continuous-Domain Variational Approach

In this section, we study a continuous-domain variational formulation of the potential of Proposition 3.9. We confine ourselves to uniform weights (2.26) and neighborhoods (2.7) that only contain the nearest neighbors of each vertex $i$, such that $L_G$ becomes the discretized ordinary Laplacian. (For undirected graphs, the graph Laplacian is commonly defined via the weighted adjacency matrix with diagonal entries 0, whereas $\Omega^\omega_{ii} = \omega_{ii} > 0$. The diagonal entries do not affect the quadratic form (3.47), however.) As a result, we consider the problem to minimize the functional

$E_\alpha \colon H^1(\mathcal M; \mathbb R^c) \to \mathbb R$, (4.1a)
$E_\alpha(S) := \int_{\mathcal M} \|DS(x)\|^2 - \alpha\,\|S(x)\|^2 \, dx$, $\alpha > 0$. (4.1b)

Throughout this section, $\mathcal M \subset \mathbb R^2$ is a simply connected, bounded, open subset of the Euclidean plane. The parameter $\alpha$ controls the interaction between regularization and enforcing integrality when $S(x)$, $x \in \mathcal M$, is restricted to values in the probability simplex.

We prove well-posedness for vanishing (Section 4.1) and Dirichlet boundary conditions (Section 4.2), respectively, and specify explicitly the set of minimizers in the former case. The gradient descent flow corresponding to the latter case, initialized by means of given data and with parameter value $\alpha = 1$, may be seen as a continuous-domain extension of the assignment flow that is parametrized according to Proposition 3.5 and operates at the smallest spatial scale in terms of the size $|\mathcal N_i|$ of uniform neighborhoods (2.7) (in the discrete formulation (2.28): nearest-neighbor averaging). We illustrate this by a numerical example (Section 4.3), based on discretizing (4.1) and applying an algorithm that mimics the $S$-flow and converges to a local minimum of the nonconvex functional (4.1) by solving a sequence of convex programs.

We point out that $\mathcal M$ could be turned into a Riemannian manifold using a metric that reflects image features (edges etc.), as was proposed with the Laplace-Beltrami framework for image denoising [KMS00]. In this work we focus on the essential point, however, that distinguishes image denoising from image labeling, i.e. the interaction of the two terms of (4.1), which essentially is a consequence of the information geometry of the assignment manifold $\mathcal W$ (2.19).
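To make the interplay of Propositions 3.5 and 3.8 concrete before passing to the continuous domain, here is a minimal numerical sketch (my own illustration; the paper's numerics use geometric ODE integration instead of plain explicit Euler): the $S$-flow (3.41b) with the replicator map $R_S$ applied row-wise, along which the potential $J(S) = -\tfrac{1}{2}\langle S, \Omega^\omega S\rangle$ of Proposition 3.8 decreases for a symmetric averaging matrix.

```python
import numpy as np

def replicator(S, V):
    """Row-wise replicator map: (R_{S_i} V_i) = S_i * (V_i - <S_i, V_i> 1)."""
    return S * (V - np.sum(S * V, axis=1, keepdims=True))

def s_flow_euler(S0, Omega, h=0.05, steps=400):
    """Explicit Euler for dS/dt = R_S[Omega S], cf. (3.41b) -- only a sketch."""
    S = S0.copy()
    for _ in range(steps):
        S = S + h * replicator(S, Omega @ S)
    return S

n, c = 8, 3
# symmetric, row-stochastic averaging matrix: uniform weights on a cycle graph
Omega = (np.eye(n) + np.roll(np.eye(n), 1, 1) + np.roll(np.eye(n), -1, 1)) / 3

rng = np.random.default_rng(0)
S0 = rng.random((n, c)); S0 /= S0.sum(axis=1, keepdims=True)

J = lambda S: -0.5 * np.sum(S * (Omega @ S))    # potential (3.45)
S1 = s_flow_euler(S0, Omega)
assert np.allclose(S1.sum(axis=1), 1.0)          # rows stay in the simplex
assert J(S1) <= J(S0)                            # Riemannian descent of (3.45)
```

Note that each row of the replicator update sums to zero, so the Euler iterates stay on the simplex exactly; positivity is preserved for sufficiently small step size $h$.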
Based on (2.3) we define the closed convex set

$\mathcal D(\mathcal M) = \{ S \in H^1(\mathcal M; \mathbb R^c) \colon S(x) \in \Delta_c \ \text{a.e. in } \mathcal M \}$ (4.2)

and focus on the variational problem

$\inf_{S \in \mathcal D(\mathcal M)} E_\alpha(S)$, (4.3)

with $E_\alpha$ given by (4.1). $E_\alpha$ is smooth but nonconvex. We specify the set of minimizers (Prop. 4.2). Recall notation (2.1). Lemma 4.1
Let $p \in \Delta_c$. Then $\|p\| = 1$ if and only if $p \in B_c$. Proof
The 'if' statement is obvious. As for the 'only if', suppose $p \notin B_c$, i.e. $p_i < 1$ for all $i \in [c]$. Then $p_i^2 < p_i$ whenever $p_i > 0$, and hence $\|p\|^2 < \|p\|_1 = 1$. Proposition 4.2
The functional $E_\alpha \colon \mathcal D(\mathcal M) \to \mathbb R$ given by (4.1) is lower bounded,

$E_\alpha(S) \ge -\alpha\,\operatorname{Vol}(\mathcal M) > -\infty$, $\forall S \in \mathcal D(\mathcal M)$. (4.4)

This lower bound is attained at some point in

$\arg\min_{S \in \mathcal D(\mathcal M)} E_\alpha(S) = \begin{cases} \{S_{e_1}, \dots, S_{e_c}\}, & \text{if } \alpha > 0, \\ \{S_p \colon \mathcal M \to \Delta_c \colon p \in \Delta_c\}, & \text{if } \alpha = 0, \end{cases}$ (4.5)

where, for any $p \in \Delta_c$, $S_p$ denotes the constant vector field $x \mapsto S_p(x) = p$. Proof
Let $p \in \Delta_c$. Then $\|p\|^2 \le \|p\|_1 = 1$. It follows for $S \in \mathcal D(\mathcal M)$ that

$E_\alpha(S) \ge -\alpha\,\|S\|_{\mathcal M}^2 \ge -\alpha\,\|\mathbb 1\|_{\mathcal M}^2 = -\alpha\,\operatorname{Vol}(\mathcal M)$, (4.6)

which is (4.4).

We next show that the right-hand side of (4.5) specifies minimizers of $E_\alpha$. For any $p \in \Delta_c$, the constant vector field $S_p$ is contained in $\mathcal D(\mathcal M)$. Consider specifically $S_{e_i}$, $i \in [c]$. Since $\|S_{e_i}(x)\| = \|e_i\| = 1$ and $DS_{e_i} \equiv 0$, the lower bound is attained, $E_\alpha(S_{e_i}) = -\alpha\,\operatorname{Vol}(\mathcal M)$, and the functions $\{S_{e_1}, \dots, S_{e_c}\}$ minimize $E_\alpha$, for every $\alpha \ge 0$. If $\alpha = 0$, then the constant functions $S_p$ are minimizers as well, for any $p \in \Delta_c$, since then

$E_\alpha(S_p) = \|DS_p\|_{\mathcal M}^2 = 0 = -0 \cdot \operatorname{Vol}(\mathcal M)$. (4.7)

We conclude by showing that no minimizers other than (4.5) exist. Let $S^* \in \mathcal D(\mathcal M)$ be another minimizer of $E_\alpha$ with $E_\alpha(S^*) = -\alpha\,\operatorname{Vol}(\mathcal M)$. We distinguish the two cases $\alpha = 0$ and $\alpha > 0$.

If $\alpha = 0$, then $S^*$ satisfies (4.7) and $\|DS^*\|_{\mathcal M}^2 = 0$. Since $\|DS^*_{;i}\|_{\mathcal M}^2 \le \|DS^*\|_{\mathcal M}^2 = 0$ for every $i \in [c]$, $S^*$ is constant by Lemma 2.2, i.e. a $p \in \Delta_c$ exists such that $S^* = S_p$ a.e.

If $\alpha > 0$, then using the equation $E_\alpha(S^*) = -\alpha\,\operatorname{Vol}(\mathcal M)$ and $\|S^*(x)\| \le 1$ gives

$\alpha\,\operatorname{Vol}(\mathcal M) \le \|DS^*\|_{\mathcal M}^2 + \alpha\,\operatorname{Vol}(\mathcal M) = \|DS^*\|_{\mathcal M}^2 - E_\alpha(S^*) = \alpha\,\|S^*\|_{\mathcal M}^2$ (4.8a)
$\le \alpha\,\|\mathbb 1\|_{\mathcal M}^2 = \alpha\,\operatorname{Vol}(\mathcal M)$, (4.8b)

which shows $\|DS^*\|_{\mathcal M}^2 = 0$ and hence, by Lemma 2.2 again, $S^* = S_p$ for some $p \in \Delta_c$. The preceding inequalities also imply $\operatorname{Vol}(\mathcal M) = \|S^*\|_{\mathcal M}^2$, i.e. $\|S^*(x)\| = 1$ a.e. By Lemma 4.1, we conclude $S^* = S_p$ with $p \in B_c$, that is $S^* \in \{S_{e_1}, \dots, S_{e_c}\}$.

Proposition 4.2 highlights the effect of the concave term of the objective $E_\alpha$ (4.1): labelings are enforced in the absence of data.
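The minimizer characterization (4.5) is easy to observe in a discrete surrogate of $E_\alpha$: on a grid, replace $\int \|DS\|^2$ by squared forward differences and $\operatorname{Vol}(\mathcal M)$ by the number of grid points. The toy discretization below (my own illustration, not taken from the paper) confirms that constant vertex-valued fields attain the bound (4.4), while non-vertex constant fields do not for $\alpha > 0$:

```python
import numpy as np

def E_disc(S, alpha):
    """Discrete surrogate of (4.1) on a 1D grid (unit spacing):
    sum of squared forward differences minus alpha * sum_i ||S_i||^2."""
    return np.sum((S[1:] - S[:-1])**2) - alpha * np.sum(S**2)

n, c, alpha = 10, 3, 1.0
vol = n                                  # discrete 'volume' = number of points

e1 = np.zeros(c); e1[0] = 1.0            # a simplex vertex (unit label vector)
S_vertex = np.tile(e1, (n, 1))           # constant vertex-valued field
S_center = np.full((n, c), 1.0 / c)      # constant barycenter field

assert np.isclose(E_disc(S_vertex, alpha), -alpha * vol)  # bound (4.4) attained
assert E_disc(S_center, alpha) > E_disc(S_vertex, alpha)  # not minimal, alpha > 0
assert np.isclose(E_disc(S_center, 0.0), 0.0)             # any constant field minimizes for alpha = 0
```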
Below, the latter are taken into account (i) by imposing non-zero boundary conditions and (ii) by initializing a corresponding gradient flow (Section 4.3). In this section, we consider the case where boundary conditions are imposed by restricting the feasible set of problem (4.3) to

$\mathcal A_g(\mathcal M) = \{ S \in \mathcal D(\mathcal M) \colon S - g \in H^1_0(\mathcal M; \mathbb R^c) \} = \big(g + H^1_0(\mathcal M; \mathbb R^c)\big) \cap \mathcal D(\mathcal M)$ (4.9)

for some fixed $g$ that prescribes simplex-valued boundary values (in the trace sense). As the intersection of a closed affine subspace and a closed convex set, $\mathcal A_g(\mathcal M)$ is closed and convex.

Weak lower semicontinuity is a key property for proving the existence of minimizers. In the case of $E_\alpha$ (4.1) this is not immediate, due to the lack of convexity.
Proposition 4.3
The functional $E_\alpha$ given by (4.1) is weakly sequentially lower semicontinuous on $\mathcal A_g(\mathcal M)$, i.e. for any sequence $(S_n)_{n\in\mathbb N} \subset \mathcal A_g(\mathcal M)$ weakly converging to $S \in \mathcal A_g(\mathcal M)$, the inequality

$E_\alpha(S) \le \liminf_{n\to\infty} E_\alpha(S_n)$ (4.10)

holds. Proof
Let $S_n \rightharpoonup S$ converge weakly in $\mathcal A_g(\mathcal M) \subset H^1(\mathcal M; \mathbb R^c)$. Then, by Prop. 2.3(c),

$\|S\|_{H^1(\mathcal M)} \le \liminf_{n\to\infty} \|S_n\|_{H^1(\mathcal M)}$. (4.11)

Since $S, S_n \in \mathcal A_g(\mathcal M)$, we also have $(S_n - g) \rightharpoonup (S - g)$ in $H^1_0(\mathcal M; \mathbb R^c)$ by (4.9) and consequently $S_n \to S$ strongly in $L^2(\mathcal M; \mathbb R^c)$ due to (2.34). Taking into account (4.11) and $\liminf_{n\to\infty}\big(-(1+\alpha)\|S_n\|_{\mathcal M}^2\big) = -(1+\alpha)\lim_{n\to\infty}\|S_n\|_{\mathcal M}^2 = -(1+\alpha)\|S\|_{\mathcal M}^2$, we obtain

$E_\alpha(S) = \|S\|_{H^1(\mathcal M)}^2 - (1+\alpha)\|S\|_{\mathcal M}^2 \le \liminf_{n\to\infty} \|S_n\|_{H^1(\mathcal M)}^2 + \liminf_{n\to\infty}\big(-(1+\alpha)\|S_n\|_{\mathcal M}^2\big)$ (4.12a)
$\le \liminf_{n\to\infty} E_\alpha(S_n)$. (4.12b)

We are now prepared to show that $E_\alpha$ attains its minimal value on $\mathcal A_g(\mathcal M)$, following the basic proof pattern of [Zei85, Ch. 38]. Theorem 4.4
Let $E_\alpha$ be given by (4.1). There exists $S^* \in \mathcal A_g(\mathcal M)$ such that

$E_\alpha^* := E_\alpha(S^*) = \inf_{S \in \mathcal A_g(\mathcal M)} E_\alpha(S)$. (4.13)

Proof
Let $(S_n)_{n\in\mathbb N} \subset \mathcal A_g(\mathcal M)$ be a minimizing sequence such that

$\lim_{n\to\infty} E_\alpha(S_n) = E_\alpha^*$. (4.14)

Then there exists some sufficiently large $n_0 \in \mathbb N$ such that

$E_\alpha^* + 1 \ge E_\alpha(S_n) = \|S_n\|_{H^1(\mathcal M)}^2 - (1+\alpha)\|S_n\|_{\mathcal M}^2$, $\forall n \ge n_0$. (4.15)

Since $S_n(x) \in \Delta_c$ for a.e. $x \in \mathcal M$, we have $\|S_n\|_{\mathcal M}^2 \le \operatorname{Vol}(\mathcal M)$ and hence obtain

$\|S_n\|_{H^1(\mathcal M)}^2 \le E_\alpha^* + 1 + (1+\alpha)\|S_n\|_{\mathcal M}^2 \le E_\alpha^* + 1 + (1+\alpha)\operatorname{Vol}(\mathcal M)$, $\forall n \ge n_0$. (4.16)

Thus the sequence $(S_n)_{n\in\mathbb N} \subset H^1(\mathcal M; \mathbb R^c)$ is bounded and, by Prop. 2.3(c), we may extract a weakly converging subsequence $S_{n_k} \rightharpoonup S^* \in H^1(\mathcal M; \mathbb R^c)$. Since $\mathcal A_g(\mathcal M) \subset H^1(\mathcal M; \mathbb R^c)$ is closed and convex, Prop. 2.3(a) implies $S^* \in \mathcal A_g(\mathcal M)$. Consequently, by Prop. 4.3 and (4.14),

$E_\alpha(S^*) \le \liminf_{k\to\infty} E_\alpha(S_{n_k}) = \lim_{k\to\infty} E_\alpha(S_{n_k}) = E_\alpha^*$, (4.17)

which implies $E_\alpha(S^*) = E_\alpha^*$, i.e. $S^* \in \mathcal A_g(\mathcal M)$ minimizes $E_\alpha$.

We consider the variational problem (4.13),

$\inf_{S \in \mathcal A_g(\mathcal M)} \int_{\mathcal M} \|DS\|^2 - \alpha\|S\|^2 \, dx$, (4.18)

for some fixed $g$ specifying the boundary values $S|_{\partial\mathcal M} = g|_{\partial\mathcal M}$, and the problem to compute a local minimum numerically using an optimization scheme that mimics the $S$-flow of Proposition 3.5.

Based on (4.9), we rewrite the problem in the form

$\inf_{f \in H^1_0(\mathcal M; \mathbb R^c)} \big\{ \|D(g+f)\|_{\mathcal M}^2 - \alpha\|g+f\|_{\mathcal M}^2 + \delta_{\mathcal D(\mathcal M)}(g+f) \big\}$ (4.19a)
$= \inf_{f \in H^1_0(\mathcal M; \mathbb R^c)} \big\{ \|Df\|_{\mathcal M}^2 + 2\langle Dg, Df\rangle_{\mathcal M} - \alpha\big(\|f\|_{\mathcal M}^2 + 2\langle g, f\rangle_{\mathcal M}\big) + \delta_{\mathcal D(\mathcal M)}(g+f) \big\} + C$, (4.19b)

where the constant $C$ collects terms not depending on $f$. We discretize the problem as follows. $f$ becomes a vector $f \in \mathbb R^{cn}$ with $n = |\mathcal I|$ subvectors $f_i \in \mathbb R^c$, $i \in \mathcal I$, or alternatively with $c = |\mathcal J|$ subvectors $f^j$, $j \in \mathcal J$.
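The pieces of this discretization can be assembled into a small working sketch. The projection routine below is the standard sort-based Euclidean projection onto the probability simplex (a generic tool, not taken from the paper), and a plain projected-gradient loop serves as a simple stand-in for the convex programs solved at each step of the scheme described next; boundary rows are held fixed as Dirichlet data.

```python
import numpy as np

def project_simplex(V):
    """Row-wise Euclidean projection onto the probability simplex
    (standard sort-based algorithm)."""
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    k = np.arange(1, V.shape[1] + 1)
    rho = np.sum(U - css / k > 0, axis=1)
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def minimize_energy_1d(S0, alpha=1.0, lr=0.05, steps=1000):
    """Projected gradient on a 1D discretization of (4.18):
    E(S) = sum_i ||S_{i+1} - S_i||^2 - alpha * sum_i ||S_i||^2,
    with the first and last rows fixed (Dirichlet boundary data)."""
    S = S0.copy()
    for _ in range(steps):
        lap = np.zeros_like(S)
        lap[1:-1] = 2 * S[1:-1] - S[:-2] - S[2:]   # 1D discrete Laplacian
        grad = 2 * lap - 2 * alpha * S
        S[1:-1] = project_simplex(S[1:-1] - lr * grad[1:-1])
    return S

n, c = 12, 3
rng = np.random.default_rng(1)
S0 = rng.random((n, c)); S0 /= S0.sum(axis=1, keepdims=True)
S0[0] = S0[-1] = np.array([1.0, 0.0, 0.0])   # simplex-valued boundary data

S = minimize_energy_1d(S0)
assert np.allclose(S.sum(axis=1), 1.0) and (S >= 0).all()   # feasibility
assert np.allclose(S[0], S0[0]) and np.allclose(S[-1], S0[-1])
```

The concave $-\alpha\|S\|^2$ term pushes interior rows toward simplex vertices, while the Laplacian term couples neighboring rows; the simplex projection keeps the iterates feasible throughout.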
The inner product $\langle g, f\rangle_{\mathcal M}$ is replaced by $\langle g, f\rangle = \sum_{i\in[n]} \langle g_i, f_i\rangle = \sum_{j\in[c]} \langle g^j, f^j\rangle$. We keep the symbols $f, g$ for simplicity and indicate the discretized setting by the subscript $n$ as introduced next.

$D$ becomes a gradient matrix $D_n$ that estimates the gradient of each subvector $f^j$ separately, such that

$L_n f := D_n^\top D_n f$ (4.20)

is the basic discrete 5-point stencil Laplacian applied to each subvector $f^j$. The feasible set $\mathcal D(\mathcal M)$ is replaced by the closed convex set

$\mathcal D_n := \{ f \ge 0 \colon \langle \mathbb 1_c, f_i\rangle = 1,\ \forall i \in \mathcal I \}$. (4.21)

Thus the discretized problem reads

$\inf_f \big\{ \|D_n f\|^2 + 2\langle L_n g - \alpha g, f\rangle - \alpha\|f\|^2 + \delta_{\mathcal D_n}(g+f) \big\}$. (4.22)

Having computed a local minimum $f^*$, the corresponding local minimum of (4.18) is $S^* = g + f^*$. In order to compute $f^*$, we applied the proximal forward-backward scheme

$f^{(k+1)} = \arg\min_f \big\{ \|D_n f\|^2 + 2\big\langle L_n g - \alpha(g + f^{(k)}), f\big\rangle + \tfrac{1}{2\tau_k}\|f - f^{(k)}\|^2 + \delta_{\mathcal D_n}(g+f) \big\}$, $k \ge 0$, (4.23)

with proximal parameters $\tau_k$, $k \in \mathbb N$, and initialization $f^{(0)}_i$, $i \in \mathcal I$, specified further below. The iterative scheme (4.23) is a special case of the PALM algorithm [BST14, Sec. 3.7]. Ignoring the proximal term, each problem (4.23) amounts to solving $c$ (discretized) Dirichlet problems with the boundary values of $g^j$, $j \in [c]$, imposed, and with right-hand sides that change during the iteration since they depend on $f^{(k)}$. The solutions $(f^j)^{(k)}$, $j \in \mathcal J$, to these Dirichlet problems depend on each other, however, through the feasible set (4.21). At each iteration $k$, problem (4.23) can be solved by convex programming. The proximal parameters $\tau_k$ act as stepsizes such that the sequence $f^{(k)}$ does not approach a local minimum too rapidly. Then the interplay between the
linear form that adapts during the iteration and the regularizing effect of the Laplacians can find a labeling (partition) corresponding to a good local optimum.

As for $g$, we chose $g_i = L_i(S)$, $i \in \mathcal I$, at boundary vertices $i$ and $g_i = 0$ at every interior vertex $i$. Consequently, with the initialization $f^{(0)}_i = L_i(S)$, $i \in \mathcal I$, at interior vertices (the boundary values of $f$ are zero), the sequence $S^{(k)} = g + f^{(k)}$ mimics the $S$-flow of Proposition 3.5, where the given data also show up in the initialization $S^{(0)}$ only.

Figure 4.1 provides an illustration using an experiment adopted from [˚APSS17, Fig. 6], originally designed to evaluate the performance of geometric regularization of label assignments through the assignment flow in an unbiased way. Parameter values are specified in the caption. The result confirms that the continuous-domain formulations discussed above represent the assignment flow at the smallest spatial scale.

Figure 4.1. Evaluation of the numerical scheme (4.23) that mimics the $S$-flow of Proposition 3.5. Parameter values: $\alpha = 1$, $\tau_k = \tau = 10$, $\forall k$. Top, from left to right: ground truth, noisy input data $f^{(0)}$, iterate $f^{(100)}$ and $f^*$ resulting from $f^{(100)}$ by a trivial rounding step. $S^{(k)} = f^{(k)} + g$ differs from $f^{(k)}$ by the boundary values corresponding to the noisy input data. Inspecting the values of $f^{(100)}$ close to the boundary shows that the influence of boundary noise is minimal. Bottom, from left to right: the iterates $f^{(10)}, f^{(20)}, f^{(30)}, f^{(40)}$. Taking rounding into account as a post-processing step, the sequence $f^{(k)}$ quickly converges to a reasonable partition. About 50 more iterations are required to fix the values at merely a few hundred remaining pixels. Slight rounding of the geometry of the components of the partition, in comparison to ground truth, corresponds to using uniform weights (2.26) for the assignment flow.

Proposition 4.5 Let $S^*$ solve the variational problem (4.18).
Then $S^*$ satisfies the variational inequality

$\langle DS^*, DS - DS^*\rangle_{\mathcal M} - \alpha\langle S^*, S - S^*\rangle_{\mathcal M} \ge 0$, $\forall S \in \mathcal A_g(\mathcal M)$. (4.24)

Proof
Functional $E_\alpha$ given by (4.18) is Gâteaux differentiable with derivative

$\langle E_\alpha'(S^*), S\rangle_{H^{-1}(\mathcal M; \mathbb R^c) \times H^1_0(\mathcal M; \mathbb R^c)} = 2\big(\langle DS^*, DS\rangle_{\mathcal M} - \alpha\langle S^*, S\rangle_{\mathcal M}\big)$. (4.25)

The assertion follows from applying Theorem 2.4.

We conclude this section by deriving a PDE corresponding to (4.24) that a minimizer $S^*$ is supposed to satisfy in the weak sense. The derivation is formal in that we adopt the unrealistic regularity assumption

$S^* \in \mathcal A^2_g(\mathcal M)$, (4.26)

with $\mathcal A^2_g(\mathcal M)$ defined analogously to (4.9). While this will hold for the continuous-domain linear problems corresponding to (4.23) at each step $k$ of the iteration and for sufficiently smooth $\partial\mathcal M$, it will not hold in the limit $k \to \infty$, since we expect (and wish) $S^*$ to become discontinuous, contrary to the regularity assumption (4.26) and the continuity implied by the Sobolev embedding theorem for $\mathcal M \subset \mathbb R^d$ with $d = 2$. Nevertheless, since the PDE provides another interpretation of the assignment flow, we state it (see (4.32) below) and hope it will stimulate further research.

In view of assumption (4.26), set

$S^* = g + f^*$, $f^* \in H^1_0(\mathcal M; \mathbb R^c)$. (4.27)

Inserting $S^*$ and $S = g + h$, $h \in H^1_0(\mathcal M; \mathbb R^c)$, into (4.24) and integrating by parts gives

$\langle -\Delta S^* - \alpha S^*, h - f^*\rangle_{\mathcal M} \ge 0$, (4.28)

where $\Delta S^* = (\Delta S^*_{;1}, \dots, \Delta S^*_{;c})^\top$ applies componentwise. Using the shorthands

$\nu_\alpha(S^*) = -\Delta S^* - \alpha S^*$, (4.29a)
$\mu_\alpha(S^*) = \nu_\alpha(S^*) - \langle \nu_\alpha(S^*), S^*\rangle_{\mathbb R^c}\, \mathbb 1_c$, (4.29b)

where $\langle \nu_\alpha(S^*), S^*\rangle_{\mathbb R^c}$ denotes the function $x \mapsto \langle \nu_\alpha(S^*)(x), S^*(x)\rangle$, $x \in \mathcal M$, we have

$\langle \mu_\alpha(S^*), S^*\rangle_{\mathcal M} = 0$ (4.30a)

since $\langle \mathbb 1_c, S^*(x)\rangle = 1$ for a.e. $x$, and

$\langle \mu_\alpha(S^*), S\rangle_{\mathcal M} = \langle \nu_\alpha(S^*), h - f^*\rangle_{\mathcal M} \ge 0$, (4.30b)

which is (4.28). Since $S(x) \ge 0$ a.e. in $\mathcal M$ and may have arbitrary support, we deduce from the inequality $\langle \mu_\alpha(S^*), S\rangle_{\mathcal M} \ge 0$ and from the self-duality of the nonnegative orthant $\mathbb R^c_+$ that $\mu_\alpha(S^*) \ge 0$ a.e. in $\mathcal M$. Since also $S^* \ge 0$ a.e., this implies that equation (4.30a) holds pointwise a.e. in $\mathcal M$:

$\mu_\alpha(S^*)(x) \odot S^*(x) = \nu_\alpha(S^*)(x) \odot S^*(x) - \big\langle \nu_\alpha(S^*)(x), S^*(x)\big\rangle\, S^*(x) = 0$ a.e. in $\mathcal M$, (4.31)

with $\odot$ denoting the componentwise product. Substituting $\nu_\alpha(S^*)$, we deduce that a minimizer $S^* = g + f^*$ characterized by the variational inequality (4.24) weakly satisfies the PDE

$R_{S^*}(-\Delta S^* - \alpha S^*) = 0$, (4.32)

where $R_{S^*}$ defined by (2.16) applies $R_{S^*(x)}$ to the vector $(-\Delta S^* - \alpha S^*)(x)$ at every $x \in \mathcal M$.
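The PDE (4.32) can be checked pointwise for locally constant fields (my own illustration): if $S^* \equiv e_k$ on a region, then $\Delta S^* = 0$ there and the replicator operator annihilates $-\alpha S^*$, whereas a generic non-vertex, non-uniform simplex value leaves a residual.

```python
import numpy as np

def R(p, v):
    """Replicator map R_p v = p * (v - <p, v> 1), cf. (2.16)."""
    return p * (v - np.dot(p, v))

c, alpha = 4, 1.0

# At a locally constant labeling S* = e_k we have Delta S* = 0,
# so nu_alpha(S*) = -alpha * e_k and the residual of (4.32) vanishes:
e_k = np.zeros(c); e_k[2] = 1.0
assert np.allclose(R(e_k, -alpha * e_k), 0.0)

# A generic non-vertex, non-uniform simplex value leaves a residual:
p = np.array([0.5, 0.25, 0.25, 0.0])
assert not np.allclose(R(p, -alpha * p), 0.0)
```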
Remark 4.6 (Comments)
(1) We point out that computing a vector field $S^*$ satisfying (4.24) is difficult in practice, due to the nonconvexity of problem (4.18). On the other hand, the algorithm proposed in Section 4.3 and the result illustrated by Figure 4.1 show that good suboptima can be computed by merely solving a sequence of simple problems.
(2) As already pointed out at the beginning of this section, the derivation of the PDE (4.32) is merely a formal one, due to the unrealistic regularity assumption (4.26). In fact, since $\ker R_{S^*(x)} = \mathbb R \mathbb 1_c$, equation (4.32) says that $S^*$ is constant up to a set of measure zero. While the numerical result (Fig. 4.1) clearly reflects this, the discontinuity of $S^*$ conflicts with assumption (4.26).

5. Conclusion

We presented a novel parametrization of the assignment flow for contextual data classification on graphs. The dominating part of the flow admits an interpretation as a Riemannian gradient flow with respect to the underlying information geometry, unlike the original formulation of the assignment flow. A decomposition of the corresponding potential by means of a non-local graph Laplacian makes explicit the interaction of two processes: regularization of label assignments and gradual enforcement of unambiguous decisions. The assignment flow combines these aspects in a seamless way, unlike traditional approaches where solutions to convex relaxations require post-processing. It is remarkable that this behaviour is solely induced by the underlying information geometry.

We studied a continuous-domain variational formulation as counterpart of the discrete formulation restricted to a local discrete Laplacian (nearest-neighbor interaction). A numerical algorithm in terms of a sequence of simple linear elliptic problems reproduces results that were obtained with the original formulation of the assignment flow using completely different numerics (geometric ODE integration).
This illustrates the derived mathematical relations. We outline three attractive directions of further research.

• We clarified in Section 4 that the inherent smooth setting of the assignment flow (2.28) translates under suitable assumptions to the sequence of linear (discretized) elliptic PDE problems (4.23) together with a simple convex constraint. We did not touch on the limit problem, however. More mathematical work is required here, cf. Remark 4.6.
Since the assignment flow returns image partitions when applied to image features on a grid graph, the situation reminds us of the Mumford-Shah functional [
MS89 ] and its approximationby a sequence of Γ -converging smooth elliptic problems [ AT90 ]. Likewise, one may regardthe concave second term of (4.18) together with the convex constraint S ∈ A g as a vector-valued counterpart of the basic nonnegative double-well potential of scalar phase-field modelsfor binary segmentation [ Ste91, CT18 ]. In these works, too, nonsmooth limit cases resultfrom Γ -converging simpler problems. • Adopting the viewpoint of evolutionary dynamics [
HS03] on label assignment, the assignment flow may be characterized as spatially coupled replicator dynamics. To the best of our knowledge, our paper [˚APSS17] seems to be the first one that used information theory to formulate this spatial coupling. Some consequences of the geometry were elaborated in the present paper and discussed above.
Spatially coupled replicator dynamics have also been studied in other fields, e.g. physics [TC04, dB13] and applied mathematics [
NPB11, BPN14 ], including extensions to scenarios with an infinite number of strategies(as opposed to selecting from a finite set of labels) – see [
AFMS18] and references therein. In this context, our work might stimulate researchers working on spatially extended evolutionary dynamics in various scientific disciplines. In particular, generalizing our approach to continuous-domain integro-differential models that conform to the assignment flow with non-local interactions (i.e. with larger neighborhoods $|\mathcal N_i|$, $i \in \mathcal I$) and the underlying geometry seems attractive.
• Last but not least, our work may support a better understanding of learning with networks. Our preliminary work on learning the weights (2.26) using the linearized assignment flow [
HSPS19] on a single graph ('layer') revealed the model expressiveness of this limited scenario, on the one hand, and that subdividing complex learning tasks in this way avoids 'black box' behaviour, on the other hand. We hope that the continuous-domain perspective developed in this paper in terms of sequences of linear PDEs will support our further understanding of learning with hierarchical 'deeper' architectures.
Acknowledgements
Financial support by the German Science Foundation (DFG), grant GRK 1653, is gratefully acknowledged. This work has also been stimulated by the Heidelberg Excellence Cluster STRUCTURES, funded by the DFG under Germany's Excellence Strategy EXC-2181/1 - 390900948.
References [ABM14] H. Attouch, G. Buttazzo, and G. Michaille,
Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, 2nd ed., SIAM, 2014.
[AFMS18] L. Ambrosio, M. Fornasier, M. Morandotti, and G. Savaré,
Spatially InhomogeneousEvolutionary Games , CoRR abs/1805.04027 (2018).[AN00] S.-I. Amari and H. Nagaoka,
Methods of Information Geometry , Amer. Math. Soc.and Oxford Univ. Press, 2000.[˚APSS17] F. ˚Astr¨om, S. Petra, B. Schmitzer, and C. Schn¨orr,
Image Labeling by Assignment ,Journal of Mathematical Imaging and Vision (2017), no. 2, 211–238.[ARP +
19] V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen,
On Instabilitiesof Deep Learning in Image Reconstruction - Does AI Come at a Cost? , CoRRabs/1902.05300 (2019).[AT90] L. Ambrosio and V. M. Tortorelli,
Approximation of Functional Depending on Jumpsby Elliptic Functional via Γ -Convergence , Comm. Pure Appl. Math. (1990),no. 8, 999–1036.[BPN14] A. S. Bratus, V. P. Posvyanskii, and A. S. Novozhilov, Replicator equations andSpace , Math. Modelling Natural Phenomena (2014), no. 3, 47–67.[BST14] J. Bolte, S. Sabach, and M. Teboulle, Proximal Alternating Linearized Minimizationfor Nonconvex and Nonsmooth Problems , Math. Progr., Ser. A (2014), no. 1-2,459–494.[CCP12] A. Chambolle, D. Cremers, and T. Pock,
A Convex Approach to Minimal Partitions ,SIAM J. Imag. Sci. (2012), no. 4, 1113–1158.[CDH16] P. V. Coveney, E. R. Dougherty, and R. R. Highfield, Big Data Need Big Theory too ,Phil. Trans. R. Soc. Lond. A (2016), 20160153. F. Savarino and C. Schn¨orr [CT18] R. Cristoferi and M. Thorpe,
Large Data Limit for a Phase Transition Model with the p-Laplacian on Point Clouds, Europ. J. Appl. Math. (2018), 1–47.
[dB13] R. deForest and A. Belmonte, Spatial Pattern Dynamics due to the Fitness Gradient Flux in Evolutionary Games, Physical Review E (2013), no. 6, 062138.
[E17] Weinan E, A Proposal on Machine Learning via Dynamical Systems, Comm. Math. Statistics (2017), no. 1, 1–11.
[EHL19] W. E, J. Han, and Q. Li, A Mean-Field Optimal Control Formulation of Deep Learning, Res. Math. Sci. (2019), no. 10, 41 pages.
[Ela17] M. Elad, Deep, Deep Trouble: Deep Learning's Impact on Image Processing, Mathematics, and Humanity, SIAM News (2017).
[HR17] E. Haber and L. Ruthotto,
Stable Architectures for Deep Neural Networks, Inverse Problems (2017), no. 1, 014004.
[HS03] J. Hofbauer and K. Sigmund, Evolutionary Game Dynamics, Bull. Amer. Math. Soc. (2003), no. 4, 479–519.
[HSPS19] R. Hühnerbein, F. Savarino, S. Petra, and C. Schnörr, Learning Adaptive Regularization for Image Labeling Using Geometric Assignment, Proc. SSVM, Springer, 2019.
[HZRS16] K. He, X. Zhang, S. Ren, and J. Sun,
Deep Residual Learning for Image Recognition ,Proc. CVPR, 2016.[Jos17] J. Jost,
Riemannian Geometry and Geometric Analysis , 7th ed., Springer-VerlagBerlin Heidelberg, 2017.[KMS00] R. Kimmel, R. Malladi, and N. Sochen,
Images as Embedded Maps and MinimalSurfaces: Movies, Color, Texture, and Volumetric Images , Int. J. Comp. Vision (2000), no. 2, 111–129.[LS11] J. Lellmann and C. Schn¨orr, Continuous Multiclass Labeling Approaches and Algo-rithms , SIAM J. Imag. Sci. (2011), no. 4, 1049–1096.[LT19] G.-H. Liu and E. A. Theodorou, Deep Learning Theory Review: An Optimal Controland Dynamical Systems Perspective , CoRR abs/1908.10920 (2019).[MS89] D. Mumford and J. Shah,
Optimal Approximations by Piecewise Smooth Functions and Associated Variational Problems, Comm. Pure Appl. Math. (1989), 577–685.
[NPB11] A. S. Novozhilov, V. P. Posvyanskii, and A. S. Bratus, On the Reaction–Diffusion Replicator Systems: Spatial Patterns and Asymptotic Behaviour, Russ. J. Numer. Anal. Math. Modelling (2011), no. 6, 555–564.
[San10] W. H. Sandholm, Population Games and Evolutionary Dynamics, MIT Press, 2010.
[Sch19] C. Schnörr,
Assignment Flows , Variational Methods for Nonlinear Geometric Dataand Applications (P. Grohs, M. Holler, and A. Weinmann, eds.), Springer (inpress), 2019.[Ste91] P. Sternberg,
Vector-Valued Local Minimizers of Nonconvex Variational Problems ,Rocky-Mountain J. Math. (1991), no. 2, 799–807.[TC04] A. Traulsen and J. C. Claussen, Similarity-Based Cooperation and Spatial Segrega-tion , Phys. Rev. E (2004), no. 4, 046128.[Zei85] E. Zeidler, Nonlinear Functional Analysis and its Applications , vol. 3, Springer, 1985.[Zie89] W. P. Ziemer,
Weakly Differentiable Functions , Springer, 1989.[ZSPS19] A. Zeilmann, F. Savarino, S. Petra, and C. Schn¨orr,
Geometric Numerical Integrationof the Assignment Flow , CoRR abs/1810.06970, Inverse Problems: in press (2019).[ZZPS19a] A. Zern, M. Zisler, S. Petra, and C. Schn¨orr,
Unsupervised Assignment Flow: LabelLearning on Feature Manifolds by Spatially Regularized Geometric Assignment ,CoRR abs/1904.10863 (2019).[ZZPS19b] M. Zisler, A. Zern, S. Petra, and C. Schn¨orr,