A non-linear monotonicity principle and applications to Schrödinger type problems

Abstract

A basic idea in optimal transport is that optimizers can be characterized through a geometric property of their support sets called cyclical monotonicity. In recent years, similar "monotonicity principles" have found applications in other fields where infinite dimensional linear optimization problems play an important role. In this note, we observe how this approach can be transferred to non-linear optimization problems. Specifically, we establish a monotonicity principle that is applicable to the Schrödinger problem and use it to characterize the structure of optimizers for target functionals beyond relative entropy. In contrast to classical convex duality approaches, a main novelty is that the monotonicity principle also allows one to deal with non-convex functionals.



JULIO BACKHOFF-VERAGUAS, MATHIAS BEIGLBÖCK, AND GIOVANNI CONFORTI

Keywords: cyclical monotonicity, monotonicity principle, Schrödinger problem, L divergence, non-linear optimization.

1. Introduction and main results

1.1. Motivation from optimal transport.

Given probabilities $\mu$ and $\nu$ on Polish spaces $X$ and $Y$, and a cost function $c : X \times Y \to \mathbb{R}_+$, the Monge-Kantorovich problem is to find a cost-minimizing transport plan. More precisely, writing $\mathrm{cpl}(\mu, \nu)$ for the set of all measures on $X \times Y$ with $X$-marginal $\mu$ and $Y$-marginal $\nu$, the problem is to find

$$\inf\Big\{ \int c \, dP : P \in \mathrm{cpl}(\mu, \nu) \Big\} \tag{OT}$$

and to identify an optimal transport plan $P^* \in \mathrm{cpl}(\mu, \nu)$.

The notion of $c$-cyclical monotonicity leads to a geometric characterization of optimal couplings. Its relevance for (OT) has been highlighted by Gangbo and McCann [24], following earlier work of Knott and Smith [31] and Rüschendorf [41], among others.

We give here a slightly non-standard definition that is not inherently tied to the transport problem and serves our exposition more directly.

A set $\Gamma \subseteq X \times Y$ is $c$-cyclically monotone if any measure $\alpha$ that is finite and supported on finitely many points in $\Gamma$ is a cost-minimizing transport between its marginals. That is, if $\alpha'$ has the same marginals as $\alpha$, then

$$\int c \, d\alpha \le \int c \, d\alpha'. \tag{1.1}$$

A transport plan $\gamma$ is called $c$-cyclically monotone if it is concentrated on such a set $\Gamma$, i.e. if there is such a $\Gamma$ with $\gamma(\Gamma) = 1$.

Footnote. See [44, Exercise 2.21] for the equivalence to the more familiar way of stating $c$-cyclical monotonicity of a set $\Gamma$: usually one requires that for any $(x_1, y_1), \ldots, (x_n, y_n) \in \Gamma$, with $y_{n+1} = y_1$, it holds that $\sum_{i=1}^{n} c(x_i, y_i) \le \sum_{i=1}^{n} c(x_i, y_{i+1})$.

(arXiv preprint, subject class math.OC.)
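For a finite set of pairs, the condition in the definition above can be checked by brute force. The following is a minimal sketch, not taken from the paper: the function name `is_c_cyclically_monotone`, the quadratic cost and the sample points are illustrative assumptions. It relies on the cycle formulation recalled in the footnote and on the fact that every permutation decomposes into cycles, so it suffices to compare the given pairing against all re-matchings.

```python
from itertools import permutations


def is_c_cyclically_monotone(points, cost, tol=1e-12):
    """Brute-force check for a finite list of pairs (x_i, y_i).

    The list is c-cyclically monotone iff no re-matching of the x_i
    with the y_j is strictly cheaper than the given pairing.
    (Hypothetical helper; exponential in the number of points.)
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    base = sum(cost(x, y) for x, y in points)
    for sigma in permutations(range(n)):
        rematched = sum(cost(xs[i], ys[sigma[i]]) for i in range(n))
        if rematched < base - tol:
            return False
    return True


def quadratic_cost(x, y):
    # Illustrative cost c(x, y) = (x - y)^2 on the real line.
    return (x - y) ** 2


monotone_pairs = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0)]  # increasing matching
crossing_pairs = [(0.0, 3.0), (1.0, 2.0), (2.0, 0.0)]  # decreasing matching

print(is_c_cyclically_monotone(monotone_pairs, quadratic_cost))
print(is_c_cyclically_monotone(crossing_pairs, quadratic_cost))
```

The check is only meant to make the "finite number of points" viewpoint concrete; it is not an efficient algorithm.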

The equivalence of optimality and $c$-cyclical monotonicity has been established under progressively milder regularity assumptions. Based on [1, 40, 43, 7, 14], the following 'Monotonicity Principle' holds true:

Theorem 1.1.

Let $c : X \times Y \to [0, \infty)$ be measurable and assume that $P \in \mathrm{cpl}(\mu, \nu)$ is a transport plan with finite cost $\int c \, dP \in \mathbb{R}_+$. Then $P$ is optimal if and only if $P$ is $c$-cyclically monotone.

The importance of this result stems from the observation that it is often an elementary and feasible task to see whether a transport behaves optimally on a finite number of points. But this would a priori be of no help for a problem where single points do not carry positive mass. Theorem 1.1 provides the required remedy to this obstacle, as it establishes the connection to optimality on a "pointwise" level.

1.2. Recent developments and aims of this article.

More recently, variants of this 'monotonicity principle' have been applied in transport problems for finitely or infinitely many marginals [38, 19, 27, 8, 45], the martingale version of the optimal transport problem [9, 36, 11], stochastic portfolio theory [37], the Skorokhod embedding problem [5, 28], the distribution constrained optimal stopping problem [6, 10] and the weak transport problem [26, 3, 4].

What all these articles have in common is that the original idea is applied to other infinite dimensional linear optimization problems. In the present note, we advertise the idea that this optimality principle can be useful beyond linear problems, and in fact for problems that are not susceptible to a convex duality approach. Given the versatile applicability of the idea in various linear optimization problems, the extension to non-linear problems appears highly promising.

In Section 1.3 we present the principal idea of what kind of structure such a monotonicity principle might take in applications to non-linear optimization problems. While the heuristic derivation in Section 1.3 is based on a purely formal linearization procedure, we rigorously establish this result in Section 1.4 for a large subclass of non-linear problems. We then further specify this rigorous monotonicity principle in the setup of a general and not necessarily convex version of the Schrödinger problem: in Theorem 1.4 we show how this non-linear monotonicity principle can be used to obtain necessary optimality conditions, which are shown to be also sufficient for convex problems such as the classical Schrödinger problem, see Theorem 1.6. Furthermore, we derive novel variants of these conditions for more general entropy functionals in Theorem 1.5.

To illustrate the potential of our approach, we apply our results to obtain a shape theorem for the optimal solutions of a non-convex Schrödinger problem with congestion. Furthermore, we briefly discuss how a natural generalization of our findings, which we plan to address in future work, would considerably advance the understanding of the recently introduced mean field Schrödinger problem [2].

1.3. A 'formal' non-linear monotonicity principle.

In this section we introduce some notation and then state a non-linear monotonicity principle which is 'formal' in the sense that we do not give a rigorous proof or precise conditions under which it is expected to hold. In the next section we will then provide a rigorous version which is applicable to the Schrödinger problem and similar energy minimization problems.

Let $\Omega$ be a Polish space with $\mathcal{B}$ its Borel sigma-algebra. Consider a family $\mathcal{F}$ of real-valued functions on $\Omega$. We suppose either of the following:

(1) $\mathcal{F}$ is a subset of $C_b(\Omega)$, the space of continuous bounded functions.
(2) $\mathcal{F}$ is a countable sub-family of $B_b(\Omega)$, the space of Borel bounded functions.

We are given a functional $G : \mathcal{P}(\Omega) \to [0, +\infty]$, and we are interested in the following problem:

$$\inf\{ G(Q) : Q \in \mathcal{P}(\Omega),\ Q \in \mathrm{Adm} \}, \tag{P}$$

where

$$\mathrm{Adm} := \mathrm{Adm}_{\mathcal{F}} := \Big\{ \int f \, dQ = 0, \ \forall f \in \mathcal{F} \Big\}.$$

The standing assumption on $G$ is that there exist directional derivatives with representation via functions, i.e. for any $Q$ in the domain $D(G) = \{ Q \in \mathcal{P} : G(Q) < \infty \}$ there exists a measurable $\delta G_Q : \Omega \to (-\infty, \infty]$ such that

$$\forall \bar{Q} \in D(G), \quad \lim_{\varepsilon \searrow 0} \frac{G(Q + \varepsilon[\bar{Q} - Q]) - G(Q)}{\varepsilon} = \int_\Omega \delta G_Q(\omega) \, [\bar{Q} - Q](d\omega),$$

where one implicitly assumes the limit to exist for all $Q, \bar{Q} \in D(G)$.

Positive finite measures $\alpha, \alpha'$ with equal mass and finite support are called competitors if

$$\int f \, d(\alpha - \alpha') = 0, \quad \forall f \in \mathcal{F}.$$

We then expect the following:

Formal Statement 1.2 (Non-Linear Monotonicity Principle, formal version). Suppose $Q^* \in \mathrm{Adm} \cap D(G)$ is an optimizer for Problem (P). Then

(1) $Q^*$ attains the linearized problem

$$\inf\Big\{ \int_\Omega c(\omega) \, Q(d\omega) : Q \in \mathrm{Adm} \cap D(G) \Big\}, \quad \text{where } c(\omega) := \delta G_{Q^*}(\omega).$$

(2) There exists a Borel set $\Gamma_{Q^*} \subseteq \Omega$ such that $Q^*(\Gamma_{Q^*}) = 1$, having the following property: given competitors $\alpha, \bar{\alpha}$ with $\mathrm{supp}\, \alpha \subseteq \Gamma_{Q^*}$, we have

$$\int \delta G_{Q^*} \, d\alpha \le \int \delta G_{Q^*} \, d\bar{\alpha}.$$

Formal derivation.

By optimality of $Q^*$ and the fact that $\mathrm{Adm}$ is convex, we easily obtain

$$\lim_{\varepsilon \searrow 0} \frac{G(Q^* + \varepsilon[\bar{Q} - Q^*]) - G(Q^*)}{\varepsilon} = \int_\Omega \delta G_{Q^*}(\omega) \, [\bar{Q} - Q^*](d\omega) \ge 0$$

for all $\bar{Q} \in \mathrm{Adm} \cap D(G)$, showing that $Q^*$ attains the linearized problem in (1). The monotonicity principle in [8, Theorem 1.4] applies, and we find exactly the desired condition in (2). □

1.4. A Rigorous Non-Linear Monotonicity Principle.

We consider throughout a continuous function $h : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying at least:

$$h \text{ is differentiable on } (0, \infty) \text{ and the limit } h'(0) := \lim_{x \searrow 0} h'(x) \text{ exists.} \tag{H}$$

Throughout we fix $P \in \mathcal{P}(\Omega)$ and consider

$$G(Q) := G_h(Q) := \begin{cases} \int_\Omega h\Big( \frac{dQ}{dP}(\omega) \Big) P(d\omega), & \text{if } Q \ll P, \\ +\infty, & \text{otherwise,} \end{cases}$$

and the associated minimization problem

$$\inf\{ G_h(Q) : Q \in \mathcal{P}(\Omega),\ Q \in \mathrm{Adm} \}. \tag{P$_h$}$$
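On a finite space the functional $G_h$ and its directional derivative can be evaluated directly, which gives a quick sanity check of the representation $\delta G_Q = h'(dQ/dP)$ used in Section 2. The sketch below is illustrative only: the three-point space, the vectors `p`, `q`, `q_bar` and the helper names are assumptions, and $h(x) = x \log x$ is chosen as the classical example.

```python
import math


def G_h(q, p, h):
    """G_h(Q) = sum_i h(q_i / p_i) * p_i on a finite space with all p_i > 0."""
    return sum(h(qi / pi) * pi for qi, pi in zip(q, p))


def first_variation(q, q_bar, p, h_prime):
    """Integral of h'(dQ/dP) against the signed measure (Q_bar - Q)."""
    return sum(h_prime(qi / pi) * (bi - qi) for qi, bi, pi in zip(q, q_bar, p))


def h(x):
    # Classical entropy integrand h(x) = x log x, with h(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def h_prime(x):
    return 1.0 + math.log(x)


p = [0.2, 0.3, 0.5]      # reference measure P (hypothetical values)
q = [0.3, 0.3, 0.4]      # Q, absolutely continuous w.r.t. P
q_bar = [0.1, 0.5, 0.4]  # direction Q_bar

eps = 1e-6
q_eps = [qi + eps * (bi - qi) for qi, bi in zip(q, q_bar)]
finite_difference = (G_h(q_eps, p, h) - G_h(q, p, h)) / eps
exact = first_variation(q, q_bar, p, h_prime)

print(finite_difference, exact)
```

The finite-difference quotient and the integral of $h'(dQ/dP)$ against $\bar{Q} - Q$ should agree up to an error of order `eps`.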

Lemma 1.3 (Non-Linear Monotonicity Principle). Suppose that $h$ is twice differentiable on $\mathbb{R}_+$ with $h'' \ge C$ everywhere for some $C \in \mathbb{R}$ and that $\lim_{x \to +\infty} h'(x) = +\infty$. Furthermore, assume that either $h'$ is lower bounded or $\lim_{x \downarrow 0} h'(x) = -\infty$, and let $Q^*$ be an optimizer of Problem (P$_h$). Then there exist sets $\Gamma_{Q^*}, \Gamma_P$ such that $P(\Gamma_P) = Q^*(\Gamma_{Q^*}) = 1$ and for all competitors $\alpha, \alpha'$ with $\mathrm{supp}(\alpha) \subseteq \Gamma_{Q^*}$, $\mathrm{supp}(\alpha') \subseteq \Gamma_P$ we have

$$\int h'\Big( \frac{dQ^*}{dP} \Big) d\alpha \le \int h'\Big( \frac{dQ^*}{dP} \Big) d\alpha'. \tag{1.2}$$

1.5. Schrödinger-type Problems.

We specify the setting of Section 1.4. In this part we are interested in the case $\Omega := X \times Y$, for $X, Y$ Polish spaces. As for the constraint set $\mathcal{F}$, we are interested in

$$\mathcal{F}_{\mu,\nu} := \mathcal{F}_\mu \cup \mathcal{F}_\nu, \quad \mathcal{F}_\mu := \Big\{ f - \int_X f \, d\mu : f \in C_b(X) \Big\}, \quad \mathcal{F}_\nu := \Big\{ g - \int_Y g \, d\nu : g \in C_b(Y) \Big\},$$

for given probability measures $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$ satisfying $\mu \ll \mathrm{proj}_X(P)$ and $\nu \ll \mathrm{proj}_Y(P)$. With these specifications, our minimization problem (Problem (P$_h$)) becomes

$$\inf\Big\{ \int_{X \times Y} h\Big( \frac{dQ}{dP}(x, y) \Big) P(dx, dy) : Q \in \mathrm{cpl}(\mu, \nu) \Big\}. \tag{1.3}$$

Notice that for the choice $h(x) = x \log(x)$, Problem (P$_h$) becomes the classical Schrödinger problem.

We now rigorously derive necessary optimality conditions for Problem (1.3). The functions $\varphi$ and $\psi$ appearing in Theorem 1.4 can formally be seen as Lagrange multipliers, and in the case $h(x) = x \log(x)$ they are known as Schrödinger potentials, see [34, Sec. 2].

Theorem 1.4.

Assume that Problem (1.3) is finite and $Q^*$ is an optimizer thereof. Importantly, we also assume that $P \sim \mu \otimes \nu$. Let $h : [0, \infty) \to (-\infty, \infty)$ be twice continuously differentiable, with $h'(0) := \lim_{x \to 0} h'(x) = -\infty$, $\lim_{x \to +\infty} h'(x) = +\infty$ and $\inf_{\mathbb{R}_+} h'' > -\infty$. Then $Q^* \sim P$ and there exist measurable functions $\varphi : X \to [-\infty, +\infty)$ and $\psi : Y \to [-\infty, +\infty)$ such that

$$h' \circ \frac{dQ^*}{dP}(x, y) = \varphi(x) + \psi(y), \quad P\text{-a.s.} \tag{1.4}$$

It is worth remarking that the above theorem applies to $h(x) = x \log(x)$ (where $h'(x) = 1 + \log(x)$) but not to $h(x) = x^2$ (where $h'(x) = 2x$). This latter case (and similar ones) is covered by the following complementary theorem:

Theorem 1.5.

Assume that Problem (1.3) is finite and $Q^*$ is an optimizer thereof. Assume that $P \sim \mu \otimes \nu$. Let $h : [0, \infty) \to (-\infty, \infty)$ be strictly increasing, continuously differentiable, with $\lim_{x \to 0} h'(x) = 0$, $\lim_{x \to +\infty} h'(x) = +\infty$ and $\inf_{\mathbb{R}_+} h'' > -\infty$. Then there exist measurable functions $\varphi : X \to [-\infty, +\infty)$ and $\psi : Y \to [-\infty, +\infty)$ such that

$$h' \circ \frac{dQ^*}{dP}(x, y) = (\varphi(x) + \psi(y))_+, \quad P\text{-a.s.} \tag{1.5}$$

We remark that uniqueness of an optimizer to Problem (1.3) is guaranteed if $h$ is strictly convex. On the other hand, Conditions (1.4)-(1.5) do not characterize optimizers even when these are unique (e.g. when $h'$ is not one-to-one).

Footnote (on the classical Schrödinger problem). See Léonard's survey [34] on classical results around the Schrödinger problem and its probabilistic meaning. Recently this problem has seen a surge in interest owing to the overture to machine learning by Cuturi [20].
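In the classical case $h(x) = x \log(x)$ on finite spaces, the additive structure (1.4) can be observed numerically. The sketch below uses the standard Sinkhorn (iterative proportional fitting) scaling, which is not part of this paper; the matrix `P`, the marginals and the function name `sinkhorn` are illustrative assumptions. The scaled plan has density $dQ/dP = a_i b_j$, so $h'(dQ/dP) = 1 + \log a_i + \log b_j$ splits into a function of $x$ plus a function of $y$.

```python
import math


def sinkhorn(P, mu, nu, iterations=500):
    """Scale a positive matrix P to Q_ij = a_i * P_ij * b_j with marginals mu, nu."""
    n, m = len(mu), len(nu)
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        # Alternately match the row marginals and the column marginals.
        a = [mu[i] / sum(P[i][j] * b[j] for j in range(m)) for i in range(n)]
        b = [nu[j] / sum(P[i][j] * a[i] for i in range(n)) for j in range(m)]
    Q = [[a[i] * P[i][j] * b[j] for j in range(m)] for i in range(n)]
    return Q, a, b


# Hypothetical reference measure with all entries positive, so P ~ mu (x) nu.
P = [[0.10, 0.20, 0.10],
     [0.25, 0.05, 0.30]]
mu = [0.5, 0.5]
nu = [0.2, 0.3, 0.5]

Q, a, b = sinkhorn(P, mu, nu)
phi = [1.0 + math.log(ai) for ai in a]  # candidate potential on X
psi = [math.log(bj) for bj in b]        # candidate potential on Y

# Largest violation of h'(dQ/dP) = phi(x) + psi(y), with h'(t) = 1 + log t.
shape_gap = max(
    abs(1.0 + math.log(Q[i][j] / P[i][j]) - (phi[i] + psi[j]))
    for i in range(2) for j in range(3)
)
row_sums = [sum(row) for row in Q]
col_sums = [sum(Q[i][j] for i in range(2)) for j in range(3)]

print(shape_gap, row_sums, col_sums)
```

The potentials are only determined up to moving a constant between `phi` and `psi`; the split chosen here is one convenient normalization.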


Comparison with the existing literature.

Minimization problems of the form (1.3) have been studied for a long time, the most notable example being the Schrödinger problem. Indeed, analogues of Theorem 1.4 for the case where $h(x) = x \log x$ have been obtained in seminal works of Fortet and Beurling [23, 13]. In more recent works, Borwein and Lewis [15] and Borwein, Lewis and Nussbaum [16] proposed an approach to entropy minimization that combines fixed-point arguments and convex optimization techniques. We refer to Gigli and Tamanini's article [25] for adaptations of these results to the setting of RCD spaces. Convex duality is also at the heart of the proof strategy of Pennanen and Perkkiö [39]. A different viewpoint is adopted by Rüschendorf and Thomsen [42]: therein the shape of the optimal measure is found as a consequence of the closedness property of sum spaces of integrable functions. We also refer to Carlier and Laborde [17] for multidimensional generalizations. A large part of the above-mentioned results is surveyed by Léonard in [34]. This author has also proven shape theorems for the Schrödinger problem analogous to Theorem 1.4 in [32, 33]. Cattiaux and Gamboa [18] treat the more general case when $h$ is the log-Laplace transform of a probability measure: this condition implies that $h$ is convex. However, it is not assumed there (unlike what we do here) that $P \sim \mu \otimes \nu$; only $P \ll \mu \otimes \nu$ is needed. Their proofs rely essentially on ideas and tools coming from large deviations and on the earlier findings of [42]. To the best of our knowledge, the case when $h$ is not convex has not been treated before the present article. As for Lemma 1.3, a more explicit version in the particular case of the classical Schrödinger problem has been obtained in parallel by Bernton, Ghosal and Nutz in [12], where it is furthermore leveraged to obtain stability and large deviations estimates.

We now study the converse direction: how the structure of a measure implies optimality. Here we do need to assume convexity.

Theorem 1.6.

Let $h : [0, \infty) \to (-\infty, \infty)$ be strictly convex, lower-bounded, and continuously differentiable, with $\lim_{x \to 0} h'(x) = 0$, $\lim_{x \to +\infty} h'(x) = +\infty$, and $h(2x) \le a\,h(x) + bx + c$ for constants $a, b, c$. Suppose that $Q^* \in \mathrm{cpl}(\mu, \nu)$ is absolutely continuous with respect to $P$, with

$$h' \circ \frac{dQ^*}{dP}(x, y) = (\varphi(x) + \psi(y))_+, \quad P\text{-a.s.} \tag{1.6}$$

for measurable $\varphi : X \to [-\infty, +\infty)$ and $\psi : Y \to [-\infty, +\infty)$. Then $Q^*$ is optimal for (1.3).

With the same techniques used to prove Theorem 1.6, variants of this result can be established if $h'(0) \in [-\infty, \infty)$, covering in particular the Schrödinger problem. As for Theorem 1.5, it is plain that it can be adapted to cover the case $h'(0) \in (-\infty, \infty)$.

A Toy Example: Schrödinger Problem with Congestion.

Let us say that $x \in X$ and $y \in Y$ denote, respectively, origins and destinations for car users in a city. Hence an origin-destination pair $(x, y)$ can stand for the route that a car has to travel from $x$ to $y$. Experts have determined that $P \in \mathcal{P}(X \times Y)$ is the optimal use of the road network (here $P(dx, dy)$ is the infinitesimal proportion of cars taking route $(x, y)$) in the stationary case. However, the actual proportions of car trip origins and car trip destinations are described by $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$ respectively, rather than $\mathrm{proj}_X(P)$ and $\mathrm{proj}_Y(P)$. In the vanilla version of the Schrödinger problem we aim to determine a minimizer $Q^*$ of the relative entropy $\int \frac{dQ}{dP} \log\big( \frac{dQ}{dP} \big) \, dP$ over $Q \in \mathrm{cpl}(\mu, \nu)$, $Q \ll P$, amounting to the distribution of car trips compatible with the experts' guess $P$ and the marginal information $\mu$ and $\nu$. However, we may also want to consider congestion effects, codified by an added term $f\big( \frac{dQ}{dP} \big)$ with $f(\cdot)$ increasing, the idea being that adding traffic above the experts' recommendation should be more costly than the opposite. This way we arrive at the non-convex Schrödinger-type problem of minimizing

$$\int \Big[ \frac{dQ}{dP} \log\Big( \frac{dQ}{dP} \Big) + f\Big( \frac{dQ}{dP} \Big) \Big] dP$$

under the same constraints. The optimality condition in Theorem 1.4 now reads

$$(\log + f')\Big( \frac{dQ^*}{dP} \Big) = \varphi(x) + \psi(y),$$

from which $Q^*$ can even be determined depending on the choice of $f$.

Some perspectives on the mean field Schrödinger problem.

In the recent article [2] a mean field version of the Schrödinger problem has been introduced. A simplified discrete-time version of it consists in finding the most likely evolution conditionally on observations at initial and terminal times of the particle system $(X^i_t)_{i = 1, \ldots, N;\ t = 0, 1, 2}$, where $(X^1_0, \ldots, X^N_0)$ are i.i.d. samples from a probability measure $\mu$ on $\mathbb{R}^d$ and

$$X^i_{t+1} - X^i_t = - \sum_{j \le N} \nabla W(X^i_t - X^j_t) + \xi^i_t, \quad i = 1, \ldots, N, \ t = 0, 1. \tag{1.7}$$

Here the random variables $(\xi^i_t)_{i = 1, \ldots, N;\ t = 0, 1}$ are i.i.d. standard Gaussians. The large deviations rate function for the empirical distribution of the particle system (1.7) in the regime $N \to +\infty$ is known explicitly (see [21] for a general result in continuous time and [22] for the analysis of the toy model (1.7)) and leads to the following problem formulation:

$$\inf\Big\{ \int h\Big( \frac{dQ}{dR(Q)}(x_0, x_1, x_2) \Big) R(Q)(dx_0, dx_1, dx_2) : Q \in \mathrm{cpl}(\mu, \nu) \Big\}. \tag{1.8}$$

In the above we denoted $h(x) = x \log x$ and, adapting the convention used throughout this paper, we denoted by $\mathrm{cpl}(\mu, \nu)$ the subset of $\mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d)$ whose first marginal ($t = 0$) is $\mu$ and whose last marginal ($t = 2$) is $\nu$. Finally, for a given $Q$, $R(Q) \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^d)$ is defined as the law of the controlled discrete stochastic differential equation

$$Z_{t+1} = Z_t - \int \nabla W(Z_t - x_t) \, Q(dx_0, dx_1, dx_2) + \xi_t, \quad t = 0, 1, \qquad Z_0 \sim \mu, \tag{1.9}$$

where $(\xi_0, \xi_1, \xi_2)$ are i.i.d. standard Gaussians. Despite several analogies with (1.3), including the fact that the function $R(\cdot)$ naturally introduces non-convexity into the problem, the analysis of (1.8) is outside the reach of this work, essentially because the "reference" measure $R(Q)$ depends on $Q$. However, the heuristics put forward in the introduction, based on the linearization procedure, still apply and lead to natural conjectures on the kind of monotonicity principle and shape theorem for optimizers to be expected in this situation. For this reason, the present work is a first step in the direction of developing and exploiting ever more powerful monotonicity principles. One of the main motivations for validating such conjectures for Problem (1.8) resides in the fact that a shape theorem for the mean field Schrödinger problem yields existence of solutions for the coupled Fokker-Planck/Hamilton-Jacobi-Bellman system describing the dynamics of mean field Schrödinger bridges. We refer the interested reader to [2, Sec. 1.3] for the precise form of this PDE system as well as for more explanations.

2. Proofs

Proof of the Non-linear Monotonicity Principle: Lemma 1.3.

The proof requires two preliminary results. The first states essentially that, if $G = G_h$, directional derivatives can be computed with

$$\delta G_Q(\omega) = h'\Big( \frac{dQ}{dP} \Big)(\omega).$$

More precisely, we will need this in the following form:

Lemma 2.1.

Let $h$ satisfy the hypotheses of Lemma 1.3. Consider now a probability measure $Q$ and positive measures $\theta, \theta'$ satisfying

(i) $\theta(\Omega) = \theta'(\Omega)$.
(ii) $\int h\big( \frac{dQ}{dP} \big) \, dP$ exists and is finite.
(iii) $\theta \le Q$, $\theta' \le P$.
(iv) There is a constant $l \in \mathbb{R}$ such that $-l \le h'\big( \frac{dQ}{dP} \big) \le l$ holds $(\theta + \theta')$-a.s.
(v) $\int h'\big( \frac{dQ}{dP} \big) \, d(\theta' - \theta) < 0$.

Setting $Q_\varepsilon := Q + \varepsilon(\theta' - \theta)$ we then find that, for all $\varepsilon > 0$ small enough, $\int h\big( \frac{dQ_\varepsilon}{dP} \big) \, dP$ exists and

$$\int h\Big( \frac{dQ_\varepsilon}{dP} \Big) dP < \int h\Big( \frac{dQ}{dP} \Big) dP.$$

Proof.

If $0 \le \varepsilon \le 1$, then $Q_\varepsilon$ is a probability measure. By hypothesis,

$$h\Big( \frac{dQ_\varepsilon}{dP} \Big) \ge h\Big( \frac{dQ}{dP} \Big) + \varepsilon \, h'\Big( \frac{dQ}{dP} \Big) \frac{d(\theta' - \theta)}{dP} - \varepsilon^2 C \Big( \frac{d(\theta' - \theta)}{dP} \Big)^2. \tag{2.1}$$

Combining (iii) and (iv) we get

$$\sup_{X \times Y} \frac{d(\theta' - \theta)}{dP} \le \sup_{\mathrm{supp}(\theta) \cup \mathrm{supp}(\theta')} \Big( \frac{d\theta'}{dP} + \frac{d\theta}{dP} \Big) \le 1 + \sup_{\mathrm{supp}(\theta)} \frac{dQ}{dP} \le 1 + \sup \, (h')^{-1}([0, l]) < +\infty,$$

where to obtain the last inequality we used that $\lim_{x \to +\infty} h'(x) = +\infty$. Using this result in (2.1) shows that $\int h\big( \frac{dQ_\varepsilon}{dP} \big) \, dP$ exists and belongs to $(-\infty, +\infty]$. Similarly,

$$h\Big( \frac{dQ}{dP} \Big) - h\Big( \frac{dQ_\varepsilon}{dP} \Big) \ge \varepsilon \, h'\Big( \frac{dQ_\varepsilon}{dP} \Big) \frac{d(\theta - \theta')}{dP} - \varepsilon^2 C \Big( \frac{d(\theta' - \theta)}{dP} \Big)^2. \tag{2.2}$$

Next, we observe that if we can prove that for $\gamma = \theta, \theta'$ we have

$$\int h'\Big( \frac{dQ_\varepsilon}{dP} \Big) d\gamma \to \int h'\Big( \frac{dQ}{dP} \Big) d\gamma,$$

then we obtain the conclusion by dividing by $\varepsilon$ on both sides in (2.2), integrating in $dP$ and letting $\varepsilon \to 0$. We only argue in the case when $\lim_{x \downarrow 0} h'(x) = -\infty$, the other case being simpler. In this case, condition (iv) implies that $\gamma$-a.s. $dQ/dP$ takes values in a compact subset of $(0, +\infty)$. Using this last observation and (iii) we deduce that $\gamma$-a.s. $dQ_\varepsilon/dP$, viewed as a function of $x$, $y$ and $\varepsilon$, takes its values in a compact subset of $(0, +\infty)$ provided $\varepsilon$ is small enough. The desired conclusion follows by dominated convergence. □

The second ingredient towards the proof of Lemma 1.3 is the following result from [7], which is a consequence of a duality result by Kellerer [30].


Lemma 2.2 ([7, Proposition 2.1]). Let $(E_i, m_i)$, $i \le k$, be Polish probability spaces, and $M$ an analytic subset of $E_1 \times \ldots \times E_k$. Then one of the following holds true:

(i) there exist $m_i$-null sets $M_i \subseteq E_i$ such that $M \subseteq \bigcup_{i=1}^{k} p_i^{-1}(M_i)$, or
(ii) there is a measure $\eta$ on $E_1 \times \ldots \times E_k$ such that $\eta(M) > 0$ and $p_i(\eta) \le m_i$ for $i = 1, \ldots, k$.

All in all, we can prove Lemma 1.3 now:

Proof of Lemma 1.3.

Set $d := \frac{dQ^*}{dP}$ and $c := h' \circ d$.

We want to find finitely minimal sets $\Gamma_{Q^*}, \Gamma_P$ supporting $Q^*, P$. To obtain this, it is sufficient to show that for each $l \in \mathbb{N}$ there are sets $\Gamma_{Q^*}, \Gamma_P$ of full $Q^*$/$P$ measure such that: for any finite measure $\alpha$ concentrated on at most $l$ points in $\Gamma_{Q^*}$ and satisfying $\alpha(\Omega) \le 1$, $c{\restriction}_{\mathrm{supp}\,\alpha} \le l$, there is no $c$-better competitor $\alpha'$ on at most $l$ points in $\Gamma_P$ and satisfying $c{\restriction}_{\mathrm{supp}\,\alpha'} \le l$. If we achieve this, we can just take the intersection over countably many such $\Gamma_{Q^*}, \Gamma_P$.

Hence, fix $l$ and define the subset $M$ of $\Omega^l \times \Omega^l$ through

$$M = \Big\{ ((z_1, \ldots, z_l), (z'_1, \ldots, z'_l)) \in \Omega^l \times \Omega^l : \exists \text{ a measure } \alpha \text{ on } \Omega,\ \alpha(\Omega) \le 1,\ \mathrm{supp}\,\alpha \subseteq \{z_1, \ldots, z_l\},\ -l \le c{\restriction}_{\mathrm{supp}\,\alpha} \le l, \text{ s.t. there is a } c\text{-better competitor } \alpha',\ \mathrm{supp}\,\alpha' \subseteq \{z'_1, \ldots, z'_l\},\ -l \le c{\restriction}_{\mathrm{supp}\,\alpha'} \le l \Big\}.$$

Note that $M$ is a projection of the set

$$\hat{M} = \Big\{ (z_1, \ldots, z_l, \alpha_1, \ldots, \alpha_l, z'_1, \ldots, z'_l, \alpha'_1, \ldots, \alpha'_l) \in \Omega^l \times \mathbb{R}^l_+ \times \Omega^l \times \mathbb{R}^l_+ : \sum \alpha_i \le 1,\ \sum \alpha_i = \sum \alpha'_i,\ -l \le c{\restriction}_{\mathrm{supp}\,\alpha \cup \mathrm{supp}\,\alpha'} \le l,\ \sum \alpha_i f(z_i) = \sum \alpha'_i f(z'_i) \text{ for all } f \in \mathcal{F},\ \sum \alpha_i c(z_i) > \sum \alpha'_i c(z'_i) \Big\}.$$

The set $\hat{M}$ is Borel; this is immediate if $\mathcal{F}$ is countable, and otherwise follows from the well-known argument that $\mathcal{F} \subseteq C_b(\Omega)$ contains a separating sequence. Hence $M$ is an analytic set.

We apply Lemma 2.2 to the $l$ copies of the spaces $(\Omega, Q^*)$, $(\Omega, P)$ and the set $M$ to obtain the following: if (i) holds, there exist sets $N_1, N_2$ with $Q^*(N_1) = P(N_2) = 0$ and $M \subseteq N_1^l \times \Omega^l \cup \Omega^l \times N_2^l$. We set $\Gamma_{Q^*} := \Omega \setminus N_1$, $\Gamma_P := \Omega \setminus N_2$, which have full $Q^*$/$P$ measure respectively. From the definition of $M$ it can be directly seen that $\Gamma_{Q^*}, \Gamma_P$ are as needed.

If (i) does not hold, (ii) has to. Hence, let us derive a contradiction from it.

For $j \le 2$, $i \le l$, write $p^j_i$ for the projection of an element of $\Omega^l \times \Omega^l$ onto its $((j-1) \times l + i)$-th component. We may assume that the measure $\eta$ given by Point (ii) in Lemma 2.2 is concentrated on $M$, and also fulfills $p^1_i(\eta) \le \frac{1}{l} Q^*$, $p^2_i(\eta) \le \frac{1}{l} P$ for $i = 1, \ldots, l$.

We now apply Jankov-von Neumann uniformization [29, Theorem 18.1] to the set $\hat{M}$ to define a mapping

$$M \to \hat{M} \tag{2.3}$$

$$(z_1, \ldots, z_l, z'_1, \ldots, z'_l) \mapsto \big( z_1, \ldots, z_l, \alpha_1(z, z'), \ldots, \alpha_l(z, z'), z'_1, \ldots, z'_l, \alpha'_1(z, z'), \ldots, \alpha'_l(z, z') \big) \tag{2.4}$$

which is measurable with respect to the $\sigma$-algebra generated by the analytic subsets of $\Omega^l \times \Omega^l$ in the domain and the Borel $\sigma$-algebra of $\Omega^l \times \mathbb{R}^l_+ \times \Omega^l \times \mathbb{R}^l_+$ in the range. In the above, $z_i$ resp. $z'_i$ denotes the $i$-th coordinate of $z \in \Omega^l$ resp. $z' \in \Omega^l$. Setting

$$\alpha_{(z, z')} := \sum_{i=1}^{l} \alpha_i(z, z') \, \delta_{z_i}, \qquad \alpha'_{(z, z')} := \sum_{i=1}^{l} \alpha'_i(z, z') \, \delta_{z'_i},$$

we thus obtain kernels $(z, z') \mapsto \alpha_{(z, z')}$, $(z, z') \mapsto \alpha'_{(z, z')}$ from $\Omega^l \times \Omega^l$ with the $\sigma$-algebra generated by its analytic subsets to $\mathcal{P}(\Omega)$ with its Borel sets. We use these kernels to define measures $\theta, \theta'$ on the Borel sets of $\Omega$ through

$$\theta(B) = \int \alpha_{(z, z')}(B) \, d\eta(z, z'), \qquad \theta'(B) = \int \alpha'_{(z, z')}(B) \, d\eta(z, z'). \tag{2.5}$$

By construction $\theta \le Q^*$. Indeed, we have

$$\theta(B) \le \sum_{i=1}^{l} \int \delta_{z_i}(B) \, d\eta(z, z') = \sum_{i=1}^{l} p^1_i(\eta)(B) \le Q^*(B).$$

Arguing similarly we obtain $\theta' \le P$. Moreover, $\theta'$ is a $c$-better competitor of $\theta$. To see this, we first observe that for each $f \in \mathcal{F}$ we have

$$\int_\Omega f(\bar{z}) \, d\theta'(\bar{z}) = \iint f(\bar{z}) \, d\alpha'_{(z, z')}(\bar{z}) \, d\eta(z, z') = \iint f(\bar{z}) \, d\alpha_{(z, z')}(\bar{z}) \, d\eta(z, z') = \int_\Omega f(\bar{z}) \, d\theta(\bar{z}), \tag{2.6}$$

and similarly, since $|c| \le l$ holds $(\theta + \theta')$-a.s., we obtain

$$\int_\Omega c(\bar{z}) \, d\theta'(\bar{z}) = \iint c(\bar{z}) \, d\alpha'_{(z, z')}(\bar{z}) \, d\eta(z, z') < \iint c(\bar{z}) \, d\alpha_{(z, z')}(\bar{z}) \, d\eta(z, z') = \int_\Omega c(\bar{z}) \, d\theta(\bar{z}).$$

Therefore, since $\int c \, d(\theta' - \theta) < 0$, Lemma 2.1 shows that, setting $Q^*_\varepsilon = Q^* + \varepsilon(\theta' - \theta)$, we have

$$\int h\Big( \frac{dQ^*_\varepsilon}{dP} \Big) dP < \int h\Big( \frac{dQ^*}{dP} \Big) dP$$

for $\varepsilon$ small enough. Since (2.6) makes sure that $Q^*_\varepsilon \in \mathrm{Adm}$, we have derived a contradiction to the optimality of $Q^*$. □

Footnotes.
On Lemma 2.2: [7, Proposition 2.1] is stated only for Borel sets, but the same proof applies in the case where $M$ is analytic. We denote here by $p_i$ the projection onto the $i$-th coordinate.
On the application of Lemma 2.2: to be precise, we take $E_i = \Omega$, $i = 1, \ldots, 2l$, with $m_i = Q^*$ if $i \le l$ and $m_i = P$ otherwise.
On the inclusion for $M$: noticing that the set $M$ must be symmetric in its first $l$ coordinates, and also in the remaining $l$ ones, we get that if a point is in $M$, then at least one of its first $l$ coordinates is in a given $Q^*$-null set $N_1$, or one of the remaining $l$ coordinates is in a given $P$-null set $N_2$.

Proof of Necessity: Theorems 1.4 and 1.5.

Proof of Theorem 1.4.

Let $\Gamma_{Q^*}, \Gamma_P$ be as in Lemma 1.3. Passing to subsets if necessary we may assume that $\mathrm{proj}_X \Gamma_{Q^*} = X$, $\mathrm{proj}_Y \Gamma_{Q^*} = Y$, $\Gamma_{Q^*} \subseteq \Gamma_P$. Apparently $Q^* \ll P$. Hence, shrinking $\Gamma_{Q^*}$ by an irrelevant $P$-null set, we may assume that

$$\Gamma_{Q^*} \subseteq \{ d(x, y) > 0 \} = \{ h'(d(x, y)) > -\infty \}, \quad \text{with } d := dQ^*/dP.$$

In the present transport case the finitistic optimality property (1.2) boils down to cyclical monotonicity, i.e. we find that for $(x_i, y_i) \in \Gamma_{Q^*}$, $i \le N$, $x_{N+1} = x_1$, $(x_{i+1}, y_i) \in \Gamma_P$ we have

$$\sum_{i \le N} h' \circ d(x_i, y_i) \le \sum_{i \le N} h' \circ d(x_{i+1}, y_i). \tag{2.7}$$

We say that $x_i, y_i$, $i \le N$ form a $(\Gamma_{Q^*}, \Gamma_P)$-path if $(x_i, y_i) \in \Gamma_{Q^*}$ for $i \le N$ and $(x_{i+1}, y_i) \in \Gamma_P$ for $i \le N - 1$. Since $P \sim \mu \otimes \nu$, we can apply [7, Lemma 4.3] with the cost function $c := 0$ on $\Gamma_P$ and $c := +\infty$ otherwise, to obtain that there exist full measure subsets $\tilde{X} \subseteq X$, $\tilde{Y} \subseteq Y$ such that for any points $(x, y), (\bar{x}, \bar{y}) \in \tilde{\Gamma}_{Q^*} := \Gamma_{Q^*} \cap (\tilde{X} \times \tilde{Y})$ there exists a $(\tilde{\Gamma}_{Q^*}, \Gamma_P)$-path satisfying $(x_1, y_1) = (x, y)$ and $(x_N, y_N) = (\bar{x}, \bar{y})$. Passing to subsets if necessary, we can w.l.o.g. assume that $\tilde{X} = X$, $\tilde{Y} = Y$, $\tilde{\Gamma}_{Q^*} = \Gamma_{Q^*}$. We use this to establish

$$d(x, y) > 0 \quad \text{for } (x, y) \in \Gamma_P. \tag{2.8}$$

To see this, pick an arbitrary point $(x, \bar{y}) \in \Gamma_P$ and points $\bar{x}, y$ such that $(\bar{x}, \bar{y}), (x, y) \in \Gamma_{Q^*}$, and a $(\Gamma_{Q^*}, \Gamma_P)$-path $(x_1, y_1) := (x, y), (x_2, y_2), \ldots, (x_N, y_N) := (\bar{x}, \bar{y})$ which connects these points. By (2.7) we then have (with $x_{N+1} = x_1$)

$$\sum_{i \le N} h' \circ d(x_i, y_i) \le \sum_{i \le N} h' \circ d(x_{i+1}, y_i) \tag{2.9}$$

$$\Leftrightarrow \quad \sum_{i \le N} h' \circ d(x_i, y_i) \le \sum_{i \le N-1} h' \circ d(x_{i+1}, y_i) + h' \circ d(x, \bar{y}). \tag{2.10}$$

Since the left hand side is finitely valued and $h' \circ d(x_{i+1}, y_i) < \infty$ for $i \le N - 1$, we obtain $h' \circ d(x, \bar{y}) > -\infty$. This establishes (2.8).

It follows that $P \sim Q^*$ and, by passing to subsets if needed, we can assume without loss of generality that $\Gamma_{Q^*} = \Gamma_P$. Next, we say that $(x_i, y_i)$, $i \le N$ form a $\Gamma_{Q^*}$-loop if $(x_i, y_i), (x_{i+1}, y_i) \in \Gamma_{Q^*}$ for $i \le N$, where $x_{N+1} := x_1$. Note that for any $\Gamma_{Q^*}$-loop we have

$$\sum_{i \le N} h' \circ d(x_i, y_i) = \sum_{i \le N} h' \circ d(x_{i+1}, y_i); \tag{2.11}$$

to see this, apply (2.7) twice, i.e. to the loop in the usual direction as well as to the loop run in the 'reverse' direction. By [35, Prop. 1], Condition (2.11) is necessary and sufficient to obtain functions $\varphi, \psi$ satisfying

$$h' \circ d(x, y) = \varphi(x) + \psi(y) \tag{2.12}$$

for all $(x, y) \in \Gamma_{Q^*}$.

Fix $x_1 \in X$, and observe that (2.12) yields

$$\sum_{i \le M} \big[ h' \circ d(x_{i+1}, y_i) - h' \circ d(x_i, y_i) \big] = \varphi(x) - \varphi(x_1),$$

whenever $(x_i, y_i), (x_{i+1}, y_i) \in \Gamma_{Q^*}$ for $i \le M$ are such that $x_{M+1} = x$. In particular we have

$$\varphi(x) = \inf\Big\{ \sum_{i \le M} \big[ h' \circ d(x_{i+1}, y_i) - h' \circ d(x_i, y_i) \big] + \varphi(x_1) : (x_i, y_i), (x_{i+1}, y_i) \in \Gamma_{Q^*} \text{ for } i \le M,\ x_{M+1} = x \Big\}. \tag{2.13}$$

The right hand side of (2.13) is upper semi-analytic, hence $\varphi$ is upper semi-analytic. Of course (2.13) pertains if we replace the inf with a sup, hence $\varphi$ is also lower semi-analytic. Putting the two together, we find that $\varphi$ is Borel. □

Proof of Theorem 1.5.

The start of the proof is the same as that of Theorem 1.4: let $\Gamma_{Q^*}, \Gamma_P$ be as in Lemma 1.3. Apparently $Q^* \ll P$ and we may assume that $\Gamma_{Q^*} \subseteq \Gamma_P$. Redefining $d := dQ^*/dP$ on an irrelevant $P$-null set we may assume that $d = +\infty$ exactly on $(X \times Y) \setminus \Gamma_P$.

As above, the finitistic optimality property amounts to cyclical monotonicity, i.e. we find that for $(x_i,y_i) \in \Gamma_{Q^*}$, $i \le N$, $x_{N+1} = x_1$ we have
$$\sum_{i \le N} h' \circ d(x_i,y_i) \le \sum_{i \le N} h' \circ d(x_{i+1},y_i). \tag{2.14}$$
(Note that we do not have to assume $(x_{i+1},y_i) \in \Gamma_P$, since $h' \circ d(x_{i+1},y_i) = +\infty$ whenever $(x_{i+1},y_i) \notin \Gamma_P$.)

Passing to subsets if necessary we may assume that $\operatorname{proj}_X \Gamma_{Q^*} = X$, $\operatorname{proj}_Y \Gamma_{Q^*} = Y$, $\Gamma_{Q^*} \subseteq \Gamma_P$. We say that $x_i, y_i$, $i \le N$ form a $(\Gamma, d)$-path if $(x_i,y_i) \in \Gamma$ for $i \le N$ and $d(x_{i+1},y_i) < \infty$ for $i \le N-1$. Since $P \sim \mu \otimes \nu$ we can apply [7, Lemma 4.3] (with the cost function $c = 0$ on $\Gamma_P$ and $+\infty$ else) to obtain the following: there exist full measure subsets $X_0 \subseteq X$, $Y_0 \subseteq Y$ such that for any points $(x,y), (\bar x,\bar y) \in \Gamma_0 := \Gamma_{Q^*} \cap (X_0 \times Y_0)$ there exists a $(\Gamma_0, d)$-path satisfying $(x_1,y_1) = (x,y)$ and $(x_N,y_N) = (\bar x,\bar y)$. Of course, we immediately assume w.l.o.g. that $X_0 = X$, $Y_0 = Y$, $\Gamma_0 = \Gamma_{Q^*}$.

In the terms of [7] we would say that $(\Gamma_{Q^*}, h' \circ d)$ is connecting. It then follows from [7, Proposition 3.2] that there exist Borel functions $\varphi : X \to [-\infty,\infty)$, $\psi : Y \to [-\infty,\infty)$ such that for all $x \in X$, $y \in Y$
$$\varphi(x) + \psi(y) \le h' \circ d(x,y),$$
with equality holding $Q^*$-a.s. Hence we also have
$$h' \circ \frac{dQ^*}{dP}(x,y) = (\varphi(x) + \psi(y))\, \mathbf{1}_{\{\frac{dQ^*}{dP}(x,y) > 0\}} = (\varphi(x) + \psi(y))_+, \qquad P\text{-a.s.} \tag{2.15}$$
□

Footnotes to the proof of Theorem 1.4.
On the assumptions made by passing to subsets: this can be assumed w.l.o.g. since for instance we can replace $\Gamma_P$ by $\Gamma_P \cap (X \times \operatorname{proj}_Y(\Gamma_{Q^*})) \cap (\operatorname{proj}_X(\Gamma_{Q^*}) \times Y)$.
On (2.13): if $M$ is fixed in the r.h.s. of (2.13), then we would have the partial infimum of a jointly Borel function, which must be upper semi-analytic. This remains true as we let $M$ range over $\mathbb{N}$. For the same reason that a set is Borel iff it and its complement are analytic (Suslin theorem), we have that $\varphi$ must be Borel.

Proof of Sufficiency: Theorem 1.6.

Proof of Theorem 1.6.

If Problem (1.3) has value $+\infty$ then there is nothing to prove. Hence, let $Z\,dP \in \mathrm{cpl}(\mu,\nu)$ with $I := \int h(Z)\,dP < \infty$.

Denote $h^*(y) = \sup_{x \ge 0} \{xy - h(x)\}$, and notice that under the assumptions on $h$ we have
$$h^*(y) = (h')^{-1}(y_+)\, y - h \circ (h')^{-1}(y_+).$$
Introduce $\varphi_n(x) = (-n) \vee \varphi(x) \wedge n$, $\psi_n(y) = (-n) \vee \psi(y) \wedge n$. Clearly (cf. [43, Lemma 3]), on $\{\varphi + \psi \ge 0\}$ we have $0 \le \varphi_n + \psi_n \nearrow \varphi + \psi$, while on $\{\varphi + \psi \le 0\}$ we have $0 \ge \varphi_n + \psi_n \searrow \varphi + \psi$.

Since $h(Z) \ge (\varphi_n + \psi_n) Z - h^*(\varphi_n + \psi_n)$, we find
$$\begin{aligned}
I &\ge \int (\varphi_n + \psi_n) Z\,dP - \int h^*(\varphi_n + \psi_n)\,dP \\
&= \int \varphi_n\,d\mu + \int \psi_n\,d\nu - \int h^*(\varphi_n + \psi_n)\,dP \\
&= \int (\varphi_n + \psi_n) \frac{dQ^*}{dP}\,dP - \int h^*(\varphi_n + \psi_n)\,dP.
\end{aligned}$$

Since $h^*(\cdot)$ is increasing, we have by monotone convergence
$$\int_{\{\varphi + \psi \ge 0\}} h^*(\varphi_n + \psi_n)\,dP \to \int_{\{\varphi + \psi \ge 0\}} h^*(\varphi + \psi)\,dP \quad \text{and} \quad \int_{\{\varphi + \psi \le 0\}} h^*(\varphi_n + \psi_n)\,dP \to \int_{\{\varphi + \psi \le 0\}} h^*(\varphi + \psi)\,dP.$$
Since $h^*(\cdot) \ge -h(0)$ we can collect integrals and conclude
$$\int h^*(\varphi_n + \psi_n)\,dP \to \int h^*(\varphi + \psi)\,dP,$$
and the right hand side is $(-\infty,\infty]$-valued.

By convexity
$$h\Big(\frac{dQ^*}{dP}\Big) + \frac{dQ^*}{dP}\, h'\Big(\frac{dQ^*}{dP}\Big) \le h\Big(2\,\frac{dQ^*}{dP}\Big),$$
and since by assumption on $h$ the r.h.s. is $P$-integrable, we deduce that
$$\int [\varphi(x) + \psi(y)]_+\,dQ^*(x,y) = \int h'\Big(\frac{dQ^*}{dP}\Big)\,dQ^* < +\infty.$$
In similar fashion as above, we get
$$\int (\varphi_n + \psi_n) \frac{dQ^*}{dP}\,dP \to \int (\varphi + \psi) \frac{dQ^*}{dP}\,dP,$$
since $\int_{\{\varphi + \psi \ge 0\}} (\varphi + \psi) \frac{dQ^*}{dP}\,dP < \infty$, and in particular $\int (\varphi + \psi) \frac{dQ^*}{dP}\,dP \in [-\infty,\infty)$.
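The conjugate formula and the Fenchel–Young inequality used at the start of this proof can be checked by hand in a simple case. The following sketch takes $h(t) = t^2/2$ on $t \ge 0$; this choice is our assumption for illustration only, and we do not verify here that it meets every standing assumption of Theorem 1.6.

```latex
% Illustration only. Assumption: h(t) = t^2/2 for t >= 0, so that h'(0) = 0.
\begin{align*}
h(t) &= \tfrac12 t^2, \qquad h'(t) = t, \qquad (h')^{-1}(s) = s \quad (s \ge 0), \\
h^*(y) &= \sup_{x \ge 0} \{ xy - \tfrac12 x^2 \} = \tfrac12 (y_+)^2
        = (h')^{-1}(y_+)\, y - h \circ (h')^{-1}(y_+), \\
h(z) - \big( yz - h^*(y) \big) &=
  \begin{cases}
    \tfrac12 (z - y)^2 \ge 0, & y \ge 0, \\
    \tfrac12 z^2 - yz \ge 0, & y < 0,
  \end{cases}
  \qquad z \ge 0.
\end{align*}
```

In this instance equality in $h(z) \ge yz - h^*(y)$ holds exactly when $z = y_+$, which matches the form $(h')^{-1}([\varphi + \psi]_+)$ of the density $dQ^*/dP$ used in the remainder of the proof.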
Collecting integrals we have found
$$\begin{aligned}
\int (\varphi_n + \psi_n) \frac{dQ^*}{dP}\,dP - \int h^*(\varphi_n + \psi_n)\,dP
&\to \int (\varphi + \psi) \frac{dQ^*}{dP}\,dP - \int h^*(\varphi + \psi)\,dP \\
&= \int \Big[ (\varphi + \psi) \frac{dQ^*}{dP} - h^*(\varphi + \psi) \Big]\,dP \\
&= \int \Big[ (\varphi + \psi)\, (h')^{-1}([\varphi(x) + \psi(y)]_+) - h^*(\varphi + \psi) \Big]\,dP \\
&= \int h \circ (h')^{-1}([\varphi(x) + \psi(y)]_+)\,dP \\
&= \int h\Big(\frac{dQ^*}{dP}\Big)\,dP.
\end{aligned}$$
We conclude that $I \ge \int h(dQ^*/dP)\,dP$. □

References

[1] L. Ambrosio and A. Pratelli. Existence and stability results in the $L^1$ theory of optimal transportation. In Optimal transportation and applications (Martina Franca, 2001), volume 1813 of Lecture Notes in Math., pages 123–160. Springer, Berlin, 2003.
[2] J. Backhoff, G. Conforti, I. Gentil, and C. Léonard. The mean field Schrödinger problem: ergodic behavior, entropy estimates and functional inequalities. Probability Theory and Related Fields, 178(1):475–530, 2020.
[3] J. Backhoff-Veraguas, M. Beiglböck, M. Huesmann, and S. Källblad. Martingale Benamou–Brenier: a probabilistic perspective. Annals of Probability, 48(5):2258–2289, 2020.
[4] J. Backhoff-Veraguas, M. Beiglböck, and G. Pammer. Existence, duality, and cyclical monotonicity for weak transport costs. Calculus of Variations and Partial Differential Equations, 58(6):203, 2019.
[5] M. Beiglböck, A. Cox, and M. Huesmann. Optimal transport and Skorokhod embedding. Invent. Math., 208(2):327–400, 2017.
[6] M. Beiglböck, M. Eder, C. Elgert, and U. Schmock. Geometry of distribution-constrained optimal stopping problems. Probability Theory and Related Fields, 172(1-2):71–101, 2018.
[7] M. Beiglböck, M. Goldstern, G. Maresch, and W. Schachermayer. Optimal and better transport plans. J. Funct. Anal., 256(6):1907–1927, 2009.
[8] M. Beiglböck and C. Griessler. A land of monotone plenty. Annali della SNS, Vol. XIX, issue 1, Apr. 2019.
[9] M. Beiglböck and N. Juillet. On a problem of optimal transport under marginal martingale constraints. Ann. Probab., 44(1):42–106, 2016.
[10] M. Beiglböck, M. Nutz, and F. Stebegg. Fine properties of the optimal Skorokhod embedding problem. JEMS, to appear, 2020.
[11] M. Beiglböck, M. Nutz, and N. Touzi. Complete duality for martingale optimal transport on the line. The Annals of Probability, 45(5):3038–3074, 2017.
[12] E. Bernton, P. Ghosal, and M. Nutz. Entropic optimal transport: geometry and large deviations. Work in Progress, 2021.
[13] A. Beurling. An automorphism of product measures. Annals of Mathematics, pages 189–200, 1960.
[14] S. Bianchini and L. Caravenna. On optimality of c-cyclically monotone transference plans. C. R. Math. Acad. Sci. Paris, 348(11-12):613–618, 2010.
[15] J. M. Borwein and A. S. Lewis. Decomposition of multivariate functions. Canadian Journal of Mathematics, 44(3):463–482, 1992.
[16] J. M. Borwein, A. S. Lewis, and R. D. Nussbaum. Entropy minimization, DAD problems, and doubly stochastic kernels. Journal of Functional Analysis, 123(2):264–307, 1994.
[17] G. Carlier and M. Laborde. A differential approach to the multi-marginal Schroedinger system. SIAM Journal on Mathematical Analysis, 52(1):709–717, 2020.
[18] P. Cattiaux, F. Gamboa, et al. Large deviations and variational theorems for marginal problems. Bernoulli, 5(1):81–108, 1999.
[19] M. Colombo, L. De Pascale, and S. Di Marino. Multimarginal optimal transport maps for one-dimensional repulsive costs. Canad. J. Math., 67(2):350–368, 2015.
[20] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26:2292–2300, 2013.
[21] D. Dawson and J. Gärtner. Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics: An International Journal of Probability and Stochastic Processes, 20(4):247–308, 1987.
[22] M. Fischer. On the form of the large deviation rate function for the empirical measures of weakly interacting systems. Bernoulli, 20(4):1765–1801, 2014.
[23] R. Fortet. Résolution d'un système d'équations de M. Schrödinger. J. Math. Pure Appl. IX, 1:83–105, 1940.
[24] W. Gangbo and R. McCann. The geometry of optimal transportation. Acta Math., 177(2):113–161, 1996.
[25] N. Gigli and L. Tamanini. Second order differentiation formula on RCD(K,N) spaces. arXiv preprint arXiv:1802.02463, 2018.
[26] N. Gozlan and N. Juillet. On a mixture of Brenier and Strassen theorems. Proceedings of the London Mathematical Society, 120(3):434–463, 2020.
[27] C. Griessler. c-cyclical monotonicity as a sufficient criterion for optimality in the multimarginal Monge–Kantorovich problem. Proceedings of the American Mathematical Society, 146(11):4735–4740, 2018.
[28] G. Guo, X. Tan, and N. Touzi. On the monotonicity principle of optimal Skorokhod embedding problem. SIAM Journal on Control and Optimization, 54(5):2478–2489, 2016.
[29] A. S. Kechris. Classical descriptive set theory, volume 156 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
[30] H. Kellerer. Duality theorems for marginal problems. Z. Wahrsch. Verw. Gebiete, 67(4):399–432, 1984.
[31] M. Knott and C. Smith. On Hoeffding-Fréchet bounds and cyclic monotone relations. J. Multivariate Anal., 40(2):328–334, 1992.
[32] C. Léonard. Minimizers of energy functionals. Acta Mathematica Hungarica, 93(4):281–325, 2001.
[33] C. Léonard. Entropic projections and dominating points. ESAIM: Probability and Statistics, 14:343–381, 2010.
[34] C. Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst., 34(4):1533–1574, 2014.
[35] B. D. Miller. Coordinatewise decomposition of group-valued Borel functions. Fund. Math., 196(2):119–126, 2007.
[36] M. Nutz and F. Stebegg. Canonical supermartingale couplings. The Annals of Probability, 46(6):3351–3398, 2018.
[37] S. Pal and T.-K. L. Wong. The geometry of relative arbitrage. Math. Financ. Econ., 10(3):263–293, 2016.
[38] B. Pass. On the local structure of optimal measures in the multi-marginal optimal transportation problem. Calc. Var. Partial Differential Equations, 43(3-4):529–536, 2012.
[39] T. Pennanen and A.-P. Perkkiö. Convex duality in nonlinear optimal transport. Journal of Functional Analysis, 277(4):1029–1060, 2019.
[40] A. Pratelli. On the sufficiency of c-cyclical monotonicity for optimality of transport plans. Mathematische Zeitschrift, 258(3):677–690, 2008.
[41] L. Rüschendorf. On c-optimal random variables. Statist. Probab. Lett., 27(3):267–270, 1996.
[42] L. Rüschendorf and W. Thomsen. Note on the Schrödinger equation and I-projections. Statistics and Probability Letters, 17(5):369–375, 1993.
[43] W. Schachermayer and J. Teichmann. Characterization of optimal transport plans for the Monge-Kantorovich problem. Proc. Amer. Math. Soc., 137(2):519–529, 2009.
[44] C. Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.
[45] D. A. Zaev. On the Monge–Kantorovich problem with additional linear constraints.