Full characterization of optimal transport plans for concave costs

Abstract

This paper slightly improves a classical result by Gangbo and McCann (1996) about the structure of optimal transport plans for costs that are concave functions of the Euclidean distance. Since the main difficulty for proving the existence of an optimal map comes from the possible singularity of the cost at $0$, everything is quite easy if the supports of the two measures are disjoint; Gangbo and McCann proved the result under the assumption $\mu(\operatorname{supp}(\nu))=0$; in this paper we replace this assumption with the fact that the two measures are singular to each other. In this case it is possible to prove the existence of an optimal transport map, provided the starting measure $\mu$ does not give mass to small sets (i.e. $(d-1)$-rectifiable sets). When the measures are not singular the optimal transport plan decomposes into two parts, one concentrated on the diagonal and the other being a transport map between mutually singular measures.


arXiv [math.OC]

FULL CHARACTERIZATION OF OPTIMAL TRANSPORT PLANS FOR CONCAVE COSTS

PAUL PEGON, DAVIDE PIAZZOLI, FILIPPO SANTAMBROGIO

Abstract.

This paper slightly improves a classical result by Gangbo and McCann (1996) about the structure of optimal transport plans for costs that are strictly concave and increasing functions of the Euclidean distance. Since the main difficulty for proving the existence of an optimal map comes from the possible singularity of the cost at $0$, everything is quite easy if the supports of the two measures are disjoint; Gangbo and McCann proved the result under the assumption $\mu(\operatorname{supp}(\nu)) = 0$; in this paper we replace this assumption with the fact that the two measures are singular to each other. In this case it is possible to prove the existence of an optimal transport map, provided the starting measure $\mu$ does not give mass to small sets (i.e. $(d-1)$-rectifiable sets). When the measures are not singular the optimal transport plan decomposes into two parts, one concentrated on the diagonal and the other being a transport map between mutually singular measures.

MSC 2010: 49J45, 49K21, 49Q20, 28A75

Keywords:

Monge-Kantorovich, transport maps, approximate gradient, rectifiable sets, density points

Contents

1. Introduction
2. Tools
2.1. General facts on optimal transportation
2.2. The approximate gradient
3. The absolutely continuous case
4. The general case: $\mu$ does not give mass to small sets
4.1. A GMT lemma for density points
4.2. How to handle non-absolutely continuous measures $\mu$
$\mu$-approximated gradient
References

Date: December 12, 2017.

1. Introduction

Optimal transport is nowadays a very powerful and widely studied theory, with many applications and connections with other pieces of mathematics. The minimization problem

$$(M) \qquad \min \left\{ \int c(x, T(x)) \, d\mu(x) \;:\; T_\# \mu = \nu \right\}$$

proposed by Monge in 1781 (see [13]) has been deeply understood thanks to the relaxation proposed by Kantorovich in [10] in the form of a linear programming problem

$$(K) \qquad \min \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} c \, d\gamma \;:\; \gamma \in \Pi(\mu, \nu) \right\}. \tag{1.1}$$

Here $\mu$ and $\nu$ are two probability measures on $\mathbb{R}^d$ and $c$ is a cost function $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. The set $\Pi(\mu, \nu)$ is the set of the so-called transport plans, i.e. $\Pi(\mu, \nu) = \{ \gamma \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d) : (\pi_x)_\# \gamma = \mu, \ (\pi_y)_\# \gamma = \nu \}$, where $\pi_x$ and $\pi_y$ are the two projections of $\mathbb{R}^d \times \mathbb{R}^d$ onto $\mathbb{R}^d$. These probability measures over $\mathbb{R}^d \times \mathbb{R}^d$ are an alternative way to describe the displacement of the particles of $\mu$: instead of saying, for each $x$, which is the destination $T(x)$ of the particle originally located at $x$, we say for each pair $(x, y)$ how many particles go from $x$ to $y$. It is clear that this description allows for more general movements, since from a single point $x$ particles can a priori move to different destinations $y$. If multiple destinations really occur, then this movement cannot be described through a map $T$.

The Kantorovich problem is interesting in itself and carries many of the features of Monge's one. Since it can be rigorously proven to be its relaxation in the sense of l.s.c. envelopes, the minimal value of the two problems is the same, provided $c$ is continuous. For many applications, dealing with the optimum of (K) is enough. Yet, a very classical question is whether the optimizer $\gamma$ of (K) is such that, for almost every $x$, only one point $y$ can be such that $(x, y) \in \operatorname{supp}(\gamma)$.
In this case $\gamma$ will be of the form $(id \times T)_\# \mu$ and will provide an optimal transport map for (M).

For several different applications, from fluid mechanics to differential geometry, the case which has been studied the most is the quadratic one, $c(x, y) = |x - y|^2$, first solved by Brenier in [2]. Other costs which are strictly convex functions of $x - y$, for instance all the powers $|x - y|^p$, $p > 1$, can be dealt with in a similar way. The next section will give a general strategy to prove the existence of a transport map which fits this case very well. The limit case $c(x, y) = |x - y|$, which was by the way the original interest of Monge, has also received much attention.

Yet, another very natural case is that of concave costs, more precisely $c(x, y) = \ell(|x - y|)$ where $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ is a strictly concave and increasing function. From the economical and modelization point of view, this is the most natural choice: moving a mass has a cost which is proportionally less if the distance increases, as everybody can notice from travel fares. In many practical cases, moving two masses on a distance $d$ each is more expensive than moving one at distance $2d$ and keeping the other at rest. The typical example is the power cost $|x - y|^\alpha$, $\alpha < 1$. Notice that all these costs satisfy the triangle inequality and are thus distances on $\mathbb{R}^d$. Among the other interesting features of these costs, let us mention two. From the theoretical point of view, there is the fact that all power costs $|x - y|^\alpha$ with $\alpha < 1$ satisfy the Ma-Trudinger-Wang assumption for regularity, see [12]. From the computational point of view, the subadditivity properties of these costs allow for some efficient algorithms using local indicators, at least in the one-dimensional discrete case, see [7].

Moreover, under strict concavity assumptions, these costs satisfy a strict triangle inequality (see Lemma 2.1).
This last fact implies (see Theorem 2.2, but it is a classical fact) that the common mass between $\mu$ and $\nu$ must stay at rest. This gives a first constraint on how to build optimal plans $\gamma$: look at $\mu$ and $\nu$, take the common part $\mu \wedge \nu$, leave it in place, subtract it from the rest, and then build an optimal transport between the two remainders, which will have no mass in common. Notice that when the cost $c$ is linear in the Euclidean distance, then the common mass may stay at rest but is not forced to do so (the very well known example is the transport between $\mu = \mathcal{L}^1 \llcorner [0, 1]$ and $\nu = \mathcal{L}^1 \llcorner [\tfrac12, \tfrac32]$, where both the translation $T(x) = x + \tfrac12$ and the map given by $T(x) = x + 1$ on $[0, \tfrac12]$ and $T(x) = x$ on $]\tfrac12, 1]$ are optimal); on the contrary, when the cost is strictly convex in the Euclidean distance, in general the common mass does not stay at rest (in the previous example only the translation is optimal for $c(x, y) = |x - y|^p$, $p > 1$). Notice that the fact that the common mass stays at rest implies that in general there is no optimal map $T$, since whenever there is a set $A$ with $\mu(A) > (\mu \wedge \nu)(A) = \nu(A) > 0$ then almost all the points of $A$ must have two images: themselves, and another point outside $A$.

Yet, this suggests studying the case where $\mu$ and $\nu$ are mutually singular, and the best one can do would be proving the existence of an optimal map in this case. This is a good point, since the singularity of the function $(x, y) \mapsto \ell(|x - y|)$ is mainly concentrated on the diagonal $\{x = y\}$ (look at the example $|x - y|^\alpha$), and when the two measures have no common mass almost no point $x$ is transported to $y = x$. Yet, exploiting this fact needs some attention.

First, a typical assumption on the starting measure is required: we need to suppose that $\mu$ does not give mass to $(d-1)$-rectifiable sets. This is standard and common with other costs, such as the quadratic one. From the technical point of view, this is needed in order to guarantee $\mu$-a.e.
differentiability of the Kantorovich potential, and counter-examples are known without this assumption (see the next section both for Kantorovich potentials and for counter-examples).

Hence, if we add this assumption on $\mu$, the easiest case is when $\mu$ and $\nu$ have disjoint supports, since in this case there is a lower bound on $|x - y|$ and this allows one to stay away from the singularity. Yet, $\operatorname{supp} \mu \cap \operatorname{supp} \nu = \emptyset$ is too restrictive, since even in the case where $\mu$ and $\nu$ have smooth densities $f$ and $g$ it may happen that, after subtracting the common mass, the two supports meet on the region $\{f = g\}$.

The problem has actually been solved in one of the first papers about optimal transportation, written by Gangbo and McCann in 1996, [9], where they choose the slightly less restrictive assumption $\mu(\operatorname{supp}(\nu)) = 0$. This assumption covers the example above of two continuous densities, but does not cover other cases such as $\mu$ being the Lebesgue measure on a bounded domain $\Omega$ and $\nu$ being an atomic measure with an atom at each rational point, or other examples that one can build with fully supported absolutely continuous measures concentrated on disjoint sets $A$ and $\Omega \setminus A$ (see Section 5.3). The present paper completes the proof by Gangbo and McCann, making use of recent ideas on optimal transportation to tackle the general case; i.e. we solve the problem under the only assumption that $\mu$ and $\nu$ are singular to each other (and that $\mu$ does not give mass to "small", i.e. $(d-1)$-rectifiable, sets). From the case of mutually singular measures we can deduce how to deal with the case of measures with a common mass.
The title of the paper exactly refers to this fact: by "characterization of optimal transport plans" we mean "understanding their structure, composed by a diagonal part and a transport map out of the diagonal", the word "full" stands for the fact that we arrive at the minimal set of assumptions with respect to previous works, and by "concave costs" we indeed mean "costs which are strictly concave and increasing functions of the Euclidean distance".

In Section 2 we present the main tools that we need. Section 2.1 is devoted to well-known facts from optimal transport theory: we recall the usual strategy to prove the existence of an optimal $T$ based on the Kantorovich potential and, after proving that the common mass stays at rest via a $c$-cyclical monotonicity argument, we adapt them to the concave case. Section 2.2 recalls the notion and some properties of approximate gradients, which we will use in the following section.

In Section 3 we start generalizing Gangbo and McCann's result, in the case where $\mu$ is absolutely continuous. In this case we use the fact that almost every point $x$ is sent to a different point $y \neq x$, together with a density points argument, in order to prove that the Kantorovich potential is approximately differentiable almost everywhere, according to the notions that we present in Section 2.2. Notice that this strategy is very much linked to many arguments recently used in optimal transportation in [4, 6], where the existence of an optimal map is proven by restricting the transport plan $\gamma$ to a suitable set of Lebesgue points which is $c$-cyclically monotone. Here we do not need to address explicitly such a construction but the idea is very much similar. The result we obtain in this section is not contained in [9], but it does not contain their result either.

It is in Section 4 that we consider the most general case of an arbitrary measure $\mu$ not giving mass to small sets.
The approximate differentiability of the potential in Section 3 was based on Lebesgue points arguments which need to be adapted to this new framework. This is why we present an interesting Geometric Measure Theory Lemma (Lemma 4.1) which states that, whenever $\mu$ does not give mass to small sets, then $\mu$-almost every point $x$ is such that every cone with vertex at $x$, of arbitrary size, direction and opening, has positive mass for $\mu$. This lemma is not new, but it is surprisingly not so well-known, at least in this very formulation, in the geometric measure theory community (a weaker version of this lemma is actually contained in the classical book [8]). On the contrary, it is becoming popular in the optimal transport community (see [5], where the authors prove it and say that it has been presented to them by T. Rajala), which is strange if we think that it is really a GMT statement. Yet, even if not concerned directly with optimal transport, it has been popularized thanks to its applications in optimal transport theory.

For the sake of completeness, we present the proof of this lemma in Section 4.1, even if we stress that the contribution of the paper is not in such a proof, but in the applications of this result to the differentiability of the Kantorovich potential that we face. This is what we do in Section 4.2, where we define an ad-hoc notion of gradient for the Kantorovich potential, by using this density result to prove that it is well-defined $\mu$-a.e. and that it satisfies all the properties we need. Finally, we prove that the optimal $\gamma$ is concentrated on a graph.

In this way we can now state the main theorem of this paper. Define $\mu \wedge \nu$ as the maximal positive measure which is less than or equal to both $\mu$ and $\nu$, and $(\mu - \nu)^+ = \mu - \mu \wedge \nu$, so that the two measures $\mu$ and $\nu$ uniquely decompose into a common part $\mu \wedge \nu$ and two mutually singular parts $(\mu - \nu)^+$ and $(\nu - \mu)^+$.

Theorem 1.1.

Suppose that $\mu$ and $\nu$ are probability measures on $\mathbb{R}^d$ such that $(\mu - \nu)^+$ gives no mass to all $(d-1)$-rectifiable sets, and take the cost $c(x, y) = \ell(|x - y|)$, for $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ strictly concave and increasing. Then there exists a unique optimal transport plan $\gamma$, and it has the form

$$\gamma = (id, id)_\# (\mu \wedge \nu) + (id, T)_\# (\mu - \nu)^+ .$$

This theorem is obtained as a corollary of Theorem 4.6, which concerns measures with no common mass, whereas Theorem 3.3 is a simplified version of the same statement, where the assumption that $\mu$ is absolutely continuous plays an important role.

The paper ends with an appendix in two parts. One explains that, differently from convex costs, in the case of concave costs translations are never optimal, while the second one presents a discussion about the possibility of defining a sort of approximate gradient adapted to the measure $\mu$.

2. Tools

2.1. General facts on optimal transportation.

We start the preliminaries of this paper with some important and well-known facts about the Kantorovich linear programming problem

$$(K) \qquad \min \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} c \, d\gamma \;:\; \gamma \in \Pi(\mu, \nu) \right\}. \tag{2.1}$$

Here the cost function $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+$ is supposed to be continuous. We can suppose the supports of the two measures to be compact, for simplicity, but we do not need it (yet, if the supports are not compact, it is better to suppose $c$ to be uniformly continuous). We need to underline some main aspects of this problem (K). First, as any linear programming problem, it admits a dual problem, which reads

$$(D) \qquad \max \left\{ \int \phi \, d\mu + \int \psi \, d\nu \;:\; \phi(x) + \psi(y) \le c(x, y) \text{ for all } (x, y) \in \mathbb{R}^d \times \mathbb{R}^d \right\}, \tag{2.2}$$

where the supremum is computed over all pairs $(\phi, \psi)$ of continuous functions on $\mathbb{R}^d$. We refer to [15] for this duality relation. Here are some of the properties of (K), (D) and the connection between them:

• (K) admits at least a solution $\gamma$, called optimal transport plan;
• (D) also has a solution $(\phi, \psi)$;
• The functions $(\phi, \psi)$ are such that $\psi(y) = \inf_x \, [c(x, y) - \phi(x)]$ and $\phi(x) = \inf_y \, [c(x, y) - \psi(y)]$ (we say that they are conjugate to each other, and $\phi = \psi^c$ is the $c$-transform of $\psi$ and $\psi = \phi^c$ is the $c$-transform of $\phi$). Hence, the dual problem can be expressed in terms of one function $\phi$ only, taking $\psi = \phi^c$ (which automatically implies the constraint $\phi(x) + \phi^c(y) \le c(x, y)$). Any optimal $\phi$ is called Kantorovich potential.
• Given $\gamma$ and a pair $(\phi, \psi)$, then $\gamma$ is optimal for (K) and $(\phi, \psi)$ is optimal for (D) if and only if the equality $\phi(x) + \psi(y) = c(x, y)$ holds for all $(x, y) \in \operatorname{supp} \gamma$.

An interesting consequence of this last property is the fact that the support $\Gamma$ of any optimal $\gamma$ is $c$-cyclically monotone.

Definition 1.

Given a function $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, we say that a set $\Gamma \subset \mathbb{R}^d \times \mathbb{R}^d$ is $c$-cyclically monotone (briefly $c$-CM) if, for every $k \in \mathbb{N}$, every permutation $\sigma$ of $k$ elements and every finite family of points $(x_1, y_1), \dots, (x_k, y_k) \in \Gamma$ we have

$$\sum_{i=1}^k c(x_i, y_i) \le \sum_{i=1}^k c(x_i, y_{\sigma(i)}).$$

The word "cyclical" refers to the fact that we can restrict our attention to cyclical permutations. The word "monotone" is a left-over from the case $c(x, y) = -x \cdot y$.

Indeed, it is easy to check that $\Gamma$ is $c$-CM from the fact that

$$\sum_{i=1}^k c(x_i, y_i) = \sum_{i=1}^k \big( \phi(x_i) + \psi(y_i) \big) = \sum_{i=1}^k \big( \phi(x_i) + \psi(y_{\sigma(i)}) \big) \le \sum_{i=1}^k c(x_i, y_{\sigma(i)}).$$

This fact has a very interesting consequence in the case we consider in this paper, i.e. $c(x, y) = \ell(|x - y|)$ with $\ell$ increasing and strictly concave.

Lemma 2.1.

Let $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ be a strictly concave and increasing function with $\ell(0) = 0$. Then $\ell$ is strictly subadditive, i.e. $\ell(s + t) < \ell(s) + \ell(t)$ for every $s, t > 0$. Also, if $x, y, z$ are points in $\mathbb{R}^n$ with $x \neq y$ and $y \neq z$, then $\ell(|x - z|) < \ell(|x - y|) + \ell(|y - z|)$.

Proof. The subadditivity of positive concave functions is a classical fact which can be proven in the following way. Take $t, s > 0$ and consider the function $g : [0, t + s] \to \mathbb{R}_+$ defined through $g(r) = \ell(t + s - r) + \ell(r)$. Then $g$ is strictly concave (as a sum of two strictly concave functions), and hence its minimal value is attained (only) on the boundary of the interval. Since $g(0) = g(t + s) = \ell(t + s) + \ell(0) = \ell(t + s)$ we get $\ell(s) + \ell(t) = g(t) > \min g = g(0) = \ell(t + s)$.

The second part of the statement is an easy consequence of the triangle inequality and the monotonicity of $\ell$:

$$\ell(|x - z|) \le \ell(|x - y| + |y - z|) < \ell(|x - y|) + \ell(|y - z|). \qquad \square$$

This can be applied to the study of optimal transport plans.
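Before turning to transport plans, Lemma 2.1 can be sanity-checked numerically. The following Python sketch is only an illustration and not part of the proof: it assumes the example profile $\ell(t) = \sqrt{t}$ (one admissible strictly concave, increasing choice with $\ell(0) = 0$), and the function names `ell`, `g`, `subadditivity_gap` and `triangle_gap` are hypothetical helpers introduced here.

```python
import math


def ell(t: float) -> float:
    """Assumed example profile: strictly concave, increasing, ell(0) = 0."""
    return math.sqrt(t)


def g(r: float, s: float, t: float) -> float:
    """Auxiliary function from the proof: g(r) = ell(s + t - r) + ell(r)."""
    return ell(s + t - r) + ell(r)


def subadditivity_gap(s: float, t: float) -> float:
    """ell(s) + ell(t) - ell(s + t); Lemma 2.1 says this is > 0 for s, t > 0."""
    return ell(s) + ell(t) - ell(s + t)


def triangle_gap(x, y, z) -> float:
    """ell(|x-y|) + ell(|y-z|) - ell(|x-z|) for points given as coordinate tuples."""
    return ell(math.dist(x, y)) + ell(math.dist(y, z)) - ell(math.dist(x, z))


# Smallest subadditivity gap over a grid of positive values s, t in (0, 3].
samples = [0.1 * k for k in range(1, 31)]
min_gap = min(subadditivity_gap(s, t) for s in samples for t in samples)

# On [0, s + t] the function g should be smallest at an endpoint of the interval.
s, t = 1.0, 2.0
grid = [(s + t) * k / 200 for k in range(201)]
values = [g(r, s, t) for r in grid]
argmin_index = min(range(len(grid)), key=values.__getitem__)

print("smallest subadditivity gap on the grid:", min_gap)
print("g is minimal at r =", grid[argmin_index])
print("triangle gap for collinear points:",
      triangle_gap((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)))
```

Note that the triangle gap is strictly positive even for three collinear points, which is exactly the situation where the Euclidean distance itself gives equality.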

Theorem 2.2.

Let $\gamma$ be an optimal transport plan for the cost $c(x, y) = \ell(|x - y|)$ with $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ strictly concave, increasing, and such that $\ell(0) = 0$. Let $\gamma = \gamma_D + \gamma_O$, where $\gamma_D$ is the restriction of $\gamma$ to the diagonal $D = \{(x, x) : x \in \mathbb{R}^d\}$ and $\gamma_O$ is the part outside the diagonal, i.e. the restriction to $D^c = (\mathbb{R}^d \times \mathbb{R}^d) \setminus D$. Then this decomposition is such that $(\pi_x)_\# \gamma_O$ and $(\pi_y)_\# \gamma_O$ are mutually singular measures.

Proof. It is clear that $\gamma_O$ is concentrated on $\operatorname{supp} \gamma \setminus D$ and hence $(\pi_x)_\# \gamma_O$ is concentrated on $\pi_x(\operatorname{supp} \gamma \setminus D)$ and $(\pi_y)_\# \gamma_O$ is concentrated on $\pi_y(\operatorname{supp} \gamma \setminus D)$. We claim that these two sets are disjoint. Indeed, suppose that a common point $z$ belongs to both. Then, by definition, there exists $y$ such that $(z, y) \in \operatorname{supp} \gamma \setminus D$ and $x$ such that $(x, z) \in \operatorname{supp} \gamma \setminus D$. This means that we can apply $c$-cyclical monotonicity to the points $(x, z)$ and $(z, y)$ and get

$$\ell(|x - z|) + \ell(|z - y|) \le \ell(|x - y|) + \ell(|z - z|) = \ell(|x - y|) < \ell(|x - z|) + \ell(|z - y|),$$

where the last strict inequality, justified by Lemma 2.1 (applicable since $x \neq z$ and $z \neq y$), gives a contradiction. $\square$

As we said in the introduction, the theorem above has some important consequences. In particular, it states that for this class of "concave" costs the common mass between $\mu$ and $\nu$ must stay at rest, which we can explain in detail. Indeed, we have $(\pi_x)_\# \gamma_D = (\pi_y)_\# \gamma_D$ and, since the remaining parts $(\pi_x)_\# \gamma_O$ and $(\pi_y)_\# \gamma_O$ are mutually singular, the decompositions $\mu = (\pi_x)_\# \gamma_D + (\pi_x)_\# \gamma_O$ and $\nu = (\pi_y)_\# \gamma_D + (\pi_y)_\# \gamma_O$ imply $(\pi_x)_\# \gamma_D = (\pi_y)_\# \gamma_D = \mu \wedge \nu$, $(\pi_x)_\# \gamma_O = (\mu - \nu)^+$ and $(\pi_y)_\# \gamma_O = (\nu - \mu)^+$. Hence, we must look in particular at the optimal transport problem between the two mutually singular measures $(\mu - \nu)^+$ and $(\nu - \mu)^+$.
We will show that, under some natural regularity assumptions on the starting measure, this problem admits the existence of an optimal map.

From now on, we will just assume w.l.o.g. that $\mu$ and $\nu$ are mutually singular. If not, just remove the common part, since we can deal with it separately.

Let us see which is the general strategy for proving existence of an optimal $T$ when the cost $c$ is of the form $c(x, y) = h(x - y)$ (not necessarily depending only on the norm $|x - y|$). Here as well the duality plays an important role. Indeed, if we consider an optimal transport plan $\gamma$ and a Kantorovich potential $\phi$ with $\psi = \phi^c$, we may write

$$\phi(x) + \psi(y) \le c(x, y) \text{ on } \mathbb{R}^d \times \mathbb{R}^d \quad \text{and} \quad \phi(x) + \psi(y) = c(x, y) \text{ on } \operatorname{supp} \gamma.$$

Once we have that, let us fix a point $(x_0, y_0) \in \operatorname{supp} \gamma$. One may deduce from the previous computations that

$$x \mapsto \phi(x) - h(x - y_0) \quad \text{is maximal at } x = x_0 \tag{2.3}$$

and, if $\phi$ is differentiable at $x_0$ and $x_0$ is an interior point, one gets $\nabla \phi(x_0) = \nabla h(x_0 - y_0)$ (we also assume that $h$ is differentiable at $x_0 - y_0$). The easiest case is that of a strictly convex function $h$, since one may invert the relation passing to $\nabla h^*$, thus getting $x_0 - y_0 = \nabla h^*(\nabla \phi(x_0))$. This allows to express $y_0$ as a function of $x_0$, thus proving that there is only one point $(x_0, y) \in \operatorname{supp} \gamma$ and hence that $\gamma$ comes from a transport $T(x) = x - \nabla h^*(\nabla \phi(x))$.

It is possible to see that strict convexity of $h$ is not the important assumption, but we need $\nabla h$ to be injective, which is also the case if $h(z) = \ell(|z|)$, since $\nabla h(z) = \ell'(|z|) \frac{z}{|z|}$ and the modulus of this vector identifies the modulus $|z|$ (since $\ell'$ is strictly monotone, indeed strictly decreasing by strict concavity of $\ell$) and the direction gives also the direction of $z$.
We will deal later with the case where $\ell$ is not differentiable at some points, and it will not be so difficult. The only difficult point is that $h$ would be highly singular at the origin, but fortunately if $\mu$ and $\nu$ have no mass in common then the case $x = y$ will be negligible.

However, we need to guarantee that $\phi$ is differentiable a.e. with respect to $\mu$. This is usually guaranteed by requiring $\mu$ to be absolutely continuous with respect to the Lebesgue measure, and using the fact that $\phi(x) = \inf_y \, [h(x - y) - \psi(y)]$ allows to prove Lipschitz continuity of $\phi$ if $h$ is Lipschitz continuous. This is indeed the main difficulty: concave functions on $\mathbb{R}_+$ may have an infinite slope at $0$ and be non-Lipschitz. The fact that the case of distance $0$ is negligible is not enough to restrict $h$ to a set where it is Lipschitz, if we do not have a lower bound on $|x - y|$. This can be easily obtained if one supposes that $\operatorname{supp} \mu \cap \operatorname{supp} \nu = \emptyset$, but this is in general not the case when $\mu$ and $\nu$ are obtained from the two original measures by removing the common mass. The paper by Gangbo and McCann makes a slightly less restrictive assumption, i.e. that $\mu(\operatorname{supp} \nu) = 0$. This allows to say that $\mu$-almost any point is far from the support of $\nu$ and, since we only need local Lipschitz continuity, this is enough.

In the next two sections we will develop two increasingly hard arguments to provide $\mu$-a.e. differentiability for $\phi$. The next section will assume $\mu$ to be absolutely continuous and will make use of the notion of approximate gradient. Then, we will weaken the assumptions on $\mu$, supposing only that it does not give mass to $(d-1)$-rectifiable sets. This will require a different ad-hoc notion of gradient and a geometric measure theory lemma which replaces the notion of Lebesgue points.
The first proof in the case $\mu \ll \mathcal{L}^d$ is given for the sake of simplicity and completeness.

A consequence of all this is the existence of a transport map for $c(x, y) = \ell(|x - y|)$ every time that

a) $\mu$ and $\nu$ are mutually singular;
b) $\mu$ does not give mass to "small sets".

This map can also be proven to be unique, as is the case every time that we can prove that every optimal plan is indeed induced by a map. Moreover, if we only suppose b) and we admit the existence of a common mass between $\mu$ and $\nu$ (actually it is even enough to suppose that $\mu - \mu \wedge \nu$ does not give mass to "small sets"), then the optimal $\gamma$ is unique and composed of two parts: one on the diagonal and one induced by a transport map, which implies that every point has at most two images, one of the two being the point itself.

We finish this section with a standard counterexample which shows that the assumption that $\mu$ does not give mass to "small sets" is natural. Indeed, one can consider

$$\mu = \mathcal{H}^1 \llcorner A \quad \text{and} \quad \nu = \tfrac12 \mathcal{H}^1 \llcorner B + \tfrac12 \mathcal{H}^1 \llcorner C,$$

where $A$, $B$ and $C$ are three vertical parallel segments in $\mathbb{R}^2$ whose vertices lie on the two lines $y = 0$ and $y = 1$ and whose abscissas are $0$, $1$ and $-1$, respectively, and $\mathcal{H}^1$ is the $1$-dimensional Hausdorff measure. It is clear that no transport plan may realize a cost better than $\ell(1)$ since, horizontally, every point needs to be displaced by a distance $1$. Moreover, one can get a sequence of maps $T_n : A \to B \cup C$ by dividing $A$ into $2n$ equal segments $(A_i)_{i=1,\dots,2n}$ and $B$ and $C$ into $n$ segments each, $(B_i)_{i=1,\dots,n}$ and $(C_i)_{i=1,\dots,n}$ (all ordered downwards). Then define $T_n$ as a piecewise affine map which sends $A_{2i-1}$ onto $B_i$ and $A_{2i}$ onto $C_i$. In this way the cost of the map $T_n$ is less than $\ell(1 + 1/n)$, which implies that the infimum of the Kantorovich problem is $\ell(1)$, as well as the infimum on transport maps only. Yet, no map $T$ may obtain a cost $\ell(1)$, as this would imply that all points are sent horizontally, but this cannot respect the push-forward constraint.
On the other hand, the transport plans associated to $T_n$ weakly converge to the transport plan

$$\tfrac12 (id, T^+)_\# \mu + \tfrac12 (id, T^-)_\# \mu,$$

where $T^\pm(x) = x \pm e$ and $e = (1, 0)$. This transport plan turns out to be the only optimal transport plan and its cost is $\ell(1)$.

In this example we have two measures with disjoint supports and no optimal transport map (every point $x$ is forced to have two images $x \pm e$), but we can add a common mass, for instance taking

$$\mu = \mathcal{H}^1 \llcorner A \quad \text{and} \quad \nu = \tfrac12 \mu + \tfrac14 \mathcal{H}^1 \llcorner B + \tfrac14 \mathcal{H}^1 \llcorner C,$$

and in such a case every point should have three images (part of the mass at $x$ is sent to $x \pm e$ and part stays at $x$).
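The cost of the maps $T_n$ in the counterexample above can be estimated numerically. The Python sketch below is an illustration under stated assumptions rather than the construction of the paper: it assumes the example profile $\ell(t) = \sqrt{t}$ and one specific piecewise affine choice of $T_n$ (each piece of $A$ is stretched by a factor $2$ while keeping one endpoint at the same height); the names `ell` and `cost_of_T_n` are hypothetical.

```python
import math


def ell(t: float) -> float:
    """Assumed example profile: strictly concave, increasing, ell(0) = 0."""
    return math.sqrt(t)


def cost_of_T_n(n: int, quad_points: int = 2000) -> float:
    """Approximate the transport cost of T_n with a midpoint rule.

    Assumption of this sketch: on A_{2i-1} (resp. A_{2i}) the map is the affine
    map that keeps the top (resp. bottom) endpoint at the same height and
    doubles lengths. A point at vertical distance v from that endpoint, with
    0 <= v <= 1/(2n), then moves by 1 horizontally and by v vertically, and v is
    uniformly distributed on [0, 1/(2n)] under mu restricted to each piece.
    """
    h = 1.0 / (2 * n)            # length of each piece A_j
    dv = h / quad_points
    total = 0.0
    for k in range(quad_points):
        v = (k + 0.5) * dv       # midpoint of the k-th quadrature cell
        total += ell(math.hypot(1.0, v)) * dv
    # All 2n pieces contribute the same amount, and mu has total mass 1.
    return 2 * n * total


costs = {n: cost_of_T_n(n) for n in (1, 2, 5, 10, 100)}
for n, c in sorted(costs.items()):
    print(f"n = {n:3d}: cost(T_n) ~ {c:.8f}, "
          f"lower bound ell(1) = {ell(1.0):.1f}, "
          f"upper bound ell(1 + 1/n) = {ell(1.0 + 1.0 / n):.6f}")
```

With this choice the computed costs stay strictly above $\ell(1)$, below $\ell(1 + 1/n)$, and decrease as $n$ grows, in line with the claim that the infimum $\ell(1)$ is approached but not attained by maps.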

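[Figure: the segments $A$, $B$ and $C$, with the subsegments $A_{2i-1}$, $A_{2i}$, $B_i$ and $C_i$ used to define $T_n$.]

2.2. The approximate gradient.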

In this section we recall a measure-theoretical notion replacing the gradient for less regular functions. The interested reader can find many details in [3].

Let us start from the following observation: given a function $f : \Omega \to \mathbb{R}$ and a point $x_0 \in \Omega$, we say that $f$ is differentiable at $x_0$ and that its gradient is $\nabla f(x_0) \in \mathbb{R}^d$ if for every $\epsilon > 0$ the set

$$\{ x \in \Omega : |f(x) - f(x_0) - \nabla f(x_0) \cdot (x - x_0)| > \epsilon |x - x_0| \}$$

is at positive distance from $x_0$, i.e. if there exists a ball around $x_0$ which does not meet it. Instead of this requirement, we could ask for a weaker condition, namely that $x_0$ is a zero-density point for the same set (i.e. a Lebesgue point of its complement). More precisely, if there exists a vector $v$ such that, for every $\epsilon > 0$,

$$\lim_{\delta \to 0} \frac{\big| \{ x \in \Omega : |f(x) - f(x_0) - v \cdot (x - x_0)| > \epsilon |x - x_0| \} \cap B(x_0, \delta) \big|}{|B(x_0, \delta)|} = 0,$$

then we say that $f$ is approximately differentiable at $x_0$ and its approximate gradient is $v$. The approximate gradient will be denoted by $\nabla_{app} f(x_0)$. As one can expect, it enjoys several of the properties of the usual gradient, that we list here.

• The approximate gradient, provided it exists, is unique.
• The approximate gradient is nothing but the usual gradient if $f$ is differentiable.
• The approximate gradient shares the usual algebraic properties of gradients, in particular $\nabla_{app}(f + g)(x_0) = \nabla_{app} f(x_0) + \nabla_{app} g(x_0)$.
• If $x_0$ is a local minimum or local maximum for $f$, and if $\nabla_{app} f(x_0)$ exists, then $\nabla_{app} f(x_0) = 0$.

These four properties are quite easy to check.

Another very important property that we need is a consequence of the well-known Rademacher theorem, which states that Lipschitz functions are almost everywhere differentiable.

Proposition 2.3.

Let $f, g : \Omega \to \mathbb{R}$ be two functions defined on the same domain $\Omega$, with $g$ Lipschitz continuous. Let $A \subset \Omega$ be a Borel set such that $f = g$ on $A$. Then $f$ is approximately differentiable almost everywhere on $A$ and $\nabla_{app} f(x) = \nabla g(x)$ for a.e. $x \in A$.

Proof. It is enough to consider all the points in $A$ which are Lebesgue points of $A$ and at the same time differentiability points of $g$. These points cover almost all of $A$. It is easy to check that the definition of approximate gradient of $f$ at such a point $x$ is satisfied if we take $v = \nabla g(x)$. $\square$

3. The absolutely continuous case

Suppose now $\mu \ll \mathrm{Leb}$, and proceed according to the strategy presented in Section 2.1. Take $x_0 \in \operatorname{supp}(\mu)$: there exists a point $y_0 \in \mathbb{R}^d$ such that $(x_0, y_0) \in \operatorname{supp}(\gamma)$. Denote by $\phi$ a Kantorovich potential and by $\phi^c$ its conjugate function. From Equation (2.3), provided we can differentiate, we get

$$\nabla \phi(x_0) - \nabla \ell(|x_0 - y_0|) = \nabla \phi(x_0) - \ell'(|x_0 - y_0|) \frac{x_0 - y_0}{|x_0 - y_0|} = 0, \tag{3.1}$$

so that $x_0$ uniquely determines $y_0$, but unfortunately neither $\phi$ nor $\ell$ is smooth enough to differentiate. To cope with this, we first want to prove that $\phi$ admits an approximate gradient Lebesgue-a.e. From what we saw in Section 2.2 this would imply (if $\ell$ is differentiable; we will see later how to handle the case where it is not) that Equation (3.1) is satisfied if we replace the gradient with the approximate gradient.

Recall that we may suppose

$$\phi(x) = \phi^{cc}(x) = \inf_{y \in \mathbb{R}^d} \, [\ell(|x - y|) - \phi^c(y)].$$

Now consider a countable family of closed balls $B_i$ generating the topology of $\mathbb{R}^d$, and for every $i$ consider the function defined as

$$\phi_i(x) := \inf_{y \in B_i} \, [\ell(|x - y|) - \phi^c(y)]$$

for $x \in \mathbb{R}^n$. One cannot provide straight Lipschitz properties for $\phi_i$, since a priori $y$ is arbitrarily close to $x$ and in general $\ell$ is not Lipschitz close to $0$. However $\phi_i$ is Lipschitz on every $B_j$ such that $d := \operatorname{dist}(B_i, B_j) > 0$. Indeed, if $x \in B_j$ and $y \in B_i$ one has $|x - y| \ge d > 0$, therefore the Lipschitz constant of $\ell(|\cdot - y|) - \phi^c(y)$ on $B_j$ does not exceed $\ell'(d)$ (or the right derivative of $\ell$ at $d$). It follows that $\phi_i$ is Lipschitz on $B_j$, and its constant does not exceed $\ell'(d)$.

Then, by Prop. 2.3, $\phi$ has an approximate gradient almost everywhere on $\{\phi = \phi_i\} \cap B_j$. By countable union, $\phi$ admits an approximate gradient a.e. on

$$\bigcup_{i, j \,:\, d(B_i, B_j) > 0} \big[ \{\phi_i = \phi\} \cap B_j \big].$$

As a consequence of this and of the absolute continuity of $\mu$, in order to prove that $\phi$ has an approximate gradient $\mu$-almost everywhere, it is enough to prove that

$$\mu \Big( \bigcup_{i, j \,:\, d(B_i, B_j) > 0} \{\phi_i = \phi\} \cap B_j \Big) = 1.$$

Lemma 3.1.

For every $i$ and $j$,

$$\pi_x(\operatorname{supp} \gamma \cap (B_j \times B_i)) \subset \{\phi = \phi_i\} \cap B_j.$$

Proof.

Let $(x, y) \in \operatorname{supp} \gamma \cap (B_j \times B_i)$. Then $\phi(x) + \phi^c(y) = \ell(|x - y|)$. It follows that

$$\phi_i(x) = \inf_{y' \in B_i} \, [\ell(|x - y'|) - \phi^c(y')] \le \ell(|x - y|) - \phi^c(y) = \phi(x).$$

On the other hand, for every $x \in \mathbb{R}^n$,

$$\phi_i(x) = \inf_{y \in B_i} \, [\ell(|x - y|) - \phi^c(y)] \ge \inf_{y \in \mathbb{R}^n} \, [\ell(|x - y|) - \phi^c(y)] = \phi(x). \qquad \square$$
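The two inequalities in the proof of Lemma 3.1 can be observed on a finite toy model. The Python sketch below is only an illustration with hypothetical data: the points `ys`, the values `psi` (playing the role of $\phi^c$), the interval `ball` (playing the role of $B_i$) and the profile $\ell(t) = \sqrt{t}$ are all assumptions made here, and a "contact point" of $x$ is a $y$ with $\phi(x) + \psi(y) = \ell(|x - y|)$, which replaces the condition $(x, y) \in \operatorname{supp} \gamma$.

```python
import math


def ell(t: float) -> float:
    """Assumed example profile: strictly concave, increasing, ell(0) = 0."""
    return math.sqrt(t)


# Hypothetical finite data on the real line: targets y and values psi(y),
# where psi plays the role of the conjugate potential phi^c.
ys = [-2.0, -0.5, 0.3, 1.0, 2.5]
psi = {-2.0: 0.2, -0.5: -0.1, 0.3: 0.0, 1.0: 0.4, 2.5: -0.3}


def phi(x: float, candidates=ys) -> float:
    """Discrete c-transform: min over the candidates of ell(|x - y|) - psi(y)."""
    return min(ell(abs(x - y)) - psi[y] for y in candidates)


def contact_points(x: float) -> list:
    """Targets y attaining the minimum, i.e. phi(x) + psi(y) = ell(|x - y|)."""
    value = phi(x)
    return [y for y in ys if ell(abs(x - y)) - psi[y] == value]


# The interval [0.5, 3.0] plays the role of the ball B_i.
ball = (0.5, 3.0)
ys_in_ball = [y for y in ys if ball[0] <= y <= ball[1]]


def phi_i(x: float) -> float:
    """Restricted infimum, taken over targets in the ball only."""
    return phi(x, ys_in_ball)


xs = [-3.0 + 0.25 * k for k in range(25)]
for x in xs:
    in_ball = any(y in ys_in_ball for y in contact_points(x))
    print(f"x = {x:5.2f}  phi = {phi(x):7.4f}  phi_i = {phi_i(x):7.4f}  "
          f"contact point in the ball: {in_ball}")
```

In this toy setting $\phi_i \ge \phi$ holds at every sample point, with equality whenever $x$ has a contact point inside the chosen interval, which mirrors the inclusion stated in Lemma 3.1.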

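As a consequence of Lemma 3.1,

$$\begin{aligned}
\mu \Big( \bigcup_{i, j \,:\, d(B_i, B_j) > 0} \{\phi_i = \phi\} \cap B_j \Big)
&\ge \mu \Big( \bigcup_{i, j \,:\, d(B_i, B_j) > 0} \pi_x(\operatorname{supp} \gamma \cap (B_j \times B_i)) \Big) \\
&= \mu \Big( \pi_x \Big( \operatorname{supp} \gamma \cap \bigcup_{i, j \,:\, d(B_i, B_j) > 0} B_j \times B_i \Big) \Big) \\
&= \mu ( \pi_x(\operatorname{supp} \gamma \setminus D) ) \\
&= \gamma \big[ (\pi_x)^{-1} ( \pi_x(\operatorname{supp} \gamma \setminus D) ) \big] \\
&\ge \gamma(\operatorname{supp} \gamma \setminus D) = 1,
\end{aligned}$$

since the diagonal is $\gamma$-negligible. In other words, the following theorem is proved.

Theorem 3.2.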

Let $\ell : [0, +\infty) \to \mathbb{R}_+$ be a concave function, and suppose $\mu$ and $\nu$ are two probability measures on $\mathbb{R}^n$, with $\mu \ll \mathrm{Leb}$. Call $\phi$ a Kantorovich potential for the transport problem with cost $\ell(|x - y|)$. Then $\phi$ admits an approximate gradient $\mu$-a.e.

We now come back to the proof of the main theorem of the paper, under the additional assumption that $\mu$ is absolutely continuous.

Theorem 3.3.

Suppose that $\mu$ and $\nu$ are two mutually singular probability measures on $\mathbb{R}^d$ such that $\mu \ll \mathcal{L}^d$, and take the cost $c(x, y) = \ell(|x - y|)$, for $\ell : \mathbb{R}_+ \to \mathbb{R}_+$ strictly concave and increasing. Then there exists a unique optimal transport plan $\gamma$ and it is induced by a transport map.

Proof. We will prove that any optimal transport plan $\gamma$ is induced by a transport map, which also implies uniqueness by standard techniques (see [15]). From the strategy that has been presented in Section 2.1, and since we know that $\gamma$ is concentrated outside the diagonal $D$, it is enough to prove that, if $(x_0, y_0) \in \operatorname{supp} \gamma \setminus D$, then $y_0$ is uniquely determined by $x_0$.

• Case 1: $\ell$ differentiable at $|x_0 - y_0|$.
Let $(x_0, y_0) \in \operatorname{supp} \gamma \setminus D$, and suppose that $\ell$ is differentiable at $|x_0 - y_0|$. Then, if $\phi$ is approximately differentiable at $x_0$ (which is true $\mu$-a.e.),

$$\nabla_{app} \phi(x_0) - \ell'(|x_0 - y_0|) \frac{x_0 - y_0}{|x_0 - y_0|} = 0.$$

Thus, since $\ell$ was supposed strictly increasing and strictly concave, $|x_0 - y_0| = (\ell')^{-1}(|\nabla_{app} \phi(x_0)|)$, and

$$\frac{y_0 - x_0}{|y_0 - x_0|} = - \frac{\nabla_{app} \phi(x_0)}{\ell'(|x_0 - y_0|)}. \tag{3.2}$$

In other words $y_0$ is uniquely determined by $x_0$.

• Case 2: $\ell$ not differentiable at $|x_0 - y_0|$.
This second case is even more striking. Consider the values of the right and left derivatives $\ell'_r(|x_0 - y_0|)$ and $\ell'_l(|x_0 - y_0|)$ and pick a value $p \in I := [\ell'_r(|x_0 - y_0|), \ell'_l(|x_0 - y_0|)]$. We know by concavity that we have

$$\ell(t) - \ell(|x_0 - y_0|) \le p \, (t - |x_0 - y_0|) \quad \text{for all } t \in \mathbb{R}_+.$$

Consider the function $\phi(\cdot) - [\ell(|x_0 - y_0|) + p(|\cdot - y_0| - |x_0 - y_0|)]$ defined on $\mathbb{R}^n$. Then, for every $x \in \mathbb{R}^n$,

$$\begin{aligned}
\phi(x) - [\ell(|x_0 - y_0|) + p(|x - y_0| - |x_0 - y_0|)]
&\le \phi(x) - \ell(|x - y_0|) \\
&\le \phi(x_0) - \ell(|x_0 - y_0|) \\
&= \phi(x_0) - [\ell(|x_0 - y_0|) + p(|x_0 - y_0| - |x_0 - y_0|)].
\end{aligned}$$

In other words, $\phi(\cdot) - [\ell(|x_0 - y_0|) + p(|\cdot - y_0| - |x_0 - y_0|)]$ has a maximum at $x_0$. Since $x_0 \neq y_0$, if $\phi$ is approximately differentiable at $x_0$ we get

$$\nabla_{app} \phi(x_0) - p \, \frac{x_0 - y_0}{|x_0 - y_0|} = 0.$$
It follows that $|\nabla_{\mathrm{app}}\phi(x_0)| = p$ for every $p \in I$, and this leads to a contradiction, since $I$ contains more than one value. This means that this second case can only occur on a negligible set of points $x_0$ (those where $\phi$ is not approximately differentiable). □

Remark 1. The present paper is fully written under the assumption that $\ell$ is strictly concave and increasing, which implies $\ell' > 0$. Yet, most of the analysis could be adapted to the case where $\ell$ is only supposed to be strictly concave, even if monotonicity is very reasonable from the modeling point of view. One of the problems when $\ell'$ is not strictly positive is that we risk dividing by $0$ in equation (3.2). This can be fixed since it only happens at a maximum point for $\ell$, and $c$-cyclical monotonicity can be used to prove that $\gamma(\{(x,y) : |x-y| = m\}) = 0$ when $m = \arg\max \ell$, together with a density points argument. Another difficulty to be fixed is that, with $\ell$ not increasing, the cost $c$ is no longer bounded from below, and we would need to assume $\mu$ and $\nu$ compactly supported. Yet, this is beyond the scope of the present paper.

4. The general case: $\mu$ does not give mass to small sets

Let us now consider weaker assumptions, i.e. suppose that $\mu$ does not give mass to small sets. Namely suppose that, for every $A \subset \mathbb{R}^d$ which is $\mathcal{H}^{d-1}$-rectifiable, we have $\mu(A) = 0$.

4.1. A GMT lemma for density points.

The following lemma is an interesting result from Geometric Measure Theory that can be used instead of Lebesgue-points-type results when we face a measure which is not absolutely continuous but “does not give mass to small sets”. It states that, in such a case, $\mu$-a.e. point $x$ is such that every cone exiting from $x$, even if very small, has positive $\mu$ mass. In particular it means that we can find points of $\operatorname{supp}\mu$ in almost arbitrary directions close to $x$.

We give the proof of this lemma for the sake of completeness, and because it is not easy to find hints or references for it in the literature about GMT. Indeed, the result is known in a part of the optimal transport community, where density points (in particular Lebesgue points) play an important role in some strategies for the existence of optimal maps (see [4, 6]), and it has recently been detailed in [5]. It is also part of the folklore of some branches of the GMT community, but, also in this case, there is no written evidence of it in this form and under these assumptions. Indeed, Lemma 3.3.5 in [8] proves the $(d-1)$-rectifiability of any set such that every point admits the existence of a small two-sided cone containing no other point of the set, and this would allow us to prove the statement we need, up to the fact that in our case we only consider one-sided cones. The proof follows anyway the same main ideas, and the one that we propose here is especially written in terms of $\mu$-negligibility.

Lemma 4.1.

Let $\mu$ be a Borel measure on $\mathbb{R}^d$, and suppose that $\mu$ does not charge small sets. Then $\mu$ is concentrated on the set

$$\{x : \forall \epsilon > 0,\ \forall \delta > 0,\ \forall u \in S^{d-1},\ \mu(C(x,u,\delta,\epsilon)) > 0\},$$

where

$$C(x,u,\delta,\epsilon) = C(x,u,\delta) \cap B(x,\epsilon) := \{y : \langle y-x, u\rangle \ge (1-\delta)|y-x|\} \cap B(x,\epsilon).$$

Proof.

Equivalently we will prove that

$$\mu\Big(\bigcup_{u,\delta,\epsilon} \{x : \mu(C(x,u,\delta,\epsilon)) = 0\}\Big) = 0.$$

First notice that $u$, $\delta$ and $\epsilon$ may be taken each in a countable set. This means that it is enough to fix $u$, $\delta$ and $\epsilon$ once and for all and then prove that

$$\mu(\{x : \mu(C(x,u,\delta,\epsilon)) = 0\}) = 0.$$

Moreover, for the sake of simplicity, suppose $u = (0,\dots,0,-1)$. Take now all cubes $Q$ with sides parallel to the coordinate hyperplanes, centered at a point of $\mathbb{Q}^d$ and with sidelength belonging to $\mathbb{Q}_+$, and call the set of such cubes $\{Q_n\}$. We can see that

$$\{x : \mu(C(x,u,\delta,\epsilon)) = 0\} \subset \bigcup_n \{x \in Q_n : \mu(Q_n \cap C(x,u,\delta)) = 0\}.$$

Therefore we will show, for a fixed cube $Q$, that

$$\mu(\{x \in Q : \mu(Q \cap C(x,u,\delta)) = 0\}) = 0.$$

Now write every $y \in \mathbb{R}^d$ as $y = (y', y_d)$, with $y' \in \mathbb{R}^{d-1}$. Then a quick computation shows that $(y',y_d) \in C((x',x_d),u,\delta)$ if and only if

$$y_d \le x_d - \frac{1-\delta}{\sqrt{\delta(2-\delta)}}\,|y'-x'|,$$

and we set $k(\delta) := \frac{1-\delta}{\sqrt{\delta(2-\delta)}}$.

Define now $X$ as the projection of $Q$ onto the first $d-1$ coordinates, and for every $x' \in X$

$$z(x') := \sup\{z \in \mathbb{R} : \mu(Q \cap C((x',z),u,\delta)) = 0\}.$$

Notice that the set in the supremum is never empty, by taking for instance $z \le \min\{x_d : x \in Q\}$.

Let us study the function $x' \mapsto z(x')$ and notice that it is $k(\delta)$-Lipschitz continuous: indeed, if we had $z(x'_0) < z(x'_1) - k(\delta)|x'_0 - x'_1|$, then the cone $C((x'_0, z(x'_0)),u,\delta)$ would be included in the interior of $C((x'_1, z(x'_1)),u,\delta)$ and hence the same would be true for $C((x'_0, z(x'_0)+t),u,\delta)$ for small $t > 0$. Yet, this implies $\mu(Q \cap C((x'_0, z(x'_0)+t),u,\delta)) = 0$ and hence the supremum defining $z(x'_0)$ should be at least $z(x'_0)+t$, which is obviously a contradiction. By interchanging the roles of $x'_0$ and $x'_1$ we get

$$|z(x'_0) - z(x'_1)| \le k(\delta)\,|x'_0 - x'_1|.$$
Now we take

$$\{x \in Q : \mu(Q \cap C(x,u,\delta)) = 0\} = \{(x',x_d) : x_d \le z(x')\} = \{(x',x_d) : x_d < z(x')\} \cup \{(x',x_d) : x_d = z(x')\}.$$

The second set on the right-hand side is $\mu$-negligible since it is the graph of a Lipschitz function of $(d-1)$ variables and $\mu$ is supposed not to give mass to these sets. The first set is also negligible since it is contained in the complement of $\operatorname{supp}\mu$. Indeed, if a point $x = (x',x_d)$ satisfies $x_d < z(x')$, then we also have $x_d + t < z(x')$ for small $t > 0$, and the interior of the cone $C((x', x_d + t),u,\delta)$, which contains $x$, is $\mu$-negligible. This means that $x$ has a neighborhood where there is no mass of $\mu$, and hence that it does not belong to $\operatorname{supp}\mu$.

This ends the proof. □

Notice that the above result is obviously false if we withdraw the hypothesis on $\mu$: take a measure concentrated on a $(d-1)$-manifold (for instance, a hyperplane). Then, for small $\epsilon$ and $\delta$, the measure $\mu(C(x,u,\delta,\epsilon))$ will be zero if $u$ does not belong to the tangent space to the manifold at $x$.

4.2. How to handle non-absolutely continuous measures $\mu$.

Consider again the transport problem, and suppose $\mu$ does not charge small sets. Call $\gamma$ the minimizer of the Kantorovich problem, $\phi$ a maximizer of the dual problem. As in Section 3, we take a countable family of balls $B_i$ generating the topology of $\mathbb{R}^d$, and for each $i$ define

$$\phi_i(x) := \inf_{y \in B_i} \big[\ell(|x-y|) - \phi^c(y)\big], \qquad x \in \mathbb{R}^d.$$

Lemma 4.2. If $B \subset \mathbb{R}^d$ is such that $d(0,B) = d_0 > 0$ and $\ell$ is concave and increasing, then the function $B \ni x \mapsto \ell(|x|)$ is semi-concave, and, more precisely, $x \mapsto \ell(|x|) - \frac{\ell'(d_0)}{2 d_0}|x|^2$ is concave (where $\ell'$ denotes the derivative, or right derivative in case $\ell$ is not differentiable at $d_0$).

Proof. We just need to give an upper bound on the second derivatives of $g(x) := \ell(|x|)$, which we will do in the case where $\ell$ is $C^2$. The general result will follow by approximation.

Compute

$$\partial_i g(x) = \ell'(|x|)\,\frac{x_i}{|x|}; \qquad \partial_{ij} g(x) = \ell''(|x|)\,\frac{x_i x_j}{|x|^2} + \frac{\ell'(|x|)}{|x|}\Big(\delta_{ij} - \frac{x_i x_j}{|x|^2}\Big).$$

Hence, if we denote by $\hat{x}$ the unit vector $x/|x|$, the Hessian of $g$ is composed of two parts: the positive matrix $\hat{x} \otimes \hat{x}$ times the non-positive factor $\ell''(|x|)$, and the positive matrix $(\mathrm{Id} - \hat{x} \otimes \hat{x})$ times the positive factor $\ell'(|x|)/|x|$. Since $\mathrm{Id} - \hat{x} \otimes \hat{x} \le \mathrm{Id}$ (in the sense of symmetric matrices), and since $\ell'$ is non-increasing and $|x| \ge d_0$ on $B$, this gives

$$D^2 g \le \frac{\ell'(|x|)}{|x|}\,\mathrm{Id} \le \frac{\ell'(d_0)}{d_0}\,\mathrm{Id}$$

and the thesis follows. □

Corollary 4.3.

For every $i$ and $j$ such that $d(B_i,B_j) > 0$, the function $\phi_i$ is semi-concave and hence $\mu$-a.e. differentiable on $B_j$.

Proof. The semi-concavity of $\phi_i$ follows from its definition as an infimum of semi-concave functions sharing the same semi-concavity constant, which depends only on $d(B_i,B_j)$ (see Lemma 4.2). This implies that $\phi_i$ has the same regularity and differentiability points as convex functions. It is well-known that convex functions are differentiable everywhere but on a $(d-1)$-rectifiable set: this is a consequence of the more general fact that the set where a convex-valued monotone multifunction in $\mathbb{R}^d$ takes values of dimension at least $k$ can be covered by countably many $(d-k)$-dimensional graphs of Lipschitz functions (see [1]), applied to the subdifferential multifunction.

Since $\mu$ does not give mass to small sets, the set of non-differentiability points of $\phi_i$ is also negligible. □

Remark 2. For each pair $i$, $i'$ consider the measure $\mu_{i,i'} := \mu \llcorner \{\phi = \phi_i = \phi_{i'}\}$, that is, for every Borel set $A$, $\mu_{i,i'}(A) = \mu(A \cap \{x : \phi(x) = \phi_i(x) = \phi_{i'}(x)\})$. Since for every small set $A$ one has $\mu_{i,i'}(A) \le \mu(A) = 0$, Lemma 4.1 yields the existence of a $\mu$-negligible set $N_{i,i'} \subset \{\phi = \phi_i = \phi_{i'}\}$ such that for every $x \in \{\phi = \phi_i = \phi_{i'}\} \setminus N_{i,i'}$ and for every $u \in S^{d-1}$, $\delta > 0$, $\epsilon > 0$, one has $\mu_{i,i'}(C(x,u,\delta,\epsilon)) > 0$. In particular it follows that $C(x,u,\delta,\epsilon) \cap \{\phi = \phi_i = \phi_{i'}\} \ne \emptyset$.

This isotropy in the structure of $\{\phi = \phi_i = \phi_{i'}\}$ implies the following

Proposition 4.4.

Consider

$$N := \bigcup_{i,i'} N_{i,i'} \ \cup \bigcup_{\substack{i,j \\ d(B_i,B_j) > 0}} \{x \in B_j : \phi_i \text{ not differentiable at } x\},$$

then $N$ is $\mu$-negligible. Moreover, consider $A := (\pi_x)(\operatorname{supp}\gamma \setminus D)$, which is a set with $\mu(A) = 1$: then, for every $x \in A \setminus N$ there exists $i$ such that $\phi(x) = \phi_i(x)$. Moreover, if $\phi(x) = \phi_i(x) = \phi_{i'}(x)$, then $\nabla\phi_i(x) = \nabla\phi_{i'}(x)$.

Proof.

The fact that $N$ is negligible follows from Remark 2 and Corollary 4.3.

Let $x_0 \in A \setminus N$. There exists $y_0$ such that $(x_0,y_0) \in \operatorname{supp}\gamma$ and $x_0 \ne y_0$. It follows that there exist two balls $B_i$ and $B_j$ with $x_0 \in B_j$, $y_0 \in B_i$ and $d(B_i,B_j) > 0$. Then, by Lemma 3.1, $\phi(x_0) = \phi_i(x_0)$.

Suppose now, for the sake of simplicity, $\phi(x_0) = \phi_1(x_0) = \phi_2(x_0)$. Then, since $x_0 \notin N$, $v_1 := \nabla\phi_1(x_0)$ and $v_2 := \nabla\phi_2(x_0)$ are both well-defined. By contradiction suppose $v_1 \ne v_2$. Then for every $\epsilon > 0$ there exists $\delta(\epsilon) > 0$ such that for every $\tilde{x} \in B(x_0,\delta(\epsilon))$

$$|\phi_1(\tilde{x}) - \phi_1(x_0) - v_1 \cdot (\tilde{x}-x_0)| \le \epsilon|\tilde{x}-x_0| \quad\text{and}\quad |\phi_2(\tilde{x}) - \phi_2(x_0) - v_2 \cdot (\tilde{x}-x_0)| \le \epsilon|\tilde{x}-x_0|.$$

Therefore, for such $\tilde{x}$, using $\phi_1(x_0) = \phi_2(x_0)$,

$$|\phi_1(\tilde{x}) - \phi_2(\tilde{x}) + (v_2-v_1)\cdot(\tilde{x}-x_0)| = \big|\phi_1(\tilde{x}) - \phi_1(x_0) - v_1\cdot(\tilde{x}-x_0) - [\phi_2(\tilde{x}) - \phi_2(x_0) - v_2\cdot(\tilde{x}-x_0)]\big| \le 2\epsilon|\tilde{x}-x_0|.$$

In order to have a contradiction, it is enough to choose $\tilde{x}$ such that $\phi_1(\tilde{x}) = \phi_2(\tilde{x})$ and $(v_2-v_1)\cdot(\tilde{x}-x_0) > 2\epsilon|\tilde{x}-x_0|$. The latter may be expressed as

$$\frac{\tilde{x}-x_0}{|\tilde{x}-x_0|} \cdot \frac{v_2-v_1}{|v_2-v_1|} > \frac{2\epsilon}{|v_2-v_1|}.$$

In order to guarantee that this is possible, choose $\epsilon < \frac{|v_2-v_1|}{2}$. Then, by Remark 2 with the cone $C\Big(x_0, \frac{v_2-v_1}{|v_2-v_1|}, 1 - \frac{2\epsilon}{|v_2-v_1|}, \delta(\epsilon)\Big)$, we are done. □

To go on with our proof we need the following result.

Proposition 4.5.

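Let $f : \Omega \to \mathbb{R}$ be differentiable at $x_0 \in \Omega$. Suppose $B \subset \Omega$ is such that $x_0 \in B$ and that for every $u \in S^{n-1}$, $\delta > 0$ and $\epsilon > 0$ one has $(B \setminus \{x_0\}) \cap C(x_0,u,\delta,\epsilon) \ne \emptyset$. Moreover suppose that $x_0$ is a local maximum for $f$ on $B$. Then $\nabla f(x_0) = 0$.

Proof.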

Write $v := \nabla f(x_0)$ and suppose by contradiction $v \ne 0$. Then for every $\epsilon > 0$ there exists $\delta(\epsilon) > 0$ such that if $|x - x_0| \le \delta(\epsilon)$

$$f(x) - f(x_0) \ge v \cdot (x-x_0) - \epsilon|x-x_0|.$$

Moreover, from the assumption on $B$, we may suppose $x \in B \setminus \{x_0\}$, $\frac{x-x_0}{|x-x_0|} \cdot \frac{v}{|v|} \ge \frac{1}{2}$ and $|x-x_0| \le \delta(\epsilon)$. Then we have

$$f(x) - f(x_0) \ge |x-x_0|\,\big(|v|/2 - \epsilon\big).$$

Now choose $\epsilon < \frac{|v|}{2}$. It follows that for every $\eta \le \delta(\epsilon)$ we may choose $x \in B \setminus \{x_0\}$ such that $|x-x_0| \le \eta$ and $f(x) > f(x_0)$, which is a contradiction. □

Theorem 4.6.

Let $\mu$ and $\nu$ be Borel probability measures on $\mathbb{R}^d$, and suppose that $\mu$ does not charge small sets. Suppose moreover that $\mu$ and $\nu$ are mutually singular. Then, if $\ell : [0,+\infty) \to \mathbb{R}$ is strictly concave and increasing, there exists a unique optimal transport map for the cost $\ell(|x-y|)$.

Proof. We argue as in Section 3 and according to the usual strategy.

Let us take $x_0 \in A \setminus N$, with $(x_0,y_0) \in \operatorname{supp}\gamma$ and $x_0 \ne y_0$. We just need to show that $y_0$ is uniquely determined by $x_0$. By Proposition 4.4, $\phi(x_0) = \phi_i(x_0)$ for some $i$. Since $x_0 \in \arg\max[\phi(\cdot) - \ell(|\cdot - y_0|)]$, in particular

$$x_0 \in \arg\max\,[\phi_i(\cdot) - \ell(|\cdot - y_0|)]\big|_{\{\phi = \phi_i\}}.$$

• Case 1: $\ell$ differentiable at $|x_0-y_0|$.

It follows that $\phi_i(\cdot) - \ell(|\cdot - y_0|)$ is differentiable at $x_0$. From Proposition 4.5 we have

$$\nabla\phi_i(x_0) - \nabla\ell(|\cdot - y_0|)\big|_{x = x_0} = 0.$$

Notice that the value of the vector $\nabla\phi_i(x_0)$ only depends on $x_0$ and not on $y_0$, and does not depend on $i$ (Proposition 4.4). More precisely, by Remark 2 with $i = i'$ we may apply Proposition 4.5 on the set $\{\phi = \phi_i\}$ to yield that

$$\nabla\phi_i(x_0) = \ell'(|x_0-y_0|)\,\frac{x_0-y_0}{|x_0-y_0|}.$$

It follows that

$$|y_0-x_0| = (\ell')^{-1}\big(|\nabla\phi_i(x_0)|\big)$$

and

$$\frac{x_0-y_0}{|x_0-y_0|} = \frac{\nabla\phi_i(x_0)}{\ell'(|x_0-y_0|)}.$$

• Case 2: $\ell$ not differentiable at $|x_0-y_0|$.

Here as well we argue as in Section 3: pick a value $p \in I := [\ell'_r(|x_0-y_0|), \ell'_l(|x_0-y_0|)]$. Suppose $\phi(x_0) = \phi_i(x_0)$ and consider the function

$$\phi(\cdot) - [\ell(|x_0-y_0|) + p(|\cdot - y_0| - |x_0-y_0|)],$$

defined on $\mathbb{R}^d$. Then, on the set $\{\phi = \phi_i\}$, the function $\phi_i(\cdot) - [\ell(|x_0-y_0|) + p(|\cdot - y_0| - |x_0-y_0|)]$ has a maximum point at $x_0$. By Proposition 4.5 it follows that $|\nabla\phi_i(x_0)| = p$ for every choice of $p$, and this leads to a contradiction. □

Appendix

Concave costs and translations.

Here we wish to analyse a feature of strictly concave costs, namely that they penalize equal displacements.
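As a quick numerical illustration of this feature (a sketch only, not part of the argument), one can compare on the real line the cost of moving two points $x_0$ and $x_1 = x_0 + \xi$ by the same displacement $e$ with the cost of the swapped pairing $x_0 \mapsto x_1 + e$, $x_1 \mapsto x_0 + e$. The helper name `swap_gain`, the sample cost $\ell(t) = \sqrt{t}$, the convex comparison cost $t^2$ and the values $e = 1$, $\xi = 0.1$ are assumptions chosen for the example.

```python
import math

def swap_gain(ell, e, xi):
    """Cost of the common translation minus the cost of the swapped pairing.

    Two points x0 and x1 = x0 + xi on the real line are both moved by e
    (total cost 2 * ell(|e|)); in the swapped pairing x0 goes to x1 + e and
    x1 goes to x0 + e (total cost ell(|e + xi|) + ell(|e - xi|)).
    A positive value means that swapping is strictly cheaper.
    """
    translated = 2.0 * ell(abs(e))
    swapped = ell(abs(e + xi)) + ell(abs(e - xi))
    return translated - swapped

concave = math.sqrt           # strictly concave and increasing on [0, +inf)
convex = lambda t: t * t      # convex cost, for comparison only

gain_concave = swap_gain(concave, e=1.0, xi=0.1)
gain_convex = swap_gain(convex, e=1.0, xi=0.1)

# For the concave cost the translation is beaten by the swap;
# for the convex cost it is not.
print(gain_concave > 0, gain_convex > 0)
```

With these sample values the swap is cheaper for the concave cost and more expensive for the convex one, which is the one-dimensional shadow of the inequality $2\ell(|e|) \le \ell(|e+\xi|) + \ell(|e-\xi|)$ that cyclical monotonicity would require if translations were optimal.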

Proposition 5.1.

Let $\mu$ and $\nu$ be mutually singular Borel measures on $\mathbb{R}^d$, and suppose $\mu$ does not charge small sets. Let $\ell : [0,+\infty) \to \mathbb{R}$ be a $C^2$, increasing, strictly concave function, and suppose $\ell''(x) < 0$ for every $x > 0$. Call $\gamma$ the optimal transport plan with respect to $\ell(|\cdot|)$. Then for every $e \in \mathbb{R}^d \setminus \{0\}$

$$\gamma(\{(x, x+e) : x \in \mathbb{R}^d\}) = 0.$$

Proof.

For $e \in \mathbb{R}^d$ write $\gamma_e := \gamma \llcorner \{(x, x+e) : x \in \mathbb{R}^d\}$. Suppose by contradiction that for some $e$ the measure $\gamma_e$ does not vanish. It follows that $\mu_e := (\pi_x)_\# \gamma_e$ is nontrivial as well. Moreover it does not charge small sets, since $\mu_e \le \mu$. Therefore Lemma 4.1 implies that $\mu_e$ is concentrated on

$$\{x : \forall \epsilon > 0,\ \forall \delta > 0,\ \forall u \in \mathbb{R}^d \text{ such that } |u| = 1,\ \mu_e(C(x,u,\delta,\epsilon)) > 0\}, \tag{5.1}$$

which we will refer to as $A_e$. Clearly we have

$$A_e \subset \operatorname{supp}\mu_e = \pi_x(\operatorname{supp}\gamma_e).$$

Now take $x_0$ and $x_1$ in $\operatorname{supp}\mu_e$. Then $(x_0, x_0+e), (x_1, x_1+e) \in \operatorname{supp}\gamma_e$, which is $\ell$-cyclically monotone. If we call $\xi := x_1 - x_0$ it follows that

$$2\ell(|e|) \le \ell(|e+\xi|) + \ell(|e-\xi|).$$

Write then a second order Taylor expansion, to yield

$$2\ell(|e|) \le 2\ell(|e|) + \sum_{i,j} \xi_i \Big[\ell''(|e|)\frac{e_i e_j}{|e|^2} + \ell'(|e|)\Big(\frac{\delta_{ij}}{|e|} - \frac{e_i e_j}{|e|^3}\Big)\Big]\xi_j + o(|\xi|^2)$$

or, in other words,

$$0 \le \sum_{i,j} \xi_i \Big[\ell''(|e|)\frac{e_i e_j}{|e|^2} + \frac{\ell'(|e|)}{|e|}\Big(\delta_{ij} - \frac{e_i e_j}{|e|^2}\Big)\Big]\xi_j + o(|\xi|^2) = \frac{\ell''(|e|)}{|e|^2}(e\cdot\xi)^2 + \frac{\ell'(|e|)}{|e|}|\xi|^2 - \frac{\ell'(|e|)}{|e|^3}(e\cdot\xi)^2 + o(|\xi|^2).$$

Now take any $x_0 \in A_e$, which is nonempty by our hypothesis. Then there exists $x_1 \in \operatorname{supp}\mu_e$, arbitrarily close to $x_0$, such that $\xi \cdot e \ge (1-\delta)|\xi||e|$ with $\delta$ arbitrarily small. Plugging this into the previous inequality bounds the right-hand side by $\big[\ell''(|e|)(1-\delta)^2 + \frac{\ell'(|e|)}{|e|}\big(1 - (1-\delta)^2\big)\big]|\xi|^2 + o(|\xi|^2)$, which is negative for $\delta$ and $|\xi|$ small since $\ell''(|e|) < 0$; letting $\xi \to 0$, we obtain a contradiction. □

The conclusion of the previous proposition is strongly different from the convex case. Indeed, consider any Borel probability measure $\mu$ on $\mathbb{R}^d$ with compact support, and fix a vector $e \in \mathbb{R}^d \setminus \{0\}$. Define $\tau_e$ to be the translation by $e$ and $\mu_e := (\tau_e)_\# \mu$ as the translated $\mu$. Then one may prove that $\tau_e$ is an optimal transportation of $\mu$ onto $\mu_e$ if the cost is convex, while Proposition 5.1 implies that translations are never optimal for concave costs.

In order to prove that $\tau_e$ is an optimal transportation of $\mu$ onto $\mu_e$, consider any map $T$ which pushes forward $\mu$ to $\mu_e$.
Then, by Jensen's inequality, if $x \mapsto \ell(|x|)$ is convex, we have

$$\int c(x,Tx)\,d\mu = \int \ell(|x-Tx|)\,d\mu \ge \ell\Big(\Big|\int (x-Tx)\,d\mu\Big|\Big) = \ell\Big(\Big|\int x\,d\mu - \int x\,d\mu_e\Big|\Big) = \ell(|e|) = \int \ell(|x-\tau_e(x)|)\,d\mu.$$

On the definition of a weak $\mu$-approximated gradient.

Even if not strictly necessary for the sake of the paper, in this section we discuss the possibility of using Lemma 4.1 in order to define a sort of approximated gradient with respect to a measure $\mu$ which gives no mass to small sets. The idea is that in Section 4 we produced an ad-hoc choice of gradient; can we extend it to more general frameworks in order to use it in other situations? The answer will be both yes and no.

Before entering into details, let us set the following language.

Definition 2.

Let $\mu \in \mathcal{P}(\mathbb{R}^n)$. We define the isotropic set of $\mu$ by

$$\operatorname{isot}\mu = \{x : \forall \delta, u, \epsilon,\ \mu(C(x,u,\delta,\epsilon)) > 0\}.$$

If $A$ is a Borel subset, the $\mu$-isotropic set of $A$ is by definition

$$\operatorname{isot}_\mu(A) = A \cap \operatorname{isot}(\mu \llcorner A).$$

We know that, if $\mu$ does not charge small sets, then for every Borel subset $A$ (by Lemma 4.1) we have

$$\mu(A \setminus \operatorname{isot}_\mu(A)) = 0.$$

We now pass to a naive definition of gradient.
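Before doing so, the cones entering the isotropic set can be made concrete with a small sketch. It tests membership in $C(x,u,\delta,\epsilon)$ and applies it to sample points on a line in $\mathbb{R}^2$, the kind of "small set" for which isotropy fails in transverse directions. The function name `in_cone`, the choice of an open ball for $B(x,\epsilon)$, and the sample grid are assumptions made for the illustration.

```python
import math

def in_cone(y, x, u, delta, eps):
    """Return True if y lies in C(x, u, delta, eps).

    C(x, u, delta, eps) = {y : <y - x, u> >= (1 - delta) |y - x|} ∩ B(x, eps),
    where u is a unit vector; B(x, eps) is taken to be the open ball
    (an assumption of this sketch).
    """
    diff = [yi - xi for yi, xi in zip(y, x)]
    norm = math.sqrt(sum(c * c for c in diff))
    inner = sum(c * ui for c, ui in zip(diff, u))
    return norm < eps and inner >= (1.0 - delta) * norm

# Sample points on the line {y_2 = 0} in R^2, excluding the vertex itself.
line_points = [(k / 100.0, 0.0) for k in range(-100, 101) if k != 0]
x = (0.0, 0.0)

# Number of sample points falling in a narrow cone along the line
# and in a narrow cone orthogonal to it.
tangent_hits = sum(in_cone(y, x, (1.0, 0.0), 0.1, 0.5) for y in line_points)
transverse_hits = sum(in_cone(y, x, (0.0, 1.0), 0.1, 0.5) for y in line_points)
print(tangent_hits, transverse_hits)
```

The tangent cone catches sample points while the transverse cone catches none, so the origin would not belong to the isotropic set of a measure carried by this line; this is consistent with the hypothesis that $\mu$ does not charge small sets.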

Deﬁnition 3.

We say that $L \in \mathbb{R}^n$ is a $\mu$-isotropic gradient of $f : A \to \mathbb{R}$ at $x$ if for all $\epsilon > 0$, $x$ belongs to the set

$$\operatorname{isot}_\mu\big(\{y : |f(y) - (f(x) + \langle L, y-x\rangle)| \le \epsilon|y-x|\}\big).$$

This definition only means that we can find points $y \in \operatorname{supp}(\mu \llcorner D_\epsilon)$, where

$$D_\epsilon = \{y : |f(y) - (f(x) + \langle L, y-x\rangle)| \le \epsilon|y-x|\},$$

in almost arbitrary directions and arbitrarily close to $x$. It satisfies some properties, for instance

Proposition 5.2. If $L$ is a $\mu$-isotropic gradient of $f$ at $x$, where $f$ has a local extremum, then $L = 0$.

We do not give the proof, which is quite similar to that of Proposition 4.5 (indeed, most proofs will be skipped in this appendix subsection, since they simply recall the proofs of Section 4). Yet, this definition is not enough to guarantee uniqueness. Indeed, take two disjoint sets $A, B \subset S$ with $A \cup B = S$, such that the supports of $\mathcal{L} \llcorner A$ and $\mathcal{L} \llcorner B$ are both the whole $S$ (finding two such sets is a non-trivial, but classical exercise). Now take two different vectors $L_A$ and $L_B$ and define

$$f(x) = \begin{cases} 0 & \text{if } x = 0, \\ L_A \cdot x & \text{if } \frac{x}{|x|} \in A, \\ L_B \cdot x & \text{if } \frac{x}{|x|} \in B. \end{cases}$$

It follows that both $L_A$ and $L_B$ are $\mu$-isotropic gradients of $f$ at any $x \in \mathbb{R}^d$ such that $L_A \cdot x = L_B \cdot x$. Notice that this is a “small set”: it leaves open the question of a possible $\mu$-a.e. uniqueness, but the situation is anyway worse than what happens for the approximate gradient (Section 2), where the gradient is necessarily unique at any point where it exists.

Hence, we try to switch to a better definition of gradient. We rely on the following observation, which is essentially contained in the proof of Proposition 4.4.

Proposition 5.3. If $A \subseteq \Omega \subseteq \mathbb{R}^n$, where $A$ is a Borel subset and $\Omega$ an open set, and $\phi, \psi : \Omega \to \mathbb{R}$, then $\nabla\phi(x) = \nabla\psi(x)$ on the set

$$\operatorname{isot}_\mu\big(\{x : \phi(x) = \psi(x),\ \nabla\phi(x) \text{ and } \nabla\psi(x) \text{ exist}\}\big).$$

The following proposition is an attempt at defining a uniquely determined $\mu$-approximated gradient under hypotheses satisfied by the Kantorovich potential in Section 4.

Proposition 5.4.

Given $A \subseteq \Omega \subseteq \mathbb{R}^n$, with $A$ Borel and $\Omega$ open, given $\mu \in \mathcal{P}(\mathbb{R}^n)$ which does not charge small sets, and $f : A \to \mathbb{R}$ measurable, we say that $f$ is $\mu$-differentiable on $A$ if there exists a countable family $(\phi_n)_{n \in \mathbb{N}}$, $\phi_n : \Omega \to \mathbb{R}$, such that

$$\forall x \in A,\ \exists n \in \mathbb{N},\quad \phi_n(x) = f(x) \text{ and } \nabla\phi_n(x) \text{ exists}.$$

Then there exists a $\mu$-a.e. unique function $\nabla_\mu f : A \to \mathbb{R}^n$ enjoying the following property:

for all $\phi : \Omega \to \mathbb{R}$, $\nabla_\mu f$ coincides with $\nabla\phi$ on $\mu$-a.e. point $x$ such that $f(x) = \phi(x)$ and $\nabla\phi(x)$ exists. (5.2)

In order to prove the statement above, one can build $\nabla_\mu f$ as being equal to $\nabla\phi_n$ on almost all the set $\{x : f(x) = \phi_n(x) \text{ and } \nabla\phi_n(x) \text{ exists}\}$, which is well-defined a.e. by Proposition 5.3. The same proposition allows us to check easily that it satisfies the desired property, and is as such unique.

The connection between the two notions we defined is contained in the following statement.

Proposition 5.5. If $f : A \to \mathbb{R}$ is $\mu$-differentiable, then for a.e. $x \in A$, $\nabla_\mu f(x)$ is a $\mu$-isotropic gradient of $f$.

Together with Proposition 5.2, this would allow us to follow the usual strategy described in Section 2.1 to prove the existence of an optimal $T$ based on the Kantorovich potential $\phi$ and $\nabla_\mu \phi$.

Acknowledgments

This paper is part of the work of the ANR project ANR-12-BS01-0014-01 GEOMETRYA, whose support is gratefully acknowledged by the third author. The work started when the second author was a master's student at Paris-Sud, financed by the Fondation Mathématique Jacques Hadamard, which is also gratefully acknowledged for its support.

References

[1] G. Alberti and L. Ambrosio, A geometrical approach to monotone functions in $\mathbb{R}^n$, Math. Z., 230 (1999), 259–316.
[2] Y. Brenier, Polar factorization and monotone rearrangement of vector-valued functions, Communications on Pure and Applied Mathematics, 44 (1991), 375–417.
[3] L. C. Evans and R. F. Gariepy, Measure theory and fine properties of functions, Studies in Advanced Mathematics, CRC Press, Boca Raton, FL, 1992.
[4] T. Champion and L. De Pascale, The Monge problem in $\mathbb{R}^d$, Duke Math. J., 157 (2011), no. 3, 551–572.
[5] T. Champion and L. De Pascale, On the twist condition and c-monotone transport plans, Discr. Cont. Dyn. Syst. Ser. A, 34 (2014), no. 4, 1339–1353.
[6] T. Champion, L. De Pascale and P. Juutinen, The ∞-Wasserstein distance: local solutions and existence of optimal transport maps, SIAM J. of Mathematical Analysis, 40 (2008), no. 1, 1–20.
[7] J. Delon, J. Salomon and A. Sobolevskii, Local matching indicators for transport problems with concave costs, SIAM J. Disc. Math., 26 (2012), no. 2, 801–827.
[8] H. Federer, Geometric Measure Theory, Classics in Mathematics, Springer, 1996 (reprint of the 1st ed., Berlin, Heidelberg, New York, 1969).
[9] W. Gangbo and R. McCann, The geometry of optimal transportation, Acta Math., 177 (1996), 113–161.
[10] L. V. Kantorovich, On the translocation of masses, C.R. (Dokl.) Acad. Sci. URSS, 37 (1942), 199–201.
[11] L. V. Kantorovich, On a problem of Monge (in Russian), Uspekhi Mat. Nauk.
[12] X.-N. Ma, N. S. Trudinger and X.-J. Wang, Regularity of potential functions of the optimal transportation problem, Arch. Ration. Mech. Anal., 177 (2005), no. 2, 151–183.
[13] G. Monge, Mémoire sur la théorie des Déblais et des Remblais, Histoire de l’Académie des Sciences de Paris, 1781.
[14] A. Pratelli, On the sufficiency of c-cyclical monotonicity for optimality of transport plans, Math. Z., 258 (2008), no. 3, 677–690.
[15] C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics, AMS, 2003.

Paul Pegon, Filippo Santambrogio, Laboratoire de Mathématiques d’Orsay, Université Paris-Sud, 91405 Orsay cedex, FRANCE