Numerical methods for matching for teams and Wasserstein barycenters
G. Carlier∗, A. Oberman†, E. Oudet‡

November 14, 2014
Abstract
Equilibrium multi-population matching (matching for teams) is a problem from mathematical economics which is related to multi-marginal optimal transport. A special but important case is the Wasserstein barycenter problem, which has applications in image processing and statistics. Two algorithms are presented: a linear programming algorithm and an efficient nonsmooth optimization algorithm, which applies in the case of the Wasserstein barycenters. The measures are approximated by discrete measures: convergence of the approximation is proved. Numerical results are presented which illustrate the efficiency of the algorithms.
Keywords: matching for teams, Wasserstein barycenters, duality, linear programming, numerical methods for nonsmooth convex minimization.
Optimal transport theory has received a lot of attention in the last decades and is now recognized as a powerful tool in PDEs, geometry, and functional inequalities (for which we refer to the monographs of Villani [Vil03], [Vil09]). Given two Borel probability measures $\mu_1$, $\mu_2$ on metric spaces $X_1$ and $X_2$, respectively, and a cost function $c \in C(X_1 \times X_2, \mathbb{R})$, the Monge-Kantorovich optimal transport problem consists in finding the cheapest way to transport $\mu_1$ to $\mu_2$ for the cost $c$:

$$W_c(\mu_1, \mu_2) := \inf_{\gamma \in \Pi(\mu_1, \mu_2)} \int_{X_1 \times X_2} c(x_1, x_2)\, \gamma(dx_1, dx_2) \tag{MK}$$

where $\Pi(\mu_1, \mu_2)$ denotes the set of transport plans between $\mu_1$ and $\mu_2$, i.e. the set of probability measures on $X_1 \times X_2$ having $\mu_1$, $\mu_2$, respectively, as marginals.

∗ CEREMADE, UMR CNRS 7534, Université Paris IX Dauphine, Pl. de Lattre de Tassigny, 75775 Paris Cedex 16, FRANCE, [email protected]
† Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, CANADA, [email protected]
‡ Laboratoire Jean Kuntzmann, Université Joseph Fourier, Tour IRMA, BP 53 51, rue des Mathématiques, F-38041 Grenoble Cedex 9, FRANCE, [email protected]

Since this problem is of linear programming type, under very mild assumptions (e.g. when $X_1$ and $X_2$ are compact), the least transport cost $W_c(\mu_1, \mu_2)$ admits a dual expression given by the Kantorovich duality formula:

$$W_c(\mu_1, \mu_2) = \sup_{\varphi \in C(X_1, \mathbb{R})} \left\{ \int_{X_1} \varphi(x_1)\, \mu_1(dx_1) + \int_{X_2} \varphi^c(x_2)\, \mu_2(dx_2) \right\} \tag{1.1}$$

where $\varphi^c$ denotes the $c$-transform of $\varphi$:

$$\varphi^c(x_2) := \min_{x_1 \in X_1} \{ c(x_1, x_2) - \varphi(x_1) \}.$$

A particularly important example is the quadratic case where $X_1 = X_2 = \mathbb{R}^d$, $\mu_1$ and $\mu_2$ have finite second moments, and $c(x_1, x_2) = |x_1 - x_2|^2$. This case was first solved by Brenier [Bre91], who proved that whenever $\mu_1$ is absolutely continuous, there is a unique optimal transport plan, given by the gradient of a convex potential. This important result relates optimal transport to Monge-Ampère equations.
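For discrete measures, (MK) is a finite linear program and can be solved directly. The sketch below (Python with SciPy's `linprog`; an illustration, not the authors' code) writes out the two marginal constraints explicitly and solves for the optimal plan.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(mu1, mu2, C):
    """Solve the discrete Monge-Kantorovich problem (MK): minimize
    sum_{j,k} C[j,k] * g[j,k] over nonnegative plans g whose row sums
    are mu1 and whose column sums are mu2."""
    n1, n2 = C.shape
    A_eq = []
    for j in range(n1):            # row-sum constraints: sum_k g[j,k] = mu1[j]
        row = np.zeros((n1, n2)); row[j, :] = 1
        A_eq.append(row.ravel())
    for k in range(n2):            # column-sum constraints: sum_j g[j,k] = mu2[k]
        col = np.zeros((n1, n2)); col[:, k] = 1
        A_eq.append(col.ravel())
    b_eq = np.concatenate([mu1, mu2])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x.reshape(n1, n2)

# Two two-point measures on the line, quadratic cost c(x, y) = |x - y|^2.
x = np.array([0.0, 1.0]); y = np.array([0.0, 2.0])
mu1 = np.array([0.5, 0.5]); mu2 = np.array([0.5, 0.5])
C = (x[:, None] - y[None, :])**2
cost, plan = discrete_ot(mu1, mu2, C)
```

Here the optimal plan pairs $0 \mapsto 0$ and $1 \mapsto 2$, with total cost $0.5$.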
We refer to [BFO14] and the references therein for numerical methods for optimal transport based on the Monge-Ampère equation.

More generally, costs given by distances or convex powers of distances are important because they lead to the so-called Wasserstein distances. More precisely, when $X_1 = X_2$ (a metric space with distance $d$) and $c(x_1, x_2) = d(x_1, x_2)^p$ for some $p \ge 1$, the value $W_c(\mu_1, \mu_2)$ in (MK) is the $p$-th power of the so-called $p$-Wasserstein distance $W_p(\mu_1, \mu_2)$ between $\mu_1$ and $\mu_2$:

$$W_p(\mu_1, \mu_2) := \Big( \inf_{\gamma \in \Pi(\mu_1, \mu_2)} \int_{X_1 \times X_2} d(x_1, x_2)^p\, \gamma(dx_1, dx_2) \Big)^{1/p}.$$

In the present article, we are interested in solving numerically the following variant of the optimal transport problem which allows for more than two marginals. Given (compact metric, say) spaces $X_1, \ldots, X_I$, equipped with Borel probability measures $(\mu_1, \ldots, \mu_I) \in \mathcal{P}(X_1) \times \ldots \times \mathcal{P}(X_I)$, a (compact metric) space $Z$, and cost functions $c_i \in C(X_i \times Z, \mathbb{R})$, we look for a probability measure $\nu$ on $Z$ solving:

$$\inf_{\nu \in \mathcal{P}(Z)} J(\nu) := \sum_{i=1}^{I} W_{c_i}(\mu_i, \nu). \tag{1.2}$$

This problem was introduced in Carlier and Ekeland [CE10] in the framework of multi-population matching equilibrium; we will shortly recall in section 2 the economic interpretation of (1.2). Problem (1.2) is also a special case of multi-marginal optimal transport (a variant of (MK) where more than two marginals are prescribed). Multi-marginal optimal transport is currently an active research field: compared to (two-marginal) optimal transport, there are fewer theoretical results, and the complexity of general multi-marginal optimal transport problems typically increases exponentially in the number of marginals. Regarding the rapidly developing theory of multi-marginal optimal transport, we refer the reader to the recent papers by Pass [Pas12a], [Pas12b], by Ghoussoub and coauthors [GM13], [GM14] and the references therein for costs with special symmetry properties, motivated in particular by challenging computational issues in density functional theory in quantum physics.

We now discuss a special but important case of (1.2) which has a clear geometric interpretation. Let all the $X_i$'s and $Z$ coincide with $\mathbb{R}^d$, let the measures $\mu_i$ have finite second moments, and let the costs be quadratic (i.e. $c_i(x_i, z) := \lambda_i |x_i - z|^2$ for some weights $\lambda_i > 0$, summing to 1 without loss of generality). In this case (1.2) takes the form:

$$\inf_{\nu \in \mathcal{P}(\mathbb{R}^d)} J(\nu) := \sum_{i=1}^{I} \lambda_i W_2^2(\mu_i, \nu) \tag{1.3}$$

where $W_2$ denotes the 2-Wasserstein distance. In analogy with the Euclidean case, a solution to (1.3) will be called a Wasserstein barycenter of the measures $\mu_i$ with weights $\lambda_i$. Properties of Wasserstein barycenters were studied by Agueh and Carlier [AC11]. Wasserstein barycenters interpolate between the measures $\mu_i$; the idea of interpolating between points of a metric space by minimizing some weighted sum of squared distances goes back to the notion of Fréchet mean. The case $I = 2$ is well known. Letting the weights $(\lambda, 1-\lambda)$ vary, one obtains the classical notion of McCann's interpolation [McC97] between two probability measures. This interpolating curve is also a geodesic for $W_2$, and in their seminal paper [BB00] on the dynamic formulation of optimal transport, Benamou and Brenier gave a numerical scheme to compute this geodesic. Finding barycenters between more than two measures is more complicated (barycenters are not associative as soon as the dimension $d$ of the ambient space is larger than 2). From a partial differential equations viewpoint, this problem requires solving a system of Monge-Ampère equations, see (4.9)-(4.8) below. Interestingly, the Wasserstein barycenter problem recently found natural applications in image processing, see Peyré et al. [RPDB12], and statistics, see Bigot and Klein [BK12]. Of course, there are many variants of the interpolating scheme given by the $W_2$-barycenter problem (1.3); in particular, one can replace $W_2^2$ by $W_p^p$ for some $p \ge 1$. Even with only two marginals, such problems are computationally demanding: for measures discretized on a $40 \times 40$ grid, i.e. $N = 40^2 = 1600$ points each, a transport plan has $40^4 = 2\,560\,000$ unknowns, which is near the limitations of linear programming algorithms (we performed experiments using CVX [GB10] and calling several academic and commercial optimization packages).
Enlarging the resolution of the measures quickly overwhelms the capabilities of the algorithms.

The problem (1.2) is even more challenging, since it involves multiple marginals and an additional unknown measure. Resolving the barycenter measure on the full grid generally leads to an intractable problem (but as shown by Cuturi and Doucet [CD14], some well-chosen smooth approximation can be solved in an efficient way). Our main contributions regarding numerical schemes for the general problem (1.2) or the particular case of Wasserstein barycenters (1.3) are as follows:

• We give a simple linear programming reformulation of (1.2) in subsection 2.3 whose size is proportional to the number of marginals. Together with a localization result that bounds the support of the unknown barycenter in subsection 2.2, one then obtains a tractable problem. We discretize the problem to arrive at a finite dimensional linear programming problem in subsection 2.4. We prove convergence, in the sense of weak convergence of measures, in subsection 2.5.

• Numerical results are presented in section 3. These illustrate the validity of this linear programming approach. Barycenter problems with different costs are solved, as well as a matching for teams problem.

• The second algorithm, which is specialized to the case of Wasserstein barycenter measures (1.3), is described and illustrated in section 4. This algorithm uses the dual formulation of the problem explained in section 4, and special features of the quadratic cost. The efficient nonsmooth optimization algorithm is described in subsection 4.3. Large computational examples are presented (on grids of size $200^2$, and for measures resolved with 15 000 points). The examples include barycenter problems using up to five measures, and an example in texture synthesis in subsection 4.4.
In this section, following [CE10], we first derive the generalized barycenter problem (1.2) as an equivalent reformulation of an equilibrium problem for multi-population matching arising in economics. Next, we study localization of the barycenter measure. Then, we present an infinite dimensional linear programming reformulation of (1.2). This is followed by a discretization of the measures, which results in a finite dimensional linear programming problem that is tractable for moderate problem sizes. Finally, we address stability issues (in the sense of weak convergence of measures) when one approximates the measures $\mu_i$ by some (discrete) measures.

The model of Carlier and Ekeland [CE10] deals with the equilibrium of a market for a quality good (e.g. houses, schools, hospitals, ...). Producing the good requires assembling a team consisting of a buyer and a set of producers. For instance, in the case of houses, the producers could be a plumber, an electrician and a mason. The quality good has a range of feasible qualities (location, surface, number of rooms, facilities, etc.), denoted by $Z$, which we assume to be a compact metric space.

Each of the different populations (buyers, plumbers, electricians, masons, ...) is indexed by $i \in \{1, \ldots, I\}$. The agents in each population are heterogeneous, characterized by a certain type which affects their (quality dependent) cost function. For example, some masons are used to working with lower quality bricks, while others work with luxury stones; some electricians live quite far from the location of the house they work on; consumers differ in their tastes. To be precise, for each population $i$, we are given a compact metric space of types, $X_i$, and a continuous cost function $c_i \in C(X_i \times Z, \mathbb{R})$, with the interpretation that $c_i(x_i, z)$ is the cost for an agent of population $i$ with type $x_i$ to work in a team that produces good $z$.
The distribution of types $x_i$ in population $i$ is known and given by some Borel probability measure $\mu_i \in \mathcal{P}(X_i)$.

The goal is to find an equilibrium production line $\nu \in \mathcal{P}(Z)$ (together with a price system) which clears both the quality good and the labor markets. The equilibrium is described below, and as we shall see, it corresponds to the solution of the (generalized) barycenter problem (2.5). In this setting, one looks in particular for an equilibrium system of monetary transfers (paid by the buyer to the producers). A system of transfers is a collection of continuous functions $\varphi_1, \ldots, \varphi_I : Z \to \mathbb{R}$, where $\varphi_i(z)$ is the amount paid to $i$ by the other members of the team for producing $z$. An obvious equilibrium requirement is that teams are self-financed, i.e.

$$\sum_{i=1}^{I} \varphi_i(z) = 0, \quad \forall z \in Z. \tag{2.1}$$

Given transfers $\varphi_1, \ldots, \varphi_I$, an agent from population $i$ with type $x_i \in X_i$ gets a net minimal cost given by the so-called $c_i$-transform of $\varphi_i$:

$$\varphi_i^{c_i}(x_i) := \min_{z \in Z} \{ c_i(x_i, z) - \varphi_i(z) \}. \tag{2.2}$$

By construction, $\varphi_i^{c_i}(x_i) + \varphi_i(z) \le c_i(x_i, z)$, and since agents are rational, they choose cost minimizing qualities, i.e. a $z \in Z$ such that

$$\varphi_i^{c_i}(x_i) + \varphi_i(z) = c_i(x_i, z). \tag{2.3}$$

The final unknown is a collection of plans, $\gamma_i \in \mathcal{P}(X_i \times Z)$, such that $\gamma_i(A_i \times A)$ represents the probability that an agent in population $i$ has a type in $A_i$ and belongs to a team that produces a quality in $A$. At equilibrium, the first marginal of $\gamma_i$ should be $\mu_i$ (this is equilibrium on the $i$-th labor market) and the second marginal of $\gamma_i$ should not depend on $i$ (this is equilibrium on the quality good market); this common marginal represents the equilibrium quality line. An equilibrium can then be formally defined. It consists of a transfer system $(\varphi_1, \ldots, \varphi_I) \in C(Z, \mathbb{R})^I$, probability measures $\gamma_i \in \mathcal{P}(X_i \times Z)$, and a probability measure $\nu \in \mathcal{P}(Z)$, such that

• teams are self-financed, i.e.
(2.1) holds,

• $\gamma_i \in \Pi(\mu_i, \nu)$ for $i = 1, \ldots, I$ (equilibrium on the labor markets and on the good market),

• (2.3) holds on the support of $\gamma_i$ for $i = 1, \ldots, I$ (i.e. agents choose cost minimizing qualities).

If an equilibrium quality line $\nu$ were known, then clearly the last two conditions above would imply that the plan $\gamma_i$ should be optimal for the Monge-Kantorovich problem:

$$W_{c_i}(\mu_i, \nu) := \inf_{\gamma \in \Pi(\mu_i, \nu)} \int_{X_i \times Z} c_i(x_i, z)\, \gamma(dx_i, dz). \tag{2.4}$$

In fact, it was proved in [CE10] that there is a purely variational characterization of equilibria, which is tightly related to the following convex problem

$$\inf_{\nu \in \mathcal{P}(Z)} J(\nu) := \sum_{i=1}^{I} W_{c_i}(\mu_i, \nu) \tag{2.5}$$

and its dual (concave maximization) formulation (see [CE10] or section 4 for details on this duality)

$$\sup \left\{ \sum_{i=1}^{I} \int_{X_i} \varphi_i^{c_i}(x_i)\, \mu_i(dx_i) \; : \; \sum_{i=1}^{I} \varphi_i = 0 \right\}. \tag{2.6}$$

Theorem 2.1. $(\varphi_i, \gamma_i, \nu)$ is an equilibrium if and only if:

• $\nu$ solves (2.5),

• the transfers $(\varphi_1, \ldots, \varphi_I)$ solve (2.6),

• for $i = 1, \ldots, I$, $\gamma_i$ solves the Monge-Kantorovich problem $W_{c_i}(\mu_i, \nu)$.

2.2 Localization

As noted in [CE10], the minimization problem (2.5), which characterizes equilibrium quality lines, can be reformulated as an optimal transport problem with multi-marginal constraints, as follows. First define the cost

$$c(x) := \min_{z \in Z} \sum_{i=1}^{I} c_i(x_i, z), \tag{2.7}$$

where $x = (x_1, \ldots, x_I)$. Let $T(x) \in Z$ be a measurable selection of the solution of the above minimization, meaning that $T(x) \in Z$ satisfies

$$\sum_{i=1}^{I} c_i(x_i, T(x)) = c(x).$$

Then consider the multi-marginal problem

$$\inf_{\eta \in \Pi(\mu_1, \ldots, \mu_I)} \int_{X_1 \times \ldots \times X_I} c(x)\, \eta(dx), \tag{2.8}$$

where $\Pi(\mu_1, \ldots, \mu_I)$ denotes the set of probability measures on $X_1 \times \ldots \times X_I$ having $(\mu_1, \ldots, \mu_I)$ as marginals.
It is not difficult to see that if $\eta$ solves (2.8) then $\nu := T_\# \eta$ solves (2.5) (where as usual $T_\# \eta$ denotes the push-forward of $\eta$ through $T$, i.e. $T_\# \eta(B) := \eta(T^{-1}(B))$ for every Borel set $B$).

Conversely, one can relate the minimizers of (2.5) to those of (2.8). Indeed, let $\nu$ solve (2.5) and let $\gamma_i \in \Pi(\mu_i, \nu)$ be an optimal plan for $W_{c_i}(\mu_i, \nu)$. Disintegrating $\gamma_i$ with respect to $\nu$, i.e. writing $\gamma_i = \gamma_i^z \otimes \nu$, defining $\gamma \in \mathcal{P}(X_1 \times \ldots \times X_I \times Z)$ by $\gamma := \otimes_{i=1}^I \gamma_i^z \otimes \nu$, and taking $\eta$ to be the marginal of $\gamma$ on the variables $(x_1, \ldots, x_I)$, one easily checks that

• $\eta \in \Pi(\mu_1, \ldots, \mu_I)$ solves (2.8),

• on the support of $\gamma$, $\mathrm{spt}(\gamma)$, one has $\sum_{i=1}^{I} c_i(x_i, z) = c(x)$,

• the previous relation, together with the fact that $\nu$ is the $Z$-marginal of $\gamma$ and $\mu_i$ its $X_i$-marginal, then implies a useful localization property: the support of the barycenter measure, $\mathrm{spt}(\nu)$, is contained in the set of minimizers of the following problem:

$$\min_{z \in Z} \sum_{i=1}^{I} c_i(x_i, z) \quad \text{for some } x_i \in \mathrm{spt}(\mu_i), \; i = 1, \ldots, I. \tag{2.9}$$

Since the support of $\nu$ is unknown, which causes difficulties in practice, the localization property (2.9) gives a practical method for bounding the unknown support of the barycenter measure. The condition above results in a reduction of the dimensionality of discretized problems, since it gives a priori information on the support of the unknown measure, at the expense of solving an optimization problem. However, this optimization problem is decoupled on the domain $Z$: each point (or neighborhood) can be tested by looping through points (or small neighborhoods) in the domain $Z$ and choices of points in the support sets $\mathrm{spt}(\mu_i)$.

In the case where the $X_i$'s and $Z$ coincide with some ball of $\mathbb{R}^d$, and the costs are powers of distance, $c_i(x_i, z) = \lambda_i |x_i - z|^p$ (with $\lambda_i > 0$ and $\sum_i \lambda_i = 1$, say) for some $p \ge 1$, one can easily derive information on the unknown support. Indeed, using the optimality condition for the minimization problem (2.9), one deduces that $\mathrm{spt}(\nu)$ is included in the convex hull of the supports of the $\mu_i$'s. If we particularize further to the Wasserstein barycenter case, i.e. to the case $p = 2$, the solution of (2.9) is explicitly given by the barycenter $z = \sum_{i=1}^{I} \lambda_i x_i$, so that the localization property (2.9) gives the following estimate on the barycenter measure $\nu$:

$$\mathrm{spt}(\nu) \subset \sum_{i=1}^{I} \lambda_i\, \mathrm{spt}(\mu_i). \tag{2.10}$$

Multi-marginal optimal transport problems such as (2.8) are linear programs. For discrete marginals, such problems can in principle be solved exactly by the simplex method. In practice, however, the number of variables explodes with the number of marginals, which makes the problem quickly intractable. We shall see below that one may take advantage of the fact that $c$ is not an arbitrary cost function but has the special structure (2.7). Interestingly, it was already proved by Pass [Pas12a] in the context of multi-marginal optimal transport that such costs are much better behaved than arbitrary costs of $I$ variables.

To find a more tractable linear programming reformulation of the matching for teams problem, it is better to go back to the very definition of an equilibrium in terms of couplings and to reformulate problem (2.5) as

$$\inf_{(\gamma_1, \ldots, \gamma_I) \in \Pi} \sum_{i=1}^{I} \int_{X_i \times Z} c_i(x_i, z)\, \gamma_i(dx_i, dz) \tag{2.11}$$

where $\Pi$ consists of all measures $(\gamma_1, \ldots, \gamma_I) \in \mathcal{P}(X_1 \times Z) \times \ldots \times \mathcal{P}(X_I \times Z)$ such that

• the marginal of $\gamma_i$ on the $x_i$ variable is $\mu_i$, i.e.

$$\int_{X_i \times Z} \psi(x_i)\, \gamma_i(dx_i, dz) = \int_{X_i} \psi(x_i)\, \mu_i(dx_i), \quad \forall \psi \in C(X_i), \tag{2.12}$$

• the marginal of $\gamma_i$ on the $z$ variable does not depend on $i$:

$$\int_{X_1 \times Z} \varphi(z)\, \gamma_1(dx_1, dz) = \ldots = \int_{X_I \times Z} \varphi(z)\, \gamma_I(dx_I, dz), \quad \forall \varphi \in C(Z). \tag{2.13}$$

Clearly, if the $\gamma_i$'s solve (2.11) then their common marginal $\nu \in \mathcal{P}(Z)$ solves (2.5) and the $\gamma_i$'s are optimal for the optimal transport problems $W_{c_i}(\mu_i, \nu)$. In other words, the $\gamma_i$'s are equilibrium couplings for the matching for teams problem.

The constraints above being linear, $\Pi$ is a convex and weakly-$*$ compact subset of $\mathcal{P}(X_1 \times Z) \times \ldots \times \mathcal{P}(X_I \times Z)$, so that (2.11) admits solutions. Moreover, in the case of discrete $\mu_i$'s and $\nu$ supported by $N$ points, the number of variables in the linear program (2.11) is linear (and not exponential, as in the case of the multi-marginal optimal transport problem) in the number of marginals.

The (a priori) infinite dimensional linear programming problem (2.11) of subsection 2.3 can be discretized as follows. Let $\{S_i^j\}_{j=1}^{N_i}$ be a partition of $\mathrm{spt}(\mu_i)$ and let $\{S^k\}_{k=1}^{N_0}$ be a partition of $Z$ (or better, of the support set estimated by the method of subsection 2.2). Approximate the measures by weighted sums of atoms

$$\mu_i^A = \sum_{j=1}^{N_i} \mu_i^j \delta_{x_i^j}, \quad \text{for } i = 1, \ldots, I, \quad \text{with } \mu_i^j = \mu_i(S_i^j),$$

$$\nu^A = \sum_{k=1}^{N_0} \nu^k \delta_{z^k}, \quad \text{with } \nu^k = \nu(S^k),$$

where $x_i^j$ and $z^k$ are representative points in the regions $S_i^j$, $S^k$, respectively. It is well known that $\mu_i^A$ converges weakly-$*$ to $\mu_i$ as the diameter of the partition $\{S_i^j\}_{j=1}^{N_i}$ tends to 0. More precisely, denoting by $W_1$ the 1-Wasserstein distance (which metrizes the weak-$*$ topology on probability measures):

$$W_1(\mu_i^A, \mu_i) \le \max_{j=1,\ldots,N_i} \mathrm{diam}(S_i^j). \tag{2.14}$$

Inserting the approximation defined above into the linear programming problem (2.11, 2.12, 2.13) results in the following finite dimensional linear programming problem:

$$\text{minimize} \quad \sum_{i=1}^{I} \sum_{j,k} c_i(x_i^j, z^k)\, \gamma_i^{j,k}$$

subject to:

$$\sum_k \gamma_i^{j,k} = \mu_i^j, \quad \text{for all } i = 1, \ldots, I \text{ and } j = 1, \ldots, N_i,$$

$$\sum_j \gamma_1^{j,k} = \ldots = \sum_j \gamma_I^{j,k}, \quad \text{for all } k = 1, \ldots, N_0, \tag{2.15}$$

along with the non-negativity constraints $\gamma_i^{j,k} \ge 0$.
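A toy instance of the discrete linear program (2.15) can be set up and solved directly; the sketch below (Python with SciPy's `linprog`, not the paper's MATLAB/CVX code) uses two one-atom measures on the line and a three-point candidate grid for $Z$, with quadratic costs and equal weights.

```python
import numpy as np
from scipy.optimize import linprog

# Discrete LP (2.15) for I = 2 one-dimensional measures, quadratic costs.
x1 = np.array([0.0]); mu1 = np.array([1.0])   # mu_1 = delta_0
x2 = np.array([2.0]); mu2 = np.array([1.0])   # mu_2 = delta_2
z  = np.array([0.0, 1.0, 2.0])                # candidate support of nu
lam = [0.5, 0.5]

N1, N2, Nz = len(x1), len(x2), len(z)
C1 = lam[0] * (x1[:, None] - z[None, :])**2   # c_1(x_1, z)
C2 = lam[1] * (x2[:, None] - z[None, :])**2   # c_2(x_2, z)
c = np.concatenate([C1.ravel(), C2.ravel()])  # unknowns: (g1, g2) flattened

rows, rhs = [], []
# Fixed first marginals: sum_k g_i[j, k] = mu_i[j].
for j in range(N1):
    r = np.zeros(N1 * Nz + N2 * Nz); r[j * Nz:(j + 1) * Nz] = 1
    rows.append(r); rhs.append(mu1[j])
for j in range(N2):
    r = np.zeros(N1 * Nz + N2 * Nz)
    r[N1 * Nz + j * Nz:N1 * Nz + (j + 1) * Nz] = 1
    rows.append(r); rhs.append(mu2[j])
# Equal Z-marginals: sum_j g1[j, k] - sum_j g2[j, k] = 0 for every k.
for k in range(Nz):
    r = np.zeros(N1 * Nz + N2 * Nz)
    r[k:N1 * Nz:Nz] = 1
    r[N1 * Nz + k::Nz] = -1
    rows.append(r); rhs.append(0.0)

res = linprog(c, A_eq=np.array(rows), b_eq=rhs, bounds=(0, None))
nu = res.x[:N1 * Nz].reshape(N1, Nz).sum(axis=0)  # common Z-marginal
```

For these data the barycenter of $\delta_0$ and $\delta_2$ with equal weights is $\delta_1$, so the recovered weights `nu` concentrate all mass on the middle grid point.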
The (approximated) barycenter is then $\nu^A = \sum_k \nu^k \delta_{z^k}$, where the weight $\nu^k$ is given by the common value

$$\nu^k = \sum_j \gamma_i^{j,k}, \quad \text{for any } i = 1, \ldots, I.$$

The linear programming problem above can be implemented in standard software packages. The size of the problem is as follows. The number of variables is $N_0 \times (N_1 + \cdots + N_I)$ (or $I N^2$ if $N_0 = N_1 = \cdots = N_I = N$). The number of constraints is $(N_0 + 1) \times (N_1 + \cdots + N_I) + I N_0$ (that is, $I(N^2 + 2N)$ when $N_0 = N_1 = \cdots = N_I = N$). The size of this linear programming problem thus scales linearly with the number of marginals, for a given, fixed value of $N$ (contrary to the multi-marginal formulation (2.8)).

Since in practice one considers approximation by discrete measures just as in subsection 2.4, we now wish to address the stability of the following convex problem when one replaces the measures $\mu_i$ by some discrete approximations:

$$\inf_{\nu \in \mathcal{P}(Z)} J(\nu) := \sum_{i=1}^{I} W_{c_i}(\mu_i, \nu). \tag{2.16}$$

To do so, one has to control the dependence of $W_c(\mu, \nu)$ on its three arguments $(c, \mu, \nu) \in C(X \times Z) \times \mathcal{P}(X) \times \mathcal{P}(Z)$. We shall denote by $d_X$ and $d_Z$ the distances on $X$ and $Z$. Take now $(c, \mu, \nu) \in C(X \times Z) \times \mathcal{P}(X) \times \mathcal{P}(Z)$ and $(\tilde c, \tilde\mu, \tilde\nu) \in C(X \times Z) \times \mathcal{P}(X) \times \mathcal{P}(Z)$, and let $\omega_X$ and $\omega_Z$ be, respectively, a modulus of continuity of $c$ and $\tilde c$ with respect to $x$ uniform in $z$, and a modulus of continuity of $c$ and $\tilde c$ with respect to $z$ uniform in $x$; that is,

$$\max(|c(x, z) - c(x', z)|, |\tilde c(x, z) - \tilde c(x', z)|) \le \omega_X(d_X(x, x')), \quad \forall (x, x', z) \in X \times X \times Z$$

and

$$\max(|c(x, z) - c(x, z')|, |\tilde c(x, z) - \tilde c(x, z')|) \le \omega_Z(d_Z(z, z')), \quad \forall (x, z, z') \in X \times Z \times Z.$$

First, one obviously has

$$|W_c(\mu, \nu) - W_{\tilde c}(\mu, \nu)| \le \|c - \tilde c\|_\infty.$$
(2.17)

Let $\varphi \in C(Z)$ be a solution of the Kantorovich dual problem for $W_{\tilde c}(\mu, \nu)$, that is,

$$W_{\tilde c}(\mu, \nu) = \int_X \varphi^{\tilde c}\, d\mu + \int_Z \varphi\, d\nu.$$

By the Kantorovich duality formula, we have

$$W_{\tilde c}(\tilde\mu, \nu) \ge \int_X \varphi^{\tilde c}\, d\tilde\mu + \int_Z \varphi\, d\nu.$$

Hence, for every $\theta \in \Pi(\mu, \tilde\mu)$, we have

$$W_{\tilde c}(\mu, \nu) - W_{\tilde c}(\tilde\mu, \nu) \le \int_X \varphi^{\tilde c}\, d(\mu - \tilde\mu) = \int_{X \times X} \big(\varphi^{\tilde c}(x) - \varphi^{\tilde c}(x')\big)\, \theta(dx, dx').$$

We then observe that $\varphi^{\tilde c}(x) - \varphi^{\tilde c}(x') \le \omega_X(d_X(x, x'))$, so that

$$W_{\tilde c}(\mu, \nu) - W_{\tilde c}(\tilde\mu, \nu) \le W_{\omega_X}(\mu, \tilde\mu) := \inf_{\theta \in \Pi(\mu, \tilde\mu)} \int_{X \times X} \omega_X(d_X(x, x'))\, \theta(dx, dx'). \tag{2.18}$$

Similarly,

$$W_{\tilde c}(\tilde\mu, \nu) - W_{\tilde c}(\tilde\mu, \tilde\nu) \le W_{\omega_Z}(\nu, \tilde\nu) := \inf_{\eta \in \Pi(\nu, \tilde\nu)} \int_{Z \times Z} \omega_Z(d_Z(z, z'))\, \eta(dz, dz'). \tag{2.19}$$

Putting everything together, we get

$$|W_c(\mu, \nu) - W_{\tilde c}(\tilde\mu, \tilde\nu)| \le \|c - \tilde c\|_\infty + W_{\omega_X}(\mu, \tilde\mu) + W_{\omega_Z}(\nu, \tilde\nu). \tag{2.20}$$

We then remark that if $\mu^n$ converges weakly-$*$ to $\mu$, then $W_{\omega_X}(\mu, \mu^n) \to 0$. Indeed, the 1-Wasserstein distance ($W_{\omega_X}$ for $\omega_X(t) = t$) between $\mu^n$ and $\mu$ converges to 0, so that there is some $\theta^n \in \Pi(\mu, \mu^n)$ which (up to a non-relabeled subsequence) converges weakly-$*$ to some $\theta$ supported on the diagonal of $X \times X$; hence

$$W_{\omega_X}(\mu, \mu^n) \le \int_{X \times X} \omega_X(d(x, x'))\, \theta^n(dx, dx') \to 0.$$

Getting back to the approximation of (2.5), take sequences $c_i^n \in C(X_i \times Z)$, $\mu_i^n \in \mathcal{P}(X_i)$, and $c_i \in C(X_i \times Z)$, $\mu_i \in \mathcal{P}(X_i)$, such that

$$\|c_i^n - c_i\|_\infty \to 0, \quad \mu_i^n \rightharpoonup^* \mu_i \tag{2.21}$$

and set:

$$J(\nu) := \sum_{i=1}^{I} W_{c_i}(\mu_i, \nu), \quad J^n(\nu) := \sum_{i=1}^{I} W_{c_i^n}(\mu_i^n, \nu), \quad \forall \nu \in \mathcal{P}(Z).$$
(2.22)

Denoting by $\omega_{X_i}^i$ and $\omega_Z^i$ common moduli of continuity of the $c_i^n$ (the first one in $x_i$ uniformly in $z$, and the second in $z$ uniformly in $x_i$, just as above), we then have:

Proposition 2.2. For every $(\nu, \nu^n) \in \mathcal{P}(Z) \times \mathcal{P}(Z)$:

$$|J(\nu) - J^n(\nu^n)| \le \sum_{i=1}^{I} \|c_i - c_i^n\|_\infty + \sum_{i=1}^{I} \big[ W_{\omega_{X_i}^i}(\mu_i^n, \mu_i) + W_{\omega_Z^i}(\nu^n, \nu) \big]. \tag{2.23}$$

This implies in particular:

• $J^n(\nu^n) \to J(\nu)$ whenever $\nu^n \rightharpoonup^* \nu$,

• a quantitative estimate for the stability of values:

$$\Big| \inf_{\mathcal{P}(Z)} J - \inf_{\mathcal{P}(Z)} J^n \Big| \le \sum_{i=1}^{I} \|c_i - c_i^n\|_\infty + \sum_{i=1}^{I} W_{\omega_{X_i}^i}(\mu_i^n, \mu_i), \tag{2.24}$$

• if $\nu^n$ minimizes $J^n$ then, up to a subsequence, it converges weakly-$*$ to a minimizer of $J$.

Proof. The statements directly follow from estimate (2.20) and the already observed fact that the right-hand side of (2.23) converges to 0 as soon as $\nu^n \rightharpoonup^* \nu$.

In the case where the cost functions $c_i = c_i^n$ are Lipschitz (with Lipschitz constant $\mathrm{Lip}(c_i)$) and the approximated measures $\mu_i^n$ satisfy (for the usual 1-Wasserstein distance) $W_1(\mu_i^n, \mu_i) \le \frac{C}{n}$, estimate (2.23) above takes the form

$$|J(\nu) - J^n(\nu^n)| \le \sum_{i=1}^{I} \mathrm{Lip}(c_i) \Big( \frac{C}{n} + W_1(\nu^n, \nu) \Big).$$

In this section, we present various numerical simulations using the linear programming approach of sections 2.3 and 2.4. The localization method of 2.2 is used to approximate the support of the barycenter measure. An alternative approach to approximating the support of the barycenter measure, which can be combined with localization, is a two-stage solution approach: the first stage, using a coarse grid, gives an approximation of the support of the barycenter; the second stage gives a more accurate representation of the barycenter using information on the support obtained in the first stage.

All computations in this section were performed in MATLAB on a Mid 2011 MacBook Air laptop.
To solve (2.15) we use the software package CVX [GB10], [GB08], which is callable from MATLAB. The CVX language allows for a very concise description of the convex optimization problem, and allows for the use of multiple solver libraries (e.g. MOSEK, Gurobi). The numerical solution obtained is correct up to tolerances near numerical precision.

Figure 1: Given measures in black. Figures left, centre, right: solution of the geodesic problem with weights .25, .5, .75, respectively.
We considered two measures in the plane, and by varying the weights in the quadratic cost function, we computed three points on the geodesic path (or McCann interpolant), represented in Figure 1. The computational time was less than a minute. The measures are illustrated by a circle centered on the atom (middle of the corresponding square) with a radius proportional to the weight. Both the shape of the support (square, diamond) and the density of the measure are illustrated in the figure: the interpolated measures are influenced by both properties. Figure 2 illustrates the two-stage support refinement strategy.

Figure 2: Refinement for the geodesic problem. Left: solution on full grid. Right: solution on the implied support, but more resolved. The size of the problems is the same, but the resolution is increased by a factor of three.
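For the quadratic cost in one dimension, the optimal map is the monotone rearrangement, so McCann's interpolant between two empirical measures with equally many equally weighted atoms reduces to interpolating sorted samples. A minimal sketch (an elementary special case, not the paper's two-dimensional LP computation):

```python
import numpy as np

def mccann_interpolant_1d(xs, ys, t):
    """McCann interpolation at time t between two empirical measures on R
    with the same number of equally weighted atoms. In 1d the optimal map
    for the quadratic cost pairs sorted samples, so the interpolant's atoms
    are the pointwise interpolation of the sorted atom lists."""
    xs, ys = np.sort(xs), np.sort(ys)
    return (1 - t) * xs + t * ys

a = np.array([1.0, 0.0])
b = np.array([5.0, 4.0])
mid = mccann_interpolant_1d(a, b, 0.5)   # atoms of the midpoint measure
```

At $t = 0$ and $t = 1$ the interpolant recovers the two input measures, and intermediate $t$ traces the $W_2$-geodesic between them.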
For the next set of examples, we took two uniform measures, the first corresponding to a vertically oriented rectangle, and the second corresponding to a horizontal rectangle. These measures are shown in Figure 3. First, we compared the convergence of the solutions for different grid sizes in Figure 4. Notice that the general support of the computed measures seems stable, but there are oscillations in the density at different resolutions.

Next, we computed the barycenter with various power cost functions $C(x, y) = |x - y|^p$, for $p = 1, 2, 3, 4$. The solutions we computed use grids of size $50^2$. Computational time was close to two minutes. A second run using a grid size of $100^2$ and a localization of the support took 30 to 45 minutes and is represented in Figure 5. The densities are plotted using a grayscale which corresponds to the relative values; however, the grayscale is different for each figure. We also include another view of the density for $p = 1$ in Figure 6.

The solutions have a complicated geometry. For the cost with $p = 1$, the support of the barycenter is the entire convex hull of the supports of the measures, although the density is highly concentrated at the intersection of the measures. The density ranges from about 0.01 at the edges to 0.12 in the center. For the case $p = 2$, the density is supported on a square, but wider than the width of the rectangle. The density has some oscillations, but is strictly positive (taking values in the range [.004, .005]). For the case $p = 3$, the density is supported on a small octagonal shape, with zero density in the middle, and with larger oscillations. For the case $p = 4$, a much larger octagonal shape appears, with a large zero-density hole in the middle. The supports of the barycenter measures are close to the ones estimated by localization.

Figure 3: The measures $m_1$, $m_2$, $m_3$, $m_4$ used in the examples which follow.

Figure 4: Comparison of the numerical barycenter for measures $m_1$, $m_2$ using cost $C(x, y) = |x - y|^2$ on different grid sizes: $25^2$, $50^2$, and $100^2$. Note that the general shape of the solutions is similar, but the density has more oscillations at higher resolution.

Figure 5: Barycenters of two rectangles $m_1$, $m_2$, with cost $C(x, y) = |x - y|^p$ for $p = 1, 2, 3$, using grid size $100^2$.

Figure 6: Surface plot of the barycenter corresponding to $p = 1$.

Finally, we computed the barycenters using all four measures from Figure 3; see Figure 7.

Figure 7: Barycenters of the measures $m_1$, $m_2$, $m_3$, $m_4$ (four rotated rectangles) with cost $C(x, y) = |x - y|^p$ for $p = 1, 2, 3$, using grid size $50^2$.

We considered the matching for teams problem and used measures and costs as follows; see also Figure 8. Set the quality domain $Z = [0, 1]^2$ and write $z = (z_1, z_2)$ for points in $Z$. Set $M_1$, $M_2$, $M_3$ to be measures which have constant density on their support, and let their supports be [1 , , [1 . , . × [1 , , × [1 . , . The cost functions ($c_1$ is the negative of the buyer's utility) are $c_1(x, z) = - . (x_1 z_1 + x_2 z_2)$ and $c_2(x, z) = c_3(x, z) = (x_1 + z_1)^2 + (x_2 + z_2)^2$.

The solution concentrates mass at the boundary, and especially at the corners of the domain.

Figure 8: Solution of the matching for teams problem. Left: the three measures, and the solution. Right: surface plot of the solution. The solution concentrates mass mostly on the corners with some mass on the edges.
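The localization estimate (2.10), used throughout this section to restrict the grids, can be sketched as a brute-force enumeration over support points for the quadratic cost (an illustration with a hypothetical helper name, not the authors' implementation):

```python
import itertools
import numpy as np

def barycenter_support_candidates(supports, lam):
    """Estimate (2.10): for quadratic costs c_i = lam_i |x_i - z|^2, the
    barycenter support lies in { sum_i lam_i x_i : x_i in spt(mu_i) }.
    Each entry of `supports` is a list of points (as coordinate lists)."""
    pts = [sum(l * np.asarray(x, dtype=float) for l, x in zip(lam, combo))
           for combo in itertools.product(*supports)]
    return np.unique(np.array(pts), axis=0)

# Supports {0, 1} and {2, 3} on the line, equal weights:
cands = barycenter_support_candidates([[[0.0], [1.0]], [[2.0], [3.0]]],
                                      [0.5, 0.5])
```

In practice one would enumerate representative points of the partitions $S_i^j$ rather than full supports; the candidate set then tells which grid points of $Z$ need to carry unknowns in (2.15).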
Let us now explain why the variational problem (2.6) can naturally be seen as a dual formulation of (2.11) (see [CE10] for more details on this duality). To that end, let us observe that $(\gamma_1, \ldots, \gamma_I) \in \Pi$ if and only if (2.12) holds for every $i$ (these are the fixed $\mu_i$ marginal constraints) and

$$\sum_{i=1}^{I} \int_{X_i \times Z} \varphi_i(z)\, \gamma_i(dx_i, dz) = 0 \quad \text{as soon as} \quad \sum_{i=1}^{I} \varphi_i(z) = 0, \; \forall z \in Z. \tag{4.1}$$

Indeed, clearly, if the $\gamma_i$'s have the same marginal on $Z$ then (4.1) holds. Conversely, assume (4.1), let $i \ne j$ and $\varphi \in C(Z)$; then, applying (4.1) to the potentials $\varphi_i = \varphi$, $\varphi_j = -\varphi$ and $\varphi_k = 0$ for $k \in \{1, \ldots, I\} \setminus \{i, j\}$, we see that $\int_{X_i \times Z} \varphi(z)\, \gamma_i(dx_i, dz) = \int_{X_j \times Z} \varphi(z)\, \gamma_j(dx_j, dz)$. This proves that (4.1) characterizes the fact that the $\gamma_i$'s share the same marginal on $Z$. This enables us to rewrite (2.11) in inf-sup form:

$$\inf_{\gamma_i \ge 0} \sup \Big\{ L((\gamma_i)_i, (\psi_i)_i, (\varphi_i)_i) \; : \; \psi_i \in C(X_i), \; \varphi_i \in C(Z), \; \sum_{i=1}^{I} \varphi_i = 0 \Big\} \tag{4.2}$$

where the Lagrangian $L$ is given by

$$L((\gamma_i)_i, (\psi_i)_i, (\varphi_i)_i) := \sum_{i=1}^{I} \int_{X_i \times Z} \big( c_i(x_i, z) - \psi_i(x_i) - \varphi_i(z) \big)\, \gamma_i(dx_i, dz) + \sum_{i=1}^{I} \int_{X_i} \psi_i(x_i)\, \mu_i(dx_i).$$

To obtain the desired dual formulation, we formally switch the inf and the sup (again, we refer to [CE10] for a rigorous derivation):

$$\sup_{(\psi_i, \varphi_i),\; \sum_i \varphi_i = 0} \; \inf_{\gamma_i \ge 0} L((\gamma_i)_i, (\psi_i)_i, (\varphi_i)_i).$$

We next observe that

$$\inf_{\gamma_i \ge 0} L((\gamma_i)_i, (\psi_i)_i, (\varphi_i)_i) = \sum_{i=1}^{I} \int_{X_i} \psi_i(x_i)\, \mu_i(dx_i) + \sum_{i=1}^{I} \inf_{\gamma_i \ge 0} \int_{X_i \times Z} \big( c_i(x_i, z) - \psi_i(x_i) - \varphi_i(z) \big)\, \gamma_i(dx_i, dz),$$

and the latter infimum is 0 when

$$c_i(x_i, z) \ge \psi_i(x_i) + \varphi_i(z), \quad \forall (x_i, z) \in X_i \times Z \tag{4.3}$$

and $-\infty$ otherwise.
The dual of (2.11) therefore consists in maximizing
$$\sum_{i=1}^I \int_{X_i} \psi_i(x_i)\, \mu_i(dx_i)$$
subject to the constraints (4.3) and $\sum_{i=1}^I \varphi_i = 0$. For fixed $\varphi_i$, the maximal $\psi_i$ satisfying (4.3) is $\psi_i = \varphi_i^{c_i}$, so the dual can equivalently be formulated as
$$\sup \Big\{ \sum_{i=1}^I \int_{X_i} \varphi_i^{c_i}(x_i)\, \mu_i(dx_i) \ : \ \sum_{i=1}^I \varphi_i = 0 \Big\} \tag{4.4}$$
which is exactly (2.6). For the existence of solutions and the equality between the infimum in (2.11) and the supremum in (4.4) (which is obtained by a slightly different argument), we again refer to [CE10]. The optimality conditions for (2.11) and (4.4) are summarized by the equivalence between the following assertions:

• $(\gamma_i)_i \in \Pi$ solves (2.11) and $(\varphi_i)_i$ such that $\sum_{i=1}^I \varphi_i = 0$ solves (4.4),

• $\big((\gamma_i)_i, (\varphi_i^{c_i})_i, (\varphi_i)_i\big)$ is a saddle point of $L$,

• for every $i$, one has
$$\varphi_i^{c_i}(x_i) + \varphi_i(z) = c_i(x_i, z) \tag{4.5}$$
$\gamma_i$-almost everywhere on $X_i \times Z$ or, equivalently, by continuity, on the support of $\gamma_i$.

4.2 The case of Wasserstein barycenters

From now on, we restrict ourselves to the quadratic case where all the $X_i$'s and $Z$ are some ball $B$ (say) of $\mathbb{R}^d$ and the costs $c_i$ are quadratic:
$$c_i(x_i, z) := \frac{\lambda_i}{2} |x_i - z|^2$$
where the $\lambda_i$'s are positive coefficients normalized so that they sum to $1$. In this case, (2.5) corresponds to
$$\inf_{\nu \in \mathcal{P}(B)} \sum_{i=1}^I \frac{\lambda_i}{2} W_2^2(\mu_i, \nu) \tag{4.6}$$
where $W_2^2$ stands for the squared 2-Wasserstein distance. This problem has been studied in detail in [AC11], where uniqueness (under the assumption that one of the measures does not give mass to small sets), characterization, and $L^p$ or $L^\infty$ regularity results are established for Wasserstein barycenters, as well as a close connection with the quadratic multimarginal optimal transport problem of Gangbo and Świȩch [GS98].
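In dimension one, the barycenter problem (4.6) even has a closed form: optimal transport maps are monotone, so the quantile function of the barycenter is the $\lambda$-weighted average of the quantile functions of the $\mu_i$. A minimal numpy sketch for measures given by $n$ equally weighted atoms (an illustration of this well-known fact, not part of the paper's method):

```python
import numpy as np

def barycenter_1d(atom_lists, lam):
    """W2 barycenter of 1-D empirical measures with n equally weighted
    atoms each: sorting the atoms gives the discrete quantile functions,
    which are then averaged with weights lam."""
    sorted_atoms = [np.sort(np.asarray(a, dtype=float)) for a in atom_lists]
    assert len({len(a) for a in sorted_atoms}) == 1, "need equal sizes"
    return sum(l * a for l, a in zip(lam, sorted_atoms))

# two translated measures: the isobarycenter is the half-way translate
print(barycenter_1d([[0.0, 1.0, 2.0], [4.0, 5.0, 6.0]], [0.5, 0.5]))  # -> [2. 3. 4.]
```

For translated measures one recovers the half-way translate, which is the same sanity check used in the experiments of the next section.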
Since Wasserstein barycenters constitute a natural way to interpolate between an arbitrary number of measures, they also find applications in image processing [RPDB12] and statistics [BK12].

Let us now informally derive the optimality conditions for the barycenter using once again the dual formulation (4.4) (see [AC11] for details). In the present quadratic setting, the formula for the $c_i$-transform takes the form
$$\varphi_i^{c_i}(x_i) = \inf_z \Big\{ \frac{\lambda_i}{2} |x_i - z|^2 - \varphi_i(z) \Big\},$$
which, defining
$$u_i(x_i) := \frac{1}{2}|x_i|^2 - \frac{\varphi_i^{c_i}(x_i)}{\lambda_i},$$
can conveniently be rewritten as
$$u_i = \Big( \frac{1}{2}|\cdot|^2 - \frac{\varphi_i}{\lambda_i} \Big)^*$$
(where $^*$ denotes the usual Legendre transform). In particular, $u_i$ is convex (hence differentiable outside of a small set) and, defining $v_i := u_i^*$, we have
$$\frac{1}{2}|\cdot|^2 - \frac{\varphi_i}{\lambda_i} \ \ge \ \Big( \frac{1}{2}|\cdot|^2 - \frac{\varphi_i}{\lambda_i} \Big)^{**} = u_i^* = v_i. \tag{4.7}$$
Moreover, the optimal coupling $\gamma_i$ is concentrated on the set where equality (4.5) holds, which is equivalent to the relation $u_i(x_i) + \frac{1}{2}|z|^2 - \frac{\varphi_i(z)}{\lambda_i} = x_i \cdot z$; recalling (4.7), this implies that
$$x_i \cdot z \ \ge \ u_i(x_i) + v_i(z) = u_i(x_i) + u_i^*(z) \ \ge \ x_i \cdot z$$
so that $z = \nabla u_i(x_i)$ (provided $u_i$ is differentiable at $x_i$, which is the case $\mu_i$-a.e. as soon as $\mu_i$ does not charge small sets). This implies that the barycenter, which is also the marginal $\nu$ common to all the $\gamma_i$'s, is actually given by $\nu = \nabla u_i \# \mu_i$ for every $i$, and $\nabla u_i$ is the optimal transport between $\mu_i$ and $\nu$ for the quadratic cost. As explained above, we can deduce from (4.7) and the fact that equality (4.5) holds $\gamma_i$-almost everywhere that for $\nu$-a.e. $z$ one has
$$\frac{1}{2}|z|^2 - \frac{\varphi_i(z)}{\lambda_i} = v_i(z).$$
Recalling that the $\varphi_i$ have to sum to $0$, we deduce that
$$\sum_{i=1}^I \lambda_i v_i(z) = \frac{1}{2}|z|^2, \quad \nu\text{-a.e.} \tag{4.8}$$
The optimality conditions for the barycenter $\nu = \nabla u_i \# \mu_i = \nabla v_i^* \# \mu_i$ therefore, at least formally, take the form of the system of Monge-Ampère equations
$$\nu = \mu_i(\nabla v_i) \det(D^2 v_i), \quad i = 1, \dots, I \tag{4.9}$$
which is supplemented with equation (4.8) on the support of $\nu$. We shall see in the next paragraph how to compute the potentials $\varphi_i$ numerically in an efficient way.

Discretization of the dual problem.
We assume in all this section that the sets $X_i$ and $Z$ are subsets of $\mathbb{R}^d$. The dual problem consists in maximizing
$$\sum_{i=1}^I \int_{X_i} \varphi_i^{c_i}(x_i)\, \mu_i(dx_i) \tag{4.10}$$
where
$$\varphi_i^{c_i}(x_i) = \inf_z \Big\{ \frac{\lambda_i}{2} |x_i - z|^2 - \varphi_i(z) \Big\},$$
under the linear constraint $\sum_{i=1}^I \varphi_i(z) = 0$ for all $z \in Z$. This formulation leads to the following natural discretization of the Wasserstein quadratic barycenter computation. Suppose $(y_i^j, \nu_i^j)_{j=1,\dots,N_i} \subset X_i \times \mathbb{R}_+$ is a convergent quantization of the measure $\mu_i$. More explicitly, we assume that for all $i = 1, \dots, I$,
$$\lim_{N_i \to \infty} c(N_i) \sum_{j=1}^{N_i} \nu_i^j \, \delta_{y_i^j} = \mu_i$$
in the sense of the weak convergence of measures. In order to only consider probability measures, we set $c(N_i) = \big( \sum_{j=1}^{N_i} \nu_i^j \big)^{-1}$. Additionally, we suppose that $(z_k)_k$ is a dense countable family of points of $Z$. Based on (4.10) and the previous quantizations, for a given $N_k \in \mathbb{N}$, our discrete optimization problem of $I \times N_k$ variables reads
$$\Phi\big((\varphi_1^k)_k, \dots, (\varphi_I^k)_k\big) = \sum_{i=1}^I c(N_i) \sum_{j=1}^{N_i} \nu_i^j \min_{k=1,\dots,N_k} \Big\{ \frac{\lambda_i}{2} |y_i^j - z_k|^2 - \varphi_i^k \Big\} \tag{4.11}$$
under the $N_k$ pointwise linear constraints
$$\sum_i \varphi_i^k = 0, \quad \forall k = 1, \dots, N_k. \tag{4.12}$$
This optimization problem, in its dual form, can be seen as a large scale nonsmooth concave maximization problem. We discuss in the next paragraph approaches that have been developed to solve this type of problem numerically.

Non smooth algorithms.
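Before discussing algorithms, it may help to make the objective concrete. The sketch below (hypothetical array names, not the authors' code) evaluates the discrete dual objective (4.11) with numpy; the constraint (4.12) is assumed to be enforced by the caller, for instance by eliminating $\varphi_I^k = -\sum_{i<I} \varphi_i^k$:

```python
import numpy as np

def dual_objective(phi, y, nu, z, lam):
    """Evaluate the discrete dual objective Phi of (4.11).

    phi : (I, Nk) array of potentials phi_i^k; rows must satisfy (4.12),
          i.e. phi.sum(axis=0) == 0.
    y   : list of (N_i, d) arrays, support points y_i^j of mu_i.
    nu  : list of (N_i,) arrays, quantization weights nu_i^j.
    z   : (Nk, d) array, grid points z_k on Z.
    lam : length-I sequence of weights lambda_i summing to 1.
    """
    total = 0.0
    for i in range(len(y)):
        w = nu[i] / nu[i].sum()                       # c(N_i) normalization
        d2 = ((y[i][:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
        vals = 0.5 * lam[i] * d2 - phi[i][None, :]    # (lam_i/2)|y-z|^2 - phi_i^k
        total += w @ vals.min(axis=1)                 # c_i-transform at each y_i^j
    return total
```

A supergradient is then read off from the minimizing indices `vals.argmin(axis=1)`, in accordance with Proposition 4.1 below.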
Many different approaches have been introduced in the last decades to approximate optimal solutions of nonsmooth concave (or convex) problems, e.g. gradient sampling methods [BLO05] and bundle methods [LMfASA78]. These algorithms make use of a partial or complete description of superdifferentials in order to identify ascent directions (see the next paragraph). Even though Proposition 4.1 describes the whole superdifferential explicitly, finding an effective ascent direction in practice is made difficult by the large dimension of some superdifferentials. Additionally, these approaches are essentially of order one and follow the singular parts of the graph of the cost function. These two facts may explain a slow rate of convergence when starting from an initial point far away from any optimal vector.

One surprisingly efficient alternative for minimizing nonsmooth functions is the use of quasi-Newton methods. It is known [Pow76] that if the maximized function $\Phi$ is twice continuously differentiable and the suplevel set $\{\Phi \ge \Phi(x_0)\}$ is bounded, then the sequence of function values generated by the BFGS method with inexact Armijo-Wolfe line search, starting from $x_0$, converges to the maximal value of $\Phi$. More recently, it has been pointed out by different authors [HUL96, LO09, LO08] that variable metric algorithms may in some cases produce sequences which converge to an optimal point in the sense of Clarke. The mathematical analysis of this good behavior has only recently been initiated, in papers of Lewis and Overton [LO09, LO08].
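This behavior is easy to reproduce on a toy example (an illustration only, unrelated to the paper's solver): scipy's L-BFGS-B, fed a subgradient in place of the gradient, typically makes substantial progress toward the kink of a simple nonsmooth convex function even though no convergence theory covers this situation:

```python
import numpy as np
from scipy.optimize import minimize

# toy nonsmooth convex function with its minimum at the kink x = 0
def f(x):
    return abs(x[0]) + 2.0 * abs(x[1])

def subgrad(x):
    # one element of the subdifferential (np.sign(0) = 0 is a valid choice)
    return np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])

res = minimize(f, x0=np.array([3.0, -2.0]), jac=subgrad, method="L-BFGS-B")
# res.x is usually close to the nonsmooth minimizer, but the line search
# may also fail near the kink, which motivates the hybrid scheme below
```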
This efficiency can be explained heuristically by the fact that the approximated inverse of the Hessian matrix has a spectral decomposition into two subspaces which describe the two different local behaviors of the cost $\Phi$: a subspace associated to the regular directions of $\Phi$, and the subspace of eigenvectors whose eigenvalues are small in absolute value, which correspond to the singular directions of $\Phi$. It has been observed in simple situations that L-BFGS (the low memory version of the Broyden-Fletcher-Goldfarb-Shanno algorithm) is sometimes able to converge to an optimal point. In more difficult examples, for instance when concentration occurs, the L-BFGS approach fails to converge. This expected behavior for strongly nonsmooth functions illustrates the need for more specific nonsmooth techniques close to the optimal point. The costly, but reliable, bundle type algorithms have demonstrated their efficiency in this context.

We will not give here a detailed study of quasi-Newton methods applied to optimal transportation, which would be out of the scope of this paper. We only point out that the L-BFGS algorithm combined with a bundle approach gives a rather efficient way to solve this type of problem. We refer to [HMM07] for a careful study and an efficient implementation of this kind of hybrid algorithm.

Gradient computation.
The previous approach relies on the capability of providing, at every iteration, one supergradient vector at the current iterate. It is straightforward to obtain the following characterization of the superdifferential of the discrete dual cost $\Phi$:
Proposition 4.1.
Let $((\varphi_1^k)_k, \dots, (\varphi_I^k)_k) \in \mathbb{R}^{N_k \times I}$. Then $((v_1^k)_k, \dots, (v_I^k)_k) \in \partial \Phi((\varphi_1^k)_k, \dots, (\varphi_I^k)_k)$ if and only if it is a convex combination of the finite set of extremal vectors defined in the following way. Let $k(i,j)^*$ be any selection of minimizing indices involved in the dual cost, that is, for all $i, j$,
$$k(i,j)^* \in \operatorname{argmin}_{k=1,\dots,N_k} \Big\{ \frac{\lambda_i}{2} |y_i^j - z_k|^2 - \varphi_i^k \Big\}. \tag{4.13}$$
Then the set of extremal vectors is the finite collection of all vectors with components
$$e_i^k = -c(N_i) \sum_j \nu_i^j \, \delta_{k(i,j)^*}(k) \tag{4.14}$$
for any selection $(k(i,j)^*)$.

A crucial observation, raised in [MO12], is the fact that the computation of a vector of the superdifferential does not generically require an order of $I \times N_k \times \sum_i N_i$ operations. Actually, the following reformulation makes it possible to use a special data structure called a kd-tree, which in most cases reduces the complexity of finding one vector of $\partial \Phi((\varphi_1^k)_k, \dots, (\varphi_I^k)_k)$. Notice that for very large scale problems, so-called Approximate Nearest Neighbor Search could provide a relevant tool to relax our problem; in all our experiments we performed exact searches.

Let $i, j$ be given and assume we want to evaluate the minimal value
$$M = \min_k \Big\{ \frac{\lambda_i}{2} |y_i^j - z_k|^2 - \varphi_i^k \Big\}.$$
Let us then define $c_i = \max_k \varphi_i^k$, so that
$$M = -c_i + \min_k \Big\{ \frac{\lambda_i}{2} |y_i^j - z_k|^2 + c_i - \varphi_i^k \Big\} = -c_i + \min_k \| P_i^j - Q_i^k \|^2$$
where $P_i^j = \big( \sqrt{\lambda_i/2}\, y_i^j, 0 \big)$ is a fixed vector of $\mathbb{R}^{d+1}$, $Q_i^k = \big( \sqrt{\lambda_i/2}\, z_k, \sqrt{c_i - \varphi_i^k} \big)$, and $\|\cdot\|$ stands for the standard Euclidean norm of $\mathbb{R}^{d+1}$. Thus our supergradient request reduces to identifying a closest point to $P_i^j$ among the points $(Q_i^k)_k$. Observe that the family $(Q_i^k)_k$ does not depend on the index $j$. This task is a standard operation in computational geometry which can be performed efficiently with kd-tree structures. By using such tools, we can reduce the complexity of the supergradient request in the generic case to an order of $I \times \log N_k \times \sum_i N_i$ operations. Observe that if the $N_i$'s and $N_k$ are of the same order $N$, one request is generically of complexity of order $I N \log N$.

Numerical quantization and localization.
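The lifted nearest-neighbor reduction just described can be sketched with scipy's `cKDTree` (an illustration; the authors' implementation and data structures may differ):

```python
import numpy as np
from scipy.spatial import cKDTree

def c_transform_values(y_i, z, phi_i, lam_i):
    """Compute M_j = min_k (lam_i/2)|y_i^j - z_k|^2 - phi_i^k for all j
    with a single kd-tree in R^{d+1} (the tree does not depend on j)."""
    c = phi_i.max()
    s = np.sqrt(0.5 * lam_i)
    P = np.hstack([s * y_i, np.zeros((len(y_i), 1))])     # P_i^j = (s y_i^j, 0)
    Q = np.hstack([s * z, np.sqrt(c - phi_i)[:, None]])   # Q_i^k, c - phi_i^k >= 0
    dist, argmin_k = cKDTree(Q).query(P)                  # O(N_i log N_k) generically
    return dist ** 2 - c, argmin_k
```

The returned minimizing indices are exactly the selections $k(i,j)^*$ of (4.13), so the same query yields both the objective contribution and a supergradient.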
We suppose that all the measures $\mu_i$ are compactly supported. We use the discretization of Section 2.4. The support of the unknown barycenter measure is bounded using the results of Section 2.2, in particular (2.10).

Reconstruction of the barycenter density.
One additional difficulty associated to the dual formulation is the fact that optimal dual vectors only give an implicit description of the barycenter measure. In order to recover the support and the density of the barycenter measure, we introduce the following least squares procedure.

Every optimal dual vector $(\varphi_i^k)_k$ must be associated to an optimal transport from $\mu_i$ to the barycenter measure. A crucial observation is the fact that every associated map transports the $\mu_i$ to the same measure. We exploit this optimality condition to recover the barycenter measure through coefficients $f_i^{j,k}$ which describe the mass transported from $y_i^j$ to $z_k$. By optimality, some mass can be transported from $y_i^j$ to $z_k$ if and only if
$$G_{i,j}(\varphi_i^k) = \min_{k'=1,\dots,N_k} G_{i,j}(\varphi_i^{k'}) \tag{4.15}$$
where $G_{i,j}(\varphi_i^{k'}) = \frac{\lambda_i}{2} |y_i^j - z_{k'}|^2 - \varphi_i^{k'}$. Let us fix some parameter $\varepsilon > 0$. Based on the previous observation, we only consider the unknown coefficients $f_i^{j,k}$ for which $G_{i,j}(\varphi_i^k)$ is less than the optimal value (4.15) plus $\varepsilon$. In order to recover the barycenter measure, we look for the set of coefficients which generate the same measures in an optimal least squares sense. More precisely, we solve the sparse least squares problem
$$\min_{f_i^{j,k}} \ \sum_{l,m} \sum_k \Big( \sum_j f_l^{j,k} \nu_l^j - \sum_{j'} f_m^{j',k} \nu_m^{j'} \Big)^2 \tag{4.16}$$
among non-negative coefficients less than one which satisfy the linear constraints
$$\forall i, j, \quad \sum_k f_i^{j,k} = 1.$$

Figure 9: Classical McCann interpolation between translated measures.

As detailed in the previous paragraphs, our approach relies first on a nonsmooth optimization step using a hybrid L-BFGS/bundle algorithm and a fast computation of supergradient vectors. In a second phase, a sparse least squares problem is solved in order to recover an approximation of the barycenter density. Let us point out that in all the following examples, the first optimization step was stopped after one hour of computation on a standard PC. This costly step could be dramatically sped up by a straightforward parallelization of the cost evaluation.

We validate our approach by considering different test cases for which analytic descriptions of barycenters are available. The simplest situation is the case of the barycenters of a measure of density $\rho(\cdot)$ and a translated measure of density $\rho(\cdot + V)$, where $V$ is some fixed vector. In this trivial case, the isobarycentric measure is of course the measure of density $\rho(\cdot + V/2)$. We illustrate this with $\rho = \chi_C$, where $\chi_C$ is the characteristic function of a unit square. In this experiment, we used a grid of size $200 \times 200$ and a small recovering parameter $\varepsilon$; in the least squares problem (4.16), we obtain a small error for every quadratic term.

Next we applied our approach to the case of Gaussian measures. A complete description of barycenters of Gaussian measures has been given in [AC11]: consider a family of Gaussian measures $\mu_i(m_i, S_i)$ with means $(m_i)_i$ and covariance matrices $(S_i)_i$. Then the barycenter measure associated to the non-negative weights $(\lambda_i)_i$ is a Gaussian measure whose mean is the barycenter of the $(m_i, \lambda_i)_i$. Moreover, its covariance matrix is the only positive definite matrix $S$ solution of the equation
$$\sum_i \lambda_i \big( S^{1/2} S_i S^{1/2} \big)^{1/2} = S. \tag{4.17}$$

Figure 10: Isobarycenter computation of three Gaussian measures by a global (first row) and a localized approach (second row).

We denote by $N(m_i, \sigma_i)$ a Gaussian of mean $m_i$ and of covariance matrix equal to $\sigma_i \mathrm{Id}$. We considered two different test cases, and applied to both our global and localized approaches. In all the experiments, the number of sampling points of the given measures and of the barycenter measure was fixed for the global approach to $N_i = N_z = 15 \times 15$ for all $i$; for the localized approach by Minkowski sum, we imposed $N_i = N_z = 5 \times 5$ for all $i$. The first test case consists in approximating the isobarycenter of three Gaussian measures with randomly chosen means and standard deviations.
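For reference values, the fixed-point characterization (4.17) can be solved by a simple Picard iteration (a sketch under the assumption that the iteration converges, which holds in our tests with positive definite $S_i$; the paper itself obtains barycenters from the dual potentials instead):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_barycenter_cov(S_list, lam, n_iter=100):
    """Iterate S <- sum_i lam_i (S^{1/2} S_i S^{1/2})^{1/2}; its positive
    definite fixed point is the barycenter covariance solving (4.17)."""
    S = np.eye(S_list[0].shape[0])
    for _ in range(n_iter):
        R = sqrtm(S)
        # np.real discards the tiny imaginary parts sqrtm may return
        S = np.real(sum(l * sqrtm(R @ Si @ R) for l, Si in zip(lam, S_list)))
    return S
```

As a 1-D sanity check, the equal-weight barycenter of $N(0,1)$ and $N(0,9)$ has standard deviation $(1+3)/2 = 2$, i.e. covariance $4$, which the iteration recovers.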
The second test case is the family $(N(m_i, \sigma_i), \lambda_i)_{i=1,\dots,5}$, where the $m_i$ are the vertices of a regular pentagon and the parameters $\sigma_i$ and weights $\lambda_i$ take one pair of values for odd $i$ (with $\sigma_i = 1/50$) and another for even $i$. In both cases we again obtain a small error for every quadratic term of (4.16).

Acknowledgements:
The authors are grateful for the hospitality of BIRS, where the present work was initiated on the occasion of a focused research group meeting on numerical methods for optimal transport. They are happy to thank Martial Agueh, Jean-David Benamou and Brendan Pass for many fruitful conversations. The first author gratefully acknowledges the support of the ANR, through the projects ISOTACE and OPTIFORM, and of INRIA through the "action exploratoire" MOKAPLAN.
References

[AC11] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space.
SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011.

Table: errors between the theoretical and the computed Gaussian parameters for the two test cases.

$\|m_i^{th} - m_i\|$   $|\sigma_i^{th} - \sigma_i|$
First test case: 0.003   …
Second test case: 0.002   …

[BB00] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numer. Math., 84(3):375–393, 2000.

[BFO14] Jean-David Benamou, Brittany D. Froese, and Adam M. Oberman. Numerical solution of the optimal transportation problem using the Monge-Ampère equation.
Journal of Computational Physics, 260:107–126, 2014.

[BK12] Jérémie Bigot and Thierry Klein. Consistent estimation of a population barycenter in the Wasserstein space. arXiv preprint arXiv:1212.2562, 2012.

[BLO05] J.V. Burke, A.S. Lewis, and M.L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization.
SIAM Journal on Optimization, 15(3):751–779, 2005.

[Bre91] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued functions.
Comm. Pure Appl. Math., 44(4):375–417, 1991.

[CD14] Marco Cuturi and Arnaud Doucet. Fast computation of Wasserstein barycenters. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 685–693, 2014.

[CE10] Guillaume Carlier and Ivar Ekeland. Matching for teams.
Economic Theory, 42(2):397–418, 2010.

[GB08] Michael Grant and Stephen Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors,
Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.

[GB10] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, May 2010.

[GGM11] Bruno Galerne, Yann Gousseau, and Jean-Michel Morel. Random phase textures: theory and synthesis.
IEEE Trans. Image Process., 20(1):257–267, 2011.

[GM13] Nassif Ghoussoub and Abbas Moameni. A self-dual polar factorization for vector fields.
Comm. Pure Appl. Math., 66(6):905–933, 2013.

[GM14] Nassif Ghoussoub and Bernard Maurey. Remarks on multi-marginal symmetric Monge-Kantorovich problems.
Discrete Cont. Dyn. Syst., 34(4):1465–1480, 2014.

[GS98] Wilfrid Gangbo and Andrzej Świȩch. Optimal maps for the multidimensional Monge-Kantorovich problem.
Communications on Pure and Applied Mathematics, 51(1):23–45, 1998.

[HMM07] Napsu Haarala, Kaisa Miettinen, and Marko M. Mäkelä. Globally convergent limited memory bundle method for large-scale nonsmooth optimization.
Mathematical Programming, 109(1):181–205, 2007.

[HUL96] J.B. Hiriart-Urruty and C. Lemaréchal.
Convex analysis and minimization algorithms, volume 1. Springer, 1996.

[LMfASA78] C. Lemaréchal, R. Mifflin, and International Institute for Applied Systems Analysis.
Nonsmooth optimization: Proceedings of a IIASA workshop. Pergamon Press, 1978.

[LO09] A.S. Lewis and M.L. Overton. Nonsmooth optimization via BFGS. SIAM Journal on Optimization, submitted for publication, 2009.

[McC97] Robert J. McCann. A convexity principle for interacting gases.
Adv. Math., 128(1):153–179, 1997.

[MO12] Quentin Mérigot and Édouard Oudet. Discrete optimal transport: complexity, geometry and applications. Preprint, 2012.

[Pas12a] Brendan Pass. Multi-marginal optimal transport and multi-agent matching problems: uniqueness and structure of solutions. arXiv preprint arXiv:1210.7372, 2012.

[Pas12b] Brendan Pass. On the local structure of optimal measures in the multi-marginal optimal transportation problem.
Calc. Var. Partial Differential Equations, 43(3-4):529–536, 2012.

[Pow76] M.J.D. Powell. Some global convergence properties of a variable metric algorithm for minimization without exact line searches.
Nonlinear Programming, 9:53–72, 1976.

[PPO14] Nicolas Papadakis, Gabriel Peyré, and Edouard Oudet. Optimal transport with proximal splitting.
SIAM Journal on Imaging Sciences, 7(1):212–238, 2014.

[RPDB12] Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot. Wasserstein barycenter and its application to texture mixing. In
Scale Space and Variational Methods in Computer Vision, pages 435–446. Springer, 2012.

[Sch03] Alexander Schrijver.
Combinatorial optimization: polyhedra and efficiency, volume 24. Springer, 2003.

[Vil03] Cédric Villani.
Topics in optimal transportation, volume 58. AMS Bookstore, 2003.

[Vil09] Cédric Villani.