The boundary method for semi-discrete optimal transport partitions and Wasserstein distance computation
Luca Dieci
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

J.D. Walsh III
Naval Surface Warfare Center, Panama City Division (X24), 110 Vernon Ave., Panama City, FL 32407, U.S.A.

Abstract
We introduce a new technique, which we call the boundary method, for solving semi-discrete optimal transport problems with a wide range of cost functions. The boundary method reduces the effective dimension of the problem, thus improving complexity. For cost functions equal to a $p$-norm with $p \in (1, \infty)$, we provide mathematical justification, convergence analysis, and algorithmic development. Our testing supports the boundary method with these $p$-norms, as well as other, more general cost functions.

Keywords: Optimal transport, Monge-Kantorovich, semi-discrete, Wasserstein distance, boundary method
1. Introduction
In this work, we consider a new solution method for optimal transport problems. Numerical optimal transport has applications in a wide range of fields, but the scaling properties and ground cost restrictions of current numerical methods make it difficult to find solutions for many applications. The boundary method we propose focuses on a broad class of optimal transportation problems: semi-discrete optimal transport. Many other techniques assume semi-discrete transport, either implicitly or explicitly, as semi-discrete formulations can be used to approximate solutions to fully continuous problems, and the semi-discrete optimal transport problem is of practical relevance itself.

Key challenges in numerical optimal transport are: (a) the design of numerical methods capable of handling general ground costs, (b) efficient computation of the Wasserstein metric, and

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1650044. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Email addresses: [email protected] (Luca Dieci), [email protected] (J.D. Walsh III)
Preprint submitted to Elsevier, May 3, 2019.

(c) solutions of three (or higher) dimensional problems. The boundary method addresses these concerns by solving problems where the ground cost is a $p$-norm, $p \in (1, \infty)$, and by doing so in a way that reduces the effective dimension of the transport problem.

The theory of optimal transport dates back to the work by Monge in 1781, [1]. In the 1940s, Kantorovich's papers [2, 3] relaxed Monge's requirement that no mass be split, creating what we now know as the Monge-Kantorovich problem.
Definition 1.1 (Monge-Kantorovich problem).
Let $X, Y \subseteq \mathbb{R}^d$, let $\mu$ and $\nu$ be probability densities defined on $X$ and $Y$, and let $c(x, y): X \times Y \to \mathbb{R}$ be a measurable ground cost function. Define the set of transport plans
$$\Pi(\mu, \nu) := \left\{ \pi \in \mathcal{P}(X \times Y) \;\middle|\; \pi[A \times Y] = \mu[A],\ \pi[X \times B] = \nu[B],\ \forall \text{ meas. } A \subseteq X,\ B \subseteq Y \right\}, \quad (1.1)$$
where $\mathcal{P}(X \times Y)$ is the set of probability measures on the product space, and define the primal cost function $P: \Pi(\mu, \nu) \to \mathbb{R}$ as
$$P(\pi) := \int_{X \times Y} c(x, y)\, d\pi(x, y). \quad (1.2)$$
The Monge-Kantorovich problem is to find the optimal primal cost
$$P^* := \inf_{\pi \in \Pi(\mu, \nu)} P(\pi), \quad (1.3)$$
and an associated optimal transport plan
$$\pi^* := \arg\inf_{\pi \in \Pi(\mu, \nu)} P(\pi). \quad (1.4)$$
Kantorovich also identified the problem's dual formulation.

Definition 1.2 (Dual formulation).
Define the set of functions
$$\Phi_c(\mu, \nu) := \left\{ (\varphi, \psi) \in L^1(d\mu) \times L^1(d\nu) \;\middle|\; \varphi(x) + \psi(y) \le c(x, y),\ d\mu\text{-a.e. } x \in X,\ d\nu\text{-a.e. } y \in Y \right\}. \quad (1.5)$$
Let the dual cost function, $D: \Phi_c(\mu, \nu) \to \mathbb{R}$, be defined as
$$D(\varphi, \psi) := \int_X \varphi\, d\mu + \int_Y \psi\, d\nu. \quad (1.6)$$
Then, the optimal dual cost is
$$D^* := \sup_{(\varphi, \psi) \in \Phi_c(\mu, \nu)} D(\varphi, \psi), \quad (1.7)$$
and an optimal dual pair is given by
$$(\varphi^*, \psi^*) := \arg\sup_{(\varphi, \psi) \in \Phi_c(\mu, \nu)} D(\varphi, \psi). \quad (1.8)$$
When the ground cost is a distance function (often but not necessarily Euclidean), Monge-Kantorovich solutions are related to the Wasserstein metric, a distance between probability distributions:
$$W(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\, d\pi(x, y). \quad (1.9)$$
We have $W(\mu, \nu) = P^* = D^*$, and hence, we may refer to any of these as the Wasserstein distance, the optimal transport cost, or simply the optimal cost.

Remark 1. $W(\mu, \nu)$ is often written as $W$, with $\mu$ and $\nu$ implied. Furthermore, as Equation (1.9) makes clear, $W(\mu, \nu)$ also depends on the ground cost function $c(x, y)$. In the literature, the Wasserstein distance formula often assumes the ground cost to be a specific predetermined function, usually the Euclidean distance $\|x - y\|_2$.

Definition 1.3 (Monge problem).
In certain cases, there exists at least one solution to the semi-discrete Monge-Kantorovich problem that does not split transported masses. In other words, there exists some $\pi^*$ such that
$$\pi^*(x, y) = \pi^*_{T^*}(x, y) := \mu(x)\, \delta[y = T^*(x)], \quad (1.10)$$
where $T^*: X \to Y$ is a measurable map called the optimal transport map. When such a $\pi^*$ exists, we say the solution also solves the Monge problem.

If the Monge-Kantorovich problem has a solution which solves the Monge problem, we can assume without loss of generality that every $\pi \in \Pi(\mu, \nu)$ satisfies
$$\pi(x, y) = \pi_T(x, y) := \mu(x)\, \delta[y = T(x)], \quad (1.11)$$
for some measurable transport map $T: X \to Y$, and that the primal cost can be written
$$P(\pi) := \int_X c(x, T(x))\, d\mu(x). \quad (1.12)$$
The semi-discrete optimal transport problem we consider is the Monge-Kantorovich problem of Definition 1.1, with restrictions on $\mu$, $\nu$, and $c$:
(1) Assume that $\mu$ satisfies the following:
(a) $\mu$ is absolutely continuous with respect to the Lebesgue measure.
(b) The support of $\mu$ is contained in the convex compact region $A \subseteq X$. (Since $A \subset \mathbb{R}^d$, it must also be the case that $A$ is simply connected.)
(2) Assume $\nu$ consists of $n \ge 2$ Dirac masses, located at points $\{y_i\}_{i=1}^n \subseteq Y$.
(3) Assume $c$ is a $p$-norm with $p \in (1, \infty)$.
As we will show, each of these conditions is required for one or more of the theorems given in Section 3. Condition (1)(a) ensures that the value of $\mu$ is bounded, which is required to show Wasserstein distance convergence in Theorem 3.25. Conditions (1)(a), (1)(b), (2), and (3) are all used to satisfy the conditions of Corollary 4 of [6], which we apply to show the $\mu$-a.e. uniqueness of the solution in Theorem 3.7.

See also [4, p. 207], for a definition of the Wasserstein metric $W_p$ with $p \in [0, \infty)$. One can also write $\pi^*_{T^*}$ as $(\mathrm{Id} \times T^*)_\# \mu$. Our notation is from [4, p. 3]. The alternative notation is used in [5]. See Theorem 3.6, below, for a full statement of this result.
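Condition (2) makes the target measure finite; when the source measure is discretized as well, Definition 1.1 becomes a finite linear program. In one dimension, with a convex ground cost such as $|x - y|$, the optimal plan is simply the monotone coupling, which the following sketch exploits. This is an illustration only: the function name and toy data are ours, and the monotone shortcut is valid only under these one-dimensional, convex-cost assumptions.

```python
def transport_1d(xs, mu, ys, nu, cost):
    """Monotone (north-west corner) coupling: optimal in 1D for convex costs.
    xs, ys: support points; mu, nu: their masses (balanced); cost: c(x, y)."""
    xs, mu = map(list, zip(*sorted(zip(xs, mu))))
    ys, nu = map(list, zip(*sorted(zip(ys, nu))))
    i = j = 0
    total = 0.0
    while i < len(xs) and j < len(ys):
        m = min(mu[i], nu[j])          # move as much mass as both sides allow
        total += m * cost(xs[i], ys[j])
        mu[i] -= m
        nu[j] -= m
        if mu[i] <= 1e-15:
            i += 1
        if nu[j] <= 1e-15:
            j += 1
    return total
```

For instance, four source points of mass 1/4 at 0, 1, 2, 3 and two targets of mass 1/2 at 0 and 3, with $c(x,y) = |x - y|$, give a total cost of 0.5: points 0 and 1 ship to 0, points 2 and 3 ship to 3.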
1.2.1. Semi-discrete transport and the Monge problem

Since $\mu$ is absolutely continuous, $|S| = 0$ implies $\mu(S) = 0$ for every measurable $S$ in $X$. Hence, $\mu$ is nonatomic. Because $c$ is continuous and $\mu$ is nonatomic, at least one solution to the semi-discrete Monge-Kantorovich problem also satisfies the Monge problem, described in Definition 1.3; see Theorem B in [7]. Thus, by applying Equation (1.11), we can assume without loss of generality that any transport plan $\pi$ partitions $A$ into $n$ sets $A_i$, where $A_i$ is the set of points in $A$ that are transported by the map $T$ to $y_i$. Using this partitioning scheme in combination with Equation (1.12) allows us to rewrite the primal cost function for the semi-discrete problem as
$$P(\pi) := \sum_{i=1}^{n} \int_{A_i} c(x, y_i)\, d\mu(x). \quad (1.13)$$
Using this idea of sets $A_i$, we are ready to describe the shift characterization of the semi-discrete optimal transport problem. The definition of the characterization, which follows, is based on one given by Rüschendorf and Uckelmann in [8, 9].

Definition 1.4 (Shift characterization).
Let $\{a_i\}_{i=1}^n$ be a set of $n$ finite values, referred to as shifts. Define
$$F(x) := \max_{1 \le i \le n} \{a_i - c(x, y_i)\}. \quad (1.14)$$
For $i \in \mathbb{N}_n$, where $\mathbb{N}_n = \{1, \ldots, n\}$, let
$$A_i := \{x \in A \mid F(x) = a_i - c(x, y_i)\}. \quad (1.15)$$
Note that $\cup_{i=1}^n A_i = A$. The problem of determining an optimal transport plan $\pi^*$ is equivalent to determining shifts $\{a_i\}_{i=1}^n$ such that for all $i \in \mathbb{N}_n$, the total mass transported from $A_i$ to $y_i$ equals $\nu(y_i)$.

The shift characterization is derived from the dual cost function given in Equation (1.6). For any $D(\varphi, \psi)$, suppose we define
$$\varphi'(x) = \sup_{y \in Y} \{\psi(y) - c(x, y)\}. \quad (1.16)$$
Then $D(\varphi', \psi) \ge D(\varphi, \psi)$ for all $\psi$. For the semi-discrete problem, $\varphi'$ is exactly Equation (1.14), and the shifts $a_i$ correspond to the value of $\psi$ at each Dirac mass $y_i$. Hence, the discrete problem is no more than a special case of the general continuous problem where $\mu$ is a continuous density function and $\nu$ an empirical measure. For a detailed derivation, see [5].

In the same way, the sets $A_i$ correspond to the subdifferentials $\partial^c \psi(y_i)$. For a general cost function $c$, the sets $A_i$ are referred to in analysis as Laguerre cells, and the map generated by the sets $A_i$ over $A$ is called a Laguerre diagram. As we will discuss further on, the boundaries between Laguerre cells are typically sections of hypersurfaces. When $c(x, y) = \|y - x\|_2^2$, the boundaries are sections of hyperplanes, and the map is called a power diagram. See [10] for a detailed evaluation of this special case. There are also cost functions where, for certain arrangements of $\{y_i\}$, the boundaries between Laguerre cells have positive Lebesgue measure in $\mathbb{R}^d$. An example is shown in Figure 4(c).
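The sets $A_i$ of Definition 1.4 are straightforward to evaluate pointwise: a point $x$ belongs to the Laguerre cell of whichever $y_i$ maximizes $a_i - c(x, y_i)$. The sketch below (our own illustration; the function names are not from the paper) does this for a $p$-norm cost; with all shifts equal, it reduces to ordinary nearest-neighbor (Voronoi) assignment.

```python
def pnorm(x, y, p):
    """p-norm ground cost c(x, y) = ||x - y||_p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def laguerre_cell(x, points, shifts, p=2.0):
    """Index i maximizing a_i - c(x, y_i): the Laguerre cell containing x."""
    return max(range(len(points)),
               key=lambda i: shifts[i] - pnorm(x, points[i], p))
```

With `shifts = [0, 0]` and targets at $(0,0)$ and $(1,0)$, the point $(0.2, 0)$ falls in the first cell; raising the first shift enlarges that cell, pulling in points that nearest-neighbor assignment would send elsewhere.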
1.4. Numerical approaches to the MK problem

Applications of optimal transport are found in many areas of research, including medicine, economics, image processing, machine learning, physics, and many others; e.g., see [11, 12, 13, 14, 15]. For that reason, many people have focused their research on numerical methods for the Monge-Kantorovich problem.

The solution to a semi-discrete problem can be approximated by treating the problem as fully discrete, and the solution to a fully continuous problem can be approximated by treating it as either semi- or fully discrete. By "treating," we refer primarily to assumptions about continuity: in practice, nearly every approach fully discretizes the problem, and the complexity of such approaches is relative to the measure of the discretization.

The semi-discrete problem has received significant attention in its role as a discretization of the continuous problem (where continuity assumptions are employed over $X$ but not $Y$). Substantial effort has been taken to quantify the extent to which solutions to such semi-discrete problems approximate the solution to the original continuous problem; for example, see [16]. However, the semi-discrete problem has interesting applications in its own right. Recent developments include works in economics [17, 18, 19], image processing [20], and optics [21, 22]. In addition, the power and flexibility of Laguerre cell tessellation (vs. Voronoi) drive ongoing research in physics and other fields.

When the ground cost for the semi-discrete problem is the squared 2-norm, $\|\cdot\|_2^2$, significant numerical progress has been achieved. In 1988, Oliker and Prussner introduced what came to be called the Oliker-Prussner algorithm for nonlinear Monge-Ampère-type equations in $\mathbb{R}^2$; see [23]. Oliker and Prussner were significantly ahead of their time.
A 1992 paper by Aurenhammer et al., [24], while describing a different algorithm (Newton's method), explicitly connected Oliker and Prussner's approach to semi-discrete transport and its resulting "Voronoi-type diagrams." In 1998 Aurenhammer et al. published [25], a revision that clarified important details, and incorporated an argument from [6] to guarantee that the sets $A_i$ partition $A$ $\mu$-a.e. More recent algorithms appear in [26, 16].

When sets $A_i$ and $A_j$ share a boundary, for some $i \ne j$, there is a monotone relationship between the volume of $A_i$ and the difference of shifts, $a_i - a_j$. The Oliker-Prussner approach and the boundary method both exploit this relationship, though in very different ways. Whether applying the Oliker-Prussner algorithm or some variation such as Newton's method, the Oliker-Prussner approach begins with approximated sets $\tilde{A}_i$, and directly perturbs the approximated shift difference $\tilde{a}_i - \tilde{a}_j$ in order to bring $\mu(\tilde{A}_i)$ closer to $\nu(y_i)$. This approach is extended over all the shift differences, making it, in essence, a method for solving the Monge-Kantorovich dual problem with $c = \|\cdot\|_2^2$. Because the squared 2-norm is strictly convex, and it ensures that the boundary for each adjacent $A_i$ and $A_j$ is a hyperplane, algorithms based on the Oliker-Prussner approach are generally able to quantify convergence behavior and guarantee termination after a finite number of refinement steps.

Numerous efforts have been made to extend the approach proposed by Oliker and Prussner. An application-focused paper by Caffarelli et al. extends the Oliker-Prussner algorithm to $\mathbb{R}^3$, assuming special geometries [27]. Lévy presents a parallelized Newton's method for three dimensions, one which scales well when $Y$ consists of large numbers of Dirac masses [28].
Other works, such as [29], attempt to integrate the Oliker-Prussner approach with wide-stencil schemes for $c = \|\cdot\|_2^2$. (Some authors refer to a set of shift differences $\{a_i - a_j \mid i, j \in \mathbb{N}_n,\ i < j\}$ as a weight vector.)

A few authors have attempted to develop approaches for ground costs other than the squared 2-norm. Most of these do not employ Oliker-Prussner. In [9], Rüschendorf and Uckelmann report on numerical experiments with ground costs given by the Euclidean distance taken to the powers 2, 3, 4, and 10. They assume that $\mu$ is the uniform distribution, and test various weights and placements for the set $\{y_i\}_{i=1}^n$. When an exact solution cannot be directly determined, they fully discretize the problem and use a linear programming solver.

In [32], Schmitzer works with cost functions $c = \|\cdot\|^p$ for $p \in (1, \infty)$, and applies a form of adaptive scaling done by "shielding" regions: his method attempts to determine points of influence in order to solve primarily local problems. He restricts his examples to $\mathbb{R}^2$.

Solving the semi-discrete problem for the 2-norm is discussed in [33]. Starting with an alternative form of Equation (1.17), taken from [34], Barrett and Prigozhin develop a mixed formulation of the Monge-Kantorovich problem, which they solve using a standard finite element discretization.

Kitagawa's 2014 paper, [35], offers a potentially broad generalization of the Oliker-Prussner algorithm, which works for ground costs other than $\|\cdot\|_2^2$, provided those ground costs satisfy strict conditions, including Strong Ma-Trudinger-Wang; see also [36]. His proposals, while densely theoretical, do not include numerics or an explicit iterative scheme.

As [26] states, the special case $c = \|\cdot\|_2^2$ has two methods specifically designed for solving semi-discrete problems directly: the Oliker-Prussner algorithm and the damped Newton methods proposed in papers like [25].
Both rely on some variant of what we call the Oliker-Prussner approach, described above. However, approaches developed for fully discrete or continuous transport can also be applied to the semi-discrete problems, though with varying degrees of effectiveness. Rüschendorf and Uckelmann apply a discrete linear program solver in [9], and the solver Barrett and Prigozhin use in [33] was developed for continuous transport.

Discrete methods assume a fully discrete $(X, \mu)$ and $(Y, \nu)$, and solve the resulting minimization problem using network flow minimization techniques. As described in [37], there are over 20 established methods for solving such problems, and at least seven software packages capable of handling one or more of these methods.

Most approaches to the fully continuous Monge-Kantorovich problem assume specific ground costs and solve using techniques developed for elliptic partial differential equations, particularly those of the Monge-Ampère type:
$$-\nabla \cdot (a \nabla u) = f, \quad \text{where } |\nabla u| \le 1,\ a \ge 0,\ \text{and } |\nabla u| < 1 \implies a = 0. \quad (1.17)$$
If the ground cost function is strictly convex, or otherwise satisfies the Ma-Trudinger-Wang regularity conditions described in [36], such problems are well-posed. To date, the requirements of well-posedness have largely restricted the application of such continuous methods to well-behaved cost functions such as $\|\cdot\|_2^2$ or a regularized Euclidean distance. Continuous methods currently in use apply finite difference, gradient descent, or the iterative Bregman projections (a.k.a. Sinkhorn-Knopp) algorithm, all attempting to map $X$ to a fully discretized $Y$ [38, 39, 40].

As we will show, the boundary method offers a new approach to solving semi-discrete transport, distinct from all of those described above. By and large, the solution methods described above only work for a specific fixed cost, usually $c = \|\cdot\|_2^2$.
The boundary method quickly solves semi-discrete problems whose ground cost is a $p$-norm, with $p \in (1, \infty)$, and for such costs it provides a global rate of convergence that is proportional to the volume of $A$. (In [33], the partition of $A$ is called an "optimal coupling.")
2. Boundary Method
At a high level, the idea behind the boundary method is simple: track only the boundaries between regions, without resolving the regions' interiors. To do this in practice and obtain an efficient technique, we must account for the interplay between discretization, a mechanism for discarding interior regions, and a fast solver.

At its heart, the boundary method can be viewed as an adaptive refinement technique, one which focuses on the shared region boundaries. The method discards interior regions, but a well-chosen initial discretization prevents any corresponding loss of accuracy. The boundary method's strategy progressively refines the boundaries between individual regions $A_i$. Thus, by the method's very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. The necessary conditions for a well-chosen initial discretization are presented in Theorem 3.21 and discussed in detail in Remark 5.

For all $i, j \in \mathbb{N}_n$ such that $i \ne j$, let
$$A_{ij} := A_i \cap A_j. \quad (2.1)$$
The boundary set is defined as
$$B := \bigcup_{1 \le i < n} \bigcup_{i < j \le n} A_{ij}, \quad (2.2)$$
and for each $i \in \mathbb{N}_n$, let the strict interior of $A_i$ be defined as
$$\mathring{A}_i := A_i \setminus B. \quad (2.3)$$
For all $i, j \in \mathbb{N}_n$ such that $i \ne j$, define $g_{ij}: X \to \mathbb{R}$ as
$$g_{ij}(x) := c(x, y_i) - c(x, y_j). \quad (2.4)$$
By Corollary 3.11 below, $B \ne \emptyset$, and for each $x \in B$ there exist $i, j \in \mathbb{N}_n$, $i \ne j$, such that $x \in A_{ij}$. Because $x \in A_i$, we have $F(x) = a_i - c(x, y_i)$, and because $x \in A_j$, we have $F(x) = a_j - c(x, y_j)$. Combining and rearranging these, we get
$$g_{ij}(x) = a_i - a_j, \quad \forall x \in A_{ij}. \quad (2.5)$$
Thus, Equation (2.5) implies that $A_{ij}$ is a subset of a level set of $g_{ij}$; the value $a_i - a_j$ is constant, regardless of which $x \in A_{ij}$ is chosen.
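Equation (2.5) can be observed numerically. For the squared Euclidean cost (our choice here, purely for illustration), $g_{ij}(x) = \|x - y_i\|^2 - \|x - y_j\|^2$ is affine in $x$, so each level set is a hyperplane, and any point of the shared boundary $A_{ij}$ returns the same value $a_i - a_j$:

```python
def g(x, yi, yj):
    """g_ij(x) = c(x, y_i) - c(x, y_j), here with c the squared Euclidean cost."""
    ci = sum((a - b) ** 2 for a, b in zip(x, yi))
    cj = sum((a - b) ** 2 for a, b in zip(x, yj))
    return ci - cj

# For y_i = (0, 0) and y_j = (1, 0), g depends only on the first coordinate
# (g(x) = 2*x[0] - 1), so g is constant along any vertical line: the level
# set {g = a_i - a_j} is the hyperplane x[0] = (1 + a_i - a_j) / 2.
```

Evaluating `g` at several points of the vertical line $x_1 = 0.75$ returns $0.5$ every time, as Equation (2.5) predicts for a boundary with shift difference $a_i - a_j = 0.5$.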
Using this information, for each $i, j \in \mathbb{N}_n$, $i \ne j$, such that $A_{ij} \ne \emptyset$, we can define the constant shift difference
$$a_{ij} := g_{ij}(x_{ij}), \quad \forall x_{ij} \in A_{ij}. \quad (2.6)$$
Given a sufficiently large set of linearly independent equations of the form given in Equation (2.6), one could determine most or all of the shifts $\{a_i\}_{i=1}^n$. As we show in Theorem 3.13, it is possible to obtain exactly $(n-1)$ linearly independent equations of the desired form, but a set of $n$ such independent equations does not exist.

Since we know that the set of shifts allows exactly one degree of freedom, the boundary method's approach is to obtain $(n-1)$ well-chosen $a_{ij}$ values, fix one $a_i$, and use linearly independent equations of the form given in Equation (2.5) to solve for the remaining $(n-1)$ shifts. The crucial observation is that for the $a_i$'s, there is no need to retain information about the interior of the regions.

The Wasserstein distance can also be computed without saving region interiors. Once we have determined that $R \subset A_i$ for some region $R$, the (partial) Wasserstein distance corresponding to $R$ is equal to
$$P_R := \int_R c(x, y_i)\, d\mu(x), \quad (2.7)$$
and the total Wasserstein distance $P^*$ is equal to the sum of all such partial distances $P_R$, computed over every $A_i$.

Recognizing these facts, inherent in the shift characterization, inspired both the boundary method's name and its guiding principle, summarized below: Do not solve for the entire transport plan; rather, identify region boundaries. To illustrate how this principle is implemented, we present the following example.

Example 2.1.
Let $X = Y = [0, 1]^2$. Assume $\mu$ is the uniform probability density, so for all Borel sets $S \subseteq A$, $\mu(S) = |S|$, and that $\nu$ has uniform discrete probability density, so $\nu(y_i) = 1/n$ for $1 \le i \le n$. Take $n = 5$, with the five points where $\nu$ has nonzero density distributed as shown in Figure 1. Let $c$ be the squared Euclidean norm, $\|y - x\|_2^2$. Suppose a discretization with width $w_2$ is sufficient to provide the desired accuracy, and that we apply the boundary method with a coarser initial width $w_1$.

Assume $\tilde{P}$ is the partial transport cost: the sum cost of transport over all regions $P_R$ so far, where $P_R$ is defined as in Equation (2.7). Each iteration consists of two steps. In Step (1), we discretize the remaining parts of $A$ using the given width, and we solve the discrete transport problem. In Step (2), we compute the transport cost of all boxes in the interior of each region, add those costs to $\tilde{P}$, and discard the computed boxes. For the discard, remove the transported mass from $\nu$, and remove the transported boxes from $A$ (so those regions can be safely ignored during any future discretized transport computations).

Figure 1 shows the state of the boundary method during the first iteration. In Figure 1(a), we have just completed Step (1): the discrete transport map has been computed, but we have not identified interior points or added anything to the partial transport cost $\tilde{P}$. Figure 1(b) shows the state of the algorithm after Step (2): the interior regions have been identified (shown in gray), the partial transport cost has been computed for those regions, giving an updated value of $\tilde{P}$, and those regions have been discarded.

Figure 2 shows the state of the boundary method algorithm during the second iteration. Here, the regions eliminated in Iteration 1 are shown in a darker gray, to distinguish new interiors from those previously removed. In Figure 2(a), Step (1) has just been completed.
As can be seen by comparing Figure 1(b) to Figure 2(a), the boundary and interior regions are the same ones that we had at the end of the first iteration, but refining the boundary set to width $w_2$ allows us to compute a more refined transport map. Since the regions in gray were discarded at the end of Iteration 1, Step (2), they are not part of the discrete transport solution computed during Iteration 2. Because Step (1) does not add to the identified interior regions, the partial Wasserstein distance $\tilde{P}$ is also unchanged from Figure 1(b).

[Figure 1: Iteration 1, Steps (1) and (2), at width $w_1$; computed regions shown in gray.]

After Step (2) of the second iteration, shown in Figure 2(b), more of the interiors have been identified. The partial transport cost shows a corresponding increase in $\tilde{P}$. Because we have achieved our desired refinement, a width of $w_2$, we end the iterative process.

We have not computed any transport cost for the white areas remaining in Figure 2(b). Hence, $\tilde{P}$ is strictly less than the actual transport cost $P^*$. We may want to perform further computations on those white areas in order to approximate the remaining transport cost and calculate an error bound for our approximation.

[Figure 2: Iteration 2, Steps (1) and (2), at width $w_2$; computed regions shown in gray.]

We will now formalize the process described in Example 2.1. As described below, the boundary method generates a grid $A_r$ over the unevaluated region of $A$, and uses it to determine the subgrid $B_r$ containing the boundary set $B$. This subgrid is determined by finding an optimal transport solution from the grid $A_r$ to the point set $\{y_i\}_{i=1}^n$. Although not strictly necessary, we will restrict ourselves to $A = [0, l]^d$ and apply a Cartesian grid over that region.
At the $r$-th refinement level of the algorithm, the grid will thus consist of a collection of boxes with width $w_r$ in each dimension of our discretization. By a slight abuse of notation, we use $x_r$ to refer to such a box, centered at the point $x$. Thus, $\mu(x_r)$ refers to the $\mu$-measure of the box of width $w_r$ centered at $x$.

Neighboring boxes are those with center points that differ by no more than one unit in any discretization index. The set of neighbors of $x$ is denoted $N(x)$ (defined in Equation (3.11), below). Because regions of $\mu$-measure zero need not be transported to any particular $y_i$, boxes of positive weight that are adjacent to such regions are always retained. We refer to such a box as an edge box. Thus, the set of edge boxes is
$$\mathrm{edg}(A_r) := \{x \in A_r \mid \mu(x) > 0 \text{ and } \exists\, x_n \in N(x) \text{ such that } \mu(x_n) = 0\}. \quad (2.8)$$
Because $A$ contains the support of $\mu$, every box of positive mass that is adjacent to the boundary of $A$ is an edge box. A box whose neighbors and itself all have positive measure is referred to as an internal box. The set of internal boxes is
$$\mathrm{int}(A_r) := \{x \in A_r \mid \mu(x) > 0 \text{ and } \mu(x_n) > 0\ \forall\, x_n \in N(x)\}. \quad (2.9)$$
Boxes of $\mu$-measure zero are not part of $\mathrm{edg}(A_r)$ or $\mathrm{int}(A_r)$, and they are discarded when the optimal transport problem is solved. We need not be concerned about losing a region $A_i$ due to this discard process, since this would imply $\mu(A_i) = \nu(y_i) = 0$, which contradicts the conditions in Section 1.2.

Region interiors are identified by comparing the destination of each $x \in \mathrm{int}(A_r)$ to the destinations of its neighbors. Edge boxes are never considered part of a region interior, so they are passed directly to $B_r$. In order to remove identified region interiors, we also maintain a running total of the untransported mass, given by the partial measure $\tilde{\nu}$. To preserve the balance of the transport problem, each time a region $x_r$ is transported from $A$ to $y_i$, the remaining amount that can be transported to $y_i$, $\tilde{\nu}(y_i)$, must be reduced by $\mu(x_r)$.

We can approximate the Wasserstein distance $P^*$ by generating a running total over region interiors: $\tilde{P}$. This $\tilde{P}$ is an increasing function of $r$, and for all $r$, $P^* \ge \tilde{P}$. The Wasserstein distance over any remaining boundary region is evaluated at completion.

Remark 2.
Further approximations may be required for a truly general algorithm. Depending on $\mu$, it may be necessary to approximate the mass of each box, $\mu(x_r)$. Depending on $\mu$ and $c$, the Wasserstein distance over each box, given by $\int_{x_r} c(z, y_i)\, d\mu(z)$, may also require approximation. However, in this work we assume that the integrals can be computed exactly. In practice, this is not a significant limitation. Most numerical applications focus on the exactly-computable cases where $\mu$ is uniform and $c$ is the Euclidean or squared-Euclidean distance. Furthermore, as we show in Section 4.1, the set of exactly-computable options is quite large.

Boundary method algorithm
(0) Set $\tilde{P} = 0$, $\tilde{\nu} = \nu$, and $r = 1$. Create $A_1 = A$ from $A$.
(1) Solve the discretized transport solution.
(2) For each $x \in \mathrm{int}(A_r)$: Are the neighbors of $x$ all transported to the same $y_i$?
  • If so, then $x_r$ is in the interior of $A_i$:
    – [optional] Add $\int_{x_r} c(z, y_i)\, d\mu(z)$ to $\tilde{P}$.
    – Reduce the value of $\tilde{\nu}(y_i)$ by $\mu(x_r)$.
    – Remove $x$ from $\mathrm{int}(A_r)$.
  The sets $\mathrm{edg}(A_r)$ and the reduced set $\mathrm{int}(A_r)$ combine to form $B_r$.
(3) Is the desired refinement reached?
  • If not:
    – Refine $B_r$ to create $A_{r+1}$, increment $r$, and go to Step (1).
Optionally, once the desired refinement level is reached:
(4) Use $B_r$ to identify $(n-1)$ appropriate shift differences $\{a_{ij}\}$ and solve for the shifts $\{a_i\}_{i=1}^n$.
(5) Use $\tilde{P}$ and $B_r$ to approximate $W(\mu, \nu)$.

2.2.1. Step (1): solving the discrete optimal transport problem

The proofs in Section 3 assume the discrete solver is exact, but in practice we achieve good results using solvers whose error satisfies reasonable bounds. Thus, the ideal discrete algorithm should be fast, have controlled error, and possess reasonable scaling properties. To satisfy these requirements, and to bypass the shortcomings of standard discrete approaches, we have turned to the distributed relaxation methods known as auction algorithms; see [41] and [42]. (As it turns out, there are natural connections between auction algorithms and the Oliker-Prussner algorithm for semi-discrete transport; see [43] for details.)

We chose to apply a new auction algorithm, the general auction, which we developed and presented in [44]. The general auction is so named because it is based directly on the (more general) real-valued transport problem, rather than the integer-valued assignment problem which forms the foundation of other auction algorithms. As described in [44], it offers significant performance advantages over other auction algorithms. Public domain C++ software implementing the general auction can be found on the Internet at [45].

Once we have reached a desired level of refinement for the boundary, we can use the set $B_r$ to identify $(n-1)$ shift differences $a_{ij}$. Finding the shift differences is not necessary once we have the boundary (which is why Step (4) is optional), but the shift differences allow one to reconstruct the entire transport map. By completing Step (4), one can reduce the transport map in $\mathbb{R}^d$ to a set of $n$ real numbers $a_i$, greatly reducing storage requirements. Also, building the reconstructed transport map, and comparing the value of each $\mu(A_i)$ to its corresponding $\nu(y_i)$, effectively evaluates the actual (vs. worst case) error associated with the boundary method's solution.

It is also worth considering that the exact shifts $\{a_i\}_{i=1}^n$ correspond to a transport map giving the exact optimal solution of our semi-discrete problem. The approximated shifts $\{\tilde{a}_i\}_{i=1}^n$, unless generating the same shift differences, correspond to a transport map giving the exact optimal solution to a different semi-discrete problem, one whose measure $\nu$ at each $y_i$, $i \in \mathbb{N}_n$, corresponds to the value of $\mu(\tilde{A}_i)$. Hence, $|\mu(\tilde{A}_i) - \nu(y_i)|$ is the error in measure when approximating $A_i$ by $\tilde{A}_i$.

2.2.3. Step (5): approximating the Wasserstein distance
1) shift di ff erences a i j . Finding the shift di ff erences is not necessary oncewe have the boundary (which is why Step (4) is optional), but the shift di ff erences allow one toreconstruct the entire transport map.By completing Step (4), one can reduce the transport map in R d to a set of n real numbers a i , greatly reducing storage requirements. Also, building the reconstructed transport map, andcomparing the value of each µ ( A i ) to its corresponding ν ( y i ), e ff ectively evaluates the actual (vs.worst case) error associated with the boundary method’s solution.It is also worth considering that the exact shifts { a i } ni = correspond to a transport map givingthe exact optimal solution of our semi-discrete problem. The approximated shifts { ˜ a i } ni = , unlessgenerating the same shift di ff erences, correspond to a transport map giving the exact optimalsolution to a di ff erent semi-discrete problem, one whose measure ν at each y i , i ∈ N n , correspondsto the value of µ ( ˜ A i ). Hence, | µ ( ˜ A i ) − ν ( y i ) | is the error in measure when approximating A i by ˜ A i .11 .2.3. Step (5): approximating the Wasserstein distance Because some applications focus on determining the transport map, rather than the Wasser-stein distance, Step (5) is optional. One could also skip the computation of (cid:101) P in Step (2), sincethe Wasserstein distance can be computed in full using only the transport map defined by theboundary set. However, we find it convenient to compute as much of the distance as possiblewithin the boundary method algorithm, establishing (cid:101) P one box at a time during Step (2). Bythe time we reach Step (5), the partial Wasserstein distance (cid:101) P includes the exact cost of all theidentified interior regions, and all that remains is to determine the cost of the regions associatedwith B r .
3. Mathematical support
In this section, we provide mathematical support for the boundary method, assuming that allcomputations are solved exactly: both the discrete optimal transport problems handled by thegeneral auction and the determinations of mass and Wasserstein distance for individual boxes(see Remark 2). We present three types of results: on the shift characterization, on our system ofequations, and, finally, on the boundary method itself.
Here we examine the features of the shift characterization, defined in Section 1.3, and con-sider what they can tell us about the semi-discrete optimal transport problem itself. While manyof these results can be found in other works (e.g., [5]), detailing them fixes notation and sets thestage for the original theorems developed in the following sections.First, in Lemmas 3.1 and 3.2, we develop theoretical support for the boundary method.
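Before stating these results, it may help to see the shift characterization exercised numerically. The following sketch (our own toy setup, not code from the paper; the target points and shifts are arbitrary) assigns sample points to regions via $F(x) = \max_k \left(a_k - c(x, y_k)\right)$ and verifies that every assigned point satisfies the defining inequality $c(x, y_i) - c(x, y_j) \leq a_i - a_j$.

```python
import random

# Numeric sanity check of the shift characterization (our own toy setup,
# not code from the paper): a point x assigned to region i by
# F(x) = max_k (a_k - c(x, y_k)) always satisfies
# c(x, y_i) - c(x, y_j) <= a_i - a_j for every other j.

def c(x, y, p=2):
    """Ground cost: p-norm distance in R^2."""
    return (abs(x[0] - y[0]) ** p + abs(x[1] - y[1]) ** p) ** (1.0 / p)

def check_shift_inequality(targets, shifts, samples=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(samples):
        x = (rng.random(), rng.random())
        # region assignment via the shift characterization
        i = max(range(len(targets)),
                key=lambda k: shifts[k] - c(x, targets[k]))
        for j in range(len(targets)):
            if j != i and (c(x, targets[i]) - c(x, targets[j])
                           > shifts[i] - shifts[j] + 1e-9):
                return False
    return True

targets = [(0.2, 0.3), (0.7, 0.6), (0.5, 0.9)]   # arbitrary y_i
shifts = [0.1, 0.0, -0.05]                        # arbitrary a_i
print(check_shift_inequality(targets, shifts))    # → True
```

The check passes for any choice of targets and shifts: it is a direct numerical restatement of the fact that the maximizing index in $F$ dominates every competing difference $a_j - c(x, y_j)$.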
Lemma 3.1.
Let $a_i$ and $A_i$ be defined as in Definition 1.4. Fix $i \in \mathbb{N}_n$. If $x \in A_i$ and $j \in \mathbb{N}_n$, $j \neq i$, then the following hold:
$$g_{ij}(x) \leq a_i - a_j, \quad (3.1)$$
$$g_{ij}(x) = a_i - a_j \iff x \in A_{ij}, \quad (3.2)$$
$$g_{ij}(x) < a_i - a_j \iff x \in A_i \setminus A_j, \quad (3.3)$$
where $g_{ij}$ is defined in Equation (2.5) and $A_{ij}$ in Equation (2.1).

Proof. Let us show Equation (3.1). By the definitions of $A_i$ and $F$, $a_i - c(x, y_i) = F(x) \geq a_j - c(x, y_j)$. Rearranging terms gives $c(x, y_i) - c(x, y_j) \leq a_i - a_j$.

To show Equation (3.2), first note that Section 2.1 already explains how $x \in A_{ij}$ implies $g_{ij}(x) = a_i - a_j$. Consider the converse: assume that $g_{ij}(x) = a_i - a_j$. Rewriting, we find that $a_j - c(x, y_j) = a_i - c(x, y_i) = F(x)$, with $F$ defined in Equation (1.14). This implies $x \in A_j$, and since $x \in A_i$, therefore $x \in A_{ij}$. Equation (3.3) is a consequence of Equations (3.1) and (3.2).

Lemma 3.2.
Let $a_i$ and $A_i$ be defined as in Definition 1.4 and $A_{ij}$ as in Equation (2.1). Assume $c$ satisfies the triangle inequality. For all $i, j \in \mathbb{N}_n$, $i \neq j$:
(a) If $c(y_i, y_j) = a_i - a_j$, then $A_j \subseteq A_{ij}$.
(b) If $c(y_i, y_j) < a_i - a_j$, then $A_j = \emptyset$.

Proof. For Part (a), because $c$ satisfies the triangle inequality, for all $x \in A$,
$$c(x, y_i) \leq c(x, y_j) + c(y_i, y_j) = c(x, y_j) + a_i - a_j, \quad \text{so} \quad a_j - c(x, y_j) \leq a_i - c(x, y_i). \quad (3.4)$$
Suppose $x \in A_j$. Then $a_i - c(x, y_i) \geq a_j - c(x, y_j) = F(x)$, by Equation (1.14). Because $F$ is defined as the maximum such difference, this implies $a_i - c(x, y_i) = F(x)$, and so $x \in A_i$. Further, since $x$ is an element of both $A_i$ and $A_j$, $x \in A_{ij}$. Therefore, $A_j \subseteq A_{ij}$.

To show (b), note that the strict inequality now turns (3.4) into $a_j - c(x, y_j) < a_i - c(x, y_i)$. Hence, for all $x \in A$, $F(x) \geq a_i - c(x, y_i) > a_j - c(x, y_j)$, so no point attains the maximum at $y_j$. Therefore, $A_j = \emptyset$.

Lemma 3.3.
Let $F(x)$ be defined by Equation (1.14). If the ground cost function $c(x, y)$ is continuous on $X \times Y$, then $F(x)$ is a continuous function of $x$.

Proof. Assume $c$ is continuous on $X \times Y$. Then, for all $i \in \mathbb{N}_n$, $a_i - c(x, y_i)$ is a continuous function of $x$. Since $F$ is the maximum of a finite set of continuous functions, $F$ is itself a continuous function of $x$.

Definition 3.4 ($F$ induces a $\mu$-partition of $A$). Let $F$ be as defined in Equation (1.14), and the sets $A_i$ as defined in Equation (1.15) for $i \in \mathbb{N}_n$. Then one says $F$ induces a $\mu$-partition of the set $A$ if
1. $\mu(A) < \infty$,
2. for all $i, j \in \mathbb{N}_n$, $i \neq j$, $\mu(A_{ij}) = 0$ (for $A_{ij}$ as defined in Equation (2.1)),
3. $\sum_{i=1}^n \mu(A_i) = \mu(A)$, and
4. for all $i \in \mathbb{N}_n$, $\mu(A_i) = \nu(y_i) > 0$.

Lemma 3.5.
Suppose one has a semi-discrete transport problem, as described in Section 1.2. Let $F$ be as defined in Equation (1.14), the sets $A_i$ as defined in Equation (1.15) for $i \in \mathbb{N}_n$, and $B$ as defined in Equation (2.2). Then $F$ induces a $\mu$-partition of $A$ if and only if $\mu(B) = 0$.

Proof. If $F$ induces a $\mu$-partition of $A$, then by Definition 3.4, $\mu(B) = 0$. For the converse, assume $F$ and the sets $A_i$ are defined as given, let $A_{ij}$ be defined by Equation (2.1), and assume $\mu(B) = 0$. Because $\mu$ is a probability density function, $\mu(A) = 1 < \infty$. Because $\mu$ is a non-negative measure and each $A_{ij} \subseteq B$, $\mu(B) = 0$ implies that, for all $i, j \in \mathbb{N}_n$, $i \neq j$, $\mu(A_{ij}) = 0$. For any $\mu$-measurable sets $S_1, S_2 \subseteq X$ with $S = S_1 \cup S_2$,
$$\mu(S_1) + \mu(S_2) = \mu(S) + \mu(S_1 \cap S_2),$$
and since $\mu(X) < \infty$,
$$\mu(S) = \mu(S_1) + \mu(S_2) - \mu(S_1 \cap S_2).$$
Proceeding inductively, it follows that
$$S = \bigcup_{i=1}^n S_i, \ \text{all } S_i \ \mu\text{-measurable} \implies \mu(S) = \sum_{i=1}^n \mu(S_i) - \sum_{i=1}^n \sum_{\substack{j=1 \\ j \neq i}}^n \mu(S_i \cap S_j).$$
Hence,
$$1 = \mu(A) = \sum_{i=1}^n \mu(A_i) - \sum_{i=1}^n \sum_{\substack{j=1 \\ j \neq i}}^n \mu(A_{ij}) = \sum_{i=1}^n \mu(A_i).$$
For all $i, j \in \mathbb{N}_n$, $i \neq j$, $\mu(A_i \cap A_j) = 0$, and therefore, by the shift characterization of Definition 1.4, $\mu(A_i) = \nu(y_i)$.

Remark 3.
Instances of $\mu(B) > 0$ appear quite often, though (as we will show) $\mu(B) = 0$ for the $p$-norm cost functions we have assumed. For an example of $\mu(B) > 0$ in the literature, see Figure 37 of [46]. We include a nearly identical example as Figure 4(c) of our paper, along with a discussion of this behavior.

Given our definition of the semi-discrete problem in Section 1.2, Corollary 4 of [6] provides a sufficient condition for the existence of a Monge solution that is unique $\mu$-a.e. For convenience, we restate their conclusion here as the following:

Theorem 3.6.
Given the definition of $g_{ij}$ in Equation (2.5), suppose that the support of $\nu$ is finite, $c$ is continuous, $\mu$ is tight, and
$$\mu\left(\{x \in A \mid g_{ij}(x) = k\}\right) = 0 \quad \forall i, j \in \mathbb{N}_n, \ i \neq j, \ \forall k \in \mathbb{R}. \quad (3.5)$$
Then there exists an optimal transport map, $T : X \to Y$, that solves the Monge problem, and $T$ is $\mu$-a.e. unique.

This condition leads directly to the following theorem.
Theorem 3.7.
A semi-discrete transport problem, as described in Section 1.2, has an associated transport map $T$, a function $F$, as described in Equation (1.14), and sets $\{A_i\}_{i=1}^n$, as described in Equation (1.15), such that for all $x \in A$,
$$x \in \mathring{A}_i \ \text{for some} \ i \in \mathbb{N}_n \implies T(x) = y_i, \quad (3.6)$$
where $\mathring{A}_i$ is the strict interior of $A_i$, as defined in Equation (2.3). In other words, $F$ induces a $\mu$-partition of $A$ and $T$ agrees with $F$ on $A \setminus B$. Furthermore, $T$ is unique $\mu$-a.e.

Proof. Let $A_{ij}$ be defined as given in Equation (2.1), $B$ as given in Equation (2.2), and $g_{ij}$ as in Equation (2.5). Consider the requirements given in Section 1.2. Condition (2) ensures that $\nu$ is finite, and Condition (3) implies $c$ is continuous. We know that $A \subseteq \mathbb{R}^d$, so $A$ is a Polish space, and Condition (1)(b) assures us that $A$ is compact. Because every probability measure on a compact Polish space is tight, $\mu$ must be tight. Because Condition (3) requires that the ground cost be equal to a $p$-norm with $p \in (1, \infty)$,
$$\left|\{x \in A \mid g_{ij}(x) = k\}\right| = 0 \quad \forall i, j \in \mathbb{N}_n, \ i \neq j, \ \forall k \in \mathbb{R}, \quad (3.7)$$
where $|\cdot|$ denotes the Lebesgue measure; see, e.g., Theorem 3.2 of [47, p. 29]. By Condition (1)(a), $\mu$ is absolutely continuous, and so
$$\mu\left(\{x \in A \mid g_{ij}(x) = k\}\right) = 0 \quad \forall i, j \in \mathbb{N}_n, \ i \neq j, \ \forall k \in \mathbb{R},$$
as required by Equation (3.5) (see [48] for another argument). Therefore, the conditions of Theorem 3.6 are satisfied.

Let the function $F$ and sets $\{A_i\}_{i=1}^n$ be as described in Definition 1.4. For any $i, j \in \mathbb{N}_n$, $i \neq j$, $A_{ij} \subseteq \{x \in A \mid g_{ij}(x) = k\}$ for some fixed $k \in \mathbb{R}$. Hence, it follows that $\mu(B) = 0$, and thus, by Lemma 3.5, $F$ induces a $\mu$-partition of $A$. Therefore, we can construct a transport map $T$ that satisfies the semi-discrete problem and agrees with $F$ on $A \setminus B$. Furthermore, by Theorem 3.6, $T$ is unique $\mu$-a.e.

Remark 4.
A close reading of the proof of Theorem 3.7 reveals that the guarantee of $\mu(B) = 0$ derives directly from the fact that $|B| = 0$; see Equation (3.7). Absolute continuity does the rest. In practice, this means that the boundary method forces a unique transport map on all of $A$, even in regions where $\mu$ vanishes and any other map would achieve the same Wasserstein measure. For an example of this, see Figure 6. This behavior stems from a natural (unstated) corollary to Theorem 3.7: the boundaries identified by our method are a.e. unique with respect to the Lebesgue measure. The convexity of $A$ is required to guarantee the existence of the requisite boundaries. Otherwise, the network of regions might not form a connected graph.

3.2. Existence of linearly independent boundary equations

To prove the existence of $(n-1)$ linearly independent equations of the form shown in Equation (2.6), we will investigate the structure of the boundary set using a connected graph.

Definition 3.8.
Assume the definition of $A_{ij}$ given in Equation (2.1). Let $G$ be a graph with $n$ vertices $v_1, \ldots, v_n$. The edge $(v_i, v_j)$ is contained in the edge set of $G$ if and only if $A_{ij}$ is non-empty. We refer to $G$ as the adjacency graph of our transport problem.

Lemma 3.9.
Let $G$ be defined as given in Definition 3.8. If the set $A$ is convex and compact, then $G$ is a connected graph.

Proof. Assume to the contrary that $G$ is not a connected graph. Then we can write $G$ as the union of two disjoint nonempty subgraphs, $G = G_1 \cup G_2$, such that no vertex $v_1$ in $G_1$ has a path connecting it to any vertex $v_2$ in $G_2$. Construct
$$\tilde{A}_1 := \bigcup_{v_i \in G_1} A_i \quad \text{and} \quad \tilde{A}_2 := \bigcup_{v_j \in G_2} A_j,$$
where each subset is defined as in Equation (1.15). Since $G_1 \neq \emptyset$ and $G_2 \neq \emptyset$, $\tilde{A}_1 \neq \emptyset$ and $\tilde{A}_2 \neq \emptyset$. Because $G_1$ and $G_2$ are disjoint, and no paths connect them, it follows that $\tilde{A}_1 \cap \tilde{A}_2 = \emptyset$. Since the union of $G_1$ and $G_2$ is $G$, $\tilde{A}_1 \cup \tilde{A}_2 = A$.

Suppose $A_i \subseteq \tilde{A}_1$ and $A_j \subseteq \tilde{A}_2$. Then $A_{ij} = \emptyset$. Because $A$ is a compact set, $A$ is closed and bounded, and hence the definition given in Equation (1.15) implies that $A_i$ and $A_j$ must each also be closed and bounded. Thus, $A_i$ and $A_j$ are disjoint compact sets in the Hausdorff space $\mathbb{R}^d$. This implies $A_i$ and $A_j$ are separated by some positive distance $\epsilon_{ij}$. Because this is true for all $A_i \subseteq \tilde{A}_1$ and $A_j \subseteq \tilde{A}_2$, there exists $\epsilon > 0$, the minimum over all such $\epsilon_{ij}$.

Let $x_1 \in \tilde{A}_1$, $x_2 \in \tilde{A}_2$, and for all $t \in [0, 1]$ define $x_t = (1-t)x_1 + t x_2$. Since the two sets are separated by at least $\epsilon > 0$, there exists an interval $(t_1, t_2) \subseteq [0, 1]$, of positive length, such that $t \in (t_1, t_2)$ implies $x_t \notin \tilde{A}_1 \cup \tilde{A}_2 = A$. This contradicts the convexity of $A$. Hence, $G$ is connected. (For a different approach, where the cost is the squared-Euclidean distance, see [10].)

Corollary 3.10.
Assume $n \geq 2$ and let $A_{ij}$ be defined by Equation (2.1). If $i \in \mathbb{N}_n$, there exists $j \in \mathbb{N}_n$ such that $j \neq i$ and $A_{ij} \neq \emptyset$.

Proof. Assume the contrary for some $i$, and apply Definition 3.8. Since $n \geq 2$, $G$ includes at least two vertices, and $v_i$ is disconnected from the rest of $G$, which contradicts Lemma 3.9.

Corollary 3.11.
Let $A_{ij}$ be defined by Equation (2.1) and $B$ by Equation (2.2). If $n \geq 2$, then the boundary set $B$ is nonempty, and for each $x \in B$, there exist $i, j \in \mathbb{N}_n$ such that $i \neq j$ and $x \in A_{ij}$.

Proof. This follows from Corollary 3.10 and the definition of $B$ in Equation (2.2).

Lemma 3.12.
Assume a shift characterization, as described in Definition 1.4, where $n \geq 2$ and the shifts $\{a_i\}_{i=1}^n$ are unknown. Let $G$ be the adjacency graph of the transport problem given in Definition 3.8, and let $H$ be a subgraph of $G$ that includes all $n$ vertices. Define the system of equations
$$S := \{a_i - a_j = a_{ij} \mid (v_i, v_j) \in \text{the edge set of } H\}, \quad (3.8)$$
where each $a_{ij}$ is given by some constant. The system of equations $S$ is linearly independent with respect to the shifts $\{a_i\}_{i=1}^n$ if and only if $H$ contains no cycles.

Proof. ($\implies$) Suppose $H$ contains the cycle $(v_{i_1}, v_{i_2}, \ldots, v_{i_k}, v_{i_1})$. Then $S$ contains the linear system
$$M \begin{pmatrix} a_{i_1} \\ a_{i_2} \\ \vdots \\ a_{i_k} \end{pmatrix} = \begin{pmatrix} a_{i_1 i_2} \\ a_{i_2 i_3} \\ \vdots \\ a_{i_k i_1} \end{pmatrix}, \quad \text{where} \quad M = \begin{pmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & \ddots & \ddots \\ -1 & & & 1 \end{pmatrix}.$$
Because each row of $M$ sums to zero, $\det(M) = 0$, and we know $S$ is linearly dependent.

($\impliedby$) Suppose instead that $S$ is linearly dependent. Given the form of the equations in $S$, we can assume without loss of generality that $S$ contains the equations $a_{i_j i_{j+1}} = a_{i_j} - a_{i_{j+1}}$ for all $j \in \mathbb{N}_{k-1}$, and that $a_{i_1 i_k} = a_{i_1} - a_{i_k}$ is also in $S$. By the definition of $S$, these equations imply that the edges $(v_{i_1}, v_{i_2}), (v_{i_2}, v_{i_3}), \ldots, (v_{i_{k-1}}, v_{i_k})$, and $(v_{i_k}, v_{i_1})$ are contained in $H$. Together, these edges generate the cycle $(v_{i_1}, v_{i_2}, \ldots, v_{i_k}, v_{i_1})$, so $H$ contains at least one cycle.

Theorem 3.13.
Assume a shift characterized problem, as described in Definition 1.4, where $n \geq 2$ and the shifts $\{a_i\}_{i=1}^n$ are unknown. Then there exists at least one system of exactly $(n-1)$ equations of the form $a_i - a_j = a_{ij}$ that is linearly independent with respect to the set of shifts $\{a_i\}_{i=1}^n$, with each $a_{ij}$ constant. No system of $n$ independent equations exists.

Proof. Let $G$ be as given in Definition 3.8. Because $G$ is a connected graph, we can always create a spanning tree $H$ that is a subgraph of $G$. Let $S$ be the corresponding set of linear equations, defined as described in (3.8). As a spanning tree, $H$ contains $(n-1)$ edges and no cycles, so by Lemma 3.12, we know $S$ contains exactly $(n-1)$ linearly independent equations.

Suppose a set $S$ of $n$ linearly independent equations exists, all of the form $a_i - a_j = a_{ij}$. Because there are $n$ unknowns in the set of shifts, there is exactly one solution set $\{a_i\}_{i=1}^n$. Fix $\sigma \neq 0$ and, for all $i \in \mathbb{N}_n$, define $\tilde{a}_i = a_i + \sigma$. For each equation in $S$, $\tilde{a}_i - \tilde{a}_j = a_i - a_j = a_{ij}$. Thus, $\{\tilde{a}_i\}_{i=1}^n$ is also a solution to $S$. This contradicts the uniqueness of $\{a_i\}_{i=1}^n$, and therefore no such set of $n$ linearly independent equations exists.

3.3. Discretization for the boundary method

In the first two subsections below, we give some results on how the grid points interact with the underlying space. In Sections 3.3.3 and 3.3.4 we present error bounds. In Section 3.3.5 we consider issues of volume and containment: there we ensure that one can have $B \subseteq \bar{B}^r$ for all $r$, and show that $|\bar{B}^r| \to 0$ as $r \to \infty$. Finally, Section 3.3.6 puts bounds on the error for the Wasserstein distance approximation.

As described in Section 2.2, we discretize the region $A$ using a regular Cartesian grid, and refine the grid over multiple iterations, with the aim of refining only the grid region containing the boundary set.

Definition 3.14.
Let $\mathcal{V}$ be the set of adjacency vectors for all discretizations of $A$. We choose $\mathcal{V}$ to be the linear combinations of the standard unit vectors $e_1, \ldots, e_d$ with coefficients $0$ and $\pm 1$. We specifically exclude the zero vector from the set, so $|\mathcal{V}| = 3^d - 1$. If $d = 2$, $\mathcal{V}$ equals
$$\mathcal{V}_2 := \{(-1,-1),\ (0,-1),\ (-1,0),\ (-1,1),\ (1,-1),\ (1,0),\ (0,1),\ (1,1)\}. \quad (3.9)$$
Let $r \in \mathbb{N}$ be the current discretization level, and $w = w_r$ be the width of the discretization at level $r$. Let $A^r$ be the $r$-th point set, the set of points $x$ included in the $r$-th discretization of $A$. Since we discard boxes of $\mu$-measure zero during the transport step, assume without loss of generality that $\mu(\langle x \rangle_r) > 0$ for all $x \in A^r$, where $\langle x \rangle_r$ denotes the box of width $w_r$ centered at $x$.

For each iteration $r$, let
$$A_i^r = A_i \cap A^r \quad (3.10)$$
for all $i \in \mathbb{N}_n$. For all $x \in A^r$, the points in $A^r$ that are adjacent to $x$ constitute a subset of the neighbors of $x$,
$$N(x) := \{x + w_r v \mid v \in \mathcal{V}\}. \quad (3.11)$$

Lemma 3.15.
Let $A^r$ be the set of points included in the $r$-th discretization of $A$, and assume the definition of $N$ given in Equation (3.11). For all $x_1, x_2 \in A^r$, if $x_1 \in N(x_2)$, then $x_2 \in N(x_1)$.

Proof. This follows from Equation (3.11) and the adjacency vectors established in Definition 3.14: $\mathcal{V}$ is symmetric under negation, since for all $k \in \mathbb{N}_d$, $e_k \in \mathcal{V} \iff -e_k \in \mathcal{V}$.

We now formalize our idea of the $r$-th interior and boundary point sets used in our discretization. For all $i \in \mathbb{N}_n$, define the $r$-th iteration interior point set associated with $A_i$ as
$$\mathring{A}_i^r := \{x \in A_i^r \mid \forall v \in \mathcal{V},\ x + w_r v \in A_j^r \implies j = i\}. \quad (3.12)$$
Define the $r$-th boundary point set as
$$B^r := A^r \setminus \bigcup_{i=1}^n \mathring{A}_i^r, \quad (3.13)$$
and let
$$B_i^r := B^r \cap A_i \quad (3.14)$$
for all $i \in \mathbb{N}_n$. The $r$-th evaluation region, the subset of $A$ enclosed by the discretization $A^r$, is defined as
$$\bar{A}^r := \bigcup_{x \in A^r} \langle x \rangle_r, \quad (3.15)$$
and the $r$-th boundary region, the subset of $A$ enclosed by the boundary point set $B^r$, is given by
$$\bar{B}^r := \bigcup_{x \in B^r} \langle x \rangle_r. \quad (3.16)$$

3.3.2. Distance bounds

Though the discretization is fully defined, it still needs to be related back to the sets $A_{ij}$ and the boundary set $B$. To do this, we first bound the distance separating $B^r$ and $A_{ij}$.

Lemma 3.16.
Let $A^r$ be the set of points included in the $r$-th discretization of $A$, $w_r$ the width at that discretization, and $\mathcal{V}$ the adjacency vector set satisfying Definition 3.14. Assume $A_{ij}$ is defined by Equation (2.1), $\text{edg}(\cdot)$ by Equation (2.8), and $B_i^r$ by Equation (3.14). Suppose $A$ is convex, $c$ is a $p$-norm on $X \times Y$, and $B_i^r \neq \emptyset$. For each $x_i \in B_i^r$, either $x_i \in \text{edg}(A^r)$ or there exists a point $x_j = x_i + w_r v$, with $v \in \mathcal{V}$, such that $x_j \in B_j^r$ for some $j \neq i$. Thus, if $x_i \notin \text{edg}(A^r)$, the distance from $x_i$ to the set $A_{ij}$, as measured with respect to the ground cost $c$, is bounded above by $c(x_i, x_j)$.

Proof. Recall the definition of $A_i^r$ in Equation (3.10). Assume $x_i \in B_i^r \setminus \text{edg}(A^r)$. By the definition of $B^r$ given in Equation (3.13), there exists $x_j = x_i + w_r v \in A_j^r \cap N(x_i)$ for some $j \neq i$, where $N(x_i)$ is the set of neighbors of $x_i$ as defined in Equation (3.11). By Lemma 3.15, $x_i \in N(x_j)$, and since $x_i \in A_i^r$, we have $x_j \in B_j^r$. Thus, $x_i \in A$ and $x_j \in A$, and because $A$ is convex, this implies $\{t x_i + (1-t) x_j \mid t \in [0,1]\} \subseteq A$.

Because $c$ is continuous on $X \times Y$, Lemma 3.3 applies. Hence, $F$ is continuous on $A$. Therefore, because $x_i \in A_i$ and $x_j \in A_j$, there exists $t^* \in [0,1]$ such that $b = t^* x_i + (1-t^*) x_j \in A_{ij}$. Then $b = x_i + (1-t^*) w_r v$, so by applying the ground cost we have
$$\|b - x_i\|_p = \|(1-t^*) w_r v\|_p = (1-t^*)\|w_r v\|_p \leq \|w_r v\|_p = \|x_j - x_i\|_p.$$
Therefore, $c(x_i, b) \leq c(x_i, x_j)$.

Because we can bound the ground cost between the points in $B^r \setminus \text{edg}(A^r)$ and the set $A_{ij}$ in terms of the ground cost between neighboring points, it is worth identifying a bound on that ground cost between neighbors.

Lemma 3.17.
Suppose $c = \|\cdot\|_p$, $p \in [1, \infty]$, and assume a shift characterized problem in $\mathbb{R}^d$. Let $N$ be defined as given by Equation (3.11) and $B_i^r$ as given by Equation (3.14). For the $r$-th iteration of the boundary method, given width $w_r$, there exists a maximum $M_r$ such that, for all $x_i \in B_i^r$ and $x_j \in B_j^r$, where $i, j \in \mathbb{N}_n$ and $i \neq j$, if $x_j \in N(x_i)$, then $c(x_i, x_j) \leq M_r \leq w_r d^{1/p}$.

Proof. Let $x_i$ and $x_j$ be defined as above. By applying the definition given in Equation (3.13), $x_j = x_i + w_r v$ for some $v \in \mathcal{V}$. For our Cartesian grid $\mathcal{V}$, $\|v\|_p$ achieves its maximum when $v = \mathbf{1}_d = (1, \ldots, 1) \in \mathbb{R}^d$, so
$$c(x_i, x_j) = \|w_r v\|_p = w_r \|v\|_p \leq w_r \|\mathbf{1}_d\|_p = w_r d^{1/p}.$$
Therefore, there exists a maximum $M_r$ such that, for all $x_i \in B_i^r$ and $x_j \in B_j^r$, $c(x_i, x_j) \leq M_r \leq w_r d^{1/p}$.

3.3.3. Errors on the shift differences

In order to bound the error on the Wasserstein distance, we merely require a finite bound on the errors for the individual shift differences, $a_{ij}$. However, accurately computing the shift differences themselves is also important, and for that reason, we also present theorems that more finely bound the error on $a_{ij}$ for important ground cost functions. Because estimates are generated using one or more computations of $g_{ij}(x)$, the magnitude of these errors depends on the point(s) chosen.

Lemma 3.18. Let $A_{ij}$ be defined by Equation (2.1), and $a_{ij}$ by Equation (2.6). Suppose the ground cost $c$ satisfies the triangle inequality. Let $x \in A$ and $i, j \in \mathbb{N}_n$ such that $i \neq j$. The error resulting from approximating $a_{ij}$ at $x$ is bounded above by $|\alpha_{ij}(x)| \leq 2c(x, b)$, where
$$\alpha_{ij}(x) := \left[c(x, y_i) - c(b, y_i)\right] + \left[c(b, y_j) - c(x, y_j)\right], \quad (3.17)$$
and $b$ is the point in $A_{ij}$ nearest to $x$ with respect to the ground cost.

Proof. Assume $b \in A_{ij}$ is the closest point in $A_{ij}$ to $x$. Then $c(b, y_i) - c(b, y_j) = g_{ij}(b) = a_{ij}$. For every $x \in A$, there exists some $\alpha_{ij}(x) \in \mathbb{R}$ such that $c(x, y_i) - c(x, y_j) = a_{ij} + \alpha_{ij}(x)$. By rearrangement and substitution, we have
$$\alpha_{ij}(x) = -a_{ij} + c(x, y_i) - c(x, y_j) = -\left[c(b, y_i) - c(b, y_j)\right] + c(x, y_i) - c(x, y_j) = \left[c(x, y_i) - c(b, y_i)\right] + \left[c(b, y_j) - c(x, y_j)\right].$$
Since $c$ satisfies the triangle inequality,
$$c(x, y_i) - c(b, y_i) \leq c(x, b) + c(b, y_i) - c(b, y_i) = c(x, b).$$
Thus, $|c(x, y_i) - c(b, y_i)| \leq c(x, b)$, and, by a similar line of reasoning, $|c(b, y_j) - c(x, y_j)| \leq c(x, b)$. Therefore,
$$|\alpha_{ij}(x)| \leq |c(x, y_i) - c(b, y_i)| + |c(b, y_j) - c(x, y_j)| \leq 2c(x, b).$$

In addition to bounding the error for individual points $x$, we can also establish meaningful global bounds.

Lemma 3.19.
Assume a shift characterized problem in $\mathbb{R}^d$, where $c$ is a $p$-norm, $p \in [1, \infty]$. Let $w_r$ be the width of the discretization during iteration $r$, and let $\langle x \rangle_r$ indicate the box of width $w_r$ centered at the point $x$. Let $B$ be defined by Equation (2.2), $N$ by Equation (3.11), and $B_i^r$ by Equation (3.14). Taking $\alpha_{ij}$ as defined by Equation (3.17) and $\bar{B}^r$ as given by Equation (3.16), let $\alpha_{\max}$ be the maximum value of $|\alpha_{ij}(x)|$ over all $x \in \bar{B}^r$ and $i, j \in \mathbb{N}_n$, such that: (1) $i \neq j$, (2) $x \in \langle x_i \rangle_r$ for some $x_i \in B_i^r$, and (3) $B_j^r \cap N(x_i) \neq \emptyset$. Then $\alpha_{\max} \leq 4 w_r d^{1/p}$ and, for all $x \in \bar{B}^r$, $\|x - B\|_p \leq 2 w_r d^{1/p}$.

Proof. Suppose $x \in \bar{B}^r$. By the definition of our grid, $x$ is contained in some $G = \text{Conv}(S)$, where $S$ is a finite set of neighboring grid points. For each $x_a, x_b \in S$, $x_b \in N(x_a)$, and hence $x_b = x_a + w_r v$ for some $v \in \mathcal{V}$, the adjacency vectors described in Definition 3.14. Since $\|v\|_p \leq d^{1/p}$, $c(x_a, x_b) \leq w_r d^{1/p}$. Because $x_a$ and $x_b$ were arbitrarily chosen, this is true of every pair of vertices of $G$. By the definition of $G$, $x$ can be written as a convex combination of the points in $S$. Therefore, for any fixed $x_0 \in S$, $c(x, x_0) \leq w_r d^{1/p}$.

Recall the definition of $B^r$ given in Equation (3.13). Because $x \in \bar{B}^r$, $\text{Conv}(S) \cap B^r$ must be nonempty. Assume without loss of generality that $x_0 = x_i \in B_i^r$ for some $i \in \mathbb{N}_n$. Because $c$ satisfies the triangle inequality, Lemma 3.18 applies. Hence, there must exist a point $x_j \in B_j^r$, a neighbor of $x_i$, with $j \neq i$, and a point $b \in A_{ij}$ such that $c(x_i, b) \leq c(x_i, x_j) \leq w_r d^{1/p}$. Applying the triangle inequality, we find that $c(x, b) \leq c(x, x_i) + c(x_i, b) \leq 2 w_r d^{1/p}$. Therefore, $\|x - B\|_p \leq 2 w_r d^{1/p}$ and, by Lemma 3.18, $\alpha_{\max} \leq 4 w_r d^{1/p}$.

In preparation for bounding the Wasserstein distance error, we now bound the error on the ground cost $c$ with respect to individual points in $\bar{B}^r$.

Lemma 3.20.
Consider a shift characterized transport problem in $\mathbb{R}^d$, with ground cost $c = \|\cdot\|_p$, $p \in [1, \infty]$. Assume $w_r$ is the width of the discretization at the $r$-th iteration and let $\tilde{\pi}^*$ be an approximated transport plan with associated transport map $\widetilde{T}$, obtained using the boundary method with discretization $w_r$. Suppose $\pi^*$ is an optimal transport plan with associated map $T$, and let $x \in A$ be such that $T(x) = y_i$ but $\widetilde{T}(x) = y_j$. Then the error in the ground cost at the point $x$ is equal to $|g_{ij}(x)|$, where $g_{ij}$ is defined as given in Equation (2.5). Furthermore, there exists $\gamma_{\max}$ such that, for all such $x \in A$ with $T(x) = y_i$ and $\widetilde{T}(x) = y_j$ for some $i \neq j$,
$$|g_{ij}(x)| \leq \gamma_{\max} \leq \max_{\substack{1 \leq i < j \leq n \\ A_{ij} \neq \emptyset}} |a_i - a_j| + 4 w_r d^{1/p} < \infty. \quad (3.18)$$

Proof. Let $x \in A$ be such that $T(x) = y_i$ but $\widetilde{T}(x) = y_j$. Then the error in the ground cost at $x$ equals $|c(x, y_i) - c(x, y_j)| = |g_{ij}(x)|$. As a consequence of Lemma 3.19,
$$|g_{ij}(x)| = |c(x, y_i) - c(x, y_j)| \leq |a_{ij}| + |\alpha_{ij}(x)| \leq \max_{\substack{1 \leq i < j \leq n \\ A_{ij} \neq \emptyset}} |a_{ij}| + 4 w_r d^{1/p} < \infty.$$
The result is independent of $x$, $i$, and $j$, and therefore there must exist some
$$\gamma_{\max} \leq \max_{\substack{1 \leq i < j \leq n \\ A_{ij} \neq \emptyset}} |a_{ij}| + 4 w_r d^{1/p} < \infty.$$

As shown in Section 3.3.4, the ground cost error for individual points is finitely bounded over a wide range of admissible ground cost functions. By definition, the measure $\mu$ is bounded. We propose to identify the largest possible region in which the ground cost error can be non-zero, and to show that the area of that region goes to zero as $r$ goes to infinity. With this, we will show that the boundary method converges with respect to the Wasserstein distance.

In Equation (3.16), we defined a region $\bar{B}^r$ based on the point set $B^r$. For this, we need to know that we can choose an initial width $w_1$ such that, for all iterations $r$, $B \subset \bar{B}^r$.
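The discrete capture of the boundary can be illustrated concretely. The sketch below (our own configuration, not the paper's code; the two-target setup, uniform grid, and Euclidean cost are assumptions) builds the boundary point set $B^r$ by flagging grid points with a differently labeled neighbor, and confirms that every flagged point lies within $2 w_r \sqrt{d}$ of the true boundary $B$, which in this symmetric example is the vertical bisector.

```python
import itertools
import math

# Illustrative check (our own setup): with two targets, equal shifts, and
# the Euclidean cost on [0,1]^2, the true boundary B is the line x = 0.5,
# and every grid point flagged as boundary should lie within
# 2 * w_r * sqrt(d) of it, in the spirit of Lemma 3.19.

def boundary_points(m, targets, shifts):
    """Return (w_r, B^r): the grid width and the grid points having at
    least one neighbor assigned to a different region."""
    w = 1.0 / m
    def center(i, j):
        return ((i + 0.5) * w, (j + 0.5) * w)
    def label(x):
        return max(range(len(targets)),
                   key=lambda k: shifts[k] - math.dist(x, targets[k]))
    lab = [[label(center(i, j)) for j in range(m)] for i in range(m)]
    # adjacency vectors V for d = 2: all (0, +/-1)-combinations except (0, 0)
    nbrs = [v for v in itertools.product((-1, 0, 1), repeat=2) if v != (0, 0)]
    bset = []
    for i in range(m):
        for j in range(m):
            if any(0 <= i + di < m and 0 <= j + dj < m
                   and lab[i + di][j + dj] != lab[i][j] for di, dj in nbrs):
                bset.append(center(i, j))
    return w, bset

targets = [(0.25, 0.5), (0.75, 0.5)]           # true boundary B: x = 0.5
w, bset = boundary_points(64, targets, shifts=[0.0, 0.0])
max_dist = max(abs(x[0] - 0.5) for x in bset)  # Euclidean distance to B
print(max_dist <= 2 * w * math.sqrt(2))        # → True
```

Halving the grid width $w_r$ halves the thickness of the flagged band, which is the mechanism the refinement iterations exploit.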
Theorem 3.21 guarantees that such a width exists, and gives a sense of the relevant features driving the choice of $w_1$. For details about the numerical considerations involved, see Remark 5.

Theorem 3.21. Assume $c$ is a $p$-norm, $p \in [1, \infty]$, and $A = [0, l]^d$. There exists an initial width $w_1$ such that, for all $w_r \leq w_1$, $x \in \mathring{A}_i^r$, as defined by Equation (3.12), implies that the box of width $w_r$ centered at $x$, given by $\langle x \rangle_r$, satisfies $\langle x \rangle_r \subseteq \mathring{A}_i$, where $\mathring{A}_i$ is the strict interior of $A_i$, as defined by Equation (2.3).

Proof. Recall the definition of $A_i$ given by Equation (1.15), $A_{ij}$ given by Equation (2.1), $B$ given by Equation (2.2), and $g_{ij}$ given by Equation (2.5). Let $B(x, s)$ indicate the open ball of radius $s$ (with respect to the $p$-norm $c$) centered at $x$, and $C(x, s)$ indicate the $d$-dimensional cube with side length $s$ (with respect to the Euclidean distance) centered at $x$. Because $c$ is a $p$-norm, for each $i \in \mathbb{N}_n$, $y_i \in \mathring{A}_i$, and therefore there exists $\delta_i > 0$ such that $B(y_i, \delta_i) \subseteq \mathring{A}_i$. Thus, there exist $\varepsilon_0 > 0$ and $\delta \geq \varepsilon_0$ such that, for any $i \in \mathbb{N}_n$, $c(x, y_i) < \delta$ implies $C(x, s) \subseteq \mathring{A}_i$ for all $s \leq \varepsilon_0$. Let
$$S := A \setminus \bigcup_{i \in \mathbb{N}_n} B(y_i, \delta).$$
Because $S$ is a closed set minus a finite number of open sets, $S$ is closed.

If all $A_{ij}$ are hyperplanes on $S$, the claim is self-evident for all $w_1 \leq \varepsilon_0$, so assume instead that at least one $A_{ij}$ is not a hyperplane on $S$. Let
$$G_1 := \sup_{\substack{x \in S \\ i, j \in \mathbb{N}_n, \ i \neq j}} |\nabla g_{ij}(x)| \quad \text{and} \quad G_2 := \sup_{\substack{x \in S \\ i, j \in \mathbb{N}_n, \ i \neq j}} |\nabla^2 g_{ij}(x)|.$$
There exists a maximum directional magnitude with respect to the Euclidean distance,
$$M = \sup_{\substack{x \in S, \ u \in \mathbb{R}^d \\ i \in \mathbb{N}_n}} |x_u - y_{i,u}| \leq l\sqrt{d} < \infty,$$
where $|x_u - y_{i,u}|$ is the magnitude of the vector $x - y_i$ projected parallel to the direction of $u$. Because $c \in C^2(S)$, $G_1$ and $G_2$ are well-defined. For any $x \in S$ and any unit direction vector $u \in \mathbb{R}^d$,
$$|\nabla_u g_{ij}(x)| = \left| \frac{(x_u - y_{i,u})\,|x_u - y_{i,u}|^{p-2}}{c(x, y_i)^{p-1}} - \frac{(x_u - y_{j,u})\,|x_u - y_{j,u}|^{p-2}}{c(x, y_j)^{p-1}} \right| \leq \frac{2 M^{p-1}}{\delta^{p-1}} < \infty,$$
and
$$|\nabla_u^2 g_{ij}(x)| = (p-1)\left| \frac{|x_u - y_{i,u}|^{p-2}}{c(x, y_i)^{p-1}} - \frac{|x_u - y_{i,u}|^{2p-2}}{c(x, y_i)^{2p-1}} - \frac{|x_u - y_{j,u}|^{p-2}}{c(x, y_j)^{p-1}} + \frac{|x_u - y_{j,u}|^{2p-2}}{c(x, y_j)^{2p-1}} \right| \leq \frac{2(p-1) M^{p-2}}{\delta^{p-1}} \left[ 1 + \frac{M^p}{\delta^p} \right] < \infty.$$
Hence, $G_1 < \infty$ and $G_2 < \infty$.

Assume the Gaussian curvature of the set $A_{ij}$ at a point $x \in A_{ij}$ is given by the function $K_{ij}(x)$, and when $K_{ij}(x) \neq 0$, let the radius of curvature be $R_{ij}(x) = |K_{ij}(x)|^{-1}$. Because $K_{ij}(x)$ is defined as a product of first and second directional derivatives of $g_{ij}$, and those derivatives are bounded, there exists a maximum absolute Gaussian curvature for $B$ on $S$, given by
$$K := \sup_{\substack{i, j \in \mathbb{N}_n, \ i \neq j \\ x \in A_{ij} \cap S}} |K_{ij}(x)| < \infty.$$
Because at least one $A_{ij}$ is not a hyperplane, $K > 0$. Because $K < \infty$, for any $i, j \in \mathbb{N}_n$, $i \neq j$, and any $x \in A_{ij} \cap S$, the radius of curvature is bounded below: $R_{ij}(x) \geq K^{-1} > 0$. Define $\tilde{\varepsilon} := 2/(K\sqrt{d})$.

Suppose $s \leq \min\{\varepsilon_0, \tilde{\varepsilon}\}$, $x \in \mathring{A}_i^r$ for some $i \in \mathbb{N}_n$, and that $\mu(A_j \cap C(x, s)) > 0$ for some $j \in \mathbb{N}_n$, $j \neq i$. The set $C(x, s)$ is the cube surrounding $x$ and its neighbors. Because $x \in \mathring{A}_i^r$, $A_{ij}$ cannot be a hyperplane in $C(x, s)$, and so $R_{ij}$ is well-defined on $C(x, s)$. If there exist $k \in \mathbb{N}_n$ and $x' \in C(x, s)$ such that $c(x', y_k) < \delta$, then $C(x, s) \subseteq C(x', \varepsilon_0) \subseteq \mathring{A}_k$, and since $x \in \mathring{A}_i^r$, this implies $k = i$ and $C(x, s) \subseteq \mathring{A}_i$. This implies $C(x, s) \cap A_j = \emptyset$, which contradicts the claim that $\mu(A_j \cap C(x, s)) > 0$. Therefore, $c(x', y_k) \geq \delta$ for all $k \in \mathbb{N}_n$ and $x' \in C(x, s)$. This implies $C(x, s) \subseteq S$. Hence, the intersection of the boundary $A_{ij}$ with the cube $C(x, s)$ must have a point with minimum radius of curvature,
$$x_m := \arg \inf_{x' \in A_{ij} \cap C(x, s)} R_{ij}(x'),$$
and since $x_m \in S$, it must be the case that $R_{ij}(x_m) \geq 1/K$.

Because $\mu(A_j \cap C(x, s)) > 0$, but $x \notin A_j$, there must exist $x_c \in A_{ij} \cap C(x, s)$. Hence, within the cube $C(x, s)$, there must be a $d$-dimensional sphere (or partial sphere) of radius $R_{ij}(x_m)$, not in $A_i$, whose boundary intersects $x_c$ ("partial" because the sphere may be cut off by one or more of the planes bounding the cube). Call this (partial) sphere $\widetilde{S}$.

Since $x \in \mathring{A}_i^r$, it must be the case that $\widetilde{S} \cap (N(x) \cup \{x\}) = \emptyset$, where $N$ is the set of neighbors defined by Equation (3.11). Because $x_c \in C(x, s)$, and the maximum distance between grid points in $C(x, s)$ is $s\sqrt{d}$, this requires $R_{ij}(x_m) < s\sqrt{d}/2$. Hence, there exists $x_m \in A_{ij} \cap S$ such that $R_{ij}(x_m) < s\sqrt{d}/2 \leq 1/K$. This contradicts $R_{ij}(x_m) \geq 1/K$. Thus, it must be the case that for all $j \in \mathbb{N}_n$, $j \neq i$ implies $\mu(C(x, s) \cap A_j) = 0$, and therefore $C(x, s) \subseteq \mathring{A}_i$. Setting $w_1 \leq \min\{\varepsilon_0, \tilde{\varepsilon}\}$ completes the proof.

Remark 5.
By the boundary method's very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. This is the meaning behind the width $w_1$ considered in Theorem 3.21. In principle, $w_1$ may need to be quite small. In practice, the potential problems associated with an overly large $w_1$ rarely occur, and they are obvious when they do. We did occasionally observe an issue when the initial $w_1$ was so large that a box $\langle x \rangle_r$ could contain an entire $A_i$ (in other words, when $w_1$ was significantly larger than the $\delta$ described in Theorem 3.21). In those cases, the affected region's $a_i$ was such that $c(y_i, y_j) = a_i - a_j$ for some $j \neq i$, and the resulting transport plan had $\mu(A_i) = 0$. Hence, the set $\{a_i\}_{i=1}^n$ and reconstructed regions $\{A_i\}_{i=1}^n$ directly revealed when such an error had occurred.

Also, because of the nature of the iterative method, a poor choice of $w_1$ quickly becomes obvious in the boundary region itself. Simply put, the loss of any portion of the boundary set $B$ destabilizes the method. Losing part of $B$ creates a visible gap in the "wall" between two regions, and the gap increases in size with each successive iteration. This behavior seems to occur whenever some part of $B$ is lost, no matter what the cause. For example, in our tests we observed that discarding an edge box that intersects $B$ results in the same progressive damage to the boundary set. Not surprisingly, this also "stalls" the convergence of the Wasserstein distance in ways that are obvious during computation.

In our numerical tests, we used $w_1 \leq 1/n$ and obtained consistently reliable results.

Next, we show that a well-chosen initial width and grid arrangement can guarantee that, for every iteration $r$, each point in $A^r \setminus B^r$ corresponds to a box in the interior of some region $A_i$.

Theorem 3.22.
Assume $c$ is a $p$-norm, $p \in [1, \infty]$, and $A = [0, l]^d$. Suppose the first iteration width $w_1$ is chosen as described in Theorem 3.21. Fix $r$, let $w_r \leq w_1$, and let $A^r$ be the boundary set remaining at the $r$-th iteration. Given the definition of $B$ from Equation (2.2), $\bar{A}^r$ from Equation (3.15), and $\bar{B}^r$ from Equation (3.16), if $B \subseteq \bar{A}^r$, then $B \subseteq \bar{B}^r$, and hence $B \subseteq \bar{A}^{r+1}$.

Proof. We will show the conclusions by proving that $x \notin \bar{B}^r$ implies $x \notin B$. Suppose $x \notin \bar{B}^r$. If $x \notin \bar{A}^r$, then $x \notin B$, since by assumption, $B \subseteq \bar{A}^r$. Thus, we assume instead that $x \in \bar{A}^r \setminus \bar{B}^r$.

Because $x \in \bar{A}^r$, we know $x \in \langle x_0 \rangle_r$, the box of width $w_r$ centered around some $x_0 \in A^r$. We have $x_0 \in A_i$ for some $i \in \mathbb{N}_n$, where $A_i$ is defined as given in Equation (1.15), and so by the definition of $A_i^r$ from Equation (3.10), $x_0 \in A_i^r$. However, $x \notin \bar{B}^r$ implies $\langle x_0 \rangle_r \nsubseteq \bar{B}^r$, so from the definition of $B^r$ given in Equation (3.13), $x_0 \notin B^r$. Because $x_0 \in A_i^r \setminus B^r = \mathring{A}_i^r$ (see Equation (3.12)), by Theorem 3.21, $\langle x_0 \rangle_r \subseteq \mathring{A}_i$. Hence, $x \in \mathring{A}_i$. Therefore, by Equation (2.3), $x \notin B$.

Now that we have ensured $B \subseteq \bar{B}^r$, we aim to construct a region of controlled volume enclosing $\bar{B}^r$: $\bar{B}^r \subseteq \bar{B}^{r+}$. Then we show that, as $r \to \infty$, the volume of $\bar{B}^{r+}$ in $\mathbb{R}^d$ goes to zero with respect to the Lebesgue measure. This will allow us to put a convenient upper bound on the volume of $\bar{B}^r$ in terms of the width $w_r$. Because $\bar{B}^{r+}$ exists solely in $A$, and not on the product space, we can once again rely on the Euclidean distance in $\mathbb{R}^d$.

Lemma 3.23.
Assume a shift-characterized transport problem in ℝ^d, with c = ‖·‖_p, p ∈ [1, ∞]. Suppose w_r is the width used for the r-th iteration, and assume B is defined as in Equation (2.2) and B̄^r as in Equation (3.16). Let the region B̄^{r+} ⊆ A be defined as

    B̄^{r+} := { x ∈ A : ‖x − B‖ ≤ w_r √d }.   (3.19)

For all r, B̄^r ⊆ B̄^{r+}.

Proof. By definition, B̄^{r+} ⊆ A. Suppose x ∈ B̄^r. Because we are applying the Euclidean norm, Lemma 3.19 implies that ‖x − B‖ ≤ w_r √d, and since x ∈ A, x ∈ B̄^{r+}.

Theorem 3.24.
Assume c is a p-norm, p ∈ [1, ∞]. Let w_r be the width of the discretization applied during the r-th iteration. Given the definition of B in Equation (2.2) and B̄^r in Equation (3.16): if µ(B) = 0 and there exists some constant L̃ such that |B| = L̃ < ∞ with respect to the ℝ^{d−1} Lebesgue measure, then there exists some L < ∞ such that |B̄^r| ≤ w_r^d L with respect to the ℝ^d Lebesgue measure.

Proof. Recall the definition of B̄^{r+} given in Equation (3.19). We know ∫_{B̄^{r+}} dx = ∫_A χ[B̄^{r+}](x) dx. Let B(x, ρ) be the closed ball of radius ρ centered at x, defined with respect to the Euclidean distance. Write

    ∫_A χ[B̄^{r+}](x) dx = ∫_A χ[{x ∈ A : ‖x − B‖ ≤ w_r √d}](x) dx
                        = ∫_A χ[{x ∈ A : x ∈ B(z, w_r √d) for some z ∈ B}](x) dx
                        ≤ ∫_A χ[B](z) ( ∫_A χ[B(z, w_r √d)](x) dx ) dz.

For any fixed z,

    ∫_A χ[B(z, w_r √d)](x) dx ≤ Vol_d(w_r √d),

where Vol_d(ρ) is the volume of the d-dimensional ball of radius ρ, defined with respect to the Euclidean distance. Using the Γ function, this volume can be written as

    Vol_d(w_r √d) := [π^{d/2} / Γ(d/2 + 1)] (w_r √d)^d
                   = [π^{d/2} / (d/2)!] (w_r √d)^d            if d = 2k for some k ∈ ℤ_{>0},
                   = [2 k! (4π)^k / d!] (w_r √d)^d            if d = 2k + 1, k ∈ ℤ_{≥0}.
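The Γ-function form of the ball volume above is easy to sanity-check numerically; the sketch below is ours (the function name `ball_volume` does not appear in the paper):

```python
import math

def ball_volume(d, rho):
    """Volume of the closed d-dimensional Euclidean ball of radius rho:
    Vol_d(rho) = pi^(d/2) / Gamma(d/2 + 1) * rho^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * rho ** d

# The two integer cases of the displayed formula agree with the Gamma form:
#   d = 2k:     pi^k / k!               * rho^d
#   d = 2k + 1: 2 * k! * (4*pi)^k / d!  * rho^d
print(ball_volume(2, 1.0))  # pi
print(ball_volume(3, 1.0))  # 4*pi/3
```

For the bound in the proof, the only property needed is that Vol_d(w_r √d) = const(d) · w_r^d, which is what produces the w_r^d L estimate.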
Because the volume is independent of the point z ∈ A, we therefore have

    ∫_{B̄^{r+}} dx ≤ ∫_A χ[B](z) ( ∫_A χ[B(z, w_r √d)](x) dx ) dz
                 ≤ ∫_A χ[B](z) Vol_d(w_r √d) dz
                 = Vol_d(w_r √d) ∫_B dz = w_r^d L,

where

    L := L̃ [π^{d/2} / (d/2)!] (√d)^d      if d = 2k for some k ∈ ℤ_{>0},
    L := 2 L̃ [k! (4π)^k / d!] (√d)^d      if d = 2k + 1, k ∈ ℤ_{≥0}.

Let x ∈ B̄^r. By applying Lemma 3.19 with c the Euclidean distance, we know that for all x ∈ B̄^r, ‖x − B‖ ≤ w_r √d, which implies x ∈ B̄^{r+}. Thus B̄^r ⊆ B̄^{r+}, which implies |B̄^r| ≤ |B̄^{r+}| ≤ w_r^d L.

Remark 6.
The interplay between B, B^r, B̄^r, and B̄^{r+} is nontrivial. Figure 3 helps to visualize it properly. In Figure 3(a), we show the placement of some boundary set B^r. It is crucial that the subgrid created by B^r completely surrounds B, because that is the only way to ensure that B ⊆ B̄^r. One can see in this image how a (very degenerate) choice of c, coupled with the right arrangement of the y_i's, might allow a small and sharply curved boundary set to slip unnoticed between points.

As Figure 3(b) illustrates, each point in B^r appears as the center of its corresponding box, and the boxes completely cover the boundary set.

The region B̄^{r+} is deliberately constructed to entirely cover all the boxes in B̄^r. As Figure 3(c) shows, its volume can be significantly larger than that of the boxes it contains. However, the worst-case "thickness" given to B̄^{r+} ensures that it will always enclose both B and B̄^r.

Figure 3: Detail from the problem in Figure 5(a): boundary set interactions near the intersection of three regions A_i. (a) B^r surrounding B; (b) boxes B̄^r covering B; (c) region B̄^{r+} covering B̄^r.

Theorem 3.25.
Assume µ is absolutely continuous and let P* be the Wasserstein distance. Let w_r be the width of the r-th iteration of the boundary method. Given the definition of B in Equation (2.2) and B̄^r in Equation (3.16), suppose B ⊆ B̄^r, and that there exists some L such that |B̄^r| = w_r^d L < ∞ with respect to the d-dimensional Lebesgue measure. If γ_max < ∞ is the maximum error of the ground cost on the set B̄^r, and P̃* is the Wasserstein distance approximation obtained with the boundary method, then the value of µ on A is bounded by some M < ∞ and

    |P̃* − P*| ≤ w_r^d L M γ_max,   (3.20)

where the bound equals the maximum possible volume of B̄^r multiplied by the maximum value of µ and the maximum error of the ground cost.

Proof. If x ∈ A \ B̄^r, then x has been identified as being in the interior of A_i for some i ∈ ℕ_n. Thus, the cost error associated with the points outside B̄^r is zero.

Suppose instead that x ∈ B̄^r. By definition, the absolute value of the difference between the correct and approximated ground costs at x is less than or equal to γ_max. Condition (1)(a) requires µ to be absolutely continuous, so there exists M such that, for all x ∈ X, 0 ≤ µ(x) ≤ M < ∞. Therefore, the error in the Wasserstein distance is bounded above by

    |P̃* − P*| ≤ ∫_{B̄^r} γ_max dµ(x) = ∫_{B̄^r} µ(x) γ_max dx ≤ ∫_{B̄^r} M γ_max dx = |B̄^r| M γ_max ≤ (w_r^d L) M γ_max.

Remark 7.
The bounds in Theorems 3.24 and 3.25 indicate that the volume of the boundary set and the error of the computed Wasserstein distance decrease according to the dimension of the space. Thus, we should expect our numerical tests to show a quadratic (in ℝ²) or cubic (in ℝ³) decrease of the Wasserstein distance error. These decreases are clearly observed in practice; see Section 4.

4. Numerical results

As mentioned in Remark 2, some choices of µ may make it necessary to approximate µ(x_r) or ∫_{x_r} c(z, y_i) dµ(z). However, the majority of numerical studies we have seen restrict to simple choices of µ (most often uniform). For this reason, we restricted our examples to cases where the cost and mass integrals can be written in closed form.

4.1.1. µ(x_r)

The integral of µ over some box can be written as

    µ(x_r) = ∫_{x_r} µ(z) dz = M(z)|_{z ∈ x_r}, with M : X → ℝ_{≥0}.   (4.1)

Since µ is a probability density function, we must have ∫_A dµ = 1. For convenience, let µ̂ denote an un-normalized version of µ, and similarly for M̂.

Using the linearity of the integral, one can combine simple functions for which exact solutions are known. We can also construct more complex measures by partitioning A into disjoint subsets. In this case, however, we add an additional restriction to be sure that exact solutions can always be found: we µ-partition A into subsets S₁, ..., S_σ such that the boundaries of each S_s fall on the initial set of grid lines. Assume that for each set S_s there exists a density function µ̂_s that is exactly solvable on S_s. From these, we take µ̂ (and M̂) to be the piecewise functions defined on each S_s as µ̂_s (and M̂_s, respectively).

Most of our computations were performed in two dimensions. For such problems, given iteration r and x = (x₁, x₂) ∈ A, µ̂(x_r) can be written as

    µ̂(x_r) = M̂(x₁ + w_r/2, x₂ + w_r/2) − M̂(x₁ + w_r/2, x₂ − w_r/2) − M̂(x₁ − w_r/2, x₂ + w_r/2) + M̂(x₁ − w_r/2, x₂ − w_r/2).   (4.2)

The closed-form choices used in our numerical tests are shown in Table 1. As described above, we used the table entries as building blocks in the construction of more complex measures.

Table 1: Closed-form options for µ

    µ̂((x₁, x₂)) = 1                  M̂(u, v) = uv
    µ̂((x₁, x₂)) = x₁^t x₂^t, t > 0   M̂(u, v) = (t + 1)⁻² u^{t+1} v^{t+1}
    µ̂((x₁, x₂)) = e^{t x₁}, t ≠ 0    M̂(u, v) = t⁻¹ v e^{t u}
    µ̂((x₁, x₂)) = e^{t x₂}, t ≠ 0    M̂(u, v) = t⁻¹ u e^{t v}

4.1.2. The transport cost on x_r

We performed many tests where µ could be computed exactly but the Wasserstein distance could not; see Section 4 for details. In such cases, we made no attempt to approximate P*, choosing instead to focus on the accuracy of the µ-partition generated by the approximate shift set {ã_i}_{i=1}^n. However, there were a number of cases in two dimensions where the choice of µ and c allowed for closed-form computations.
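Equation (4.2) is an inclusion-exclusion evaluation of the antiderivative M̂ at the four corners of the box. A minimal sketch of that evaluation, using the uniform-density entry M̂(u, v) = uv from Table 1 (the helper names are ours):

```python
def box_mass(M_hat, x1, x2, w_r):
    """Mass of the box of width w_r centered at (x1, x2), evaluated by
    inclusion-exclusion on the antiderivative M_hat, as in Eq. (4.2)."""
    h = w_r / 2.0
    return (M_hat(x1 + h, x2 + h) - M_hat(x1 + h, x2 - h)
            - M_hat(x1 - h, x2 + h) + M_hat(x1 - h, x2 - h))

# Uniform density mu_hat = 1 has antiderivative M_hat(u, v) = u*v, so a
# box of width w_r carries mass w_r^2:
M_uniform = lambda u, v: u * v
print(box_mass(M_uniform, 0.5, 0.5, 0.25))  # 0.0625 = 0.25**2
```

The same routine works for any Table 1 entry, e.g. M̂(u, v) = (uv)²/4 for the density µ̂ = x₁x₂ (the t = 1 case).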
In those cases, because the combination of c and µ gives us an exact solution, there exists C : X × Y → ℝ_{≥0} such that

    ∫_{x_r} c(z, y_i) dµ(z) = C(z, y_i)|_{z ∈ x_r}.   (4.3)

As in Section 4.1.1, we write Ĉ when working with µ̂.

Now consider X, Y ⊂ ℝ², x = (x₁, x₂) ∈ A, and y = (y₁, y₂) ∈ {y_i}_{i=1}^n. When µ(x_r) =
0, the Wasserstein distance on x_r is also zero. For those boxes where µ(x_r) >
0, we can take advantage of the uniformity to define the function Ĉ in terms of a single pair of variables: the component-wise distances (∆₁, ∆₂), where ∆₁ = |x₁ − y₁| and ∆₂ = |x₂ − y₂|. When the Wasserstein distance over x_r can be computed and is non-zero, it takes the form

    ∫_{x_r} c(z, y) dµ̂(z) = Ĉ(∆₁ + w_r/2, ∆₂ + w_r/2) − Ĉ(∆₁ + w_r/2, ∆₂ − w_r/2) − Ĉ(∆₁ − w_r/2, ∆₂ + w_r/2) + Ĉ(∆₁ − w_r/2, ∆₂ − w_r/2),   (4.4)

where Ĉ : ℝ² → ℝ_{≥0} is an explicit function.

Table 2 gives Wasserstein distance functions Ĉ for c the 2-norm and for c the p-th power of a p-norm (p ∈ [1, ∞)). By leveraging the linearity of the integral and subdividing A into disjoint sets, we can build combinations of ground costs and measures with closed-form C. We used this to perform tests in ℝ², with µ being either uniform or zero in the relevant boxes.

Table 2: Closed-form options for Ĉ when µ is uniform or zero on A

    2-norm:  Ĉ(u, v) = (1/6)[u³ log((√(u² + v²) + v)/u) + v³ log((√(u² + v²) + u)/v) + 2uv√(u² + v²)] if (u, v) ≠ (0, 0); Ĉ(0, 0) = 0
    p-th power of the p-norm:  Ĉ(u, v) = (p + 1)⁻¹ (u^{p+1} v + u v^{p+1})

When µ is uniform and the region boundaries are known exactly, we can use the formulas for the ground cost given in Table 2 to compute exact values. The 2-norm was used for the problems shown in Figures 4(a) and 4(b), and the 1-norm was used to generate Figure 4(c).

For two points on the Northwest-Southeast diagonal, placed as shown in Figure 4(a), the exact Wasserstein distance is equal to

    P*_NWSE := (1/…)[√… + √… + sinh⁻¹(1) + √… + sinh⁻¹(3)] ≈ …

Figure 4: Problems where the exact Wasserstein distance and set of shifts are known. (a) 2 points: NW-SE line; (b) 4 × 4 grid; (c) 2 points: "bad" 1-norm.

Table 3: Wasserstein errors for the NW-SE, 4 ×
4, and "bad" 1-norm problems. Sub-tables (a) NW-SE errors, (b) 4 × 4 errors, and (c) "bad" 1-norm errors each list the absolute error for decreasing widths w*.

Table 3(a) shows the absolute error for various w*. Note that the actual decrease in error is roughly quadratic in w*.

When we have a 4 × 4 grid of points y_i, as shown in Figure 4(b), the exact Wasserstein distance is equal to

    P*_{4×4} := (1/24)[√2 + sinh⁻¹(1)] ≈ 0.0956.

Table 3(b) shows the error. Again, the observed error decrease is roughly quadratic in w*.

Recall that the theorems presented in Section 3 offer no convergence guarantee for the 1-norm. In fact, the optimal solution may not be µ-a.e. unique, in which case the set {A_i}_{i=1}^n may not partition A. This is exactly what happens for the problem shown in Figure 4(c). The problem is identical to that shown in Figure 4(a), except that c is the 1-norm. Because of the change in norms, the Northeast and Southwest corners of A do not have unique transport destinations. The loss of µ-a.e. uniqueness, and the resulting failure to partition, is clearly visible in the figure. However, the exact Wasserstein distance can still be computed. Table 3(c) shows the corresponding errors; the decrease is again roughly quadratic in w*.

For all three problems, we know the exact shift values: since every point in A goes to the nearest y_i, the shift differences are all zero, which means every shift should be identical. In the 4 × 4 problem, we tracked the largest shift difference |ã_i − ã_j| as w* decreased; it, too, decreases roughly quadratically in w*.
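The Table 2 entry for the p-th power of a p-norm, combined with the inclusion-exclusion formula (4.4), can be checked against a brute-force integral. The sketch below assumes our reading of the table, Ĉ(u, v) = (u^{p+1} v + u v^{p+1})/(p + 1), and all names are ours:

```python
def C_hat_ppow(u, v, p):
    """Integral of |x|^p + |y|^p over [0, u] x [0, v]
    (Table 2, p-th power of the p-norm, our reading)."""
    return (u ** (p + 1) * v + u * v ** (p + 1)) / (p + 1)

def box_cost(C_hat, d1, d2, w_r):
    """Eq. (4.4): inclusion-exclusion for the cost integral over a box whose
    center lies at component-wise distances (d1, d2) from the point y."""
    h = w_r / 2.0
    return (C_hat(d1 + h, d2 + h) - C_hat(d1 + h, d2 - h)
            - C_hat(d1 - h, d2 + h) + C_hat(d1 - h, d2 - h))

def brute(d1, d2, w_r, p=2, n=200):
    """Midpoint-rule approximation of the same box integral, for checking."""
    h, s = w_r / n, 0.0
    for i in range(n):
        for j in range(n):
            x = d1 - w_r / 2.0 + (i + 0.5) * h
            y = d2 - w_r / 2.0 + (j + 0.5) * h
            s += abs(x) ** p + abs(y) ** p
    return s * h * h

exact = box_cost(lambda u, v: C_hat_ppow(u, v, 2), 0.3, 0.4, 0.2)
print(exact)                  # ~0.0102667
print(brute(0.3, 0.4, 0.2))   # agrees to several digits
```

The same `box_cost` routine applies to the 2-norm entry of Table 2; only the antiderivative Ĉ changes.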
As Section 3.3.6 shows, even if the Wasserstein distance is unknown, the Wasserstein approximation error at the end of the r-th iteration is bounded above by

    Σ_{x ∈ B^r} µ(x_r) · max_{x̃ ∈ x_r} g_ij(x̃),   (4.5)

where i and j, i ≠ j, refer to the destinations of x and some neighbor. In practice, we can use continuity to refine that estimate still further, as described in Section 2.2.3.

As w_r →
0, max_{x̃ ∈ x_r} g_ij(x̃) → |a_ij| for each i ≠ j, and µ(B̄^r) →
0. If the boundary method is working effectively, we can expect to see the Wasserstein distance error decreasing with respect to µ(B̄^r). If µ is uniform on A, that decrease should be linear with respect to the volume |B̄^r|.

We considered the change in the computed Wasserstein distance for Example 2.1 using three canonical ground costs: the 1-norm, the 2-norm, and the squared 2-norm. The resulting µ-partitions are shown in Figures 7(a), 5(a), and 8(a), respectively. Since µ is uniform and A = [0, 1]², Theorem 3.25 suggests that we should see quadratic convergence for the 2-norm. (The theorem makes no convergence claim for the 1-norm or the squared 2-norm.)

For each ground cost, we computed the Wasserstein approximation error in two ways:

1. The worst-case Wasserstein distance error bound, given by applying (4.5).
2. The rate of change with respect to a reference approximation,

    ∆P̃*(w*) := ∆P̃*(2^{−m}) = |P̃*_m − P̃*_ref| / |P̃*_{m+1} − P̃*_ref|, for m below the reference level.

For all three ground cost functions, the worst-case Wasserstein distance error is roughly linear in w*, and the rate of change of ∆P̃* is roughly quadratic in w*. Though Theorem 3.25 only guarantees quadratic convergence for the 2-norm, we also observe quadratic convergence for the 1-norm and squared 2-norm. For this example, the 1-norm generated a µ-a.e. unique partition, but comparable convergence was seen in tests where partitioning failed. Results are given in Table 4.

4.3. µ-Partitions in ℝ²

4.3.1. µ and ν

We include three examples with variations of µ and ν, shown in Figure 5. All three assume c is the 2-norm.

Table 4: Wasserstein approximation behavior with respect to w*. Columns: c, P̃*, err_max(w*), ∆P̃*(w*); rows: 1-norm, 2-norm, squared 2-norm.
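The rate-of-change diagnostic ∆P̃* compares successive errors against a reference approximation; when w* halves at each step, ratios near 4 indicate quadratic convergence in w*. A sketch on synthetic data (all names and values here are ours, not the paper's measurements):

```python
def rate_ratios(approximations, reference):
    """Ratios |P~_m - ref| / |P~_(m+1) - ref| for approximations computed at
    widths w* = 2^-m; a ratio near 4 means the error shrinks like (w*)^2."""
    errs = [abs(p - reference) for p in approximations]
    return [errs[m] / errs[m + 1] for m in range(len(errs) - 1)]

# Synthetic sequence with exactly quadratic error: P~_m = P + C * (2^-m)^2.
P, C = 0.74, 0.3
approx = [P + C * (2.0 ** -m) ** 2 for m in range(1, 7)]
print(rate_ratios(approx, P))  # each ratio is ~4
```

In practice the reference is the finest-width approximation available rather than the unknown exact value, which is why the ratio is only meaningful for m well below the reference level.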
Figure 5: Partitions for uniform and non-uniform measures. (a) µ and ν uniform; (b) ν non-uniform; (c) µ̂(x₁, x₂) = x₁x₂.
In Figure 5(a), we assume µ is the uniform continuous probability distribution on A, and ν is the uniform discrete distribution with n = 5. The five points where ν = 1/5 are shown, along with the µ-partition obtained by the boundary method; for comparison, see [33, Figure 3 (right)].

Starting from the points shown in Figure 5(a), we next take one of the points y_i and split it into four new points, each of one quarter-mass, positioned equidistantly from the point's original location. This gives us a non-uniform ν with four points of weight 1/5 and four points of weight 1/20. We keep µ uniform. The resulting µ-partition is shown in Figure 5(b).

In Figure 5(c), we choose the nonuniform probability density proportional to x₁x₂. For ν, we choose a uniform 4 × 4 arrangement of points, and µ's nonuniformity becomes obvious. While the individual regions no longer have equal Lebesgue measure, each has equal µ-measure 1/16. The larger regions in the lower-left correspond to the lower density of µ in that corner, while the smaller regions in the upper-right correspond to the higher concentration of µ-density there.

4.3.2. Discontinuous µ

Next, we deliberately introduce a discontinuous µ that is not strictly positive:

    µ(x) = 0 for x ∈ [0, 1/2]², and µ(x) = 4/3 otherwise.

This choice satisfies ∫_A dµ(x) = 1, so µ is a probability density function on A = [0, 1]². For ν, we use the uniform 4 × 4 arrangement of points.

Figure 6: µ is zero in the lower-left quadrant. (a) boundary region B̄^r; (b) shift characterization; (c) shaded zero-µ region.

In Figure 6(a), we see the boundary set used to generate the solution. Points between regions are retained, as are points adjacent to regions of measure zero. No computations are done on the lower-left region, because any destination is equally valid on boxes of µ-measure zero.

However, when the shift definition is applied to the semi-discrete optimal transport problem, there is only one valid shift-characterized solution over A. Figure 6(b) shows that solution. The unique shift differences force the selection of a unique boundary set B, even in the region where µ is zero.

Figure 6(c) shows the shift characterization again, but here the region of µ-zero measure is shaded, helping to confirm visually that the regions have equal µ-measure. Figure 6(c) also shows the locations of intersection points we identified using the boundary method. These intersections were used to accurately compute the set of shifts.

The computations in Sections 4.3.1 and 4.3.2 all assume that the ground cost function equals the 2-norm, but as Section 4.2.2 suggests, computation with other functions is quite possible. Using the same problem solved with the 2-norm in Figure 5(a), we generated µ-partitions for a wide range of p-norm ground costs. Results for the 1-norm, 10-norm, and ∞-norm are shown in Figure 7. Note that the 1-norm and ∞-norm converge to (µ-a.e. unique) solutions, even though those norms are not covered by our theorems.

The computations above all assume that the ground cost function is a norm. However, the boundary method works equally well on much more general ground cost functions. Three examples are shown in Figure 8.

Figure 8(a) shows the result given by the squared 2-norm.
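All of the p-norm ground costs in this section, including p = ∞ and the p < 1 variants discussed below (which are not true norms), can be evaluated by a single routine; a minimal sketch (the function name is ours):

```python
def p_cost(x, y, p):
    """Ground cost (sum_k |x_k - y_k|^p)^(1/p) for p in (0, inf),
    and max_k |x_k - y_k| for p = inf; not a norm when p < 1."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

print(p_cost((0.0, 0.0), (3.0, 4.0), 2))             # 5.0
print(p_cost((0.0, 0.0), (3.0, 4.0), 1))             # 7.0
print(p_cost((0.0, 0.0), (3.0, 4.0), float("inf")))  # 4.0

# For p < 1 the triangle inequality fails, which is consistent with the
# disconnected "spike" regions seen with p-function ground costs:
direct = p_cost((0.0, 0.0), (1.0, 1.0), 0.5)   # (1 + 1)^2 = 4.0
via = (p_cost((0.0, 0.0), (1.0, 0.0), 0.5)
       + p_cost((1.0, 0.0), (1.0, 1.0), 0.5))  # 1 + 1 = 2.0
print(direct > via)  # True
```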
Because the squared 2-norm is not itself a norm, we were only able to make the most general mathematical claims regarding its behavior. However, the boundary method has no trouble with it. In fact, as Section 4.2.2 indicates, its convergence behavior is practically identical to that of the p-norms.

When 0 < p < 1, the p-norm formula can still be applied, even though the resulting function does not satisfy the triangle inequality. Formally, one has

    c_p(x₁, x₂) := (Σ_{k=1}^d |x₁ₖ − x₂ₖ|^p)^{1/p} for p ∈ (0, ∞), and c_∞(x₁, x₂) := max_{k ∈ ℕ_d} |x₁ₖ − x₂ₖ| for p = ∞.   (4.7)

Figure 7: Equal area using different ground cost norms. (a) 1-norm; (b) 10-norm; (c) ∞-norm.

Figure 8: Equal area using non-norm ground costs. (a) squared 2-norm; (b) p-function ground cost; (c) p-polynomial ground cost.

As Figure 8(b) illustrates, even these "p-function" transport problems can be approximated. However, when p < 1, the regions become discontinuous and disconnected, as typified by the "spikes" on the exterior walls. (The spike on the lower right is part of the region coupled with one of the y_i, while the other four spikes are coupled with another.) Figure 8(c) was generated using a positive linear combination of p-functions, combining the 2-function and the (1/2)-function. This function, like many other "p-polynomial" functions, is neither convex nor concave, changing behavior with distance.

4.4. µ-Partitions in ℝ³

As we showed in Section 3, there is no theoretical obstacle to applying the boundary method to higher-dimensional problems, though visual representation becomes more complex.

The image in Figure 9 was generated by taking c to be the 2-norm, µ the uniform continuous probability density, and ν the uniform discrete probability density with five randomly-placed non-zero points in [0, 1]³. Even in this relatively simple case, it is impossible to find a single point-of-view that clearly shows all five non-zero points while clearly illustrating the boundaries of the µ-partitions. However, even though clear illustration is problematic, the computations made with the boundary method were completely successful.

Figure 9: Three-dimensional semi-discrete solution with n = 5.

4.5. Scaling

One important advantage of the boundary method is its reduction of the complexity of the discretized problem, compared to traditional methods. Before considering the numerical results, it is worth developing a generalized comparison that puts this reduction in perspective.

Suppose for the sake of argument that a discretization with width 2^{−M} is required to solve a problem in ℝ² with N positive points in Y. Generating the full grid would create a product space X × Y of size 2^{2M} N. Say the boundary method is used instead, with a fixed initial discretization width.
Each application of Step (2) of the boundary method algorithm removes approximately half the points in A^r, so by discarding interiors the method constructs a product space of size roughly 2^{M+1} N.

Assume that we compute solutions for both the boundary method and the full product space using the same linear solver (e.g., the network simplex method). Using it to solve the largest boundary problem, of size 2^{M+1} N, we have V = 2^{M+1} + N vertices and E = 2^{M+1} N edges. Solving over the full product space gives V = 2^{2M} + N vertices and E = 2^{2M} N edges. Hence, even if we assumed a solver with complexity O(V) (and no such solver exists), the full-grid problem would be larger by a factor of roughly 2^{M−1}; for solvers with realistic, superlinear complexity, the gap is wider still.

Of course, this improved complexity would be irrelevant if the constant factor were excessively large. Fortunately, this is not the case, as our next section illustrates.

Since we focus here on the semi-discrete problem, for purposes of evaluating complexity we assume X is discretized into W^d elements, and Y into N elements. Given W sufficiently large, the resulting network has W^d + N ∼ O(W^d) nodes and W^d N arcs.

4.5.1. Scaling with respect to W = 1/w*

Here we consider scaling on the plane with respect to W = 1/w*. We used Example 2.1, with µ and ν uniform and c the 2-norm. The locations of the 5 points where ν = 1/5 were chosen at random. We set w* = 2^{−m}, m ∈ ℕ, and computed the time taken by the boundary method. By repeating this process for a few different location sets (and averaging them), we estimated the average scaling behavior of the boundary method with respect to W. The test results are shown in Table 5(a), and the scaling equations are on the left side of Table 6.

Table 5: Planar scaling with respect to W and N

(a) Scaling with respect to W: columns W, T (sec), S (MB), with W doubling from row to row (entries omitted).

(b) Scaling with respect to N, at two fixed values of W:

    N    T (sec)   S (MB)   T (sec)   S (MB)
    128   6.938    17.25    22.365    33.91
    136  12.190    18.24    36.601    35.05
    144  10.982    17.99    29.952    36.49
    152  13.139    18.54    36.703    41.27
    160  11.420    18.66    34.801    40.27
    168  15.727    20.97    44.959    40.66
    176  15.332    21.38    44.873    43.06
    184  18.243    21.38    53.689    43.20
    192  12.796    21.60    40.029    43.66

Table 6: Time and storage scaling with respect to W and N separately

    Time:    T(W) ∝ W ln W     T(N) ∝ N ln N
    Storage: S(W) ∝ W          S(N) ∝ N^{1/2}

4.5.2. Scaling with respect to N

To evaluate planar scaling with respect to N, we performed multiple runs in [0, 1]² where W was fixed and µ and ν were uniform. The N = n points where ν = 1/n were placed at random locations in A. Because the resulting time data was highly dependent on point placement, it was extremely noisy. Thus, we did ten runs for each N and took the median. We started with N = 128 and increased N in steps of 8 up to N = 192 (see Table 5(b)).

4.5.3. Increasing W and N together

Increasing N means one must consider the scaling behavior of the boundary method with respect to N, as described in Section 4.5.2, above. However, there is another relevant limiting factor for large N: the decreasing area size µ(A_i) = n⁻¹ runs up against the accuracy of the reconstruction. For the problem shown in Figure 10, the area of each region is 5.0 × 10⁻³. At the width tested, the maximum error for the area of the partition regions is on the order of 8 × 10⁻⁴, roughly 16% of the region area. Thus, increasing N requires that we increase W to match. Hence, we wanted to consider what happens as W and N increase in tandem.

Figure 10: Partitioning with large N. (a) N = 200 points in ℝ²; (b) µ-partition.

As it turns out, the scaling we observe is consistent with the product of the two scaling behaviors already determined: O(WN log W log N) with respect to time, and O(W N^{1/2}) with respect to storage. See Table 7 for approximate equations.

Table 7: Time and memory scaling with respect to both W and N

    Time:    T(N, W) ∝ WN ln W ln N
    Storage: S(N, W) ∝ W N^{1/2}

4.5.4. Scaling in ℝ^d

The computations described above can be repeated in three dimensions. We scale W separately by taking a projection of Example 2.1 into the center of the cube [0, 1]³. Then we consider the median of tests where W is fixed and N ranges from 8 to 80. Finally, we scale W and N together, and consider their combined behavior. Approximate scaling equations are given in Table 8.

Table 8: 3-D scaling with respect to W and N

    W alone:  T(W) ∝ W² ln W          S(W) ∝ W²
    N alone:  T(N) ∝ N ln N           S(N) ∝ N^{1/3}
    W and N:  T(N, W) ∝ W² N ln W ln N   S(N, W) ∝ W² N^{1/3}

Taking the combined scaling equations for two and three dimensions, and extrapolating to arbitrary dimension d ≥
2, we anticipate scaling of T ( d , N , W ) ∼ O ( W d − N log W log N ) and S ( d , N , W ) ∼ O ( W d − N / d ) . . Conclusions and future work In this work, we presented the boundary method, a new technique for approximating solu-tions to semi-discrete optimal transportation problems. We gave an algorithmic description andmathematical justification. As we showed, by tackling only the boundary of the regions to betransported, the method has very favorable scaling properties. Under the assumption that allcomputations are exact, we gave sharp convergence results for p -norms with p ∈ (1 , ∞ ), andwe presented numerical examples supporting those convergence results. We showed that theboundary method can provide accurate approximations of the partition regions and Wassersteindistance for a multitude of cost functions, including some that are not covered by our theorems:the 1-norm, the ∞ -norm, strictly convex non-norms such as the squared 2-norm, concave non-norms such as p -functions with p ∈ (0 , p -functions thatare neither concave nor convex. As we also showed, even when partitioning fails, the boundarymethod can solve with accuracy and convergence comparable to the case where a partition ex-ists. Our future work will consider applications of the boundary method to fully continuous masstransportation problems and the impact of estimated computations on convergence. References [1] G. Monge, M´emoire sur la th´eorie des d´eblais et des remblais, in: Histoire de l’Acad´emie Royale des Sciences deParis, avec les M´emoires de Math´ematique et de Physique pour la mˆeme ann´ee, Acad´emie des sciences (France).,1781, pp. 666–704, in French.[2] L. V. Kantorovich, On the translocation of masses, C.R. (Doklady) Acad. Sci. URSS (N.S.) 37 (1942) 199–201.[3] L. V. Kantorovich, On a problem of Monge, Uspekhi Mat. Nauk 3 (1948) 225–226.[4] C. Villani, Topics in Optimal Transportation, Vol. 
58 of Graduate Studies in Mathematics, American MathematicalSociety, Providence, R.I., 2003.[5] W. Gangbo, R. J. McCann, The geometry of optimal transportation, Acta Mathematica 177 (2) (1996) 113–161.[6] J. A. Cuesta-Albertos, A. Tuero-D´ıaz, A characterization for the solution of the Monge-Kantorovich mass transfer-ence problem, Statistics and Probability Letters 16 (2) (1993) 147–152.[7] A. Pratelli, On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation,Annales de l’Institut Henri Poincare (B): Probability and Statistics 43 (1) (2007) 1–13.[8] L. R¨uschendorf, Monge-Kantorovich transportation problem and optimal couplings, Jahresbericht der DeutschenMathematiker-Vereinigung 109 (3) (2007) 113–137.[9] L. R¨uschendorf, L. Uckelmann, Numerical and analytical results for the transportation problem of Monge-Kantorovich, Metrika 51 (3) (2000) 245–258.[10] F. Aurenhammer, Power diagrams: properties, algorithms and applications, SIAM Journal on Computing 16 (1)(1987) 78–96.[11] M. Muskulus, A. M. Slats, P. J. Sterk, S. Verduyn-Lunel, Fluctuations and determinism of respiratory impedancein asthma and chronic obstructive pulmonary disease, Journal of Applied Physiology 109 (2010) 1582–1591.[12] G. Carlier, A general existence result for the principal-agent problem with adverse selection, Journal of Mathemat-ical Economics 35 (2001) 129–150.[13] S. Haker, A. Tannenbaum, Optimal mass transport and image registration, in: IEEE Workshop on Variational andLevel Set Methods in Computer Vision: Proceedings: 13 July, 2001, Vancouver, Canada, IEEE Computer Society,Los Alamitos, California, 2001, pp. 29–36.[14] M. Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, in: Advances in Neural InformationProcessing Systems, Vol. 26, Curran Associates, Inc., 2013, pp. 2292–2300.[15] E. A. Carlen, W. 
Gangbo, Solution of a model Boltzmann equation via steepest descent in the 2-Wasserstein metric,Archive for Rational Mechanics and Analysis 172 (2004) 21–64.[16] Q. M´erigot, A multiscale approach to optimal transport, Computer Graphics Forum 30 (5) (2011) 1584–1592.[17] P.-A. Chiappori, R. McCann, L. Nesheim, Hedonic price equilibria, stable matching, and optimal transport: equiv-alence, topology, and uniqueness, Economic Theory 42 (2) (2010) 317–354.[18] P.-A. Chiappori, R. McCann, B. Pass, Multi- to one-dimensional optimal transport, to appear in Comm. Pure Appl.Math. (2016).
[19] X. Dupuis, The semi-discrete principal agent problem, presented at the "Computational Optimal Transportation" workshop, July 18–22, 2016.
[20] F. de Goes, K. Breeden, V. Ostromoukhov, M. Desbrun, Blue noise through optimal transport, ACM Trans. Graph. 31 (6) (2012) 171:1–171:11.
[21] F. Abedin, C. E. Gutiérrez, An iterative method for generated Jacobian equations, preprint (2016).
[22] T. Glimm, V. Oliker, Optical design of single reflector systems and the Monge-Kantorovich mass transfer problem, J. Math. Sci. 117 (3) (2003) 4096–4108.
[23] V. Oliker, L. Prussner, On the numerical solution of the equation (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = f and its discretizations, I, Numerische Mathematik 54 (1988) 271–293.
[24] F. Aurenhammer, F. Hoffmann, B. Aronov, Minkowski-type theorems and least-squares partitioning, in: SCG '92: Proceedings of the Eighth Annual Symposium on Computational Geometry (Berlin, Germany, June 10–12, 1992), Association for Computing Machinery, New York, 1992, pp. 350–357.
[25] F. Aurenhammer, F. Hoffmann, B. Aronov, Minkowski-type theorems and least-squares clustering, Algorithmica 20 (1) (1998) 61–76.
[26] J. Kitagawa, Q. Mérigot, B. Thibert, A Newton algorithm for semi-discrete optimal transport, arXiv:1603.05579 (2017).
[27] L. A. Caffarelli, S. A. Kochengin, V. I. Oliker, On the numerical solution of the problem of reflector design with given far-field scattering data, in: Monge Ampère Equation: Applications to Geometry and Optimization (NSF-CBMS Conference, July 9–13, 1997, Florida Atlantic University), Vol. 226 of Contemporary Mathematics, American Mathematical Society, Providence, R.I., 1999, pp. 13–32.
[28] B. Lévy, A numerical algorithm for L² semi-discrete optimal transport in 3D, ESAIM: Mathematical Modelling and Numerical Analysis 49 (6) (2015) 1693–1715.
[29] J.-M. Mirebeau, Discretization of the 3D Monge-Ampère operator, between wide stencils and power diagrams, arXiv:1503.00947 (2015).
[30] J.-D. Benamou, B. D. Froese, A. M. Oberman, Numerical solution of the optimal transportation problem using the Monge-Ampère equation, Journal of Computational Physics 260 (2014) 107–126.
[31] A. Oberman, Convergent difference schemes for degenerate elliptic and parabolic equations: Hamilton-Jacobi equations and free boundary problems, SIAM Journal on Numerical Analysis 44 (2006) 879–895.
[32] B. Schmitzer, A sparse multi-scale algorithm for dense optimal transport, arXiv:1510.05466v2 (2016).
[33] J. W. Barrett, L. Prigozhin, A mixed formulation of the Monge-Kantorovich equations, ESAIM: M2AN 41 (2007) 1041–1060.
[34] G. Bouchitté, G. Buttazzo, P. Seppecher, Shape optimization solutions via Monge-Kantorovich equation, Comptes Rendus de l'Académie des Sciences – Series I – Mathematics 324 (1997) 1185–1191.
[35] J. Kitagawa, An iterative scheme for solving the optimal transportation problem, Calculus of Variations 51 (2014) 243–263.
[36] X. Ma, N. S. Trudinger, X. Wang, Regularity of potential functions of the optimal transportation problem, Arch. Ration. Mech. Anal. 177 (2) (2005) 151–183.
[37] P. Kovács, Minimum-cost flow algorithms: an experimental evaluation, Tech. rep., The Egerváry Research Group (2015).
[38] B. D. Froese, A. M. Oberman, Convergent finite difference solvers for viscosity solutions of the elliptic Monge-Ampère equation in dimensions two and higher, Journal of Computational Physics 260 (2014) 107–126.
[39] R. Jordan, D. Kinderlehrer, F. Otto, The variational formulation of the Fokker-Planck equation, SIAM Journal on Mathematical Analysis 29 (1) (1998) 1–17.
[40] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM Journal on Scientific Computing 37 (2) (2015) A1111–A1138.
[41] D. P. Bertsekas, Network Optimization: Continuous and Discrete Models, Athena Scientific, Belmont, Massachusetts, 1998. Accessed: 2016-09-16.
[42] D. P. Bertsekas, D. A. Castañón, A generic auction algorithm for the minimum cost network flow problem, Comput. Optim. Appl. 2 (1993) 229–260.
[43] Q. Mérigot, A comparison of two dual methods for discrete optimal transport, in: F. Nielsen, F. Barbaresco (Eds.), GSI 2013 — Geometric Science of Information, Aug 2013, Paris, France, Vol. 8085 of Lecture Notes in Computer Science, Springer, 2013, pp. 389–396.
[44] J. Walsh III, L. Dieci, General auction method for real-valued optimal transport, preprint, http://gatech.jdwalsh03.com/index.html (2016).
[45] J. Walsh III, The AUCTION ALGORITHMS IN C++ project, computer software, https://github.com/jdwalsh03/auction.
[46] F. Aurenhammer, Voronoi diagrams: a survey of a fundamental data structure, ACM Computing Surveys 23 (3) (1991) 345–405.
[47] K. R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, 1967.
[48] J. Walsh III, The boundary method and general auction for optimal mass transportation and Wasserstein distance computation, Ph.D. thesis, Georgia Institute of Technology (April 2017).