Constrained overdamped Langevin dynamics for symmetric multimarginal optimal transportation
Aurélien Alfonsi, Rafaël Coyaud, Virginie Ehrlacher

February 8, 2021
Abstract
The Strictly Correlated Electrons (SCE) limit of the Levy-Lieb functional in Density Functional Theory (DFT) gives rise to a symmetric multi-marginal optimal transport problem with Coulomb cost, where the number of marginal laws is equal to the number of electrons in the system, which can be very large in relevant applications. In this work, we design a numerical method, built upon constrained overdamped Langevin processes, to solve Moment Constrained Optimal Transport (MCOT) relaxations (introduced in A. Alfonsi, R. Coyaud, V. Ehrlacher and D. Lombardi, Math. Comp. 90, 2021, 689–737) of symmetric multi-marginal optimal transport problems with Coulomb cost. Some minimizers of such relaxations can be written as discrete measures charging a low number of points belonging to a space whose dimension, in the symmetric case, scales linearly with the number of marginal laws. We leverage the sparsity of those minimizers in the design of the numerical method and prove that any local minimizer of the resulting problem is actually a global one. We illustrate the performance of the proposed method with numerical examples which solve MCOT relaxations of 3D systems with up to 100 electrons.
Optimal transport (OT) problems [44, 47] appear in numerous application fields such as data science [38], finance [3], economics [9, 19, 20] or physics [46]. Hence there is an increasing interest in developing efficient numerical methods for this type of problem in the applied mathematics community. In this article, we specifically focus on multi-marginal symmetric optimal transport problems arising from quantum chemistry. Density Functional Theory (DFT) [37] is one of the most popular theories in quantum chemistry for computing the ground state of electrons within a molecule. It is exact in principle, due to the Hohenberg-Kohn theorem, up to the knowledge of the Levy-Lieb functional, which is unfortunately not computable in practice. Hence, a wide zoology of electronic structure models has been developed in the chemistry community, in which approximations of this Levy-Lieb functional are computed [33]. Actually, it has recently been proved [6, 7, 8, 13, 14, 17, 31] that the semi-classical limit of this Levy-Lieb functional is the solution of a symmetric multi-marginal optimal transport problem, which we state now.

For all $p \in \mathbb{N}^*$ (where $\mathbb{N}^*$ denotes the set of positive integers $\{1, 2, 3, \dots\}$), we denote by $\mathcal{P}(\mathbb{R}^p)$ the set of probability measures on $\mathbb{R}^p$. For $d \in \mathbb{N}^*$, for all $\mu \in \mathcal{P}(\mathbb{R}^d)$ and $M \in \mathbb{N}^*$ a fixed number of marginal laws (the number of electrons in DFT), we denote the set of $M$-couplings for $\mu$ by
$$\Pi(\mu; M) := \left\{ \pi \in \mathcal{P}\left((\mathbb{R}^d)^M\right) : \ \forall 1 \le m \le M, \ \int_{(\mathbb{R}^d)^{M-1}} d\pi(x_1, \dots, x_M) = d\mu(x_m) \right\}. \tag{1}$$
Let $c : (\mathbb{R}^d)^M \to \mathbb{R}_+ \cup \{+\infty\}$ be an $M$-symmetric (i.e. such that $c(x_1, \dots, x_M) = c(x_{\sigma(1)}, \dots, x_{\sigma(M)})$ for all $(x_1, \dots, x_M) \in (\mathbb{R}^d)^M$ and all permutations $\sigma \in \mathcal{S}_M$), non-negative, lower semi-continuous (l.s.c.) function. The function $c$ is called hereafter the cost function. Then, the multi-marginal symmetric optimal transport problem associated to $\mu$, $M$ and $c$ is defined as
$$I(\mu) = \inf_{\pi \in \Pi(\mu; M)} \int_{(\mathbb{R}^d)^M} c(x_1, \dots, x_M) \, d\pi(x_1, \dots, x_M). \tag{2}$$
In DFT applications, the cost $c$ is the Coulomb cost $c(x_1, \dots, x_M) = \sum_{1 \le m < m' \le M} \frac{1}{|x_m - x_{m'}|}$.

As introduced in [1], the Moment Constrained Optimal Transport (MCOT) problem is a particular case of the Generalized Moment Problem [25], which may be seen as a relaxation of optimal transport where the marginal constraints are alleviated and replaced by a finite number of moment constraints.

i Note that we use, in this article, the term particle to designate a Dirac measure (seen in the minimization problem as a vector in $\mathbb{R}_+ \times (\mathbb{R}^d)^M$ accounting for a nonnegative weight and the coordinates of a point in $(\mathbb{R}^d)^M$), and not with the physics meaning that encompasses electrons, the electronic density of which, in the DFT application, would correspond in this article to $M$ times the marginal law $\mu$.

ii In the case of multi-marginal martingale optimal transport, if there is no assumption of a Markovian relationship between the marginal laws, the scaling in the number of constraints for the approximation of the martingale constraints may be exponential in $M$.

In the following, we restrict our analysis to symmetric multi-marginal optimal transport for the sake of clarity, but let us mention here that the results presented can be extended to general multi-marginal optimal transport, as well as to martingale optimal transport. Let $d \in \mathbb{N}^*$, $\mu \in \mathcal{P}(\mathbb{R}^d)$, $M \in \mathbb{N}^*$ and $c : (\mathbb{R}^d)^M \to \mathbb{R}_+ \cup \{+\infty\}$ be a lower semi-continuous symmetric function. The MCOT problem is a relaxation of the optimal transport problem (2), which we present now.
Let $N \in \mathbb{N}^*$ and let us consider a set $(\phi_n)_{1 \le n \le N} \subset L^1(\mathbb{R}^d, \mu; \mathbb{R})$ of $N$ continuous real-valued functions, integrable with respect to $\mu$, called hereafter test functions. For all $1 \le n \le N$, let us denote by
$$\mu_n = \int_{\mathbb{R}^d} \phi_n(x) \, d\mu(x) \tag{3}$$
the moments of $\mu$, by
$$\Pi(\mu; (\phi_n)_{1 \le n \le N}; M) := \left\{ \pi \in \mathcal{P}\left((\mathbb{R}^d)^M\right) : \ \forall 1 \le n \le N, \ \int_{(\mathbb{R}^d)^M} \sum_{m=1}^M |\phi_n(x_m)| \, d\pi(x_1, \dots, x_M) < \infty, \ \int_{(\mathbb{R}^d)^M} \left( \frac{1}{M} \sum_{m=1}^M \phi_n(x_m) \right) d\pi(x_1, \dots, x_M) = \mu_n \right\} \tag{4}$$
the set of probability measures on $(\mathbb{R}^d)^M$ for which the mean over the marginal laws of the moments against the test functions is equal to that of $\mu$, and by
$$\Pi_S(\mu; (\phi_n)_{1 \le n \le N}; M) := \left\{ \pi \in \mathcal{P}\left((\mathbb{R}^d)^M\right) : \ \forall 1 \le n \le N, \ \int_{(\mathbb{R}^d)^M} \sum_{m=1}^M |\phi_n(x_m)| \, d\pi(x_1, \dots, x_M) < \infty, \ \forall 1 \le m \le M, \ \int_{(\mathbb{R}^d)^M} \phi_n(x_m) \, d\pi(x_1, \dots, x_M) = \mu_n \right\} \tag{5}$$
the set of probability measures on $(\mathbb{R}^d)^M$ that have, for each marginal law, the same moments as $\mu$ against the test functions.

For technical reasons linked to the fact that the optimization problem is defined on the unbounded state space $\mathbb{R}^d$, we assume in addition that there exists a non-decreasing non-negative continuous function $\theta : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying $\theta(r) \to +\infty$ as $r \to +\infty$, for which there exist $C > 0$ and $0 < s < 1$ such that for all $1 \le n \le N$ and all $x \in \mathbb{R}^d$,
$$|\phi_n(x)| \le C (1 + \theta(|x|))^s. \tag{6}$$
We finally choose a positive real number $A$ such that $A \ge A_0 := \int_{\mathbb{R}^d} \theta(|x|) \, d\mu(x)$. Then, the MCOT problem is defined by
$$I^N := \inf_{\substack{\pi \in \Pi_S(\mu; (\phi_n)_{1 \le n \le N}; M), \\ \frac{1}{M} \int_{(\mathbb{R}^d)^M} \sum_{m=1}^M \theta(|x_m|) \, d\pi(x_1, \dots, x_M) \le A}} \int_{(\mathbb{R}^d)^M} c(x_1, \dots, x_M) \, d\pi(x_1, \dots, x_M). \tag{MCOT_S}$$
Under appropriate assumptions on the family of test functions $(\phi_n)_{1 \le n \le N}$, it is proved in [1] that the value of $I^N$ can be made arbitrarily close to $I$ as $N$, the number of test functions, goes to infinity. Besides, converging subsequences of minimizers of (MCOT_S) necessarily converge to some minimizer of (2). This is the reason why (MCOT_S) can be seen as a particular discretization approach for the numerical approximation of Problem (2).

Remark 1. It is proved in [1] that the value of $I^N$ does not depend on the value of $A$ provided that $A \ge A_0$.

Using the symmetry of the cost $c$ and the marginal constraints, it can easily be checked that $I^N$ is also equal to
$$I^N = \inf_{\substack{\pi \in \Pi(\mu; (\phi_n)_{1 \le n \le N}; M), \\ \frac{1}{M} \int_{(\mathbb{R}^d)^M} \sum_{m=1}^M \theta(|x_m|) \, d\pi(x_1, \dots, x_M) \le A}} \int_{(\mathbb{R}^d)^M} c(x_1, \dots, x_M) \, d\pi(x_1, \dots, x_M). \tag{MCOT}$$
Then, from [1, Proposition 3.3], there exists at least one minimizer to problem (MCOT), which can be written as
$$\pi^N = \sum_{k=1}^K w_k \, \delta_{(x^k_1, \dots, x^k_M)}, \tag{7}$$
for some $0 < K \le N + 2$, with $w_k \ge 0$ and $x^k_m \in \mathbb{R}^d$ for all $1 \le m \le M$ and $1 \le k \le K$. Besides, the symmetrized measure associated to $\pi^N$, defined by
$$\pi^N_S := \frac{1}{M!} \sum_{\sigma \in \mathcal{S}_M} \sum_{k=1}^K w_k \, \delta_{\left(x^k_{\sigma(1)}, \dots, x^k_{\sigma(M)}\right)}, \tag{8}$$
where $\mathcal{S}_M$ is the set of permutations of $\{1, \dots, M\}$, is a minimizer of (MCOT_S).

The proof of this result makes use of Tchakaloff's theorem [2, Corollary 2], which is recalled in Theorem 4 in Section 6.1. Note that since $\Pi(\mu; M) \subset \Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$, when $I$ is finite, it naturally holds that $I^N \le I < \infty$.

These theoretical results naturally lead us to consider an optimization problem similar to (MCOT) but where the optimization set is reduced to the set of measures of $\Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$ which can be written as discrete measures of the form (7) for some $K \in \mathbb{N}^*$.
This naturally leads to the following optimization problem, which we call hereafter the MCOT particle problem with $K$ particles:
$$I^N_K := \inf_{(W, Y) \in \mathcal{U}^N_K} \sum_{k=1}^K w_k \, c\left(X^k\right), \tag{MCOT_K}$$
where
$$\mathcal{U}^N_K := \left\{ (W, Y) \in \mathbb{R}_+^K \times \left((\mathbb{R}^d)^M\right)^K, \ W = (w_k)_{1 \le k \le K}, \ Y = (X^k)_{1 \le k \le K}, \ \sum_{k=1}^K w_k = 1, \ \sum_{k=1}^K w_k \vartheta(X^k) \le A, \ \forall 1 \le n \le N, \ \sum_{k=1}^K w_k \varphi_n(X^k) = \mu_n \right\}, \tag{9}$$
with, for all $X = (x_1, \dots, x_M) \in (\mathbb{R}^d)^M$ and all $1 \le n \le N$,
$$\vartheta(X) := \frac{1}{M} \sum_{m=1}^M \theta(|x_m|) \quad \text{and} \quad \varphi_n(X) := \frac{1}{M} \sum_{m=1}^M \phi_n(x_m). \tag{10}$$
In view of [1, Proposition 3.3], we have $I^N_K = I^N$ as soon as $K \ge N + 2$. A few remarks are in order at this point.

Remark 2. (i) Considering problem (MCOT_K) as a starting point for a numerical scheme seems very appealing, especially in contexts where $M$ is large. Indeed, in principle, the resolution of (MCOT_K) only requires the optimization of at most $K(dM + 1)$ scalars, and thus requires the resolution of an optimization problem defined on a continuous optimization set involving a number of parameters which scales only linearly with respect to the number of marginal laws. Thus, gradient-based algorithms are natural to consider for the numerical resolution of (MCOT_K), at least for differentiable test functions.

(ii) Problem (MCOT_K) is highly non-convex, whereas the original MCOT problem (MCOT) reads as a (high-dimensional) linear problem (iii). This definitely makes the numerical resolution of (MCOT_K) a challenging task, and is the reason why we consider in this article randomized versions of gradient-based algorithms for its resolution. Nevertheless, strikingly, we prove in this article that, despite the lack of convexity, any local minimizer of the MCOT particle problem (MCOT_K) is actually a global minimizer, provided that $K \ge N + 6$.
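To fix ideas, the objects entering (MCOT_K) are straightforward to evaluate numerically. The following sketch (our own illustration, with hypothetical helper names) computes, for a discrete measure given by weights and positions stored as a $K \times M \times d$ array, the objective $\sum_k w_k c(X^k)$ and the moment residuals $\sum_k w_k \varphi_n(X^k) - \mu_n$, with $\varphi_n$ as in (10) and the Coulomb cost from the introduction:

```python
import numpy as np

def coulomb_cost(X):
    """Coulomb cost c(x_1,...,x_M) = sum_{m<m'} 1/|x_m - x_m'| for one point X in (R^d)^M."""
    M = X.shape[0]
    c = 0.0
    for m in range(M):
        for mp in range(m + 1, M):
            c += 1.0 / np.linalg.norm(X[m] - X[mp])
    return c

def mcot_objective(W, Y, cost=coulomb_cost):
    """Objective of (MCOT_K): sum_k w_k c(X^k), with W of shape (K,) and Y of shape (K, M, d)."""
    return sum(w * cost(X) for w, X in zip(W, Y))

def moment_residuals(W, Y, test_functions, mu_moments):
    """Residuals sum_k w_k phi_n(X^k) - mu_n, where phi_n(X) = (1/M) sum_m phi_n(x_m)."""
    res = []
    for phi, mu_n in zip(test_functions, mu_moments):
        val = sum(w * np.mean([phi(x) for x in X]) for w, X in zip(W, Y))
        res.append(val - mu_n)
    return np.array(res)
```

A point of $\mathcal{U}^N_K$ is then a pair `(W, Y)` with `W.sum() == 1` and all moment residuals equal to zero.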
Stating this result, together with further mathematical properties of the set of minimizers of (MCOT_K), is the object of Section 2.2.

The main focus of this article is to propose numerical schemes relying on stochastic versions of gradient-based algorithms in order to find minimizers of the MCOT particle problem. Such numerical schemes actually make use of constrained overdamped Langevin processes, which are usually encountered in the context of molecular dynamics simulations [28, 29]. In Section 3, we relate such stochastic processes to MCOT problems and to entropic regularizations of the latter.

In numerical tests, and especially in the 3D case, the schemes proposed in this article perform better when using a large number of particles $K$, with weights $w_k$ assumed to be fixed and equal to $\frac{1}{K}$, which are not optimized upon. That is why we introduce here the resulting optimization problem, called the MCOT fixed-weight particle problem with $K$ particles, which reads as follows:
$$J^N_K := \inf_{\substack{Y := (X^k)_{1 \le k \le K} \in ((\mathbb{R}^d)^M)^K, \\ \forall 1 \le n \le N, \ \frac{1}{K} \sum_{k=1}^K \varphi_n(X^k) = \mu_n, \\ \frac{1}{K} \sum_{k=1}^K \vartheta(X^k) \le A}} \ \sum_{k=1}^K \frac{1}{K} \, c\left(X^k\right). \tag{MCOT_K-fixed weight}$$

Remark 3. (i) Let us stress the fact that the existence of a solution to (MCOT_K-fixed weight) is not guaranteed in general. This stems from the fact that there may not exist a set of points $Y = (X^k)_{1 \le k \le K}$ satisfying the constraints of problem (MCOT_K-fixed weight). However, for all $N, K \in \mathbb{N}^*$, it always holds that $J^N_K \ge I^N_K$. Let us however consider $(W, Y) \in \mathcal{U}^N_{N+2}$ a minimizer of (MCOT_K) and assume that the cost $c$ and the test functions $\phi_n$ are bounded. Then, by rounding the weights $w_k$ to multiples of $1/K$, and then by using $\ell$ copies of particles with weight $\ell/K$, we can construct $\tilde{Y} = (\tilde{X}^k)_{1 \le k \le K}$ such that
$$\frac{1}{K} \sum_{k=1}^K \varphi_n(\tilde{X}^k) \approx \mu_n + O\left(\frac{1}{K}\right).$$
Thus, $\tilde{Y}$ satisfies the moment constraints of problem (MCOT_K-fixed weight) up to an error of order $O\left(\frac{1}{K}\right)$ and achieves a cost that is also $O\left(\frac{1}{K}\right)$ away from the optimal cost achieved by $(W, Y)$. Furthermore, in the limit $K \to \infty$, optima of problems (MCOT_K-fixed weight) (with an accepted error $O\left(\frac{1}{K}\right)$ on the constraints) converge to the optimum of problem (MCOT).

iii More generally, any non-linear minimization problem can be reframed as a linear minimization problem in a much larger space (a measure space), as $\min_{x \in \mathbb{R}^d} c(x) = \min_{P \in \mathcal{P}(\mathbb{R}^d)} \int_{\mathbb{R}^d} c(y) \, dP(y)$.

(ii) Yet, in the numerical experiments in the fixed-weight case in 3D, the convergence in $K$ appears to be faster than $O\left(\frac{1}{K}\right)$, and even low values of $K$ can give sharp approximations of the optimum of (MCOT).

The aim of this section is to present the first main theoretical result of this paper, which states some mathematical properties of the set of minimizers of the particle problem (MCOT_K). For any $(W, Y) \in \mathbb{R}_+^K \times ((\mathbb{R}^d)^M)^K$, we define
$$\mathcal{I}(W, Y) := \sum_{k=1}^K w_k \, c(X^k),$$
where $W := (w_k)_{1 \le k \le K}$ and $Y := (X^k)_{1 \le k \le K}$. Problem (MCOT_K) can then be equivalently rewritten as
$$I^N_K = \inf_{(W, Y) \in \mathcal{U}^N_K} \mathcal{I}(W, Y). \tag{11}$$
We begin this section with Theorem 1, which states that for any two elements of $\mathcal{U}^N_K$, there exists a continuous path with values in $\mathcal{U}^N_K$ which connects these two elements, and such that $\mathcal{I}$ varies monotonically along this path.

Theorem 1. Let us assume that $K \ge N + 6$. Let $(W_0, Y_0), (W_1, Y_1) \in \mathcal{U}^N_K$. Then, there exists a continuous application $\psi : [0, 1] \to \mathcal{U}^N_K$, made of a polygonal chain, such that $\psi(0) = (W_0, Y_0)$, $\psi(1) = (W_1, Y_1)$, and such that the application $[0, 1] \ni t \mapsto \mathcal{I}(\psi(t))$ is monotone.
In order to explain the main ideas of the proof of Theorem 1, let us remark that, using Tchakaloff's theorem (recalled in Section 6.1), for any measure $\pi \in \Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$ satisfying
$$\int_{(\mathbb{R}^d)^M} \vartheta \, d\pi \le A \tag{12}$$
and charging $K \ge N + 6$ points, one can find a measure $\tilde{\pi} \in \Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$ charging $N + 3$ points, whose support is included in that of $\pi$, and having the same cost and the same moment against $\vartheta$. Then, the segment $((1-t)\pi + t\tilde{\pi})_{t \in [0,1]}$ is included in $\Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$, charges at most $2N + 6$ points, and keeps the cost and the moment against $\vartheta$ constant. Besides, let $\tilde{\pi}_0, \tilde{\pi}_1 \in \Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$ be two measures with support on at most $N + 3$ points, and such that for $i = 0, 1$, $\tilde{\pi}_i$ satisfies (12). Then, the segment $((1-t)\tilde{\pi}_0 + t\tilde{\pi}_1)_{t \in [0,1]}$ is included in $\Pi(\mu; (\phi_n)_{1 \le n \le N}; M)$, satisfies the inequality constraint (12) for all $t \in [0,1]$, charges at most $2N + 6$ points, and the cost varies linearly along it.

By identifying $(W_0, Y_0)$ with $\pi_0$ (resp. $(W_1, Y_1)$ with $\pi_1$), one can join $\pi_0$ to $\pi_1$ by segments (with appropriately defined intermediate measures $\tilde{\pi}_0$ and $\tilde{\pi}_1$) satisfying the constraints, and along which the cost varies linearly. The adaptation of these ideas to vectors $(W_0, Y_0), (W_1, Y_1) \in \mathcal{U}^N_K$, which requires taking into account the displacement of the positions between $Y_0$ and $Y_1$ as well as the ordering of the coordinates, is the object of Section 6.2.

A direct consequence of Theorem 1 is Corollary 2, which states that any local minimizer of problem (11) (or equivalently problem (MCOT_K)) is actually a global minimizer as soon as $K \ge N + 6$. In addition, the set of minimizers forms a polygonally connected (and thus arc-connected) set.

Corollary 2. Let us assume that $K \ge N + 6$. Then, any local minimizer of the MCOT particle problem (MCOT_K) is actually a global minimizer.
Besides, the set of (local or global) minimizers of the MCOT particle problem (MCOT_K) is a polygonally connected subset of $\mathbb{R}_+^K \times ((\mathbb{R}^d)^M)^K$.

The motivation of this section is twofold: first, the numerical method used in this article for the resolution of the particle problems (MCOT_K) and (MCOT_K-fixed weight) can be seen as a time discretization of constrained overdamped Langevin dynamics, which are usually encountered in molecular dynamics simulation; second, we draw here a link, on the formal level, between the long-time and large number of particles limit of these processes and a regularized version of the MCOT problem (MCOT) using the so-called Kullback-Leibler entropic regularization, very similar to the regularization which is at the core of the Sinkhorn algorithm for the resolution of optimal transport problems [38].

The objective of Section 3.1 is to recall some fundamental properties of general constrained overdamped Langevin processes. Then, in Section 3.2, we consider specific processes which are related to the MCOT problem presented in Section 2.

Let $p \in \mathbb{N}^*$. Let us first introduce unconstrained overdamped Langevin processes in the state space $\mathbb{R}^p$. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. An overdamped Langevin stochastic process is a stochastic process $(Y_t)_{t \ge 0}$ solution to the following stochastic differential equation
$$dY_t = -\nabla V(Y_t) \, dt + \beta \, dW_t,$$
where $V : \mathbb{R}^p \to \mathbb{R}$ is a smooth function, called hereafter the potential function of the overdamped Langevin process, $\beta > 0$ is the noise level (linked to the temperature: in statistical physics, $\beta = \sqrt{2\bar{T}}$ with $\bar{T}$ the temperature), and $(W_t)_{t \ge 0}$ is a $p$-dimensional Brownian motion.

Constrained overdamped Langevin processes are overdamped Langevin processes whose trajectory is enforced to remain in a given submanifold. In the sequel, we assume that the submanifold is characterized as the zero level set of a given smooth function $\Gamma : \mathbb{R}^p \to \mathbb{R}^q$ for some $q \in \mathbb{N}^*$, so that the corresponding submanifold is defined by $\mathcal{M} = \{ Y \in \mathbb{R}^p, \ \Gamma(Y) = 0 \}$.
We assume in the sequel that the submanifold $\mathcal{M}$ is arc-connected. In addition, let us assume that there exists a neighborhood $\mathcal{W}$ of $\mathcal{M}$ such that, for all $Y \in \mathcal{W}$,
$$G(Y) := \nabla \Gamma(Y)^T \nabla \Gamma(Y) \in \mathbb{R}^{q \times q} \tag{13}$$
is an invertible matrix, where $\nabla \Gamma(Y)_{i,j} = \partial_i \Gamma_j(Y)$ for $1 \le i \le p$ and $1 \le j \le q$. These two assumptions on the function $\Gamma$, together with the implicit function theorem, imply that $\mathcal{M}$ is a regular $(p - q)$-dimensional submanifold.

A constrained overdamped Langevin process [28, Section 3.2.3] is an $\mathbb{R}^p$-valued stochastic process $(Y_t)_{t \ge 0}$ that solves the stochastic differential equation
$$\begin{cases} dY_t = -\nabla V(Y_t) \, dt + \beta \, dW_t + \nabla \Gamma(Y_t) \, d\Lambda_t, \\ \Gamma(Y_t) = 0, \end{cases} \tag{14}$$
where $\beta > 0$, $(W_t)_{t \ge 0}$ is a $p$-dimensional Brownian motion and $(\Lambda_t)_{t \ge 0}$ is a $q$-dimensional adapted stochastic process which ensures that $Y_t$ belongs to the submanifold $\mathcal{M}$ almost surely for all $t \in \mathbb{R}_+$. More precisely, $\Lambda_t$ is the Lagrange multiplier associated to the constraint $\Gamma(Y_t) = 0$ and is defined by
$$d\Lambda_t = G^{-1}(Y_t) \left[ \nabla \Gamma(Y_t)^T \nabla V(Y_t) - \frac{\beta^2}{2} \begin{pmatrix} \sum_{i=1}^p \partial^2_{ii} \Gamma_1(Y_t) \\ \vdots \\ \sum_{i=1}^p \partial^2_{ii} \Gamma_q(Y_t) \end{pmatrix} \right] dt - \beta \, G^{-1}(Y_t) \nabla \Gamma(Y_t)^T \, dW_t. \tag{15}$$
Thus, if we define the projection operator $P(y) = \mathrm{Id} - \nabla \Gamma(y) G^{-1}(y) \nabla \Gamma(y)^T$, we get
$$dY_t = P(Y_t)\left[ -\nabla V(Y_t) \, dt + \beta \, dW_t \right] - \frac{\beta^2}{2} \nabla \Gamma(Y_t) G^{-1}(Y_t) \begin{pmatrix} \sum_{i=1}^p \partial^2_{ii} \Gamma_1(Y_t) \\ \vdots \\ \sum_{i=1}^p \partial^2_{ii} \Gamma_q(Y_t) \end{pmatrix} dt.$$
Let us assume in addition that
$$Z := \int_{\mathcal{M}} e^{-\frac{2V(Y)}{\beta^2}} \, |\det G(Y)|^{-1/2} \, d\sigma_{\mathcal{M}}(Y) < +\infty, \tag{16}$$
where $d\sigma_{\mathcal{M}}$ is the surface measure (induced by the Lebesgue measure in $\mathbb{R}^p$; see [28, Remark 3.4] for a precise definition) on the submanifold $\mathcal{M}$. Let us introduce the probability measure $\eta \in \mathcal{P}(\mathbb{R}^p)$ defined by
$$d\eta(Y) := \frac{1}{Z} e^{-\frac{2V(Y)}{\beta^2}} \, |\det G(Y)|^{-1/2} \, d\sigma_{\mathcal{M}}(Y). \tag{17}$$
Under suitable assumptions, [28, Proposition 3.20] states that $\eta$ is the unique equilibrium distribution of the stochastic process $(Y_t)_{t \ge 0}$ solution to the constrained overdamped Langevin dynamics (14), and that
$$\mathrm{Law}(Y_t) \underset{t \to +\infty}{\longrightarrow} \eta \quad \text{weakly}. \tag{18}$$
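To make the role of the Lagrange multiplier in (14) concrete, here is a minimal Euler discretization for a single scalar constraint ($q = 1$): an unconstrained drift-plus-noise step, followed by a projection back onto $\{\Gamma = 0\}$ along the direction $\nabla\Gamma$. The function names and the toy example (a linear potential on the unit circle) are our own illustration, not taken from this paper:

```python
import numpy as np

def constrained_langevin_step(Y, grad_V, Gamma, grad_Gamma, dt, beta, rng,
                              newton_iters=20, tol=1e-12):
    """One Euler step of (14) for a scalar constraint: unconstrained step,
    then projection onto {Gamma = 0} along grad_Gamma evaluated at the
    previous iterate (the discrete analogue of the multiplier term)."""
    W = rng.standard_normal(Y.shape)
    Y_half = Y - grad_V(Y) * dt + beta * np.sqrt(dt) * W
    G = grad_Gamma(Y)  # projection direction, frozen at the previous iterate
    lam = 0.0
    for _ in range(newton_iters):
        r = Gamma(Y_half + lam * G)
        if abs(r) < tol:
            break
        # derivative of lam -> Gamma(Y_half + lam * G)
        lam -= r / (grad_Gamma(Y_half + lam * G) @ G)
    return Y_half + lam * G

# Toy example: V(y) = y_1 constrained to the unit circle Gamma(y) = |y|^2 - 1.
rng = np.random.default_rng(0)
Y = np.array([1.0, 0.0])
for _ in range(100):
    Y = constrained_langevin_step(
        Y, grad_V=lambda y: np.array([1.0, 0.0]),
        Gamma=lambda y: y @ y - 1.0, grad_Gamma=lambda y: 2.0 * y,
        dt=1e-3, beta=0.5, rng=rng)
```

Along the whole trajectory, the iterate stays on the circle up to the Newton tolerance, while the noise lets it explore the manifold.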
We recall here some results proved in [43, Section 2.3 and Proposition 5.1], where the authors consider the so-called large-particle limit of constrained overdamped Langevin dynamics subject to average moment constraints. The objective of the work [43] was to study the properties of the constrained overdamped Langevin process in the large number of particles limit, and to show the convergence towards $\eta$ of the invariant distribution of the approximating particle system as the number of particles $K \to \infty$ (iv).

More precisely, from now on, let us consider $p' = Kp$ for some $K \in \mathbb{N}^*$. We define for any $K \in \mathbb{N}^*$ the potential function $V^K$ and the constraint function $\Gamma^K$ by: for all $Y = (X^k)_{1 \le k \le K} \in (\mathbb{R}^p)^K$,
$$V^K(Y) := \frac{1}{K} \sum_{k=1}^K V\left(X^k\right) \quad \text{and} \quad \Gamma^K(Y) := \frac{1}{K} \sum_{k=1}^K \Gamma\left(X^k\right).$$
We then consider the following constrained overdamped Langevin process $(Y^K_t)_{t \ge 0}$, assumed to be solution to the stochastic differential equation
$$\begin{cases} dY^K_t = -\nabla V^K(Y^K_t) \, dt + \beta \, dW^K_t + \nabla \Gamma^K(Y^K_t) \, d\Lambda^K_t, \\ \Gamma^K(Y^K_t) = 0, \end{cases} \tag{19}$$
where $(W^K_t)_{t \ge 0}$ is a $Kp$-dimensional Brownian motion and $(\Lambda^K_t)_{t \ge 0}$ is a $q$-dimensional adapted stochastic process which ensures that $Y^K_t$ satisfies the constraint $\Gamma^K(Y^K_t) = 0$ almost surely. The process $Y^K$ is usually called a particle system: each coordinate $X^k$, for $1 \le k \le K$, is seen as a particle.
The large number of particles limit consists in considering the limit as $K$ goes to infinity of the stochastic process $(Y^K_t)_{t \ge 0}$. It follows from (18) that, under suitable assumptions, as $t$ goes to $\infty$, the law of the process $Y^K_t$ converges to the probability measure $\eta^K \in \mathcal{P}\left((\mathbb{R}^p)^K\right)$ defined for all $Y^K = (X^1, \dots, X^K) \in (\mathbb{R}^p)^K$ by
$$d\eta^K(Y^K) = \frac{1}{Z^K} \, e^{-\frac{2V^K(Y^K)}{\beta^2}} \, |\det G^K(Y^K)|^{-1/2} \, d\sigma_{\mathcal{M}^K}(Y^K), \tag{20}$$
where
$$\mathcal{M}^K := \left\{ Y^K \in (\mathbb{R}^p)^K, \ \Gamma^K(Y^K) = 0 \right\}, \quad Z^K := \int_{\mathcal{M}^K} e^{-\frac{2V^K(Y^K)}{\beta^2}} \, |\det G^K(Y^K)|^{-1/2} \, d\sigma_{\mathcal{M}^K}(Y^K),$$
and $G^K(Y^K) := \nabla \Gamma^K(Y^K)^T \nabla \Gamma^K(Y^K) \in \mathbb{R}^{q \times q}$.

iv We use here the notation $K$ for the number of particles in view of the use of the Langevin dynamics to solve (MCOT_K) problems, from Section 3.2 on, for which $K \ge N + 6$. Yet, the results recalled in Section 3.1.2 are general and unrelated to MCOT applications.

For $1 \le k \le K$, $(X^k_t)_{t \ge 0}$ is a $p$-dimensional stochastic process. Let us denote by $\zeta^K_t \in \mathcal{P}(\mathbb{R}^p)$ the law of the first particle $X^1_t$. Then, the symmetry of the functions $V^K$ and $\Gamma^K$ implies that $\zeta^K_t$ converges weakly as $t \to \infty$ to the probability measure $\zeta^K_\infty$ defined for all $X \in \mathbb{R}^p$ by
$$d\zeta^K_\infty(X) = \int_{(\mathbb{R}^p)^{K-1}} d\eta^K(X, X^2, \dots, X^K). \tag{21}$$
Under appropriate assumptions on $V$ and $\Gamma$, which we do not detail here [43, Proposition 5.1], the sequence $(\zeta^K_\infty)_{K \in \mathbb{N}^*}$ converges weakly in $\mathcal{P}(\mathbb{R}^p)$ as $K$ goes to infinity to a probability measure $\pi^*_\beta \in \mathcal{P}(\mathbb{R}^p)$ which is the unique solution to
$$\pi^*_\beta := \arg\min_{\substack{\pi \in \mathcal{P}(\mathbb{R}^p), \\ \int_{\mathbb{R}^p} \Gamma \, d\pi = 0}} \int_{\mathbb{R}^p} \ln\left( \frac{d\pi(X)}{(Z_\infty)^{-1} e^{-\frac{2V(X)}{\beta^2}} \, dX} \right) d\pi(X), \tag{22}$$
where $Z_\infty := \int_{\mathbb{R}^p} e^{-\frac{2V(X)}{\beta^2}} \, dX$. In other words, $\pi^*_\beta$ is a probability measure on $\mathbb{R}^p$ which is absolutely continuous with respect to the Lebesgue measure and which is the solution to
$$\pi^*_\beta := \arg\min_{\substack{\pi \in \mathcal{P}(\mathbb{R}^p), \\ \int_{\mathbb{R}^p} \Gamma \, d\pi = 0}} \int_{\mathbb{R}^p} V(X) \, d\pi(X) + \frac{\beta^2}{2} \int_{\mathbb{R}^p} \ln\left( \frac{d\pi}{dX}(X) \right) d\pi(X). \tag{23}$$
The aim of this section is to illustrate the link between the MCOT problems presented in Section 2 and the constrained overdamped Langevin processes introduced in Section 3.1. We start by considering the fixed-weight MCOT particle problem (MCOT_K-fixed weight), before considering the MCOT particle problem (MCOT_K) with adaptive weights.

We first draw the link between constrained overdamped Langevin dynamics and the fixed-weight MCOT particle problem (MCOT_K-fixed weight). For all $K \in \mathbb{N}^*$, let us consider $(Y^K_t)_{t \ge 0}$ a constrained overdamped Langevin process solution to the stochastic differential equation (19) with $p = dM$, $q = N$, $V = c$ and $\Gamma = (\varphi_1 - \mu_1, \dots, \varphi_N - \mu_N)$, where for all $1 \le n \le N$, $\varphi_n$ is defined by (10). Then, the stochastic dynamics (19) can be viewed as a randomized version of a constrained gradient numerical method for the resolution of problem (MCOT_K-fixed weight), where for all $t \ge 0$, $Y^K_t = (X^1_t, \dots, X^K_t) \in ((\mathbb{R}^d)^M)^K$ and, for all $1 \le k \le K$, $X^k_t = (x^k_{1,t}, \dots, x^k_{M,t}) \in (\mathbb{R}^d)^M$.

It is not clear in general that $V$ and $\Gamma$ satisfy the regularity assumptions which ensure that the convergence results stated in Section 3.1.2 hold true. But, formally, assuming that the long-time and large number of particles convergence nevertheless holds, the associated measure $\pi^*_\beta$ solution to (23) can be equivalently rewritten as
$$\pi^*_\beta := \arg\min_{\substack{\pi \in \mathcal{P}\left((\mathbb{R}^d)^M\right), \\ \forall 1 \le n \le N, \ \int_{(\mathbb{R}^d)^M} \varphi_n \, d\pi = \mu_n}} J(\pi), \tag{24}$$
where
$$J(\pi) := \int_{(\mathbb{R}^d)^M} c(X) \, d\pi(X) + \frac{\beta^2}{2} \int_{(\mathbb{R}^d)^M} \ln\left( \frac{d\pi}{dX}(X) \right) d\pi(X).$$
Recall that $\pi^*_\beta$ is the large number of particles limit of the long-time limit of the law of one particle associated to the constrained overdamped Langevin process.
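The effect of the entropic term in the objective $J$ of (24) can already be seen without the moment constraints: on a discretized state space, the minimizer of a linear-plus-entropy objective over the probability simplex is the Gibbs measure proportional to $e^{-c_i/\varepsilon}$ (a standard fact; the discretization below is our own illustration, with $\varepsilon$ standing for the regularization coefficient, here $\beta^2/2$):

```python
import numpy as np

def gibbs_weights(c, eps):
    """Unconstrained minimizer of p -> sum_i c_i p_i + eps * sum_i p_i log p_i
    over the probability simplex: p_i proportional to exp(-c_i / eps)."""
    w = np.exp(-(c - c.min()) / eps)  # shift by the minimum for stability
    return w / w.sum()

def reg_objective(p, c, eps):
    """Discrete analogue of J: linear cost plus eps times the entropy term."""
    m = p > 0
    return p[m] @ c[m] + eps * (p[m] @ np.log(p[m]))

c = np.linspace(0.0, 2.0, 50)  # cost values on a 1D grid
p_star = gibbs_weights(c, eps=0.1)
```

As $\varepsilon \to 0$ the Gibbs weights concentrate on the minimizers of the cost, mirroring the expected behaviour of $\pi^*_\beta$ as $\beta \to 0$.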
Notice that $\pi^*_\beta$ can be equivalently seen as the solution of an entropic regularization of the MCOT problem (MCOT), where the term $\int_{(\mathbb{R}^d)^M} \ln\left(\frac{d\pi}{dX}(X)\right) d\pi(X)$ can be identified as the Kullback-Leibler entropy of the measure $\pi$ with respect to the Lebesgue measure. Thus, Problem (24) is close to the entropic regularization of optimal transport problems used in several works [4, 16, 35, 38], in particular for the so-called Sinkhorn algorithm [4]. Let us point out here that, at least on the formal level, we expect the family $(\pi^*_\beta)_{\beta > 0}$ to converge weakly to a minimizer of (MCOT) as $\beta$ goes to 0 (a similar result is proven in [10, Theorem 2.7]).

A similar link can be drawn between constrained overdamped Langevin dynamics and the MCOT particle problem (MCOT_K) with adaptive weights. In order to fit in the framework of constrained overdamped Langevin dynamics, without any positivity constraint, let us introduce a continuous surjective function $f : \mathbb{R} \to \mathbb{R}_+$, which we call hereafter a weight function. We assume that $f$ satisfies the following assumption: there exists an interval $I \subset \mathbb{R}$ such that the Lebesgue measure of $I$ is equal to 1 and such that $\int_I f = 1$. A simple choice of admissible weight function is given, for instance, by $f(a) = 3a^2$ for all $a \in \mathbb{R}$, with $I = (0, 1)$ (one checks that $\int_0^1 3a^2 \, da = 1$). Then, for all $K \in \mathbb{N}^*$, let us consider $(Y^K_t)_{t \ge 0}$ a constrained overdamped Langevin process solution to the stochastic differential equation (19) with $p = dM + 1$, $q = N + 1$, and where for all $\bar{X} = (a, X) \in \mathbb{R} \times (\mathbb{R}^d)^M$, $V(\bar{X}) = f(a) c(X)$ and $\Gamma(\bar{X}) = (f(a) - 1, f(a)\varphi_1(X) - \mu_1, \dots, f(a)\varphi_N(X) - \mu_N)$.
Then, the stochastic dynamics (19) can be viewed as a randomized version of a constrained gradient numerical method for the resolution of the optimization problem
$$\inf_{(A, Y) \in \mathcal{V}^N_K} \sum_{k=1}^K f(a_k) \, c\left(X^k\right), \tag{25}$$
where
$$\mathcal{V}^N_K := \left\{ (A, Y) \in \mathbb{R}^K \times \left((\mathbb{R}^d)^M\right)^K, \ A = (a_k)_{1 \le k \le K}, \ Y = (X^k)_{1 \le k \le K}, \ \sum_{k=1}^K f(a_k) = 1, \ \sum_{k=1}^K f(a_k) \vartheta(X^k) \le A, \ \forall 1 \le n \le N, \ \sum_{k=1}^K f(a_k) \varphi_n(X^k) = \mu_n \right\}, \tag{26}$$
which is equivalent to problem (MCOT_K) by the surjectivity of $f$. Note that the choice of the function $f$ can influence the dynamics, as it regulates both the way the Brownian motion $W$ affects the weights and the balance, in the minimization of $V$ and in the enforcement of the constraint $\Gamma(\bar{X}) = 0$, between a displacement of particles and a change in weights.

Here again, it is not clear in general that $V$ and $\Gamma$ satisfy the regularity assumptions which ensure that the convergence results stated in Section 3.1.2 hold true. But, using formal computations, we can consider the associated measure $\bar{\pi}_\beta \in \mathcal{P}\left(\mathbb{R} \times (\mathbb{R}^d)^M\right)$ solution to
$$\bar{\pi}_\beta := \arg\min_{\substack{\pi \in \mathcal{P}(\mathbb{R} \times (\mathbb{R}^d)^M), \\ \int_{a \in \mathbb{R}} \int_{X \in (\mathbb{R}^d)^M} f(a) \, d\pi(a, X) = 1, \\ \forall 1 \le n \le N, \ \int_{a \in \mathbb{R}} \int_{X \in (\mathbb{R}^d)^M} f(a) \varphi_n(X) \, d\pi(a, X) = \mu_n}} \bar{J}(\pi), \tag{27}$$
where
$$\bar{J}(\pi) := \int_{a \in \mathbb{R}} \int_{X \in (\mathbb{R}^d)^M} f(a) c(X) \, d\pi(a, X) + \frac{\beta^2}{2} \int_{a \in \mathbb{R}} \int_{X \in (\mathbb{R}^d)^M} \ln\left( \frac{d\pi(a, X)}{da \, dX} \right) d\pi(a, X).$$
Let us now introduce $\pi^a_\beta \in \mathcal{P}\left((\mathbb{R}^d)^M\right)$ defined by
$$d\pi^a_\beta(X) = \int_{a \in \mathbb{R}} f(a) \, d\bar{\pi}_\beta(a, X).$$
Then $\pi^a_\beta$ satisfies the constraints of problem (24), and
$$\bar{J}(\bar{\pi}_\beta) = \int_{X \in (\mathbb{R}^d)^M} c(X) \, d\pi^a_\beta(X) + \frac{\beta^2}{2} \int_{a \in \mathbb{R}} \int_{X \in (\mathbb{R}^d)^M} \ln\left( \frac{d\bar{\pi}_\beta(a, X)}{da \, dX} \right) d\bar{\pi}_\beta(a, X).$$
Notice that, as a consequence, problem (27) may be seen as a second kind of entropic regularization of (MCOT), and that $\pi^a_\beta$ is expected to be an approximation of some minimizer of (MCOT) as $\beta$ goes to 0.

Let us note here that the assumption made on $f$ ensures that, for all $\pi \in \mathcal{P}\left((\mathbb{R}^d)^M\right)$, there exists a probability measure $\bar{\pi} \in \mathcal{P}\left(\mathbb{R} \times (\mathbb{R}^d)^M\right)$ such that
$$d\pi(X) = \int_{a \in \mathbb{R}} f(a) \, d\bar{\pi}(a, X).$$
Indeed, defining $d\bar{\pi}(a, X) := \mathbf{1}_I(a) \, da \otimes d\pi(X)$ yields the desired result. Besides, we easily check that $\bar{J}(\bar{\pi}) = J(\pi)$, which leads immediately to $J(\pi^a_\beta) \le J(\pi^*_\beta)$ from the optimality of $\bar{\pi}_\beta$.

We present in this section the numerical procedure we use in our numerical tests to compute approximate solutions to the particle problems with fixed weights (MCOT_K-fixed weight) or adaptive weights (MCOT_K), for a fixed given $K \in \mathbb{N}^*$. Note that (MCOT_K-fixed weight) can be equivalently rewritten as
$$J^N_K := \inf_{\substack{Y^K \in ((\mathbb{R}^d)^M)^K, \\ \Gamma^K(Y^K) = 0, \ \Theta^K(Y^K) \le A}} V^K(Y^K), \tag{28}$$
where $V^K$ and $\Gamma^K$ are defined in Section 3.2.1, and where
$$\Theta^K : \begin{cases} ((\mathbb{R}^d)^M)^K \to \mathbb{R}, \\ Y := (X^1, \dots, X^K) \mapsto \frac{1}{K} \sum_{k=1}^K \vartheta(X^k). \end{cases}$$
Besides, problem (MCOT_K) can be rewritten equivalently as
$$I^N_K := \inf_{\substack{Y^K \in \left(\mathbb{R} \times (\mathbb{R}^d)^M\right)^K, \\ \Gamma^K(Y^K) = 0, \ \Theta^K(Y^K) \le A}} V^K(Y^K), \tag{29}$$
where $V^K$ and $\Gamma^K$ are defined in Section 3.2.2, and where
$$\Theta^K : \begin{cases} \left(\mathbb{R} \times (\mathbb{R}^d)^M\right)^K \to \mathbb{R}, \\ Y := ((a_1, X^1), \dots, (a_K, X^K)) \mapsto \frac{1}{K} \sum_{k=1}^K f(a_k) \vartheta(X^k). \end{cases}$$
For the sake of simplicity, we restrict the presentation here to the method used for the resolution of (28), since the method used for the resolution of (29) follows exactly the same lines.

The numerical procedure considered in this paper consists in a time discretization of the dynamics (14) with an adaptive time step and noise level. The main idea of the algorithm is the following: let $(W_n)_{n \in \mathbb{N}}$ be a sequence of i.i.d. normal vectors of dimension $dMK$.
At each iteration $n \in \mathbb{N}$ of the procedure, starting from an initial guess $Y^K_0 \in \mathcal{M}^K$ for $n = 0$, a new approximation $Y^K_{n+1} \in \mathcal{M}^K$ is computed as the projection, in some sense, of
$$Y^K_{n+1/2} := Y^K_n - \nabla V^K(Y^K_n) \Delta t_n + \beta_n \sqrt{\Delta t_n} \, W_n$$
onto $\mathcal{M}^K$, where $\Delta t_n > 0$ and $\beta_n > 0$ are respectively the time step and the noise level at iteration $n$. More precisely, the next iterate $Y^K_{n+1}$ is computed as $Y^K_{n+1/2} + \nabla \Gamma^K(Y^K_n) \cdot \Lambda^K_{n+1}$, where $\Lambda^K_{n+1} \in \mathbb{R}^N$ is a Lagrange multiplier which ensures that the constraint $\Gamma^K(Y^K_{n+1}) = 0$ is satisfied. The complete resulting procedure is summarized in Algorithm 1.

Algorithm 1: Constrained Overdamped Langevin Algorithm
Input: $Y^K_0 \in \mathcal{M}^K$, $\Delta t_0 > 0$, $\beta_0 > 0$, $\tau_0 > 0$, $i_{\mathrm{const}} \in \mathbb{N}^*$, $i_{\max} \in \mathbb{N}^*$, NoiseDecrease: $\mathbb{R}_+ \times \mathbb{N} \to \mathbb{R}_+$, $n_{\max} \in \mathbb{N}^*$
  Fix $n = 0$, $\Lambda^K_0 = 0$.
  Define $(W_n)_{n \in \mathbb{N}}$ a sequence of i.i.d. normal vectors of the same dimension as $Y^K_0$.
  while $n \le n_{\max}$ do
    AdaptTimeStep($Y^K_n$, $\Lambda^K_n$, $\Delta t_n$, $\beta_n$, $\tau_n$)
    $Y^K_{n+1/2} := Y^K_n - \nabla V^K(Y^K_n) \Delta t_n + \beta_n \sqrt{\Delta t_n} \, W_n$
    if Projection($Y^K_{n+1/2}$, $\nabla \Gamma^K(Y^K_n)$, $\Lambda^K_n$, $i_{\max}$) succeeds then
      $Y^K_{n+1}, \Lambda^K_{n+1}, i_n \leftarrow$ Projection($Y^K_{n+1/2}$, $\nabla \Gamma^K(Y^K_n)$, $\Lambda^K_n$, $i_{\max}$)
      if $i_n \le i_{\mathrm{const}}$ then $\tau_{n+1} \leftarrow 2\tau_n$ else $\tau_{n+1} \leftarrow \tau_n$ end if
      $\beta_{n+1} \leftarrow$ NoiseDecrease($\beta_n$, $n$); $\Delta t_{n+1} \leftarrow \Delta t_n$
      $n \leftarrow n + 1$
    else
      $\tau_n \leftarrow \tau_n / 2$
    end if
  end while
  return $\min\left(V^K(Y^K_n), \ 0 \le n \le n_{\max}\right)$

We discuss here three main difficulties regarding the algorithm we propose:
• the initialization step, which consists in finding an element $Y^K_0 \in \mathcal{M}^K$;
• the choice of the values of the time step $\Delta t_n$ and noise level $\beta_n$ at each iteration of the algorithm;
• the practical method used to compute a projection of $Y^K_{n+1/2}$ onto the submanifold $\mathcal{M}^K$, and in particular the value of the Lagrange multiplier $\Lambda^K_{n+1}$.

The procedure chosen to adapt the time step and noise level is discussed in Section 4.2. The algorithm used to compute a projection of $Y^K_{n+1/2}$ onto the submanifold $\mathcal{M}^K$ and the value of the Lagrange multiplier $\Lambda^K_{n+1}$ is detailed in Section 4.3.
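To make the structure of Algorithm 1 concrete, here is a deliberately simplified sketch of its outer loop, as our own illustration: it omits the tolerance $\tau_n$ bookkeeping, and the toy problem uses a single linear constraint so that the projection step is exact instead of requiring a Newton solve. All function names are ours:

```python
import numpy as np

def run_constrained_langevin(Y0, V, grad_V, project, dt, beta, n_max, rng=None):
    """Stripped-down outer loop in the spirit of Algorithm 1: Euler drift-plus-noise
    step, projection back onto the constraint set, halving of the time step if the
    projection fails, and the sqrt(n/(n+1)) noise schedule (so that after n accepted
    steps the noise level is roughly beta_0 / sqrt(n))."""
    rng = rng or np.random.default_rng(0)
    Y = Y0.copy()
    best = V(Y)
    for n in range(1, n_max + 1):
        W = rng.standard_normal(Y.shape)
        Y_half = Y - grad_V(Y) * dt + beta * np.sqrt(dt) * W
        ok, Y_proj = project(Y_half, Y)
        if ok:
            Y = Y_proj
            beta *= np.sqrt(n / (n + 1.0))  # slow noise decrease
            best = min(best, V(Y))
        else:
            dt /= 2.0  # projection failed: retry with a smaller time step
    return Y, best

# Toy problem shaped like (28): minimize V(Y) = sum_k y_k^2 under the single
# linear "moment" constraint mean(Y) = 1.  For a linear constraint, projecting
# along the constraint gradient is exact, so no Newton solve is needed.
def project_mean(Y_half, Y_prev, target=1.0):
    return True, Y_half - (Y_half.mean() - target)

Y0 = np.full(16, 1.0)  # feasible starting guess: mean(Y0) = 1
Y, best = run_constrained_langevin(
    Y0, V=lambda Y: float(Y @ Y), grad_V=lambda Y: 2.0 * Y,
    project=project_mean, dt=0.01, beta=0.5, n_max=5000)
```

With the fixed seed inside the sketch, the final iterate satisfies the constraint exactly and lands close to the constrained minimizer $Y = (1, \dots, 1)$ as the noise is damped.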
Finally, the initialization procedure used to compute a starting guess $Y^K_0 \in \mathcal{M}_K$ is explained in Section 4.4.

Two remarks are in order to motivate the procedure we propose here:
(i) the computation of the Lagrange multiplier $\Lambda^K_{n+1}$ at each iteration $n$ of the algorithm, and of the resulting value of $Y^K_{n+1}$, must be fast (as it is executed at each step);
(ii) the time-step $\Delta t_n$ must be:
(a) small enough for the procedure that computes the Lagrange multiplier to be well-defined,
(b) large enough for the total number of iterations needed to observe convergence to be reasonable. In practice, $n_{\max}$ was chosen to be of the order of 20000 in the numerical experiments presented in Section 5.
To address item (i), we use a Newton method similar to the one proposed in [29, 30] to enforce the constraints and compute the Lagrange multiplier $\Lambda^K_{n+1}$; it is summarized in Algorithm 3 and detailed in Section 4.3. This method is observed to converge fast if the value $Y^K_{n+1/2}$ is close enough to the submanifold $\mathcal{M}_K$. The tolerance allowed at each step on the satisfaction of the constraints is given by $\tau_n > 0$, the value of which is also adapted at each step. Its precise value is set as follows: if the Newton method converges fast enough (i.e. if the number of iterations $i_n$ needed to ensure convergence is lower than some fixed value $i_{\mathrm{const}}$), then the value of $\tau_n$ is multiplied by 2. On the other hand, if the Newton method does not converge within a maximum number of iterations (given by $i_{\max}$), then $\tau_n$ is divided by 2. This step may involve a new time-step computation for iteration $n$, which we detail below.
The time-step $\Delta t_n$ is adapted (in order to address item (ii)) through the AdaptTimeStep subprocedure (Algorithm 2). It is increased at step $n$ if the constraints are satisfied up to a tolerance lower than $\tau_n$ (in order to address item (iib)).
Otherwise, the time-step is divided by 2 as many times as needed for $Y^K_{n+1/2}$ to satisfy the constraints defining the submanifold up to a tolerance lower than $\tau_n$ (in order to address item (iia)).
Moreover, the noise level $\beta_n$ is decreased at each iteration $n$ at a rate inspired by the Robbins-Siegmund lemma [36, Theorem 6.1] for unconstrained stochastic gradient optimization; this is managed through the NoiseDecrease function in Algorithm 1. In the numerical experiments presented in Section 5, we used two possible choices of NoiseDecrease function, defined respectively by $(\beta, n) \mapsto \beta$ (noise level unchanged) and $(\beta, n) \mapsto \sqrt{\frac{n}{n+1}}\, \beta$ (slow decrease of the noise level; note that this is a relative decrease, so that after $n$ steps the noise level is $\beta_0 / \sqrt{n}$).

Algorithm 2 AdaptTimeStep subprocedure
Input: $Y^K$, $\Lambda$, $\Delta t$, $\beta$, $\tau$, $n$.
if $\big\| \Gamma_K\big( Y^K - \nabla V_K(Y^K)\, 2\Delta t + \beta \sqrt{2\Delta t}\, W_n \big) \big\| \leq \tau$ then
  $\Delta t \leftarrow 2 \Delta t$
else
  while $\big\| \Gamma_K\big( Y^K - \nabla V_K(Y^K)\, \Delta t + \beta \sqrt{\Delta t}\, W_n \big) \big\| \geq \tau$ do
    $\Delta t \leftarrow \Delta t / 2$; $\Lambda \leftarrow \Lambda / 2$
  end while
end if

As mentioned earlier, to compute $Y^K_{n+1} \in \mathcal{M}_K$ and $\Lambda^K_{n+1}$ from $Y^K_{n+1/2}$, we use a Newton method similar to the one proposed in [29, 30]. We refer the reader to [30, Section 2.2.2] for theoretical considerations on such projections.
More precisely, the procedure reads as follows: given $Y^K_n, Y^K_{n+1/2} \in ((\mathbb{R}^d)^M)^K$, the aim of the Newton procedure is to find a solution $\Lambda^K_{n+1} \in \mathbb{R}^N$ to the equation
$$
\Gamma_K\Big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda^K_{n+1} \Big) = 0.
$$
We numerically observe that this Newton procedure only converges in cases where $Y^K_{n+1/2}$ and $Y^K_n$ are close enough to the manifold $\mathcal{M}_K$. Provided that $Y^K_n$ belongs to $\mathcal{M}_K$, $Y^K_{n+1/2}$ can be made arbitrarily close to the submanifold provided that the value of the time step $\Delta t_n$ is chosen small enough.
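The two NoiseDecrease choices and the time-step halving rule can be sketched as follows (a minimal sketch; the toy residual function below stands in for the constraint check of Algorithm 2, and all names are ours):

```python
import math

def noise_unchanged(beta, n):
    return beta                        # first choice: constant noise level

def noise_sqrt_decrease(beta, n):
    # second choice: relative decrease sqrt(n/(n+1)); composing it over
    # steps 1..n telescopes to beta_0 * sqrt(1/(n+1)) ~ beta_0 / sqrt(n)
    return math.sqrt(n / (n + 1)) * beta

def halve_until_tolerable(dt, residual, tau):
    # AdaptTimeStep-style loop: halve dt until the constraint residual of
    # the predicted step falls below the current tolerance tau
    while residual(dt) >= tau:
        dt /= 2.0
    return dt

beta = 1.0
for n in range(1, 10001):
    beta = noise_sqrt_decrease(beta, n)
print(beta)                            # ~ 1/sqrt(10001), i.e. about 0.01

dt = halve_until_tolerable(1.0, residual=lambda s: s, tau=0.1)
print(dt)                              # 1 -> 0.5 -> 0.25 -> 0.125 -> 0.0625
```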
We also refer the reader to [40, Theorem 1.4.1] for theoretical conditions which guarantee the convergence of this Newton procedure.
This projection procedure, together with the routine for the adaptation of the error tolerance $\tau_n$ on the satisfaction of the constraints, is summarized in Algorithm 3. Note that this Newton algorithm requires the inversion of matrices of the form
$$
\nabla \Gamma_K\big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda \big)^T \cdot \nabla \Gamma_K(Y^K_n)
$$
for $\Lambda \in \mathbb{R}^N$, and we cannot theoretically guarantee the invertibility of this matrix in general. In practice, it naturally depends significantly on the choice of the test functions $(\phi_n)_{1 \leq n \leq N}$.

Algorithm 3 Projection subprocedure (Newton method)
Input: $Y^K_{n+1/2}$, $\nabla \Gamma_K(Y^K_n)$, $\Lambda^K_n$, $\tau_n$, $i_{\max}$.
$i = 0$, $\Lambda'_0 \leftarrow \Lambda^K_n$
while $\big\| \Gamma_K\big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda'_i \big) \big\| > \tau_n$ and $i \leq i_{\max}$ do
  $\Lambda'_{i+1} \leftarrow \Lambda'_i - \Big( \nabla \Gamma_K\big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda'_i \big)^T \cdot \nabla \Gamma_K(Y^K_n) \Big)^{-1} \cdot \Gamma_K\big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda'_i \big)$
  $i \leftarrow i + 1$
end while
if $\big\| \Gamma_K\big( Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda'_i \big) \big\| \leq \tau_n$ then
  return $Y^K_{n+1/2} + \nabla \Gamma_K(Y^K_n) \cdot \Lambda'_i$, $\Lambda'_i$, $i$
else
  return Projection failure.
end if

Algorithm 1 is initialized with an initial guess $Y^K_0$ which is assumed to belong to the constraints submanifold $\mathcal{M}_K$. In practice, finding an element which belongs to this submanifold is a delicate task, especially when the number of test functions is large. Indeed, as mentioned in the preceding section, the Newton procedure described in Section 4.3 only converges if the starting point of the algorithm is sufficiently close to the manifold $\mathcal{M}_K$. This is the reason why this initialization step is rather performed using a method inspired by [49, Section 5, Example 3].
A Runge-Kutta 3 (Bogacki-Shampine) numerical scheme [41, (5.8-42)] is used in order to discretize the dynamics
$$
\frac{d}{dt} Y^K(t) = F\big( Y^K(t) \big)
$$
starting from a random initial state $Y^K(t = 0) = Y^{K,0} \in \big( (\mathbb{R}^d)^M \big)^K$, where $F$ is defined as
$$
\forall\, Y^K \in \big( (\mathbb{R}^d)^M \big)^K, \qquad F(Y^K) = - \big\| \Gamma_K(Y^K) \big\| \, \frac{\nabla \Gamma_K(Y^K) \cdot \Gamma_K(Y^K)}{\big\| \nabla \Gamma_K(Y^K) \cdot \Gamma_K(Y^K) \big\|}. \qquad (30)
$$
We observe that such a numerical procedure is more robust than a Newton algorithm, even though it can converge very slowly.
Let us mention here that, in the case of the particle problem (29) with adaptive weights, an additional step may be used prior to such a Runge-Kutta method, which consists in using a Carathéodory-Tchakaloff subsampling procedure. Carathéodory-Tchakaloff subsampling [39, 45] was introduced to compute cubature formulas with a low number of nodes.
In our context, this method can be adapted to find a starting point charging a low number of nodes, as close as possible to the constraints submanifold $\mathcal{M}_K$. More precisely, the method works as follows: we fix a value $K_\infty \gg K$ and compute $(X_1, \dots, X_{K_\infty})$, i.i.d. samples of random vectors distributed according to the probability law $\mu$. A Non-Negative Least Squares (NNLS) method is then used to find a sparse solution to the optimization problem
$$
w^* \in \arg\min_{w \in \mathbb{R}_+^{K_\infty}} \big\| \Phi w - \overline{\mu} \big\|, \qquad (31)
$$
where $\Phi := (\Phi_{n,k})_{1 \leq n \leq N+1,\ 1 \leq k \leq K_\infty} \in \mathbb{R}^{(N+1) \times K_\infty}$, $\overline{\mu} = (\overline{\mu}_1, \dots, \overline{\mu}_N, 1) \in \mathbb{R}^{N+1}$, and
$$
\forall\, 1 \leq k \leq K_\infty,\ \forall\, 1 \leq n \leq N, \qquad \Phi_{n,k} = \varphi_n(X_k) \quad \text{and} \quad \Phi_{N+1,k} = 1.
$$
By the Kuhn-Tucker conditions for the NNLS problem [26, Theorem (23.4)], there exists a solution $w^* := (w^*_k)_{1 \leq k \leq K_\infty} \in \mathbb{R}_+^{K_\infty}$ to (31) such that $\# J \leq N + 1$, with $J := \{ 1 \leq k \leq K_\infty,\ w^*_k > 0 \}$. Common algorithms such as the Lawson-Hanson method [26, Theorem (23.10)] make it possible to compute such a sparse solution.
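As an illustration of the sparsity guarantee above, SciPy's `nnls` routine (an implementation of the Lawson-Hanson method) can be applied to a synthetic moment matrix; the marginal law, the monomial test functions and the sample sizes below are our own toy choices:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
K_inf, N = 2000, 6                     # K_inf >> number of moment constraints

# Draw K_inf i.i.d. samples of mu (here: uniform on [-1, 1], a toy choice)
X = rng.uniform(-1.0, 1.0, K_inf)

# Phi: one row per test function (monomials x^n here) plus a row of ones,
# so that the last constraint fixes the total mass to 1.
Phi = np.vstack([X**n for n in range(1, N + 1)] + [np.ones(K_inf)])
mu_bar = Phi @ np.full(K_inf, 1.0 / K_inf)   # moments of the empirical measure

w, rnorm = nnls(Phi, mu_bar)           # Lawson-Hanson NNLS solver
support = np.flatnonzero(w > 0)
print(len(support), rnorm)             # support size is at most N + 1 = 7
```

The empirical measure itself is an admissible solution, so the residual is essentially zero, while the returned weight vector charges at most $N + 1$ of the $K_\infty$ samples.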
Let us point out that any solution $w^*$ to (31) then satisfies
$$
\sum_{n=1}^N \Bigg| \sum_{k \in J} w^*_k\, \varphi_n(X_k) - \overline{\mu}_n \Bigg| \;\leq\; \sum_{n=1}^N \Bigg| \frac{1}{K_\infty} \sum_{k=1}^{K_\infty} \varphi_n(X_k) - \overline{\mu}_n \Bigg|. \qquad (32)
$$
In practice, in the case when $\# J \leq K$, the positions and weights returned by the Carathéodory-Tchakaloff subsampling procedure are subdivided and randomly perturbed with a small amount of noise.

The aim of this section is to illustrate the results obtained via the numerical procedure described in Section 4 for the resolution of the particle problems (29) and (28) in different test cases.
Section 5.1 is devoted to results obtained in cases where $d = 1$, and Section 5.2 contains numerical results obtained in examples where $d = 3$. The experiments presented in this section have been implemented in Python 3 using the SciPy and NumPy modules, and tested on a server with an Intel Xeon processor with 32 (hyperthreaded) cores and 192 GB of RAM.

5.1 One-dimensional test cases ($d = 1$)

In the case where $d = 1$, the solution to the optimal transport problem (2) is analytically known when $c$ is a symmetric repulsive cost, from [12, Theorem 1.1]. For the sake of completeness, we recall their result for the cost function that we consider in our numerical experiments.

Theorem 3 (Colombo, De Pascale, Di Marino, 2015). Let $\varepsilon \geq 0$ and $c : \mathbb{R}^M \to [0, +\infty]$ be the cost defined by
$$
\forall\, x_1, \dots, x_M \in \mathbb{R}, \qquad c(x_1, \dots, x_M) = \sum_{\substack{1 \leq i, j \leq M, \\ i \neq j}} \frac{1}{\varepsilon + |x_i - x_j|}. \qquad (33)
$$
Let $\mu$ be a non-atomic probability measure on $\mathbb{R}$ such that
$$
\min_{\pi \in \Pi(\mu; M)} \int_{\mathbb{R}^M} c(x_1, \dots, x_M)\, d\pi(x_1, \dots, x_M) < +\infty. \qquad (34)
$$
Let $-\infty = d_0 < d_1 < \dots < d_M = +\infty$ be such that
$$
\mu([d_i, d_{i+1}]) = \frac{1}{M}, \qquad i = 0, \dots, M - 1. \qquad (35)
$$
Let $T : \mathbb{R} \to \mathbb{R}$ be the unique (up to $\mu$-null sets) function increasing on each interval $[d_i, d_{i+1}]$, $i = 0, \dots$
$, M - 1$, and such that
$$
T_\# \big( \mathbf{1}_{[d_i, d_{i+1}]}\, \mu \big) = \mathbf{1}_{[d_{i+1}, d_{i+2}]}\, \mu, \quad i = 0, \dots, M - 2, \qquad T_\# \big( \mathbf{1}_{[d_{M-1}, d_M]}\, \mu \big) = \mathbf{1}_{[d_0, d_1]}\, \mu. \qquad (36)
$$
Then $T$ is an admissible map for
$$
\inf_{\substack{T : \mathbb{R} \to \mathbb{R} \ \mathrm{Borel}, \\ T_\# \mu = \mu,\ T^{(M)} = \mathrm{Id}}} \int_{\mathbb{R}} c\big( x, T(x), \dots, T^{(M-1)}(x) \big)\, d\mu(x), \qquad (37)
$$
where $T^{(i)} = \underbrace{T \circ \dots \circ T}_{i \text{ times}}$. Moreover, the only symmetric optimal transport plan is the symmetrization of the plan induced by the map $T$.

We make use of Theorem 3 to compare the exact solution of problem (2) with the approximation given by the numerical procedure described in Section 4 to solve the MCOT particle problems with fixed or adaptive weights.

Figure 1: Densities of the marginal laws $\mu_1$, $\mu_2$ and $\mu_3$ tested for the 1D numerical experiments.

The numerical experiments in this section were realized with three different marginal laws, which are respectively denoted by $\mu_1$, $\mu_2$ and $\mu_3$ and defined by
$$
d\mu_1(x) := \frac{1}{2}\, \mathbf{1}_{[-1,1]}(x)\, dx, \qquad (38)
$$
$$
d\mu_2(x) := \Big[ \frac{\pi}{10} \cos(\pi x) + 0.5 \Big]\, \mathbf{1}_{[-1,1]}(x)\, dx, \qquad (39)
$$
$$
d\mu_3(x) := \Big[ 0.{\cdot}\,\pi \cos(\pi x) + 0.5 \Big]\, \mathbf{1}_{[-1,1]}(x)\, dx. \qquad (40)
$$
The densities of $\mu_1$, $\mu_2$ and $\mu_3$ are plotted in Figure 1.

Test functions. The test functions $(\varphi_n)_{1 \leq n \leq N}$ used are Legendre polynomials with the following scaling:
$$
\varphi_n = \sqrt{\frac{2n+1}{n+1}}\, P_n, \qquad (41)
$$
where $P_n$ is the Legendre polynomial of degree $n$. As the marginal laws considered have their support in $[-1, 1]$, $\nabla \Gamma(X)$ is related to a Vandermonde matrix, the invertibility of which (crucial to enforce the constraints by Algorithm 3 or the Runge-Kutta method) is ensured as long as the particles are spread over more than $N$ locations.

Cost. We use in all experiments the regularized Coulomb cost function (33) with $\varepsilon = 10^{-\cdot}$.

Weight functions.
Two different choices of weight function $f$ are studied in the numerical experiments presented below: the squared weight function $f : \mathbb{R} \ni a \mapsto a^2$ and the exponential weight function $f : \mathbb{R} \ni a \mapsto e^{-a}$. Although we do not have strong criteria to choose a weight function, the intuition behind the squared weight function is that it may behave well with respect to the enforcement of the constraints by a Newton method, given that $\overline{\Gamma}$ is then polynomial. The intuition behind the exponential weight function is that it could slow down the cancellation of the weights of the particles, keeping more degrees of freedom alive for the optimization process.

The aim of Figure 2 is to plot the decrease of $\| \Gamma_K(Y^K_m) \|_\infty$ as a function of the number of iterations of the Runge-Kutta 3 method presented in Section 4.4, in a test case where $M = 5$. We numerically observe here that, as expected, as $N$ increases, the number of iterations needed by the Runge-Kutta

Figure 2: Evolution of $\| \Gamma_K(Y^K_m) \|_\infty$ for different weight functions as a function of the number of iterations $m$ of the Runge-Kutta 3 procedure. Tests were performed with $M = 5$, $K = 10000$. Blue curves use fixed weights, orange curves an exponential weight function and green curves a squared weight function. No marker is used for $N = 10$, a diamond marker for $N = 20$ and a "+" marker for $N = 40$. Carathéodory-Tchakaloff subsampling gave initial values of $1.{\cdot} \times 10^{-\cdot}$ (resp. $3.{\cdot} \times 10^{-\cdot}$, $3.{\cdot} \times 10^{-\cdot}$) for $\mu$, $N = 10$ (resp. $N = 20$, $N = 40$), and $3.{\cdot} \times 10^{-\cdot}$ (resp. $1.{\cdot} \times 10^{-\cdot}$, $4.{\cdot} \times 10^{-\cdot}$) for $\mu$, $N = 10$ (resp.
$N = 20$, $N = 40$).

procedure to reach convergence increases. Besides, we observe that the additional degrees of freedom in the cases using weight functions allow a faster initial optimization (though this effect is not heavily pronounced), as well as a slightly faster initialization for the squared weight function compared to the exponential one.

The aim of Figures 3, 4 and 5 is to plot the evolution of $V_K(Y^K_n)$ (or $\overline{V}_K(\overline{Y}^K_n)$) as a function of the number of iterations $n$ of the constrained overdamped Langevin algorithm presented in Section 4, for various values of $N$, various weight functions, values of $\beta_0$ and NoiseDecrease functions, and using or not a subsampling at initialization. We observe in Figure 3 that decreasing the noise as the square root of the number of iterations $n$ converges faster than keeping it constant, and that keeping $\beta = 0$ is the fastest. In Figure 4, we remark that the higher $N$, the slower the optimization (with the particular case of $\mu$, $N = 20$ with the squared weight function, which does not converge in 20000 iterations), and that cases initialized by Carathéodory-Tchakaloff subsampling tend to start with a higher cost. In Figure 5, we observe that with $K = 10000$ particles, considering fixed or variable weights does not strongly change the speed of convergence (except for the case $\mu$, $N = 20$ with the squared weight function mentioned above). However, using variable weights with $K = 100$ particles seems to be the fastest set of parameters.

Figure 3: Evolution of the cost as a function of the number of iterations $n$ for various weight functions and values of $\beta$. Panels: (1) $\mu$, fixed weights; (2) $\mu$, squared weight function; (3) $\mu$, fixed weights; (4) $\mu$, squared weight function. Tests were performed with $M = 5$, $N = 20$, $K = 10000$ and $\Delta t = 10^{-\cdot}$. Blue curves are for $\beta = 10^{-\cdot}$, orange curves for $10^{-\cdot}$, green curves for $10^{-\cdot}$ and purple curves for $\beta = 0$.
Solid lines have a decrease of the noise as the square root of time, whereas dotted lines with a "+" marker have no decrease of the noise.

Figure 4: Evolution of the cost as a function of the number of iterations $n$ for various weight functions and values of $N$. Panels: (1) $\mu$, fixed weights; (2) $\mu$, squared weight function; (3) $\mu$, fixed weights; (4) $\mu$, squared weight function. Tests were performed with $M = 5$, $\beta = 0$, $K = 10000$ and $\Delta t = 10^{-\cdot}$. Blue curves for $N = 10$, green curves for $N = 20$ and red curves for $N = 40$. Dotted lines correspond to tests initialized by Carathéodory-Tchakaloff subsampling, whereas solid lines correspond to tests initialized by the Runge-Kutta 3 method.

Figure 5: Evolution of the cost as a function of the number of iterations $n$ for various weight functions, for two marginal laws (one per panel). Tests were performed with $M = 5$, $N = 20$, $\beta = 0$ and $\Delta t = 10^{-\cdot}$. Blue curves use fixed weights, orange curves an exponential weight function and green curves a squared weight function. $K = 10000$ particles for solid lines and $K = 100$ particles for dotted lines. Optimizations following a Carathéodory-Tchakaloff subsampling at initialization use "+" markers.

Figure 6: Lowest cost value reached during optimization by the constrained overdamped Langevin algorithm as a function of $\beta_0$, for various weight functions (constant, squared, exponential), values of $K$ ($K = 10000$, $1000$, $100$, and $K = 10000$ with initialization by subsampling) and choices of NoiseDecrease function (constant or square-root decrease). The purple line corresponds to the optimal transport cost.
The marginal law is $\mu$, $N = 20$, $M = 5$, $\Delta t = 10^{-\cdot}$.

The goal of Figure 6 is to compare the minimal values of the cost obtained by the algorithm for different parameters together with its analytic value. We observe that considering adaptive weights enables reaching lower optimal costs than with fixed weights, but the relative difference between the approximate minimal cost values is lower than 0.1%. When the noise level decreases as the square root of the number of iterations, a lower optimal cost can be reached than with a constant noise level. In the variable-weight cases, the lower $K$, the lower the optimal cost, except when the optimization starts from a Tchakaloff subsampling solution, for which the lowest cost reached is 0.3% higher than with the Runge-Kutta 3 method.

The aim of Figures 7, 8 and 9 is to plot the positions of the particles obtained by the numerical procedure presented in Section 4 for, respectively, $\mu_1$, $\mu_2$ and $\mu_3$, and for different values of $K$, $N$, $\beta_0$, different initialization methods, and in the fixed- and variable-weight cases. We numerically observe that the obtained particles are located close to the support of the exact optimal transport plan, and that the higher the value of $N$, the more precise the approximation of this transport map [1, Theorem 4.1]. Also, when $K = 10000$, and even more so when $\beta = 10^{-\cdot}$, the particles are more spread around the transport map.

Figure 7: Optimal transport with $\mu_1$ and $M = 5$, $\Delta t = 10^{-\cdot}$; panels: (1) $N = 10$, (2) $N = 20$, (3) $N = 40$. In each plot, on the main graph, $\frac{1}{M(M-1)} \sum_{k=1}^K \sum_{m \neq m' = 1}^M w_k\, \delta_{x^k_m, x^k_{m'}}$ is represented by blue particles; the darker, the heavier the particle. Particles have some transparency, which allows seeing areas of high concentration more clearly. Red curves represent the functions $T^i$ for $i \in \{1, \dots, M-1\}$ defined in Theorem 3; the higher the density, the darker.
On the side graphs are represented, in blue, a weighted histogram of the particles; in red, the marginal law; and in green, a normal kernel density estimate based on the weighted particles (with a bandwidth based on Scott's rule with $d = 0$).

Figure 8: Optimal transport with $\mu_2$ and $M = 5$, $\Delta t = 10^{-\cdot}$. Panels (1)-(16) cover $N \in \{20, 40\}$, $\beta \in \{0, 10^{-\cdot}\}$, fixed or squared weights, $K \in \{100, 10000\}$, with or without initial subsampling. Same plotting conventions as in Figure 7.

Figure 9: Optimal transport with $\mu_3$ and $M = 5$, $\Delta t = 10^{-\cdot}$. Panels (1)-(16) cover the same parameter combinations as in Figure 8, with the same plotting conventions.

5.2 Three-dimensional test cases ($d = 3$)

The numerical experiments were realized with four different marginal laws, which are named afterwards as follows:
$$
\mu_1 \sim \mathcal{N}(0, \mathrm{Id}), \qquad (42)
$$
$$
\mu_2 \sim \tfrac{2}{3}\, \mathcal{N}\big( 0,\ \cdot \big) + \tfrac{1}{3}\, \mathcal{N}\big( \cdot,\ \cdot \big), \qquad (43)
$$
$$
\mu_3 \sim \tfrac{1}{10}\, \mathcal{N}(0, C) + \tfrac{1}{5}\, \mathcal{N}(\cdot, C) + \tfrac{1}{5}\, \mathcal{N}(\cdot, C) + \tfrac{1}{5}\, \mathcal{N}(\cdot, C) + \tfrac{1}{5}\, \mathcal{N}(\cdot, C) + \tfrac{1}{10}\, \mathcal{N}(\cdot, C), \quad \text{with } C \text{ a fixed covariance matrix}, \qquad (44)
$$
$$
\mu_4 \sim \mathcal{U}\big( B(0, \cdot) \big). \qquad (45)
$$
And, for $i = 1, 2, 3, 4$, we use as test functions tensor products of 1D orthonormal polynomials $(P^{\mu_i, j}_l)_{1 \leq j \leq 3,\ l \in \mathbb{N}}$, defined, for $j = 1, 2, 3$ and $l \in \mathbb{N}$, by
$$
\mathrm{degree}\big( P^{\mu_i, j}_l \big) = l, \qquad \forall\, l' \leq l, \quad \int_{\mathbb{R}^3} P^{\mu_i, j}_l(x_j)\, P^{\mu_i, j}_{l'}(x_j)\, d\mu_i(x_1, x_2, x_3) = \frac{1}{l+1}\, \delta_{l, l'}. \qquad (46)
$$
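For the Gaussian marginal $\mu_1$, the 1D factors satisfying this normalization are, up to scaling, probabilists' Hermite polynomials (which are orthogonal with respect to $\mathcal{N}(0,1)$ with $\int \mathrm{He}_l^2\, d\mathcal{N}(0,1) = l!$). The following sketch, our own illustration rather than the authors' code, checks the $\frac{1}{l+1}$ normalization numerically with NumPy's `hermite_e` module:

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

L = 5
nodes, weights = He.hermegauss(30)          # Gauss-Hermite_e quadrature nodes
weights = weights / math.sqrt(2 * math.pi)  # normalize to the N(0,1) measure

def P(l, x):
    """P_l := He_l / sqrt(l! * (l+1)), so that int P_l^2 dN(0,1) = 1/(l+1)."""
    c = np.zeros(l + 1)
    c[l] = 1.0
    return He.hermeval(x, c) / math.sqrt(math.factorial(l) * (l + 1))

# Gram matrix of (P_0, ..., P_4) in L^2(N(0,1)), computed by quadrature
G = np.array([[float(np.sum(weights * P(l, nodes) * P(lp, nodes)))
               for lp in range(L)] for l in range(L)])
print(np.round(G, 6))   # diagonal ~ (1, 1/2, 1/3, 1/4, 1/5), off-diagonal ~ 0
```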
As, for a finite number of multivariate polynomials (and under a suitable control of mixed derivatives), the hyperbolic cross [15] seems to behave better than using all polynomials up to a given degree, we used, for a number of test functions $N$ appropriately chosen, the polynomials $P^{\mu_i, 1}_{l_1} \otimes P^{\mu_i, 2}_{l_2} \otimes P^{\mu_i, 3}_{l_3}$ with
$$
(l_1 + 1)(l_2 + 1)(l_3 + 1) \leq L_N, \qquad (47)
$$
where $L_N$ is defined such that $\#\{ (l_1, l_2, l_3) \mid (l_1 + 1)(l_2 + 1)(l_3 + 1) \leq L_N \} = N$. The map between the maximum degree of the polynomials ($L_N - 1$) and $N$ is shown in Table 1.

$L_N - 1$ | 6 | 7 | 8 | 9 | 10 | 11
$N$ | 28 | 38 | 44 | 53 | 56 | 74

Table 1: Map between the maximum degree of the 1D polynomials and the number of test functions using the hyperbolic cross in 3D.

In the numerical examples presented afterwards, as all weights are fixed to $\frac{1}{K}$, there is no need to use the polynomial of degrees $(l_1, l_2, l_3) = (0, 0, 0)$, hence $N$ is decreased by 1 compared to the values of Table 1.

Remark 4. One of the main advantages of using sums of normal densities (or a uniform measure on a ball) as marginal laws and polynomials as test functions is that there exist in that case closed formulas for the computation of the moments. From our experiments in dimension 1, the precision of the computation of the moments is important both for the solution of the MCOT problem to be well-defined (and thus for the algorithm to converge): numerically computed moments, though not

[Footnote: These polynomials were chosen, after a few numerical tests on some optimization procedures, for their better convergence properties than the polynomials they were compared to. Their tensorized form both eases the computation of the moments and allows some parallelization. Note also that the matrix $\nabla \Gamma_K(Y^K)$ is a multivariate Vandermonde matrix. We checked numerically its invertibility throughout the optimization process.]
exact, must allow the existence of $Y^K \in ((\mathbb{R}^d)^M)^K$ such that $\| \Gamma_K(Y^K) \|_\infty \leq \epsilon$ for $\epsilon$ the machine precision; and for the convergence, as $N$ increases, of the MCOT cost towards the OT cost: moments not computed precisely enough might hide this convergence. Numerical quadratures in 3D could be implemented for dealing with more general marginal laws and test functions; however, their computational cost and convergence speed put this beyond the scope of this article.

Mean-covariance. Tests were also performed using as test functions the mean and the covariance matrix, for two of the marginal laws, in order to see on examples how much those test functions constrain an optimal transport problem. Note that this problem of optimal transport when the mean and covariance structure are given may be interesting per se, when only partial information on the distribution is known. We have indicated in Table 2 the optimal costs obtained with our algorithm with mean-covariance constraints ($N = 9$) and with many moment constraints ($N = 52$). We observe on our examples a relative difference of around 15-20%.

 | $\mu$, $M = 10$ | $\mu$, $M = 100$ | $\mu$, $M = 10$ | $\mu$, $M = 100$
$N = 9$ | 10.65 | 1395 | 8.007 | 1074
$N = 52$ | 12.50 | 1599 | 9.107 | 1201

Table 2: Optimal value of the cost obtained with mean-covariance constraints ($N = 9$) and with many moment constraints ($N = 52$).

Cost. In order to avoid too high values of the cost function, we used in all experiments a regularized Coulomb cost
$$
c(x_1, \dots, x_M) = \sum_{m \neq m' = 1}^{M} \frac{1}{\epsilon + |x_m - x_{m'}|},
$$
with $\epsilon = 10^{-\cdot}$ and $x_m \in \mathbb{R}^3$ for all $m = 1, \dots, M$.

Fixed weights. After several tests comparing fixed and variable weights (with various weight functions), we observed that in dimension 3, for the marginal laws considered, both the initialization and the optimization using variable weights were much slower than with fixed weights. Therefore, all following tests have been performed using fixed weights.
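For reference, the regularized Coulomb cost above can be evaluated with a few lines of NumPy (a sketch with our own naming; we sum over unordered pairs, which is half of the sum over ordered pairs $m \neq m'$, and the value of `eps` below is an arbitrary placeholder, not the value used in the experiments):

```python
import numpy as np

def regularized_coulomb(X, eps):
    """X: array of shape (M, 3), one row per position x_m in R^3.
    Returns the sum over unordered pairs m < m' of 1 / (eps + |x_m - x_m'|)."""
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences, (M, M, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise distances, (M, M)
    iu = np.triu_indices(X.shape[0], k=1)       # indices of unordered pairs
    return float(np.sum(1.0 / (eps + dist[iu])))

# Two particles at distance 1: the cost is 1 / (eps + 1)
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(regularized_coulomb(X, eps=0.5))          # 1 / 1.5
```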
Heuristically, when using variable weights, some particles tend to have large weights and be strongly constrained, while other ones become lightweight and do not move much, since the gradient with respect to the positions is proportional to the weights.

The initialization was performed by sampling $K$ particles according to the marginal law, and then using the Runge-Kutta method presented in Section 4.4 to bring the particles onto the constraints submanifold $\mathcal{M}_K$. This method has been tested for various values of $N$ and $K$, presented in Figure 10.
As $N$ increases (Figure 10), the submanifold of the constraints becomes harder to reach using the Runge-Kutta method (similarly to the 1D case), and large values of $N$ ($L_N \geq 11$) could not be attained in the time of the numerical experiment (recall that the number of computations involved at each iteration grows linearly with $N$). For each of the marginal laws on which the tests have been performed, despite the asymmetry of one of them, the dependence on $N$ of the convergence speed appears to be similar.
Note also that, as we use symmetrized test functions (with respect to the marginal laws) with fixed weights, the number of independent coordinates involved in the Runge-Kutta method to satisfy the constraints is linear in $KM$ ($M$ being the number of marginal laws). Thus, solving the problem of finding a starting point on the submanifold with 100 marginal laws and 10 particles is numerically the same as the one with 10 marginal laws and 10 particles.
Although in the case where weights are variable this remark cannot be applied (as the coordinates on different marginal laws of the same particle share the same weight), increasing the number of marginal laws relaxes the problem of finding a starting point on the submanifold.

Figure 10: Evolution of $\| \Gamma_K(Y^K_m) \|_\infty$, for values of $N$ ranging from 27 to 52 and $K$ between 1000 (dotted lines) and 10000 (solid lines), as a function of the number of iterations $m$ of the Runge-Kutta 3 procedure. Panels: (1) $\mu$, $M = 10$; (2) $\mu$, $M = 10$; (3) $\mu$, $M = 100$; (4) $\mu$, $M = 100$. $\Delta t = 10^{-\cdot}$. Blue curves are for $N = 27$, orange ones for $N = 37$, green ones for $N = 43$, red ones for $N = 52$ and pink ones for $N = 73$.

The aim of Figures 11 and 12 is to plot the evolution of $V_K(Y^K_n)$ as a function of the number of iterations $n$ of the constrained overdamped Langevin algorithm presented in Section 4, for various values of $N$ and of $\beta$. As we observed (Figure 11), similarly to the tests in dimension 1, that tests with $\beta = 0$ converge faster than those with $\beta > 0$, we kept $\beta = 0$ for all the other tests. The convergence of the cost for various values of $N$ and $K$, and various numbers of marginal laws, is presented in Figure 12. A presentation of how particles move during the optimization procedure
Figure 11: Evolution of the cost as a function of the number of iterations $n$ for various values of $\beta$ ($\beta = 0$, $0.01$, $0.1$ and $1.0$). The marginal law is $\mu$, and the other parameters are $\Delta t = 10^{-\cdot}$, noise level decreasing as the square root of the number of iterations, $N = 27$, $M = 10$, $K = 160$.

can be seen in Figures 13 and 14.
On all subgraphs of Figure 12, one can observe that the optimization procedure reaches a cost close to the optimal one for the MCOT problem within 50-200 iterations, when $K$ is large enough for a given $N$ (e.g. $K = 1000$ is sufficient when $N = 27$, but not when $N = 43$). As $N$ increases, the value of the optimal cost does as well, which is expected, as the MCOT problems get more and more constrained. As $K$ increases, the value of the computed cost converges towards the MCOT cost. Indeed, the slight decrease of the computed MCOT cost at the 20000th iteration as $K$ increases, which can be observed in Table 3 from $K = 320$ to $K = 10000$, suggests that there exists $K_0 \in \mathbb{N}$ such that, for $K \geq K_0$, the gain from an increase in $K$ reflects only weakly on the computed MCOT cost.
In Figures 13 and 14 is plotted the evolution of some symmetrized visualizations of the process during the optimization for an MCOT problem on $\mu$. Although at each iteration it satisfies the moment constraints, it deviates rapidly from a normal sample and tends to concentrate on some points (somewhat as in Tchakaloff's theorem and [1, Theorem 3.1]).

As $K$ increases, the symmetrized minimizers of Figures 15 and 16 tend to be visually more and more concentrated on some particular points. According to Table 3, higher values of $K$ tend to yield lower costs.
Some symmetrized visualizations of minimizers for MCOT problems for the non-symmetrical measures are presented in Figures 17 and 18. In those cases, the 1D couplings obtained on each axis (X, Y or Z) are not the same (Figure 17).
A higher number of marginal laws $M$ seems to spread the particles more, although their 1D coupling still shows particles highly concentrated around a few values in the considered examples. Higher values of $N$ increase the concentration of the particles around fewer values in the examples on $\mu$. The planar representation of the minimizers for large $M$ (Figure 18) shows that the particles are not distributed spatially as a normal density and tend to concentrate on some 1D curves (for the considered 2D projections), with a higher spread than for lower values of $M$.

Figure 12: Evolution of the cost as a function of the number of iterations $n$ for various values of $N$ and for $K$ from 1000 (dotted lines) to 10000 (solid lines). Panels: (1) $\mu$, $M = 10$; (2) $\mu$, $M = 10$; (3) $\mu$, $M = 100$; (4) $\mu$, $M = 100$. $\Delta t = 10^{-\cdot}$, $\beta = 0$. Blue curves are for $N = 27$, orange ones for $N = 37$, green ones for $N = 43$, red ones for $N = 52$ and pink ones for $N = 73$. On Figures 12.1 and 12.3, "+" signs are added to better distinguish overlaid curves.

$K$ | 40 | 80 | 160 | 320 | 1000 | 10000
cost | 12.2558198 | 12.1747815 | 12.1457150 | 12.0916662 | 12.0821615 | 12.0785749
lower cost | 12.1981977 | 12.0864398 | 12.0862042 | 12.0855486 | 12.0821615 | 12.0785745

Table 3: Values of the regularized Coulomb cost (see the paragraph of that name in Section 5.2.1) for the MCOT problem with $\mu$, $M = 10$, $N = 27$, $\Delta t = 10^{-\cdot}$, $\beta = 0$ and $K$ ranging from 40 to 10000. The "cost" line corresponds to the value of the regularized cost associated with the minimizing process at iteration 20000 (which also corresponds to the minimizers represented in the graphs of Figures 15 and 16).
The lower cost line corresponds to the lowest value of the regularized cost encountered by the minimizing process up to iteration 20000, for each value of K.

Figure 13: Transport along the optimization for µ, M = 10, K = 10000, N = 27, β = 0, ∆t = 10−. [Panels: column (a) plane XY, column (b) X axis, column (c) radial, at iterations 1, 30 and 50.] In the figures of column (a) is shown $\frac{1}{MK}\sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{(x^k_{m,1},x^k_{m,2})}$. In the figures of column (b) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(x^k_{m,1},x^k_{m',1})}$. In the figures of column (c) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(|x^k_m|,|x^k_{m'}|)}$, where $|x^k_m| = \sqrt{\sum_{i=1}^{3}(x^k_{m,i})^2}$. The evolution of the corresponding cost can be seen in Figure 12.1.

Figure 14: Transport along the optimization for µ, M = 10, K = 10000, N = 27, β = 0, ∆t = 10−. [Panels: column (a) plane XY, column (b) X axis, column (c) radial, at iterations 100, 1000 and 20000.] Columns (a), (b) and (c) show the same quantities as in Figure 13.
The evolution of the corresponding cost can be seen in Figure 12.1.

Figure 15: Optimal transport with µ, M = 10, N = 27, β = 0 and ∆t = 10−, for K = 40, 80 and 160. [Panels: column (a) plane XY, column (b) X axis, column (c) radial, for K = 40, 80 and 160.] In the figures of column (a) is shown $\frac{1}{MK}\sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{(x^k_{m,1},x^k_{m,2})}$. In the figures of column (b) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(x^k_{m,1},x^k_{m',1})}$. In the figures of column (c) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(|x^k_m|,|x^k_{m'}|)}$, where $|x^k_m| = \sqrt{\sum_{i=1}^{3}(x^k_{m,i})^2}$. Corresponding costs can be found in Table 3.

Figure 16: Optimal transport with µ, M = 10, N = 27, β = 0 and ∆t = 10−, for K = 320, 1000 and 10000. [Panels: column (a) plane XY, column (b) X axis, column (c) radial, for K = 320, 1000 and 10000.] Columns (a), (b) and (c) show the same quantities as in Figure 15. Corresponding costs can be found in Table 3.

Figure 17: Optimal transport for µ and µ, M = 10 and 100, N = 27 and 52, β = 0, K = 10000 and ∆t = 10−. [Panels: rows (a), (b) and (c) show the X, Y and Z axes respectively, for the four cases (µ, M = 10, N = 27), (µ, M = 100, N = 27), (µ, M = 100, N = 27) and (µ, M = 100, N = 52).] In the figures of column (a) is shown $\frac{1}{MK}\sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{(x^k_{m,1},x^k_{m,2})}$.
In the figures of column (b) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(x^k_{m,1},x^k_{m',1})}$. In the figures of column (c) is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(|x^k_m|,|x^k_{m'}|)}$, where $|x^k_m| = \sqrt{\sum_{i=1}^{3}(x^k_{m,i})^2}$. In order to better distinguish between areas of low and high particle density, the plots are represented as 2D histograms.

Figure 18: Optimal transport for µ and µ, M = 10 and 100, N = 27 and 52, β = 0, K = 10000 and ∆t = 10−. [Panels: (1) µ, plane XY, M = 10, N = 27; (2) µ, plane XY, M = 100, N = 27; (3) µ, plane XY, M = 100, N = 27; (4) µ, plane XY, M = 100, N = 52.] In each graph, the minimizers are represented as $\frac{1}{MK}\sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{(x^k_{m,1},x^k_{m,2})}$. In order to better distinguish between areas of low and high particle density, the plots are represented as 2D histograms.

µ – Figure 19

Optimal transport for µ with a large number of electrons is of theoretical interest, as it might provide approximations of a uniform electronic density in a large space [32]. Numerical results for its MCOT relaxation with M = 100 and N = 52 are presented in Figure 19. Although the cost has been optimized (Figure 19.1), it is only 3% lower than for the initial uniform sampling (after a Runge-Kutta 3 initialization). Although the 1D marginal laws seem well approximated (Figure 19.3), the planar and radial graphs (Figures 19.2 and 19.4) show that the particles are concentrated on two spheres (of radius 0.6 and 1 respectively). Most of the transport takes place inside and between those two spheres.

Figure 19: Evolution of the cost as a function of the number of iterations n (Figure 19.1) and optimal transport with µ, M = 100, N = 52, K = 10000, β = 0 and ∆t = 10−. [Panels: (1) cost as a function of n, (2) plane XY, (3) X axis, (4) radial.] In Figure 19.2 is shown $\frac{1}{MK}\sum_{k=1}^{K}\sum_{m=1}^{M}\delta_{(x^k_{m,1},x^k_{m,2})}$.
In Figure 19.3 is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(x^k_{m,1},x^k_{m',1})}$. In Figure 19.4 is shown $\frac{1}{M(M-1)K}\sum_{k=1}^{K}\sum_{m\neq m'=1}^{M}\delta_{(|x^k_m|,|x^k_{m'}|)}$, where $|x^k_m| = \sqrt{\sum_{i=1}^{3}(x^k_{m,i})^2}$. In order to better distinguish between areas of low and high particle density, the plots are represented as 2D histograms.

The aim of this section is to gather the proofs of our main theoretical results.

We present here a corollary of the so-called Tchakaloff theorem, which is the backbone of our results concerning the theoretical properties of the MCOT particle problem. A general version of the Tchakaloff theorem has been proved by Bayer and Teichmann [2]. Theorem 4 is an immediate consequence of Tchakaloff's theorem, see Corollary 2 in [2].

Theorem 4. Let $\pi$ be a measure on $\mathbb{R}^d$ concentrated on a Borel set $A \in \mathcal{F}$, i.e. $\pi(\mathbb{R}^d \setminus A) = 0$. Let $N \in \mathbb{N}^*$ and $\Lambda : \mathbb{R}^d \to \mathbb{R}^N$ a measurable Borel map. Assume that the first moments of $\Lambda_\#\pi$ exist, i.e.
$$\int_{\mathbb{R}^N} \|u\| \, d\Lambda_\#\pi(u) = \int_{\mathbb{R}^d} \|\Lambda(z)\| \, d\pi(z) < \infty,$$
where $\|\cdot\|$ denotes the Euclidean norm of $\mathbb{R}^N$. Then, there exist an integer $1 \le K \le N$, points $z_1, \dots, z_K \in A$ and weights $p_1, \dots, p_K > 0$ such that
$$\forall\, 1 \le i \le N, \quad \int_{\mathbb{R}^d} \Lambda_i(z) \, d\pi(z) = \sum_{k=1}^{K} p_k \Lambda_i(z_k),$$
where $\Lambda_i$ denotes the $i$-th component of $\Lambda$.

We recall here that $\Lambda_\#\pi$ is the push-forward of $\pi$ through $\Lambda$, defined by $\Lambda_\#\pi(B) = \pi(\Lambda^{-1}(B))$ for any Borel set $B \subset \mathbb{R}^N$. Last, let us mention that Theorem 4 is a consequence of Carathéodory's theorem [42, Corollary 17.1.2] applied to $\int_{\mathbb{R}^N} u \, d\Lambda_\#\pi(u)$, which lies in the (convex) cone induced by $\mathrm{spt}(\Lambda_\#\pi)$, the support of the measure $\Lambda_\#\pi$.

We denote here by $S_K$ the set of permutations of the set $\{1, \dots, K\}$.

Lemma 5. Let $(W, Y) \in U_N^K$ be such that there exists $k'$ such that $w_{k'} = 0$.
Then, for any permutation $\sigma \in S_K$, there exists a polygonal map $\psi : [0,1] \to U_N^K$ such that $\psi(0) = (W, Y)$, $\psi(1) = (W_\sigma, Y_\sigma)$ and $I(\psi(t))$ is constant, where $Y_\sigma := (X^{\sigma(k)})_{1 \le k \le K} \in ((\mathbb{R}^d)^M)^K$ and $W_\sigma := (w_{\sigma(k)})_{1 \le k \le K} \in (\mathbb{R}_+)^K$.

Proof. For $(W, Y)$ and $(W', Y')$, we will denote by $[(W,Y),(W',Y')]$ the segment map $t \in [0,1] \mapsto ((1-t)W + tW', (1-t)Y + tY')$, and we will construct $\psi$ as a concatenation of segments that clearly remain in $U_N^K$ and leave $I$ constant.

It is sufficient to prove this result for transpositions, i.e. for $\sigma$ such that there exist $i_1 < i_2$ with $\sigma(i_1) = i_2$, $\sigma(i_2) = i_1$ and $\sigma(i) = i$ for $i \notin \{i_1, i_2\}$. We distinguish two cases.

• $k' \in \{i_1, i_2\}$, say $k' = i_1$. We first move the massless particle $k'$ to the position $X^{i_2}$ and consider the corresponding segment; along it, the associated measure, hence the constraints and the cost, are unchanged. We then transfer the weight of $i_2$ to $k'$: we set the weight of $k'$ to $w_{i_2}$, the weight of $i_2$ to $0$ and leave the other weights unchanged (note that the resulting weight vector is $W_\sigma$); since both particles sit at $X^{i_2}$, the measure is again constant along this segment. Last, we move the now massless particle $i_2$ to $X^{k'}$ (note that the resulting position vector is $Y_\sigma$) and consider the corresponding segment.

• $k' \notin \{i_1, i_2\}$. First, we move the massless particle $k'$ to $X^{i_2}$ and consider the corresponding segment. We then transfer the weight of $i_2$ to $k'$ (both particles sitting at $X^{i_2}$) and consider the corresponding segment. Then, we move the now massless particle $i_2$ to $X^{i_1}$ and consider the corresponding segment. We then transfer the weight of $i_1$ to $i_2$ (both particles sitting at $X^{i_1}$) and consider the corresponding segment. Now, we move the now massless particle $i_1$ to $X^{i_2}$ and consider the corresponding segment.
Then, we transfer the weight carried by $k'$ (namely $w_{i_2}$) to $i_1$, setting the weight of $k'$ to $0$ and leaving the other weights unchanged (note that the resulting weight vector is $W_\sigma$), and consider the corresponding segment. Last, we move the now massless particle $k'$ back to its original position $X^{k'}$ (note that the resulting position vector is $Y_\sigma$) and finally consider the corresponding segment, which gives the claim.

Proof. For $i = 0, 1$, let $W_i := (w_{k,i})_{1 \le k \le K} \in \mathbb{R}_+^K$, $Y_i = (X_i^k)_{1 \le k \le K} \subset (\mathbb{R}^d)^M$ and $\pi_i := \sum_{k=1}^{K} w_{k,i} \delta_{X_i^k} \in \mathcal{P}((\mathbb{R}^d)^M)$. Note that, for $i = 0, 1$, the support of $\pi_i$ is included in the discrete set $\{X_i^k,\ 1 \le k \le K\}$. For $i = 0, 1$, using Theorem 4 with $\pi = \pi_i$ and $\Lambda : (\mathbb{R}^d)^M \to \mathbb{R}^{N+3}$ the map defined such that, for all $X \in (\mathbb{R}^d)^M$,
$$\Lambda_n(X) = \varphi_n(X) \ \ \forall\, 1 \le n \le N, \quad \Lambda_{N+1}(X) = 1, \quad \Lambda_{N+2}(X) = c(X) \quad \text{and} \quad \Lambda_{N+3}(X) = \vartheta(X),$$
it holds that there exists a subset $J_i \subset \{1, \dots, K\}$ such that $K_i := |J_i| \le N+3$, and weights $(\widetilde{w}_j^i)_{j \in J_i} \subset \mathbb{R}_+$ such that
$$\forall\, 1 \le n \le N, \quad \sum_{j \in J_i} \widetilde{w}_j^i \varphi_n(X_i^j) = \int_{(\mathbb{R}^d)^M} \varphi_n \, d\pi_i = \sum_{k=1}^{K} w_{k,i} \varphi_n(X_i^k) = \mu_n, \quad (48)$$
$$\sum_{j \in J_i} \widetilde{w}_j^i = \int_{(\mathbb{R}^d)^M} d\pi_i = \sum_{k=1}^{K} w_{k,i} = 1, \quad (49)$$
$$\sum_{j \in J_i} \widetilde{w}_j^i c(X_i^j) = \int_{(\mathbb{R}^d)^M} c \, d\pi_i = \sum_{k=1}^{K} w_{k,i} c(X_i^k) = I(W_i, Y_i), \quad (50)$$
$$\sum_{j \in J_i} \widetilde{w}_j^i \vartheta(X_i^j) = \int_{(\mathbb{R}^d)^M} \vartheta \, d\pi_i = \sum_{k=1}^{K} w_{k,i} \vartheta(X_i^k) \le A. \quad (51)$$
Without loss of generality, by using Lemma 5, we can assume that $J_0 = \llbracket 1, K_0 \rrbracket$ where $K_0 \le N+3$, and that $J_1 = \llbracket K - K_1 + 1, K \rrbracket$ where $K - K_1 + 1 \ge N + 4$. We then define $\widetilde{W}_0 := (\widetilde{w}_1^0, \dots, \widetilde{w}_{K_0}^0, 0, \dots, 0) \in \mathbb{R}_+^K$ and $\widetilde{W}_1 := (0, \dots, 0, \widetilde{w}_{K-K_1+1}^1, \dots, \widetilde{w}_K^1) \in \mathbb{R}_+^K$.
Let us first define the maps
$$\psi_1 : \left[0, \tfrac{1}{5}\right] \ni t \mapsto \left(W_0 + 5t(\widetilde{W}_0 - W_0),\, Y_0\right) \quad \text{and} \quad \psi_5 : \left[\tfrac{4}{5}, 1\right] \ni t \mapsto \left(W_1 + 5(1-t)(\widetilde{W}_1 - W_1),\, Y_1\right),$$
so that $\psi_1(0) = (W_0, Y_0)$, $\psi_1(1/5) = (\widetilde{W}_0, Y_0)$, $\psi_5(1) = (W_1, Y_1)$ and $\psi_5(4/5) = (\widetilde{W}_1, Y_1)$. Then, $\psi_1$ and $\psi_5$ are continuous, and identities (48)–(51) imply that for all $t \in [0, 1/5]$ (respectively $t \in [4/5, 1]$), $\psi_1(t) \in U_N^K$ and $I(\psi_1(t)) = I(W_0, Y_0)$ (respectively $\psi_5(t) \in U_N^K$ and $I(\psi_5(t)) = I(W_1, Y_1)$).

We then define $\widetilde{Y} := \left(X_0^1, \dots, X_0^{K_0}, 0, \dots, 0, X_1^{K-K_1+1}, \dots, X_1^K\right) \in ((\mathbb{R}^d)^M)^K$. We then introduce the continuous maps
$$\psi_2 : \left[\tfrac{1}{5}, \tfrac{2}{5}\right] \ni t \mapsto \left(\widetilde{W}_0,\, Y_0 + 5(t - 1/5)(\widetilde{Y} - Y_0)\right), \quad \psi_4 : \left[\tfrac{3}{5}, \tfrac{4}{5}\right] \ni t \mapsto \left(\widetilde{W}_1,\, Y_1 + 5(4/5 - t)(\widetilde{Y} - Y_1)\right).$$
It thus holds that $\psi_2(1/5) = (\widetilde{W}_0, Y_0)$ and $\psi_2(2/5) = (\widetilde{W}_0, \widetilde{Y})$. Similarly, $\psi_4(4/5) = (\widetilde{W}_1, Y_1)$ and $\psi_4(3/5) = (\widetilde{W}_1, \widetilde{Y})$. Let us point out here that, by the definition of $\widetilde{Y}$, for any $t \in [1/5, 2/5]$, the first $K_0$ components of the position vector of $\psi_2(t)$ are equal to $X_0^1, \dots, X_0^{K_0}$. Thus, since $\widetilde{W}_0 = (\widetilde{w}_1^0, \dots, \widetilde{w}_{K_0}^0, 0, \dots, 0) \in \mathbb{R}_+^K$, this implies that for all $t \in [1/5, 2/5]$, $\psi_2(t) \in U_N^K$ and, in addition,
$$I(\psi_2(t)) = I(\widetilde{W}_0, Y_0) = I(\widetilde{W}_0, \widetilde{Y}) = I(W_0, Y_0).$$
Similarly, for any $t \in [3/5, 4/5]$, $\psi_4(t) \in U_N^K$ and, in addition,
$$I(\psi_4(t)) = I(\widetilde{W}_1, Y_1) = I(\widetilde{W}_1, \widetilde{Y}) = I(W_1, Y_1).$$
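The mechanism at work here, and again below in the proof, is that for fixed positions the moment constraints are affine in the weights, so a segment between two admissible weight vectors stays admissible while the cost varies affinely along it. This can be checked numerically; the sketch below is only illustrative, and the positions, moment functions and cost are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 12, 3
X = rng.normal(size=K)                       # fixed particle positions
phi = np.column_stack([X, X**2, X**3])       # moment functions phi_n
cost = np.abs(X)                             # a stand-in cost c(X^k)

# Constraint matrix: N moment rows plus the total-mass row.
A = np.vstack([phi.T, np.ones(K)])
W0 = np.full(K, 1.0 / K)                     # an admissible weight vector
mu = A @ W0                                  # its moments (and mass)

# Build a second admissible weight vector by moving along a kernel
# direction of A while staying nonnegative (the kernel has dimension
# K - N - 1 > 0; the ones row forces the direction to have both signs).
v = np.linalg.svd(A)[2][-1]
step = 0.5 * np.min(W0[v < 0] / (-v[v < 0]))
W1 = W0 + step * v

# Along the segment (1-t) W0 + t W1, the constraints hold and the cost
# interpolates affinely between its endpoint values.
for t in np.linspace(0.0, 1.0, 5):
    Wt = (1 - t) * W0 + t * W1
    assert np.allclose(A @ Wt, mu, atol=1e-10)           # constraints preserved
    assert Wt.min() >= -1e-12                            # nonnegativity preserved
    assert np.isclose(Wt @ cost, (1 - t) * (W0 @ cost) + t * (W1 @ cost))
```

The same affinity is what makes the cost monotone along the middle piece of the concatenated path: an affine function of t on a segment is either constant or strictly monotone.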
Notice that, in particular, $I$ remains constant along the paths in $U_N^K$ given by the maps $\psi_1$, $\psi_2$, $\psi_4$ and $\psi_5$. Last, we introduce the map
$$\psi_3 : \left[\tfrac{2}{5}, \tfrac{3}{5}\right] \ni t \mapsto \left(\widetilde{W}_0 + 5(t - 2/5)(\widetilde{W}_1 - \widetilde{W}_0),\, \widetilde{Y}\right),$$
which is continuous and such that $\psi_3(2/5) = (\widetilde{W}_0, \widetilde{Y})$ and $\psi_3(3/5) = (\widetilde{W}_1, \widetilde{Y})$. Using similar arguments as above, it then holds that for all $t \in [2/5, 3/5]$, $\psi_3(t) \in U_N^K$ and
$$I(\psi_3(t)) = I(\widetilde{W}_0, \widetilde{Y}) + 5(t - 2/5)\left(I(\widetilde{W}_1, \widetilde{Y}) - I(\widetilde{W}_0, \widetilde{Y})\right) = I(W_0, Y_0) + 5(t - 2/5)\left(I(W_1, Y_1) - I(W_0, Y_0)\right).$$
This implies that $I$ varies monotonically along the path given by the map $\psi_3$.

We finally consider the map $\psi : [0,1] \to (\mathbb{R}_+)^K \times ((\mathbb{R}^d)^M)^K$ defined by
$$\forall t \in [0,1], \quad \psi(t) = \begin{cases} \psi_1(t) & \text{if } t \in [0, 1/5], \\ \psi_2(t) & \text{if } t \in [1/5, 2/5], \\ \psi_3(t) & \text{if } t \in [2/5, 3/5], \\ \psi_4(t) & \text{if } t \in [3/5, 4/5], \\ \psi_5(t) & \text{if } t \in [4/5, 1]. \end{cases}$$
Gathering all the results we have obtained so far, it then holds that $\psi$ is continuous, that $\psi(t) \in U_N^K$ for all $t \in [0,1]$, and that the map $I \circ \psi$ is monotone. Hence the desired result.

Acknowledgements

The Labex Bézout is acknowledged for funding the PhD thesis of Rafaël Coyaud. Aurélien Alfonsi benefited from the support of the "Chaire Risques Financiers", Fondation du Risque. We are very grateful to Gero Friesecke, Daniela Vögler, Tony Lelièvre, Gabriel Stoltz and Pierre Monmarché for stimulating discussions, as well as to Mathieu Lewin for precious comments on the paper. This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme – Grant Agreement n°.

References

[1] Aurélien Alfonsi, Rafaël Coyaud, Virginie Ehrlacher, and Damiano Lombardi. Approximation of optimal transport problems with marginal moments constraints. Math. Comp., 90(328):689–737, 2021.
[2] Christian Bayer and Josef Teichmann.
The proof of Tchakaloff's theorem. Proceedings of the American Mathematical Society, 134(10):3035–3040, 2006.
[3] Mathias Beiglböck, Pierre Henry-Labordère, and Friedrich Penkner. Model-independent bounds for option prices: a mass transport approach. Finance Stoch., 17(3):477–501, 2013.
[4] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
[5] Jean-David Benamou, Guillaume Carlier, and Luca Nenna. A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In Splitting Methods in Communication, Imaging, Science, and Engineering, pages 577–601. Springer, 2016.
[6] Ugo Bindini and Luigi De Pascale. Optimal transport with Coulomb cost and the semiclassical limit of density functional theory. J. Éc. polytech. Math., 4:909–934, 2017.
[7] Giuseppe Buttazzo, Thierry Champion, and Luigi De Pascale. Continuity and estimates for multimarginal optimal transportation problems with singular costs. Appl. Math. Optim., 78(1):185–200, 2018.
[8] Giuseppe Buttazzo, Luigi De Pascale, and Paola Gori-Giorgi. Optimal-transport formulation of electronic density-functional theory. Physical Review A, 85(6):062502, 2012.
[9] Guillaume Carlier. Optimal transportation and economic applications. Lecture Notes, 2012.
[10] Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal., 49(2):1385–1418, 2017.
[11] Giovanni Ciccotti, Tony Lelièvre, and Eric Vanden-Eijnden. Projection of diffusions on submanifolds: Application to mean force computation. Communications on Pure and Applied Mathematics, 61(3):371–408, 2008.
[12] Maria Colombo, Luigi De Pascale, and Simone Di Marino.
Multimarginal optimal transport maps for one-dimensional repulsive costs. Canadian Journal of Mathematics, 67(2):350–368, 2015.
[13] Codina Cotar, Gero Friesecke, and Claudia Klüppelberg. Density functional theory and optimal transportation with Coulomb cost. Communications on Pure and Applied Mathematics, 66(4):548–599, 2013.
[14] Codina Cotar, Gero Friesecke, and Claudia Klüppelberg. Smoothing of transport plans with fixed marginals and rigorous semiclassical limit of the Hohenberg–Kohn functional. Archive for Rational Mechanics and Analysis, 228(3):891–922, 2018.
[15] Dinh Dũng, Vladimir Temlyakov, and Tino Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics, CRM Barcelona. Birkhäuser/Springer, Cham, 2018. Edited and with a foreword by Sergey Tikhonov.
[16] Hadrien De March. Entropic approximation for multi-dimensional martingale optimal transport. arXiv preprint arXiv:1812.11104, 2018.
[17] Gero Friesecke, Christian B. Mendl, Brendan Pass, Codina Cotar, and Claudia Klüppelberg. N-density representability and the optimal transport limit of the Hohenberg–Kohn functional. The Journal of Chemical Physics, 139(16):164109, 2013.
[18] Gero Friesecke and Daniela Vögler. Breaking the curse of dimension in multi-marginal Kantorovich optimal transport on finite state spaces. SIAM Journal on Mathematical Analysis, 50(4):3996–4019, 2018.
[19] Alfred Galichon. A survey of some recent applications of optimal transport methods to econometrics. The Econometrics Journal, 20(2):C1–C11, 2017.
[20] Alfred Galichon. Optimal Transport Methods in Economics. Princeton University Press, 2018.
[21] Didier Henrion, Jean-Bernard Lasserre, and Johan Löfberg. GloptiPoly 3: moments, optimization and semidefinite programming. Optimization Methods & Software, 24(4-5):761–779, 2009.
[22] Yuehaw Khoo, Lin Lin, Michael Lindsey, and Lexing Ying. Semidefinite relaxation of multi-marginal optimal transport for strictly correlated electrons in second quantization.
arXiv preprint arXiv:1905.08322, 2019.
[23] Yuehaw Khoo and Lexing Ying. Convex relaxation approaches for strictly correlated density functional theory. SIAM Journal on Scientific Computing, 41(4):B773–B795, 2019.
[24] Jean B. Lasserre. A semidefinite programming approach to the generalized problem of moments. Mathematical Programming, 112(1):65–92, 2008.
[25] Jean-Bernard Lasserre. Moments, Positive Polynomials and Their Applications, volume 1. World Scientific, 2010.
[26] Charles L. Lawson and Richard J. Hanson. Solving Least Squares Problems. SIAM, 1995.
[27] Benedict Leimkuhler and Sebastian Reich. Simulating Hamiltonian Dynamics, volume 14. Cambridge University Press, 2004.
[28] Tony Lelièvre, Mathias Rousset, and Gabriel Stoltz. Free Energy Computations: A Mathematical Perspective. World Scientific, 2010.
[29] Tony Lelièvre, Mathias Rousset, and Gabriel Stoltz. Langevin dynamics with constraints and computation of free energy differences. Mathematics of Computation, 81(280):2071–2125, 2012.
[30] Tony Lelièvre, Mathias Rousset, and Gabriel Stoltz. Hybrid Monte Carlo methods for sampling probability measures on submanifolds. Numerische Mathematik, 143(2):379–421, 2019.
[31] Mathieu Lewin. Semi-classical limit of the Levy-Lieb functional in density functional theory. C. R. Math. Acad. Sci. Paris, 356(4):449–455, 2018.
[32] Mathieu Lewin, Elliott H. Lieb, and Robert Seiringer. Statistical mechanics of the uniform electron gas. J. Éc. polytech. Math., 5:79–116, 2018.
[33] Mathieu Lewin, Elliott H. Lieb, and Robert Seiringer. Universal functionals in density functional theory. In Éric Cancès, Gero Friesecke, and Lin Lin, editors, Density Functional Theory. 2019. arXiv preprint arXiv:1912.10424.
[34] Christian B. Mendl and Lin Lin. Kantorovich dual solution for strictly correlated electrons in atoms and molecules. Physical Review B, 87(12):125106, 2013.
[35] Luca Nenna. Numerical methods for multi-marginal optimal transportation.
PhD thesis, PSL Research University, 2016.
[36] Gilles Pagès. Numerical Probability. Springer, 2018.
[37] Robert G. Parr. Density functional theory of atoms and molecules. In Horizons of Quantum Chemistry, pages 5–15. Springer, 1980.
[38] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.
[39] Federico Piazzon, Alvise Sommariva, and Marco Vianello. Caratheodory-Tchakaloff subsampling. Dolomites Research Notes on Approximation, 10(1), 2017.
[40] Elijah Polak. Optimization: Algorithms and Consistent Approximations, volume 124. Springer Science & Business Media, 1997.
[41] Anthony Ralston and Philip Rabinowitz. A First Course in Numerical Analysis. Courier Corporation, 2001.
[42] R. Tyrrell Rockafellar. Convex Analysis. Number 28. Princeton University Press, 1970.
[43] Giovanni Samaey, Tony Lelièvre, and Vincent Legat. A numerical closure approach for kinetic models of polymeric fluids: exploring closure relations for FENE dumbbells. Computers & Fluids, 43(1):119–133, 2011.
[44] Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser, NY, pages 99–102, 2015.
[45] Maria Tchernychova. Carathéodory cubature measures. PhD thesis, University of Oxford, 2016.
[46] Cédric Villani. Topics in Optimal Transportation. Number 58. American Mathematical Society, 2003.
[47] Cédric Villani. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.
[48] Daniela Vögler. Kantorovich vs. Monge: A numerical classification of extremal multi-marginal mass transports on finite state spaces. arXiv preprint arXiv:1901.04568, 2019.
[49] Wei Zhang. Ergodic SDEs on submanifolds and related numerical sampling schemes. ESAIM: Mathematical Modelling and Numerical Analysis, 54(2):391–430, 2020.

A. Alfonsi
Université Paris-Est, CERMICS (ENPC), INRIA, F-77455 Marne-la-Vallée, France
E-mail address: [email protected]

R.
Coyaud
Université Paris-Est, CERMICS (ENPC), INRIA, F-77455 Marne-la-Vallée, France
E-mail address: [email protected]

V. Ehrlacher
Université Paris-Est, CERMICS (ENPC), INRIA, F-77455 Marne-la-Vallée, France
E-mail address: