A Multiscale Analysis of Multi-Agent Coverage Control Algorithms
Vishaal Krishnan and Sonia Martínez
Abstract—This paper presents a theoretical framework for the design and analysis of gradient descent-based algorithms for coverage control tasks involving robot swarms. We adopt a multiscale approach to analysis and design to ensure consistency of the algorithms in the large-scale limit. First, we represent the macroscopic configuration of the swarm as a probability measure and formulate the macroscopic coverage task as the minimization of a convex objective function over probability measures. We then construct a macroscopic dynamics for swarm coverage, which takes the form of a proximal descent scheme in the $L_2$-Wasserstein space. Our analysis exploits the generalized geodesic convexity of the coverage objective function, proving convergence in the $L_2$-Wasserstein sense to the target probability measure. We then obtain a consistent gradient descent algorithm in the Euclidean space that is implementable by a finite collection of agents, via a "variational" discretization of the macroscopic coverage objective function. We establish the convergence properties of the gradient descent and its behavior in the continuous-time and large-scale limits. Furthermore, we establish a connection with well-known Lloyd-based algorithms, seen as a particular class of algorithms within our framework, and demonstrate our results via numerical experiments.

Index Terms—Multi-agent systems, coverage control, multiscale analysis, proximal descent, Lloyd's algorithm.
I. INTRODUCTION
Multi-agent systems are groups of autonomous agents with sensing, communication, and computational capabilities. It is often necessary to achieve a desired coverage of a spatial region before these systems can be deployed for specific purposes. This has spurred intense research activity on the design of multi-agent coverage control algorithms [1]–[4]. In spatial coverage control problems involving large-scale multi-agent systems, it is often more appropriate and convenient to specify the task objective at the macroscopic scale for the distribution of agents over the spatial region. However, actuation still rests at the microscopic scale at the level of the individual agents, and faces a multitude of constraints imposed by the multi-agent setting. These include information constraints from limitations on sensing, communication and localization, and physical constraints such as collision and obstacle avoidance. This separation of scales poses a problem for the analysis and design of algorithms with performance guarantees. While mechanistic models relying on theoretical tools from infinite-dimensional analysis are often more appropriate for macro scales,
an algorithmic approach that relies on tools from finite-dimensional analysis is more effective in addressing the above microscopic constraints. This underscores the need for a formal theory bridging the two scales. Such a bridge theory is crucial for integrating the mechanistic and algorithmic paradigms and in understanding how macroscopic coverage objectives translate to the microscopic level of individual agents and, conversely, how the microscopic algorithms shape macroscopic behavior.

This material is based upon work supported by grant AFOSR FA-9550-18-1-0158. V. Krishnan is with the Mechanical Engineering Department, University of California, Riverside, Riverside CA 92521 USA. S. Martínez is with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, La Jolla CA 92093 USA. Email: [email protected], [email protected].

Related work.
Multi-agent coverage control algorithms have been widely studied over the past two decades and have a rich literature. For an (inexhaustive) overview of the literature, we adopt the classification into mechanistic vs. algorithmic models, as introduced earlier. The algorithmic perspective is predominantly based on tools from distributed optimization. Initial works combined distributed optimization with ideas from computational geometry and dynamical systems [1], [5]–[7]. These were then extended to include sensing, energy, obstacle, and dynamic constraints encountered in the multi-agent setting [3], [8], [9]. Interest in the mechanistic perspective was fueled by efforts to scale up the size of these systems, which emphasized the need for tools of macroscopic analysis. This led to the application of mathematical tools from probability, stochastic processes and partial differential equations. For large-scale multi-agent systems, one such approach involves the design of coverage by synthesis of Markov transition matrices [10]–[13]. Another approach involves the use of continuum/PDE-based models, applying ideas of diffusion and heat flow to coverage control [14]–[16]. Tools from parameter tuning and boundary control of PDEs [17]–[19] have been used in this context. Statistical physics-based approaches, including the application of mean-field theory, have also been recently explored [20], [21]. Some works at the intersection of the microscopic and macroscopic perspectives include [19], where the authors obtain performance bounds for spatial coverage by multi-agent swarms, characterizing coverage performance as a function of the number of robots and the robot sensing radius. More recently, tools from optimal transport theory have been applied to multi-agent coverage. Interest in optimal transport and optimal control is motivated by energy considerations, and constitutes another active area of research [22]–[26]. Furthermore, coverage algorithms often work with a quantization of the underlying spatial domain. Recent work [27]–[30] explores the underlying connections of quantization to optimal transport. Some well-known transport PDEs can be formulated as gradient flows on functionals in the space of probability measures [31]. Furthermore, from a computational perspective, gradient flows in the space of probability measures are often discretized into particle gradient flows. The gradient flow structure underlying these PDEs allows for their discretization by formulating proximal gradient descent schemes in the space of probability measures. For instance, in [32] the authors discretize the well-known Fokker–Planck equation by a proximal recursion. In [33], the authors investigate the convergence of such particle gradient flows to global minima in the limit $N \to \infty$. In [34], the authors apply proximal descent schemes to study uncertainty propagation in stochastic systems.

Contributions.
This paper contributes a multiscale analysis of gradient descent-based coverage algorithms for multi-agent systems, with three main goals in mind: (i) the formalization of coverage objectives for large-scale multi-agent systems via meaningful macroscopic metrics, (ii) the systematic design of provably correct algorithms that are consistent across the macroscopic and microscopic scales, and (iii) to gain a fundamental understanding of widely studied coverage algorithms for large-scale multi-agent systems and shed new light on their behavior as the number of agents $N \to \infty$. A suitable theoretical framework for the above is largely missing in the literature and this work addresses the gap. We formulate the coverage task as a minimization in the space of probability measures and define a proximal gradient descent on the aggregate objective function. The multi-agent configuration is specified by discretizing the underlying probability measure and we obtain implementable coverage algorithms as a proximal gradient descent on the discretized aggregate objective function w.r.t. agent positions. This leads to a new class of "variational" gradient algorithms, and we show that this class of algorithms subsumes previously defined coverage algorithms based on distortion metrics. This allows us to establish a connection between the macroscopic and microscopic perspectives and present a unified theory of multi-agent coverage algorithms.

Paper outline.
The rest of the paper is organized as follows. Section II contains a description of the coverage optimization problem setting. In Section III, we present an iterative descent scheme in the space of probability measures and establish convergence results for such a scheme. Building on these results, we propose multi-agent coverage algorithms in Section IV as the discretization of the iterative descent scheme from Section III, establish convergence results, and study their behavior in the continuous-time and $N \to \infty$ limits. Section V contains a case study of the well-known Lloyd's algorithm within the theoretical framework developed in the prior sections and results from numerical experiments. An overview of the mathematical preliminaries is presented in Appendix A.

II. COVERAGE OPTIMIZATION PROBLEM
In this section, we formulate the multi-agent coverage problem as an optimization of a macroscopic coverage objective, which forms the focus of our analysis and algorithm design in the subsequent sections. We begin by specifying the problem setting. Let $\Omega \subset \mathbb{R}^d$ be compact and convex (see the notation paragraph below), and let $\mathbf{x} = (x_1, \ldots, x_N)$ (with $x_i \in \Omega$ for $i \in \mathcal{I} = \{1, \ldots, N\}$ being the agent positions) denote the microscopic state of the multi-agent system. In specifying the macroscopic configuration, we look for a representation that satisfies two key properties: (i) Permutation-invariance:
Assuming that the agents are identical, we note that every microscopic configuration $\mathbf{x} \in \Omega^N$ is equivalent to $(P \otimes I_d)\mathbf{x}$ for any permutation matrix $P \in \mathbb{R}^{N \times N}$; the representation must be invariant under such permutations. (ii) Consistency in the $N \to \infty$ limit: The space of representations must contain the "representation limit" as $N \to \infty$, to enable the study of large-scale properties of coverage algorithms. This leads us to specifying the macroscopic configuration of the multi-agent system by probability measures over the underlying space $\Omega$. For the microscopic configuration $\mathbf{x} = (x_1, \ldots, x_N)$, we specify the corresponding macroscopic configuration by the probability measure $\hat{\mu}^N_{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$. We note that $\hat{\mu}^N_{\mathbf{x}}$ is invariant under permutations of agent positions. Furthermore, if the positions $x_i$ are independently and identically distributed according to an (absolutely continuous) probability measure $\mu \in \mathcal{P}(\Omega)$, it follows from the Glivenko–Cantelli theorem [35] that as $N \to \infty$, the discrete probability measure $\hat{\mu}^N_{\mathbf{x}}$ converges uniformly, and almost surely, to $\mu$. In this way, probability measures over $\Omega$ are a suitable space of macroscopic representations that combine the desired properties of permutation-invariance and consistency in the $N \to \infty$ limit.

With the microscopic and macroscopic representations of the multi-agent system in place, we now move to the specification of the coverage task as the minimization of a macroscopic coverage objective function $\mathcal{F} : \mathcal{P}(\Omega) \to \mathbb{R}$. We let $\mathcal{F}$ be $l$-smooth and strictly (generalized) geodesically convex (in the sense of Definition 7 in Appendix A), with a unique minimizer $\mu^* \in \mathcal{P}(\Omega)$. The coverage problem can then be described as follows: Given an initial macroscopic configuration $\mu_0 \in \mathcal{P}(\Omega)$ of the multi-agent system (with $\mu_0$ being an absolutely continuous probability measure), specify a descent scheme in $\mathcal{P}(\Omega)$ that minimizes the coverage objective function $\mathcal{F}$, generating a sequence $\{\mu_k\}_{k\in\mathbb{N}}$ that converges weakly to $\mu^*$ as $k \to \infty$. In Section III, we propose a proximal descent scheme that exploits the (generalized) convexity of $\mathcal{F}$ to solve the coverage task. Furthermore, in Section IV we obtain an implementable multi-agent coverage algorithm that updates agent positions in $\Omega$ and performs consistently (in the $N \to \infty$ limit) with the macroscopic descent scheme. That is, we design a provably-correct, discrete-time, agent-based algorithm that generates microscopic sequences $\{\mathbf{x}_k\}_{k\in\mathbb{N}} \subseteq \Omega^N$ such that $\lim_{k,N\to\infty} \hat{\mu}^N_{\mathbf{x}_k} = \mu^\star$. We address this question in Section IV by tying the macroscopic descent scheme with the microscopic coverage algorithm by means of a variational approach.

Notation. We let $\|\cdot\| : \mathbb{R}^d \to \mathbb{R}_{\geq 0}$ denote the Euclidean norm on $\mathbb{R}^d$ and $|\cdot| : \mathbb{R} \to \mathbb{R}_{\geq 0}$ the absolute value function. The gradient operator in $\mathbb{R}^d$ is denoted by $\nabla = (\partial/\partial x_1, \ldots, \partial/\partial x_n)$, where, as a shorthand, we use $\partial/\partial z \equiv \partial_z$ to denote the partial derivative w.r.t. a variable $z$ and $\partial/\partial x_i \equiv \partial_i$. Consider a set $\Omega \subseteq \mathbb{R}^d$. In what follows, $\partial\Omega \subseteq \mathbb{R}^d$ denotes its boundary, $\bar{\Omega} = \Omega \cup \partial\Omega$ its closure, and $\mathring{\Omega} = \Omega \setminus \partial\Omega$ its interior with respect to the standard Euclidean topology. For $M \subseteq \Omega$, we define the distance $d(x, M)$ of a point $x \in \Omega$ to $M$ as $d(x, M) = \inf_{y \in M} \|x - y\|$. Given any $x \in \Omega \subset \mathbb{R}^d$, we denote by $B_r(x)$ the closed $d$-ball of radius $r > 0$ centered at $x$. The indicator function on $\Omega$ for the subset $M$ will be denoted by $\mathbb{1}_M : \Omega \to \{0, 1\}$. We use $\langle f, g \rangle$ to represent the inner product of functions $f, g : \Omega \to \mathbb{R}$ w.r.t. the Lebesgue measure, given by $\langle f, g \rangle = \int_\Omega fg \, d\mathrm{vol}$. We denote by $\mathrm{Lip}(\Omega)$ the space of Lipschitz continuous functions on $\Omega$. A function $p : \Omega \to \mathbb{R}$ is called $l$-smooth (or Lipschitz differentiable) if for any $x, y \in \Omega$, we have $\|\nabla p(y) - \nabla p(x)\| \leq l\|y - x\|$. It can be shown that for an $l$-smooth function $p : \Omega \to \mathbb{R}$ and any $x, y \in \Omega$, we have $|p(y) - p(x) - \langle \nabla p(x), y - x \rangle| \leq \frac{l}{2}\|y - x\|^2$. We denote by $\mathcal{P}(\Omega)$ the space of probability measures over $\Omega$. For a measurable mapping $T : \Omega \to \Theta$, where $\Omega$ and $\Theta$ are measurable spaces, we denote by $T_\#\mu \in \mathcal{P}(\Theta)$ the pushforward measure of $\mu \in \mathcal{P}(\Omega)$, defined by $T_\#\mu(B) = \mu(T^{-1}(B))$ for all measurable $B \subseteq \Theta$.
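The empirical-measure representation is straightforward to work with numerically. The following minimal Python sketch (our own illustration, not from the paper; it assumes a one-dimensional domain $\Omega = [0,1]$) verifies the permutation-invariance of $\hat{\mu}^N_{\mathbf{x}}$ by representing the measure through its sorted support points.

```python
import numpy as np

def empirical_measure(x):
    """Macroscopic state of a configuration x = (x_1, ..., x_N): the empirical
    measure (1/N) * sum_i delta_{x_i}, stored as the multiset of support points.
    Sorting makes the representation permutation-invariant."""
    return np.sort(np.asarray(x))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=5)   # N = 5 agents in Omega = [0, 1]
perm = rng.permutation(len(x))

# Permuting the agents leaves the macroscopic configuration unchanged.
assert np.allclose(empirical_measure(x), empirical_measure(x[perm]))
```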
Example coverage objective functions. We introduce a class of coverage objective functions, whose convexity properties will be analyzed in Section V. Furthermore, in Section V we also establish a relationship between the macroscopic descent scheme corresponding to these objective functions and the well-known Lloyd's algorithm [1]. Let $f : \mathbb{R} \to \mathbb{R}$ be a strictly convex, non-decreasing and $l$-smooth function with $f(0) = 0$, and let
\[ C_f(\mu, \nu) = \inf_{T : \Omega \to \Omega,\ T_\#\mu = \nu} \int_\Omega f(|x - T(x)|) \, d\mu(x) \tag{1} \]
be defined for two probability measures $\mu$ and $\nu$. In the quadratic case $f(x) = x^2$, we get $C_f \equiv W_2^2$, the squared $L_2$-Wasserstein distance, where $W_2$ is a metric over $\mathcal{P}(\Omega)$. Conversely, this suggests the design of a coverage objective function given a target macroscopic configuration $\mu^\star$, as $\mathcal{F}(\mu) = W_2^2(\mu, \mu^\star)$, which quantifies how far $\mu$ is from the target $\mu^\star$.
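For intuition, the objective $\mathcal{F}(\mu) = W_2^2(\mu, \mu^\star)$ is easy to evaluate in one dimension, where the optimal transport map between two uniform empirical measures with the same number of atoms is the monotone (sorted) pairing. The sketch below is our own illustration; the Beta-distributed target is an arbitrary choice.

```python
import numpy as np

def w2_squared_empirical(x, y):
    """Squared L2-Wasserstein distance between two uniform empirical measures
    on the real line with the same number of atoms. In 1D the optimal transport
    map is monotone, so it pairs sorted atoms."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)    # samples of the current configuration mu
y = rng.beta(2.0, 5.0, 1000)       # samples approximating a target measure mu*
print(w2_squared_empirical(x, y))  # coverage objective F(mu) ~ W2^2(mu, mu*)
```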
III. MACROSCOPIC AND PARTICLE DESCENT SCHEMES

In this section, we present a (macroscopic) iterative descent scheme in the space of probability measures $\mathcal{P}(\Omega)$ and establish weak convergence to the minimizer under certain conditions. Furthermore, we derive an equivalent (microscopic) characterization of the descent scheme in $\Omega$. We refer to Appendix A for additional definitions and supporting results. We consider the following proximal recursion in $\mathcal{P}(\Omega)$, starting from any absolutely continuous $\mu_0 \in \mathcal{P}(\Omega)$:
\[ \mu_{k+1} \in \arg\min_{\nu \in \mathcal{P}(\Omega)} \frac{1}{2\tau} W_2^2(\mu_k, \nu) + \mathcal{F}(\nu). \tag{2} \]
We assume that $\mathcal{F}$ satisfies the Neumann boundary condition $\nabla\left(\frac{\delta\mathcal{F}}{\delta\nu}\right) \cdot \mathbf{n} \geq 0$ on $\partial\Omega$ (where $\mathbf{n}$ is the outward normal to $\partial\Omega$) for any $\nu \in \mathcal{P}(\Omega)$. This ensures conservation of mass and that the solutions of the gradient descent w.r.t. $\mathcal{F}$, which are sequences of measures, are contained in $\mathcal{P}(\Omega)$.

Lemma 1 (Compactness and convexity of sublevel sets). Let $\mathcal{F}$ be an $l$-smooth, geodesically convex functional (in the sense of Definition 7 in Appendix A) over $\Omega$. The $\mathcal{F}$-sublevel set of any absolutely continuous probability measure $\mu \in \mathcal{P}(\Omega)$ is compact and geodesically convex in the $L_2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$.

Proof. For any $\mu \in \mathcal{P}(\Omega)$, the sublevel set $S(\mu) = \{\nu \in \mathcal{P}(\Omega) \mid \mathcal{F}(\nu) \leq \mathcal{F}(\mu)\}$ is closed in $(\mathcal{P}(\Omega), W_2)$, since $\mathcal{F}$ is continuous and $\mathcal{P}(\Omega)$ is closed and compact (see Corollary 3 in Appendix A on the compactness of $\mathcal{P}(\Omega)$). This implies that $S(\mu)$ is also compact, since it is a closed subset of a compact set. Recall from Lemma 16 that $(\mathcal{P}(\Omega), W_2)$ is geodesically convex, and consider, for any $\nu_1, \nu_2 \in S(\mu)$ and $\nu_t \in \mathcal{P}(\Omega)$, for $t \in [0,1]$, the generalized geodesic between $\nu_1$ and $\nu_2$ with $\mu$ as the reference measure (from Lemma 13 it follows that unique optimal transport maps from $\mu$ to $\nu_1$ and $\mu$ to $\nu_2$ exist, since $\mu$ is absolutely continuous, and therefore so does a unique generalized geodesic in $(\mathcal{P}(\Omega), W_2)$ between $\nu_1$ and $\nu_2$ as in Definition 6). From the (generalized) geodesic convexity of $\mathcal{F}$ we have that $\mathcal{F}(\nu_t) \leq (1-t)\mathcal{F}(\nu_1) + t\mathcal{F}(\nu_2) \leq \mathcal{F}(\mu)$ (since $\mathcal{F}(\nu_1) \leq \mathcal{F}(\mu)$ and $\mathcal{F}(\nu_2) \leq \mathcal{F}(\mu)$ by definition of $S(\mu)$). This implies that $\nu_t \in S(\mu)$ for any $t \in [0,1]$, from which we infer the geodesic convexity of $S(\mu)$. ∎

Lemma 2 (Strong convexity of objective functional). Let $\mathcal{F}$ be an $l$-smooth, geodesically convex functional over $\mathcal{P}(\Omega)$. For any absolutely continuous probability measure $\mu \in \mathcal{P}(\Omega)$, the functional $G(\nu) = \frac{1}{2\tau}W_2^2(\mu, \nu) + \mathcal{F}(\nu)$ is $\left(\frac{1}{\tau} - l\right)$-strongly geodesically convex (in the sense of Definition 8 in Appendix A) over $\mathcal{P}_r(\Omega)$ for $0 < \tau < 1/l$.

Proof. Since $\mathcal{F}$ is $l$-smooth, applying Lemma 15 for two atomless measures $\nu_1$ and $\nu_2$, we get:
\[ \left| \int_\Omega \langle \xi_1 - \xi_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \rangle \, d\mu \right| \leq l\, W_2^2(\nu_1, \nu_2), \tag{3} \]
where $\xi_1$ and $\xi_2$ are the Fréchet derivatives of $\mathcal{F}$ evaluated at $\nu_1$ and $\nu_2$, respectively, and $T_{\mu\to\nu_1}$ and $T_{\mu\to\nu_2}$ are the optimal transport maps from $\mu$ to $\nu_1$ and $\nu_2$, respectively. Let $\eta_i = \nabla\left(\frac{\delta G}{\delta\nu}\right)\big|_{\nu_i}$, for $i = 1, 2$, and let $\phi_i = \frac{\delta}{\delta\nu}\left(\frac{1}{2}W_2^2(\mu,\nu)\right)\big|_{\nu_i}$ be the so-called Kantorovich potential for the transport from $\nu_i$ to $\mu$, for $i = 1, 2$.
We now have:
\[
\begin{aligned}
\int_\Omega \langle \eta_1 - \eta_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \rangle \, d\mu
&= \int_\Omega \left\langle \frac{1}{\tau}\nabla\phi_1 - \frac{1}{\tau}\nabla\phi_2 + \xi_1 - \xi_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \right\rangle d\mu \\
&= \frac{1}{\tau}\int_\Omega \langle \nabla\phi_1 - \nabla\phi_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \rangle \, d\mu + \int_\Omega \langle \xi_1 - \xi_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \rangle \, d\mu \\
&\geq \frac{1}{\tau}\int_\Omega |T_{\mu\to\nu_1} - T_{\mu\to\nu_2}|^2 \, d\mu - l\, W_2^2(\nu_1, \nu_2)
\geq \left( \frac{1}{\tau} - l \right) W_2^2(\nu_1, \nu_2),
\end{aligned}
\]
where the penultimate inequality above follows from (3). We have also used the fact that $\int_\Omega \langle \nabla\phi_1 - \nabla\phi_2,\ T_{\mu\to\nu_1} - T_{\mu\to\nu_2} \rangle \, d\mu = \int_\Omega |T_{\mu\to\nu_1} - T_{\mu\to\nu_2}|^2 \, d\mu \geq W_2^2(\nu_1, \nu_2)$ (this follows from an application of Lemma 13 in Appendix A), which implies that $\int_\Omega \langle \nabla\phi_1 - \nabla\phi_2,\ T_{\nu_2\to\nu_1} - \mathrm{id} \rangle \, d\nu_2 = W_2^2(\nu_1, \nu_2)$. Since $\tau < \frac{1}{l}$, we get that the functional $G$ is strongly convex with parameter $\frac{1}{\tau} - l$. ∎

Assumption 1 (Atomless proximal descent sequence). We assume that the sequence $\{\mu_k\}_{k\in\mathbb{N}}$ generated by (2) is such that $\mu_k \in \mathcal{P}_r(\Omega)$ for all $k \in \mathbb{N}$.

We remark here that sufficient regularity of the functional $\mathcal{F}$ and the atomlessness of $\mu_0$ should guarantee the validity of Assumption 1. Since we do not offer a characterization of the regularity of $\mathcal{F}$ to this end, we retain Assumption 1 in establishing the following theorem:

Theorem 1 (Convergence of proximal recursion (2)). Let $\Omega \subseteq \mathbb{R}^d$ be a compact, convex set, and let $\mathcal{F} : \mathcal{P}(\Omega) \to \mathbb{R}$ be an $l$-smooth, strictly geodesically convex functional satisfying the Neumann boundary condition $\nabla\left(\frac{\delta\mathcal{F}}{\delta\mu}\right)\cdot\mathbf{n} \geq 0$ on $\partial\Omega$. Let $\mu_0$ be an absolutely continuous measure. Under Assumption 1 on the generation of a proximal descent atomless sequence, and for $0 < \tau < 1/l$, the sequence $\{\mu_k\}_{k\in\mathbb{N}}$, generated by the proximal recursion (2), converges weakly to $\mu^\star = \arg\min_{\nu\in\mathcal{P}(\Omega)} \mathcal{F}(\nu)$ as $k \to \infty$.

Proof. It follows that:
\[ \frac{1}{2\tau} W_2^2(\mu_k, \mu_{k+1}) + \mathcal{F}(\mu_{k+1}) \leq \mathcal{F}(\mu_k) \iff \mathcal{F}(\mu_{k+1}) \leq \mathcal{F}(\mu_k) - \frac{1}{2\tau} W_2^2(\mu_k, \mu_{k+1}). \]
This implies that for $\mu_k \neq \mu_{k+1}$ we have $\mathcal{F}(\mu_{k+1}) < \mathcal{F}(\mu_k)$, and the sequence $\{\mathcal{F}(\mu_k)\}_{k\in\mathbb{N}}$ is monotonically strictly decreasing. In addition, $\{\mu_k\}_{k\in\mathbb{N}}$ is contained in the sublevel set $S(\mu_0)$ of $\mathcal{F}(\mu_0)$. From Lemma 1, $S(\mu_0)$ is convex and compact in the $L_2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$. Thus, there is a weakly convergent subsequence $\{\mu_{k_\ell}\} \to_\ell \mu \in S(\mu_0)$. Consider the functional $G_\mu$ from (2), for $\mu \in \mathcal{P}(\Omega)$, such that $G_\mu(\nu) = \frac{1}{2\tau}W_2^2(\mu,\nu) + \mathcal{F}(\nu)$. First, note that
\[ |G_{\mu_{k_\ell}}(\nu) - G_\mu(\nu)| = \frac{1}{2\tau}\left|W_2^2(\mu,\nu) - W_2^2(\mu_{k_\ell},\nu)\right| = \frac{1}{2\tau}\left(W_2(\mu,\nu) + W_2(\mu_{k_\ell},\nu)\right)\left|W_2(\mu,\nu) - W_2(\mu_{k_\ell},\nu)\right|, \]
for all $\ell$. Due to the triangle inequality, for all $\nu$, $|W_2(\mu,\nu) - W_2(\mu_{k_\ell},\nu)| \leq W_2(\mu_{k_\ell},\mu)$. Therefore,
\[ |G_{\mu_{k_\ell}}(\nu) - G_\mu(\nu)| \leq \frac{1}{2\tau}\left(W_2(\mu,\nu) + W_2(\mu_{k_\ell},\nu)\right) W_2(\mu,\mu_{k_\ell}). \]
In addition, since $S(\mu_0)$ is a compact set and $W_2$ is a continuous functional, there is a constant $M$ such that $|G_{\mu_{k_\ell}}(\nu) - G_\mu(\nu)| \leq M\, W_2(\mu,\mu_{k_\ell})$, for all $\nu$. Since $\mu_{k_\ell} \to_\ell \mu$, this implies the uniform convergence of the functionals $G_{\mu_{k_\ell}}(\nu)$ to $G_\mu(\nu)$. In particular, this implies that for all $\epsilon > 0$, there is an $\ell_0$ such that for all $\ell \geq \ell_0$, we have $|G_{\mu_{k_\ell}}(\nu) - G_\mu(\nu)| < \epsilon$, for all $\nu$. Let $\mu_+ = \arg\min_\nu G_\mu(\nu)$, and recall that $\mu_{k_\ell+1} = \arg\min_\nu G_{\mu_{k_\ell}}(\nu)$.
Then, by the minimizing properties:
\[
\begin{aligned}
G_{\mu_{k_\ell}}(\mu_{k_\ell+1}) \leq G_{\mu_{k_\ell}}(\nu) < G_\mu(\nu) + \epsilon &\implies G_{\mu_{k_\ell}}(\mu_{k_\ell+1}) \leq G_\mu(\mu_+) + \epsilon, \\
G_\mu(\mu_+) - \epsilon \leq G_\mu(\nu) - \epsilon < G_{\mu_{k_\ell}}(\nu) &\implies G_\mu(\mu_+) - \epsilon \leq G_{\mu_{k_\ell}}(\mu_{k_\ell+1}).
\end{aligned}
\]
That is, we have $|G_{\mu_{k_\ell}}(\mu_{k_\ell+1}) - G_\mu(\mu_+)| \leq \epsilon$ for all $\ell \geq \ell_0$. The fact that $\mu$ is a fixed point for $G_\mu(\nu)$ now follows from the set of inequalities:
\[ G_\mu(\mu_+) \leq G_\mu(\mu) = \mathcal{F}(\mu) \leq G_{\mu_{k_\ell}}(\mu_{k_\ell+1}) < \mathcal{F}(\mu_{k_\ell}). \]
The gap $G_{\mu_{k_\ell}}(\mu_{k_\ell+1}) - G_\mu(\mu_+)$ can be made arbitrarily small by increasing $\ell$, so it must be that $G_\mu(\mu) = \mathcal{F}(\mu) = G_\mu(\mu_+)$, which implies that $\mu_+ = \mu$ is the solution to the minimization problem of $G_\mu$ and satisfies $\nabla\left(\frac{\delta G}{\delta\nu}\right)\big|_\mu = 0$. The equation $\nabla\left(\frac{\delta G}{\delta\nu}\right)\big|_\mu = 0$ is equivalent to $\frac{1}{\tau}\nabla\phi_{\mu\to\mu} + \nabla\left(\frac{\delta\mathcal{F}}{\delta\nu}\right)\big|_\mu = 0$. Since $\nabla\phi_{\mu\to\mu} = 0$, then $\mu$ is a minimizer of $\mathcal{F}$, and from the strict geodesic convexity of $\mathcal{F}$ we get that the minimizer is unique and $\mu = \mu^\star$. Note that we can apply this reasoning to all the accumulation points $\tilde{\mu}$ of the sequence $\{\mu_k\}$. Since all the convergent subsequences of $\{\mu_k\}$ have the same limit $\mu^\star$ and $\{\mu_k\}$ is contained in $S(\mu_0)$, which is compact, we conclude that the whole sequence $\{\mu_k\}$ converges to $\mu^\star$ in $W_2$, i.e., weakly as $k \to \infty$. ∎

The implementation of (2) can be challenging because it involves the solution of an infinite-dimensional optimization problem. To address this, we determine the stochastic process in $\Omega$ that equivalently describes the recursion (2). More precisely, consider a proximal recursion in $\Omega$ from an initial condition $x_0 \in \Omega$:
\[ x_{k+1} \in \arg\min_{z\in\Omega} \frac{1}{2\tau}|x_k - z|^2 + f_k(z), \tag{4} \]
where $\{f_k\}_{k\in\mathbb{N}}$ is a sequence of functions on $\Omega$. Suppose that the initial condition $x_0$ is in fact a random variable distributed according to $\mu_0$ (denoted $x_0 \sim \mu_0$). We are interested in defining the process in $\Omega$, through an appropriate choice of $\{f_k\}_{k\in\mathbb{N}}$, which results in a consistent transport of the initial measure $\mu_0$ according to the recursion (2).

Theorem 2 (Target dynamics in $\Omega$). Let $\Omega \subseteq \mathbb{R}^d$ be a compact, convex set, and let $\mathcal{F} : \mathcal{P}(\Omega) \to \mathbb{R}$ satisfy the conditions of Theorem 1. Under Assumption 1, the proximal recursion (2), for $0 < \tau < 1/l$, starting from $\mu_0 \in \mathcal{P}_r(\Omega)$, is obtained as the transport of $\mu_0$ by (4) with $x_0 \sim \mu_0$ and $f_k = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\mu_{k+1}}$, for all $k \in \mathbb{N}$.

Proof. We rewrite the single-step update in (2) from an absolutely continuous probability measure $\mu \in \mathcal{P}(\Omega)$ as follows:
\[ \mu_+ = \arg\min_{\nu\in\mathcal{P}(\Omega)} \frac{1}{2\tau} W_2^2(\mu, \nu) + \mathcal{F}(\nu). \tag{5} \]
From Lemma 2, the minimizer $\mu_+$ in (5) is unique. Let $\{v_\epsilon\}$ be a smooth one-parameter family of vector fields such that $v_0 = v$, where $v$ is any vector field on $\Omega$. Now, define a one-parameter family of absolutely continuous probability measures $\{\nu_\epsilon\}_{\epsilon\in\mathbb{R}}$ by means of $\partial_\epsilon \nu_\epsilon + \nabla\cdot(\nu_\epsilon v_\epsilon) = 0$, subject to $v_\epsilon \cdot \mathbf{n} = 0$, and such that $\nu_0 = \mu_+$.
Since $\mu_+$ is a critical point of the objective function in (5), we have:
\[ \frac{d}{d\epsilon}\left( \frac{1}{2\tau} W_2^2(\mu, \nu_\epsilon) + \mathcal{F}(\nu_\epsilon) \right)\bigg|_{\epsilon=0} = \frac{1}{\tau}\int_\Omega \langle \nabla\phi_{\mu_+\to\mu}, v\rangle \, d\mu_+ + \int_\Omega \langle \xi, v\rangle \, d\mu_+ = \int_\Omega \left\langle \frac{1}{\tau}\nabla\phi_{\mu_+\to\mu} + \xi,\ v \right\rangle d\mu_+, \]
where $\xi = \nabla\left(\frac{\delta\mathcal{F}}{\delta\nu}\right)\big|_{\nu=\mu_+}$ and $\nabla\phi_{\mu_+\to\mu} = \mathrm{id} - T_{\mu_+\to\mu}$, with $T_{\mu_+\to\mu} : \Omega \to \Omega$ being the optimal transport map from $\mu_+$ to $\mu$. Since $\int_\Omega \langle \frac{1}{\tau}\nabla\phi_{\mu_+\to\mu} + \xi, v\rangle \, d\mu_+ = 0$ for all $v$, it implies that $\frac{1}{\tau}\nabla\phi_{\mu_+\to\mu} + \xi = 0$ ($\mu_+$-a.e. in $\Omega$), and we obtain:
\[ \frac{1}{\tau}\nabla\phi_{\mu_+\to\mu} + \xi = \frac{1}{\tau}\left(\mathrm{id} - T_{\mu_+\to\mu}\right) + \xi = 0, \]
which implies that:
\[ T_{\mu_+\to\mu} = \mathrm{id} + \tau\xi. \tag{6} \]
Let $\varphi = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\nu=\mu_+}$. For any $y \in \Omega$ and $\tau < 1/l$, consider:
\[ y_+ = \arg\min_{z\in\Omega}\ \underbrace{\frac{1}{2\tau}|y - z|^2 + \varphi(z)}_{\triangleq\, g_y(z)}. \tag{7} \]
The uniqueness of the minimizer above follows from the strong convexity of $g_y$ for $\tau < 1/l$ (this can be verified by following a similar procedure as in the proof of Lemma 2, but now in the Euclidean space). If $y_+ \in \mathring{\Omega}$ is a critical point of $g_y$ in (7), then it satisfies $y_+ = y - \tau\nabla\varphi(y_+)$. Since $\xi = \nabla\varphi$, we can equivalently write $y_+ = (\mathrm{id} + \tau\xi)^{-1}(y)$. That is, when the image of $y \in \Omega$ under the arg min map in (7) is a critical point in the interior of $\Omega$, then it is also the inverse image of $y$ under the optimal transport map $T_{\mu_+\to\mu}$.

Now, for $y \in \mathring{\Omega}$, the inner product of the gradient of $g_y$ at any point $z \in \partial\Omega$ on the boundary of $\Omega$ with the outward normal $\mathbf{n}$ to $\partial\Omega$ at $z$ is given by $\nabla g_y \cdot \mathbf{n} = \left(\frac{1}{\tau}(z - y) + \nabla\varphi(z)\right)\cdot\mathbf{n} = \frac{1}{\tau}(z - y)\cdot\mathbf{n} > 0$, since $\nabla\varphi\cdot\mathbf{n} = 0$ and $z - y$ points outward from $\Omega$ (as $z \in \partial\Omega$ and $y \in \mathring{\Omega}$ and $\Omega$ is convex). This implies that there exists a point $\tilde{z}$ in the interior of $\Omega$ in a neighborhood of $z$ such that $g_y(\tilde{z}) < g_y(z)$, which implies that $z$ cannot be the minimizer. Thus, for any $y \in \mathring{\Omega}$, the minimizer of $g_y(z) = \frac{1}{2\tau}|y-z|^2 + \varphi(z)$ cannot lie on the boundary $\partial\Omega$, and must therefore lie in the interior of $\Omega$ and be a critical point of the objective function $g_y$. Now, when $y \in \partial\Omega$, if $y_+ \notin \mathring{\Omega}$, it must be that $y_+ = y$ (otherwise we obtain a contradiction for the same reason as above: the inner product of $\nabla g_y$ with the outward normal would be strictly positive) and the arg min map (and the optimal transport map) coincides with the identity map in this case.

It then follows that for any $y \in \Omega$, its image $y_+$ under the arg min map is exactly its inverse image under the optimal transport map $T_{\mu_+\to\mu}$. That is, the map in (7) is the inverse of the optimal transport map $T_{\mu_+\to\mu}$. Thus, we have that the map $T_{\mu_+\to\mu} = \mathrm{id} + \tau\xi$ is well-defined and so is its inverse, it holds that $\left(T_{\mu_+\to\mu}\right)^{-1}_\#\mu = (\mathrm{id} + \tau\xi)^{-1}_\#\mu = \mu_+$, and (5) is the lift to the space of probability measures of (7). We therefore conclude that the proximal recursion (2) starting from $\mu_0$ is the transport of $\mu_0$ by (4) with $x_0 \sim \mu_0$. ∎

From a computational perspective, Theorem 2 still requires the evaluation of the first variation $\frac{\delta\mathcal{F}}{\delta\nu}$ at $\mu_{k+1}$, the transported measure at the future time instant $k+1$. To circumvent this problem, we can alternatively consider the dynamics (4) with the choice $\tilde{f}_k = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\mu_k}$, which only requires the evaluation, at time instant $k$, of the first variation $\frac{\delta\mathcal{F}}{\delta\nu}$ at $\mu_k$.
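The update (7) characterizes the proximal step through the implicit condition $y_+ = y - \tau\nabla\varphi(y_+)$. Since $\nabla\varphi$ is $l$-Lipschitz and $\tau l < 1$, the map $z \mapsto y - \tau\nabla\varphi(z)$ is a contraction, so the step can be computed by fixed-point iteration. A minimal sketch follows; the quadratic $\varphi$ and all function names are our own illustrative choices, not from the paper.

```python
import numpy as np

def prox_step(y, grad_phi, tau, iters=100):
    """Proximal step y+ = argmin_z |y - z|^2 / (2*tau) + phi(z), computed from
    its first-order condition y+ = y - tau * grad_phi(y+). For tau < 1/l (l the
    Lipschitz constant of grad_phi) the iteration map is a contraction."""
    z = y.copy()
    for _ in range(iters):
        z = y - tau * grad_phi(z)
    return z

# Illustrative choice (our own): phi(z) = |z - c|^2 / 2, so grad_phi(z) = z - c
# and the prox has the closed form (y + tau * c) / (1 + tau) for comparison.
c = np.array([0.3, 0.7])
grad_phi = lambda z: z - c
y, tau = np.array([0.9, 0.1]), 0.5
print(prox_step(y, grad_phi, tau), (y + tau * c) / (1 + tau))
```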
Consider the $l$-smooth, geodesically convex (linear) functional $\widetilde{\mathcal{F}}(\nu) = \mathbb{E}_\nu\left[\frac{\delta\mathcal{F}}{\delta\mu}\big|_{\mu_k}\right]$, for $\nu \in \mathcal{P}(\Omega)$, which satisfies $\frac{\delta\widetilde{\mathcal{F}}}{\delta\nu} = \frac{\delta\mathcal{F}}{\delta\mu}\big|_{\mu_k}$. It follows from Theorem 2 that the descent in $\mathcal{P}(\Omega)$ corresponding to (4) with $\tilde{f}_k = \frac{\delta\widetilde{\mathcal{F}}}{\delta\nu}\big|_{\mu_k}$ is given by:
\[ \mu_{k+1} \in \arg\min_{\nu\in\mathcal{P}(\Omega)} \frac{1}{2\tau} W_2^2(\mu_k, \nu) + \mathbb{E}_\nu\left[\frac{\delta\mathcal{F}}{\delta\mu}\bigg|_{\mu_k}\right]. \tag{8} \]
The convergence of (8) can also be established, as follows:

Theorem 3 (Convergence of recursion (8)). Let $\mathcal{F}$ satisfy the conditions of Theorem 1. The sequence $\{\mu_k\}_{k\in\mathbb{N}}$, obtained as the transport of the measure $\mu_0 \in \mathcal{P}_r(\Omega)$ by (8) with $\tau < 1/l$, $x_0 \sim \mu_0$ and the choice $\tilde{f}_k = \frac{\delta\widetilde{\mathcal{F}}}{\delta\nu}\big|_{\mu_k}$, converges weakly to $\mu^\star = \arg\min_{\nu\in\mathcal{P}(\Omega)} \mathcal{F}(\nu)$ as $k \to \infty$.

Proof. Suppose that $\{\mu_k\}$ is a sequence derived from (8). From the $l$-smoothness of $\mathcal{F}$ and Lemma 15 (with $\mu_{k+1}$ as the reference measure), we have:
\[ \int_\Omega \left\langle \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} - \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_{k+1}} \right),\ T_{\mu_{k+1}\to\mu_k} - \mathrm{id} \right\rangle d\mu_{k+1} \leq l\, W_2^2(\mu_k, \mu_{k+1}). \]
By Lemma 2, we have that the objective functional in (8) is strongly convex and therefore has a unique minimizer, since $\mathbb{E}_\nu\left[\frac{\delta\mathcal{F}}{\delta\mu}\big|_{\mu_k}\right]$ is linear in $\nu$ for a given $\mu_k$. Following similar steps as in the proof of Theorem 1 to characterize the critical point of (8), we get that $T_{\mu_{k+1}\to\mu_k} = \mathrm{id} + \tau\nabla\left(\frac{\delta\mathcal{F}}{\delta\nu}\big|_{\mu_k}\right)$, and by substitution in the above, we obtain:
\[ \tau \int_\Omega \left\langle \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} - \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_{k+1}} \right),\ \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} \right) \right\rangle d\mu_{k+1} \leq l\, W_2^2(\mu_k, \mu_{k+1}). \]
Therefore, it follows that:
\[ \tau \int_\Omega \left\langle \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_{k+1}} \right),\ \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} \right) \right\rangle d\mu_{k+1} \geq \left( \frac{1}{\tau} - l \right) W_2^2(\mu_k, \mu_{k+1}), \]
where we have used the fact that:
\[ \tau \int_\Omega \left\langle \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} \right),\ \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_k} \right) \right\rangle d\mu_{k+1} = \frac{1}{\tau} \int_\Omega \left\langle T_{\mu_{k+1}\to\mu_k} - \mathrm{id},\ T_{\mu_{k+1}\to\mu_k} - \mathrm{id} \right\rangle d\mu_{k+1} = \frac{1}{\tau} W_2^2(\mu_k, \mu_{k+1}). \]
Moreover, from the convexity of $\mathcal{F}$ and Lemma 17 (with $\mu_{k+1}$ as the reference measure), we have:
\[ \mathcal{F}(\mu_k) \geq \mathcal{F}(\mu_{k+1}) + \int_\Omega \left\langle \nabla\left( \frac{\delta\mathcal{F}}{\delta\nu}\bigg|_{\mu_{k+1}} \right),\ T_{\mu_{k+1}\to\mu_k} - \mathrm{id} \right\rangle d\mu_{k+1}. \]
Substituting in the latest inequality, we obtain:
\[ \mathcal{F}(\mu_k) \geq \mathcal{F}(\mu_{k+1}) + \left( \frac{1}{\tau} - l \right) W_2^2(\mu_k, \mu_{k+1}). \]
From this inequality, we deduce that $\mu_{k+1}$ belongs to the $\mathcal{F}$-sublevel set of $\mu_k$, and consequently that the sequence $\{\mu_k\}_{k\in\mathbb{N}}$ is contained in $S(\mu_0)$, the $\mathcal{F}$-sublevel set of $\mu_0$. From here, following similar steps as in the proof of Theorem 1, we conclude that the sequence $\{\mu_k\}_{k\in\mathbb{N}}$ is convergent and $\lim_{K\to\infty} W_2(\mu_K, \bar{\mu}) = 0$ for some $\bar{\mu} \in S(\mu_0)$. As the sequence $\{\mu_k\}_{k\in\mathbb{N}}$ is generated by (8), the limit $\bar{\mu}$ must be one of its fixed points, again following similar reasoning as in Theorem 1. Since $\mathcal{F}$ is strictly convex, we get that the only fixed point of (8) is $\mu^\star$.
We therefore have $\bar{\mu} = \mu^\star$. ∎

Theorem 2 now allows us to consider the transport in $\mathcal{P}(\Omega)$ given by the following proximal scheme in $\Omega$:
\[ x_+ = \arg\min_{z\in\Omega} \frac{1}{2\tau}|x - z|^2 + f(z), \tag{9} \]
where $x \sim \mu$ and $f = \frac{\delta\mathcal{F}}{\delta\nu}\big|_\mu$. This scheme is convergent according to Theorem 3.
IV. MULTI-AGENT PROXIMAL DESCENT ALGORITHMS

In this section, we bring the sample-based proximal descent schemes of the previous section to a form that is closer to the more familiar multi-agent cooperative control algorithms. We achieve this by a direct discretization of the functional. By doing so, we are able to retain some convergence properties of the algorithms, as shown in this section. We then show that, in the limit of space and time discretizations, the corresponding algorithm recovers the lost properties.

We start by describing the multi-agent system by an appropriate probability distribution. Recall that the configuration of the collective is given by $\mathbf{x} = (x_1, \ldots, x_N)$, with $x_i \in \Omega$ for $i \in \{1, \ldots, N\}$. Let $\hat{\mu}^N_{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$ be the discrete measure in $\mathcal{P}(\Omega)$ corresponding to the configuration $\mathbf{x}$. For a macroscopic description of the transport, we first let the macroscopic configuration be specified by an absolutely continuous probability measure, and since $\hat{\mu}^N_{\mathbf{x}}$ is not absolutely continuous, we consider an alternative absolutely continuous probability measure $\hat{\mu}^{h,N}_{\mathbf{x}}$ through its density function using a smooth kernel, as follows:
\[ \hat{\mu}^{h,N}_{\mathbf{x}}(x) = \frac{1}{N}\sum_{i=1}^N K_h(x - x_i), \tag{10} \]
where $h > 0$ is the bandwidth of the kernel. With a slight abuse of notation, we allow $\hat{\mu}^{h,N}_{\mathbf{x}}$ to denote both the absolutely continuous measure and its corresponding density function. We also denote, for $x \in \Omega$, $\hat{\mu}^{h,1}_x$ simply by $\hat{\mu}^h_x$. Thus, we have $\hat{\mu}^{h,N}_{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^N \hat{\mu}^h_{x_i}$, for $\mathbf{x} \in \Omega^N$.

Assumption 2 (Properties of kernel and kernel-based measures). For $h > 0$ and $z \in \Omega$ and a kernel-based probability measure $\hat{\mu}^h_z$ defined as in (10) for $N = 1$, the following hold:
(i) Smoothness: The kernel $K_h$ is smooth, $K_h \in C^\infty(\Omega)$, for every $h > 0$.
(ii) Monotonicity of support: For any $z \in \Omega$ and $h_1 < h_2$, we let $\mathrm{supp}(\hat{\mu}^{h_1}_z) \subset \mathrm{supp}(\hat{\mu}^{h_2}_z)$.
(iii) Containment: For every $h > 0$, there exists a (relatively) open set $\tilde{\Omega}_h \subset \Omega$ such that for $z \in \tilde{\Omega}_h$, the support of the measure $\hat{\mu}^h_z$ satisfies $\mathrm{supp}(\hat{\mu}^h_z) \subset \Omega$. Moreover, $\lim_{h\to 0} \tilde{\Omega}_h = \Omega$ in the Hausdorff distance.
(iv) Total variation convergence: Let $\mathcal{M}$ be the space of all measurable functions over $\Omega$. It holds that $\lim_{h\to 0} \sup_{f\in\mathcal{M}} \left\{ \int_\Omega f(z) K_h(x - z)\, d\mathrm{vol}(z) - f(x) \right\} = 0$; that is, the kernel-based measure converges uniformly to the Dirac measure as $h \to 0$.

An example kernel for (10) that satisfies Assumption 2 is the truncated Gaussian kernel restricted to an open ball $B_h(x_i)$ of radius $h$ centered at $x_i$, given by $K_h(x - x_i) = \frac{1}{C} \exp\left(-\frac{|x - x_i|^2}{h^2}\right) \mathbb{1}_{B_h(x_i)}(x)$, where $C = \int_{B_h(x_i)} \exp\left(-\frac{|x - x_i|^2}{h^2}\right) d\mathrm{vol}(x)$ is the normalizing constant.

A. Discretization of the functional $\mathcal{F}$ and its properties

We define an aggregate objective function $\mathcal{F}^{h,N}$ for the multi-agent system as the discretization of the functional $\mathcal{F}$, for $h > 0$, as follows:
\[ \mathcal{F}^{h,N}(\mathbf{x}) = \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}), \tag{11} \]
and, subsequently, analyze its properties. First note that $\mathcal{F}^{h,N}$ is invariant under permutations; that is, for $\mathbf{x} \in \tilde{\Omega}^N_h$ and a permutation matrix $P \in \mathbb{R}^{N\times N}$, we have $\mathcal{F}^{h,N}(\mathbf{x}) = \mathcal{F}^{h,N}((P \otimes I_d)\mathbf{x})$.
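A minimal numerical sketch of the kernel-smoothed measure (10) with the truncated Gaussian kernel, in one dimension, follows. This is our own illustration: the normalizing constant is computed by quadrature, and grid sizes are arbitrary.

```python
import numpy as np

def kernel(r, h):
    """Truncated Gaussian kernel from Assumption 2's example, supported
    on (-h, h), before normalization."""
    return np.where(np.abs(r) < h, np.exp(-(r / h) ** 2), 0.0)

def smoothed_density(query, x, h):
    """Density of the smoothed empirical measure in (10), in 1D. Each kernel
    integrates to one after division by the constant C (same for all centers)."""
    g0 = np.linspace(-h, h, 4001)
    C = np.trapz(kernel(g0, h), g0)   # normalizing constant by quadrature
    return kernel(query[:, None] - x[None, :], h).mean(axis=1) / C

x = np.array([0.2, 0.5, 0.8])         # N = 3 agent positions in Omega = [0, 1]
q = np.linspace(0.0, 1.0, 5)
print(smoothed_density(q, x, h=0.1))  # density of mu_hat^{h,N}_x at the points q
```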
The following lemma establishes the almost sure convergence of $\mathcal{F}^{h,N}$ to $\mathcal{F}$ as $h \to 0$, $N \to \infty$:

Lemma 3 (Convergence as $h \to 0$, $N \to \infty$). Let Assumption 2 and the Fréchet differentiability of the functional $\mathcal{F}$ hold, and let $x_i \sim \mu$ for $i \in \{1, \ldots, N\}$, independent and identically distributed. Then, we have $\lim_{h\to 0}\lim_{N\to\infty} \mathcal{F}^{h,N}(x_1, \ldots, x_N) = \mathcal{F}(\mu)$, $\mu$-almost surely.

Proof. We first recall that $\mathcal{F}^{h,N}(\mathbf{x}) = \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}})$. By the Glivenko–Cantelli theorem [36] and Assumption 2-(iv), we have:
\[ \lim_{h\to 0,\, N\to\infty}\ \sup_{f\in\mathcal{M}} \left\{ \mathbb{E}_{\hat{\mu}^{h,N}_{\mathbf{x}}}[f] - \mathbb{E}_\mu[f] \right\} = 0, \quad \text{a.s.} \]
We denote the above by $\hat{\mu}^{h,N}_{\mathbf{x}} \to_{u.a.s.} \mu$, i.e., $\hat{\mu}^{h,N}_{\mathbf{x}}$ converges uniformly almost surely to $\mu$ as $h \to 0$ and $N \to \infty$. Note that this implies the (almost sure) weak convergence of $\{\hat{\mu}^{h,N}_{\mathbf{x}}\}$ to $\mu$. Therefore, by continuity of $\mathcal{F}$ in the topology of weak convergence (which follows from the fact that $\mathcal{F}$ is Fréchet differentiable in the $L_2$-Wasserstein space), we have $\lim_{h\to 0, N\to\infty} \mathcal{F}^{h,N}(\mathbf{x}) = \lim_{h\to 0, N\to\infty} \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) = \mathcal{F}(\lim_{h\to 0, N\to\infty} \hat{\mu}^{h,N}_{\mathbf{x}}) = \mathcal{F}(\mu)$, almost surely. ∎

The following lemma relates the derivative of the function $\mathcal{F}^{h,N}$ to the Fréchet derivative of the functional $\mathcal{F}$:

Lemma 4 (Derivative of $\mathcal{F}^{h,N}$). Let Assumption 2 and the Fréchet differentiability of the functional $\mathcal{F}$ hold, and let $h > 0$ with the set $\tilde{\Omega}_h$ as in Assumption 2-(iii). For $\mathbf{x} = (z, \eta) \in \tilde{\Omega}_h \times \tilde{\Omega}^{N-1}_h$, we have that the derivative of the function $\mathcal{F}^{h,N}$ satisfies:
\[ \partial_1 \mathcal{F}^{h,N}(z, \eta) = \frac{1}{N} \int_{\mathrm{supp}(\hat{\mu}^h_z)} \nabla\varphi^{h,N}_{\mathbf{x}} \, d\hat{\mu}^h_z, \]
where $d\hat{\mu}^h_z = \rho^h_z \, d\mathrm{vol}$ with $\rho^h_z(x) = K(x - z, h)$, $\varphi^{h,N}_{\mathbf{x}} = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\hat{\mu}^{h,N}_{\mathbf{x}}}$, and $\partial_1$ denotes the derivative w.r.t. the first argument.

Proof. Let $\mathbf{x}(t) = (x_1(t), \ldots, x_N(t))$ be a curve in $\tilde{\Omega}^N_h$ parametrized by $t \in \mathbb{R}$, with $\dot{\mathbf{x}}(0) = \mathbf{v} = (v_1, \ldots, v_N)$, where $v_i \in \mathbb{R}^d$ for all $i \in \{1, \ldots, N\}$. As $\mathcal{F}^{h,N}$ is differentiable, partial derivatives exist and we can write:
\[ \frac{d}{dt}\mathcal{F}^{h,N}(\mathbf{x}(0)) = \sum_{i=1}^N \left\langle \partial_i \mathcal{F}^{h,N}(\mathbf{x}(0)),\ v_i \right\rangle. \]
Since $\mathcal{F}^{h,N}(\mathbf{x}) = \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}})$, using the Fréchet derivative of $\mathcal{F}$, we can write:
\[ \frac{d}{dt}\mathcal{F}^{h,N}(\mathbf{x}(0)) = \frac{1}{N}\sum_{i=1}^N \int_\Omega \left\langle \nabla\varphi^{h,N}_{\mathbf{x}(0)},\ v_i \right\rangle d\hat{\mu}^h_{x_i(0)} = \frac{1}{N}\sum_{i=1}^N \left\langle \int_\Omega \nabla\varphi^{h,N}_{\mathbf{x}(0)} \, d\hat{\mu}^h_{x_i(0)},\ v_i \right\rangle. \]
This holds for all $\mathbf{v} = (v_1, \ldots, v_N)$ and $\mathbf{x}(0) \in \tilde{\Omega}^N_h$; thus, by uniqueness of the partial derivatives, it holds that:
\[ \partial_i \mathcal{F}^{h,N}(\mathbf{x}) = \frac{1}{N}\int_\Omega \nabla\varphi^{h,N}_{\mathbf{x}} \, d\hat{\mu}^h_{x_i(0)}, \]
where $\partial_i$ denotes the derivative w.r.t. the $i$th argument, and we consider any $\mathbf{x}(0) \in \tilde{\Omega}^N_h$. From the previous expression:
\[ \partial_1 \mathcal{F}^{h,N}(z, \eta) = \frac{1}{N}\int_\Omega \nabla\varphi^{h,N}_{\mathbf{x}} \, d\hat{\mu}^h_z = \frac{1}{N}\int_{\mathrm{supp}(\hat{\mu}^h_z)} \nabla\varphi^{h,N}_{\mathbf{x}} \, d\hat{\mu}^h_z, \]
where $z \in \tilde{\Omega}_h$, $\eta \in \tilde{\Omega}^{N-1}_h$, $d\hat{\mu}^h_z = \rho^h_z \, d\mathrm{vol}$ with $\rho^h_z(x) = K(x - z, h)$, and $\varphi^{h,N}_{\mathbf{x}} = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\hat{\mu}^{h,N}_{\mathbf{x}}}$, and the result follows. ∎

From the invariance of $\mathcal{F}^{h,N}$ under permutations, the expression in Lemma 4 holds for the partial derivative of $\mathcal{F}^{h,N}$ w.r.t. every component of $\mathbf{x}$.

Lemma 5 ($\alpha$-smoothness of $\mathcal{F}^{h,N}$). Let Assumption 2 and the $l$-smoothness of $\mathcal{F}$ hold. Then there exists an $\alpha > 0$ such that $\mathcal{F}^{h,N}$ is $\alpha$-smooth.

Proof. From the $l$-smoothness of $\mathcal{F}$, we have that the function $\varphi = \frac{\delta\mathcal{F}}{\delta\nu}\big|_\mu$ is continuously differentiable on $\Omega$ for all $\mu$. We note that for $x, y \in \tilde{\Omega}_h$, $\hat{\mu}^h_y(z) = \hat{\mu}^h_x(z + (x - y))$ for all $z \in \mathrm{supp}(\hat{\mu}^h_y)$. For any $\mathbf{x} \in \tilde{\Omega}^N_h$, we use $(x_i, \mathbf{x}_{-i}) \in \tilde{\Omega}_h \times \tilde{\Omega}^{N-1}_h$ to denote the vector with its first entry equal to the $i$th component of $\mathbf{x}$ and all others equal to the remaining $N - 1$ components of $\mathbf{x}$.
We now have:
\[
\begin{aligned}
\left\| \nabla\mathcal{F}^{h,N}(\mathbf{y}) - \nabla\mathcal{F}^{h,N}(\mathbf{x}) \right\|
&= \sqrt{\sum_{i=1}^N \left| \partial_1\mathcal{F}^{h,N}(y_i, \mathbf{y}_{-i}) - \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}) \right|^2} \\
&= \frac{1}{N}\sqrt{\sum_{i=1}^N \left| \int_\Omega \nabla\varphi^{h,N}_{\mathbf{y}}(z)\, d\hat{\mu}^h_{y_i}(z) - \int_\Omega \nabla\varphi^{h,N}_{\mathbf{x}}(z)\, d\hat{\mu}^h_{x_i}(z) \right|^2} \\
&= \frac{1}{N}\sqrt{\sum_{i=1}^N \left| \int_\Omega \left[ \nabla\varphi^{h,N}_{\mathbf{y}}(z + (y_i - x_i)) - \nabla\varphi^{h,N}_{\mathbf{x}}(z) \right] d\hat{\mu}^h_{x_i}(z) \right|^2} \\
&\leq \int_\Omega \left| \nabla\varphi^{h,N}_{\mathbf{y}}(z) - \nabla\varphi^{h,N}_{\mathbf{x}}(z) \right| d\hat{\mu}^{h,N}_{\mathbf{x}}(z) + \frac{1}{N}\sum_{i=1}^N \int_\Omega \left| \nabla\varphi^{h,N}_{\mathbf{y}}(z + (y_i - x_i)) - \nabla\varphi^{h,N}_{\mathbf{y}}(z) \right| d\hat{\mu}^h_{x_i}(z) \\
&\leq l\, W_2(\hat{\mu}^{h,N}_{\mathbf{x}}, \hat{\mu}^{h,N}_{\mathbf{y}}) + M\|\mathbf{y} - \mathbf{x}\| \leq \alpha\|\mathbf{y} - \mathbf{x}\|,
\end{aligned}
\]
where the penultimate inequality results from the $l$-smoothness of $\mathcal{F}$ (which implies that $\varphi$ has a Lipschitz-continuous gradient in expectation). Moreover, the final inequality results from the fact that $W_2(\hat{\mu}^{h,N}_{\mathbf{x}}, \hat{\mu}^{h,N}_{\mathbf{y}}) \leq \|\mathbf{y} - \mathbf{x}\|$. ∎

In what follows, we characterize the behavior of the discretization $\mathcal{F}^{h,N}$ along the boundary through the following assumption:

Assumption 3 (Boundary conditions). The function $\mathcal{F}^{h,N}$ is Fréchet differentiable and its derivative satisfies the boundary condition $\partial_1\mathcal{F}^{h,N}(z, \xi) \cdot \mathbf{n}(z) = 0$ for $z \in \partial\tilde{\Omega}_h$ and all $\xi \in \tilde{\Omega}^{N-1}_h$.

In general, note that $\mathcal{F}^{h,N} : \Omega^N \to \mathbb{R}$ is nonconvex in spite of being the discretization of a strictly geodesically convex functional $\mathcal{F} : \mathcal{P}(\Omega) \to \mathbb{R}$. This is because the notion of convexity of functions over $\Omega^N$, which is the domain of the function $\mathcal{F}^{h,N}$, is not implied by the notion of geodesic convexity over the space of probability measures over $\Omega$. In this way, for $\mathbf{x}, \mathbf{y} \in \Omega^N$ with $\sum_{i=1}^N \frac{1}{N}\delta_{x_i}, \sum_{i=1}^N \frac{1}{N}\delta_{y_i} \in \mathcal{P}(\Omega)$ being the corresponding discrete measures, the supports of the geodesics (when they exist) between $\sum_{i=1}^N \frac{1}{N}\delta_{x_i}$ and $\sum_{i=1}^N \frac{1}{N}\delta_{y_i}$ in $\mathcal{P}(\Omega)$ do not necessarily correspond to the straight line segment between $\mathbf{x}$ and $\mathbf{y}$ in $\Omega^N$. In what follows, we identify a condition that can guarantee convexity of the discretized functional. We note that this condition is employed later to prove the convergence of the discrete algorithms to local minimizers.

Definition 1 (Cyclical monotonicity). A set $\Gamma \subset \Omega \times \Omega$ is cyclically monotone if any sequence $\{(x_i, y_i)\}_{i=1}^N$, with $(x_i, y_i) \in \Gamma$, satisfies:
\[ \sum_{i=1}^N |x_i - y_i|^2 \leq \sum_{i=1}^N |x_i - y_{\sigma(i)}|^2, \]
where $\sigma \in \Sigma_N$ is any permutation.

For $\delta > 0$, we define a subset $\Delta_\delta \subset \Omega^N$ as follows:
\[ \Delta_\delta = \left\{ \mathbf{z} = (z_1, \ldots, z_N) \in \mathring{\Omega}^N \ \middle|\ |z_i - z_j| > \delta,\ \forall i \neq j \right\}. \]
For every $\mathbf{x} \in \Delta_\delta$, we now define a set $\Gamma_{\mathbf{x}} \subset \Omega^N$ such that for all $\mathbf{y} \in \Gamma_{\mathbf{x}}$, we have:
\[ \sum_{i=1}^N |x_i - y_i|^2 \leq \sum_{i=1}^N |x_i - y_{\sigma(i)}|^2, \]
for any permutation $\sigma$. In other words, $\Gamma_{\mathbf{x}}$ is the subset of $\Omega^N$ such that for any $\mathbf{y} \in \Gamma_{\mathbf{x}}$, $\{(x_i, y_i)\}_{i=1}^N$ is cyclically monotone.
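Cyclical monotonicity can be checked by brute force for small $N$, directly from Definition 1. The following sketch is our own illustration (exponential in $N$, so only meant for small examples).

```python
import itertools
import numpy as np

def is_cyclically_monotone(x, y):
    """Brute-force check of Definition 1 for the pairs {(x_i, y_i)}: no
    permutation of the targets y may reduce the total squared cost."""
    base = np.sum(np.linalg.norm(x - y, axis=1) ** 2)
    return all(
        base <= np.sum(np.linalg.norm(x - y[list(sigma)], axis=1) ** 2)
        for sigma in itertools.permutations(range(len(x)))
    )

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = x + 0.1                                        # small common translation
print(is_cyclically_monotone(x, y))                # True: identity pairing optimal
print(is_cyclically_monotone(x, y[::-1].copy()))   # False: targets swapped
```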
We now establish through the following lemma that the set $\Gamma_{\mathbf{x}}$ contains an open neighborhood of $\mathbf{x}$:

Lemma 6 ($\Gamma_{\mathbf{x}}$ contains an open neighborhood of $\mathbf{x}$). For any $\delta > 0$ and $\mathbf{x} \in \Delta_\delta$, there exists an open neighborhood $\mathcal{N}(\mathbf{x}) \subset \Omega^N$ of $\mathbf{x}$ such that $\mathcal{N}(\mathbf{x}) \subset \Gamma_{\mathbf{x}}$.

Proof. For $\mathbf{x} \in \Delta_\delta \subset \Omega^N$, let $\mathbf{y} \in \mathring{\Omega}^N$ be such that for all $i \in \{1, \ldots, N\}$ we have $y_i \in B_{\delta/4}(x_i)$, where $B_{\delta/4}(x_i)$ is the open $\delta/4$-ball centered at $x_i \in \Omega$. Now for any $j \in \{1, \ldots, N\}$ with $j \neq i$, we have $|y_i - x_j| = |y_i - x_i + x_i - x_j| \geq |x_i - x_j| - |y_i - x_i| > \delta - \delta/4 > \delta/2$, since $|x_i - x_j| > \delta$ (as $\mathbf{x} \in \Delta_\delta$) and $|y_i - x_i| < \delta/4$. Thus, for any non-identity permutation $\sigma$, each displaced index satisfies $|x_i - y_{\sigma(i)}|^2 > \delta^2/4 > |x_i - y_i|^2$ while the remaining terms coincide, and hence:
\[ \frac{1}{N}\sum_{i=1}^N |x_i - y_{\sigma(i)}|^2 > \frac{1}{N}\sum_{i=1}^N |x_i - y_i|^2. \]
Thus, we infer that $\mathbf{y} \in \Gamma_{\mathbf{x}}$ for an arbitrary $\mathbf{y} \in \Omega^N \cap \Pi_{i=1}^N B_{\delta/4}(x_i)$, and the result follows. ∎

It follows from Lemma 6 that for $\mathbf{x} \in \Delta_\delta$ with a given $\delta > 0$, there is an $\bar{h}_\delta$ such that for all $0 < h < \bar{h}_\delta$, the supports of the components $\hat{\mu}^h_{x_i}$ of the measure $\hat{\mu}^{h,N}_{\mathbf{x}}$ can be made disjoint.

Lemma 7 (Relaxation to atomless measures). For any $\delta > 0$ and $\mathbf{x} \in \Delta_\delta$ and $\mathbf{y} \in \Gamma_{\mathbf{x}}$, there is an $\bar{h}_\delta > 0$ such that for $0 \leq h \leq \bar{h}_\delta$ and the measures $\hat{\mu}^{h,N}_{\mathbf{x}}, \hat{\mu}^{h,N}_{\mathbf{y}}$ defined in (10), the optimal transport map $T_{\hat{\mu}^{h,N}_{\mathbf{x}} \to \hat{\mu}^{h,N}_{\mathbf{y}}}$ from $\hat{\mu}^{h,N}_{\mathbf{x}}$ to $\hat{\mu}^{h,N}_{\mathbf{y}}$ satisfies:
\[ \left( T_{\hat{\mu}^{h,N}_{\mathbf{x}} \to \hat{\mu}^{h,N}_{\mathbf{y}}} - \mathrm{id} \right)(z) = y_i - x_i, \quad \forall z \in \mathrm{supp}(\hat{\mu}^h_{x_i}). \]
Proof. The proof applies a generalization of Brenier's theorem in [37]. We consider convex functions $\chi_i : \Omega \to \mathbb{R}$, for $i \in \{1, \ldots, N\}$, defined by:
\[ \chi_i(z) = \frac{1}{2}|z + y_i - x_i|^2. \]
We note that the gradient of $\chi_i$, $\nabla\chi_i(z) = z + y_i - x_i$, defines a map that transports the measure $\hat{\mu}^h_{x_i}$ to $\hat{\mu}^h_{y_i}$ simply by translation. In addition, this mapping defines a measure with cyclically monotone support and marginals $\hat{\mu}^{h,N}_{\mathbf{x}}$ and $\hat{\mu}^{h,N}_{\mathbf{y}}$. By the generalization of Brenier's theorem [37] (cf. Theorem 12 and extensions on uniqueness), a measure that has cyclically monotone support is both unique and optimal in the Monge–Kantorovich sense. Thus it coincides with the measure defined by the $\chi_i$, and the statement of the lemma follows. ∎

In this way, Lemma 7 essentially establishes that for $\mathbf{x} \in \Delta_\delta$ and any $\mathbf{y} \in \Gamma_{\mathbf{x}}$, the optimal transport from $\hat{\mu}^{h,N}_{\mathbf{x}}$ to $\hat{\mu}^{h,N}_{\mathbf{y}}$ is simply achieved by the translation of the components $\hat{\mu}^h_{x_i}$ along the rays $y_i - x_i$ to $\hat{\mu}^h_{y_i}$, for each $i \in \{1, \ldots, N\}$.

Corollary 1 ($L_2$-Wasserstein distance). For any $\delta > 0$ and $\mathbf{x} \in \Delta_\delta$ and $\mathbf{y} \in \Gamma_{\mathbf{x}}$, there is an $\bar{h}_\delta > 0$ such that for any $0 < h \leq \bar{h}_\delta$:
\[ W_2^2\left(\hat{\mu}^{h,N}_{\mathbf{x}}, \hat{\mu}^{h,N}_{\mathbf{y}}\right) = \frac{1}{N}\sum_{i=1}^N |x_i - y_i|^2. \]
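Corollary 1 can be verified numerically in one dimension, where $W_2$ between densities is computable from quantile functions. The sketch below is our own illustration (grid resolution and bump parameters are arbitrary): it builds two well-separated kernel mixtures related by small per-component translations and compares $W_2^2$ with $\frac{1}{N}\sum_i |x_i - y_i|^2$.

```python
import numpy as np

def w2_squared_1d(grid, p, q):
    """Squared W2 between densities p, q on a common 1D grid, via the quantile
    formula W2^2 = int_0^1 (P^{-1}(t) - Q^{-1}(t))^2 dt."""
    def quantiles(dens, t):
        cdf = np.cumsum(dens)
        return np.interp(t, cdf / cdf[-1], grid)
    t = np.linspace(1e-4, 1.0 - 1e-4, 20000)
    return np.mean((quantiles(p, t) - quantiles(q, t)) ** 2)

def bump(grid, c, h):
    """Normalized truncated-Gaussian bump centered at c with bandwidth h."""
    k = np.where(np.abs(grid - c) < h, np.exp(-((grid - c) / h) ** 2), 0.0)
    return k / np.trapz(k, grid)

grid = np.linspace(0.0, 1.0, 20001)
x, y, h = np.array([0.1, 0.5]), np.array([0.15, 0.54]), 0.02
p = (bump(grid, x[0], h) + bump(grid, x[1], h)) / 2
q = (bump(grid, y[0], h) + bump(grid, y[1], h)) / 2
print(w2_squared_1d(grid, p, q))   # matches the mean squared translation below
print(np.mean((x - y) ** 2))       # = (0.05^2 + 0.04^2) / 2
```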
With the above results, we now establish the following:

Lemma 8 (Comparison lemma for $\mathcal{F}^{h,N}$ on cyclically monotone sets). Let $\mathcal{F}$ be a Fréchet differentiable and geodesically convex functional (in the sense of Definition 7). For any $\delta > 0$, $\mathbf{x} \in \Delta_\delta$, $h \in (0, \bar{h}_\delta]$ and $\mathbf{y} \in \Gamma_{\mathbf{x}}$:
\[ \mathcal{F}^{h,N}(\mathbf{y}) \geq \mathcal{F}^{h,N}(\mathbf{x}) + \left\langle \nabla\mathcal{F}^{h,N}(\mathbf{x}),\ \mathbf{y} - \mathbf{x} \right\rangle. \]

Proof.
For $\mathbf{x} \in \Delta_\delta$ and $\mathbf{y} \in \Gamma_{\mathbf{x}}$, using the geodesic convexity of the functional $\mathcal{F}$ and Lemma 17 with $\hat{\mu}^{h,N}_{\mathbf{x}}$ as the reference measure, it follows that:
\[
\begin{aligned}
\mathcal{F}^{h,N}(\mathbf{y}) = \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{y}})
&\geq \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) + \int_\Omega \left\langle \nabla\varphi^{h,N}_{\mathbf{x}},\ T_{\hat{\mu}^{h,N}_{\mathbf{x}} \to \hat{\mu}^{h,N}_{\mathbf{y}}} - \mathrm{id} \right\rangle d\hat{\mu}^{h,N}_{\mathbf{x}} \\
&= \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) + \frac{1}{N}\sum_{i=1}^N \int_\Omega \left\langle \nabla\varphi^{h,N}_{\mathbf{x}},\ T_{\hat{\mu}^{h,N}_{\mathbf{x}} \to \hat{\mu}^{h,N}_{\mathbf{y}}} - \mathrm{id} \right\rangle d\hat{\mu}^h_{x_i} \\
&= \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) + \frac{1}{N}\sum_{i=1}^N \int_{\mathrm{supp}(\hat{\mu}^h_{x_i})} \left\langle \nabla\varphi^{h,N}_{\mathbf{x}},\ T_{\hat{\mu}^{h,N}_{\mathbf{x}} \to \hat{\mu}^{h,N}_{\mathbf{y}}} - \mathrm{id} \right\rangle d\hat{\mu}^h_{x_i} \\
&= \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) + \frac{1}{N}\sum_{i=1}^N \int_{\mathrm{supp}(\hat{\mu}^h_{x_i})} \left\langle \nabla\varphi^{h,N}_{\mathbf{x}},\ y_i - x_i \right\rangle d\hat{\mu}^h_{x_i} \\
&= \mathcal{F}(\hat{\mu}^{h,N}_{\mathbf{x}}) + \frac{1}{N}\sum_{i=1}^N \left\langle \int_{\mathrm{supp}(\hat{\mu}^h_{x_i})} \nabla\varphi^{h,N}_{\mathbf{x}}\, d\hat{\mu}^h_{x_i},\ y_i - x_i \right\rangle \\
&= \mathcal{F}^{h,N}(\mathbf{x}) + \sum_{i=1}^N \left\langle \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}),\ y_i - x_i \right\rangle,
\end{aligned}
\]
thereby establishing the claim. ∎

We remark here that $\mathcal{F}^{h,N}$ is convex in the limited sense established by the comparison result in Lemma 8, and this does not necessarily generalize to the entire domain $\Omega^N$, due to which the function $\mathcal{F}^{h,N}$ can be non-convex in general.

B. Multi-agent proximal descent algorithms
We formulate the proximal descent algorithm on the function $\mathcal{F}^{h,N}$ as follows:
\[ \mathbf{x}_+ \in \arg\min_{\mathbf{z}\in\tilde{\Omega}^N_h} \frac{1}{2\tau}\|\mathbf{x} - \mathbf{z}\|^2 + \mathcal{F}^{h,N}(\mathbf{z}). \tag{12} \]
Even though $\mathcal{F}^{h,N}$ is in general nonconvex, we can establish strong convexity of the proximal descent objective function in (12) under some conditions through the following lemma:

Lemma 9 (Strong convexity of objective function). For an $\alpha$-smooth function $\mathcal{F}^{h,N}$, the function $G^{h,N}_{\mathbf{x}}(\mathbf{z}) = \frac{1}{2\tau}\|\mathbf{x} - \mathbf{z}\|^2 + \mathcal{F}^{h,N}(\mathbf{z})$ is $\left(\frac{1}{\tau} - \alpha\right)$-strongly convex for $0 < \tau < 1/\alpha$.

Proof. From Lemma 5 on the $\alpha$-smoothness of $\mathcal{F}^{h,N}$, we have:
\[ \left| \left\langle \nabla\mathcal{F}^{h,N}(\mathbf{y}) - \nabla\mathcal{F}^{h,N}(\mathbf{x}),\ \mathbf{y} - \mathbf{x} \right\rangle \right| \leq \alpha\|\mathbf{y} - \mathbf{x}\|^2. \]
With $G^{h,N}_{\mathbf{x}}(\mathbf{z}) = \frac{1}{2\tau}\|\mathbf{x} - \mathbf{z}\|^2 + \mathcal{F}^{h,N}(\mathbf{z})$, we have:
\[
\begin{aligned}
\left\langle \nabla G^{h,N}_{\mathbf{x}}(\mathbf{z}_1) - \nabla G^{h,N}_{\mathbf{x}}(\mathbf{z}_2),\ \mathbf{z}_1 - \mathbf{z}_2 \right\rangle
&= \left\langle \frac{1}{\tau}(\mathbf{z}_1 - \mathbf{z}_2) + \nabla\mathcal{F}^{h,N}(\mathbf{z}_1) - \nabla\mathcal{F}^{h,N}(\mathbf{z}_2),\ \mathbf{z}_1 - \mathbf{z}_2 \right\rangle \\
&= \frac{1}{\tau}\|\mathbf{z}_1 - \mathbf{z}_2\|^2 + \left\langle \nabla\mathcal{F}^{h,N}(\mathbf{z}_1) - \nabla\mathcal{F}^{h,N}(\mathbf{z}_2),\ \mathbf{z}_1 - \mathbf{z}_2 \right\rangle \\
&\geq \frac{1}{\tau}\|\mathbf{z}_1 - \mathbf{z}_2\|^2 - \alpha\|\mathbf{z}_1 - \mathbf{z}_2\|^2 = \left(\frac{1}{\tau} - \alpha\right)\|\mathbf{z}_1 - \mathbf{z}_2\|^2,
\end{aligned}
\]
thereby establishing the claim. ∎

It follows from Lemma 9 that the minimizer in (12) is unique for $\alpha$-smooth $\mathcal{F}^{h,N}$ and sufficiently small $\tau$. Now, with $\mathbf{x}_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N) \in \tilde{\Omega}^{N-1}_h$, we can write $\mathcal{F}^{h,N}(x_1, \ldots, x_N) = \frac{1}{N}\sum_{i=1}^N \mathcal{F}^{h,N}(x_1, \ldots, x_N) = \frac{1}{N}\sum_{i=1}^N \mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i})$. By means of this decomposition, the proximal gradient descent (12) can be decomposed into the following agent-wise update, for $i \in \{1, \ldots, N\}$:
\[ x^+_i = \arg\min_{z\in\Omega_h} \frac{1}{2\tau}|x_i - z|^2 + \mathcal{F}^{h,N}(z, \mathbf{x}^+_{-i}), \]
where $\Omega_h$ is the closure of $\tilde{\Omega}_h$. Note that the above scheme requires $\mathbf{x}^+_{-i}$. In other words, to implement the above algorithm, every agent $i$, at time $k$, requires the positions of the other agents at the future time $k+1$, posing a hurdle for implementation. To avoid this problem, we consider the following proximal descent scheme:
\[ x^+_i = \arg\min_{z\in\Omega_h} \frac{1}{2\tau}|x_i - z|^2 + \mathcal{F}^{h,N}(z, \mathbf{x}_{-i}), \tag{13} \]
for every $i \in \{1, \ldots, N\}$. It follows from Lemma 9 that the objective function in (13) is also strongly convex, and thereby has a unique minimizer. We now present the following result on the convergence of (13) to the local minimizers of $\mathcal{F}^{h,N}$:

Theorem 4 (Convergence of (13) to critical points of $\mathcal{F}^{h,N}$). Let $\mathcal{F}^{h,N}$ be $\alpha$-smooth and satisfy Assumption 3. For $\tau < \frac{2}{3\alpha}$, the sequence $\{\mathbf{x}(k)\}_{k\in\mathbb{N}}$ generated by the update scheme (13) converges to a critical point $\mathbf{x}^*$ of $\mathcal{F}^{h,N}$ that is not a local maximizer, for all initial conditions $\mathbf{x}(0) \in \Omega^N_h$. Moreover, if the critical point $\mathbf{x}^* \in \Delta_\delta$ for some $\delta > 0$ and $h \in (0, \bar{h}_\delta]$, then $\mathbf{x}^*$ is a local minimizer.

Proof. We first consider the objective function in (13), $J_i(z) = \frac{1}{2\tau}|x_i - z|^2 + \mathcal{F}^{h,N}(z, \mathbf{x}_{-i})$, with $z \in \Omega_h$. The inner product of the gradient of $J_i$ at $z \in \partial\Omega_h$ with the outward normal $\tilde{\mathbf{n}}$ to $\partial\Omega_h$ is given by:
\[ \nabla J_i(z) \cdot \tilde{\mathbf{n}}(z) = \frac{1}{\tau}(z - x_i)\cdot\tilde{\mathbf{n}}(z) + \partial_1\mathcal{F}^{h,N}(z, \mathbf{x}_{-i})\cdot\tilde{\mathbf{n}}(z) = \frac{1}{\tau}(z - x_i)\cdot\tilde{\mathbf{n}}(z) \geq 0, \]
with the inequality being strict when $x_i \notin \partial\Omega_h$. This implies that $x^+_i \in \partial\Omega_h$ cannot be a minimizer if $x_i \notin \partial\Omega_h$, and if $x_i \in \partial\Omega_h$, we will have $x^+_i = x_i$. In both cases, the minimizer $x^+_i$ is also a critical point of the function $J_i$. This allows us to express (13) equivalently by:
\[ x^+_i = x_i - \tau\,\partial_1\mathcal{F}^{h,N}(x^+_i, \mathbf{x}_{-i}). \tag{14} \]
We note that in the limit $\tau \to 0$, we get a gradient flow that can be shown to converge to a critical point of $\mathcal{F}^{h,N}$.
We therefore expect that this property is preserved over a neighborhood of $\tau = 0$. In what follows, we establish that this is indeed the case and provide a sufficient strict upper bound on $\tau$ for which the property is preserved. From the $\alpha$-smoothness of $\mathcal{F}^{h,N}$, we get:
\[ \left| \mathcal{F}^{h,N}(\mathbf{x}^+) - \mathcal{F}^{h,N}(\mathbf{x}) - \sum_{i=1}^N \left\langle \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}),\ x^+_i - x_i \right\rangle \right| \leq \frac{\alpha}{2}\|\mathbf{x}^+ - \mathbf{x}\|^2. \]
We can rewrite the above as:
\[ \left| \mathcal{F}^{h,N}(\mathbf{x}^+) - \mathcal{F}^{h,N}(\mathbf{x}) - \sum_{i=1}^N \left\langle \partial_1\mathcal{F}^{h,N}(x^+_i, \mathbf{x}_{-i}),\ x^+_i - x_i \right\rangle - \sum_{i=1}^N \left\langle \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}) - \partial_1\mathcal{F}^{h,N}(x^+_i, \mathbf{x}_{-i}),\ x^+_i - x_i \right\rangle \right| \leq \frac{\alpha}{2}\|\mathbf{x}^+ - \mathbf{x}\|^2. \]
By (14), we now have $-\sum_{i=1}^N \langle \partial_1\mathcal{F}^{h,N}(x^+_i, \mathbf{x}_{-i}), x^+_i - x_i \rangle = \frac{1}{\tau}\|\mathbf{x}^+ - \mathbf{x}\|^2$, and by the $\alpha$-smoothness of $\mathcal{F}^{h,N}$:
\[ \left| \sum_{i=1}^N \left\langle \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}) - \partial_1\mathcal{F}^{h,N}(x^+_i, \mathbf{x}_{-i}),\ x^+_i - x_i \right\rangle \right| \leq \alpha\|\mathbf{x}^+ - \mathbf{x}\|^2. \]
From the above inequalities, we therefore obtain:
\[ \mathcal{F}^{h,N}(\mathbf{x}^+) \leq \mathcal{F}^{h,N}(\mathbf{x}) - \left( \frac{1}{\tau} - \frac{3\alpha}{2} \right)\|\mathbf{x}^+ - \mathbf{x}\|^2. \]
Thus, for $\tau < \frac{2}{3\alpha}$, when every agent follows the update (13), we get a descent in $\mathcal{F}^{h,N}$, and $\mathbf{x}^+$ belongs to the $\mathcal{F}^{h,N}$-sublevel set of $\mathbf{x}$. We can express the above inequality for any time instant $k \in \mathbb{N}$ as:
\[ \mathcal{F}^{h,N}(\mathbf{x}(k+1)) \leq \mathcal{F}^{h,N}(\mathbf{x}(k)) - \left( \frac{1}{\tau} - \frac{3\alpha}{2} \right)\|\mathbf{x}(k+1) - \mathbf{x}(k)\|^2. \]
Summing over $k = 0, \ldots, K-1$, we obtain:
\[ \mathcal{F}^{h,N}(\mathbf{x}(K)) \leq \mathcal{F}^{h,N}(\mathbf{x}(0)) - \left( \frac{1}{\tau} - \frac{3\alpha}{2} \right) \sum_{k=1}^K \|\mathbf{x}(k) - \mathbf{x}(k-1)\|^2, \]
and it follows that:
\[ \sum_{k=1}^K \|\mathbf{x}(k) - \mathbf{x}(k-1)\|^2 \leq \left( \frac{1}{\tau} - \frac{3\alpha}{2} \right)^{-1} \left( \mathcal{F}^{h,N}(\mathbf{x}(0)) - \mathcal{F}^{h,N}(\mathbf{x}(K)) \right). \]
Since the sequence $\{\mathbf{x}(k)\}_{k\in\mathbb{N}}$ belongs to the $\mathcal{F}^{h,N}$-sublevel set of $\mathbf{x}(0)$ (for all $\mathbf{x}(0) \in \Omega^N_h$), which is a subset of the compact set $\Omega^N_h$, it is precompact. By the boundedness above, in the limit $K \to \infty$ we get $\lim_{K\to\infty} \|\mathbf{x}(K) - \mathbf{x}(K-1)\| = 0$. Since $\Omega_h$ is compact, there is a subsequence $\{\mathbf{x}(k_\ell)\}$ convergent to a point $\mathbf{x} \in \Omega^N_h$. Given $\mathbf{x}$, define the mapping
\[ G^{h,N}_{\mathbf{x}}(\mathbf{z}) = \left( \frac{1}{\tau} - \frac{3\alpha}{2} \right)\|\mathbf{x} - \mathbf{z}\|^2 + \mathcal{F}^{h,N}(\mathbf{z}), \quad \mathbf{z} \in \Omega^N_h. \]
Let $\mathbf{x}^+$ be the next iterate of (14) from $\mathbf{x}$. Then, from the above, $G^{h,N}_{\mathbf{x}}(\mathbf{x}^+) \leq \mathcal{F}^{h,N}(\mathbf{x}) = G^{h,N}_{\mathbf{x}}(\mathbf{x})$. Due to the fact that $\mathbf{x}(k_\ell)$ converges to $\mathbf{x}$, we also have that $G^{h,N}_{\mathbf{x}}(\mathbf{x}) = \mathcal{F}^{h,N}(\mathbf{x}) \leq G^{h,N}_{\mathbf{x}(k_\ell)}(\mathbf{x}(k_\ell+1))$, for all $\ell$. Following similar steps as in the proof of Theorem 1, one can find a constant $M$ such that $|G^{h,N}_{\mathbf{x}}(\mathbf{z}) - G^{h,N}_{\mathbf{x}(k_\ell)}(\mathbf{z})| \leq M\|\mathbf{x} - \mathbf{x}(k_\ell)\|$ for all $\mathbf{z} \in \Omega^N_h$. This implies that $|G^{h,N}_{\mathbf{x}}(\mathbf{x}^+) - G^{h,N}_{\mathbf{x}(k_\ell)}(\mathbf{x}(k_\ell+1))| \leq \epsilon$, for all $\ell \geq \ell_0$. It is easy to see that $G^{h,N}_{\mathbf{x}}(\mathbf{x}^+) \leq G^{h,N}_{\mathbf{x}}(\mathbf{x}) \leq G^{h,N}_{\mathbf{x}(k_\ell)}(\mathbf{x}(k_\ell+1))$ holds, and thus $G^{h,N}_{\mathbf{x}}(\mathbf{x}^+) = \mathcal{F}^{h,N}(\mathbf{x})$, which can only happen when $\mathbf{x}^+ = \mathbf{x}$. In other words, $\mathbf{x}$ is a fixed point of (14), and we thereby get:
\[ \partial_1\mathcal{F}^{h,N}(x_i, \mathbf{x}_{-i}) = 0, \quad \forall i \in \{1, \ldots, N\}, \]
and $\nabla\mathcal{F}^{h,N}(\mathbf{x}) = 0$.
From here, the point $\mathbf{x}$ cannot be a local maximizer, since $\{\mathcal{F}^{h,N}(\mathbf{x}(k_\ell))\}_{k\in\mathbb{N}}$ is decreasing and lower-bounded by $\mathcal{F}^{h,N}(\mathbf{x})$ and, consequently, every neighborhood of $\mathbf{x}$ contains at least one point with a higher value of $\mathcal{F}^{h,N}$. Note that this conclusion applies to every accumulation point of the entire sequence $\{\mathbf{x}(k)\}_{k\in\mathbb{N}}$. Finally, suppose that an accumulation point $\mathbf{x}$ satisfies $\mathbf{x} \in \Delta_\delta$, for some $\delta > 0$ and $h \in (0, \bar{h}_\delta]$. From Lemmas 8 and 6, we conclude that there exists an open ball $B(\mathbf{x}) \subset \Omega^N$ such that for all $\mathbf{x}' \in B(\mathbf{x})$, we have $\mathcal{F}^{h,N}(\mathbf{x}') \geq \mathcal{F}^{h,N}(\mathbf{x})$, which implies that $\mathbf{x}$ must be a local minimizer. ∎

Theorem 4 establishes the convergence of (13) to critical points of the function $\mathcal{F}^{h,N}$ that are not local maximizers. This is a weaker result than Theorem 2, which established convergence of the transport scheme (9) to the global minimizer $\mu^\star$ of $\mathcal{F}$. The guarantee is weakened by the discretization of $\mathcal{F}$, which is involved in defining the multi-agent transport scheme (the convergence results for $\mathcal{F}$ employ the convexity properties of $\mathcal{F}$, which are lost by $\mathcal{F}^{h,N}$). However, we can still hope to achieve convergence to the global minimizer in the limit of particle and time discretizations, thereby guaranteeing best performance asymptotically. In the section that follows, we evaluate this possibility.
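To make the agent-wise update (13)–(14) concrete, the following Python sketch (our own illustration, not the paper's implementation) runs the scheme on a one-dimensional instance with $\mathcal{F}(\mu) = W_2^2(\mu, \mu^\star)$, taking the $h \to 0$ shortcut in which the discretized objective matches sorted agent positions to $N$ quantiles of the target. The finite-difference gradient stands in for the kernel expression of Lemma 4.

```python
import numpy as np

def F_N(x, q):
    """Variational discretization (1D, h -> 0) of F(mu) = W2^2(mu, mu*):
    sorted agent positions are matched monotonically to the N quantiles q."""
    return np.mean((np.sort(x) - q) ** 2)

def grad_i(x, i, q, eps=1e-6):
    """Central finite-difference estimate of dF_N/dx_i (a numerical stand-in
    for the kernel expression of Lemma 4)."""
    xp, xm = x.copy(), x.copy()
    xp[i] += eps
    xm[i] -= eps
    return (F_N(xp, q) - F_N(xm, q)) / (2 * eps)

def proximal_descent(x, q, tau=0.2, steps=200):
    """Agent-wise proximal update (13), via the implicit fixed-point form (14):
    x_i+ = x_i - tau * dF/dx_i evaluated at x_i+, other agents held fixed."""
    for _ in range(steps):
        x_new = x.copy()
        for i in range(len(x)):
            z = x[i]
            for _ in range(50):        # inner fixed-point iterations for (14)
                xi = x.copy()
                xi[i] = z
                z = x[i] - tau * grad_i(xi, i, q)
            x_new[i] = z
        x = x_new
    return x

rng = np.random.default_rng(2)
N = 8
q = np.quantile(rng.beta(2.0, 5.0, 10000), (np.arange(N) + 0.5) / N)
x = rng.uniform(0.0, 1.0, N)
print(np.sort(proximal_descent(x, q)))  # approaches the target quantiles q
print(q)
```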
C. Continuous-time and many-particle limits

We now derive the continuous-time and many-particle limits for the multi-agent transport scheme (13), retrieving (9) from (13) in the $N \to \infty$ and $h \to 0$ limit. We know from Theorem 2 that the transport of a probability measure $\mu$ by (9), which is identical to the following:
\[ x_+ = \arg\min_{z\in\Omega} \frac{1}{2\tau}|x - z|^2 + \varphi(z), \quad x \sim \mu, \tag{15} \]
with $\varphi \equiv \frac{\delta\mathcal{F}}{\delta\nu}\big|_\mu$, is guaranteed to converge to the global minimizer $\mu^*$ of $\mathcal{F}$. Informally, we see that as $\tau \to 0$ in (15), we have that $x_+ \to x$, and we let $v(x) = \lim_{\tau\to 0} \frac{x_+ - x}{\tau} = -\nabla\varphi(x)$. We can thus expect the solutions to (15) to converge to the solution of the gradient flow under the vector field $v = -\nabla\varphi$. We now show, in a weak sense, that the above reasoning holds. We observe that the vector field $v = -\nabla\varphi$ satisfies a zero-flux boundary condition $v \cdot \mathbf{n} = \nabla\varphi\cdot\mathbf{n} = 0$ on $\partial\Omega$, owing to the definition of the functional $\mathcal{F}$.

Proposition 1 (Model of transport in the continuous-time and many-particle limits). Let $\Omega$ and $\mathcal{F}$ satisfy the assumptions of Theorem 1. The following hold:
(i) Convergence of update scheme: The scheme (13) converges in distribution to (15) in the limit $N \to \infty$.
(ii) Gradient flow: For every decreasing sequence $\{\tau_n\}_{n\in\mathbb{N}}$ satisfying $\tau_0 < \frac{1}{l}$ and $\lim_{n\to\infty} \tau_n = 0$, the sequence of solutions $\{x_n\}_{n\in\mathbb{N}}$ to (15) with corresponding $\{\tau_n\}_{n\in\mathbb{N}}$ contains a convergent subsequence, and the limit is a weak solution to the gradient flow:
\[ \partial_t X_t(x) = -\nabla\varphi_t(X_t(x)), \tag{16} \]
with $X_0(x) = x$, $\mu(t) = X_{t\#}\mu_0$ and $\varphi_t = \frac{\delta\mathcal{F}}{\delta\nu}\big|_{\mu(t)}$.
(iii) Continuity equation: Let
T > and v ∈ L ∞ ([0 , T ] × Lip(Ω) d ) , and ˙ x i ( t ) = v ( t, x i ( t )) for any t ∈ [0 , T ] and i ∈ N , with x i (0) ∼ i.i.d µ . Then, for x N = ( x , . . . , x N ) for any N ∈ N , the sequence { x N } N ∈ N converges in a distributionalsense to a solution µ of the continuity equation: ∂µ∂t + ∇ · ( µ v ) = 0 , µ (0) = µ . (17)Owing to space constraints, we skip here the proof ofProposition 1. The gradient flow on the functional F is definedhere as the transport (17) with v = −∇ ϕ as in (16). Recall thatthe gradient flow satisfies the boundary condition ∇ ϕ · n = 0 on ∂ Ω . The following theorem establishes the asymptoticstability of the gradient flow on F , with convergence to µ ∗ ∈ P (Ω) , the global minimizer of F as t → ∞ : Theorem 5 ( Asymptotic stability of gradient flow ). Let Ω ⊆ R d be a compact, convex set and F : Ω → R be an l -smoothand strictly geodesically convex functional with minimizer µ (cid:63) .Then the solutions to the gradient flow w.r.t. F converge to µ (cid:63) in the limit t → ∞ .Proof. Let { µ t } t ≥ be a solution gradient flow w.r.t. F in P (Ω) . We have: ddt F ( µ t ) = (cid:90) Ω (cid:28) ∇ (cid:18) δFδµ (cid:19) , v (cid:29) dµ t = − (cid:90) Ω (cid:12)(cid:12)(cid:12)(cid:12) ∇ (cid:18) δFδµ (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) dµ t ≤ . This implies that F ( µ t ) ≤ F ( µ ) for all t ≥ , andtherefore { µ t } t ≥ is contained in the sublevel set S ( µ ) = { ν ∈ P (Ω) | F ( ν ) ≤ F ( µ ) } . From Lemma 1, we have that S ( µ ) is compact in ( P (Ω) , W ) , which implies that theorbit { µ t } t ≥ is precompact. Moreover, the functional F islower bounded in S ( µ ) by F ( µ ∗ ) . By the LaSalle invarianceprinciple for Banach spaces [38]–[40], we have that the orbitconverges in ( P (Ω) , W ) (also weakly, from Lemma 14)asymptotically to the largest invariant set contained in ˙ F − (0) .We have: ˙ F − (0) = (cid:26) µ ∈ P (Ω) (cid:12)(cid:12)(cid:12)(cid:12) ∇ (cid:18) δFδµ (cid:19) = 0 , a.e. in Ω (cid:27) , which implies that the Fr´echet derivative of F is zero in theset ˙ F − (0) . This corresponds to the set of critical points of F and from the strict geodesic convexity of F , we thereforeget that ˙ F − (0) = { µ (cid:63) } . V. M
V. MULTI-AGENT COVERAGE CONTROL ALGORITHMS
In this section, we aim to place well-known multi-agent coverage control algorithms from the literature [1], [4] within the multiscale theoretical framework established in the previous sections, in an effort to understand the macroscopic behavior of these coverage algorithms. To do this, we first relate the corresponding coverage objective functions used in both formulations and then apply our results to analyze their behavior in the limit $N \to \infty$. We begin with a widely used aggregate objective function for coverage control of multi-agent systems, the multi-center distortion function, and then obtain its functional counterpart in the space of probability measures. The multi-center distortion function $\mathcal{H}_f : \Omega^N \to \mathbb{R}_{\ge 0}$ [1] is given by:
$$\mathcal{H}_f(x) = \int_\Omega \min_{i \in \{1,\ldots,N\}} f(|x - x_i|)\, d\mu^\star(x), \tag{18}$$
where $f : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ is a non-decreasing function and $\mu^\star(x) = \rho^\star(x)\,d\mathrm{vol}$, with $\rho^\star$ a target density on $\Omega$. The Voronoi partition $\{\mathcal{V}_i\}_{i=1}^N$ of $\Omega$, generated by $x \in \Omega^N$, facilitates the analysis of $\mathcal{H}_f$ and is defined as follows:
$$\mathcal{V}_i = \{x \in \Omega \mid |x - x_i| \le |x - x_j| \ \forall j \in \{1,\ldots,N\}\}, \quad \forall i.$$
The following proposition establishes the relationship between $\mathcal{H}_f$ and the optimal transport cost $C_f$ in (1):

Proposition 2 (Optimal transport formulation of coverage objective). The aggregate objective function $\mathcal{H}_f$, as defined in (18), satisfies:
$$\mathcal{H}_f(x) = \min_{w \in \Delta^{N-1}} C_f\Big(\sum_{i=1}^N w_i \delta_{x_i}, \mu^\star\Big),$$
where $\Delta^{N-1} = \{w \in \mathbb{R}^N_{\ge 0} \mid \sum_{i=1}^N w_i = 1\}$ is the $(N-1)$-simplex. Furthermore, the minimizing weights $w^\star = (w^\star_1, \ldots, w^\star_N)$ are given by $w^\star_i = \mu^\star(\mathcal{V}_i)$, where $\{\mathcal{V}_i\}_{i=1}^N$ is the Voronoi partition of $\Omega$.

We skip the proof of Proposition 2 here owing to space constraints. The following corollary applies Proposition 2 to the special case $f(x) = x^2$:

Corollary 2 ($L^2$-Wasserstein distance as aggregate objective function). Applying Proposition 2 with a quadratic cost $f(x) = x^2$ (and the corresponding aggregate objective function $\mathcal{H}_2$), we have:
$$\mathcal{H}_2(x) = W_2^2\Big(\sum_{i=1}^N \mu^\star(\mathcal{V}_i)\,\delta_{x_i}, \mu^\star\Big).$$
We now investigate the properties of the aggregate objective function $\mathcal{H}_f$ in the limit $N \to \infty$.
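As a numerical illustration of Proposition 2, the sketch below estimates $\mathcal{H}_f$ (with $f(r) = r^2$) and the minimizing weights $w^\star_i = \mu^\star(\mathcal{V}_i)$ by Monte Carlo, assuming samples from the target $\mu^\star$ are available; the function name and the Gaussian stand-in for $\mu^\star$ are our assumptions, not the paper's.

```python
import numpy as np
from scipy.spatial import cKDTree

def distortion_and_weights(agents, target_samples):
    tree = cKDTree(agents)
    dists, nearest = tree.query(target_samples)  # nearest agent = Voronoi cell
    H = np.mean(dists**2)                        # ~ int min_i |x - x_i|^2 dmu*
    w = np.bincount(nearest, minlength=len(agents)) / len(target_samples)
    return H, w                                  # w[i] ~ mu*(V_i)

rng = np.random.default_rng(1)
agents = rng.uniform(-1.0, 1.0, size=(20, 2))
samples = rng.normal(0.0, 0.3, size=(10_000, 2))  # hypothetical target mu*
H, w = distortion_and_weights(agents, samples)
```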
Lemma 10. Let $\mu^\star \in \mathcal{P}(\Omega)$ be an absolutely continuous measure defining $\mathcal{H}_f$. Let $x_i \sim \text{i.i.d. } \mu$, for $i \in \{1,\ldots,N\}$, where $\mu \in \mathcal{P}(\Omega)$ is any absolutely continuous probability measure such that $\mathrm{supp}(\mu) \supseteq \mathrm{supp}(\mu^\star)$. It holds almost surely that $\lim_{N\to\infty} \mathcal{H}_f(x) = 0$.

Proof. From the Glivenko-Cantelli theorem, it follows that, as $N \to \infty$, the limit $\sum_{i=1}^N \mu^\star(\mathcal{V}_i)\,\delta_{x_i} \to \mu^\star$ holds almost surely, in the weak sense (from the expectation w.r.t. $\sum_{i=1}^N \mu^\star(\mathcal{V}_i)\,\delta_{x_i}$ of any simple function). Thus, by the continuity of $C_f$:
$$\lim_{N\to\infty} \mathcal{H}_f(x) = \lim_{N\to\infty} C_f\Big(\sum_{i=1}^N \mu^\star(\mathcal{V}_i)\,\delta_{x_i}, \mu^\star\Big) = 0. \qquad\blacksquare$$

The previous result holds for any configuration of the points $\{x_i\}_{i=1}^N$, as long as they are sampled from a distribution whose support contains that of $\mu^\star$. Note that this is consistent with what happens in the discrete-particle case of the coverage control problem, where critical point configurations are given by the so-called centroidal Voronoi configurations [1]. As the number of agents goes to infinity, any configuration of points asymptotically becomes centroidal, with each agent the centroid of its own Voronoi region; such positions correspond to local optimizers of the discrete coverage control problem. In this way, while the empirical measure $\frac{1}{N}\sum_{i=1}^N \delta_{x_i}$ corresponding to the points $\{x_i\}_{i=1}^N$ sampled from $\mu$ converges uniformly almost surely to $\mu$ (Glivenko-Cantelli theorem), the quantization energy $\mathcal{H}_f$ converges to zero, which does not reflect the discrepancy between the measures $\mu$ and $\mu^\star$. Thus, the functional $\mathcal{H}_f$ suffers from this deficiency as a candidate aggregate objective function for coverage control in the large-scale limit. Consider instead the following aggregate objective function:
$$\bar{\mathcal{H}}_f(x) = C_f\Big(\frac{1}{N}\sum_{i=1}^N \delta_{x_i}, \mu^\star\Big). \tag{19}$$
This performance metric has been used before in the so-called area (weight)-constrained coverage control problem [4] (the weights $w_i = 1/N$ are balanced in the case of (19)).
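The deficiency identified in Lemma 10, and the remedy offered by (19), can be checked numerically. The sketch below is our own construction: it discretizes $\mu^\star$ into $M = Nk$ samples and enforces the balanced constraint $\mu^\star(\mathcal{W}_i) = 1/N$ of Lemma 11 (stated next) by giving each agent exactly $k$ sample "slots" in an assignment problem. As $N$ grows, the Voronoi objective $\mathcal{H}_f$ decays toward zero while the balanced objective $\bar{\mathcal{H}}_f$ stabilizes near $C_f(\mu, \mu^\star) > 0$ whenever $\mu \neq \mu^\star$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
N, k = 50, 20
agents = rng.uniform(-1.0, 1.0, size=(N, 2))      # x_i ~ i.i.d. uniform mu
samples = rng.normal(0.0, 0.3, size=(N * k, 2))   # samples from mu* (f(r) = r^2)

sq = cdist(samples, agents, "sqeuclidean")

# H_f: every sample simply goes to its nearest agent (weights unconstrained).
H_f = sq.min(axis=1).mean()

# H_bar_f: balanced assignment over k slots per agent, so each agent
# receives exactly a 1/N fraction of the target mass.
slots = np.repeat(np.arange(N), k)                # column j belongs to agent slots[j]
rows, cols = linear_sum_assignment(sq[:, slots])
H_bar = sq[rows, slots[cols]].mean()
```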
Lemma 11. Let $\mu^\star \in \mathcal{P}(\Omega)$ be an absolutely continuous measure and let $\bar{\mathcal{H}}_f$ be defined as in (19). Let $x_i \sim \text{i.i.d. } \mu$, for $i \in \{1,\ldots,N\}$, where $\mu \in \mathcal{P}(\Omega)$ is any absolutely continuous probability measure. It holds almost surely that $\lim_{N\to\infty} \bar{\mathcal{H}}_f(x) = C_f(\mu, \mu^\star)$.

Proof. This can be seen from the following:
$$\bar{\mathcal{H}}_f(x) = C_f\Big(\frac{1}{N}\sum_{i=1}^N \delta_{x_i}, \mu^\star\Big) = \min_{\substack{T:\Omega\to\{x_i\}_{i=1}^N\\ T_\#\mu^\star = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}}} \int_\Omega f(|x - T(x)|)\, d\mu^\star(x) = \min_{\substack{T:\Omega\to\{x_i\}_{i=1}^N\\ \mu^\star(T^{-1}(\{x_i\})) = \frac{1}{N}\ \forall i}} \int_\Omega f(|x - T(x)|)\, d\mu^\star(x).$$
Similar to $\mathcal{H}_f(x)$, the functional $\bar{\mathcal{H}}_f$ can be expressed as a sum of integrals over a certain partition of the space. However, this case involves a generalized Voronoi partition $\{\mathcal{W}_i\}_{i=1}^N$:
$$\mathcal{W}_i = \{x \in \Omega \mid f(|x - x_i|) - \omega_i \le f(|x - x_j|) - \omega_j \ \forall j \in \{1,\ldots,N\}\},$$
where the weights $\{\omega_1, \ldots, \omega_N\}$ are chosen such that $\mu^\star(\mathcal{W}_i) = 1/N$ for all $i \in \{1,\ldots,N\}$. We refer the reader to [4] for a detailed treatment. We can now write:
$$\bar{\mathcal{H}}_f(x) = \sum_{i=1}^N \int_{\mathcal{W}_i} f(|x - x_i|)\, d\mu^\star(x).$$
Now, letting $x_i \sim \text{i.i.d. } \mu$, where $\mu \in \mathcal{P}(\Omega)$ is any absolutely continuous probability measure, in the limit $N \to \infty$ we have $\frac{1}{N}\sum_{i=1}^N \delta_{x_i}$ converging uniformly almost surely to $\mu$. In this way, by the continuity of $C_f$, we have:
$$\lim_{N\to\infty} \bar{\mathcal{H}}_f(x) = C_f(\mu, \mu^\star), \quad \text{a.s.} \qquad\blacksquare$$

Similarly to (12), we can formulate a multi-agent proximal descent algorithm on the aggregate objective function $\bar{\mathcal{H}}_f$, with $f(x) = x^2$, as follows, for every $i \in \{1,\ldots,N\}$:
$$x_i^+ = \operatorname*{arg\,min}_{z \in \Omega} \frac{1}{2\tau}|x_i - z|^2 + \bar{\mathcal{H}}_f(z, x_{-i}). \tag{20}$$
Note that this is a proximal formulation of the load-balancing variant of Lloyd's algorithm in [4].

Theorem 6 (Convergence to generalized centroidal Voronoi configuration and $\mu^\star$). The Lloyd proximal descent (20), with $f(x) = x^2$, converges to a local minimizer of $\bar{\mathcal{H}}_f$. Furthermore, as $N \to \infty$, the proximal descent scheme (20) converges to:
$$x^+ = \operatorname*{arg\,min}_{z \in \Omega} \frac{1}{2\tau}|x - z|^2 + \phi(z), \tag{21}$$
with $x \sim \mu$ and $\phi = \frac{\delta W_2^2(\nu, \mu^\star)}{\delta\nu}\big|_{\mu}$, the Kantorovich potential for the optimal transport from $\mu$ to $\mu^\star$. The sequence $\{\mu_k\}_{k\in\mathbb{N}}$, obtained as the transport of an absolutely continuous probability measure $\mu_0 \in \mathcal{P}(\Omega)$ by (21) with $x \sim \mu_k$, converges weakly to $\mu^\star$ as $k \to \infty$.

Proof. Let $\widehat{\mu}^{h,N}_x$ be defined as in (10) with a kernel satisfying Assumption 2. We see that $C_f(\widehat{\mu}^{h,N}_x, \mu^\star)$, as a function of $x$, is $\alpha$-smooth for some $\alpha > 0$ (from Proposition 4 in Appendix B and an application of Lemma 5). Further, we note that $\bar{\mathcal{H}}_f(x) = \lim_{h\to 0} C_f(\widehat{\mu}^{h,N}_x, \mu^\star)$, and the $\alpha$-smoothness property carries over to the limit, as does the comparison Lemma 8 for $\bar{\mathcal{H}}_f(x)$. The convergence of (20) with $f(x) = x^2$ to a local minimizer of $\bar{\mathcal{H}}_f$ then follows from a similar version of Theorem 4 applied to $\bar{\mathcal{H}}_f(x)$. It is easy to see that these local minima correspond to generalized centroidal Voronoi configurations as in [4]. Following a reasoning similar to Proposition 1, with $F = C_f$ and $F^{h,N} = C_f^{h,N}$, we have that, as $N \to \infty$, the proximal descent scheme (20) converges to (21). With $F(\nu) = W_2^2(\nu, \mu^\star)$, let $G_{\mu_k}(\nu) = \frac{1}{2\tau} W_2^2(\mu_k, \nu) + F(\nu)$. The Fréchet derivative of $G_{\mu_k}$ is given by
$$\nabla\left(\frac{\delta G_{\mu_k}(\nu)}{\delta\nu}\bigg|_{\nu}\right) = \frac{1}{\tau}\nabla\phi^{\nu\to\mu_k} + \nabla\phi^{\nu\to\mu^\star}.$$
Moreover, at the critical point $\mu_{k+1}$ of $G_{\mu_k}$ we have
$$\frac{1}{\tau}\nabla\phi^{\mu_{k+1}\to\mu_k} + \nabla\phi^{\mu_{k+1}\to\mu^\star} = \frac{1}{\tau}\big(\mathrm{id} - T^{\mu_{k+1}\to\mu_k}\big) + \big(\mathrm{id} - T^{\mu_{k+1}\to\mu^\star}\big) = 0,$$
which implies that $T^{\mu_{k+1}\to\mu_k} - \mathrm{id} = \tau\big(\mathrm{id} - T^{\mu_{k+1}\to\mu^\star}\big)$. We then have $W_2(\mu_k, \mu_{k+1}) = \tau\, W_2(\mu_{k+1}, \mu^\star)$. For any (and only) $\nu$ on the geodesic between $\mu_k$ and $\mu^\star$, we have $W_2(\mu_k, \mu^\star) = W_2(\mu_k, \nu) + W_2(\nu, \mu^\star)$ (wherein the triangle inequality holds with equality), and this is the case if and only if $\int_\Omega \langle \mathrm{id} - T^{\nu\to\mu_k},\, T^{\nu\to\mu^\star} - \mathrm{id}\rangle\, d\nu = W_2(\mu_k, \nu)\, W_2(\nu, \mu^\star)$. We see that this is indeed the case for $\nu = \mu_{k+1}$, from which we infer that $\mu_{k+1}$ lies on the geodesic between $\mu_k$ and $\mu^\star$. We therefore get that $\{\mu_k\}_{k\in\mathbb{N}}$ lies on the geodesic connecting $\mu_0$ and $\mu^\star$. Now, from Proposition 3 in Appendix B, it follows that $W_2^2(\cdot, \mu_k)$ is generalized geodesically convex with reference measure $\mu_k$, and similarly $W_2^2(\cdot, \mu^\star)$ is generalized geodesically convex with reference measure $\mu^\star$; the two measures $\mu_k$ and $\mu^\star$ are interchangeable as reference measures along the geodesic between them. It then follows that the function $G_{\mu_k}$ is generalized geodesically convex along the geodesic between $\mu_0$ and $\mu^\star$, with reference measure $\mu_k$. The weak convergence to $\mu^\star$ of the sequence $\{\mu_k\}_{k\in\mathbb{N}}$, obtained as the transport of an absolutely continuous probability measure $\mu_0 \in \mathcal{P}(\Omega)$ by (21), then follows from an application of Theorem 3 and the strict (generalized) geodesic convexity and $l$-smoothness of $W_2^2(\cdot, \mu^\star)$ (by an application of Propositions 3 and 4 in Appendix B). ∎

It is known that the generalized Lloyd's algorithm converges to generalized centroidal Voronoi configurations [4], in which the generators $\{x_1, \ldots, x_N\}$ of the generalized Voronoi partition are also the centroids of their respective generalized Voronoi cells. The generalized centroidal Voronoi configuration is, however, not unique, and this relates to the fact that the convergence is to local minimizers of $\bar{\mathcal{H}}_f$, which is typically nonconvex.

We now present results from numerical experiments for the coverage control algorithm (20) with the objective function $\bar{\mathcal{H}}_f$, where $f(x) = x^2$. We first sample i.i.d. from a multimodal Gaussian distribution and normalize the histogram of the samples over a discretization of the spatial domain to obtain a (quantized) target distribution over the domain. We then implement the coverage control algorithm (20) for various sizes $N$ of the multi-agent system, from random initializations of the agent positions. We present the following: (i) the steady-state distribution of agents (in Figure 1), and (ii) the value of the coverage objective function as a function of time (in Figure 2), for various sizes $N$ of the multi-agent system.
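For reference, the following sketch shows one way such an experiment could be set up. It is an illustrative reconstruction, not the authors' code: $\mu^\star$ is represented by $M = Nk$ samples, a balanced assignment (frozen within each iteration, in the spirit of Lloyd's algorithm) stands in for the generalized Voronoi partition with $\mu^\star(\mathcal{W}_i) = 1/N$, and the proximal step (20) with $f(r) = r^2$ and a fixed partition admits the closed form $x_i^+ = (x_i + \lambda c_i)/(1 + \lambda)$, $\lambda = 2\tau/N$, with $c_i$ the centroid of agent $i$'s cell.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
N, k, tau, iters = 30, 25, 0.5, 40
M = N * k
agents = rng.uniform(-1.0, 1.0, size=(N, 2))
samples = rng.normal(0.0, 0.3, size=(M, 2))       # hypothetical target mu*
slots = np.repeat(np.arange(N), k)                # k unit-mass slots per agent

for _ in range(iters):
    # Balanced assignment ~ generalized Voronoi cells with mass 1/N each.
    cost = cdist(samples, agents, "sqeuclidean")[:, slots]
    rows, cols = linear_sum_assignment(cost)
    owner = np.empty(M, dtype=int)
    owner[rows] = slots[cols]                     # agent owning each sample
    centroids = np.array([samples[owner == i].mean(axis=0) for i in range(N)])
    # Closed-form proximal step for the frozen partition.
    lam = 2.0 * tau / N
    agents = (agents + lam * centroids) / (1.0 + lam)
```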
VI. CONCLUSION

In this paper, we have introduced a multiscale framework for the analysis and design of multi-agent coverage algorithms that begins with a macroscopic specification of the target coverage behavior and derives provably-correct microscopic, agent-level algorithms that achieve this specification. Our class of macroscopic proximal descent schemes exploits convexity properties of coverage objective functionals to steer the macroscopic configuration, and these schemes are then translated into agent-level algorithms via a variational discretization. We uncover the relationship with previously studied coverage algorithms and obtain insights into the large-scale behavior of these algorithms. Future work will consider the extension to a constrained optimization framework to include such constraints as sensing limitations, and dynamic and collision-avoidance constraints.

Fig. 1. The figure shows the steady-state distribution of the agents implementing the coverage algorithm (20), with the target distribution depicted in grayscale, for $N = 10, \ldots$ We observe that the distribution of the agents more closely approximates the target distribution as the size $N$ of the system increases.

Fig. 2. The figure is a representative plot of the value of the aggregate objective function $\bar{\mathcal{H}}_f(x_t)$ (with $f(x) = x^2$) vs. time $t$ for various sizes $N$ of the multi-agent system and random initializations of agent positions. We observe that the steady-state value decreases with the size $N$ of the system, in accordance with our theoretical results.

REFERENCES
[1] J. Cortés, S. Martínez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. IEEE Transactions on Robotics and Automation, 20(2):243–255, 2004.
[2] J. Cortés. Motion coordination algorithms resulting from classical geometric optimization problems. In K. Tas, D. Krupka, D. Baleanu, and O. Krupkova, editors, Proceedings of the International Workshop on Global Analysis, volume 729 of AIP Conference Proceedings Series, pages 54–68. American Institute of Physics, New York, 2004.
[3] J. Cortés, S. Martínez, and F. Bullo. Spatially-distributed coverage optimization and control with limited-range interactions. ESAIM: Control, Optimisation & Calculus of Variations, 11(4):691–719, 2005.
[4] J. Cortés. Coverage optimization and spatial load balancing by robotic sensor networks. IEEE Transactions on Automatic Control, 55(3):749–754, 2010.
[5] M. Zhong and C. G. Cassandras. Distributed coverage control and data collection with mobile sensor networks. IEEE Transactions on Automatic Control, 56(10):2445–2455, 2011.
[6] A. Breitenmoser, M. Schwager, J.-C. Metzger, R. Siegwart, and D. Rus. Voronoi coverage of non-convex environments with a group of networked robots. In IEEE Int. Conf. on Robotics and Automation, pages 4982–4989, 2010.
[7] M. Pavone, A. Arsie, E. Frazzoli, and F. Bullo. Distributed algorithms for environment partitioning in mobile robotic networks. IEEE Transactions on Automatic Control, 56(8):1834–1848, 2011.
[8] Y. Ru and S. Martínez. Coverage control in constant flow environments based on a mixed energy-time metric. Automatica, 49(9):2632–2640, 2013.
[9] S. Bhattacharya, N. Michael, and V. Kumar. Distributed coverage and exploration in unknown non-convex environments. In Int. Symposium on Distributed Autonomous Robotic Systems, pages 61–75. Springer, 2013.
[10] S. Bandyopadhyay, S. J. Chung, and F. Y. Hadaegh. Inhomogeneous Markov chain approach to probabilistic swarm guidance algorithms. In Int. Conf. on Spacecraft Formation Flying Missions and Technologies, pages 1–13, 2013.
[11] N. Demir, U. Eren, and B. Acikmese. Decentralized probabilistic density control of autonomous swarms with safety constraints. Autonomous Robots, 39(4):537–554, 2015.
[12] S. Bandyopadhyay, S. J. Chung, and F. Y. Hadaegh. Probabilistic and distributed control of a large-scale swarm of autonomous agents. IEEE Transactions on Robotics, 33(5):1103–1123, 2017.
[13] M. E. Chamie, Y. Yu, B. Açıkmeşe, and M. Ono. Controlled Markov processes with safety state constraints. IEEE Transactions on Automatic Control, 64(3):1003–1018, 2018.
[14] V. Krishnan and S. Martínez. Distributed control for spatial self-organization of multi-agent swarms. SIAM Journal on Control and Optimization, 56(5):3642–3667, 2018.
[15] U. Eren and B. Acikmese. Velocity field generation for density control of swarms using heat equation and smoothing kernels. In IFAC-PapersOnLine, volume 50, pages 9405–9411, 2017.
[16] T. Zheng, Q. Han, and H. Lin. PDE-based dynamic density estimation for large-scale agent systems. IEEE Control Systems Letters, 5(2):541–546, 2020.
[17] P. Frihauf and M. Krstic. Leader-enabled deployment onto planar curves: A PDE-based approach. IEEE Transactions on Automatic Control, 56(8):1791–1806, 2011.
[18] K. Elamvazhuthi, H. Kuiper, and S. Berman. PDE-based optimization for stochastic mapping and coverage strategies using robotic ensembles. Automatica, 95:356–367, 2018.
[19] F. Zhang, A. Bertozzi, K. Elamvazhuthi, and S. Berman. Performance bounds on spatial coverage tasks by stochastic robotic swarms. IEEE Transactions on Automatic Control, 63(6):1563–1578, 2018.
[20] K. Elamvazhuthi and S. Berman. Mean-field models in swarm robotics: A survey. Bioinspiration & Biomimetics, 15(1):015001, 2019.
[21] K. Elamvazhuthi, Z. Kakish, A. Shirsat, and S. Berman. Controllability and stabilization for herding a robotic swarm using a leader: A mean-field approach. IEEE Transactions on Robotics, 2020.
[22] V. Krishnan and S. Martínez. Distributed optimal transport for the deployment of swarms. In IEEE Int. Conf. on Decision and Control, pages 4583–4588, Miami Beach, FL, USA, 2018.
[23] G. Foderaro, S. Ferrari, and T. A. Wettergren. Distributed optimal control for multi-agent trajectory optimization. Automatica, 50:149–154, 2014.
[24] S. Ferrari, G. Foderaro, P. Zhu, and T. A. Wettergren. Distributed optimal control of multiscale dynamical systems: a tutorial. IEEE Control Systems, 36(2):102–116, 2016.
[25] S. Bandyopadhyay, S. J. Chung, and F. Y. Hadaegh. Probabilistic swarm guidance using optimal transport. In IEEE Conf. on Control Applications, pages 498–505, 2014.
[26] M. H. de Badyn, U. Eren, B. Açıkmeşe, and M. Mesbahi. Optimal mass transport and kernel density estimation for state-dependent networked dynamic systems. In IEEE Int. Conf. on Decision and Control, pages 1225–1230, 2018.
[27] Q. Du, V. Faber, and M. Gunzburger. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Review, 41(4):637–676, 1999.
[28] D. Bourne, B. Schmitzer, and B. Wirth. Semi-discrete unbalanced optimal transport and quantization. arXiv preprint arXiv:1808.01962, 2018.
[29] D. Bourne and S. Roper. Centroidal power diagrams, Lloyd's algorithm, and applications to optimal location problems. SIAM Journal on Numerical Analysis, 53(6):2545–2569, 2015.
[30] V. Hartmann and D. Schuhmacher. Semi-discrete optimal transport: the case p = 1. arXiv preprint arXiv:1706.07650, 2017.
[31] L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Springer, 2008.
[32] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
[33] L. Chizat and F. Bach. On the global convergence of gradient descent for over-parameterized models using optimal transport. In Advances in Neural Information Processing Systems, pages 3036–3046, 2018.
[34] K. Caluya and A. Halder. Proximal recursion for solving the Fokker–Planck equation. In American Control Conference, pages 4098–4103, 2019.
[35] P. Billingsley. Probability and Measure. John Wiley, 2008.
[36] V. S. Varadarajan. On the convergence of sample probability distributions. Sankhyā: The Indian Journal of Statistics (1933-1960), 19(1/2):23–26, 1958.
[37] R. McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Mathematical Journal, 80(2):309–324, 1995.
[38] D. Henry. Geometric Theory of Semilinear Parabolic Equations. Springer, 1981.
[39] J. A. Walker. Dynamical Systems and Evolution Equations: Theory and Applications, volume 20. Springer, 2013.
[40] J. A. Walker. Some results on Liapunov functions and generated dynamical systems. Journal of Differential Equations, 30(3):424–440, 1978.
[41] P. Billingsley. Convergence of Probability Measures. John Wiley & Sons, 2013.
[42] F. Santambrogio. Optimal Transport for Applied Mathematicians. Springer, 2015.
[43] C. Villani. Optimal Transport: Old and New, volume 338. Springer, 2008.

APPENDIX A
MATHEMATICAL PRELIMINARIES
We present here the mathematical preliminaries on the convergence of measures, the $L^2$-Wasserstein space, and smoothness and convexity notions for functionals defined on the $L^2$-Wasserstein space.

A. Weak convergence of measures
The results of this manuscript rely on the notions of weak convergence in $\mathcal{P}(\Omega)$, the topology of weak convergence, its metrizability, and the compactness of subsets of $\mathcal{P}(\Omega)$. We recall them here and refer the reader to [41] for more information.

Definition 2 (Weak convergence). Let $\Omega \subseteq \mathbb{R}^d$, and let $\mathcal{P}(\Omega)$ be its set of probability measures. A sequence $\{\mu_k\}_{k\in\mathbb{N}} \subseteq \mathcal{P}(\Omega)$ converges weakly to $\mu \in \mathcal{P}(\Omega)$ if for any bounded and continuous function $f$ on $\Omega$, $\lim_{k\to\infty} \int_\Omega f\, d\mu_k = \int_\Omega f\, d\mu$.

Equivalently, in the definition above, the sequence $\{\mu_k\}_{k\in\mathbb{N}}$ in $\mathcal{P}(\Omega)$ is said to converge to $\mu$ in $\mathcal{P}(\Omega)$ equipped with the topology of weak convergence. The space of probability measures $\mathcal{P}(\Omega)$ equipped with the topology of weak convergence is metrizable [41]. In other words, there exists a metric on $\mathcal{P}(\Omega)$ such that the topology of weak convergence is obtained as the topology induced by the metric. One such metric is the Wasserstein distance; see Section A-B. We now state Prokhorov's theorem [41] on the equivalence between tightness and precompactness of a collection of probability measures over a separable and complete metric (Polish) space.

Lemma 12 (Prokhorov's theorem). Let $\Omega$ be a complete metric space, and let $\mathcal{K} \subseteq \mathcal{P}(\Omega)$. The closure of $\mathcal{K}$ w.r.t. the topology of weak convergence in $\mathcal{P}(\Omega)$ is compact if and only if $\mathcal{K}$ is tight. That is, $\mathcal{K}$ is tight if for any $\epsilon > 0$ there exists a compact $K_\epsilon \subseteq \Omega$ such that $\mu(K_\epsilon) > 1 - \epsilon$, for all $\mu \in \mathcal{K}$.

Corollary 3 (Compactness of $\mathcal{P}(\Omega)$). Let $\Omega \subseteq \mathbb{R}^d$ be a compact set. Then, the closure of $\mathcal{P}(\Omega)$ w.r.t. the topology of weak convergence is compact. This follows from Prokhorov's theorem in Lemma 12, since $\mathcal{P}(\Omega)$ is tight: for any $\epsilon > 0$, we choose $\Omega$ itself as the compact set and have $\mu(\Omega) = 1 > 1 - \epsilon$ for any $\mu \in \mathcal{P}(\Omega)$. Moreover, since $\mathcal{P}(\Omega)$ is also closed w.r.t. the topology of weak convergence, it is therefore compact.

B. The $L^2$-Wasserstein distance

The $L^2$-Wasserstein distance between two probability measures $\mu, \nu \in \mathcal{P}(\Omega)$ is given by:
$$W_2^2(\mu, \nu) = \min_{\pi \in \Pi(\mu,\nu)} \int_{\Omega\times\Omega} |x - y|^2\, d\pi(x, y), \tag{22}$$
where $\Pi(\mu,\nu)$ is the space of joint probability measures over $\Omega\times\Omega$ with marginals $\mu$ and $\nu$. The definition of the $L^2$-Wasserstein distance in (22) follows from the so-called Kantorovich formulation of optimal transport. An alternative formulation of this problem, called the Monge formulation of optimal transport, is given below:
$$W_2^2(\mu, \nu) = \min_{\substack{T:\Omega\to\Omega\\ T_\#\mu = \nu}} \int_\Omega |x - T(x)|^2\, d\mu(x). \tag{23}$$
In the Monge formulation (23), the minimization is carried out over the space of maps $T: \Omega \to \Omega$ for which the probability measure $\nu$ is obtained as the pushforward of $\mu$. This can be viewed as a deterministic formulation of optimal transport, where the transport is carried out by a map, whereas the Kantorovich formulation (22) can be seen as a relaxation of the problem, where the transport plan is described by a joint probability measure $\pi$ over $\Omega\times\Omega$, with $\mu$ and $\nu$ as its marginals. It is to be noted that the Monge formulation does not always admit a solution, while the Kantorovich problem does. Roughly speaking, the Kantorovich formulation is the "minimal" extension of the Monge formulation, as both problems attain the same infimum [42].
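For intuition: when $\mu$ and $\nu$ are empirical measures with the same number $n$ of equal-mass atoms, the Kantorovich problem (22) reduces to an assignment problem over permutations, so $W_2$ can be computed exactly by the Hungarian method. The sketch below is our illustration; all names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein2(X, Y):
    cost = cdist(X, Y, "sqeuclidean")        # entries |x_i - y_j|^2
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())  # W_2 for uniform weights 1/n

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = rng.normal(0.0, 0.3, size=(200, 2))
print(wasserstein2(X, Y))
```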
Further, the two formulations (22) and (23) are equivalent under certain conditions, in the sense laid out in the ensuing lemma.

Lemma 13 (Monge-Kantorovich optimal transport, cf. [42], Theorem 1.17 for $c(x,y) = |x-y|^2$). Assume that $\Omega$ is compact in $\mathbb{R}^d$. There exists a minimizer $\pi^*$ to the Kantorovich problem (22). Moreover, if the measure $\mu$ is atomless and $\mu(\partial\Omega) = 0$, then the minimizer $\pi^*$ is unique, the Monge formulation (23) admits a unique minimizer $T^*$, and it holds that $\pi^* = (\mathrm{id}, T^*)_\#\mu$, with $\mathrm{id}: \Omega\to\Omega$ the identity mapping. Furthermore, there exists a Lipschitz continuous function $\phi: \Omega\to\mathbb{R}$, called the Kantorovich potential, such that $\nabla\phi = \mathrm{id} - T^*$.

The space of probability measures $\mathcal{P}(\Omega)$ endowed with the $L^2$-Wasserstein distance $W_2$ will equivalently be referred to as the $L^2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$ over $\Omega$. The following lemma, which follows from Theorem 6.9 in [43], establishes the equivalence between convergence in the sense of the topology of weak convergence and in the $L^2$-Wasserstein metric.

Lemma 14 (Convergence in $(\mathcal{P}(\Omega), W_2)$). For compact $\Omega \subset \mathbb{R}^d$, the $L^2$-Wasserstein distance $W_2$ metrizes the weak convergence in $\mathcal{P}(\Omega)$. That is, a sequence of measures $\{\mu_k\}_{k\in\mathbb{N}}$ in $\mathcal{P}(\Omega)$ converges weakly to $\mu \in \mathcal{P}(\Omega)$ if and only if $\lim_{k\to\infty} W_2(\mu_k, \mu) = 0$.

C. Derivatives of functionals on atomless measures

We start by introducing the notion of the first variation of a functional on $\mathcal{P}(\Omega)$:

Definition 3 (First variation of a functional on $\mathcal{P}(\Omega)$). Let $F: \mathcal{P}(\Omega)\to\mathbb{R}$, $\mu \in \mathcal{P}(\Omega)$, and let $\{\mu_\epsilon\}_{\epsilon\in\mathbb{R}}$ be a smooth one-parameter family of probability measures. Suppose that there exists a unique $\frac{\delta F}{\delta\mu}(\mu)$ such that $\frac{d}{d\epsilon}F(\mu_\epsilon)\big|_{\epsilon=0} = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\int_\Omega \frac{\delta F}{\delta\mu}(\mu)\,(d\mu_\epsilon - d\mu)$ for any smooth $\{\mu_\epsilon\}_{\epsilon\in\mathbb{R}}$. Then $\frac{\delta F}{\delta\mu}(\mu)$ is the first variation of $F$ evaluated at $\mu$.

For functionals for which the first variation exists as in the above definition, we can introduce the notion of the Fréchet derivative on the $L^2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$:

Definition 4 (Derivative of a functional on $(\mathcal{P}(\Omega), W_2)$). A functional $F: \mathcal{P}(\Omega)\to\mathbb{R}$ is Fréchet differentiable with derivative $\xi$ at an atomless measure $\mu_0 \in \mathcal{P}(\Omega)$ if, for any smooth one-parameter family of probability measures $\{\mu_\epsilon\}_{\epsilon\in\mathbb{R}}$, the following limit exists:
$$\lim_{\epsilon\to 0} \frac{F(\mu_\epsilon) - F(\mu_0) - \int_\Omega \langle\xi, T^{\mu_0\to\mu_\epsilon} - \mathrm{id}\rangle\, d\mu_0}{W_2(\mu_0, \mu_\epsilon)} = 0,$$
where $\xi = \nabla\varphi$, $\varphi = \frac{\delta F}{\delta\mu}(\mu_0)$, and $T^{\mu_0\to\mu_\epsilon}$ is the optimal transport map from $\mu_0$ to $\mu_\epsilon$.

We now introduce the notion of the directional derivative of a functional over probability measures. For this, let $v = \frac{1}{t}(T^{\mu\to\nu} - \mathrm{id})$, which implies that $\nu = (\mathrm{id} + t v)_\#\mu$. We have:
$$W_2(\mu, \nu) = \sqrt{\int_\Omega |T^{\mu\to\nu} - \mathrm{id}|^2\, d\mu} = t\sqrt{\int_\Omega |v|^2\, d\mu},$$
and we get:
$$\lim_{t\to 0} \frac{F((\mathrm{id} + t v)_\#\mu) - F(\mu) - t\int_\Omega \langle\xi, v\rangle\, d\mu}{t\sqrt{\int_\Omega |v|^2\, d\mu}} = 0.$$
Therefore, the directional derivative of $F$ along $v$ is
$$\frac{d}{dt}\bigg|_v F(\mu) = \lim_{t\to 0} \frac{F((\mathrm{id} + t v)_\#\mu) - F(\mu)}{t} = \int_\Omega \langle\xi, v\rangle\, d\mu,$$
where $\xi$ is the Fréchet derivative of $F$ evaluated at $\mu$.
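Two standard examples (cf. [31], [42]) may help fix these notions; the computations below are our illustrations, and the normalization of the factor $1/2$ in the second line may differ from the convention used elsewhere in the paper.

```latex
% First variation and Frechet derivative of two model functionals.
% The second line uses the Kantorovich potential phi_mu of Lemma 13.
\begin{align*}
F(\mu) &= \int_\Omega V \, d\mu
  &&\Longrightarrow& \frac{\delta F}{\delta \mu} &= V,
  & \xi_\mu &= \nabla V, \\
F(\mu) &= \tfrac{1}{2} W_2^2(\mu, \mu^\star)
  &&\Longrightarrow& \frac{\delta F}{\delta \mu} &= \phi_\mu,
  & \xi_\mu &= \nabla \phi_\mu = \mathrm{id} - T^{\mu \to \mu^\star}.
\end{align*}
```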
1) Lipschitz-continuous derivatives:
We now introduce the notion of $l$-smoothness that will be useful for the development of gradient descent-based transport schemes in the paper.

Definition 5 ($l$-smoothness of functionals on $(\mathcal{P}(\Omega), W_2)$). A functional $F: \mathcal{P}(\Omega)\to\mathbb{R}$ is called $l$-smooth (or Lipschitz differentiable) if for any $\mu, \nu \in \mathcal{P}(\Omega)$ we have:
$$\sqrt{\int_\Omega |\xi_\mu - \xi_\nu|^2\, d\nu} \le l\, W_2(\mu, \nu),$$
where $\xi_\mu, \xi_\nu$ are the Fréchet derivatives of $F$ evaluated at $\mu$ and $\nu$, respectively.

From the above definition of $l$-smooth functionals, the following lemma can be easily verified:

Lemma 15 ($l$-smooth functionals). A functional $F: \mathcal{P}(\Omega)\to\mathbb{R}$ that is $l$-smooth on $(\mathcal{P}(\Omega), W_2)$ satisfies:
$$\Big| F(\nu) - F(\mu) - \int_\Omega \langle\xi_\mu, T^{\mu\to\nu} - \mathrm{id}\rangle\, d\mu \Big| \le \frac{l}{2}\, W_2^2(\mu, \nu),$$
$$\Big| \int_\Omega \langle\xi_\mu - \xi_\nu, T^{\nu\to\mu} - \mathrm{id}\rangle\, d\nu \Big| \le l\, W_2^2(\mu, \nu),$$
for any two atomless probability measures $\mu, \nu \in \mathcal{P}(\Omega)$, where $\xi_\mu$ and $\xi_\nu$ are the Fréchet derivatives of $F$ evaluated at $\mu$ and $\nu$, respectively.

D. Convexity of functionals on the Wasserstein space

Results in convex analysis can be appropriately generalized to functionals on the $L^2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$; see [31] for a detailed treatment. In this section, we introduce and define notions related to the convexity of functionals on $(\mathcal{P}(\Omega), W_2)$ used to build the results in this paper. Before we can define any notion of convexity, we introduce an appropriate notion of interpolation:

Definition 6 (Generalized displacement interpolation). Let $\Omega$ be a compact subset of $\mathbb{R}^d$, let $\mu, \nu \in \mathcal{P}(\Omega)$, and let $\theta \in \mathcal{P}_r(\Omega)$ be an atomless probability measure. Let $T^{\theta\to\mu}: \Omega\to\Omega$ and $T^{\theta\to\nu}: \Omega\to\Omega$ be optimal transport maps from $\theta$ to $\mu$ and from $\theta$ to $\nu$, respectively, in the $L^2$-Wasserstein space over $\Omega$. A (generalized) displacement interpolant of $\mu$ and $\nu$ w.r.t. $\theta$ is given by $\gamma_t = ((1-t)T^{\theta\to\mu} + tT^{\theta\to\nu})_\#\theta$, for $t \in [0,1]$.

It can be shown that for a compact and convex $\Omega \subset \mathbb{R}^d$, the space of probability measures $\mathcal{P}(\Omega)$ is geodesically convex w.r.t. the notion of (generalized) displacement interpolation in Definition 6.

Lemma 16 (Geodesic convexity of $\mathcal{P}(\Omega)$). Let $\Omega \subseteq \mathbb{R}^d$ be a compact, convex set. Then, the $L^2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$ is geodesically convex w.r.t. the notion of interpolation as in Definition 6.

Now, we introduce the following standard definition of the (generalized) geodesic convexity of functionals on the $L^2$-Wasserstein space $(\mathcal{P}(\Omega), W_2)$:

Definition 7 (Generalized geodesic convexity). Let $\Omega \subseteq \mathbb{R}^d$ be a compact and convex set, let $\mu, \nu \in \mathcal{P}(\Omega)$, and let $\theta \in \mathcal{P}(\Omega)$ be an atomless probability measure, for which there exist optimal transport maps $T^{\theta\to\mu}: \Omega\to\Omega$ and $T^{\theta\to\nu}: \Omega\to\Omega$ from $\theta$ to $\mu$ and from $\theta$ to $\nu$, respectively, in the $L^2$-Wasserstein space over $\Omega$. A functional $F: \mathcal{P}(\Omega)\to\mathbb{R}$ is (generalized) geodesically convex (resp. (generalized) strictly geodesically convex) if the following holds for every $t \in [0,1]$:
$$F\big(((1-t)T^{\theta\to\mu} + tT^{\theta\to\nu})_\#\theta\big) \le (1-t)F(\mu) + tF(\nu)$$
(resp. the previous inequality holds strictly).
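As a quick check of Definition 7 (our example, following [31]), the potential energy $F(\mu) = \int_\Omega V\, d\mu$ with $V$ convex is generalized geodesically convex: using the change of variables $T^{\theta\to\mu}_\#\theta = \mu$ and the convexity of $V$,

```latex
\begin{align*}
F(\gamma_t)
 &= \int_\Omega V\big((1-t)\,T^{\theta\to\mu}(x) + t\,T^{\theta\to\nu}(x)\big)\, d\theta(x) \\
 &\le (1-t)\int_\Omega V\big(T^{\theta\to\mu}(x)\big)\, d\theta(x)
    + t \int_\Omega V\big(T^{\theta\to\nu}(x)\big)\, d\theta(x) \\
 &= (1-t)\int_\Omega V\, d\mu + t\int_\Omega V\, d\nu
  = (1-t)\,F(\mu) + t\,F(\nu).
\end{align*}
```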
Lemma 17 (First-order convexity condition). Let $\Omega \subseteq \mathbb{R}^d$ be compact and convex, and let $\mu, \nu, \theta \in \mathcal{P}(\Omega)$ be atomless probability measures. Let $F: \mathcal{P}(\Omega)\to\mathbb{R}$ be a Fréchet differentiable and (generalized) geodesically convex functional (in the sense of Definition 7). Then, we have:
$$F(\nu) \ge F(\mu) + \int_\Omega \langle\xi_\mu, T^{\theta\to\nu} - T^{\theta\to\mu}\rangle\, d\theta, \tag{24}$$
where $\xi_\mu$ is the Fréchet derivative of $F$ at $\mu$, and $T^{\theta\to\mu}: \Omega\to\Omega$ and $T^{\theta\to\nu}: \Omega\to\Omega$ are optimal transport maps from $\theta$ to $\mu$ and from $\theta$ to $\nu$, respectively.

We now define below the notion of strong geodesic convexity of Fréchet-differentiable functionals on $(\mathcal{P}(\Omega), W_2)$:

Definition 8 (Strong geodesic convexity of a functional on $(\mathcal{P}(\Omega), W_2)$). Let $\Omega \subseteq \mathbb{R}^d$ be compact and convex, and let $\mu, \nu, \theta \in \mathcal{P}(\Omega)$ be atomless probability measures. Let $F: \mathcal{P}(\Omega)\to\mathbb{R}$ be a Fréchet-differentiable functional, and let $\xi_\mu$ and $\xi_\nu$ be the Fréchet derivatives of $F$ evaluated at the measures $\mu$ and $\nu$, respectively. Then, $F$ is strongly (geodesically) convex if there exists an $m > 0$ such that:
$$\int_\Omega \langle\xi_\nu - \xi_\mu, T^{\theta\to\nu} - T^{\theta\to\mu}\rangle\, d\theta \ge m\, W_2^2(\mu, \nu), \tag{25}$$
where $T^{\theta\to\mu}: \Omega\to\Omega$ and $T^{\theta\to\nu}: \Omega\to\Omega$ are optimal transport maps from $\theta$ to $\mu$ and from $\theta$ to $\nu$, respectively.

APPENDIX B
AGGREGATE OBJECTIVE FUNCTIONS
Proposition 3 (Strict geodesic convexity of $C_f(\cdot, \mu^\star)$). Fix an absolutely continuous $\mu^\star \in \mathcal{P}(\Omega)$ as the reference measure, let $f$ be non-decreasing and strictly convex, and let $\mu_1, \mu_2 \in \mathcal{P}(\Omega)$. Let $T^{\mu^\star\to\mu_1}$ and $T^{\mu^\star\to\mu_2}$ be optimal transport maps from $\mu^\star$ to $\mu_1$ and from $\mu^\star$ to $\mu_2$, respectively, corresponding to the optimal transport cost $C_f$, and let $T_t = (1-t)T^{\mu^\star\to\mu_1} + tT^{\mu^\star\to\mu_2}$ for $t \in (0,1)$. For $\mu_t = T_{t\#}\mu^\star$, we have:
$$C_f(\mu_t, \mu^\star) < (1-t)\,C_f(\mu_1, \mu^\star) + t\,C_f(\mu_2, \mu^\star).$$
Proof.
We have:
$$C_f(\mu_t, \mu^\star) \le \int_\Omega f(|T_t(x) - x|)\, d\mu^\star(x) = \int_\Omega f\big(|(1-t)T^{\mu^\star\to\mu_1}(x) + tT^{\mu^\star\to\mu_2}(x) - x|\big)\, d\mu^\star(x)$$
$$= \int_\Omega f\big(|(1-t)[T^{\mu^\star\to\mu_1}(x) - x] + t[T^{\mu^\star\to\mu_2}(x) - x]|\big)\, d\mu^\star(x) \le \int_\Omega f\big((1-t)|T^{\mu^\star\to\mu_1}(x) - x| + t|T^{\mu^\star\to\mu_2}(x) - x|\big)\, d\mu^\star(x),$$
where the last inequality is a consequence of the triangle inequality and the fact that $f$ is non-decreasing. Further, since $f$ is strictly convex on $\Omega$, we have:
$$C_f(\mu_t, \mu^\star) < \int_\Omega \big[(1-t)f(|T^{\mu^\star\to\mu_1}(x) - x|) + tf(|T^{\mu^\star\to\mu_2}(x) - x|)\big]\, d\mu^\star(x)$$
$$= (1-t)\int_\Omega f(|T^{\mu^\star\to\mu_1}(x) - x|)\, d\mu^\star(x) + t\int_\Omega f(|T^{\mu^\star\to\mu_2}(x) - x|)\, d\mu^\star(x) = (1-t)\,C_f(\mu_1, \mu^\star) + t\,C_f(\mu_2, \mu^\star). \qquad\blacksquare$$
We now establish the following result:
Proposition 4 ($l$-smoothness of $C_f(\cdot, \mu^\star)$). Let the Fréchet derivative of the functional $F(\mu) = C_f(\mu, \mu^\star)$ at $\mu \in \mathcal{P}_r(\Omega)$ be denoted by $\xi_\mu$. The functional $F(\mu) = C_f(\mu, \mu^\star)$ satisfies:
$$\Big| \int_\Omega \langle\xi_{\mu_1} - \xi_{\mu_2}, T^{\mu_2\to\mu_1} - \mathrm{id}\rangle\, d\mu_2 \Big| \le l \int_\Omega |T^{\mu_2\to\mu_1} - \mathrm{id}|^2\, d\mu_2,$$
where $T^{\mu_2\to\mu_1}$ is the optimal transport map from $\mu_2$ to $\mu_1$ w.r.t. $C_f$.

Proof. Let $\phi_\mu = \frac{\delta C_f(\mu, \mu^\star)}{\delta\mu}$ be the Kantorovich potential for the optimal transport from $\mu$ to $\mu^\star$. We now have the following relation [42]:
$$T^{\mu\to\mu^\star} = \mathrm{id} - (\nabla h)^{-1}(\nabla\phi_\mu),$$
where the function $h: \mathbb{R}^d\to\mathbb{R}$ is such that $h(v) = f(|v|)$. It follows from the $l$-smoothness of $f$ that the function $h$ is also $l$-smooth. From the above and the $l$-smoothness of $h$, we get:
$$\Big| \int_\Omega \langle\xi_{\mu_1} - \xi_{\mu_2}, T^{\mu_2\to\mu_1} - \mathrm{id}\rangle\, d\mu_2 \Big| = \Big| \int_\Omega \langle \nabla h(\mathrm{id} - T^{\mu_1\to\mu^\star}) - \nabla h(\mathrm{id} - T^{\mu_2\to\mu^\star}), T^{\mu_2\to\mu_1} - \mathrm{id}\rangle\, d\mu_2 \Big|$$
$$\le \int_\Omega \big|\langle \nabla h(\mathrm{id} - T^{\mu_1\to\mu^\star}) - \nabla h(\mathrm{id} - T^{\mu_2\to\mu^\star}), T^{\mu_2\to\mu_1} - \mathrm{id}\rangle\big|\, d\mu_2 \le l \int_\Omega |T^{\mu_2\to\mu_1} - \mathrm{id}|^2\, d\mu_2. \qquad\blacksquare$$