Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm
Tianyi Lin⋄  Nhat Ho⋆  Xi Chen‡  Marco Cuturi⊳,⊲  Michael I. Jordan⋄,†

⋄ Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
† Department of Statistics, University of California, Berkeley
⋆ Department of Statistics and Data Sciences, University of Texas, Austin
‡ Stern School of Business, New York University
⊳ CREST - ENSAE   ⊲ Google Brain

December 22, 2020
Abstract
We study the fixed-support Wasserstein barycenter problem (FS-WBP), which consists in computing the Wasserstein barycenter of m discrete probability measures supported on a finite metric space of size n. We first show that the constraint matrix arising from the standard linear programming (LP) representation of the FS-WBP is not totally unimodular when m ≥ 3 and n ≥ 3. This result resolves an open question pertaining to the relationship between the FS-WBP and the minimum-cost flow (MCF) problem, since it proves that the FS-WBP in the standard LP form is not an MCF problem when m ≥ 3 and n ≥ 3. We then propose a fast deterministic variant of the celebrated iterative Bregman projection (IBP) algorithm, named FastIBP, with a complexity bound of Õ(mn^{7/3} ε^{−4/3}), where ε ∈ (0, 1) is the desired tolerance. This complexity bound is better than the best known complexity bound of Õ(mn² ε^{−2}) for the IBP algorithm in terms of ε, and better than the bound of Õ(mn^{5/2} ε^{−1}) from the accelerated alternating minimization algorithm or the accelerated primal-dual adaptive gradient algorithm in terms of n. Finally, we conduct extensive experiments with both synthetic data and real images and demonstrate the favorable performance of the FastIBP algorithm in practice.
Over the past decade, the Wasserstein barycenter problem (WBP) [Agueh and Carlier, 2011] has served as a foundation for theoretical analysis in a wide range of fields, from economics [Carlier and Ekeland, 2010, Chiappori et al., 2010] and physics [Buttazzo et al., 2012, Cotar et al., 2013, Trouvé and Younes, 2005] to statistics [Munch et al., 2015, Ho et al., 2017, Srivastava et al., 2018], image and shape analysis [Rabin et al., 2011, Bonneel et al., 2015, 2016] and machine learning [Cuturi and Doucet, 2014]. The WBP is related to the optimal transport (OT) problem, in that both are based on the Wasserstein distance, but the WBP is significantly harder: it requires minimizing a sum of Wasserstein distances and typically involves m > 2 measures; see Villani [2008] for a comprehensive treatment of OT theory and Peyré and Cuturi [2019] for an introduction to OT applications and algorithms.

An ongoing focus of work on both the WBP and the OT problem is the design of fast algorithms for computing the relevant distances and optima, and the delineation of lower bounds that capture the computational hardness of these problems [Peyré and Cuturi, 2019]. For the OT problem, Cuturi [2013] introduced the Sinkhorn algorithm, which has triggered significant progress [Cuturi and Peyré, 2016, Genevay et al., 2016, Altschuler et al., 2017, Dvurechensky et al., 2018, Blanchet et al., 2018, Lin et al., 2019b, Lahn et al., 2019, Quanrud, 2019, Jambulapati et al., 2019, Lin et al., 2019c]. Variants of the Sinkhorn and Greenkhorn algorithms [Altschuler et al., 2017, Dvurechensky et al., 2018, Lin et al., 2019b] continue to serve as the baseline approaches in practice. As for theoretical complexity, the best known bound is Õ(n² ε^{−1}) [Blanchet et al., 2018, Quanrud, 2019, Lahn et al., 2019, Jambulapati et al., 2019]. Moreover, Lin et al.
[2019a] provided a complexity bound for the multimarginal OT problem.

There has been significant effort devoted to the development of fast algorithms for the case of m > 2. Anderes et al. [2016] showed that the barycenter of m empirical measures is itself an empirical measure whose support has cardinality at most the total size of the supports of the m measures minus m − 1. When m = 2 and the support is fixed, the computation of the barycenter amounts to solving a network flow problem on a directed graph. Borgwardt and Patterson [2019] proved that finding a barycenter of sparse support is NP-hard even in the simple setting where m = 3. However, their analysis cannot be extended to the fixed-support WBP, where the supports of the constituent m probability measures are prespecified.

Contribution.
In this paper, we revisit the fixed-support Wasserstein barycenter problem (FS-WBP) between m discrete probability measures supported on a prespecified set of n points. Our contributions can be summarized as follows:

1. We prove that the FS-WBP in the standard LP form is not a minimum-cost flow (MCF) problem in general. In particular, we show that the constraint matrix arising from the standard LP representation of the FS-WBP is totally unimodular when m ≥ 3 and n = 2, but not totally unimodular when m ≥ 3 and n ≥ 3. Our results shed light on the necessity of problem reformulation, e.g., entropic regularization [Cuturi and Doucet, 2014, Benamou et al., 2015] and block reduction [Ge et al., 2019].

2. We propose a fast deterministic variant of the iterative Bregman projection (IBP) algorithm, named FastIBP, and provide a theoretical guarantee for the algorithm. Letting ε ∈ (0, 1) denote the target tolerance, the complexity bound of the algorithm is Õ(mn^{7/3} ε^{−4/3}), which improves the complexity bound of Õ(mn² ε^{−2}) of the IBP algorithm [Benamou et al., 2015] in terms of ε and the complexity bound of Õ(mn^{5/2} ε^{−1}) from the accelerated IBP and APDAGD algorithms in terms of n [Kroshnin et al., 2019, Guminov et al., 2019]. We conduct experiments on synthetic and real datasets and demonstrate that the FastIBP algorithm achieves favorable performance in practice.
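To make the baseline concrete, the IBP algorithm of Benamou et al. [2015] alternates Bregman projections onto the two families of marginal constraints of the entropic problem. The snippet below is a minimal NumPy illustration with our own variable names; it is the plain IBP baseline, not the FastIBP variant proposed in this paper:

```python
import numpy as np

def ibp_barycenter(costs, marginals, weights, eta=0.05, n_iter=300):
    """Plain iterative Bregman projection (Benamou et al., 2015) sketch."""
    K = [np.exp(-C / eta) for C in costs]        # Gibbs kernels K_k = exp(-C_k / eta)
    n = costs[0].shape[1]
    v = [np.ones(n) for _ in costs]
    for _ in range(n_iter):
        # project onto the constraints r(X_k) = u_k ...
        u = [a / (Kk @ vk) for a, Kk, vk in zip(marginals, K, v)]
        # ... then onto the shared-column-marginal constraints via a
        # weighted geometric mean, which is the current barycenter estimate
        q = np.exp(sum(w * np.log(Kk.T @ uk)
                       for w, Kk, uk in zip(weights, K, u)))
        v = [q / (Kk.T @ uk) for Kk, uk in zip(K, u)]
    return q / q.sum()

grid = np.linspace(0.0, 1.0, 20)
cost = (grid[:, None] - grid[None, :]) ** 2      # squared distances on a 1-D grid
rng = np.random.default_rng(0)
mus = [rng.dirichlet(np.ones(20)) for _ in range(3)]
bary = ibp_barycenter([cost] * 3, mus, [1 / 3, 1 / 3, 1 / 3])
print(bary.sum())                                # sums to 1: a probability vector
```

Each iteration costs O(mn²) arithmetic operations; the FastIBP variant analyzed below improves the overall dependence on ε.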
Organization.
The remainder of the paper is organized as follows. In Section 2, we provide the basic setup for the entropic-regularized FS-WBP and its dual problem. In Section 3, we present our computational hardness results for the FS-WBP in the standard LP form. In Section 4, we propose and analyze the FastIBP algorithm. We present experimental results on synthetic and real data in Section 5 and conclude in Section 6.
Notation.
We let [n] denote the set {1, 2, . . . , n} and R^n_+ the set of all vectors in R^n with nonnegative components. 1_n and 0_n are the n-vectors of ones and zeros. ∆_n stands for the probability simplex: ∆_n = {u ∈ R^n_+ : 1_n^⊤ u = 1}. For a smooth function f, we denote by ∇f and ∇_λ f the full gradient and the gradient with respect to a variable λ. For x ∈ R^n and 1 ≤ p ≤ ∞, we write ‖x‖_p for its ℓ_p-norm. For X = (X_{ij}) ∈ R^{n×n}, the notations vec(X) ∈ R^{n²} and det(X) stand for the vector representation and the determinant. We write ‖X‖_∞ = max_{1≤i,j≤n} |X_{ij}| and ‖X‖_1 = Σ_{1≤i,j≤n} |X_{ij}|, as well as r(X) = X 1_n and c(X) = X^⊤ 1_n for the row and column marginals. For X, Y ∈ R^{n×n}, the Frobenius inner product and the Kronecker product are denoted by ⟨X, Y⟩ and X ⊗ Y. Given the dimension n and the tolerance ε, the notation a = O(b(n, ε)) stands for the upper bound a ≤ C · b(n, ε), where C > 0 is independent of n and ε, and a = Õ(b(n, ε)) indicates the same inequality where C depends only on logarithmic factors of n and ε.

In this section, we introduce the basic setup of the fixed-support Wasserstein barycenter problem (FS-WBP), starting with the linear programming (LP) representation and the entropic-regularized formulation, and including a specification of an approximate barycenter.
For p ≥ 1, let P_p(Ω) be the set of Borel probability measures on Ω with finite p-th moment. The Wasserstein distance of order p between µ, ν ∈ P_p(Ω) is defined by [Villani, 2008]:

W_p(µ, ν) := inf_{π ∈ Π(µ,ν)} ( ∫_{Ω×Ω} d^p(x, y) π(dx, dy) )^{1/p},   (1)

where d(·, ·) is a metric on Ω and Π(µ, ν) is the set of couplings between µ and ν. Given a weight vector (ω_1, ω_2, . . . , ω_m) ∈ ∆_m for m ≥ 2, the Wasserstein barycenter [Agueh and Carlier, 2011] of m probability measures {µ_k}_{k=1}^m is a solution of the following functional minimization problem:

min_{µ ∈ P_p(Ω)} Σ_{k=1}^m ω_k W_p^p(µ, µ_k).   (2)

Because our goal is to provide computational schemes to approximately solve the WBP, we need a definition of an ε-approximate solution to the WBP.

Definition 2.1. The probability measure µ̂ ∈ P_p(Ω) is called an ε-approximate barycenter if Σ_{k=1}^m ω_k W_p^p(µ̂, µ_k) ≤ Σ_{k=1}^m ω_k W_p^p(µ⋆, µ_k) + ε, where µ⋆ is an optimal solution to problem (2).

There are two main settings: (i) the free-support Wasserstein barycenter, in which we optimize both the weights and the support of the barycenter in Eq. (2); and (ii) the fixed-support Wasserstein barycenter, in which the support of the barycenter is obtained from the supports of the probability measures {µ_k}_{k=1}^m and we optimize only the weights of the barycenter in Eq. (2). The free-support WBP is notoriously difficult to solve.
It can either be solved using a solution to the multimarginal OT (MOT) problem, as described in detail by Agueh and Carlier [2011], or approximated using alternating optimization techniques. Assuming that each measure is supported on n distinct points, the WBP can be solved exactly by first solving an MOT and then computing nm − m + 1 barycenters of m points in Ω (these barycenters are exactly the support of the barycentric measure). Solving an MOT is, however, equivalent to solving an LP with n^m variables and nm − m + 1 constraints. The other route, alternating optimization, requires specifying an initial guess for the barycenter: a discrete measure supported on k weighted points (where k is predefined). One can then proceed by updating the locations of the support points of µ (or even adding new ones) to decrease the objective function in Eq. (2), before changing their weights. In the Euclidean setting with p = 2, the free-support WBP is closely related to the clustering problem, and is equivalent to k-means when m = 1 [Cuturi and Doucet, 2014]. Whereas solving the free-support WBP using MOT results in a convex (yet intractable) problem, the alternating minimization approach is not convex, in very much the same way that the k-means problem is not, and results in the minimization of a piecewise quadratic function. On the other hand, the fixed-support WBP is comparatively easier to solve, and as such has played a role in real-world applications.
For instance, in imaging sciences, pixels and voxels are supported on a predefined, finite grid. In these applications, the barycenter and the measures µ_k share the same support. In view of this, throughout the remainder of the paper, we let (µ_k)_{k=1}^m be discrete probability measures and take the support points {x_i^k}_{i∈[n]} to be fixed. Since {µ_k}_{k=1}^m have fixed support, they are fully characterized by the weight vectors {u_k}_{k=1}^m. Accordingly, the support of the barycenter {x̂_i}_{i∈[n]} is also fixed and can be prespecified from {x_i^k}_{i∈[n]}. Given this setup, the FS-WBP between {µ_k}_{k=1}^m has the following standard LP representation [Cuturi and Doucet, 2014, Benamou et al., 2015, Peyré and Cuturi, 2019]:

min_{{X_k}_{k=1}^m ⊆ R^{n×n}_+} Σ_{k=1}^m ω_k ⟨C_k, X_k⟩,  s.t. r(X_k) = u_k for all k ∈ [m],  c(X_{k+1}) = c(X_k) for all k ∈ [m − 1],   (3)

where {X_k}_{k=1}^m and {C_k}_{k=1}^m ⊆ R^{n×n}_+ denote the transportation plans and the nonnegative cost matrices, with (C_k)_{ij} = d^p(x_i^k, x̂_j) for all k ∈ [m]. The fixed-support Wasserstein barycenter u ∈ ∆_n is determined by the weight vector Σ_{k=1}^m ω_k c(X_k) and the support (x̂_1, x̂_2, . . . , x̂_n). From Eq. (3), the FS-WBP is an LP with 2mn − n equality constraints and mn² variables. This has inspired work on solving the FS-WBP using classical optimization algorithms [Ge et al., 2019, Yang et al., 2018]. Although progress has been made, the understanding of the structure of the FS-WBP via this approach has remained limited. In particular, while the OT problem [Villani, 2008] is equivalent to a minimum-cost flow (MCF) problem, it has remained unknown whether the FS-WBP is an MCF problem even in the simplest setting where m = 2.

2.2 Entropic regularized FS-WBP

Using Cuturi's entropic approach to the OT problem [Cuturi, 2013], we define a regularized version of the FS-WBP in Eq. (3), where an entropic regularization term is added to the Wasserstein barycenter objective. The resulting formulation is as follows:

min_{{X_k}_{k=1}^m ⊆ R^{n×n}_+} Σ_{k=1}^m ω_k (⟨C_k, X_k⟩ − ηH(X_k)),  s.t. r(X_k) = u_k for all k ∈ [m],  c(X_{k+1}) = c(X_k) for all k ∈ [m − 1],   (4)

where η > 0 is the regularization parameter and H(X) := −⟨X, log(X) − 1_n 1_n^⊤⟩ denotes the entropic regularization term. We refer to Eq. (4) as the entropic regularized FS-WBP. When η is large, the optimal value of the entropic regularized FS-WBP may yield a poor approximation of the cost of the FS-WBP. To guarantee a good approximation, we scale the parameter η as a function of the desired accuracy.

Definition 2.2.
The probability vector û ∈ ∆_n is called an ε-approximate barycenter if there exists a feasible solution (X̂_1, X̂_2, . . . , X̂_m) ∈ R^{n×n}_+ × · · · × R^{n×n}_+ for the FS-WBP in Eq. (3) such that û = Σ_{k=1}^m ω_k c(X̂_k) and Σ_{k=1}^m ω_k ⟨C_k, X̂_k⟩ ≤ Σ_{k=1}^m ω_k ⟨C_k, X⋆_k⟩ + ε, where (X⋆_1, X⋆_2, . . . , X⋆_m) is an optimal solution of the FS-WBP in Eq. (3).

With these definitions in mind, we develop efficient algorithms for approximately solving the FS-WBP whose dependence on m, n and ε is competitive with state-of-the-art algorithms [Kroshnin et al., 2019, Guminov et al., 2019].

Using the duality theory of convex optimization [Rockafellar, 1970], one dual form of the entropic regularized FS-WBP has been derived before [Cuturi and Doucet, 2014, Kroshnin et al., 2019]. Differing from the usual 2-marginal or multimarginal OT [Cuturi and Peyré, 2018, Lin et al., 2019a], the dual entropic regularized FS-WBP is a convex optimization problem with an affine constraint set. Formally, we have

min_{λ,τ ∈ R^{mn}} ϕ_old(λ, τ) := Σ_{k=1}^m ω_k ( Σ_{1≤i,j≤n} e^{λ_{ki} + τ_{kj} − η^{−1}(C_k)_{ij}} − λ_k^⊤ u_k ),  s.t. Σ_{k=1}^m ω_k τ_k = 0_n.   (5)

However, the objective function in Eq. (5) is not sufficiently smooth because of the sum of exponents, which makes acceleration very challenging. To alleviate this issue, we derive another, smooth dual form of the entropic-regularized FS-WBP. Introducing the dual variables {α_1, . . . , α_m, β_1, . . . , β_{m−1}} ⊆ R^n, we define the Lagrangian function of the entropic regularized FS-WBP in Eq. (4) as follows:

L(X_1, . . . , X_m, α_1, . . . , α_m, β_1, . . . , β_{m−1})   (6)
 = Σ_{k=1}^m ω_k (⟨C_k, X_k⟩ − ηH(X_k)) − Σ_{k=1}^m α_k^⊤ (r(X_k) − u_k) − Σ_{k=1}^{m−1} β_k^⊤ (c(X_{k+1}) − c(X_k)).

By the definition of H(X), the nonnegativity constraint X_k ≥ 0 is redundant, and the dual function is obtained by minimizing the Lagrangian over the set {(X_1, . . . , X_m) : ‖X_k‖_1 = 1, ∀k ∈ [m]}:

min_{{(X_1,...,X_m): ‖X_k‖_1 = 1, ∀k ∈ [m]}} L(X_1, . . . , X_m, α_1, . . . , α_m, β_1, . . . , β_{m−1}).
In the above problem, the objective function is strongly convex, so the optimal solution is unique. For simplicity, we let α = (α_1, α_2, . . . , α_m) ∈ R^{mn} and β = (β_1, β_2, . . . , β_{m−1}) ∈ R^{(m−1)n}, and adopt the convention β_0 = β_m = 0_n. After simple calculations, the optimal solution X̄(α, β) = (X̄_1(α, β), . . . , X̄_m(α, β)) has the following form:

(X̄_k(α, β))_{ij} = e^{η^{−1}(ω_k^{−1}(α_{ki} + β_{k−1,j} − β_{kj}) − (C_k)_{ij})} / Σ_{1≤i,j≤n} e^{η^{−1}(ω_k^{−1}(α_{ki} + β_{k−1,j} − β_{kj}) − (C_k)_{ij})}  for all k ∈ [m].   (7)

Plugging Eq. (7) into Eq. (6) yields that the dual form is:

max_{α_1,...,α_m, β_1,...,β_{m−1}}  −η Σ_{k=1}^m ω_k log( Σ_{1≤i,j≤n} e^{η^{−1}(ω_k^{−1}(α_{ki} + β_{k−1,j} − β_{kj}) − (C_k)_{ij})} ) + Σ_{k=1}^m α_k^⊤ u_k.

To streamline the subsequent presentation, we perform the change of variables λ_k = (ηω_k)^{−1} α_k and τ_k = (ηω_k)^{−1}(β_{k−1} − β_k) for all k ∈ [m]. Recalling that β_0 = β_m = 0_n, we have Σ_{k=1}^m ω_k τ_k = 0_n. Putting these pieces together, we reformulate the problem as

min_{λ,τ ∈ R^{mn}} ϕ(λ, τ) := Σ_{k=1}^m ω_k log( Σ_{1≤i,j≤n} e^{λ_{ki} + τ_{kj} − η^{−1}(C_k)_{ij}} ) − Σ_{k=1}^m ω_k λ_k^⊤ u_k,  s.t. Σ_{k=1}^m ω_k τ_k = 0_n.

To further simplify this formulation, we define B_k(λ_k, τ_k) ∈ R^{n×n} by (B_k(λ_k, τ_k))_{ij} = e^{λ_{ki} + τ_{kj} − η^{−1}(C_k)_{ij}} for all (i, j) ∈ [n] × [n]. We thus obtain the dual entropic-regularized FS-WBP problem:

min_{λ,τ ∈ R^{mn}} ϕ(λ, τ) := Σ_{k=1}^m ω_k ( log(‖B_k(λ_k, τ_k)‖_1) − λ_k^⊤ u_k ),  s.t. Σ_{k=1}^m ω_k τ_k = 0_n.   (8)

Remark 2.1.
The first part of the objective function ϕ is in the form of the logarithm of a sum of exponents, while the second part is a linear function. This differs from the objective function of the dual entropic regularized OT problem in Eq. (5). We also note that Eq. (8) is a special instance of a softmax minimization problem, and the objective function ϕ is known to be smooth [Nesterov, 2005]. Finally, we point out that the same problem was derived in the concurrent work of Guminov et al. [2019] and used for analyzing the accelerated alternating minimization algorithm.

In the remainder of the paper, we denote by (λ⋆, τ⋆) ∈ R^{mn} × R^{mn} an optimal solution of the dual entropic regularized FS-WBP problem in Eq. (8). We next present several useful properties of this dual problem. In particular, we show that there exists an optimal solution (λ⋆, τ⋆) with an upper bound in terms of the ℓ∞-norm.

Lemma 2.2.
For the dual entropic regularized FS-WBP, let C̄ = max_{1≤k≤m} ‖C_k‖_∞ and ū = min_{1≤k≤m, 1≤j≤n} u_{kj}. Then there exists an optimal solution (λ⋆, τ⋆) such that the following ℓ∞-norm bound holds true:

‖λ⋆_k‖_∞ ≤ R_λ,  ‖τ⋆_k‖_∞ ≤ R_τ,  for all k ∈ [m],

where R_λ = 5η^{−1}C̄ + log(n) − log(ū) and R_τ = 4η^{−1}C̄.

Proof. First, we show that there exist m pairs of optimal solutions {(λ^j, τ^j)}_{j∈[m]} such that each (λ^j, τ^j) satisfies

max_{1≤i≤n} (τ^j_k)_i ≥ 0 ≥ min_{1≤i≤n} (τ^j_k)_i,  for all k ≠ j.   (9)

Fixing j ∈ [m], we let (λ̂, τ̂) be an optimal solution of the dual entropic regularized FS-WBP in Eq. (8). If τ̂ satisfies Eq. (9), the claim holds true for the fixed j ∈ [m]. Otherwise, we define the m − 1 scalars

∆τ̂_k = (max_{1≤i≤n} (τ̂_k)_i + min_{1≤i≤n} (τ̂_k)_i)/2 ∈ R  for all k ≠ j,

and let (λ^j, τ^j) be given by

τ^j_k = τ̂_k − ∆τ̂_k 1_n,  λ^j_k = λ̂_k + ∆τ̂_k 1_n,  for all k ≠ j,
τ^j_j = τ̂_j + ( Σ_{k≠j} (ω_k/ω_j) ∆τ̂_k ) 1_n,  λ^j_j = λ̂_j − ( Σ_{k≠j} (ω_k/ω_j) ∆τ̂_k ) 1_n.

Using this construction, we have (λ^j_k)_i + (τ^j_k)_{i′} = (λ̂_k)_i + (τ̂_k)_{i′} for all i, i′ ∈ [n] and all k ∈ [m]. This implies that B_k(λ̂_k, τ̂_k) = B_k(λ^j_k, τ^j_k) for all k ∈ [m]. Furthermore, we have

Σ_{k=1}^m ω_k τ^j_k = Σ_{k=1}^m ω_k τ̂_k,  Σ_{k=1}^m ω_k (λ^j_k)^⊤ u_k = Σ_{k=1}^m ω_k λ̂_k^⊤ u_k.

Putting these pieces together yields ϕ(λ^j, τ^j) = ϕ(λ̂, τ̂). Moreover, by the definition of (λ^j, τ^j) and the m − 1 scalars ∆τ̂_k, τ^j satisfies Eq. (9). Therefore, we conclude that (λ^j, τ^j) is an optimal solution that satisfies Eq. (9) for the fixed j ∈ [m]. Since j ∈ [m] is chosen arbitrarily, we can find the desired pairs of optimal solutions {(λ^j, τ^j)}_{j∈[m]} satisfying Eq. (9) by repeating the above argument m times.

Furthermore, each (λ^j, τ^j) must satisfy the optimality condition for Eq. (8) for all k ∈ [m]. Fixing j ∈ [m], there exists z ∈ R^n such that

Σ_{k=1}^m ω_k τ^j_k = 0_n  and  B_k(λ^j_k, τ^j_k)^⊤ 1_n / ‖B_k(λ^j_k, τ^j_k)‖_1 − z = 0_n  for all k ∈ [m].   (10)

By the definition of B_k(·, ·), we have

τ^j_k = log(z) + log(‖B_k(λ^j_k, τ^j_k)‖_1) 1_n − log((e^{−η^{−1}C_k})^⊤ e^{λ^j_k})  for all k ∈ [m].

This, together with the first equality in Eq. (10), yields

τ^j_k = Σ_{l=1}^m ω_l log((e^{−η^{−1}C_l})^⊤ e^{λ^j_l}) − log((e^{−η^{−1}C_k})^⊤ e^{λ^j_k})  for all k ∈ [m].

For each i ∈ [n] and l ∈ [m], by the nonnegativity of each entry of C_l, we have

−η^{−1}‖C_l‖_∞ + log(1_n^⊤ e^{λ^j_l}) ≤ [log((e^{−η^{−1}C_l})^⊤ e^{λ^j_l})]_i ≤ log(1_n^⊤ e^{λ^j_l}).

Putting these pieces together yields

max_{1≤i≤n} (τ^j_k)_i − min_{1≤i≤n} (τ^j_k)_i ≤ η^{−1}‖C_k‖_∞ + Σ_{l=1}^m ω_l η^{−1}‖C_l‖_∞  for all k ∈ [m].   (11)

Combining Eq. (9) and Eq. (11) yields that

‖τ^j_k‖_∞ ≤ η^{−1}‖C_k‖_∞ + Σ_{l=1}^m ω_l η^{−1}‖C_l‖_∞  for all k ≠ j.   (12)

Since Σ_{k=1}^m ω_k τ^j_k = 0_n, we have

‖τ^j_j‖_∞ ≤ ω_j^{−1} Σ_{k≠j} ω_k ‖τ^j_k‖_∞ ≤ (ηω_j)^{−1} Σ_{k≠j} ω_k ‖C_k‖_∞ + (ηω_j)^{−1}(1 − ω_j) Σ_{k=1}^m ω_k ‖C_k‖_∞.

Finally, we proceed to the key part and define the averaged iterate

λ⋆ = Σ_{j=1}^m ω_j λ^j,  τ⋆ = Σ_{j=1}^m ω_j τ^j.

Since ϕ is convex and (ω_1, ω_2, . . . , ω_m) ∈ ∆_m, we have ϕ(λ⋆, τ⋆) ≤ Σ_{j=1}^m ω_j ϕ(λ^j, τ^j) and Σ_{k=1}^m ω_k τ⋆_k = 0_n. Since the (λ^j, τ^j) are optimal solutions for all j ∈ [m], we conclude that (λ⋆, τ⋆) is an optimal solution.

The remaining step is to show that ‖λ⋆_k‖_∞ ≤ R_λ and ‖τ⋆_k‖_∞ ≤ R_τ for all k ∈ [m]. More specifically, we have

‖τ⋆_k‖_∞ ≤ Σ_{j=1}^m ω_j ‖τ^j_k‖_∞ = ω_k ‖τ^k_k‖_∞ + Σ_{j≠k} ω_j ‖τ^j_k‖_∞
 ≤ η^{−1} Σ_{l≠k} ω_l ‖C_l‖_∞ + η^{−1}(1 − ω_k) Σ_{l=1}^m ω_l ‖C_l‖_∞ + η^{−1}(1 − ω_k)( ‖C_k‖_∞ + Σ_{l=1}^m ω_l ‖C_l‖_∞ )
 ≤ η^{−1}‖C_k‖_∞ + 3η^{−1} Σ_{l=1}^m ω_l ‖C_l‖_∞ ≤ 4η^{−1}C̄ = R_τ.

Since (λ⋆, τ⋆) is an optimal solution, it satisfies the optimality condition for Eq. (8).
Formally, we have

B_k(λ⋆_k, τ⋆_k) 1_n / ‖B_k(λ⋆_k, τ⋆_k)‖_1 − u_k = 0_n  for all k ∈ [m].   (13)

By the definition of B_k(·, ·), we have

λ⋆_k = log(u_k) + log(‖B_k(λ⋆_k, τ⋆_k)‖_1) 1_n − log(e^{−η^{−1}C_k} e^{τ⋆_k})  for all k ∈ [m].

This implies that

max_{1≤i≤n} (λ⋆_k)_i ≤ η^{−1}‖C_k‖_∞ + log(n) + ‖τ⋆_k‖_∞ + log(‖B_k(λ⋆_k, τ⋆_k)‖_1),
min_{1≤i≤n} (λ⋆_k)_i ≥ log(ū) − log(n) − ‖τ⋆_k‖_∞ + log(‖B_k(λ⋆_k, τ⋆_k)‖_1).

Therefore, we conclude that

‖λ⋆_k‖_∞ ≤ η^{−1}‖C_k‖_∞ + log(n) − log(ū) + ‖τ⋆_k‖_∞ ≤ R_λ.

This completes the proof. □

Remark 2.3. Lemma 2.2 is analogous to [Lin et al., 2019b, Lemma 3.2] for the OT problem. However, the dual entropic-regularized FS-WBP is more complex and requires a novel constructive iterate, (λ⋆, τ⋆) = Σ_{j=1}^m ω_j (λ^j, τ^j). Moreover, the techniques of Kroshnin et al. [2019] are not applicable to the analysis of the FastIBP algorithm, and, accordingly, Lemma 2.2 is crucial for the analysis.
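Before moving on, the optimality condition in Eq. (13) is easy to verify numerically on a small instance. For m = 2 with equal weights, the constraint ω_1 τ_1 + ω_2 τ_2 = 0_n allows the parametrization τ_2 = −τ_1, after which ϕ can be minimized without constraints; at the minimizer, the normalized row marginals of each B_k must match u_k. A minimal sketch (solver choice and names are ours):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n, eta = 4, 1.0
C = [rng.random((n, n)) for _ in range(2)]          # cost matrices C_1, C_2
u = [rng.dirichlet(np.ones(n)) for _ in range(2)]   # marginals u_1, u_2

def unpack(z):
    return z[:n], z[n:2 * n], z[2 * n:]             # lambda_1, lambda_2, tau (tau_2 = -tau)

def phi(z):
    lam1, lam2, tau = unpack(z)
    val = 0.0
    for lam, t, Ck, uk in ((lam1, tau, C[0], u[0]), (lam2, -tau, C[1], u[1])):
        val += 0.5 * (logsumexp(lam[:, None] + t[None, :] - Ck / eta) - lam @ uk)
    return val

def grad(z):
    lam1, lam2, tau = unpack(z)
    g = np.zeros(3 * n)
    for i, (lam, s, Ck, uk) in enumerate(((lam1, 1.0, C[0], u[0]),
                                          (lam2, -1.0, C[1], u[1]))):
        B = np.exp(lam[:, None] + s * tau[None, :] - Ck / eta)
        B /= B.sum()                                 # B_k / ||B_k||_1
        g[i * n:(i + 1) * n] = 0.5 * (B.sum(axis=1) - uk)
        g[2 * n:] += 0.5 * s * B.sum(axis=0)
    return g

res = minimize(phi, np.zeros(3 * n), jac=grad, method="BFGS",
               options={"gtol": 1e-10})
lam1, lam2, tau = unpack(res.x)
for lam, t, Ck, uk in ((lam1, tau, C[0], u[0]), (lam2, -tau, C[1], u[1])):
    B = np.exp(lam[:, None] + t[None, :] - Ck / eta)
    print(np.max(np.abs(B.sum(axis=1) / B.sum() - uk)))  # tiny: Eq. (13) holds
```

The residuals shrink with the gradient norm, illustrating that the row-marginal condition characterizes optimality in λ.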
The upper bound on the ℓ∞-norm of an optimal solution of the dual entropic-regularized FS-WBP in Lemma 2.2 leads to the following straightforward consequence.

Corollary 2.4. For the dual entropic regularized FS-WBP, there exists an optimal solution (λ⋆, τ⋆) such that

‖λ⋆_k‖ ≤ √n R_λ,  ‖τ⋆_k‖ ≤ √n R_τ,  for all k ∈ [m],

where R_λ, R_τ > 0 are defined in Lemma 2.2.

Finally, we observe that ϕ can be decomposed into a weighted sum of m functions, and we prove in the following lemma that each component function ϕ_k has a Lipschitz continuous gradient with constant 4.

Lemma 2.5.
The following statements hold true: ϕ(λ, τ) = Σ_{k=1}^m ω_k ϕ_k(λ_k, τ_k), where ϕ_k(λ_k, τ_k) = log(1_n^⊤ B_k(λ_k, τ_k) 1_n) − λ_k^⊤ u_k for all k ∈ [m]. Moreover, each ϕ_k has a Lipschitz continuous gradient in the ℓ_2-norm, with Lipschitz constant upper bounded by 4. Formally,

‖∇ϕ_k(λ, τ) − ∇ϕ_k(λ′, τ′)‖ ≤ 4 ‖(λ, τ) − (λ′, τ′)‖  for all k ∈ [m],

which is equivalent to

ϕ(λ′, τ′) − ϕ(λ, τ) ≤ (λ′ − λ, τ′ − τ)^⊤ ∇ϕ(λ, τ) + 2 Σ_{k=1}^m ω_k ‖(λ′_k − λ_k, τ′_k − τ_k)‖².

Proof.
The first statement follows directly from the definition of ϕ in Eq. (8). For the second statement, we first provide the explicit form of the gradient of ϕ_k:

∇ϕ_k(λ, τ) = ( B_k(λ, τ) 1_n / (1_n^⊤ B_k(λ, τ) 1_n) − u_k ,  B_k(λ, τ)^⊤ 1_n / (1_n^⊤ B_k(λ, τ) 1_n) ).

Now we construct the following entropic regularized OT problem:

min_{X: ‖X‖_1 = 1} ⟨C_k, X⟩ − ηH(X),  s.t. X 1_n = (1/n) 1_n,  X^⊤ 1_n = (1/n) 1_n.

Since the function −H(X) is strongly convex with respect to the ℓ_1-norm on the probability simplex Q, the above entropic regularized OT problem is a special case of the following linearly constrained convex optimization problem:

min_{x ∈ Q} f(x),  s.t. Ax = b,

where f is strongly convex with respect to the ℓ_1-norm on the set Q. We use the ℓ_2-norm for the dual space of the Lagrange multipliers. By Nesterov [2005, Theorem 1] and the fact that ‖A‖_{1→2} = 2, the dual objective function ˜ϕ_k satisfies the following inequality:

‖∇˜ϕ_k(α, β) − ∇˜ϕ_k(α′, β′)‖ ≤ (4/η) ‖(α, β) − (α′, β′)‖.

Here ˜ϕ_k is given by

˜ϕ_k(α, β) = η log( Σ_{i,j=1}^n e^{−((C_k)_{ij} − α_i − β_j)/η − 1} ) − ⟨α, u⟩ − ⟨β, v⟩ + η.

This, together with the definition of B_k(·, ·), implies that

∇˜ϕ_k(α, β) = ( B 1_n / (1_n^⊤ B 1_n) − u ,  B^⊤ 1_n / (1_n^⊤ B 1_n) − v ),  where B = B_k(η^{−1}α − (1/2)1_n, η^{−1}β − (1/2)1_n).

Performing the change of variables λ = η^{−1}α − (1/2)1_n and τ = η^{−1}β − (1/2)1_n (resp. λ′ and τ′), we have

‖∇ϕ_k(λ, τ) − ∇ϕ_k(λ′, τ′)‖ = ‖∇˜ϕ_k(η(λ + (1/2)1_n), η(τ + (1/2)1_n)) − ∇˜ϕ_k(η(λ′ + (1/2)1_n), η(τ′ + (1/2)1_n))‖ ≤ 4 ‖(λ′, τ′) − (λ, τ)‖.

This completes the proof. □
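The explicit form of ∇ϕ_k used in the proof above can be sanity-checked against central finite differences (a minimal sketch with our own names):

```python
import numpy as np
from scipy.special import logsumexp

def phi_k(lam, tau, C, u, eta):
    # phi_k(lambda, tau) = log(1^T B_k 1) - lambda^T u_k with
    # (B_k)_{ij} = exp(lambda_i + tau_j - C_{ij} / eta)
    return logsumexp(lam[:, None] + tau[None, :] - C / eta) - lam @ u

def grad_phi_k(lam, tau, C, u, eta):
    M = lam[:, None] + tau[None, :] - C / eta
    B = np.exp(M - logsumexp(M))                     # B_k / (1^T B_k 1)
    return np.concatenate([B.sum(axis=1) - u, B.sum(axis=0)])

rng = np.random.default_rng(1)
n, eta = 5, 0.5
C, u = rng.random((n, n)), rng.dirichlet(np.ones(n))
lam, tau = rng.standard_normal(n), rng.standard_normal(n)

g = grad_phi_k(lam, tau, C, u, eta)
eps, fd = 1e-6, np.zeros(2 * n)
for i in range(2 * n):                               # central finite differences
    d = np.zeros(2 * n); d[i] = eps
    fd[i] = (phi_k(lam + d[:n], tau + d[n:], C, u, eta)
             - phi_k(lam - d[:n], tau - d[n:], C, u, eta)) / (2 * eps)
print(np.max(np.abs(g - fd)))                        # close to zero: formulas agree
```

The agreement up to discretization error confirms that the normalized matrix B_k/(1_n^⊤ B_k 1_n) carries both blocks of the gradient.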
Remark 2.6.
It is worth noting that Lemma 2.5 exploits the decomposable structure of the dual function ϕ and gives a weighted smoothness inequality. This inequality is necessary for deriving a complexity bound that depends linearly on the number of probability measures.

In this section, we analyze the computational hardness of the FS-WBP in Eq. (3). After introducing some characterization theorems from combinatorial optimization, we show that the FS-WBP in Eq. (3) is a minimum-cost flow (MCF) problem when m = 2 and n ≥ 2, but not when m ≥ 3 and n ≥ 3.

We present some classical results from combinatorial optimization and graph theory, including Ghouila-Houri's celebrated characterization theorem [Ghouila-Houri, 1962].
Definition 3.1.
A totally unimodular (TU) matrix is a matrix for which every square submatrix has determinant −1, 0, or 1.

Proposition 3.1 (Ghouila-Houri). A {−1, 0, 1}-valued matrix A ∈ R^{m×n} is TU if and only if for each I ⊆ [m] there is a partition I = I_1 ∪ I_2 such that Σ_{i∈I_1} a_{ij} − Σ_{i∈I_2} a_{ij} ∈ {−1, 0, 1} for all j ∈ [n].

The second result [Berge, 2001, Theorem 1, Chapter 15] shows that the incidence matrices of directed graphs and 2-colorable undirected graphs are TU.

Proposition 3.2. Let A be a {−1, 0, 1}-valued matrix. Then A is TU if each column contains at most two nonzero entries and all rows can be partitioned into two sets I_1 and I_2 such that: if two nonzero entries of a column have the same sign, they are in different sets; if these two entries have different signs, they are in the same set.

Finally, we characterize the constraint matrix arising in an MCF problem.
Definition 3.2.
The MCF problem finds the cheapest possible way of sending a certain amount of flow through a flow network. Formally,

min Σ_{(u,v)∈E} f(u, v) · a(u, v)
s.t. f(u, v) ≥ 0  for all (u, v) ∈ E,
  f(u, v) ≤ c(u, v)  for all (u, v) ∈ E,
  f(u, v) = −f(v, u)  for all (u, v) ∈ E,
  Σ_{(u,w)∈E or (w,u)∈E} f(u, w) = 0  for all u ∈ V \ {s, t},
  Σ_{w∈V} f(s, w) = d  and  Σ_{w∈V} f(w, t) = d.

The flow network is a directed graph G = (V, E) with a source vertex s ∈ V and a sink vertex t ∈ V, where each edge (u, v) ∈ E has capacity c(u, v) > 0, flow f(u, v) ≥ 0 and cost a(u, v), with most MCF algorithms supporting edges with negative costs. The cost of sending flow along an edge (u, v) is f(u, v) · a(u, v). The problem requires an amount of flow d to be sent from the source s to the sink t, and asks to minimize the total cost of the flow over all edges.

Proposition 3.3.
The constraint matrix arising in an MCF problem is TU, and its rows are categorized into a single set in the sense of Proposition 3.2.

Proof. The standard LP representation of the MCF problem is

min_{x ∈ R^{|E|}} c^⊤ x,  s.t. Ax = b,  l ≤ x ≤ u,

where x ∈ R^{|E|} with x_j being the flow through arc j; b ∈ R^{|V|} with b_i being the external supply at node i and 1^⊤ b = 0; c_j is the unit cost of flow through arc j; l_j and u_j are lower and upper bounds on the flow through arc j; and A ∈ R^{|V|×|E|} is the node-arc incidence matrix with entries

A_{ij} = −1 if arc j starts at node i,  A_{ij} = 1 if arc j ends at node i,  A_{ij} = 0 otherwise.

Since each arc has two endpoints, the constraint matrix A is a {−1, 0, 1}-valued matrix in which each column contains two nonzero entries, 1 and −1. Using Proposition 3.2, we obtain that A is TU and that the rows of A are categorized into a single set. □

We now present our main results on the computational hardness of the FS-WBP in Eq. (3). First, the FS-WBP in this LP form is an MCF problem when m = 2 and n ≥
2; see Figure 1 for the graph when (m, n) = (2, n), with n warehouses, n transshipment centers and n retail outlets. Each u_{1i} is the amount of supply provided by the i-th warehouse and each u_{2j} is the amount of demand requested by the j-th retail outlet.

Figure 1: Representation of the FS-WBP in Eq. (3) as an MCF problem when (m, n) = (2, n). (X_1)_{ij} is the flow sent from the i-th warehouse to the j-th transshipment center and (X_2)_{ij} is the flow sent from the i-th transshipment center to the j-th retail outlet; (C_1)_{ij} and (C_2)_{ij} refer to the unit costs of the corresponding flows. See [Anderes et al., 2016, Page 400].

Proceeding to the setting m ≥
3, it is unclear whether the graph representation of the FS-WBP carries over. Instead, we turn to algebraic techniques and provide an explicit form:

min Σ_{k=1}^m ⟨C_k, X_k⟩,  s.t.

  [ −E                                                ]   [ vec(X_1) ]   [ −u_1             ]
  [        E                                          ]   [ vec(X_2) ]   [  u_2             ]
  [              ⋱                                    ]   [    ⋮     ]   [   ⋮              ]
  [                  (−1)^{m−1} E                     ] · [          ] = [ (−1)^{m−1} u_{m−1} ]   (14)
  [                               (−1)^m E            ]   [          ]   [ (−1)^m u_m       ]
  [  G    −G                                          ]   [          ]   [  0_n             ]
  [       −G     G                                    ]   [          ]   [   ⋮              ]
  [              ⋱        ⋱                           ]   [          ]   [                  ]
  [                  (−1)^m G    (−1)^{m+1} G         ]   [ vec(X_m) ]   [  0_n             ]

where E = I_n ⊗ 1_n^⊤ ∈ R^{n×n²}, G = 1_n^⊤ ⊗ I_n ∈ R^{n×n²}, and omitted blocks are zero. Each column of the constraint matrix arising in Eq. (14) has either 2 or 3 nonzero entries in {−1, 0, 1}. In the following theorem, we study the structure of the constraint matrix when m ≥ 3 and n = 2.

Theorem 3.4.
The constraint matrix arising in Eq. (14) is TU when m ≥ 3 and n = 2.

Proof. When n = 2, the constraint matrix A has E = I_2 ⊗ 1_2^⊤ and G = 1_2^⊤ ⊗ I_2. The matrix A ∈ R^{(4m−2)×4m} is a {−1, 0, 1}-valued matrix with several redundant rows, and each column has at most three nonzero entries in {−1, 0, 1}. Now we simplify the matrix A by removing a specific set of redundant rows. In particular, we observe that

Σ_{i ∈ {1, 2, 3, 4, 2m+1, 2m+2}} a_{ij} = 0  for all j ∈ [4m],

so the (2m + 2)-th row is redundant. Similarly, we have

Σ_{i ∈ {3, 4, 5, 6, 2m+3, 2m+4}} a_{ij} = 0  for all j ∈ [4m],

which implies that the (2m + 3)-th row is redundant. Repeating this argument, we remove m − 1 redundant rows, and the resulting matrix Ā ∈ R^{(3m−1)×4m} has a very nice structure in which each column has only two nonzero entries, 1 and −1; see the following matrix when m is odd:

  Ā = [ −E                                                  ]
      [        E                                            ]
      [              ⋱                                      ]
      [                  (−1)^m E                           ]
      [ 1_2^⊤ ⊗ e_1   −1_2^⊤ ⊗ e_1                          ]
      [      −1_2^⊤ ⊗ e_2    1_2^⊤ ⊗ e_2                    ]
      [              ⋱            ⋱                         ]
      [            (−1)^m 1_2^⊤ ⊗ e_2   (−1)^{m+1} 1_2^⊤ ⊗ e_2 ]

where e_1 and e_2 are respectively the first and second standard basis (row) vectors in R^2. Furthermore, the rows of Ā are categorized into a single set so that the criterion in Proposition 3.2 holds true (the division between the E-rows and the remaining rows in the formulation of Ā serves as a partition of this single set into two sets). Using Proposition 3.2, we conclude that Ā is TU. □

To facilitate the reader, we provide an illustrative counterexample showing that the FS-WBP in Eq. (14) is not an MCF problem when m = 3 and n = 3.

Example 3.1.
When m = 3 and n = 3, the constraint matrix is

  A = [ −I_3 ⊗ 1_3^⊤        0                0            ;
         0                  I_3 ⊗ 1_3^⊤     0            ;
         0                  0               −I_3 ⊗ 1_3^⊤ ;
         1_3^⊤ ⊗ I_3       −1_3^⊤ ⊗ I_3     0            ;
         0                 −1_3^⊤ ⊗ I_3     1_3^⊤ ⊗ I_3  ].

Setting the set I = {1, 4, 7, 10, 11, 14, 15} and letting e_1, e_2 and e_3 be the first, second and third standard basis row vectors in R³, the matrix formed by the rows of A in I is

  R = [ −e_1 ⊗ 1_3^⊤     0                0             ;
         0               e_1 ⊗ 1_3^⊤     0             ;
         0               0               −e_1 ⊗ 1_3^⊤  ;
         1_3^⊤ ⊗ e_1    −1_3^⊤ ⊗ e_1     0             ;
         1_3^⊤ ⊗ e_2    −1_3^⊤ ⊗ e_2     0             ;
         0              −1_3^⊤ ⊗ e_2     1_3^⊤ ⊗ e_2   ;
         0              −1_3^⊤ ⊗ e_3     1_3^⊤ ⊗ e_3   ].

Instead of considering all columns of R, it suffices to show that no partition I_1, I_2 of I guarantees, for every j ∈ {1, 2, 10, 12, 14, 20, 21}, that

  Σ_{i ∈ I_1} R_{ij} − Σ_{i ∈ I_2} R_{ij} ∈ {−1, 0, 1}.

We write the submatrix of R with these columns as

  R̄ = [ −1 −1  0  0  0  0  0 ;
          0  0  1  1  0  0  0 ;
          0  0  0  0  0 −1 −1 ;
          1  0 −1  0  0  0  0 ;
          0  1  0  0 −1  0  0 ;
          0  0  0  0 −1  1  0 ;
          0  0  0 −1  0  0  1 ].

First, we claim that rows 1, 2, 4, 5 and 7 are in the same set, say I_1. Indeed, columns 1 and 2 imply that rows 1, 4 and 5 are in the same set. Columns 3 and 4 imply that rows 2, 4 and 7 are in the same set. Putting these pieces together yields the desired claim. Then we consider the set that row 6 belongs to and derive a contradiction. Indeed, row 6 cannot be in I_1 since column 5 implies that rows 5 and 6 are not in the same set. However, row 6 must be in I_1 since columns 6 and 7 imply that rows 3, 6 and 7 are in the same set. Putting these pieces together yields the desired contradiction. Finally, by using Propositions 3.1 and 3.3, we conclude that A is not TU and problem (14) is not an MCF problem when m = 3 and n = 3.

Finally, we prove that the FS-WBP in Eq. (14) is not an MCF problem when m ≥ 3 and n ≥ 3.

Theorem 3.5.
The FS-WBP in Eq. (14) is not an MCF problem when m ≥ 3 and n ≥ 3.

Proof. We argue by contradiction. Assume that problem (14) is an MCF problem for some m ≥ 3 and n ≥ 3. Then Proposition 3.3 implies that the constraint matrix A is TU. Since A is a {−1, 0, 1}-valued matrix, Proposition 3.1 further implies that for each set I ⊆ [2mn − n] there is a partition I_1, I_2 of I such that

  Σ_{i ∈ I_1} a_{ij} − Σ_{i ∈ I_2} a_{ij} ∈ {−1, 0, 1},  ∀ j ∈ [mn²].   (15)

In what follows, for any given m ≥ 3 and n ≥ 3, we construct a set of rows I such that no partition of I guarantees that Eq. (15) holds true. For ease of presentation, recall from Eq. (14) that A ∈ R^{(2mn−n)×mn²} consists of the diagonal blocks (−1)^k (I_n ⊗ 1_n^⊤) for k ∈ [m] followed by the band rows carrying (−1)^{k+1}(1_n^⊤ ⊗ I_n) and (−1)^{k+2}(1_n^⊤ ⊗ I_n) in block columns k and k+1 for k ∈ [m−1]. Setting

  I = {1, n+1, 2n+1, mn+1, mn+2, mn+n+2, mn+n+3}

and letting e_1, e_2 and e_3 be the first, second and third standard basis row vectors in R^n, the matrix formed by the rows of A in I is

  R = [ −e_1 ⊗ 1_n^⊤     0_{n²}          0_{n²}         0_{n²}  ···  0_{n²} ;
         0_{n²}          e_1 ⊗ 1_n^⊤     0_{n²}         0_{n²}  ···  0_{n²} ;
         0_{n²}          0_{n²}          −e_1 ⊗ 1_n^⊤   0_{n²}  ···  0_{n²} ;
         1_n^⊤ ⊗ e_1    −1_n^⊤ ⊗ e_1     0_{n²}         0_{n²}  ···  0_{n²} ;
         1_n^⊤ ⊗ e_2    −1_n^⊤ ⊗ e_2     0_{n²}         0_{n²}  ···  0_{n²} ;
         0_{n²}         −1_n^⊤ ⊗ e_2     1_n^⊤ ⊗ e_2    0_{n²}  ···  0_{n²} ;
         0_{n²}         −1_n^⊤ ⊗ e_3     1_n^⊤ ⊗ e_3    0_{n²}  ···  0_{n²} ].

Instead of considering all columns of R, it suffices to show that no partition of I guarantees

  Σ_{i ∈ I_1} R_{ij} − Σ_{i ∈ I_2} R_{ij} ∈ {−1, 0, 1}

for all j ∈ {1, 2, n²+1, n²+3, n²+n+2, 2n²+2, 2n²+3}. The submatrix of R with these columns is

  R̄ = [ −1 −1  0  0  0  0  0 ;
          0  0  1  1  0  0  0 ;
          0  0  0  0  0 −1 −1 ;
          1  0 −1  0  0  0  0 ;
          0  1  0  0 −1  0  0 ;
          0  0  0  0 −1  1  0 ;
          0  0  0 −1  0  0  1 ],

which is exactly the matrix appearing in Example 3.1. Applying the same argument used in Example 3.1, we obtain from Propositions 3.1 and 3.3 that A is not TU when m ≥ 3 and n ≥ 3, which is a contradiction. As a consequence, the conclusion of the theorem follows. □
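The row-partition argument above is easy to check mechanically. The following is an illustrative sketch (not from the paper's code): it enumerates all ±1 signings of the seven selected rows of a 7×7 submatrix with the block pattern described in the proof and confirms that no signing keeps every signed column sum in {−1, 0, 1}.

```python
import itertools
import numpy as np

# Submatrix R_bar: one marginal row from each of three diagonal blocks
# (signs -, +, -) and two rows from each of the two coupling bands,
# restricted to seven columns, following the block structure of Eq. (14).
R_bar = np.array([
    [-1, -1,  0,  0,  0,  0,  0],
    [ 0,  0,  1,  1,  0,  0,  0],
    [ 0,  0,  0,  0,  0, -1, -1],
    [ 1,  0, -1,  0,  0,  0,  0],
    [ 0,  1,  0,  0, -1,  0,  0],
    [ 0,  0,  0,  0, -1,  1,  0],
    [ 0,  0,  0, -1,  0,  0,  1],
])

# Ghouila-Houri criterion: if A were totally unimodular, some +/-1 signing
# of these seven rows would keep every signed column sum in {-1, 0, 1}.
feasible = [s for s in itertools.product([1, -1], repeat=7)
            if np.all(np.abs(np.array(s) @ R_bar) <= 1)]
print(len(feasible))  # prints 0: no valid signing exists
```

All 128 signings fail, confirming that no partition I_1, I_2 of the selected rows satisfies the criterion.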
Remark 3.6.
Theorem 3.5 resolves an open question and partially explains why the direct application of network flow algorithms to the FS-WBP in Eq. (14) is inefficient. However, this does not eliminate the possibility that the FS-WBP is equivalent to some other LP with good complexity. For example, Ge et al. [2019] have recently identified an equivalent LP formulation of the FS-WBP which is suitable for interior-point algorithms. Furthermore, our results support the problem reformulations of the FS-WBP which form the basis for various algorithms; e.g., Benamou et al. [2015], Cuturi and Peyré [2016], Kroshnin et al. [2019], Ge et al. [2019], Guminov et al. [2019].
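For concreteness, the two Kronecker operators appearing in the constraint matrix of Eq. (14) act on vec(X) as the two marginal maps of X. A quick numerical check (illustrative only; column-major vectorization is one possible convention):

```python
import numpy as np

n = 3
E = np.kron(np.eye(n), np.ones((1, n)))   # I_n (x) 1_n^T, shape (n, n^2)
G = np.kron(np.ones((1, n)), np.eye(n))   # 1_n^T (x) I_n, shape (n, n^2)

X = np.arange(9.0).reshape(n, n)
v = X.flatten(order="F")                  # column-major vec(X)

# One operator extracts one marginal of X and the other the second marginal,
# which is exactly what the block rows of Eq. (14) constrain.
assert np.allclose(E @ v, X.sum(axis=0))  # column sums
assert np.allclose(G @ v, X.sum(axis=1))  # row sums
print("ok")
```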
In this section, we present a fast deterministic variant of the iterative Bregman projection (IBP) algorithm, named FastIBP, and prove that it achieves the complexity bound Õ(mn^{7/3} ε^{−4/3}). This improves over the bound Õ(mn² ε^{−2}) of the iterative Bregman projection algorithm [Benamou et al., 2015] in terms of ε, and over the bound Õ(mn^{5/2} ε^{−1}) of the APDAGD and accelerated Sinkhorn algorithms [Kroshnin et al., 2019, Guminov et al., 2019] in terms of n.

Algorithm 1: FastIBP({C_k, u_k}_{k∈[m]}, ε)
  Initialization: t = 0, θ_0 = 1 and λ̌^0 = λ̃^0 = τ̌^0 = τ̃^0 = 0_{mn}.
  while E_t > ε do
    Step 1: Compute (λ̄^t, τ̄^t) = (1 − θ_t)(λ̌^t, τ̌^t) + θ_t(λ̃^t, τ̃^t).
    Step 2: Compute r_k = r(B_k(λ̄^t_k, τ̄^t_k)) and c_k = c(B_k(λ̄^t_k, τ̄^t_k)) for all k ∈ [m] and perform
      λ̃^{t+1}_k = λ̃^t_k − (1/(4θ_t))(r_k/(1_n^⊤ r_k) − u_k), for all k ∈ [m],
      τ̃^{t+1} = argmin_{Σ_{k=1}^m ω_k τ_k = 0_n} Σ_{k=1}^m ω_k [(τ_k − τ̄^t_k)^⊤ (c_k/(1_n^⊤ c_k)) + 2θ_t ‖τ_k − τ̃^t_k‖²].
    Step 3: Compute (λ̂^t, τ̂^t) = (λ̄^t, τ̄^t) + θ_t(λ̃^{t+1}, τ̃^{t+1}) − θ_t(λ̃^t, τ̃^t).
    Step 4: Compute (λ́^t, τ́^t) = argmin{ϕ(λ, τ) | (λ, τ) ∈ {(λ̌^t, τ̌^t), (λ̂^t, τ̂^t)}}.
    Step 5a: Compute c_k = c(B_k(λ́^t_k, τ́^t_k)) for all k ∈ [m].
    Step 5b: Compute τ̀^t_k = τ́^t_k + Σ_{i=1}^m ω_i log(c_i) − log(c_k) for all k ∈ [m] and λ̀^t = λ́^t.
    Step 6a: Compute r_k = r(B_k(λ̀^t_k, τ̀^t_k)) for all k ∈ [m].
    Step 6b: Compute λ^t_k = λ̀^t_k + log(u_k) − log(r_k) for all k ∈ [m] and τ^t = τ̀^t.
    Step 7a: Compute c_k = c(B_k(λ^t_k, τ^t_k)) for all k ∈ [m].
    Step 7b: Compute τ̌^{t+1}_k = τ^t_k + Σ_{i=1}^m ω_i log(c_i) − log(c_k) for all k ∈ [m] and λ̌^{t+1} = λ^t.
    Step 8: Compute θ_{t+1} = θ_t(√(θ_t² + 4) − θ_t)/2.
    Step 9: Increment t = t + 1.
  end while
  Output: (B_1(λ^t_1, τ^t_1), B_2(λ^t_2, τ^t_2), ..., B_m(λ^t_m, τ^t_m)).

To facilitate the later discussion, we present the
FastIBP algorithm in pseudocode form in Algorithm 1, and its application to the entropic regularized FS-WBP in Algorithm 2. Note that (B_1(λ^t_1, τ^t_1), ..., B_m(λ^t_m, τ^t_m)) stand for the primal variables while (λ^t, τ^t) are the dual variables for the entropic regularized FS-WBP.

The FastIBP algorithm is a deterministic variant of the iterative Bregman projection (IBP) algorithm [Benamou et al., 2015]. While the IBP algorithm can be interpreted as dual coordinate descent, the acceleration achieved by the FastIBP algorithm mostly depends on a refined characterization of the per-iteration progress of a scheme with momentum; see Steps 1–3 and Step 8. This scheme has been well studied [Nesterov, 2012, 2013, Fercoq and Richtárik, 2015, Nesterov and Stich, 2017], yet to the best of our knowledge it is applied here for the first time to accelerate optimal transport algorithms.

Furthermore, Step 4 guarantees that {ϕ(λ̌^t, τ̌^t)}_{t≥0} is monotonically decreasing, and Step 7 ensures sufficiently large progress from (λ^t_k, τ^t_k) to (λ̌^{t+1}, τ̌^{t+1}). Step 5 is performed so that τ^t_k = τ̀^t_k satisfies the bounded difference property max_{1≤i≤n}(τ^t_k)_i − min_{1≤i≤n}(τ^t_k)_i ≤ R_τ/2, and Step 6 guarantees that the marginal conditions r(B_k(λ^t_k, τ^t_k)) = u_k hold true for all k ∈ [m]. We see from Guminov et al. [2019, Lemma 9] that Steps 5–7 are alternating minimization steps for the dual objective function ϕ with respect to the two-block variable (λ, τ). More specifically, these steps can be rewritten as follows:

  (Step 5) λ̀^t = λ́^t,  τ̀^t = argmin_τ ϕ(λ́^t, τ),  s.t. Σ_{k=1}^m ω_k τ_k = 0_n,
  (Step 6) λ^t = argmin_λ ϕ(λ, τ̀^t),  τ^t = τ̀^t,
  (Step 7) λ̌^{t+1} = λ^t,  τ̌^{t+1} = argmin_τ ϕ(λ^t, τ),  s.t. Σ_{k=1}^m ω_k τ_k = 0_n.

We also remark that Steps 4–7 are specialized to the FS-WBP in Eq. (3) and have not appeared in the coordinate descent literature before. Finally, the optimality conditions of the primal entropic regularized WBP in Eq. (4) and the dual entropic regularized WBP in Eq. (8) are

  0_n = r(B_k(λ_k, τ_k))/‖B_k(λ_k, τ_k)‖₁ − u_k,  for all k ∈ [m],
  0_n = c(B_k(λ_k, τ_k))/‖B_k(λ_k, τ_k)‖₁ − Σ_{i=1}^m ω_i c(B_i(λ_i, τ_i))/‖B_i(λ_i, τ_i)‖₁,  for all k ∈ [m],
  0_n = Σ_{k=1}^m ω_k τ_k.

Since the FastIBP algorithm guarantees that Σ_{k=1}^m ω_k τ^t_k = 0_n and r(B_k(λ^t_k, τ^t_k)) = u_k ∈ Δ_n for all k ∈ [m], we can simultaneously solve the primal and dual entropic regularized FS-WBP with an adaptive stopping criterion that does not require calculating any objective value. The criterion depends on the following quantity, which measures the residual at each iteration:

  E_t := Σ_{k=1}^m ω_k ‖c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i))‖₁.   (16)

Existing algorithms, e.g., accelerated IBP and APDAGD, are developed within a primal–dual optimization framework, which allows them to achieve a better dependence on 1/ε than FastIBP by directly optimizing E_t. In contrast, the FastIBP algorithm optimizes E_t indirectly through the dual objective gap and the scheme with momentum (cf. Steps 1–3 and Step 8), which can lead to a better dependence on n.

Remark 4.1.
We provide some comments on the FastIBP algorithm. First, each iteration updates O(mn) entries, similar to the IBP algorithm. Updating λ̃ and λ̌ can be efficiently implemented in a distributed manner, where each of m machines updates O(n) entries at each iteration. Second, the computation of the m marginals can be accelerated using implementation tricks. Indeed, this can be done effectively by using r(e^{−η^{−1}C_k}) and c(e^{−η^{−1}C_k}) for all k ∈ [m], which are computed and stored at the beginning of the algorithm.

Remark 4.2. First, we notice that (X̂_1, X̂_2, ..., X̂_m) is a set of approximate optimal transportation plans between the m measures {u_k}_{k∈[m]} and an ε-approximate barycenter û. These matrices are equivalent to those constructed using [Altschuler et al., 2017, Algorithm 2]. We also remark that the approximate barycenter û can be constructed using only (X̃_1, X̃_2, ..., X̃_m); see [Kroshnin et al., 2019, Section 2.2] for the details.

We present several technical lemmas which are important for analyzing the
FastIBP algorithm. The first lemma provides the inductive formula and the upper bound for θ_t.

Lemma 4.3. Let {θ_t}_{t≥0} be the iterates generated by the FastIBP algorithm. Then we have 0 < θ_t ≤ 2/(t+2) and θ_t^{−2} = (1 − θ_{t+1}) θ_{t+1}^{−2} for all t ≥ 0.

Algorithm 2: Finding a Wasserstein barycenter by the FastIBP algorithm
  Input: η = ε/(4 log(n)) and ε̄ = ε/(4 max_{1≤k≤m} ‖C_k‖_∞).
  Step 1: Compute (ũ_1, ..., ũ_m) = (1 − ε̄/8)(u_1, ..., u_m) + (ε̄/(8n))(1_n, ..., 1_n).
  Step 2: Compute (X̃_1, X̃_2, ..., X̃_m) = FastIBP({C_k, ũ_k}_{k∈[m]}, ε̄/2).
  Step 3: Round (X̃_1, X̃_2, ..., X̃_m) to (X̂_1, X̂_2, ..., X̂_m) using Kroshnin et al. [2019, Algorithm 4] such that (X̂_1, X̂_2, ..., X̂_m) is feasible for the FS-WBP in Eq. (3).
  Step 4: Compute û = Σ_{k=1}^m ω_k X̂_k^⊤ 1_n.
  Output: û.

Proof.
By the definition of θ_{t+1}, we have

  (θ_{t+1}/θ_t)² = (1/4)(√(θ_t² + 4) − θ_t)² = 1 + (θ_t/2)(θ_t − √(θ_t² + 4)) = 1 − θ_{t+1},

which implies the desired inductive formula and θ_t > 0 for all t ≥ 0. We then prove by induction that 0 < θ_t ≤ 2/(t+2) for all t ≥ 0. The claim holds for t = 0 since θ_0 = 1. Assume that the hypothesis holds for t, i.e., θ_t ≤ 2/(t+2). Then

  θ_{t+1} = 2/(1 + √(1 + 4θ_t^{−2})) ≤ 2/(t+3).

This completes the proof of the lemma. □
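Both properties in Lemma 4.3 can also be verified numerically. A small illustrative script (assuming the Step 8 recursion θ_{t+1} = θ_t(√(θ_t² + 4) − θ_t)/2 with θ_0 = 1):

```python
import math

# Momentum sequence from Step 8, starting at theta_0 = 1.
theta = 1.0
for t in range(1000):
    nxt = theta * (math.sqrt(theta**2 + 4) - theta) / 2
    # Inductive identity of Lemma 4.3: theta_t^{-2} = (1 - theta_{t+1}) * theta_{t+1}^{-2}
    assert abs(theta**-2 - (1 - nxt) * nxt**-2) < 1e-6
    theta = nxt
    # Upper bound of Lemma 4.3: theta_{t+1} <= 2/((t+1)+2)
    assert 0 < theta <= 2 / (t + 3)
print("ok")
```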
The second lemma shows that all the iterates generated by the
FastIBP algorithm are feasible for the dual entropic regularized FS-WBP for all t ≥ 0.

Lemma 4.4. Let {(λ̌^t, τ̌^t)}_{t≥0}, {(λ̃^t, τ̃^t)}_{t≥0}, {(λ̄^t, τ̄^t)}_{t≥0}, {(λ̂^t, τ̂^t)}_{t≥0}, {(λ́^t, τ́^t)}_{t≥0}, and {(λ^t, τ^t)}_{t≥0} be the iterates generated by the FastIBP algorithm. Then, we have

  Σ_{k=1}^m ω_k τ̌^t_k = Σ_{k=1}^m ω_k τ̃^t_k = Σ_{k=1}^m ω_k τ̄^t_k = Σ_{k=1}^m ω_k τ̂^t_k = Σ_{k=1}^m ω_k τ́^t_k = Σ_{k=1}^m ω_k τ̀^t_k = Σ_{k=1}^m ω_k τ^t_k = 0_n

for all t ≥ 0.

Proof. We first verify Lemma 4.4 when t = 0. Indeed, the initialization gives

  Σ_{k=1}^m ω_k τ̌^0_k = Σ_{k=1}^m ω_k τ̃^0_k = 0_n.

By definition, τ̄^0 is a convex combination of τ̌^0 and τ̃^0, the update for τ̃^1 is constrained by Σ_{k=1}^m ω_k τ̃^1_k = 0_n, and τ̂^0 is a linear combination of τ̄^0, τ̃^1 and τ̃^0 whose τ̃-coefficients cancel. Thus, we have

  Σ_{k=1}^m ω_k τ̄^0_k = Σ_{k=1}^m ω_k τ̂^0_k = 0_n.

This also implies that Σ_{k=1}^m ω_k τ́^0_k = 0_n. Using the update formulas for τ̀, τ and τ̌ (each adds to τ_k a term whose ω-weighted average vanishes), we have

  Σ_{k=1}^m ω_k τ̀^0_k = Σ_{k=1}^m ω_k τ^0_k = Σ_{k=1}^m ω_k τ̌^1_k = 0_n.

Repeating this argument, we obtain the desired equality in the conclusion of Lemma 4.4 for all t ≥ 0. □

The third lemma shows that the iterates {τ^t}_{t≥0} generated by the FastIBP algorithm satisfy the bounded difference property: max_{1≤i≤n}(τ^t_k)_i − min_{1≤i≤n}(τ^t_k)_i ≤ R_τ/2.

Lemma 4.5.
Let {(λ^t, τ^t)}_{t≥0} be the iterates generated by the FastIBP algorithm. Then the following statement holds true for all k ∈ [m]:

  max_{1≤i≤n}(τ^t_k)_i − min_{1≤i≤n}(τ^t_k)_i ≤ R_τ/2,

where R_τ > 0 is defined in Lemma 2.2.

Proof. We observe that τ^t_k = τ̀^t_k for all k ∈ [m]. By the update formula for τ̀^t_k and Lemma 4.4, we have

  τ̀^t_k = τ́^t_k + Σ_{i=1}^m ω_i log(c_i) − log(c_k) = Σ_{i=1}^m ω_i log(e^{−η^{−1}C_i} diag(e^{λ́^t_i}) 1_n) − log(e^{−η^{−1}C_k} diag(e^{λ́^t_k}) 1_n).

A simple calculation shows that, for all j ∈ [n],

  −η^{−1}‖C_k‖_∞ + log(1_n^⊤ e^{λ́^t_k}) ≤ [log(e^{−η^{−1}C_k} diag(e^{λ́^t_k}) 1_n)]_j ≤ log(1_n^⊤ e^{λ́^t_k}).

Therefore, the following inequality holds true for all k ∈ [m]:

  max_{1≤i≤n}(τ^t_k)_i − min_{1≤i≤n}(τ^t_k)_i ≤ η^{−1}‖C_k‖_∞ + η^{−1}(Σ_{i=1}^m ω_i ‖C_i‖_∞) ≤ 2η^{−1}(max_{1≤k≤m}‖C_k‖_∞).

This together with the definition of R_τ yields the desired inequality. □

The final lemma presents a key descent inequality for the
FastIBP algorithm.
Lemma 4.6.
Let {(λ̌^t, τ̌^t)}_{t≥0} be the iterates generated by the FastIBP algorithm and let (λ⋆, τ⋆) be an optimal solution in Lemma 2.2. Then the following statement holds true:

  ϕ(λ̌^{t+1}, τ̌^{t+1}) − (1 − θ_t)ϕ(λ̌^t, τ̌^t) − θ_t ϕ(λ⋆, τ⋆) ≤ 2θ_t² Σ_{k=1}^m ω_k (‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² − ‖(λ⋆_k − λ̃^{t+1}_k, τ⋆_k − τ̃^{t+1}_k)‖²).

Proof. Using Lemma 2.5 with (λ′, τ′) = (λ̂^t, τ̂^t) and (λ, τ) = (λ̄^t, τ̄^t), we have

  ϕ(λ̂^t, τ̂^t) ≤ ϕ(λ̄^t, τ̄^t) + θ_t (λ̃^{t+1} − λ̃^t, τ̃^{t+1} − τ̃^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) + 2θ_t² Σ_{k=1}^m ω_k ‖(λ̃^{t+1}_k − λ̃^t_k, τ̃^{t+1}_k − τ̃^t_k)‖².

After some simple calculations, writing ϕ(λ̄^t, τ̄^t) = (1 − θ_t)ϕ(λ̄^t, τ̄^t) + θ_t ϕ(λ̄^t, τ̄^t) and splitting

  θ_t (λ̃^{t+1} − λ̃^t, τ̃^{t+1} − τ̃^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) = −θ_t (λ̃^t − λ̄^t, τ̃^t − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) + θ_t (λ̃^{t+1} − λ̄^t, τ̃^{t+1} − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t),

we obtain

  ϕ(λ̂^t, τ̂^t) ≤ [(1 − θ_t)ϕ(λ̄^t, τ̄^t) − θ_t (λ̃^t − λ̄^t, τ̃^t − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t)]_{=: I} + θ_t [ϕ(λ̄^t, τ̄^t) + (λ̃^{t+1} − λ̄^t, τ̃^{t+1} − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) + 2θ_t Σ_{k=1}^m ω_k ‖(λ̃^{t+1}_k − λ̃^t_k, τ̃^{t+1}_k − τ̃^t_k)‖²]_{=: II}.   (17)

Since λ̄^t = (1 − θ_t)λ̌^t + θ_t λ̃^t (and similarly for τ̄^t), we have

  −θ_t (λ̃^t − λ̄^t, τ̃^t − τ̄^t) = (1 − θ_t)(λ̌^t − λ̄^t, τ̌^t − τ̄^t).

Using this equality and the convexity of ϕ, we have

  I = (1 − θ_t)[ϕ(λ̄^t, τ̄^t) + (λ̌^t − λ̄^t, τ̌^t − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t)] ≤ (1 − θ_t)ϕ(λ̌^t, τ̌^t).   (18)

For the term II in Eq. (17), the definition of (λ̃^{t+1}, τ̃^{t+1}) implies that

  (λ − λ̃^{t+1}, τ − τ̃^{t+1})^⊤ [∇ϕ(λ̄^t, τ̄^t) + 4θ_t (ω_1(λ̃^{t+1}_1 − λ̃^t_1), ..., ω_m(λ̃^{t+1}_m − λ̃^t_m), ω_1(τ̃^{t+1}_1 − τ̃^t_1), ..., ω_m(τ̃^{t+1}_m − τ̃^t_m))] ≥ 0, for all (λ, τ) ∈ R^{mn} × P.

Letting (λ, τ) = (λ⋆, τ⋆) and rearranging the resulting inequality yields

  (λ̃^{t+1} − λ̄^t, τ̃^{t+1} − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) + 2θ_t Σ_{k=1}^m ω_k ‖(λ̃^{t+1}_k − λ̃^t_k, τ̃^{t+1}_k − τ̃^t_k)‖²
  ≤ (λ⋆ − λ̄^t, τ⋆ − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) + 2θ_t Σ_{k=1}^m ω_k (‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² − ‖(λ⋆_k − λ̃^{t+1}_k, τ⋆_k − τ̃^{t+1}_k)‖²).

Using the convexity of ϕ again, we have

  (λ⋆ − λ̄^t, τ⋆ − τ̄^t)^⊤ ∇ϕ(λ̄^t, τ̄^t) ≤ ϕ(λ⋆, τ⋆) − ϕ(λ̄^t, τ̄^t).

Putting these pieces together yields

  II ≤ θ_t ϕ(λ⋆, τ⋆) + 2θ_t² Σ_{k=1}^m ω_k (‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² − ‖(λ⋆_k − λ̃^{t+1}_k, τ⋆_k − τ̃^{t+1}_k)‖²).   (19)

Plugging Eq. (18) and Eq. (19) into Eq. (17) yields

  ϕ(λ̂^t, τ̂^t) ≤ (1 − θ_t)ϕ(λ̌^t, τ̌^t) + θ_t ϕ(λ⋆, τ⋆) + 2θ_t² Σ_{k=1}^m ω_k (‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² − ‖(λ⋆_k − λ̃^{t+1}_k, τ⋆_k − τ̃^{t+1}_k)‖²).

Since (λ̌^{t+1}, τ̌^{t+1}) is obtained by an exact coordinate update from (λ^t, τ^t), we have ϕ(λ^t, τ^t) ≥ ϕ(λ̌^{t+1}, τ̌^{t+1}). Using a similar argument, we have ϕ(λ́^t, τ́^t) ≥ ϕ(λ̀^t, τ̀^t) ≥ ϕ(λ^t, τ^t). By the definition of (λ́^t, τ́^t) in Step 4, we have ϕ(λ̂^t, τ̂^t) ≥ ϕ(λ́^t, τ́^t). Putting these pieces together yields the desired inequality. □

4.3 Main result

We present an upper bound on the number of iterations required by the
FastIBP algorithm.
Theorem 4.7.
Let {(λ^t, τ^t)}_{t≥0} be the iterates generated by the FastIBP algorithm. Then the number of iterations required to reach the stopping criterion E_t ≤ ε satisfies

  t ≤ (128 n (R_λ² + R_τ²)/ε²)^{1/3},

where R_λ, R_τ > 0 are defined in Lemma 2.2.

Proof. First, letting δ_t := ϕ(λ̌^t, τ̌^t) − ϕ(λ⋆, τ⋆), we show that

  δ_t ≤ 8n(R_λ² + R_τ²)/(t+1)².   (20)

Indeed, by Lemmas 4.3 and 4.6, we have

  (1 − θ_{t+1})θ_{t+1}^{−2} δ_{t+1} − (1 − θ_t)θ_t^{−2} δ_t ≤ 2 Σ_{k=1}^m ω_k (‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² − ‖(λ⋆_k − λ̃^{t+1}_k, τ⋆_k − τ̃^{t+1}_k)‖²).

By unrolling the recurrence and using θ_0 = 1 and λ̃^0 = τ̃^0 = 0_{mn}, we have

  (1 − θ_t)θ_t^{−2} δ_t + 2 Σ_{k=1}^m ω_k ‖(λ⋆_k − λ̃^t_k, τ⋆_k − τ̃^t_k)‖² ≤ (1 − θ_0)θ_0^{−2} δ_0 + 2 Σ_{k=1}^m ω_k ‖(λ⋆_k, τ⋆_k)‖² ≤ 2n(R_λ² + R_τ²),

where the last inequality follows from the bounds on (λ⋆, τ⋆) in Lemma 2.2. For t ≥ 1, Lemma 4.3 implies that θ_{t−1}^{−2} = (1 − θ_t)θ_t^{−2}. Therefore, we conclude that

  δ_t ≤ 2θ_{t−1}² n(R_λ² + R_τ²).

This together with the fact that 0 < θ_{t−1} ≤ 2/(t+1) yields Eq. (20). Furthermore, we show that

  δ_t − δ_{t+1} ≥ E_t²/2.   (21)

Indeed, by the definition of δ_t, we have

  δ_t − δ_{t+1} = ϕ(λ̌^t, τ̌^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) ≥ ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}).

By the definition of ϕ, we have

  ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) = Σ_{k=1}^m ω_k (log(‖B_k(λ^t_k, τ^t_k)‖₁) − log(‖B_k(λ̌^{t+1}_k, τ̌^{t+1}_k)‖₁)).

Since r(B_k(λ^t_k, τ^t_k)) = u_k ∈ Δ_n for all k ∈ [m], we have ‖B_k(λ^t_k, τ^t_k)‖₁ = 1. This together with the update formula for (λ̌^{t+1}, τ̌^{t+1}) yields

  ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) = −log(1_n^⊤ e^{Σ_{k=1}^m ω_k log(c(B_k(λ^t_k, τ^t_k)))}).

Using log(x) ≤ x − 1 for all x > 0, we have

  ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) ≥ 1 − 1_n^⊤ e^{Σ_{k=1}^m ω_k log(c(B_k(λ^t_k, τ^t_k)))}.

Since r(B_k(λ^t_k, τ^t_k)) = u_k ∈ Δ_n for all k ∈ [m], we have 1_n^⊤ c(B_k(λ^t_k, τ^t_k)) = 1. In addition, (ω_1, ω_2, ..., ω_m) ∈ Δ_m. Thus, we have

  ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) ≥ 1_n^⊤ (Σ_{k=1}^m ω_k c(B_k(λ^t_k, τ^t_k)) − e^{Σ_{k=1}^m ω_k log(c(B_k(λ^t_k, τ^t_k)))}).

Combining c(B_k(λ^t_k, τ^t_k)) ∈ Δ_n with the arguments in Kroshnin et al. [2019, Lemma 6] yields

  ϕ(λ^t, τ^t) − ϕ(λ̌^{t+1}, τ̌^{t+1}) ≥ (1/2) Σ_{k=1}^m ω_k ‖c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i))‖₁².

Using the Cauchy–Schwarz inequality together with Σ_{k=1}^m ω_k = 1, we have

  E_t² ≤ Σ_{k=1}^m ω_k ‖c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i))‖₁².

Putting these pieces together yields Eq. (21). Finally, we derive from Eq. (20), Eq. (21) and the non-negativity of δ_t that

  Σ_{i=t}^{+∞} E_i² ≤ 2 Σ_{i=t}^{+∞} (δ_i − δ_{i+1}) ≤ 2δ_t ≤ 16n(R_λ² + R_τ²)/(t+1)².

Let T > 0 be the first iteration with E_T ≤ ε, so that E_t > ε for all t ∈ [T]. Without loss of generality, we assume T is even. Then the following statement holds true:

  (T/2) ε² ≤ Σ_{i=T/2}^{T−1} E_i² ≤ 16n(R_λ² + R_τ²)/(T/2 + 1)² ≤ 64n(R_λ² + R_τ²)/T²,

i.e., ε² ≤ 128n(R_λ² + R_τ²)/T³. Rearranging the above inequality yields the desired inequality. □
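The stopping quantity E_t from Eq. (16) is inexpensive to monitor at run time. A minimal illustrative sketch (assuming c(·) is the column-marginal map and ‖·‖₁ the entrywise ℓ₁-norm; `residual` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def residual(Bs, w):
    # E_t from Eq. (16): omega-weighted l1-deviation of the column marginals
    # c(B_k) from their omega-weighted average.
    cols = np.stack([B.sum(axis=0) for B in Bs])   # c(B_k), one row per k
    avg = w @ cols                                  # sum_k w_k c(B_k)
    return float(np.sum(w * np.abs(cols - avg).sum(axis=1)))

# Identical plans share all column marginals, so the residual vanishes.
Bs = [np.full((3, 3), 1.0 / 9) for _ in range(3)]
w = np.array([0.2, 0.3, 0.5])
print(residual(Bs, w))  # ~0 for identical marginals
```

Perturbing any one plan makes the column marginals disagree and the residual strictly positive, which is exactly the event the while-loop of Algorithm 1 monitors.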
Equipped with the result of Theorem 4.7, we are ready to present the complexity bound of Algorithm 2 for approximating the FS-WBP in Eq. (3).
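Before stating the bound, recall that Step 3 of Algorithm 2 invokes the rounding scheme of Kroshnin et al. [2019, Algorithm 4], a barycenter variant of the rounding procedure of Altschuler et al. [2017, Algorithm 2]. For intuition, a simplified single-plan version, which projects a positive matrix onto prescribed marginals (r, c), can be sketched as follows (an illustrative sketch, not the exact Algorithm 4):

```python
import numpy as np

def round_to_marginals(F, r, c):
    # Altschuler et al. (2017), Algorithm 2: round a positive matrix F onto
    # the transport polytope with row marginal r and column marginal c.
    X = F * np.minimum(r / F.sum(axis=1), 1.0)[:, None]   # shrink oversized rows
    X = X * np.minimum(c / X.sum(axis=0), 1.0)[None, :]   # shrink oversized cols
    er = r - X.sum(axis=1)                                # nonnegative row deficit
    ec = c - X.sum(axis=0)                                # nonnegative col deficit
    if er.sum() > 0:
        X = X + np.outer(er, ec) / er.sum()               # rank-one correction
    return X
```

After the two scalings every row/column sum is at most its target, and the rank-one correction redistributes the remaining mass so that both marginals hold exactly.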
Theorem 4.8.
The FastIBP algorithm for approximately solving the FS-WBP in Eq. (3) (Algorithm 2) returns an ε-approximate barycenter û ∈ R^n within

  O(mn^{7/3} (max_{1≤k≤m} ‖C_k‖_∞ √(log n)/ε)^{4/3})

arithmetic operations.

Proof. Consider the iterates (X̃_1, X̃_2, ..., X̃_m) generated by the FastIBP algorithm (cf. Algorithm 1). The rounding scheme (cf. Kroshnin et al. [2019, Algorithm 4]) returns a feasible solution (X̂_1, X̂_2, ..., X̂_m) to the FS-WBP in Eq. (3) in which c(X̂_k) is the same for all k ∈ [m]. To show that û = Σ_{k=1}^m ω_k c(X̂_k) is an ε-approximate barycenter (cf. Definition 2.2), it suffices to show that

  Σ_{k=1}^m ω_k ⟨C_k, X̂_k⟩ ≤ Σ_{k=1}^m ω_k ⟨C_k, X⋆_k⟩ + ε,   (22)

where (X⋆_1, X⋆_2, ..., X⋆_m) is a set of optimal transportation plans between the m measures {u_k}_{k∈[m]} and the barycenter of the FS-WBP.

First, we derive from the scheme of Kroshnin et al. [2019, Algorithm 4] that the following inequality holds for all k ∈ [m]:

  ‖X̂_k − X̃_k‖₁ ≤ 2‖c(X̃_k) − Σ_{i=1}^m ω_i c(X̃_i)‖₁.

This together with Hölder's inequality implies that

  Σ_{k=1}^m ω_k ⟨C_k, X̂_k − X̃_k⟩ ≤ 2(max_{1≤k≤m}‖C_k‖_∞)(Σ_{k=1}^m ω_k ‖c(X̃_k) − Σ_{i=1}^m ω_i c(X̃_i)‖₁).   (23)

Furthermore, we have

  Σ_{k=1}^m ω_k ⟨C_k, X̃_k − X⋆_k⟩ = Σ_{k=1}^m ω_k (⟨C_k, X̃_k⟩ − ηH(X̃_k)) − Σ_{k=1}^m ω_k (⟨C_k, X⋆_k⟩ − ηH(X⋆_k)) + η Σ_{k=1}^m ω_k H(X̃_k) − η Σ_{k=1}^m ω_k H(X⋆_k).

Since 0 ≤ H(X) ≤ 2 log(n) for any X ∈ R^{n×n}_+ satisfying ‖X‖₁ = 1 [Cover and Thomas, 2012] and Σ_{k=1}^m ω_k = 1, we have

  Σ_{k=1}^m ω_k ⟨C_k, X̃_k − X⋆_k⟩ ≤ 2η log(n) + Σ_{k=1}^m ω_k (⟨C_k, X̃_k⟩ − ηH(X̃_k)) − Σ_{k=1}^m ω_k (⟨C_k, X⋆_k⟩ − ηH(X⋆_k)).

Let (X^η_1, X^η_2, ..., X^η_m) be a set of optimal transportation plans for the entropic regularized FS-WBP in Eq. (4); then

  Σ_{k=1}^m ω_k (⟨C_k, X^η_k⟩ − ηH(X^η_k)) ≤ Σ_{k=1}^m ω_k (⟨C_k, X⋆_k⟩ − ηH(X⋆_k)).

By the optimality of (X^η_1, X^η_2, ..., X^η_m), we have

  Σ_{k=1}^m ω_k (⟨C_k, X^η_k⟩ − ηH(X^η_k)) = −η(min_{λ∈R^{mn}, τ∈P} ϕ(λ, τ)) ≥ −ηϕ(λ^t, τ^t).

Since (X̃_1, X̃_2, ..., X̃_m) is generated by the FastIBP algorithm, we have X̃_k = B_k(λ^t_k, τ^t_k) for all k ∈ [m], where (λ^t, τ^t) are the dual iterates at termination. Then, using Σ_{k=1}^m ω_k τ^t_k = 0_n (Lemma 4.4),

  Σ_{k=1}^m ω_k (⟨C_k, X̃_k⟩ − ηH(X̃_k)) = −η(Σ_{k=1}^m ω_k (1_n^⊤ B_k(λ^t_k, τ^t_k) 1_n − (λ^t_k)^⊤ u_k)) + η Σ_{k=1}^m ω_k (τ^t_k)^⊤ c(B_k(λ^t_k, τ^t_k))
  = −ηϕ(λ^t, τ^t) + η Σ_{k=1}^m ω_k (τ^t_k)^⊤ (c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i))).

Putting these pieces together yields

  Σ_{k=1}^m ω_k ⟨C_k, X̃_k − X⋆_k⟩ ≤ 2η log(n) + η Σ_{k=1}^m ω_k (τ^t_k)^⊤ (c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i))).

Since 1_n^⊤ c(B_k(λ^t_k, τ^t_k)) = 1 for all k ∈ [m], subtracting a multiple of 1_n from each τ^t_k leaves the inner products unchanged, and we have

  Σ_{k=1}^m ω_k (τ^t_k)^⊤ (c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i)))
  = Σ_{k=1}^m ω_k (τ^t_k − ((max_{1≤i≤n}(τ^t_k)_i + min_{1≤i≤n}(τ^t_k)_i)/2) 1_n)^⊤ (c(B_k(λ^t_k, τ^t_k)) − Σ_{i=1}^m ω_i c(B_i(λ^t_i, τ^t_i)))
  ≤ (max_{1≤k≤m} ‖τ^t_k − ((max_{1≤i≤n}(τ^t_k)_i + min_{1≤i≤n}(τ^t_k)_i)/2) 1_n‖_∞)(Σ_{k=1}^m ω_k ‖c(X̃_k) − Σ_{i=1}^m ω_i c(X̃_i)‖₁).

Using Lemma 4.5, we have

  ‖τ^t_k − ((max_{1≤i≤n}(τ^t_k)_i + min_{1≤i≤n}(τ^t_k)_i)/2) 1_n‖_∞ ≤ R_τ/4.

Putting these pieces together yields

  Σ_{k=1}^m ω_k ⟨C_k, X̃_k − X⋆_k⟩ ≤ 2η log(n) + (ηR_τ/4)(Σ_{k=1}^m ω_k ‖c(X̃_k) − Σ_{i=1}^m ω_i c(X̃_i)‖₁).   (24)

Recall that E_t = Σ_{k=1}^m ω_k ‖c(X̃_k) − Σ_{i=1}^m ω_i c(X̃_i)‖₁ and R_τ = 4η^{−1}(max_{1≤k≤m}‖C_k‖_∞). Then Eq. (23) and Eq. (24) together imply that

  Σ_{k=1}^m ω_k ⟨C_k, X̂_k − X⋆_k⟩ ≤ 2η log(n) + 3(max_{1≤k≤m}‖C_k‖_∞) E_t.

This together with E_t ≤ ε̄/2 and the choices of η and ε̄ implies Eq. (22) as desired.
Complexity bound estimation.
We first bound the number of iterations required by the FastIBP algorithm (cf. Algorithm 1) to reach E_t ≤ ε̄/2. Indeed, Theorem 4.7 implies that

  t ≤ (128 n (R_λ² + R_τ²)/(ε̄/2)²)^{1/3} = O(n^{1/3} ((R_λ + R_τ)/ε̄)^{2/3}).

For simplicity, let C̄ = max_{1≤k≤m} ‖C_k‖_∞. Using the definitions of R_λ and R_τ in Lemma 2.2, the construction of {ũ_k}_{k∈[m]}, and the choices of η and ε̄, we have R_λ + R_τ = O(C̄ log(n)/ε), and hence

  t = O(n^{1/3} (C̄ √(log n)/ε)^{4/3}).

Recall that each iteration of the FastIBP algorithm requires O(mn²) arithmetic operations; the total number of arithmetic operations required by the FastIBP algorithm as the subroutine in Algorithm 2 is therefore bounded by

  O(mn^{7/3} (C̄ √(log n)/ε)^{4/3}).

The construction of {ũ_k}_{k∈[m]} needs O(mn) arithmetic operations, while the rounding scheme in Kroshnin et al. [2019, Algorithm 4] requires O(mn²) arithmetic operations. Putting these pieces together yields the desired complexity bound of Algorithm 2. □

Remark 4.9.
For simplicity, we assume that all measures have the same support size. This assumption is not necessary, and our analysis remains valid when each measure has a fixed support of a different size. However, our results cannot be generalized to the free-support Wasserstein barycenter problem in general, since the computation of free-support barycenters requires solving a multimarginal OT problem, for which the complexity bounds of algorithms become much worse; see Lin et al. [2019a] for the details.
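For reference, the baseline IBP iteration that FastIBP accelerates can be written in a few lines. The following is an illustrative toy reimplementation of the scaling form of Benamou et al. [2015] (the name `ibp_barycenter` and the default parameters are ours, not the paper's):

```python
import numpy as np

def ibp_barycenter(Cs, ps, w, eta=0.1, iters=1000):
    # Toy iterative Bregman projections for the fixed-support barycenter:
    # alternate projections onto the row-marginal and column-marginal sets.
    Ks = [np.exp(-C / eta) for C in Cs]          # Gibbs kernels
    vs = [np.ones(K.shape[1]) for K in Ks]
    for _ in range(iters):
        us = [p / (K @ v) for K, v, p in zip(Ks, vs, ps)]
        # barycenter iterate: weighted geometric mean of the column marginals
        q = np.exp(sum(wk * np.log(K.T @ u) for wk, K, u in zip(w, Ks, us)))
        vs = [q / (K.T @ u) for K, u in zip(Ks, us)]
    return q
```

On a small grid with strictly positive input measures, the returned vector is (approximately) a probability vector interpolating the inputs; FastIBP replaces this plain alternation with the momentum scheme of Algorithm 1.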
In this section, we report the results of extensive numerical experiments evaluating the FastIBP algorithm for computing fixed-support Wasserstein barycenters. In all our experiments, we consider the Wasserstein distance with the ℓ₂-norm, i.e., the 2-Wasserstein distance, and compare the FastIBP algorithm with Gurobi, the iterative Bregman projection (IBP) algorithm [Benamou et al., 2015], and Bregman ADMM (BADMM) [Ye et al., 2017]. (We also implemented ADMM [Yang et al., 2018], APDAGD [Kroshnin et al., 2019] and accelerated IBP [Guminov et al., 2019] and found that they perform worse than our algorithm. However, we believe this is largely due to our own implementation, since these algorithms require fine hyper-parameter tuning, and we are unaware of any public code available online. Thus, we exclude them for a fair comparison.) All the experiments are conducted in MATLAB R2020a on a workstation with an Intel Core i5-9400F (6 cores and 6 threads) and 32GB memory, equipped with Ubuntu 18.04.

For the FastIBP algorithm, the regularization parameter η is chosen from a grid of two values (one of them η = 0.01) in our experiments. We follow Benamou et al. [2015, Remark 3] to implement the algorithm and terminate it when

  Σ_k ω_k ‖c(B_k(λ^t_k, τ^t_k)) − Σ_i ω_i c(B_i(λ^t_i, τ^t_i))‖ / (Σ_k ω_k ‖c(B_k(λ^t_k, τ^t_k))‖ + ‖Σ_i ω_i c(B_i(λ^t_i, τ^t_i))‖) ≤ Tol_fibp,
  Σ_k ω_k ‖r(B_k(λ^t_k, τ^t_k)) − u_k‖ / (Σ_k ω_k ‖r(B_k(λ^t_k, τ^t_k))‖ + Σ_k ω_k ‖u_k‖) ≤ Tol_fibp,
  ‖Σ_i ω_i c(B_i(λ^t_i, τ^t_i)) − Σ_i ω_i c(B_i(λ^{t−1}_i, τ^{t−1}_i))‖ / (‖Σ_i ω_i c(B_i(λ^t_i, τ^t_i))‖ + ‖Σ_i ω_i c(B_i(λ^{t−1}_i, τ^{t−1}_i))‖) ≤ Tol_fibp,
  Σ_k ω_k ‖B_k(λ^t_k, τ^t_k) − B_k(λ^{t−1}_k, τ^{t−1}_k)‖_F / (Σ_k ω_k ‖B_k(λ^t_k, τ^t_k)‖_F + Σ_k ω_k ‖B_k(λ^{t−1}_k, τ^{t−1}_k)‖_F) ≤ Tol_fibp,
  Σ_k ω_k ‖λ^t_k − λ^{t−1}_k‖ / (Σ_k ω_k ‖λ^t_k‖ + Σ_k ω_k ‖λ^{t−1}_k‖) ≤ Tol_fibp,
  Σ_k ω_k ‖τ^t_k − τ^{t−1}_k‖ / (Σ_k ω_k ‖τ^t_k‖ + Σ_k ω_k ‖τ^{t−1}_k‖) ≤ Tol_fibp.

These inequalities guarantee that (i) the infeasibility violations of the marginal constraints, (ii) the iterative gap between approximate barycenters, and (iii) the iterative gap between dual variables are all small. The criteria are checked periodically rather than at every iteration (e.g., every 200 iterations for one of the two choices of η). We set MaxIter_fibp = 10000, with a tight tolerance Tol_fibp on synthetic data and a looser tolerance on MNIST images.

For IBP and BADMM, we use the Matlab code implemented by Ye et al. [2017] and terminate them with the refined stopping criterion provided by Yang et al. [2018]. The regularization parameter η for the IBP algorithm is chosen from the same two values. For synthetic data, we set MaxIter_badmm = 5000 and MaxIter_ibp = 10000 with tight tolerances Tol_badmm and Tol_ibp; for MNIST images, we use a looser Tol_ibp.

For the linear programming algorithm, we apply Gurobi 9.0.2 (Gurobi Optimization, 2019) (with an academic license) to solve the FS-WBP in Eq. (3). Since Gurobi provides high-quality solutions when the problem is of medium size, we use the solution obtained by Gurobi as a benchmark to evaluate the quality of the solutions obtained by the different algorithms on synthetic data. In our experiments, we force Gurobi to run only the dual simplex algorithm and leave the other parameters at their default settings.

For the evaluation metrics, "normalized obj" stands for the normalized objective value, defined by

  normalized obj := |Σ_{k=1}^m ω_k ⟨C_k, X̂_k⟩ − Σ_{k=1}^m ω_k ⟨C_k, X^g_k⟩| / |Σ_{k=1}^m ω_k ⟨C_k, X^g_k⟩|,

where (X̂_1, ..., X̂_m) is the solution obtained by each algorithm and (X^g_1, ..., X^g_m) denotes the solution obtained by Gurobi. "feasibility" denotes the deviation of the terminating solution from the feasible set; see Yang et al. [2018, Section 5.1]. "iteration" denotes the number of iterations. "time (in seconds)" denotes the computational time.

In what follows, we present our experimental results. In Section 5.2, we evaluate all the candidate algorithms on synthetic data and compare their computational performance in terms of accuracy and speed. In Section 5.3, we compare our algorithm with IBP on the MNIST dataset to visualize the quality of the approximate barycenters obtained by each algorithm.
For simplicity of presentation, in our figures "g" stands for Gurobi, "b" stands for BADMM, "i1" and "i2" stand for the IBP algorithm with the two choices of η, and "f1" and "f2" stand for the FastIBP algorithm with the two choices of η.

In this section, we generate a set of discrete probability distributions {µ_k}_{k=1}^m with µ_k = {(u_{ki}, x_i) ∈ R_+ × R^d | i ∈ [n]} and Σ_{i=1}^n u_{ki} = 1. The fixed-support Wasserstein barycenter is µ̂ = {(û_i, x_i) ∈ R_+ × R^d | i ∈ [n]}, where the support points (x_1, x_2, ..., x_n) are known. In our experiments, we set d = 3 and choose different values of (m, n). Given each tuple (m, n), we randomly generate a trial as follows. First, we generate the support points (x_{k1}, x_{k2}, ..., x_{kn}) whose entries are drawn from a Gaussian mixture distribution via the Matlab commands provided by Yang et al. [2018]:

  gm_num = 5; gm_mean = [-20; -10; 0; 10; 20];
  sigma = zeros(1, 1, gm_num); sigma(1, 1, :) = 5*ones(gm_num, 1);

(The Matlab code of Ye et al. [2017] is available at https://github.com/bobye/WBC_Matlab. Since we do not include the iterative gap between dual variables in "feasibility" and the FS-WBP is relatively easier than the general WBP, our feasibility results for BADMM and IBP are consistently smaller than those presented by Ye et al. [2017], Yang et al. [2018], Ge et al. [2019].)
50 100 150 200 numer of marginals (m) t i m e ( i n s e c ond s ) n=100 f2 f1 i2 i1 b g Figure 2: The average normalized objective value and computational time (in seconds) of
FastIBP , IBP , BADMM, and Gurobi from 10 independent trials.
Figure 3: The average normalized objective value and computational time (in seconds) of the proximal variants of FastIBP and IBP from 10 independent trials.

    gm_weights = rand(gm_num, 1); gm_weights = gm_weights/sum(gm_weights);
    distrib = gmdistribution(gm_mean, sigma, gm_weights);
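For readers without Matlab, the Gaussian-mixture sampling above can be mirrored in Python. The function below is an illustrative re-implementation, not the paper's code; note that the Matlab snippet sets a component variance of 5, so the standard deviation is √5.

```python
import numpy as np

def sample_support_points(n, d=3, seed=0):
    """Draw n support points in R^d whose entries come from a 1-D Gaussian
    mixture with means (-20, -10, 0, 10, 20), common variance 5, and random
    mixture weights, mirroring the Matlab gmdistribution snippet."""
    rng = np.random.default_rng(seed)
    gm_mean = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
    sigma = np.sqrt(5.0)  # gmdistribution takes variances; std is sqrt(5)
    gm_weights = rng.random(gm_mean.size)
    gm_weights /= gm_weights.sum()
    # Sample a mixture component independently for every coordinate.
    comp = rng.choice(gm_mean.size, size=(n, d), p=gm_weights)
    return gm_mean[comp] + sigma * rng.standard_normal((n, d))
```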
For each k ∈ [m], we generate the weight vector (u_1^k, u_2^k, ..., u_n^k) whose entries are drawn from the uniform distribution on the interval (0, 1), and normalize it such that Σ_{i=1}^n u_i^k = 1. After generating all {μ_k}_{k=1}^m, we use the k-means method to choose n points from {x_i^k | i ∈ [n], k ∈ [m]} to be the support points of the barycenter. Finally, we generate the weight vector (ω_1, ω_2, ..., ω_m) whose entries are drawn from the uniform distribution on the interval (0, 1) and normalize it such that Σ_{k=1}^m ω_k = 1. (In our experiments, we call the Matlab function kmeans, which is built into the machine learning toolbox.)

We present some preliminary numerical results in Figures 2 and 3. Given n = 100, we evaluate the performance of the FastIBP, IBP, and BADMM algorithms and Gurobi by varying m ∈ {50, 100, 150, 200}, and use the same setup to compare the proximal variants of FastIBP and IBP. We use the proximal framework [Kroshnin et al., 2019] with the same parameter settings as provided in their paper. As indicated in Figure 2, the FastIBP algorithm performs better than the BADMM and IBP algorithms in the sense that it consistently returns an objective value closer to that of Gurobi in less computational time. More specifically, IBP converges very fast when η = 0.01 but suffers from a crude solution with a poor objective value; BADMM takes much more time with an unsatisfactory objective value, and is not provably convergent in theory; Gurobi is highly optimized and can solve problems of relatively small size very efficiently. However, when the problem size becomes larger, Gurobi takes much more time.
As an example, for the case where (m, n) = (200, 100), Gurobi takes much more time than the FastIBP algorithm with η = 0.001, while the latter keeps a relatively small normalized objective value. As indicated in Figure 3, the proximal variant of the FastIBP algorithm also outperforms that of the IBP algorithm in terms of objective value without sacrificing time efficiency. To facilitate the reader, we present the averaged results from 10 independent trials with the FastIBP, IBP, and BADMM algorithms and Gurobi in Table 1. Note that we implement the rounding scheme after each algorithm (except Gurobi), so the entries in "feasibility" are zero up to numerical errors for most medium-size problems.
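The rounding scheme mentioned above is not spelled out in this section; a standard choice, in the spirit of Altschuler et al. [2017, Algorithm 2], projects a nonnegative matrix onto the transportation polytope U(r, c). The Python sketch below is written under that assumption and is not the paper's implementation.

```python
import numpy as np

def round_to_feasible(X, r, c):
    """Round a nonnegative matrix X onto U(r, c) = {X >= 0 : X 1 = r,
    X^T 1 = c}, following the Altschuler et al. style rounding:
    scale rows, scale columns, then fix the residual with a rank-one update."""
    X = X.copy()
    row = X.sum(axis=1)
    X *= np.minimum(r / np.maximum(row, 1e-300), 1.0)[:, None]
    col = X.sum(axis=0)
    X *= np.minimum(c / np.maximum(col, 1e-300), 1.0)[None, :]
    # After the two scalings the marginals are below target, so the
    # residuals are nonnegative and share the same total mass.
    err_r = r - X.sum(axis=1)
    err_c = c - X.sum(axis=0)
    if err_r.sum() > 0:
        X += np.outer(err_r, err_c) / err_r.sum()
    return X
```

After this step the row and column marginals match (r, c) exactly, which is why the "feasibility" entries reported for the rounded solutions are zero up to numerical error.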
Figure 4: Preliminary results with Gurobi and the FastIBP algorithm (η = 0.001): the average running time (in seconds), together with the normalized objective value, feasibility violation, and iteration count, for n = 100 and m ∈ {500, 1000, 1500, 2000}.

To further compare the performance of Gurobi and the FastIBP algorithm, we conduct the experiment with n = 100 and a varying number of marginals m ∈ {500, 1000, 1500, 2000}. We fix Tol_fibp = 10^- without setting a maximum iteration number. Figure 4 shows the average running time taken by the two algorithms over 5 independent trials. We see that the FastIBP algorithm is competitive with Gurobi in terms of objective value and feasibility violation. In terms of computational time, the running time of the FastIBP algorithm increases linearly with the number of marginals, while that of Gurobi increases much more rapidly. Compared to the results for Gurobi reported previously [Yang et al., 2018, Ge et al., 2019], the feasibility violation in our paper is better but the computational time grows much faster. This makes sense since we run the dual simplex algorithm, which iterates over feasible solutions but is more computationally expensive than the interior-point algorithm. Figure 4 demonstrates that the structure of the FS-WBP is not favorable to the dual simplex algorithm, confirming our computational hardness results in Section 3.
To better visualize the quality of the approximate barycenters obtained by each algorithm, we follow Cuturi and Doucet [2014] on the MNIST dataset [LeCun et al., 1998]. We randomly select 50 images for each digit (1∼9) and resize each image to ζ times its original size of 28×28, where ζ is drawn uniformly at random from [0.5, 2]. We then place each resized image at a random location in a 56×56 blank image and normalize the resulting image so that all pixel values add up to 1. Each image can thus be viewed as a discrete distribution supported on a grid. Additionally, we set the weight vector (ω_1, ω_2, ..., ω_m) such that ω_k = 1/m for all k ∈ [m]. We apply the FastIBP algorithm (η = 0.001) and the IBP algorithm (η = 0.001) to compute approximate barycenters supported on the 56×56 grid. For a fair comparison, we do not implement the convolutional technique [Solomon et al., 2015]. (The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/.)
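The image preprocessing just described (random rescaling, random placement on a 56×56 canvas, normalization to unit mass) can be sketched in Python. The helper name, the interpolation scheme, and the scaling interval [0.5, 2] are assumptions, since the section does not pin them down.

```python
import numpy as np

def to_distribution(img, rng, canvas=56):
    """Rescale a 28x28 digit by a random factor, paste it at a random
    location of a blank canvas, and normalize the pixel mass to 1.
    Nearest-neighbour resizing keeps the sketch dependency-free."""
    zeta = rng.uniform(0.5, 2.0)                 # assumed scaling interval
    new = max(1, min(canvas, int(round(28 * zeta))))
    idx = np.minimum((np.arange(new) / zeta).astype(int), 27)
    resized = img[np.ix_(idx, idx)]              # nearest-neighbour resize
    out = np.zeros((canvas, canvas))
    i = rng.integers(0, canvas - new + 1)
    j = rng.integers(0, canvas - new + 1)
    out[i:i + new, j:j + new] = resized
    return out / out.sum()                       # a distribution on the grid
```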
Table 1: The average normalized objective value, feasibility violation, number of iterations, and computational time (in seconds) of Gurobi ("g"), BADMM ("b"), IBP ("i1", "i2"), and FastIBP ("f1", "f2") from 10 independent trials on synthetic data with varying (m, n).

Table 2: Approximate barycenters obtained by running FastIBP (η = 0.001) and IBP (η = 0.001) for 100s, 200s, 400s, and 800s.

The approximate barycenters obtained by the FastIBP and IBP algorithms are presented in Table 2. It can be seen that the FastIBP algorithm provides a "sharper" approximate barycenter than IBP when η = 0.001 is set for both. This demonstrates the good quality of the solution obtained by our algorithm.
In this paper, we study the computational hardness of solving the fixed-support Wasserstein barycenter problem (FS-WBP) and prove that the FS-WBP in the standard linear programming form is not a minimum-cost flow (MCF) problem when m ≥ n ≥ 3. Our results suggest that the direct application of network flow algorithms to the FS-WBP in standard LP form is inefficient, shedding light on the practical performance of various existing algorithms, which are developed based on problem reformulations of the FS-WBP. Moreover, we propose a deterministic variant of the iterative Bregman projection (IBP) algorithm, namely FastIBP, and prove that its complexity bound is $\widetilde{O}(mn^{7/3}\varepsilon^{-4/3})$. This bound is better than the complexity bound of $\widetilde{O}(mn^{2}\varepsilon^{-2})$ of the IBP algorithm in terms of ε, and than that of $\widetilde{O}(mn^{5/2}\varepsilon^{-1})$ of other accelerated algorithms in terms of n. Experiments on synthetic and real datasets demonstrate the favorable performance of the FastIBP algorithm in practice.
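Ignoring constants and logarithmic factors, the three complexity bounds can be compared numerically; the helper below is purely illustrative and simply evaluates the leading terms.

```python
def bounds(m, n, eps):
    """Leading terms of the three complexity bounds (constants and
    logarithmic factors dropped)."""
    return {
        "FastIBP": m * n ** (7 / 3) * eps ** (-4 / 3),
        "IBP": m * n ** 2 * eps ** (-2),
        "accelerated": m * n ** (5 / 2) * eps ** (-1),
    }
```

For a fixed n, shrinking ε favors FastIBP over IBP (the ratio scales as ε^{2/3}); for a fixed ε, growing n favors FastIBP over the accelerated algorithms (the ratio scales as n^{-1/6}).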
We would like to thank Lei Yang for very helpful discussions on the experiments with the MNIST dataset. Xi Chen is supported by the National Science Foundation via Grant IIS-1845444. This work is supported in part by the Mathematical Data Science program of the Office of Naval Research under grant number N00014-18-1-2764.

References
M. Agueh and G. Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011. (Cited on pages 1, 3, and 4.)

J. Altschuler, J. Weed, and P. Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In NIPS, pages 1964–1974, 2017. (Cited on pages 2 and 17.)

E. Anderes, S. Borgwardt, and J. Miller. Discrete Wasserstein barycenters: optimal transport for discrete data. Mathematical Methods of Operations Research, 84(2):389–409, 2016. (Cited on pages 2 and 12.)

J-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015. (Cited on pages 2, 3, 4, 15, 16, and 25.)

C. Berge. The Theory of Graphs. Courier Corporation, 2001. (Cited on page 10.)

J. Blanchet, A. Jambulapati, C. Kent, and A. Sidford. Towards optimal running times for optimal transport. ArXiv Preprint: 1810.07717, 2018. (Cited on page 2.)

N. Bonneel, J. Rabin, G. Peyré, and H. Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015. (Cited on pages 1 and 2.)

N. Bonneel, G. Peyré, and M. Cuturi. Wasserstein barycentric coordinates: histogram regression using optimal transport. ACM Transactions on Graphics, 35(4):71:1–71:10, 2016. (Cited on page 1.)

S. Borgwardt and S. Patterson. On the computational complexity of finding a sparse Wasserstein barycenter. ArXiv Preprint: 1910.07568, 2019. (Cited on page 2.)

S. Borgwardt and S. Patterson. Improved linear programs for discrete barycenters. Informs Journal on Optimization, 2(1):14–33, 2020. (Cited on page 2.)

G. Buttazzo, L. De Pascale, and P. Gori-Giorgi. Optimal-transport formulation of electronic density-functional theory. Physical Review A, 85(6):062502, 2012. (Cited on page 1.)

G. Carlier and I. Ekeland. Matching for teams. Economic Theory, 42(2):397–418, 2010. (Cited on page 1.)

G. Carlier, A. Oberman, and E. Oudet. Numerical methods for matching for teams and Wasserstein barycenters. ESAIM: Mathematical Modelling and Numerical Analysis, 49(6):1621–1642, 2015. (Cited on page 2.)
P-A. Chiappori, R. J. McCann, and L. P. Nesheim. Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Economic Theory, 42(2):317–354, 2010. (Cited on page 1.)

S. Claici, E. Chien, and J. Solomon. Stochastic Wasserstein barycenters. In ICML, pages 998–1007, 2018. (Cited on page 2.)

C. Cotar, G. Friesecke, and C. Klüppelberg. Density functional theory and optimal transportation with Coulomb cost. Communications on Pure and Applied Mathematics, 66(4):548–599, 2013. (Cited on page 1.)

T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012. (Cited on page 23.)

M. Cuturi. Sinkhorn distances: lightspeed computation of optimal transport. In NIPS, pages 2292–2300, 2013. (Cited on pages 2 and 5.)

M. Cuturi and A. Doucet. Fast computation of Wasserstein barycenters. In ICML, pages 685–693, 2014. (Cited on pages 1, 2, 4, 5, and 28.)

M. Cuturi and G. Peyré. A smoothed dual approach for variational Wasserstein problems. SIAM Journal on Imaging Sciences, 9(1):320–343, 2016. (Cited on pages 2 and 15.)

M. Cuturi and G. Peyré. Semidual regularized optimal transport. SIAM Review, 60(4):941–965, 2018. (Cited on pages 2 and 5.)

P. Dvurechenskii, D. Dvinskikh, A. Gasnikov, C. Uribe, and A. Nedich. Decentralize and randomize: faster algorithm for Wasserstein barycenters. In NeurIPS, pages 10783–10793, 2018. (Cited on page 2.)

P. Dvurechensky, A. Gasnikov, and A. Kroshnin. Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn's algorithm. In ICML, pages 1366–1375, 2018. (Cited on page 2.)

O. Fercoq and P. Richtárik. Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 25(4):1997–2023, 2015. (Cited on page 16.)

W. Gangbo and A. Swiech. Optimal maps for the multidimensional Monge-Kantorovich problem. Communications on Pure and Applied Mathematics, 51(1):23–45, 1998. (Cited on page 1.)

D. Ge, H. Wang, Z. Xiong, and Y. Ye. Interior-point methods strike back: solving the Wasserstein barycenter problem. In NeurIPS, pages 6891–6902, 2019. (Cited on pages 2, 4, 15, 26, and 28.)

A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. In NIPS, pages 3440–3448, 2016. (Cited on page 2.)

A. Ghouila-Houri. Caractérisation des matrices totalement unimodulaires. Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences (Paris), 254:1192–1194, 1962. (Cited on page 10.)
S. Guminov, P. Dvurechensky, N. Tupitsa, and A. Gasnikov. Accelerated alternating minimization, accelerated Sinkhorn's algorithm and accelerated iterative Bregman projections. ArXiv Preprint: 1906.03622, 2019. (Cited on pages 2, 3, 5, 6, 15, 16, and 25.)

N. Ho, X. L. Nguyen, M. Yurochkin, H. H. Bui, V. Huynh, and D. Phung. Multilevel clustering via Wasserstein means. In ICML, pages 1501–1509. JMLR.org, 2017. (Cited on page 1.)

A. Jambulapati, A. Sidford, and K. Tian. A direct Õ(1/ε) iteration parallel algorithm for optimal transport. In NeurIPS, pages 11355–11366, 2019. (Cited on page 2.)

A. Kroshnin, N. Tupitsa, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, and C. Uribe. On the complexity of approximating Wasserstein barycenters. In ICML, pages 3530–3540, 2019. (Cited on pages 2, 3, 5, 9, 15, 17, 18, 22, 23, 25, and 27.)

N. Lahn, D. Mulchandani, and S. Raghvendra. A graph theoretic additive approximation of optimal transport. In NeurIPS, pages 13836–13846, 2019. (Cited on page 2.)

T. Le, V. Huynh, N. Ho, D. Phung, and M. Yamada. On scalable variant of Wasserstein barycenter. ArXiv Preprint: 1910.04483, 2019. (Cited on page 2.)

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. (Cited on page 28.)

T. Lin, N. Ho, M. Cuturi, and M. I. Jordan. On the complexity of approximating multimarginal optimal transport. ArXiv Preprint: 1910.00152, 2019a. (Cited on pages 2, 5, and 25.)

T. Lin, N. Ho, and M. Jordan. On efficient optimal transport: an analysis of greedy and accelerated mirror descent algorithms. In ICML, pages 3982–3991, 2019b. (Cited on pages 2 and 9.)

T. Lin, N. Ho, and M. I. Jordan. On the efficiency of the Sinkhorn and Greenkhorn algorithms and their acceleration for optimal transport. ArXiv Preprint: 1906.01437, 2019c. (Cited on page 2.)

E. Munch, K. Turner, P. Bendich, S. Mukherjee, J. Mattingly, J. Harer, et al. Probabilistic Fréchet means for time-varying persistence diagrams. Electronic Journal of Statistics, 9(1):1173–1204, 2015. (Cited on page 1.)

Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005. (Cited on pages 6 and 9.)

Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012. (Cited on page 16.)

Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013. (Cited on page 16.)

Y. Nesterov and S. U. Stich. Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization, 27(1):110–123, 2017. (Cited on page 16.)

G. Peyré and M. Cuturi. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019. (Cited on pages 1, 2, and 4.)

G. Puccetti, L. Rüschendorf, and S. Vanduffel. On the computation of Wasserstein barycenters. Journal of Multivariate Analysis, 176:104581, 2020. (Cited on page 2.)
K. Quanrud. Approximating optimal transport with linear programs. In SOSA. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. (Cited on page 2.)

J. Rabin, G. Peyré, J. Delon, and M. Bernot. Wasserstein barycenter and its application to texture mixing. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 435–446. Springer, 2011. (Cited on pages 1 and 2.)

R. T. Rockafellar. Convex Analysis, volume 28. Princeton University Press, 1970. (Cited on page 5.)

B. Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019. (Cited on page 30.)

J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34(4):1–11, 2015. (Cited on page 28.)

S. Srivastava, C. Li, and D. B. Dunson. Scalable Bayes via barycenter in Wasserstein space. Journal of Machine Learning Research, 19(1):312–346, 2018. (Cited on page 1.)

M. Staib, S. Claici, J. M. Solomon, and S. Jegelka. Parallel streaming Wasserstein barycenters. In NIPS, pages 2647–2658, 2017. (Cited on page 2.)

A. Trouvé and L. Younes. Local geometry of deformable templates. SIAM Journal on Mathematical Analysis, 37(1):17–59, 2005. (Cited on page 1.)

C. A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, and A. Nedić. Distributed computation of Wasserstein barycenters over networks. In CDC, pages 6544–6549. IEEE, 2018. (Cited on page 2.)

C. Villani. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008. (Cited on pages 1, 3, and 4.)

L. Yang, J. Li, D. Sun, and K. Toh. A fast globally linearly convergent algorithm for the computation of Wasserstein barycenters. ArXiv Preprint: 1809.04249, 2018. (Cited on pages 2, 4, 25, 26, and 28.)

J. Ye, P. Wu, J. Z. Wang, and J. Li. Fast discrete distribution clustering using Wasserstein barycenter with sparse support. IEEE Transactions on Signal Processing, 65(9):2317–2332, 2017. (Cited on pages 2, 25, and 26.)