Optimal transportation of processes with infinite Kantorovich distance. Independence and symmetry
ALEXANDER V. KOLESNIKOV AND DANILA A. ZAEV
Abstract.
We consider probability measures on R^∞ and study optimal transportation mappings for the case of infinite Kantorovich distance. Our examples include 1) quasi-product measures, 2) measures with certain symmetry properties, in particular, exchangeable and stationary measures. We show in the latter case that the existence problem for optimal transportation is closely related to ergodicity of the target measure. In particular, we prove existence of the symmetric optimal transportation for a certain class of stationary Gibbs measures.

Introduction
Let us consider two Borel probability measures µ, ν on R^d. The central result of the finite-dimensional optimal transportation theory (the Brenier theorem) establishes, under fairly general assumptions, the existence of the corresponding optimal transportation mapping T, which can be characterized by the following properties:

1) T = ∇ϕ, where ϕ is a convex function;
2) ν is the image of µ under T: ν = µ ◦ T^{-1}.

The mapping T exists, in particular, when both measures are absolutely continuous and have finite second moments. The second assumption can be replaced by the weaker assumption of finiteness of the corresponding Kantorovich distance W_2(µ, ν), but this does not make much difference for finite-dimensional problems. However, the difference becomes essential in the infinite-dimensional case.

It is well known that the optimal transportation mapping T solves the so-called Monge problem, meaning that T minimizes the functional

∫_{R^d} ||r(x) − x||² dµ(x)

among all mappings r: R^d → R^d pushing forward µ onto ν; here ||·|| is the standard Euclidean norm. The corresponding minimal value coincides with the squared Kantorovich distance W_2²(µ, ν).

Key words and phrases.
Monge–Kantorovich problem, optimal transportation, Kantorovich duality, Gaussian measures, Gibbs measures, log-concave measures, exchangeability, stationarity, ergodicity, transportation inequalities, entropy, Kullback–Leibler distance.

The first named author was supported by RFBR project 14-01-00237 and the DFG project CRC 701. This study (research grant No 14-01-0056) was supported by The National Research University–Higher School of Economics' Academic Fund Program in 2014/2015. The second author was partially supported by AG Laboratory NRU-HSE, RF government grant, ag. 11.G34.31.0023.

Now let us consider a couple of measures on an infinite-dimensional linear space X; to avoid unessential technicalities, we will assume everywhere that X = R^∞. We deal throughout with the standard Hilbert norm

||x|| := ||x||_{l²} = (∑_{i=1}^∞ x_i²)^{1/2},

which takes infinite value almost everywhere with respect to most of the measures we are interested in. What is a natural analog of the Brenier theorem in this setting? To understand the situation better, let us consider the Gaussian model.

Example 1.1. Let γ = ∏_{i=1}^∞ γ_i = ∏_{i=1}^∞ (1/√(2π)) e^{−x_i²/2} dx_i be the standard Gaussian product measure on R^∞ and H = l² be the corresponding Cameron–Martin space. More generally, one can consider any abstract Wiener space. The optimal transportation problem is well understood for the case of measures µ and ν which are absolutely continuous with respect to γ. The most general results were obtained in [12] (another approach has been developed in [15]). In particular, for a broad class of probability measures f · γ absolutely continuous w.r.t. γ there exists a transportation mapping T(x) = x + ∇ϕ(x) minimizing the cost

∫ ||T(x) − x||²_{l²} dγ

and pushing forward γ onto f · γ. Analogously, there exists a transportation mapping pushing forward f · γ onto γ.
The gradient operator ∇ is understood with respect to the ⟨·,·⟩_{l²} scalar product. It is known (this follows from the so-called Talagrand transportation inequality) that under the assumption ∫ f log f dγ < ∞ the Kantorovich distance between γ and f · γ is finite:

W_2²(γ, f · γ) = ∫ ||T(x) − x||²_{l²} dγ < ∞.

In particular, ∇ϕ(x) ∈ l² for γ-almost all x. For more on optimal transportation on the Wiener space, the corresponding Monge–Ampère equation, regularity issues, and transportation on other infinite-dimensional spaces see [5], [6], [8], [11], and [10].

In this paper we study the situation when the Kantorovich distance between the measures is a priori infinite. This makes it impossible, in general, to understand T as a solution to a certain minimization problem. Nevertheless, there are many good candidates to be called "optimal transportation" in many particular cases. The following example motivates our study.

Example 1.2.
1) Let µ = ∏_{i=1}^∞ µ_i(dx_i), ν = ∏_{i=1}^∞ ν_i(dx_i) be product probability measures. Assume that all µ_i have densities. Then there exists a mass transportation mapping T pushing forward µ onto ν which has the form

T(x) = (T_1(x_1), ..., T_i(x_i), ...),

where T_i(x_i) is the one-dimensional optimal transportation pushing forward µ_i onto ν_i.

2) Let us consider the Gaussian measure µ which is the push-forward image of the standard Gaussian measure γ under a linear mapping T(x) = Ax with A symmetric and positive. It is well known (and can be obtained from the law of large numbers) that γ and µ are mutually singular even in the simplest case A = 2 · Id. Nevertheless, T is "optimal" in the sense that it is linear and given by a positive symmetric operator. Heuristically, T(x) = (1/2) ∇⟨Ax, x⟩.

It is clear that in both cases T cannot be obtained as a minimizer of a functional of the type ∫ ||T(x) − x||²_{l²} dµ.

We state now the central problem of this paper.

Problem 1.3.
Let µ and ν be two probability measures on R^∞. When does there exist a transportation mapping T pushing forward µ onto ν which is "optimal" for the cost function c(x, y) = ||x − y||²_{l²}?

In this paper we deal with two model situations.

Quasi-product measures.
We assume that both measures have densities with respect to product probability measures:

µ = f · µ₀, ν = g · ν₀, where µ₀ = ∏_{i=1}^∞ µ_i(dx_i), ν₀ = ∏_{i=1}^∞ ν_i(dx_i).

Then the corresponding "optimal transportation" is a small perturbation of the diagonal mapping considered in Example 1.2.
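The coordinatewise construction behind Example 1.2 can be sketched numerically: in each coordinate the one-dimensional optimal map is the monotone rearrangement T_i = F_{ν_i}^{-1} ∘ F_{µ_i}. A minimal sketch (the Gaussian factors, sample sizes, and the helper `optimal_map_1d` are our own illustration, not from the paper):

```python
import numpy as np

def optimal_map_1d(sample_mu, sample_nu, x):
    """Monotone (hence optimal for quadratic cost) 1D transport
    T = F_nu^{-1} o F_mu, estimated from samples by composing the
    empirical CDF of mu with the empirical quantile function of nu."""
    u = np.searchsorted(np.sort(sample_mu), x, side="right") / len(sample_mu)
    return np.quantile(sample_nu, np.clip(u, 0.0, 1.0))

rng = np.random.default_rng(0)
# One factor: mu_i = N(0,1), nu_i = N(1,4); the exact map is T_i(x) = 2x + 1.
mu_s = rng.normal(0.0, 1.0, 200_000)
nu_s = rng.normal(1.0, 2.0, 200_000)
x = np.array([-1.0, 0.0, 1.0])
print(optimal_map_1d(mu_s, nu_s, x))   # approximately [-1, 1, 3]
```

The diagonal map then applies such a T_i in every coordinate; its construction requires no finiteness of the l²-cost between the two product measures.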
Symmetric measures.
It is possible to give a meaning to the Monge–Kantorovich optimization problem if we restrict ourselves to a certain class of symmetric measures. In this paper we consider two types of symmetry: exchangeable measures (invariant with respect to finite permutations of coordinates) and stationary measures on R^∞ (invariant with respect to shifts of coordinates). Note that ||x − y||²_{l²} is symmetric with respect to both types of symmetry. More generally, let G be a group of linear operators which acts on X = Y = R^∞ and on X × Y: x → gx, (x, y) → (gx, gy), g ∈ G, and preserves the cost function c(x, y). We assume that every basis vector e_j can be obtained from any other e_i by the action of this group: there exists g ∈ G such that e_i = g e_j. Note that under these assumptions all the coordinates are identically distributed. This leads us to the following definition: given G-invariant marginals µ and ν, we call π an optimal (symmetric, invariant) solution to the Monge–Kantorovich problem if π solves the Monge–Kantorovich problem

∫ ||x − y||² dπ → min

among all measures with marginals µ, ν which are invariant with respect to G. If there exists a mapping T such that its graph Γ = {(x, T(x))} satisfies π(Γ) = 1, we say that T is an optimal transportation mapping pushing forward µ onto ν.

The following counter-example, however, demonstrates that the optimal transportation may fail to exist for a quite simple reason.

Example 1.4. Let µ = γ be the standard Gaussian measure on R^∞ and ν = (1/2)(γ + γ₂) the average of γ and its homothetic image γ₂ = γ ◦ S^{-1}, where S(x) = 2x. There is no mass transportation T of µ to ν which commutes with every cylindrical rotation. Indeed, any mapping of this type must have the form T(x) = g(x)(x_1, x_2, ...) = g(x) · x, where g is invariant with respect to any "rotation", in particular, with respect to any coordinate permutation. But any function g of this type is constant γ-a.e.
This is a corollary of the Hewitt–Savage 0–1 law. Recall that a measure µ is called ergodic with respect to a group action G if for every G-invariant set A one has either µ(A) = 1 or µ(A) = 0. It follows directly from the definition that there does not exist a bijective mass transportation T pushing forward µ onto ν such that T ◦ g = g ◦ T for every g ∈ G, provided µ is G-ergodic but ν is not. This observation leads to the following problem.

Problem 1.5.
Let G be a group of linear operators acting on R^∞ and preserving the l²-distance (model example: the group of shifts). Let µ, ν be ergodic G-invariant measures. When does there exist a transportation T: R^∞ → R^∞ pushing forward µ onto ν, which commutes with G and minimizes the Monge functional T → ∫_{R^∞} ||T(x) − x||² dµ?

Trivially, ergodicity by itself is not sufficient for an affirmative answer to this problem. In addition to it, we need certain infinite-dimensional analogs of "absolute continuity" for the source measure µ.

We believe that the symmetric transportation problem must have deep and very interesting relations with ergodic theory. The second named author studied the interplay between ergodic decompositions and transportation theory in [26]. Another interesting connection has been established in [3]. It was shown that the Birkhoff ergodic theorem implies equivalence between optimality and the so-called cyclical monotonicity property. Related problems on optimal transportation in symmetric settings have been considered in [22] (stationary processes), in [23] (symmetric measures on graphs), and in [19], [20], [9] (ergodic theory). Transportation problems with symmetries have been studied in [13], [21]. Further development of the duality theory for the transportation problem with linear restrictions has been obtained in [25].

The paper is organized as follows: in Section 2 we give preliminaries in transportation theory and ergodic theory, and recall some important results on log-concave measures. In Section 3 we establish sufficient conditions for existence of optimal transportation mappings which are obtained as a.e.-limits of finite-dimensional approximations. Applications of this result are obtained in Section 4. Here we prove existence of optimal transportation for a couple of measures having densities with respect to product measures. In Section 5 we discuss the invariant optimal transportation problem, consider examples, and prove some basic facts.
In Section 6 we briefly discuss Kantorovich duality for the problem which is invariant with respect to the action of a group. In Section 7 we construct a non-trivial example of a symmetric optimal transportation T. Namely, we establish sufficient conditions for existence of T pushing forward a stationary measure into the standard Gaussian measure. Finally, we apply this result to a certain class of Gibbs measures.

2. Preliminaries
2.1. Optimal transportation problem. Kantorovich problem.
Given two probability measures µ and ν on the spaces X and Y respectively, and a cost function c: X × Y → R ∪ {+∞}, we are looking for the minimum of the functional

W_2²(µ, ν) = inf { ∫ ||x − y||² dm : m ∈ P(µ, ν) }

on the space P(µ, ν) of probability measures with fixed projections: Pr_X m = µ, Pr_Y m = ν. In the classical setup X = Y = R^n, c = |x − y|², the solution m is supported on the graph of a mapping T: R^n → R^n:

m(Γ) = 1, where Γ = {(x, T(x)), x ∈ R^n}

(see [1], [7], [24]). The functional W_2(µ, ν) is a distance on the space of probability measures. In what follows we call it the Kantorovich distance. The mapping T is called the optimal transportation of µ onto ν.

Another well-known fact which will be used throughout the paper is the following relation, called the Kantorovich duality:

W_2²(µ, ν) = −2 J(ϕ, ψ), where J(ϕ, ψ) = inf { ∫ (ϕ(x) − |x|²/2) dµ + ∫ (ψ(y) − |y|²/2) dν : ϕ(x) + ψ(y) ≥ ⟨x, y⟩ },

and the infimum is taken over couples of integrable Borel functions ϕ(x), ψ(y). The function ϕ in the dual problem coincides with the potential generating the transportation mapping T = ∇ϕ.

2.2. Ergodic decomposition.
Given a Borel transformation S: X → X of the space X, we call a Borel probability measure µ ergodic if any S-invariant measurable set A has the property µ(A) = 1 or µ(A) = 0. A similar terminology is used if instead of a single mapping S we deal with a family G of transformations. The ergodic G-invariant measures are extreme points of the set of all G-invariant measures; hence any G-invariant measure can be represented as an average of G-invariant ergodic measures. The famous de Finetti theorem establishes a decomposition of this type for the class of exchangeable measures, i.e. measures invariant with respect to permutations of a finite number of coordinates.

Theorem 2.1.
Let P be the space of Borel probability measures on R equipped with the weak topology. Then for every Borel exchangeable measure µ on R^∞ there exists a Borel probability measure Π on P such that

µ(B) = ∫ m^∞(B) Π(dm)

for every Borel set B ⊂ R^∞.

Yet another example of an ergodic decomposition where a precise description is possible is given by rotationally invariant measures (see Example 5.9).

2.3. Log-concave measures and functional inequalities.
We recall that a probability measure µ on R^n is called log-concave if it has the form e^{−V} · H^k|_L, where H^k is the k-dimensional Hausdorff measure, k ∈ {0, 1, ..., n}, L is an affine subspace, and V is a convex function.

In what follows we consider uniformly log-concave measures. Roughly speaking, these are measures with a potential V satisfying

V(x) − V(y) − ⟨∇V(y), x − y⟩ ≥ (K/2) |x − y|²,

which is equivalent to D²V ≥ K · Id in the smooth (finite-dimensional) case. Here K is a positive constant. More precisely, we say that a probability measure µ is K-uniformly log-concave (K > 0) if for any ε > 0 the measure Z_ε e^{(K−ε)|x|²/2} · µ is log-concave for a suitable renormalization factor Z_ε. It is well known (C. Borell) that the projections of log-concave measures are log-concave (this is in fact a corollary of the Brunn–Minkowski theorem). It can be easily checked that uniform log-concavity is preserved by projections as well. We can extend this notion to the infinite-dimensional case. Namely, we call a probability measure µ on a locally convex space X log-concave (K-uniformly log-concave with K > 0) if its images µ ◦ l^{-1}, l ∈ X*, under linear continuous functionals are all log-concave (K-uniformly log-concave with K > 0).
Theorem 2.2. (Generalized Talagrand inequality.) Let m be a K-uniformly log-concave probability measure with some K > 0. Then for any couple of probability measures µ = e^{−V} dx, ν = e^{−W} dx and the corresponding optimal mappings ∇ϕ_µ, ∇ϕ_ν pushing forward µ, ν onto m respectively, one has the following estimate:

Ent_ν(µ/ν) = ∫ log (dµ/dν) dµ = ∫ (W − V) dµ ≥ (K/2) ∫ |∇ϕ_µ − ∇ϕ_ν|² dµ.

Another result used in the paper is Caffarelli's contraction theorem. Here is the version from [16] (see also [17]).
Theorem 2.3. (Caffarelli contraction theorem).
Let ∇Φ be the optimal transportation of the probability measure µ = e^{−V} dx onto ν = e^{−W} dx. Assume that for some positive c, C one has

D²V ≤ C · Id, D²W ≥ c · Id.

Then ∇Φ is Lipschitz with ||∇Φ||_Lip ≤ √(C/c).

The quantity Ent_ν(µ/ν) is called the relative entropy or the Kullback–Leibler distance between µ and ν.

3. Sufficient condition for existence of limits of finite-dimensional optimal mappings
3.1. Preliminary finite-dimensional estimates.
Let µ and ν be probability measures on R^d and T(x) = ∇ϕ(x) be the optimal transportation mapping pushing forward µ onto ν. Let us denote by µ_v the image of µ under the shift x → x + v, v ∈ R^d. It will be assumed throughout that µ_v has a density with respect to µ: dµ_v/dµ = e^{β_v}.

Lemma 3.1. For every p, q ≥ 1 with 1/p + 1/q = 1, ε ≥ 0, and e ∈ R^d

∫ |ϕ(x + te) − ϕ(x)|^{1+ε} dµ ≤ t^{1+ε} || |⟨x, e⟩|^{1+ε} ||_{L^p(ν)} · sup_{0 ≤ s ≤ t} || e^{β_{se}} ||_{L^q(µ)},

∫ (ϕ(x + te) − ϕ(x) − t ∂_e ϕ(x)) dµ ≤ t || ⟨x, e⟩ ||_{L^p(ν)} · sup_{0 ≤ s ≤ t} || e^{β_{se}} − 1 ||_{L^q(µ)}.

Proof.
One has ϕ(x + te) − ϕ(x) = ∫_0^t ∂_e ϕ(x + se) ds. Hence

∫ |ϕ(x + te) − ϕ(x)|^{1+ε} dµ ≤ t^ε ∫ ∫_0^t |∂_e ϕ|^{1+ε}(x + se) ds dµ = t^ε ∫_0^t [ ∫ |∂_e ϕ|^{1+ε} e^{β_{se}} dµ ] ds
≤ t^{1+ε} || |∂_e ϕ|^{1+ε} ||_{L^p(µ)} · sup_{0 ≤ s ≤ t} || e^{β_{se}} ||_{L^q(µ)} = t^{1+ε} || |⟨x, e⟩|^{1+ε} ||_{L^p(ν)} · sup_{0 ≤ s ≤ t} || e^{β_{se}} ||_{L^q(µ)}.

Applying the same arguments one gets
∫ (ϕ(x + te) − ϕ(x) − t ∂_e ϕ(x)) dµ = ∫ ∫_0^t (∂_e ϕ(x + se) − ∂_e ϕ(x)) ds dµ = ∫ [ ∫_0^t (e^{β_{se}} − 1) ds ] ∂_e ϕ(x) dµ
≤ t^{1/p} || ∂_e ϕ ||_{L^p(µ)} [ ∫ ∫_0^t |e^{β_{se}} − 1|^q ds dµ ]^{1/q}.

The desired estimate follows from the change of variables formula and trivial uniform bounds. □
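The shift densities e^{β_v} entering Lemma 3.1 are explicit in the Gaussian case: for the standard one-dimensional Gaussian measure, β_v(x) = vx − v²/2 (Cameron–Martin). A Monte Carlo sketch, our own illustration with arbitrary sample size and shifts, of the quantity ∫ |e^{β_{se}} − 1|^q dµ for q = 2, which here has the closed form e^{v²} − 1 and vanishes as the shift does:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.normal(size=2_000_000)        # samples of mu = N(0,1)

def beta(v, x):
    # Cameron-Martin density of the shifted measure mu_v w.r.t. mu = N(0,1):
    # dmu_v/dmu (x) = exp(v*x - v^2/2), i.e. beta_v(x) = v*x - v^2/2.
    return v * x - v * v / 2.0

for v in (0.5, 0.1, 0.01):
    lhs = np.mean((np.exp(beta(v, xs)) - 1.0) ** 2)  # int |e^{beta_v}-1|^2 dmu
    print(v, lhs, np.exp(v * v) - 1.0)               # closed form: e^{v^2}-1
```

In particular E e^{β_v} = 1 (the shifted measure is a probability measure), and the L²-defect above is exactly the quantity controlled by Assumption (A) below with q = 2.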
In addition, we will apply the following elementary lemma.
Lemma 3.2.
Assume that a sequence {T_n} of measurable mappings T_n: R^∞ → R^∞ converges to a mapping T in the following sense: for every e_i

lim_n ⟨T_n, e_i⟩ = ⟨T, e_i⟩ in measure with respect to µ.

Then the measures {µ ◦ T_n^{-1}} converge weakly to µ ◦ T^{-1}.

3.2. Existence theorem.
We consider a couple of Borel probability measures µ and ν on R^∞, where R^∞ is the space of all real sequences: R^∞ = ∏_{i=1}^∞ R. We deal with the standard coordinate system x = (x_1, x_2, ..., x_n, ...) and the standard basis vectors e_i = (δ_{ij})_j. The projection onto the first n coordinates will be denoted by P_n: P_n(x) = (x_1, ..., x_n). We use the notations ||x||, ⟨x, y⟩ for the Hilbert space norm and inner product: ||x||² = ∑_{i=1}^∞ x_i², ⟨x, y⟩ = ∑_{i=1}^∞ x_i y_i. We write IE^n_µ for the conditional expectation with respect to µ and the σ-algebra generated by x_1, ..., x_n. For any product measure P = ∏_{i=1}^∞ p_i(x_i) dx_i its projection P^n = P ◦ P_n^{-1} has the form ∏_{i=1}^n p_i(x_i) dx_i, and the projection (f · P) ◦ P_n^{-1} = f_n · P^n of the measure f · P satisfies f_n = IE^n_P f. Everywhere below we agree that every cylindrical function f = f(x_1, ..., x_n) is extended to R^∞ by the formula x → f_n(P_n x).

It will be assumed throughout the paper that the shifts of µ along any vector v = t e_i are absolutely continuous with respect to µ: dµ_v/dµ = e^{β_v}. In Section 3, moreover, the following assumption holds.

Assumption (A). For every basis vector e = e_i there exist p, q ≥ 1 with 1/p + 1/q = 1 and ε > 0 such that

∫ |⟨x, e⟩|^{(1+ε)p} dν < ∞

and

p(t) = sup_{0 ≤ s ≤ t} ∫ |e^{β_{se}} − 1|^q dµ satisfies lim_{t→0} p(t) = 0.

Let µ_n = µ ◦ P_n^{-1}(x), ν_n = ν ◦ P_n^{-1}(y) be the projections of µ, ν. For every v = t e_i let us set d(µ_n)_v/dµ_n = e^{β^{(n)}_v}. It is easy to check that the projections of µ, ν satisfy Assumption (A).
Lemma 3.3.
For every n ∈ N and every e = e_i one has

∫ |⟨P_n(x), e⟩|^p dν_n ≤ ∫ |⟨x, e⟩|^p dν,  ∫ |e^{β^{(n)}_e} − 1|^q dµ_n ≤ ∫ |e^{β_e} − 1|^q dµ.

Proof.
The first estimate is trivial. To prove the second one, let us note that e^{β^{(n)}_v} = IE^n_µ e^{β_v}. The claim follows from the Jensen inequality and convexity of the function t → |t − 1|^q. □

We denote by π_n the optimal transportation plan for the couple (µ_n, ν_n). Let ϕ_n(x) and ψ_n(y) solve the dual Kantorovich problem. Let us recall that ∇ϕ_n (∇ψ_n) is the optimal transportation mapping sending µ_n to ν_n (ν_n to µ_n). One has

ϕ_n(x) + ψ_n(y) ≥ ⟨P_n x, P_n y⟩

for every x, y. The equality is attained on the support of π_n. In particular,

ϕ_n(x) + ψ_n(∇ϕ_n(x)) = ⟨P_n x, ∇ϕ_n(x)⟩.

It is easy to check that {π_n} is a tight sequence. By the Prokhorov theorem one can extract a weakly convergent subsequence π_{n_k} → π. Note that π_n is not the projection of π. The main result of the section is the following theorem.

Theorem 3.4.
Assume that (A) is fulfilled and, in addition,

F_n(x, y, 0, 0) = ϕ_n(x) + ψ_n(y) − ⟨P_n x, P_n y⟩ → 0

in measure with respect to π. Then there exists a mapping T: R^∞ → R^∞ such that T(x) = y for π-almost all (x, y).

In what follows we will pass several times to subsequences and use for the new subsequences the same index n again, with the agreement that n takes values in another infinite set N' ⊂ N. Let us fix unit vectors e_i, e_j for some i, j ∈ N and consider the following sequence of non-negative functions:

F_n(x, y, t, s) = ϕ_n(x + t e_i) + ψ_n(y + s e_j) − ⟨P_n(x + t e_i), P_n(y + s e_j)⟩

with n > i, n > j.

Lemma 3.5. There exists an L^{1+ε}(π)-weakly convergent subsequence

ϕ_{n_k}(x + t e_i) − ϕ_{n_k}(x) → U(x).

The following relation holds for the limiting function U(x):

| ∫ U(x) dµ − t ∫ ⟨y, e_i⟩ dν | ≤ C t p(t).

Proof.
Taking into account that ∫ F_n(x, y, 0, 0) dπ_n = 0, one obtains

∫ F_n(x, y, t, 0) dπ_n = ∫ F_n(x, y, t, 0) dπ_n − ∫ F_n(x, y, 0, 0) dπ_n ≥ 0.

Note that the right-hand side equals

∫ (F_n(x, y, t, 0) − F_n(x, y, 0, 0)) dπ_n = ∫ [ϕ_n(x + t e_i) − ϕ_n(x) − t⟨y, e_i⟩] dπ_n.

Taking into account that the projection of π_n onto X coincides with µ_n and ϕ_n depends on the first n coordinates, one finally obtains that for n > i the latter is equal to

∫ [ϕ_n(x + t e_i) − ϕ_n(x)] dµ − t ∫ ⟨y, e_i⟩ dν = ∫ [ϕ_n(x + t e_i) − ϕ_n(x) − t ∂_{e_i} ϕ_n(x)] dµ.

It follows from Lemma 3.1, Lemma 3.3, and Assumption (A) that

(1)  | ∫ F_n(x, y, t, 0) dπ_n | ≤ C t p(t).

Since ϕ_n depends on a finite number of coordinates (≤ n), one has

∫ |ϕ_n(x + t e_i) − ϕ_n(x)|^{1+ε} dµ = ∫ |ϕ_n(x + t e_i) − ϕ_n(x)|^{1+ε} dµ_n.

Hence by Lemma 3.1

U_n(x) = ϕ_n(x + t e_i) − ϕ_n(x) ∈ L^{1+ε}(µ)

and, moreover, sup_n ||U_n||_{L^{1+ε}(µ)} < ∞. Thus there exists a function U ∈ L^{1+ε}(µ) such that for some subsequence n_k

ϕ_{n_k}(x + t e_i) − ϕ_{n_k}(x) → U(x)

weakly in L^{1+ε}(µ). Passing to the limit we obtain from (1) that

| ∫ U(x) dµ − t ∫ ⟨y, e_i⟩ dν | ≤ C t p(t). □

Lemma 3.6.
Assume that F_n(x, y, 0, 0) → 0 in measure with respect to π. Then U(x) − t⟨y, e_i⟩ ≥ 0 for π-almost all (x, y).

Proof. Note that

[ϕ_n(x + t e_i) − ϕ_n(x) − t⟨y, e_i⟩] + F_n(x, y, 0, 0) = ϕ_n(x + t e_i) + ψ_n(y) − ⟨P_n y, P_n(x + t e_i)⟩

is a non-negative function for every n. Since F_n(x, y, 0, 0) → 0 in measure, one can pass to a subsequence (again denoted by F_n) which converges to zero π-almost everywhere. Since f_n = ϕ_n(x + t e_i) − ϕ_n(x) − t⟨y, e_i⟩ converges to f = U(x) − t⟨y, e_i⟩ weakly in L^{1+ε}(π), one can assume (passing again to a subsequence) that (1/N) ∑_{n=1}^N f_n → f π-a.e. Since f_n + F_n ≥ 0, this implies that f ≥ 0 π-a.e. □

Proposition 3.7.
Assume that there exists a sequence of continuous functions f_n(x_1, ..., x_n), g_n(y_1, ..., y_n) ∈ L^1(π_n) such that G_n = f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i has the following properties:

1) G_n ≥ 0,
2) G_n ≤ G_m for all n ≤ m and all x, y,
3) sup_n ∫ G_n dπ_n < ∞.

Then F_n(x, y, 0, 0) → 0 in L^1(π).

Proof. We start with the identity ∫ F_n(x, y, 0, 0) dπ_n = 0 and rewrite it in the following way:

(2)  0 = ∫ (ϕ_n − f_n) dµ + ∫ (ψ_n − g_n) dν + ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ_n.

Since ϕ_n, ψ_n are defined up to a constant, one can assume that ∫ (ψ_n − g_n) dν = 0. Thus

− ∫ (ϕ_n − f_n) dµ = ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ_n.

It follows from 1) and 3) that the right-hand side is a bounded sequence of non-negative numbers. Passing to a subsequence we may assume that the right-hand side has a limit. It follows from the weak convergence π_n → π and the monotonicity property 2) that for every k

lim_n ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ_n ≥ lim_n ∫ (f_k(x) + g_k(y) − ∑_{i=1}^k x_i y_i) dπ_n = ∫ (f_k(x) + g_k(y) − ∑_{i=1}^k x_i y_i) dπ.

Hence

lim_n ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ_n ≥ lim_k ∫ (f_k(x) + g_k(y) − ∑_{i=1}^k x_i y_i) dπ,

where the limit in the right-hand side exists because the sequence is monotone. Hence we get from (2)

0 ≥ lim_n ∫ (ϕ_n − f_n) dµ + lim_n ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ.

Taking into account that ∫ g_n dπ = ∫ g_n dν = ∫ ψ_n dν = ∫ ψ_n dπ, we obtain

0 ≥ lim_n ∫ (ϕ_n − f_n)(x) dµ + lim_n ∫ (f_n(x) + g_n(y) − ∑_{i=1}^n x_i y_i) dπ = lim_n ∫ (ϕ_n(x) + ψ_n(y) − ∑_{i=1}^n x_i y_i) dπ ≥ 0.

The proof is complete. □
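The objects of Proposition 3.7 have a transparent discrete analogue: the duality defect ϕ(x) + ψ(y) − Σ x_i y_i is non-negative everywhere and vanishes exactly on the support of the optimal plan. A toy one-dimensional sketch (the atoms, weights, and the north-west-corner construction are our own illustration, not from the paper):

```python
import numpy as np

# A tiny discrete quadratic-cost Kantorovich problem on the line.
x = np.array([0.0, 1.0, 2.0]); mu = np.array([0.5, 0.3, 0.2])
y = np.array([0.5, 1.5, 3.0]); nu = np.array([0.2, 0.5, 0.3])

# Monotone (north-west corner) coupling: optimal in 1D for quadratic cost.
plan, a, b = [], mu.copy(), nu.copy()
i = j = 0
while i < len(x) and j < len(y):
    m = min(a[i], b[j]); plan.append((i, j, m))
    a[i] -= m; b[j] -= m
    if a[i] <= 1e-15: i += 1
    if b[j] <= 1e-15: j += 1

w2_sq = sum(m * (x[i] - y[j]) ** 2 for i, j, m in plan)
print(w2_sq)                      # squared Kantorovich distance: 1.375

# Kantorovich potentials in the inner-product form: propagate the equality
# phi_i + psi_j = x_i * y_j along the chain of support pairs.
phi = np.full(len(x), np.nan); psi = np.full(len(y), np.nan)
phi[0] = 0.0
for i, j, m in plan:
    if np.isnan(psi[j]): psi[j] = x[i] * y[j] - phi[i]
    if np.isnan(phi[i]): phi[i] = x[i] * y[j] - psi[j]

# Duality defect F = phi(x) + psi(y) - x*y on all pairs of atoms.
F = phi[:, None] + psi[None, :] - x[:, None] * y[None, :]
print(F.min())                                 # >= 0 everywhere
print(max(abs(F[i, j]) for i, j, m in plan))   # = 0 on the support of the plan
```

The same two facts, F_n ≥ 0 by the dual constraint and F_n = 0 on the support of π_n, are exactly what the proposition exploits in the infinite-dimensional limit.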
Finally, we obtain a sufficient condition for the existence of an optimal mapping in the infinite-dimensional case.

Proof. (Theorem 3.4)
Let us fix e_i and choose a sequence of numbers t_n → 0. We get from Lemma 3.5 and Lemma 3.6 that there exist π-a.e. non-negative functions U_{t_n}(x) − t_n⟨y, e_i⟩ with ∫ (U_{t_n}(x) − t_n⟨y, e_i⟩) dπ = o(t_n). Hence lim_{t_n → 0} ∫ (U_{t_n}(x)/t_n − ⟨y, e_i⟩) dπ = 0. Taking into account that U_{t_n}(x)/t_n − ⟨y, e_i⟩ ≥ 0 for π-almost all (x, y), we conclude that U_{t_n}(x)/t_n converges µ-a.e. and in L^1(µ) to a function u_i(x) satisfying u_i(x) − ⟨y, e_i⟩ ≥ 0 π-a.e. and ∫ (u_i(x) − ⟨y, e_i⟩) dπ = 0. Clearly, u_i(x) = ⟨y, e_i⟩ for π-almost all (x, y). Repeating these arguments for every i ∈ N, we get the claim. □

4. Application: quasi-product case
The main result of this section is a generalization of the optimal transport existence theorem for Gaussian measures. Recall that by the results of [12], [15], for the standard Gaussian measure γ = ∏_{i=1}^∞ γ_i(dx_i), γ_i ∼ N(0, 1), the existence of the optimal transportation mapping pushing forward f · γ onto g · γ is established, for instance, under the assumptions ∫ f log f dγ < ∞, ∫ g log g dγ < ∞. We give in this section a generalization of this result for a wide class of quasi-product measures.

Let us consider two product reference measures

P = ∏_{i=1}^∞ p_i(x_i) dx_i, Q = ∏_{i=1}^∞ q_i(x_i) dx_i

and fix the diagonal infinite transportation mapping

T(x) = (T_1(x_1), ..., T_n(x_n), ...),

where T_i(x_i) pushes forward p_i(x_i) dx_i onto q_i(x_i) dx_i. Clearly, T takes P onto Q. The inverse mapping S = T^{-1} has the same diagonal structure:

S(x) = (S_1(x_1), ..., S_n(x_n), ...).

Theorem 4.1.
Let µ = f · P and ν = g · Q be probability measures satisfying Assumption (A) of the previous section. Assume, in addition, that

1) there exists K > 0 such that every q_i is K-uniformly log-concave;
2) there exists M > 0 such that S'_i(x_i) ≤ M for all i, x_i;
3) either a) or b) holds for some constants C > c > 0:
a) g log g ∈ L^1(Q), 1/f ∈ L^1(P), f ≤ C;
b) f log f ∈ L^1(P), c ≤ g ≤ C.

Then there exists a transportation mapping T pushing forward µ onto ν which is a µ-a.e. limit of finite-dimensional optimal transportation mappings T_n.

Remark 4.2. It follows from Caffarelli's contraction theorem (see Section 2) that assumption 2) is satisfied if

(− log p_i(x_i))'' ≥ C_1, (− log q_i(x_i))'' ≤ C_2

for some C_1, C_2 > 0 and all i. Of course, there exist many other examples when this assumption is satisfied.

Proof.
Consider the finite-dimensional projections µ_n = f_n · P_n, ν_n = g_n · Q_n, where P_n = ∏_{i=1}^n p_i(x_i) dx_i, Q_n = ∏_{i=1}^n q_i(x_i) dx_i. Here f_n and g_n are the conditional expectations of f, g with respect to P, Q and the σ-algebra F_n generated by the first n coordinates. Recall that ∇ϕ_n is the optimal transportation of µ_n to ν_n. Let u_i(x_i), v_i(y_i) = u*_i(y_i) be the one-dimensional convex potentials associated to the mappings T_i, S_i, respectively: T_i = u'_i, S_i = v'_i. Note that T̃_n = (T_1, ..., T_n) pushes forward P_n onto Q_n and ∇ϕ_n pushes forward (f_n/g_n(∇ϕ_n)) · P_n onto Q_n.

According to Theorem 2.2 one has the following estimate:

(3)  (K/2) ∫ |T̃_n − ∇ϕ_n|² dP_n ≤ ∫ log (g_n(∇ϕ_n)/f_n) dP_n.

To see that the right-hand side is finite, let us estimate

∫ log (g_n(∇ϕ_n)/f_n) dP_n ≤ ∫ log (1/f_n) dP_n + (1/2) ∫ f_n log g_n(∇ϕ_n) dP_n + (1/2) ∫ (1/f_n) dP_n = ∫ log (1/f_n) dP_n + (1/2) ∫ g_n log g_n dQ_n + (1/2) ∫ (1/f_n) dP_n.

Applying assumption 3a of the theorem and the Jensen inequality, one can easily get that the right-hand side is uniformly bounded. We complete the proof by applying Theorem 3.4 and Proposition 3.7. For the application of Proposition 3.7 set

f_n = ∑_{i=1}^n u_i(x_i), g_n = ∑_{i=1}^n v_i(y_i).

We need to estimate ∑_{i=1}^n ∫ (u_i(x_i) + v_i(y_i) − x_i y_i) dπ_n. Taking into account that π_n is supported on the graph of ∇ϕ_n and the relation u_i(x_i) + v_i(T_i(x_i)) = x_i T_i(x_i), we obtain that the i-th term equals

∫ (u_i(x_i) + v_i(∂_{x_i}ϕ_n) − x_i ∂_{x_i}ϕ_n(x)) dµ_n = ∫ [v_i(∂_{x_i}ϕ_n(x)) − v_i(T_i(x_i)) − x_i(∂_{x_i}ϕ_n(x) − T_i(x_i))] dµ_n
= ∫ [v_i(∂_{x_i}ϕ_n(x)) − v_i(T_i(x_i)) − v'_i(T_i(x_i))(∂_{x_i}ϕ_n(x) − T_i(x_i))] dµ_n ≤ (M/2) ∫ (∂_{x_i}ϕ_n(x) − T_i(x_i))² dµ_n.

Here we use the uniform bound v''_i = S'_i ≤ M.
Finally, using the uniform bound f ≤ C and the Jensen inequality, we obtain that

∑_{i=1}^n ∫ (u_i(x_i) + v_i(y_i) − x_i y_i) dπ_n ≤ (M C / 2) ∫ |∇ϕ_n − T̃_n|² dP_n.

We have already shown that the right-hand side is bounded. The result now follows from Proposition 3.7.

The proof follows the same lines under assumption 3b, but we use another corollary of Theorem 2.2:

(K/2) ∫ |T̃_n − ∇ϕ_n|² (f_n/g_n(∇ϕ_n)) dP_n ≤ ∫ log (f_n/g_n(∇ϕ_n)) (f_n/g_n(∇ϕ_n)) dP_n.

The details are left to the reader. □

5. Symmetric transportation problem and ergodic decomposition of optimal transportation plans
5.1. Symmetric transportation problem.
In this section we discuss the mass transportation of symmetric (mainly exchangeable) measures, where the word "symmetric" means "invariant under the action of a group Γ". Recall that a probability measure is exchangeable if it is invariant with respect to any permutation of a finite number of coordinates. Before we consider R^∞, let us make some remarks on the finite-dimensional case.

Consider the group S_d of all permutations of {1, ..., d} acting on R^d as follows:

L_σ(x) = (x_{σ(1)}, x_{σ(2)}, ..., x_{σ(d)}), σ ∈ S_d.

Let Γ ⊂ S_d be any subgroup with the property that for every couple i, j there exists σ ∈ Γ such that σ(i) = j. Assume that the source and target measures are both invariant with respect to Γ. Under the additional assumption that the cost function c is Γ-invariant (for instance, c = |x − y|²), one can easily check that the Kantorovich potential ϕ is Γ-invariant as well: ϕ = ϕ ◦ L_σ for any σ ∈ Γ (see [21], [25]). Consequently, the optimal transportation T = ∇ϕ has the following commutation property:

T = L*_σ(T ◦ L_σ) = L_σ^{-1} ◦ T ◦ L_σ. Equivalently, L_σ ◦ T = T ◦ L_σ.

The optimal transportation plan π(dx, dy) is also Γ-invariant under the following extension of the action of Γ to R^d × R^d: L_σ(x, y) = (L_σ x, L_σ y). Now let σ(i) = j. One has

∫ x_i y_i dπ = ∫ ⟨e_i, x⟩⟨e_i, y⟩ dπ = ∫ ⟨L_σ e_i, L_σ x⟩⟨L_σ e_i, L_σ y⟩ dπ = ∫ ⟨e_j, L_σ x⟩⟨e_j, L_σ y⟩ dπ = ∫ x_j y_j dπ.

Consequently,

(4)  W_2²(µ, ν) = ∫ ||x − y||² dπ = ∑_{i=1}^d ∫ (x_i − y_i)² dπ = d ∫ (x_i − y_i)² dπ, ∀ i.

Lemma 5.1.
The standard quadratic Kantorovich problem on R^d with Γ-invariant marginals is equivalent to the transportation problem for the cost |x − y|² with the additional constraint that the solution is a Γ-invariant probability measure.

Proof. Let π be the solution to the quadratic Kantorovich problem for the marginals µ, ν and π̃ be a measure giving the minimum to the functional m → ∫ |x − y|² dm among the Γ-invariant measures with the same marginals. By optimality of π,

∫ |x − y|² dπ ≤ ∫ |x − y|² dπ̃.

Since the cost and the marginals are Γ-invariant, π may be assumed Γ-invariant: averaging π over Γ changes neither the marginals nor, by (4), the cost. By optimality of π̃ one then gets ∫ |x − y|² dπ̃ ≤ ∫ |x − y|² dπ, and finally ∫ |x − y|² dπ = ∫ |x − y|² dπ̃. This means that π̃ solves the quadratic Kantorovich problem as well and, vice versa, π solves the Kantorovich problem with symmetric constraints. □

The conclusion made above helps us to give a variational meaning to the transportation problem in the infinite-dimensional case.
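Identity (4) and the invariance of the optimal plan can be checked on a toy Gaussian example in R² (our own illustration; the matrix A and the group Γ = {id, swap} are assumptions, not from the paper):

```python
import numpy as np

# Exchangeable Gaussian target: nu = A_# gamma, where gamma = N(0, Id) and
# A is symmetric positive and commutes with the coordinate swap. The plan
# pi = (id, A)_# gamma is then Gamma-invariant for Gamma = {id, swap},
# so by identity (4) both coordinates carry the same share of the cost.
a, b = 2.0, 0.5
A = np.array([[a, b], [b, a]])           # commutes with the swap matrix

rng = np.random.default_rng(3)
X = rng.normal(size=(1_000_000, 2))      # samples of gamma
Y = X @ A                                 # T(x) = Ax (A symmetric)

per_coord = ((X - Y) ** 2).mean(axis=0)  # int (x_i - y_i)^2 dpi, i = 1, 2
total = per_coord.sum()                  # int |x - y|^2 dpi
print(per_coord)                          # equal components, each ~ 1.25
print(total, np.trace((np.eye(2) - A) @ (np.eye(2) - A)))   # both ~ 2.5
```

Here each coordinate cost is (1 − a)² + b² and the total is trace((Id − A)²), in agreement with the splitting W² = d ∫ (x_i − y_i)² dπ of (4) for d = 2.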
Definition 5.2. Symmetric Kantorovich problem.
Let Γ be a group of linear operators acting on R^∞ and let µ, ν be Γ-invariant probability measures. Assume in addition that
• for every i, j ∈ N there exists g ∈ Γ such that g(e_i) = e_j;
• the space Π_Γ(µ, ν) of probability measures on R^∞ × R^∞ which are invariant with respect to the action (x, y) ↦ (g(x), g(y)), g ∈ Γ, of Γ and have marginals µ, ν is non-empty and closed in the weak topology.
We say that a measure π ∈ Π_Γ(µ, ν) is a solution to the Γ-symmetric (quadratic) Kantorovich problem if it gives the minimum to the functional

(5) Π_Γ(µ, ν) ∋ m ↦ ∫ (x_1 − y_1)² dm.

Definition 5.3. Symmetric optimal transportation.
Let m be a solution to the symmetric Kantorovich problem. A measurable mapping T : R^∞ → R^∞ is called an optimal transportation mapping of µ onto ν if

m({(x, T(x)) : x ∈ R^∞}) = 1.

The standard compactness arguments imply that a solution to the Kantorovich problem (5) exists provided ∫ x_1² dµ < ∞, ∫ y_1² dν < ∞. If, in addition, there exists an optimal transportation mapping T, it commutes with every g ∈ Γ. This means that for µ-almost all x and every g ∈ Γ

(6) T ∘ g(x) = g ∘ T(x).

Example 5.4. Exchangeable measures.
We denote by S_∞ the group of permutations of N which change only a finite number of elements. We consider its natural action on R^∞ defined by

σ(x) = (x_{σ(i)}), x = (x_i) ∈ R^∞, σ ∈ S_∞.

Consider measures µ and ν which are invariant with respect to every σ ∈ S_∞:

µ = µ ∘ σ^{−1}, ν = ν ∘ σ^{−1}.

The measures of this type are called exchangeable. The basic example is given by the countable power m^∞ of some Borel measure m on R. The structure of the mappings satisfying (6) in the case µ = m^∞ is easy to describe. Consider the function T^1(x) = ⟨T(x), e_1⟩ and fix the first coordinate x_1. Then the function F : (x_2, x_3, · · ·) → T^1(x) is invariant with respect to S_∞ (acting on (x_2, x_3, · · ·)). Hence F is constant by the Hewitt–Savage 0–1 law for µ. Thus T^1(x) = T^1(x_1) depends on x_1 only (up to a set of measure zero). The same arguments applied to the other coordinates imply that T is diagonal: T(x) = (T^1(x_1), T^2(x_2), · · ·). Moreover, T^i(x_i) = T^1(x_i) because T commutes with every permutation of coordinates.

Example 5.5. Optimal transportation does not always exist.
Let µ_1, µ_2 be countable powers of two different one-dimensional measures. By the Kakutani dichotomy theorem they are mutually singular. There is no mass transportation T of µ = µ_1 onto ν = ½(µ_1 + µ_2) satisfying (6). Indeed, according to Example 5.4 any T satisfying (6) must be diagonal, hence the measure µ ∘ T^{−1} must be a product measure. Thus we see that the optimal transportation does not always exist. This example can be easily generalized to many other linear groups Γ and Γ-invariant measures. It can be easily understood that T does not exist provided the source measure is ergodic, but the target measure is not.

5.2. Ergodic decomposition of optimal transportation plans.
The connection between the Kantorovich problem and ergodic decomposition has been established under fairly general assumptions by the second-named author in [26]. A particular case of this result is given in the following theorem.

Let Γ be an amenable group acting by continuous one-to-one mappings on a Polish space X. Let Π_Γ be the set of all Borel probability Γ-invariant measures and µ, ν ∈ Π_Γ. The set of Γ-invariant transportation plans with marginals µ, ν will be denoted by Π_Γ(µ, ν). Assume that the cost function c is lower semicontinuous and Π_Γ(µ, ν) is non-empty and closed in the weak topology. Let us fix a solution π of the Γ-invariant Kantorovich problem with marginals µ, ν. Denote by ∆(X) the set of all Γ-invariant ergodic measures on X. Assume we are given ergodic decompositions

(7) µ = ∫_{∆(X)} µ_x dσ_µ, ν = ∫_{∆(Y)} ν_y dσ_ν

of µ, ν, where X = Y and σ_µ, σ_ν are probability measures on ∆(X), ∆(Y), and, similarly, the ergodic decomposition of π:

(8) π = ∫_{∆(X×Y)} π_{x,y} dδ

(recall that Γ-invariance of π means invariance with respect to the action (x, y) ↦ (g(x), g(y))). It is straightforward that δ-almost all π_{x,y} have ergodic marginals, and taking the projections of both sides of (8) we obtain the decompositions (7). Moreover, the following statement holds:

Theorem 5.6.
For δ-almost all (x, y) the measure π_{x,y} solves the Γ-symmetric Kantorovich problem with marginals µ_x, ν_y:

K^Γ_c(µ_x, ν_y) = inf_{m ∈ Π_Γ(µ_x, ν_y)} ∫ c dm = ∫ c dπ_{x,y},

and the following representation formula holds:

inf_{π ∈ Π_Γ(µ,ν)} ∫ c dπ = inf_{δ ∈ Π(σ_µ, σ_ν)} ∫ K^Γ_c(µ_x, ν_y) dδ.

Remark 5.7. In the situation of Theorem 5.6 one can decompose the optimal transportation plan for ergodic marginals µ, ν: π = ∫_{∆(X×Y)} π_{x,y} dδ. Ergodicity of the marginals implies immediately that δ-almost all π_{x,y} have the same marginals µ and ν. The optimality of π_{x,y} for the cost c follows from Theorem 5.6. Thus we get that any solvable symmetric Kantorovich problem with ergodic marginals admits, in particular, an ergodic solution.

Thus the symmetric transportation problem can be reduced to the following steps:
Q1) Construct a solution to the symmetric Kantorovich problem for ergodic measures.
Q2) Given two non-ergodic measures µ, ν and the corresponding ergodic decompositions (7), construct a solution to the Kantorovich problem for the measures σ_µ, σ_ν on ∆(X) with the cost function K^Γ_c.

Let us consider applications of Theorem 5.6 to several classical groups.

Example 5.8. Exchangeable measures revisited.
Consider the invariant transportation problem for exchangeable measures and c = (x_1 − y_1)². The answer to Q1) is trivial, because the ergodic measures are countable powers and the structure of the corresponding solution is trivial. As for Q2), by the de Finetti theorem the space of ergodic measures is isomorphic to the space P(R) of probability measures on R. Thus, to resolve an optimal transportation problem for exchangeable measures, we need to study the optimal transportation problem for a couple of measures µ̄, ν̄ on P(R) arising from the de Finetti decomposition. It is clear that the cost function c̄ on P(R) satisfies

c̄(p_1, p_2) = W²_2(p_1, p_2),

where W_2 is the standard Kantorovich distance on R.

Example 5.9. Rotationally invariant measures.
Consider the invariant transportation problem for measures invariant with respect to operators of the type U × Id, where U is a rotation of R^n = P_n(R^∞) and Id is the identity operator on the orthogonal complement of R^n. As usual, c = (x_1 − y_1)². This is an example where the optimal transportation problem admits a precise solution. By a well-known result (see [14]) every rotationally invariant measure µ on R^∞ admits a representation

µ = ∫ γ_t dp_µ(t),

where γ_t is the distribution of the Gaussian i.i.d. sequence with zero mean and variance t and p_µ is a measure on R_+. The optimal transportation problem is reduced in an obvious way to the one-dimensional optimal transportation between p_µ and p_ν.

Example 5.10. Stationary measures.
These are the measures which are invariant with respect to the shift

σ : x = (x_1, x_2, · · ·) ↦ (x_2, x_3, · · ·).

Note that the powers of σ generate the semigroup {0} ∪ N, but not a group. However, this makes no difference for our analysis: we are still able to consider the corresponding ergodic decompositions. In this case the description of the ergodic measures is nontrivial and we do not know any general sufficient conditions for existence even in the case when both measures are ergodic. Some sufficient conditions are given in Section 7.

We conclude the section with the remark that the existence of a transportation mapping for a (not necessarily optimal) symmetric plan π with ergodic X-marginal implies ergodicity of π.

Proposition 5.11.
Let X = Y be a Polish space and let Γ be a group of Borel one-to-one transformations acting on X. Assume that π and µ are Γ-invariant Borel probability measures on X × Y and X respectively. Assume, in addition, that Pr_X π = µ, µ is ergodic, and π({(x, T(x)) : x ∈ X}) = 1 for some Borel mapping T. Then π is ergodic.

Proof. Assuming the contrary, we represent π as a convex combination of two distinct Γ-invariant measures:

π = λπ_1 + (1 − λ)π_2, π_1 ≠ π_2, 0 < λ < 1.

Clearly, this implies a similar decomposition for the projections: µ = λ Pr_X π_1 + (1 − λ) Pr_X π_2 = λµ_1 + (1 − λ)µ_2. If we show that µ_1, µ_2 are Γ-invariant and distinct, we will get a contradiction with the ergodicity of µ. The Γ-invariance of both measures follows immediately from the Γ-invariance of π_i. Let us show that µ_1 ≠ µ_2. Assume the contrary and take a Borel set B ⊂ X × Y. We get that π_i(B) equals µ_i(A), where A = Pr_X(B ∩ Graph(T)) (note that A is universally measurable as a projection of a Borel set). Then it follows that the measures π_i coincide because the measures µ_i do, which contradicts π_1 ≠ π_2. □

6. Kantorovich duality
In this section we study the Kantorovich duality for measures which are invariant under the action of a group. The results of this section will not be used in this paper, but they are of independent interest.

Let X, Y be Polish spaces and let Γ be a locally compact amenable group with continuous actions L^X_Γ, L^Y_Γ on X, Y respectively. The action L_Γ on the product space X × Y is defined as follows:

L_g(x, y) = (L_g(x), L_g(y)),

where L_g is the element of L_Γ corresponding to g ∈ Γ. Let us define the space W_Γ ⊂ C_b(X × Y) as the closure of the linear span of the following set:

{f − f ∘ L_g : f ∈ C_b(X × Y), g ∈ Γ}.

It can be checked that the property

(9) ∫ ω dπ = 0, ∀ ω ∈ W_Γ,

of a probability measure π ∈ P(X × Y) is equivalent to its invariance with respect to L_Γ. Let µ ∈ P(X), ν ∈ P(Y) be invariant under the actions L^X_Γ, L^Y_Γ respectively. Then a transport plan π ∈ Π(µ, ν) is invariant iff property (9) is satisfied. We denote the set of all invariant transport plans by Π_Γ(µ, ν).

The following theorem is a refinement of the duality result which was proved in [25] (Theorem 2.5). There we considered only C_b(X × Y) cost functions (we warn the reader that the classical duality statement from Section 2 is formulated in a slightly different but equivalent way: in the notation of this section Φ = ‖x‖² − ϕ, Ψ = ‖y‖² − ψ).

Theorem 6.1.
Let c ∈ C(X × Y) be a nonnegative cost function such that there exist f ∈ L¹(X, µ), g ∈ L¹(Y, ν) with

c(x, y) ≤ f(x) + g(y), ∀ (x, y) ∈ X × Y.

Then, in the setting described above,

inf_{π ∈ Π_Γ} ∫ c dπ = sup_{Φ+Ψ+ω ≤ c} ∫_X Φ(x) dµ + ∫_Y Ψ(y) dν,

where Φ ∈ L¹(X), Ψ ∈ L¹(Y), ω ∈ W_Γ.

Proof. The inequality

inf_{π ∈ Π_Γ} ∫ c dπ ≥ sup_{Φ+Ψ+ω ≤ c} ∫ Φ dµ + ∫ Ψ dν

can be easily obtained:

inf_{π ∈ Π_Γ} ∫ c dπ ≥ inf_{π ∈ Π_Γ} ( sup_{Φ+Ψ+ω ≤ c} ∫ (Φ + Ψ + ω) dπ ) = inf_{π ∈ Π_Γ} ( sup_{Φ+Ψ+ω ≤ c} ∫ Φ dµ + ∫ Ψ dν ) = sup_{Φ+Ψ+ω ≤ c} ∫ Φ dµ + ∫ Ψ dν.

To obtain the opposite inequality we use the following statement from Theorem 2.5 of [25]:

inf_{π ∈ Π_Γ} ∫ c_b dπ = sup_{Φ+Ψ+ω ≤ c_b} ∫_X Φ(x) dµ + ∫_Y Ψ(y) dν

for c_b ∈ C_b(X × Y), Φ ∈ C_b(X), Ψ ∈ C_b(Y), ω ∈ W_Γ. Let c_n(x, y) := min{c(x, y), n} for each n ∈ N. The inequality

sup_{Φ+Ψ+ω ≤ c_n} ∫_X Φ(x) dµ + ∫_Y Ψ(y) dν ≤ sup_{Φ+Ψ+ω ≤ c} ∫_X Φ(x) dµ + ∫_Y Ψ(y) dν

is obvious for every natural n. Thus it remains to prove that

lim_{n→∞} inf_{π ∈ Π_Γ} ∫ c_n dπ = inf_{π ∈ Π_Γ} ∫ c dπ.

Recall that the functional π ↦ ∫ c_b dπ is weakly continuous for every c_b ∈ C_b(X × Y). It follows from the characterization (9) of invariant measures that Π_Γ(µ, ν) is a closed subset of Π(µ, ν), which is known to be compact. Thus Π_Γ(µ, ν) is compact in the topology of weak convergence. If π_n is the solution of

inf_{π ∈ Π_Γ} ∫ c_n dπ,

the sequence (π_n) has a subsequence converging to some element π* ∈ Π_Γ. Since for every fixed m ∈ N the inequality lim_n ∫ c_n dπ_n ≥ ∫ c_m dπ* is satisfied, and, by the monotone convergence theorem, lim_m ∫ c_m dπ* = ∫ c dπ* ≤ ∫ (f(x) + g(y)) dπ* < ∞, we obtain

lim_n ∫ c_n dπ_n ≥ lim_m ∫ c_m dπ* = ∫ c dπ* ≥ inf_{π ∈ Π_Γ} ∫ c dπ.

This concludes the proof of the theorem. □
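The reason the extra term ω does not spoil the weak duality inequality is precisely the characterization (9): the integral of any ω = f − f∘L_g against an invariant plan vanishes. A minimal numerical sketch of this cancellation, with an assumed two-element group acting on R × R by simultaneous sign change (all names are illustrative, not from the paper):

```python
def integrate(plan, func):
    """Integrate a function of (x, y) against a discrete plan."""
    return sum(w * func(x, y) for (x, y), w in plan.items())

def L_g(x, y):
    # action of the nontrivial group element g: (x, y) -> (-x, -y)
    return (-x, -y)

# a discrete plan on R x R, invariant under (x, y) -> (-x, -y)
plan = {(1.0, 2.0): 0.3, (-1.0, -2.0): 0.3,
        (2.0, -1.0): 0.2, (-2.0, 1.0): 0.2}

def f(x, y):
    # an arbitrary bounded test function on X x Y
    return x ** 2 + 3.0 * x * y + y

def omega(x, y):
    # an element of the space W_Gamma: omega = f - f∘L_g
    return f(x, y) - f(*L_g(x, y))

# characterization (9): integrating omega against an invariant plan gives 0,
# so adding omega to Phi + Psi leaves the dual lower bound unchanged
assert abs(integrate(plan, omega)) < 1e-12
```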
As one can see, the form of the duality theorem is similar to the well-known classical result, but the difference is substantial: the dual functionals are related to each other in a more complicated way. Moreover, there is no existence result for the dual problem without additional assumptions.

It was shown in [25] (Theorem 5.7) that in the case of a compact group Γ and under the assumptions of Theorem 6.1,

inf_{π ∈ Π_Γ} ∫ c dπ = sup_{Φ+Ψ ≤ c̄} ∫_X Φ(x) dµ + ∫_Y Ψ(y) dν,

where c̄ := ∫_Γ (c ∘ g) dχ(g) and χ is the probability Haar measure. It is clear that if the cost function is Γ-invariant, the invariant dual problem coincides with the usual one. Moameni ([21]) proved that for Γ = Z and an invariant cost function c, the corresponding invariant dual problem coincides with the usual one, and, moreover, both the primal and the dual Kantorovich problems have an invariant solution.

7. Existence of invariant optimal mapping for stationary measures
Recall that the measures on R^∞ which are invariant with respect to the shift

σ(x_1, x_2, . . .) = (x_2, x_3, . . .)

are called stationary. Unlike exchangeable measures, the projections of stationary measures are in general not invariant with respect to any reasonable family of linear transformations. As usual, we assume that R^∞ is approximated by the sequence of finite-dimensional spaces R^n in the following sense: we identify R^n with the subset

P_n(R^∞) = {x = (x_1, x_2, · · ·, x_n, 0, 0, · · ·)} ⊂ R^∞.

On every finite-dimensional space R^n we apply the following operator of cyclical shift:

σ_n(x_1, x_2, · · ·, x_n) = (x_2, x_3, · · ·, x_n, x_1).

Let us associate with every stationary measure µ the cyclical average of its projections:

µ̂_n = (1/n) Σ_{i=1}^n (µ ∘ P_n^{−1}) ∘ σ_n^{−(i−1)}.

In addition, let us denote by R_{m,n} the orthogonal complement of R^m ⊂ R^n:

R^n = R^m × R_{m,n}, m < n.
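For a discrete measure on R^n the cyclical average µ̂_n can be computed directly: each atom is replaced by the uniform mixture of its n cyclic rotations. The resulting measure is σ_n-invariant by construction, so in particular all its coordinate means coincide. A small sketch (the helper names are ours, for illustration):

```python
def cyclic_average(mu, n):
    """Replace each atom x by the uniform mixture of its n cyclic shifts,
    the discrete analogue of (1/n) sum_i (mu∘P_n^{-1})∘sigma_n^{-(i-1)}."""
    out = {}
    for x, w in mu.items():
        for i in range(n):
            y = x[i:] + x[:i]      # sigma_n applied i times
            out[y] = out.get(y, 0.0) + w / n
    return out

def push_shift(mu):
    """Pushforward of a discrete measure under the cyclic shift sigma_n."""
    out = {}
    for x, w in mu.items():
        y = x[1:] + x[:1]
        out[y] = out.get(y, 0.0) + w
    return out

def coord_mean(mu, i):
    return sum(w * x[i] for x, w in mu.items())

mu = {(1.0, 2.0, 6.0): 0.5, (0.0, 0.0, 3.0): 0.5}
hat = cyclic_average(mu, 3)

# hat is sigma_n-invariant: its pushforward under the shift is itself
shifted = push_shift(hat)
assert all(abs(shifted[k] - hat[k]) < 1e-12 for k in hat)
# consequently every coordinate has the same mean
assert abs(coord_mean(hat, 0) - coord_mean(hat, 1)) < 1e-12
```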
Assumption A.
The measures µ, ν are stationary Borel probability measures such that their projections on every R^n,

µ ∘ Pr_n^{−1}, ν ∘ Pr_n^{−1},

have Lebesgue densities and bounded second moments.

We consider the symmetric Monge–Kantorovich problem

(10) ∫ (x_1 − y_1)² dπ → min,

where the infimum is taken over all stationary measures π ∈ Π_Γ(µ, ν) with marginals µ, ν.

Remark 7.1. Minimizing ∫ (x_1 − y_1)² dπ is equivalent to maximizing ∫ x_1 y_1 dπ, because ∫ x_1² dπ = ∫ x_1² dµ and ∫ y_1² dπ = ∫ y_1² dν are fixed.

Theorem 7.2.
Let µ be a stationary measure which satisfies the following assumptions:
1) µ is a weak limit of a sequence of σ_n-invariant measures µ_n on R^n;
2) for every m < n there exists a probability measure µ_{m,n} on R_{m,n} such that the relative entropy (the Kullback–Leibler distance) between µ_m × µ_{m,n} and µ_n is uniformly bounded in n:

∫ log( dµ_n / d(µ_m × µ_{m,n}) ) dµ_n < C_m,

with C_m satisfying lim_m C_m / m = 0;
3) the cyclical average µ̂_n of the n-dimensional projection µ ∘ P_n^{−1} has finite second moments and admits a density ρ_n with respect to µ ∘ P_n^{−1} satisfying

sup_n ∫ ρ_n^{−ε} dµ < ∞

for some ε > 0.
Then there exists a mapping T with the following properties:
• T pushes forward µ onto the standard Gaussian measure on R^∞: ν = γ;
• T is a µ-a.e. limit of finite-dimensional mappings T_n : R^n → R^n such that every T_n is a solution to an optimal transportation problem on R^n.

Proof. We consider the sequence of n-dimensional optimal transportation mappings T_n with cost function Σ_{i=1}^n (x_i − y_i)² pushing forward µ_n onto γ_n. It follows from the σ_n-invariance of µ_n and γ_n that the mapping T_n is cyclically invariant:

⟨T_n ∘ σ_n, e_i⟩ = ⟨T_n, e_{i+1}⟩, µ_n-a.e.

(the indices are understood cyclically mod n). Fix a couple of numbers m, n with n > m. Let T_{m,n} be the optimal transportation mapping for the cost function Σ_{i=m+1}^n (x_i − y_i)² pushing forward µ_{m,n} onto the standard Gaussian measure on R_{m,n}. We stress that T_m and T_{m,n} depend on different collections of coordinates. We extend T_m onto R^n (keeping the same notation) in the following way:

T_m(x) = T_m(P_m x) + T_{m,n}(P_{m,n} x).

Clearly, T_m pushes forward µ_m × µ_{m,n} onto the standard Gaussian measure on R^n. Applying Proposition 2.2 to the couple of mappings T_m, T_n, we get

(11) (1/2) ∫ ‖T_n − T_m‖² dµ_n ≤ ∫ log( dµ_n / d(µ_m × µ_{m,n}) ) dµ_n.
This implies

(12) Σ_{i=1}^m ∫ ⟨T_n − T_m, e_i⟩² dµ_n ≤ ∫ ‖T_n − T_m‖² dµ_n ≤ 2C_m

for every m, n, m < n. Let us note that for every i one can extract a weakly convergent subsequence from the sequence of (signed) measures {⟨T_n, e_i⟩ · µ_n}. Indeed, for any compact set K

( ∫_{K^c} |⟨T_n, e_i⟩| dµ_n )² ≤ ∫ |⟨T_n, e_i⟩|² dµ_n · µ_n(K^c) = ∫ x_i² dγ · µ_n(K^c).

Using the tightness of {µ_n} we get that {|⟨T_n, e_i⟩| · µ_n} is a tight sequence. In addition, note that for every continuous f

lim_n ( ∫ f |⟨T_n, e_i⟩| dµ_n )² ≤ ∫ x_i² dγ · ∫ f² dµ.

This implies that any limit point of {⟨T_n, e_i⟩ · µ_n} is absolutely continuous with respect to µ. Applying the diagonal method and passing to a subsequence one can assume that the convergence takes place for all i simultaneously. Consequently, there exists a subsequence {n_k} and a measurable mapping T with values in R^∞ such that

⟨T_{n_k}, e_i⟩ · µ_{n_k} → ⟨T, e_i⟩ · µ

weakly in the sense of measures for every i. It is easy to check that the standard property of L²-weak convergence holds also in this case:

(13) ∫ ⟨T, e_i⟩² dµ ≤ lim_k ∫ ⟨T_{n_k}, e_i⟩² dµ_{n_k} = ∫ x_i² dγ = 1.

Finally, we pass to the limit in (12) and get

(14) Σ_{i=1}^m ∫ ⟨T − T_m, e_i⟩² dµ ≤ 2C_m.

The claim follows from (13) and the fact that lim_n ∫ ϕ² dµ_n = ∫ ϕ² dµ for every ϕ ∈ L²(µ). Indeed, if ϕ is bounded and continuous, this follows from the weak convergence µ_n → µ. For an arbitrary ϕ ∈ L²(µ) we find a continuous bounded cylindrical function ϕ̃ such that ‖ϕ − ϕ̃‖_{L²(µ)} < ε. One has lim_n ∫ ϕ² dµ_n = lim_n ∫ (ϕ² − ϕ̃²) dµ_n + ∫ ϕ̃² dµ. The claim follows from the estimate

( ∫ |ϕ − ϕ̃| dµ_n )² ≤ ∫ (ϕ − ϕ̃)² dµ · ∫ ρ̃_n² dµ ≤ ( sup_n ∫ ρ̃_n² dµ ) ε²,

where ρ̃_n is the density of µ_n with respect to µ. Note that T commutes with the shift σ:

⟨T ∘ σ, e_i⟩ = ⟨T, e_{i+1}⟩.
Indeed, for every bounded cylindrical ϕ one has

∫ ϕ ⟨T_n, e_{i−1}⟩ dµ_n = ∫ ϕ ⟨T_n ∘ σ_n^{−1}, e_i⟩ dµ_n = ∫ (ϕ ∘ σ_n) ⟨T_n, e_i⟩ dµ_n = ∫ (ϕ ∘ σ) ⟨T_n, e_i⟩ dµ_n.

Here we use that ϕ ∘ σ_n = ϕ ∘ σ for sufficiently large values of n, the σ_n-invariance of µ_n, and the cyclical invariance of T_n. Passing to the limit along the subsequence {n_k} one gets

∫ ϕ ⟨T, e_{i−1}⟩ dµ = ∫ (ϕ ∘ σ) ⟨T, e_i⟩ dµ.

On the other hand, by the σ-invariance of µ, ∫ ϕ ⟨T, e_{i−1}⟩ dµ = ∫ (ϕ ∘ σ) ⟨T ∘ σ, e_{i−1}⟩ dµ, whence ⟨T ∘ σ, e_{i−1}⟩ = ⟨T, e_i⟩. Hence T ∘ σ = σ ∘ T.

Now by the assumptions of the theorem and (14) we get

lim sup_m (1/m) Σ_{i=1}^m ∫ ⟨T − T_m, e_i⟩² dµ = 0.

To prove that T pushes forward µ onto γ it is sufficient to show that ⟨T_m, e_1⟩ → ⟨T, e_1⟩ in measure (see Lemma 3.2). To this end let us approximate T^1 = ⟨T, e_1⟩ in L²(µ) by a bounded function ξ(x_1, . . . , x_k) depending on a finite number of coordinates: ∫ |T^1 − ξ|² dµ < ε², where ε is chosen sufficiently small. Set ξ_i = ξ ∘ σ^{i−1}. Since T commutes with σ, we get by the shift invariance

(1/m) Σ_{i=1}^m ∫ (T^i − ξ_i)² dµ = ∫ (T^1 − ξ)² dµ < ε².

Hence

lim sup_m (1/m) Σ_{i=1}^m ∫ ⟨T_m − ξ̄, e_i⟩² dµ ≤ 2ε², ξ̄ = (ξ_1, ξ_2, . . .).

Let us make the change of variables under the cyclical shift σ_m. One has ⟨T_m, e_i⟩ ∘ σ_m^{−(i−1)} = ⟨T_m, e_1⟩ for all 1 ≤ i ≤ m and ξ_i ∘ σ_m^{−(i−1)} = ξ_1 as soon as i − 1 + k ≤ m. Hence for the latter values of i one has

∫ ⟨ξ̄ − T_m, e_i⟩² d(µ ∘ P_m^{−1}) = ∫ ⟨ξ̄ − T_m, e_1⟩² d( (µ ∘ P_m^{−1}) ∘ σ_m^{−(i−1)} ).

The number of indices which do not satisfy this property is bounded by k; clearly, this does not affect the limit of the averages. Averaging over i, we obtain

2ε² ≥ lim sup_m (1/m) Σ_{i=1}^m ∫ ⟨ξ̄ − T_m, e_i⟩² dµ = lim sup_m ∫ ⟨ξ̄ − T_m, e_1⟩² dµ̂_m.

Recall that ∫ (T^1 − ξ)² dµ ≤ ε². Finally,

lim sup_m ∫ ⟨T − T_m, e_1⟩² dµ̂_m ≤ 2 lim sup_m ∫ ⟨ξ̄ − T_m, e_1⟩² dµ̂_m + 2 lim sup_m ∫ (T^1 − ξ)² dµ̂_m ≤ Cε².

Since ε > 0 is arbitrary, ∫ ⟨T − T_m, e_1⟩² dµ̂_m → 0. By the Hölder inequality

∫ ⟨T − T_m, e_1⟩^{2/p} dµ ≤ ( ∫ ⟨T − T_m, e_1⟩² dµ̂_m )^{1/p} ( ∫ ρ_m^{−1/(p−1)} dµ )^{1/q}, 1/p + 1/q = 1.

Taking p = 1 + 1/ε, so that 1/(p − 1) = ε, we get by assumption 3) that the last factor is bounded; hence the left-hand side tends to zero and ⟨T_m, e_1⟩ → ⟨T, e_1⟩ in measure. The proof is complete. □

Remark 7.3. In Theorem 7.2 the Gaussian measure γ can be replaced by any countable power of a uniformly log-concave one-dimensional measure.

In the following proposition we prove that the transportation mapping T is indeed optimal under additional assumptions.

Proposition 7.4.
Let the assumptions of Theorem 7.2 hold. Assume in addition that

lim_{n→∞} (1/n) W²_2(µ̂_n, µ_n) = 0.

Then there exists a solution π of problem (10) in the class of stationary measures such that π({(x, T(x)) : x ∈ R^∞}) = 1.

Proof. We show that the measure π = µ ∘ (x, T(x))^{−1}, which is the weak limit of the measures π_n = µ_n ∘ (x, T_n(x))^{−1}, is optimal. Recall that π_n gives the minimum to m ↦ ∫ Σ_{i=1}^n (x_i − y_i)² dm and has marginals µ_n, γ_n; hence the measure π has marginals µ, γ. Moreover,

∫ (x_1 − y_1)² dπ = lim_n ∫ (x_1 − y_1)² dπ_n = lim_n (1/n) ∫ Σ_{i=1}^n (x_i − y_i)² dπ_n.

If π is not optimal, then there exist a stationary measure π′ with projections µ, ν and some ε > 0 such that for all sufficiently large n

∫ (x_1 − y_1)² dπ′ + ε < (1/n) ∫ Σ_{i=1}^n (x_i − y_i)² dπ_n.

Taking into account the stationarity of π′, we get ∫ x_i y_i dπ′ = ∫ x_j y_j dπ′ for every i, j, thus

∫ Σ_{i=1}^n (x_i − y_i)² dπ̂′ + nε < ∫ Σ_{i=1}^n (x_i − y_i)² dπ_n,

where π̂′ = (1/n) Σ_{i=1}^n (π′ ∘ Pr_n^{−1}) ∘ σ_n^{−(i−1)}. The latter inequality implies

W²_2(µ̂_n, γ_n) + nε ≤ W²_2(µ_n, γ_n).

By the triangle inequality

W²_2(µ̂_n, γ_n) + nε ≤ ( W_2(µ_n, µ̂_n) + W_2(µ̂_n, γ_n) )² ≤ W²_2(µ_n, µ̂_n) + 2 W_2(µ_n, µ̂_n) W_2(µ̂_n, γ_n) + W²_2(µ̂_n, γ_n).

Hence

(15) ε ≤ (1/n) ( 2 W_2(µ̂_n, γ_n) W_2(µ_n, µ̂_n) + W²_2(µ̂_n, µ_n) ).

The quantity W²_2(µ̂_n, γ_n) can be trivially estimated by 2 Σ_{i=1}^n ( ∫ x_i² dµ̂_n + ∫ y_i² dγ_n ) ≤ Cn. Then, using the assumption of the proposition, we get that the right-hand side of (15) tends to zero, which contradicts the positivity of ε. □

We finish this section with a concrete application of Theorem 7.2. We study the transportation of a Gibbs measure µ which can be formally written in the form

µ = e^{−H(x)} dx,

where the potential H admits the following heuristic representation:

H(x) = Σ_{i=1}^∞ V(x_i) + Σ_{i=1}^∞ W(x_i, x_{i+1}).
Here V and W are smooth functions and W(x, y) is symmetric: W(x, y) = W(y, x). The existence of such measures was proved in [2].

Let us specify the assumptions about V and W. These are a particular case of assumptions A1–A3 from [2]:
1) W(x, y) = W(y, x);
2) there exist numbers J > L ≥ N ≥ σ > 0 and A, B, C > 0 such that

|W(x, y)| ≤ J(1 + |x| + |y|)^{N−σ}, |∂_x W(x, y)| ≤ J(1 + |x| + |y|)^{N−σ};

3) |V(x)| ≤ C(1 + |x|)^L, |V′(x)| ≤ C(1 + |x|)^{L−1};
4) (coercivity assumption) V′(x) · x ≥ A|x|^{N+σ} − B.

Let us define the following probability measure on E^n:

µ_n = (1/Z_n) exp( − Σ_{i=1}^n ( V(x_i) + W(x_i, x_{i+1}) ) ),

with the convention x_{n+1} := x_1. Here Z_n is the normalizing constant.

Proposition 7.5. The sequence µ_n admits a weakly convergent subsequence µ_{n_k} → µ satisfying the assumptions of Theorem 7.2.

Proof. It was proved in Theorem 3.1 of [2] that any sequence of probability measures

µ̃_n = c_n e^{−H_n} dx_1 · · · dx_n,

where H_n is obtained from H by fixing a boundary condition x̃,

H_n = Σ_{i=1}^n V(x_i) + Σ_{i=1}^{n−1} W(x_i, x_{i+1}) + W(x_n, x̃),

has a weakly convergent subsequence µ̃_{n_k} → µ̃. In addition (see [2]), µ̃ satisfies the following a priori estimate: for every λ > 0, k ∈ N,

∫ exp(λ|x_k|^N) dµ̃ < ∞.

The same estimate holds for µ̃_n uniformly in n. Following the reasoning from [2] it is easy to show that the sequence {µ_n} is tight and satisfies the same a priori estimate. Thus we can pass to a subsequence {µ_{n′}} which converges weakly to a measure µ. For the sake of simplicity this subsequence will again be denoted by {µ_n}. The limiting measure µ satisfies

(16) sup_{k∈N} ∫ exp(λ|x_k|^N) dµ < ∞,

moreover,

(17) sup_n sup_{k∈N} ∫ exp(λ|x_k|^N) dµ_n < ∞.

Let us estimate the relative entropy. We note that µ_n and µ_m (n > m) are related in the following way:

e^Z µ_n / ∫ e^Z dµ_n = µ_m × ν_{m,n},

where Z = −W(x_m, x_1) + W(x_m, x_{m+1}) + W(x_n, x_1) and ν_{m,n} is a probability measure on E_{m,n}. Set µ_{m,n} = ν_{m,n}. Then

∫ log( dµ_n / d(µ_m × µ_{m,n}) ) dµ_n = ∫ ( Z − log ∫ e^Z dµ_n ) dµ_n.
The desired bound follows immediately from (17) and the assumptions about W.

In order to prove assumption 3) we note that

[ e^{W(x_n, x_{n+1}) + W(x_1, x_n)} · µ ] ∘ P_n^{−1} / ∫ e^{W(x_n, x_{n+1}) + W(x_1, x_n)} dµ = e^{W(x_1, x_n)} · µ_n / ∫ e^{W(x_1, x_n)} dµ_n.

The normalizing constants can be easily estimated with the help of the a priori bounds for µ and µ_n. Applying the assumptions on W one can easily get that

A e^{−B(|x_n|^{N−σ} + |x_1|^{N−σ})} ≤ dµ_n / d(µ ∘ P_n^{−1}) ≤ A e^{B(|x_n|^{N−σ} + |x_1|^{N−σ})},

where A, B > 0 do not depend on n. Hence assumption 3) follows immediately from (17), the Jensen inequality and the convexity of the function x^{−ε}. □

Remark 7.6. Finally, let us briefly discuss when the transportation mapping obtained in Proposition 7.5 by Theorem 7.2 solves the corresponding optimal transportation problem. To this end we apply Proposition 7.4. Following the estimates obtained in Proposition 7.5 and applying the Jensen inequality, one can easily show that the sequence of entropies

∫ log( dµ̂_n / dµ_n ) dµ̂_n

is bounded. Then the assumption of Proposition 7.4 holds, for instance, if every µ_n satisfies the Talagrand inequality

W²_2(µ_n, ρ · µ_n) ≤ C ∫ ρ log ρ dµ_n

with a constant which does not depend on n. We do not investigate here sufficient conditions for the measures µ_n to satisfy this inequality; we just mention that it clearly holds in many natural situations (e.g. under the assumption of uniform log-concavity or finiteness of the log-Sobolev constant). In addition, we emphasize that in many applications the measures do indeed satisfy the Talagrand inequality, but Proposition 7.4 should actually work under much milder assumptions.

References

[1] Ambrosio L., Gigli N., Savaré G., Gradient flows in metric spaces and in the Wasserstein spaces of probability measures, Birkhäuser, 2008.
[2] Albeverio S., Kondratiev Yu.G., Röckner M., Tsikalenko T.V., A priori estimates for symmetrizing measures and their applications to Gibbs states, Journ. of Func. Anal., 171, 366–400, 2000.
[3] Beiglböck M., Cyclical monotonicity and the ergodic theorem, Ergod. Theory and Dynam. Syst., 35(3), 710–713, 2015.
[4] Bogachev V.I., Measure theory. V. 1, 2. Springer, Berlin – New York, 2007.
[5] Bogachev V.I., Kolesnikov A.V., On the Monge–Ampère equation in infinite dimensions, Infin. Dimen. Anal. Quantum Probab. Related Topics, 8(4), 547–572, 2005.
[6] Bogachev V.I., Kolesnikov A.V., Sobolev regularity for the Monge–Ampère equation in the Wiener space, Kyoto Jour. Math., 53(4), 713–738, 2013.
[7] Bogachev V.I., Kolesnikov A.V., The Monge–Kantorovich problem: achievements, connections, and perspectives, Russian Mathematical Surveys, 67(5), 785–890, 2012.
[8] Cavalletti F., The Monge problem in Wiener space, Calcul. of Var. and PDE's, 45(1–2), 101–124, 2012.
[9] Contreras G., Lopes A.O., Oliveira E.R., Ergodic transport theory, periodic maximizing probabilities and the twist condition. In: Modeling, Dynamics, Optimization and Bioeconomics I, vol. 73 of the series Springer Proceedings in Mathematics and Statistics, 183–219, 2014.
[10] Fang S., Nolot V., Sobolev estimates for optimal transport maps on Gaussian spaces, Jour. Func. Anal., 266(8), 5045–5084, 2014.
[11] Fang S., Shao J., Optimal transport maps for Monge–Kantorovich problem on loop groups, Journ. of Func. Anal., 248(1), 225–257, 2007.
[12] Feyel D., Üstünel A.S., Monge–Kantorovich measure transportation and Monge–Ampère equation on Wiener space, Prob. Theory and Related Fields, 128, 347–385, 2004.
[13] Ghoussoub N., Moameni A., Symmetric Monge–Kantorovich problems and polar decompositions of vector fields, Geom. Func. Anal., 24(4), 1129–1166, 2014.
[14] Kallenberg O., Probabilistic symmetries and invariance principles, Springer-Verlag New York, 2005.
[15] Kolesnikov A.V., Convexity inequalities and optimal transport of infinite-dimensional measures, J. Math. Pures Appl. (9), 83(11), 1373–1404, 2004.
[16] Kolesnikov A.V., On Sobolev regularity of mass transport and transportation inequalities, Theory of Probability and its Applications, to appear. Translated from Teor. Veroyatnost. i Primenen., 57(2), 296–321, 2012.
[17] Kolesnikov A.V., Mass transportation and contractions, MIPT Proc., 2(4), 90–99, 2010.
[18] Kolesnikov A.V., Röckner M., On transport equation in infinite dimensions, Jour. Func. Anal., 266(7), 4490–4537, 2014.
[19] Lopes A.O., Mengue J.K., Duality theorems in ergodic transport, Journ. Stat. Phys., 149(5), 921–942, 2012.
[20] Lopes A.O., Oliveira E.O., Thieullen P., The dual potential, the involution kernel and transport in ergodic optimization, Dynamics, Games and Science, Vol. 1 of the series CIM Series in Mathematical Science, 357–398.
[21] Moameni A., Invariance properties of the Monge–Kantorovich mass transport problem, arXiv:1311.7051.
[22] Rüschendorf L., Sei T., On optimal stationary couplings between stationary processes, Electr. Journ. Probab., 17, article 17, 2012.
[23] Vershik A.M., The problem of describing central measures on the path spaces of graded graphs, Func. Analysis and Its Appl., 48(4), 256–271, 2014.
[24] Villani C., Topics in optimal transportation, Amer. Math. Soc., Providence, Rhode Island, 2003.
[25] Zaev D.A., On the Monge–Kantorovich problem with additional linear constraints, Mat. Zametki, 98(5), 664–683, 2015.
[26] Zaev D.A., On ergodic decompositions related to the Kantorovich problem, Zapiski POMI, 437, 100–130, 2015.
Higher School of Economics, Moscow, Russia
E-mail address : [email protected]
Higher School of Economics, Moscow, Russia
E-mail address : [email protected]@gmail.com