Exponential Convergence in $L^p$-Wasserstein Distance for Diffusion Processes without Uniformly Dissipative Drift
Dejun Luo$^{a,*}$, Jian Wang$^{b,\dagger}$

$^a$ Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
$^b$ School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China
Abstract
By adopting the coupling by reflection and choosing an auxiliary function which is convex near infinity, we establish the exponential convergence of diffusion semigroups $(P_t)_{t\ge0}$ with respect to the standard $L^p$-Wasserstein distance for all $p\in[1,\infty)$. In particular, we show that for the Itô stochastic differential equation
$$dX_t=dB_t+b(X_t)\,dt,$$
if the drift term $b$ satisfies that for any $x,y\in\mathbb{R}^d$,
$$\langle b(x)-b(y),x-y\rangle\le\begin{cases}K_1|x-y|^2,&|x-y|\le L;\\-K_2|x-y|^2,&|x-y|>L\end{cases}$$
holds with some positive constants $K_1$, $K_2$ and $L>0$, then there is a constant $\lambda:=\lambda(K_1,K_2,L)>0$ such that for all $p\in[1,\infty)$, $t>0$ and $x,y\in\mathbb{R}^d$,
$$W_p(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t/p}\begin{cases}|x-y|^{1/p},&\text{if }|x-y|\le1;\\|x-y|,&\text{if }|x-y|>1,\end{cases}$$
where $C:=C(K_1,K_2,L,p)$ is a positive constant. This improves the main result in [13], where the exponential convergence is only proved for the $L^1$-Wasserstein distance.

Keywords:
Exponential convergence, $L^p$-Wasserstein distance, coupling by reflection, diffusion process

MSC (2010):

$^*$ Email: [email protected]. Partly supported by the Key Laboratory of RCSDS, CAS (2008DP173182), NSFC (11571347) and AMSS (Y129161ZZ1).
$^\dagger$ Email: [email protected]. Partly supported by NSFC (11201073 and 11522106), NSFFJ (2015J01003) and PNAIA (IRTL1206).

1 Introduction
In this paper we consider the following Itô stochastic differential equation
$$dX_t=\sigma\,dB_t+b(X_t)\,dt,\qquad(1.1)$$
where $(B_t)_{t\ge0}$ is a standard $d$-dimensional Brownian motion, $\sigma\in\mathbb{R}^{d\times d}$ is a non-degenerate constant matrix, and $b:\mathbb{R}^d\to\mathbb{R}^d$ is a Borel measurable vector field. Recently there have been intensive studies on the existence of a unique strong solution to (1.1) with singular drift $b$. For example, if $\sigma=c\,\mathrm{Id}$ for some constant $c\ne0$ and $b$ is bounded and Hölder continuous, Flandoli et al. [14] proved that (1.1) generates a unique flow of diffeomorphisms on $\mathbb{R}^d$. These results were recently extended by F.-Y. Wang [28] to (infinite-dimensional) stochastic differential equations with a nice multiplicative noise and a locally Dini continuous drift. They show that when the diffusion coefficient $\sigma$ is non-degenerate, quite weak conditions on $b$ suffice to guarantee the well-posedness of (1.1), which will be assumed throughout this paper. Moreover, we assume that the solution $(X_t)_{t\ge0}$ has finite moments of all orders.

Denote by $(P_t)_{t\ge0}$ the semigroup associated with (1.1). If the initial value $X_0$ is distributed as $\mu$, then for any $t>0$, the distribution of $X_t$ is $\mu P_t$. We are concerned with the long-time behavior of $P_t$ as $t\to\infty$; more precisely, with the rate of convergence to equilibrium of $\delta_xP_t$ for any $x\in\mathbb{R}^d$. This problem is of fundamental importance in the study of Markov processes and has been attacked by a large number of researchers. To the authors' knowledge, there are at least three approaches for obtaining quantitative ergodic properties. The first one is known as Harris' theorem, which combines Lyapunov-type conditions with the notion of a small set; see [19, 20, 15] for systematic presentations. Recently this method has been further developed in [16, 17, 4] with applications to stochastic partial differential equations (SPDE) and stochastic delay differential equations (SDDE). The second method employs functional inequalities to characterize the rate of convergence to equilibrium for $(P_t)_{t\ge0}$. It is a classical result that for symmetric Markov processes the Poincaré inequality is equivalent to the exponential decay of the semigroup in $L^2$. More general functional inequalities were introduced in [26, 23, 27] to describe different convergence rates. It was shown in [1] that the two methods above can be linked together by Lyapunov–Poincaré inequalities. We would also like to mention that Bolley et al. [2] recently studied the exponential decay in the $L^2$-Wasserstein distance $W_2$ via the so-called WJ inequality, using the explicit formula for the time derivative of $W_2$ along solutions to the Fokker–Planck equation (see [2, Theorem 2.1]). An application to granular media was given in [3], yielding uniformly exponential convergence to equilibrium in the presence of non-convex interaction or confinement potentials.

Yet there is another approach for studying the exponential convergence of the semigroup $(P_t)_{t\ge0}$ corresponding to the SDE (1.1) considered in this paper, namely the coupling method.
If the drift vector field $b$ fulfills certain dissipative properties, this method provides an explicit rate of convergence to equilibrium in a straightforward way; see e.g. [5, 12] and the preprint [13]. The present work is motivated by [12, 13], where Eberle obtained the exponential decay of $(P_t)_{t\ge0}$ when the drift $b$ is assumed to be dissipative only at infinity; see the discussion below for more details.

A good tool for measuring the deviation between probability distributions is given by Wasserstein-type distances, which are defined as follows. Let $\psi\in C([0,\infty))$ be a strictly increasing function satisfying $\psi(0)=0$. Given two probability measures $\mu$ and $\nu$ on $\mathbb{R}^d$, we define
$$W_\psi(\mu,\nu)=\inf_{\Pi\in\mathscr{C}(\mu,\nu)}\int_{\mathbb{R}^d\times\mathbb{R}^d}\psi(|x-y|)\,d\Pi(x,y),$$
where $|\cdot|$ is the Euclidean norm and $\mathscr{C}(\mu,\nu)$ is the collection of measures on $\mathbb{R}^d\times\mathbb{R}^d$ having $\mu$ and $\nu$ as marginals. When $\psi$ is concave, the above definition gives rise to a Wasserstein distance $W_\psi$ on the space $\mathscr{P}_\psi(\mathbb{R}^d)$ of probability measures $\nu$ on $\mathbb{R}^d$ such that $\int\psi(|z|)\,\nu(dz)<\infty$. If $\psi(r)=r$ for all $r\ge0$, then $W_\psi$ is the standard $L^1$-Wasserstein distance (with respect to the Euclidean norm $|\cdot|$), which will be denoted by $W_1(\mu,\nu)$ throughout this paper. We are also concerned with the $L^p$-Wasserstein distance $W_p$ for all $p\in[1,\infty)$, i.e.
$$W_p(\mu,\nu)=\inf_{\Pi\in\mathscr{C}(\mu,\nu)}\left(\int_{\mathbb{R}^d\times\mathbb{R}^d}|x-y|^p\,d\Pi(x,y)\right)^{1/p}.$$
Equipped with $W_p$, the totality $\mathscr{P}_p(\mathbb{R}^d)$ of probability measures having finite moment of order $p$ becomes a complete metric space.

In this paper, we shall establish the exponential convergence of the map $\mu\mapsto\mu P_t$ with respect to the $L^p$-Wasserstein distance $W_p$ for all $p\ge1$. We first recall some known results.
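For intuition about $W_p$: on the real line, for two empirical measures with the same number of equally weighted atoms, the monotone (sorted) pairing is an optimal coupling for the cost $|x-y|^p$ with $p\ge1$, so $W_p$ can be computed exactly by sorting. The Python sketch below (NumPy assumed; the function name `wasserstein_p_1d` is our own) uses this one-dimensional fact only; it is an illustration of the definition, not part of the paper's argument, and does not extend to $d\ge2$.

```python
import numpy as np


def wasserstein_p_1d(xs, ys, p=1.0):
    """W_p between two empirical measures on the real line.

    Both samples must have the same number of equally weighted atoms.
    In one dimension the monotone (sorted) pairing is an optimal coupling
    for the cost |x - y|^p when p >= 1, so no optimization is needed.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))


# delta_0 versus (delta_0 + delta_3) / 2: W_1 = 1.5 and W_2 = sqrt(4.5)
print(wasserstein_p_1d([0.0, 0.0], [0.0, 3.0], p=1))
print(wasserstein_p_1d([0.0, 0.0], [0.0, 3.0], p=2))
```

The example also shows that $W_1\le W_2$ can be strict, which is one reason why bounds in $W_1$ alone say little about $W_p$ for $p>1$.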
Theorem 1.1 (Uniformly dissipative case).
Suppose that $\sigma=\mathrm{Id}$ and there exists a constant $K>0$ such that
$$\langle b(x)-b(y),x-y\rangle\le-K|x-y|^2\quad\text{for all }x,y\in\mathbb{R}^d.\qquad(1.2)$$
Then, for any $p\ge1$ and $t>0$,
$$W_p(\mu P_t,\nu P_t)\le e^{-Kt}W_p(\mu,\nu)\quad\text{for all }\mu,\nu\in\mathscr{P}_p(\mathbb{R}^d).\qquad(1.3)$$

The proof of this result is quite straightforward: it suffices to use the synchronous coupling (also called the coupling of marching soldiers in [7, Example 2.16]); see e.g. [2, p. 2432] for a short proof.

In applications, the so-called uniformly dissipative condition (1.2) is too strong. Indeed, it follows from [22, Theorem 1] or [2, Remark 3.6] (see also [5, Section 3.1.2, Theorem 1]) that (1.3) holds for all probability measures $\mu$ and $\nu$ if and only if (1.2) holds for all $x,y\in\mathbb{R}^d$. The first breakthrough in getting rid of this restrictive condition was made recently by Eberle in [13], at the price of multiplying a constant $C\ge1$ … $b$:
$$\kappa(r):=\sup\left\{\frac{\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x)-b(y))\rangle}{|\sigma^{-1}(x-y)|}:x,y\in\mathbb{R}^d\text{ with }|\sigma^{-1}(x-y)|=r\right\}.\qquad(1.4)$$
As in [13, (2.3)], we shall assume throughout the paper that
$$\int_0^s\kappa_+(r)\,dr<+\infty\quad\text{for all }s>0.$$
This technical condition will be used in Section 2 to construct the auxiliary function.

Theorem 1.2 ([13, Corollary 2.3]).
Suppose that the vector field $b$ is locally Lipschitz continuous, and there is a constant $c>0$ such that
$$\kappa(r)\le-cr\quad\text{for all }r>0\text{ large enough}.\qquad(1.5)$$
Then there exist positive constants $C$ and $\lambda$ such that for any $t>0$ and $\mu,\nu\in\mathscr{P}_1(\mathbb{R}^d)$,
$$W_1(\mu P_t,\nu P_t)\le Ce^{-\lambda t}W_1(\mu,\nu).$$

In particular, when $\sigma=\mathrm{Id}$, the condition (1.5) holds true if (1.2) is satisfied only for large $|x-y|$, that is,
$$\langle b(x)-b(y),x-y\rangle\le-K|x-y|^2,\qquad|x-y|\ge L$$
for some constant $L>0$. Therefore, the map $\mu\mapsto\mu P_t$ converges exponentially with respect to the standard $L^1$-Wasserstein distance $W_1$ even under a locally non-dissipative drift; see [13, Example 1.1] for more details. The proof of [13, Corollary 2.3] is based on the coupling by reflection of diffusion processes and a carefully constructed concave function, cf. [13, Section 2]. A number of direct consequences are presented in [13, Section 2.2], which indicate that the convergence result [13, Corollary 2.3] is extremely useful.

However, [13, Corollary 2.3] is not satisfactory in the sense that it provides no information on the $L^2$-Wasserstein distance $W_2$. This fact has also been noted in [5, Section 7.1, Remark 19], which says that "the reflection coupling cannot furnish some information on $W_2$". Our main result shows that this is not the case.

Theorem 1.3.
Assume that there are constants $c>0$ and $\eta\ge0$ such that for all $r\ge\eta$, one has
$$\kappa(r)\le-cr.\qquad(1.6)$$
For $\varepsilon\in(0,c)$, define
$$C(\varepsilon)=\max\left\{\frac{2e^2}{\varepsilon}\left(1+\frac{2}{\sqrt\varepsilon}\right)\sqrt{\frac{2}{c-\varepsilon}},\ \frac{2+\sqrt\varepsilon}{\varepsilon(1-e^{-2})}\left[\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{1}{c-\varepsilon}\right]\right\}$$
and
$$\lambda=\frac{\min\{1,2/\varepsilon\}}{C(\varepsilon)}\exp\left(-\frac{c\eta^2}{2}-\int_0^\eta\kappa_+(s)\,ds\right).$$
Then for any $p\ge1$, $t>0$ and any $x,y\in\mathbb{R}^d$, it holds
$$W_p(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t/p}\begin{cases}|x-y|^{1/p},&\text{if }|x-y|\le1;\\|x-y|,&\text{if }|x-y|>1,\end{cases}\qquad(1.7)$$
where $C:=C(c,\eta,\varepsilon,p)$ is a positive constant.

Theorem 1.3 above does provide new conditions on the drift term $b$ under which the associated semigroup $(P_t)_{t\ge0}$ is exponentially convergent with respect to the $L^p$-Wasserstein distance $W_p$ for all $p\ge1$. The reason that we can obtain the exponential convergence in $W_p$ for all $p\ge1$, and not only in $W_1$, is our particular choice of the auxiliary function, which is convex near infinity. It is designed by using Chen–Wang's famous variational formula for the principal eigenvalue of one-dimensional diffusion operators; see for instance [10] or [7, Section 3.4]. The reader can refer to [8] and the references therein for recent studies on this topic.

The assertion of Theorem 1.3 can be slightly strengthened if (1.6) is replaced by a stronger condition.

Theorem 1.4.
Assume that there are constants $c>0$, $\eta>0$ and $\theta>1$ such that for all $r\ge\eta$, one has
$$\kappa(r)\le-cr^\theta.\qquad(1.8)$$
Let $\lambda$ be defined as in Theorem 1.3 with $c$ replaced by $c\eta^{\theta-1}$. Then there is a positive constant $C$ such that for all $t>0$ and $x,y\in\mathbb{R}^d$, it holds
$$W_p(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t/p}\begin{cases}|x-y|^{1/p},&\text{if }|x-y|\le1;\\|x-y|\wedge\cdots,&\text{if }|x-y|>1.\end{cases}\qquad(1.9)$$

The idea of the proof is to use the synchronous coupling for large $|x-y|$ and the coupling by reflection for small $|x-y|$. For the latter part, we can directly use the result of Theorem 1.3, since for $\eta>0$ condition (1.8) implies that (1.6) holds with $c\eta^{\theta-1}$ in place of $c$ (indeed, $-cr^\theta\le-c\eta^{\theta-1}r$ for $r\ge\eta$). … there is a constant $c>0$ such that
$$\kappa(r)\le-c\quad\text{for all }r\text{ large enough}.\qquad(1.10)$$
It turns out that under mild conditions on $\kappa$, the two conditions (1.5) and (1.10) are equivalent, up to changing the constants. More explicitly, we have

Proposition 1.5.
Assume that there are constants $c,r_0>0$ such that $\kappa(r_0)\le-c$ and $\delta:=\sup_{0\le r\le r_0}\kappa(r)<+\infty$. Then the condition (1.5) holds with some other positive constant.

This result indicates that if the function $\kappa$ is locally bounded from above, then the following statements are equivalent:
(i) there exist constants $c,r_0>0$ such that $\kappa(r_0)\le-c$;
(ii) there exist constants $c>0$ and $\theta\le1$ such that $\kappa(r)\le-cr^\theta$ for $r>0$ large enough;
(iii) there exists a constant $c>0$ such that $\kappa(r)\le-cr$ for $r>0$ large enough.
However, (i) does not imply in general that there exist constants $c>0$ and $\theta>1$ such that $\kappa(r)\le-cr^\theta$ for $r>0$ large enough; see e.g. $b(x)=-x$ for all $x\in\mathbb{R}^d$.

Compared to (1.5), the seemingly much weaker condition (i), i.e. the existence of two constants $c,r_0>0$ such that $\kappa(r_0)\le-c$, has the obvious advantage of being easily verifiable. Thus we shall sometimes use this formulation in the sequel.

The equivalence stated above also indicates that Theorem 1.3 is sharp in some situations, as shown by the next example.

Example 1.6.
Assume that $\sigma=\mathrm{Id}$ and $b(x)=\nabla V(x)$ with $V(x)=-(1+|x|^2)^{\delta/2}$ for some $\delta\in(0,2)$. Then we have the following statements.
(1) If $\delta\in(0,1)$, then $\kappa(r)\ge0$ for all $r$ large enough, and the inequality (1.7) does not hold for any positive constants $C$ and $\lambda$ with $p=1$.
(2) If $\delta\in[1,2)$, then $\kappa(r)=0$ for all $r\ge0$, and so for all $x,y\in\mathbb{R}^d$ and $t>0$,
$$W_1(\delta_xP_t,\delta_yP_t)\le|x-y|.\qquad(1.11)$$
On the other hand, it holds that
$$d_{TV}(\delta_xP_t,\delta_yP_t)\le\sqrt{\frac{2}{\pi t}}\,|x-y|\quad\text{for all }t>0\text{ and }x,y\in\mathbb{R}^d,\qquad(1.12)$$
where $d_{TV}$ is the total variation distance between probability measures.
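To make the definition (1.4) concrete for Example 1.6: in dimension one with $\sigma=\mathrm{Id}$, relabelling each pair so that $x=y+r$ shows that $\kappa(r)=\sup_y\,[b(y+r)-b(y)]$. The Python sketch below (NumPy assumed; the grid bounds and the choice $r=5$ are arbitrary) estimates this supremum on a finite grid for $\delta=0.5$ and $\delta=1.5$. A finite grid can only give a lower bound for the supremum, so this is a numerical illustration under those assumptions, not a verification of the example.

```python
import numpy as np


def drift_example_1_6(x, delta):
    """b = V' for V(x) = -(1 + x^2)^(delta/2), in dimension one."""
    return -delta * x * (1.0 + x * x) ** (delta / 2.0 - 1.0)


def kappa_1d(b, r, grid):
    """Grid estimate of kappa(r) in d = 1 with sigma = Id.

    For |x - y| = r, the ratio <x - y, b(x) - b(y)> / |x - y| equals
    b(y + r) - b(y) after relabelling so that x = y + r; hence
    kappa(r) = sup_y [b(y + r) - b(y)].  A finite grid only gives a
    lower bound for this supremum.
    """
    return float(np.max(b(grid + r) - b(grid)))


# Dense grid near the origin plus geometrically spaced points far out,
# since for delta in [1, 2) the supremum is only approached as |y| -> infinity.
far = np.geomspace(50.0, 1e6, 2000)
grid = np.concatenate([-far[::-1], np.linspace(-50.0, 50.0, 20001), far])

for delta in (0.5, 1.5):
    k = kappa_1d(lambda x: drift_example_1_6(x, delta), 5.0, grid)
    print(f"delta = {delta}: estimated kappa(5) = {k:.5f}")
```

For $\delta=0.5$ the estimate is strictly positive, in line with statement (1); for $\delta=1.5$ every difference is negative (the potential is strictly concave) but the estimate creeps up towards $0$ as the grid is extended, in line with $\kappa\equiv0$ in statement (2).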
To show the power of Theorem 1.4, we consider the following example, which yields the exponential convergence of the semigroup $(P_t)_{t\ge0}$ with respect to the $L^p$-Wasserstein distance $W_p$ ($p>2$) for super-convex potentials. The assertion below improves the results mentioned in [5, Section 6.1].
Example 1.7.
Let $\sigma=\mathrm{Id}$ and $b(x)=\nabla V(x)$ with $V(x)=-|x|^\alpha$ and $\alpha>2$. It follows from [5, Section 6, Example 1] that there is a constant $c>0$ such that for all $r>0$,
$$\kappa(r)\le-cr^{\alpha-1}.\qquad(1.13)$$
Then, according to Theorem 1.4, the associated semigroup $(P_t)_{t\ge0}$ converges exponentially with respect to the $L^p$-Wasserstein distance $W_p$ for any $p\ge1$. More explicitly, there is a constant $\lambda:=\lambda(\alpha)>0$ such that for any $p\ge1$, $x,y\in\mathbb{R}^d$ and $t>0$,
$$W_p(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t}\Big[|x-y|^{1/p}\mathbf 1_{\{|x-y|\le1\}}+\big(|x-y|\wedge\cdots\big)\mathbf 1_{\{|x-y|\ge1\}}\Big]$$
holds with some constant $C:=C(\alpha,p)>0$.

Note that (1.13) implies that for all $x,y\in\mathbb{R}^d$,
$$\langle b(x)-b(y),x-y\rangle\le-c|x-y|^\alpha.$$
Therefore, the uniformly dissipative condition (1.2) fails when $x,y\in\mathbb{R}^d$ are sufficiently close to each other. That is, one cannot deduce directly from Theorem 1.1 the exponential convergence with respect to the $L^p$-Wasserstein distance $W_p$ ($p\ge1$).

As applications of Theorem 1.3, we consider the existence and uniqueness of the invariant probability measure, and also the exponential convergence of the semigroup with respect to the $L^p$-Wasserstein distance $W_p$. For $p\in(1,\infty)$, we define
$$\phi_p(r)=\begin{cases}r^{1/p},&\text{if }r<p^{-p/(p-1)};\\r-p^{-p/(p-1)}+p^{-1/(p-1)},&\text{if }r\ge p^{-p/(p-1)}.\end{cases}$$
Finally, let $\phi_1(r)=r$ for all $r\ge0$. Then for all $p\in[1,\infty)$, $\phi_p$ is a concave $C^1$-function on $\mathbb{R}_+$; thus $W_{\phi_p}$ is a well-defined distance on $\mathscr{P}_p(\mathbb{R}^d)$. Moreover,
$$r\vee r^{1/p}\le\phi_p(r)\le r+r^{1/p}\le2\big(r\vee r^{1/p}\big)\quad\text{for all }r\ge0,\ p\in[1,\infty).$$

Corollary 1.8. Suppose that the drift term $b$ is locally bounded on $\mathbb{R}^d$, and (1.6) holds for all $r>0$ large enough with some constant $c>0$. Then there exists a unique invariant probability measure $\mu\in\bigcap_{p\ge1}\mathscr{P}_p(\mathbb{R}^d)$, and there is a constant $\lambda:=\lambda(c)>0$ such that for all $p\in[1,\infty)$ and any probability measure $\nu\in\mathscr{P}_p(\mathbb{R}^d)$,
$$W_p(\nu P_t,\mu)\le Ce^{-\lambda t}W_{\phi_p}(\nu,\mu),\quad t\ge0,$$
holds with some positive constant $C:=C(c,p)$.

Remark 1.9. (1) Under the assumptions of Corollary 1.8, it is easy to establish the following Foster–Lyapunov type condition:
$$L\phi(x)\le-c_1\phi(x)+c_2,\quad x\in\mathbb{R}^d,$$
where $L$ is the generator of the underlying diffusion process, $\phi(x)=|x|^2$ and $c_1$, $c_2$ are two positive constants. Combining this with the existence and uniqueness of the invariant probability measure, we know that the diffusion process converges exponentially to the unique invariant probability measure $\mu$ with respect to the total variation distance. That is, there are a constant $\theta>0$ and a function $C(x)$ such that for all $x\in\mathbb{R}^d$ and $t>0$,
$$d_{TV}(P(t,x,\cdot),\mu)\le C(x)e^{-\theta t},$$
where $P(t,x,\cdot)$ is the associated transition probability.
(2) As pointed out by the referee, the conclusion of Corollary 1.8 for $p=1$ could also be deduced from Theorem 1.4 and the Foster–Lyapunov type condition above, by Harris's theorem for the exponential convergence to the invariant measure in the Wasserstein metric. See [17, Theorem 4.8] and [4, Theorem 2.4] for more details.

The following statement is concerned with symmetric diffusion processes. Though we believe the assertion below is known (see e.g. [25, Corollary 1.4]), we stress the relation between the exponential convergence with respect to the $L^1$-Wasserstein distance $W_1$ and that with respect to the $L^2$-norm, which is equivalent to the Poincaré inequality.
Corollary 1.10.
Let $U$ be a $C^2$-potential defined on $\mathbb{R}^d$ such that its Hessian matrix satisfies $\mathrm{Hess}(U)\ge-K$ for some $K>0$. Assume that $\mu(dx)=e^{-U(x)}\,dx$ is a probability measure on $\mathbb{R}^d$. If there exists a constant $L>0$ such that
$$\inf_{|x-y|=L}\big\langle\nabla U(x)-\nabla U(y),x-y\big\rangle>0,\qquad(1.14)$$
then $\mu$ satisfies the Poincaré inequality, i.e.
$$\mu(f^2)-\mu(f)^2\le C\int|\nabla f(x)|^2\,d\mu,\quad f\in C_c^1(\mathbb{R}^d)\qquad(1.15)$$
holds for some constant $C>0$.

2 Preliminaries
Similar to the main result in [13], the proof of Theorem 1.3 is based on the reflection coupling of Brownian motion, which was introduced by Lindvall and Rogers [18] and developed by Chen and Li [9]. First, we give a brief introduction to the coupling by reflection. Together with (1.1), we also consider
$$dY_t=\sigma(\mathrm{Id}-2e_te_t^*)\,dB_t+b(Y_t)\,dt,\quad t<T,\qquad(2.1)$$
where $\mathrm{Id}\in\mathbb{R}^{d\times d}$ is the identity matrix,
$$e_t=\frac{\sigma^{-1}(X_t-Y_t)}{|\sigma^{-1}(X_t-Y_t)|},$$
and $T=\inf\{t>0:X_t=Y_t\}$ is the coupling time. For $t\ge T$, we set $Y_t=X_t$. Then the process $(X_t,Y_t)_{t\ge0}$ is called the coupling by reflection of $(X_t)_{t\ge0}$. Under our assumptions, the reflection coupling $(X_t,Y_t)_{t\ge0}$ can be realized as a non-explosive diffusion process in $\mathbb{R}^{2d}$. The difference process $(Z_t)_{t\ge0}=(X_t-Y_t)_{t\ge0}$ satisfies
$$dZ_t=\frac{2Z_t}{|\sigma^{-1}Z_t|}\,dW_t+\big(b(X_t)-b(Y_t)\big)\,dt,\quad t<T,\qquad(2.2)$$
where $(W_t)_{0\le t<T}$ …
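Note that $r(1+r)\sim r$ as $r\to0$ and $r(1+r)\sim r^2$ as $r\to\infty$, where $\sim$ means that the two quantities are of the same order. By L'Hôpital's rule,
$$\lim_{r\to0}\frac{\psi_1(r)}{\psi_2(r)}=\lim_{r\to0}\frac{e^{cr^2/2}\int_r^\infty e^{-(c-\varepsilon)u^2/2}\,du}{-r^{-2}\big(e^{\varepsilon r^2/2}-1\big)+r^{-1}e^{\varepsilon r^2/2}\,\varepsilon r}=\frac{2}{\varepsilon}\int_0^\infty e^{-(c-\varepsilon)u^2/2}\,du=\frac{2}{\varepsilon}\sqrt{\frac{\pi}{2(c-\varepsilon)}}.$$
Next, using L'Hôpital's rule twice,
$$\lim_{r\to\infty}\frac{\psi_1(r)}{\psi_2(r)}=\lim_{r\to\infty}\frac{\psi_1(r)}{r^{-2}\big(e^{\varepsilon r^2/2}-1\big)}=\lim_{r\to\infty}\frac{e^{cr^2/2}\int_r^\infty e^{-(c-\varepsilon)u^2/2}\,du}{e^{\varepsilon r^2/2}\,[-2r^{-3}+\varepsilon r^{-1}]}=\lim_{r\to\infty}\frac{\int_r^\infty e^{-(c-\varepsilon)u^2/2}\,du}{e^{-(c-\varepsilon)r^2/2}\,r^{-2}[-2r^{-1}+\varepsilon r]}=-\lim_{r\to\infty}\frac{1}{h(r)},$$
where
$$h(r)=-(c-\varepsilon)r^{-1}\big(-2r^{-1}+\varepsilon r\big)-2r^{-3}\big(-2r^{-1}+\varepsilon r\big)+r^{-2}\big(2r^{-2}+\varepsilon\big).$$
Since $h(r)\to-\varepsilon(c-\varepsilon)$ as $r\to\infty$, we get
$$\lim_{r\to\infty}\frac{\psi_1(r)}{\psi_2(r)}=\frac{1}{\varepsilon(c-\varepsilon)}.$$
Since $\psi_1/\psi_2$ is continuous and positive on $(0,\infty)$, the required assertion follows from the two limits above.

By (2.6) and Lemma 2.2, we have
$$\psi(r)\le C\exp\left(\int_0^\eta\big[\kappa_+(v)+cv\big]\,dv\right)\frac{e^{\varepsilon r^2/2}-1}{r(1+r)}$$
and
$$\psi(r)\ge\hat C\exp\left(-\int_0^\eta\big[\kappa_+(v)+cv\big]\,dv\right)\frac{e^{\varepsilon r^2/2}-1}{r(1+r)}.$$
Roughly speaking, the above two inequalities imply that the auxiliary function $\psi$ behaves like $c'r$ for small $r$, and grows exponentially fast, like $e^{c''r^2}$, for large $r$. Hence, the function $\psi(r)$ can be used to control the function $r^p$ with $p\ge1$. More explicitly, we have

Corollary 2.3.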
There is a constant $C>0$ such that for all $r\ge0$,
$$C^{-1}\big[r\vee\big(e^{\varepsilon r^2/\cdots}-1\big)\big]\le\psi(r)\le C\big[r\vee\big(e^{\varepsilon r^2/\cdots}-1\big)\big].\qquad(2.7)$$
Consequently, for any $p\ge1$, there is a constant $C=C(p,\varepsilon)>0$ such that for all $r\ge0$,
$$r^p\le C\psi(r).\qquad(2.8)$$
Furthermore, we can give an explicit estimate of the constant $C$ in Lemma 2.2, which will be used for the exponential convergence rate.
Lemma 2.4.
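The constant $C$ in Lemma 2.2 has the following expression:
$$C=\max\left\{\frac{2e^2}{\varepsilon}\left(1+\frac{2}{\sqrt\varepsilon}\right)\sqrt{\frac{2}{c-\varepsilon}},\ \frac{2+\sqrt\varepsilon}{\varepsilon(1-e^{-2})}\left[\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{1}{c-\varepsilon}\right]\right\}.$$
Proof.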
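In order to estimate $\psi_1(r)$, we need the following inequality on the tail of the standard Gaussian distribution (see e.g. [11, (3)]):
$$1-\Phi(r)\le\frac{2\phi(r)}{\sqrt{2+r^2}+r}\quad\text{for all }r>0,$$
where $\Phi(r)$ and $\phi(r)$ are respectively the distribution function and the density function of the standard Gaussian distribution $N(0,1)$. Then for any $s>0$,
$$\int_s^\infty e^{-(c-\varepsilon)u^2/2}\,du=\frac{1}{\sqrt{c-\varepsilon}}\int_{\sqrt{c-\varepsilon}\,s}^\infty e^{-v^2/2}\,dv\le\frac{2}{\sqrt{c-\varepsilon}}\cdot\frac{e^{-(c-\varepsilon)s^2/2}}{\sqrt{2+(c-\varepsilon)s^2}+\sqrt{c-\varepsilon}\,s}.$$
Substituting this estimate into the expression of $\psi_1$ leads to
$$\psi_1(r)\le\frac{2}{\sqrt{c-\varepsilon}}\int_0^r\frac{e^{\varepsilon s^2/2}}{\sqrt{2+(c-\varepsilon)s^2}+\sqrt{c-\varepsilon}\,s}\,ds=:\frac{2}{\sqrt{c-\varepsilon}}\,\tilde\psi(r).\qquad(2.9)$$
Next, we consider two cases.
(i) If $r\le2/\sqrt\varepsilon$, then, since $e^{\varepsilon s^2/2}\le e^2$ for $s\le2/\sqrt\varepsilon$,
$$\tilde\psi(r)\le\int_0^r\frac{e^2}{\sqrt2}\,ds=\frac{e^2}{\sqrt2}\,r.$$
(ii) If $r>2/\sqrt\varepsilon$, then
$$\tilde\psi(r)=\left(\int_0^{2/\sqrt\varepsilon}+\int_{2/\sqrt\varepsilon}^r\right)\frac{e^{\varepsilon s^2/2}\,ds}{\sqrt{2+(c-\varepsilon)s^2}+\sqrt{c-\varepsilon}\,s}\le e^2\sqrt{\frac{2}{\varepsilon}}+\frac{1}{2\sqrt{c-\varepsilon}}\int_{2/\sqrt\varepsilon}^r\frac{e^{\varepsilon s^2/2}}{s}\,ds.\qquad(2.10)$$
By the integration by parts formula,
$$\int_{2/\sqrt\varepsilon}^r\frac{e^{\varepsilon s^2/2}}{s}\,ds=\frac{1}{\varepsilon}\int_{2/\sqrt\varepsilon}^r\frac{d\big(e^{\varepsilon s^2/2}\big)}{s^2}=\frac{1}{\varepsilon}\left[\frac{e^{\varepsilon r^2/2}}{r^2}-\frac{\varepsilon e^2}{4}+2\int_{2/\sqrt\varepsilon}^r\frac{e^{\varepsilon s^2/2}}{s^3}\,ds\right]\le\frac{e^{\varepsilon r^2/2}}{\varepsilon r^2}+\frac12\int_{2/\sqrt\varepsilon}^r\frac{e^{\varepsilon s^2/2}}{s}\,ds,$$
where in the last step we used $2/(\varepsilon s^2)\le1/2$ for $s\ge2/\sqrt\varepsilon$. Hence
$$\int_{2/\sqrt\varepsilon}^r\frac{e^{\varepsilon s^2/2}}{s}\,ds\le\frac{2e^{\varepsilon r^2/2}}{\varepsilon r^2}.$$
Substituting this estimate into (2.10) yields
$$\tilde\psi(r)\le e^2\sqrt{\frac{2}{\varepsilon}}+\frac{e^{\varepsilon r^2/2}}{\varepsilon\sqrt{c-\varepsilon}\,r^2}.$$
Summarizing the above two cases and using (2.9), we obtain
$$\psi_1(r)\le\begin{cases}\sqrt{\dfrac{2}{c-\varepsilon}}\,e^2r,&\text{if }r\le\dfrac{2}{\sqrt\varepsilon};\\[2ex]\dfrac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\dfrac{2e^{\varepsilon r^2/2}}{\varepsilon(c-\varepsilon)r^2},&\text{if }r>\dfrac{2}{\sqrt\varepsilon}.\end{cases}\qquad(2.11)$$
Furthermore, since $e^{\varepsilon r^2/2}-1\ge\varepsilon r^2/2$ for $r\ge0$, it is easy to show that for all $r\in[0,2/\sqrt\varepsilon]$,
$$\sqrt{\frac{2}{c-\varepsilon}}\,e^2r\le\frac{C_1}{r(1+r)}\big(e^{\varepsilon r^2/2}-1\big),\qquad(2.12)$$
where
$$C_1=\frac{2e^2}{\varepsilon}\left(1+\frac{2}{\sqrt\varepsilon}\right)\sqrt{\frac{2}{c-\varepsilon}}.$$
On the other hand, for $r>2/\sqrt\varepsilon$, using $1+1/r\le(2+\sqrt\varepsilon)/2$ and $r^2\le(2/\varepsilon)e^{\varepsilon r^2/2}$, we have
$$r(1+r)\left[\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{2e^{\varepsilon r^2/2}}{\varepsilon(c-\varepsilon)r^2}\right]=\left(1+\frac1r\right)\left[\frac{2\sqrt2\,e^2r^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{2e^{\varepsilon r^2/2}}{\varepsilon(c-\varepsilon)}\right]\le\frac{2+\sqrt\varepsilon}{\varepsilon}\left[\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{1}{c-\varepsilon}\right]e^{\varepsilon r^2/2}$$
and
$$e^{\varepsilon r^2/2}-1\ge(1-e^{-2})\,e^{\varepsilon r^2/2}.$$
Combining the above two inequalities, we deduce that for all $r>2/\sqrt\varepsilon$,
$$\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{2e^{\varepsilon r^2/2}}{\varepsilon(c-\varepsilon)r^2}\le\frac{C_2}{r(1+r)}\big(e^{\varepsilon r^2/2}-1\big)\qquad(2.13)$$
with
$$C_2=\frac{2+\sqrt\varepsilon}{\varepsilon(1-e^{-2})}\left[\frac{2\sqrt2\,e^2}{\sqrt{\varepsilon(c-\varepsilon)}}+\frac{1}{c-\varepsilon}\right].$$
Having the two inequalities (2.12) and (2.13) in hand, and using (2.11), we complete the proof.

We also need the following simple result.

Lemma 2.5.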
For all $r>0$,
$$e^{\varepsilon r^2/2}\ge\bar C\,\psi_2(r),\qquad(2.14)$$
where $\bar C=\min\{1,2/\varepsilon\}$.
Proof.
It is obvious that if $r\ge1$, then
$$\psi_2(r)=\frac{e^{\varepsilon r^2/2}-1}{r(1+r)}\le e^{\varepsilon r^2/2}.$$
If $0\le r\le1$, then, using the fact that $e^r-1\le re^r$ for $r\ge0$, we have
$$\psi_2(r)=\frac{e^{\varepsilon r^2/2}-1}{r(1+r)}\le\frac{\varepsilon r}{2}\,e^{\varepsilon r^2/2}\le\frac{\varepsilon}{2}\,e^{\varepsilon r^2/2}.$$
The required assertion follows immediately from these two conclusions.

Finally, we present a consequence of all the previous results in this part.
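The inequality (2.14) of Lemma 2.5 is elementary, and it is easy to spot-check numerically. The Python sketch below (standard library only) evaluates $e^{\varepsilon r^2/2}\ge\min\{1,2/\varepsilon\}\,(e^{\varepsilon r^2/2}-1)/(r(1+r))$ on a grid of $r\in(0,8]$ for a few values of $\varepsilon$; the name `psi2` and the sample values of $\varepsilon$ are our own choices, and a grid check is only an illustration of the two-case argument above.

```python
import math


def psi2(r, eps):
    """(exp(eps r^2 / 2) - 1) / (r (1 + r)) for r > 0."""
    return math.expm1(0.5 * eps * r * r) / (r * (1.0 + r))


def check_lemma_2_5(eps, rs):
    """Check exp(eps r^2 / 2) >= min(1, 2/eps) * psi2(r) at every r in rs."""
    c_bar = min(1.0, 2.0 / eps)
    return all(math.exp(0.5 * eps * r * r) >= c_bar * psi2(r, eps) for r in rs)


rs = [k / 100.0 for k in range(1, 801)]  # r in (0, 8]
for eps in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(eps, check_lemma_2_5(eps, rs))
```

`math.expm1` is used so that $e^{x}-1$ is accurate when $\varepsilon r^2/2$ is tiny.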
Corollary 2.6.
Let $\psi$ be the function defined by (2.4), and let $\lambda$ be the constant in Theorem 1.3. Then, for all $r>0$,
$$\psi''(r)+\kappa(r)\psi'(r)\le-\lambda\psi(r).\qquad(2.15)$$
Proof.
By (2.5), we deduce from Lemmas 2.4 and 2.5 that
$$\psi''(r)+\kappa(r)\psi'(r)\le-e^{\varepsilon r^2/2}\le-\bar C\psi_2(r)\le-\frac{\bar C}{C}\psi_1(r)\le-\frac{\bar C}{C}\exp\left(-\frac{c\eta^2}{2}-\int_0^\eta\kappa_+(s)\,ds\right)\psi(r),$$
where the last inequality follows from (2.6). This, along with the definition of the constant $\lambda$ in Theorem 1.3, yields the required assertion.

We first prove Theorem 1.3.

Proof of Theorem 1.3. Recall that we assume the solution $(X_t)_{t\ge0}$ to (1.1) has finite moments of all orders. In particular, the left-hand side of (1.7) is finite for any $x,y\in\mathbb{R}^d$ and $t>0$. Let $\psi$ be the function defined by (2.4). According to (2.3) and Itô's formula, it holds that
$$d\psi(r_t)=2\psi'(r_t)\,dW_t+2\left[\psi''(r_t)+\frac{\psi'(r_t)}{2r_t}\big\langle\sigma^{-1}Z_t,\sigma^{-1}(b(X_t)-b(Y_t))\big\rangle\right]dt\le2\psi'(r_t)\,dW_t+2\big[\psi''(r_t)+\kappa(r_t)\psi'(r_t)\big]\,dt.$$
Hence, by Corollary 2.6,
$$d\psi(r_t)\le2\psi'(r_t)\,dW_t-\lambda\psi(r_t)\,dt,\qquad(3.1)$$
where $\lambda$ is the constant in Theorem 1.3.

For $n\ge1$, define the stopping time $T_n=\inf\{t>0:r_t\notin[1/n,n]\}$. Then, for $t\le T_n$, the inequality (3.1) yields
$$d\big[e^{\lambda t}\psi(r_t)\big]\le2e^{\lambda t}\psi'(r_t)\,dW_t.$$
Therefore,
$$e^{\lambda(t\wedge T_n)}\psi(r_{t\wedge T_n})\le\psi(r_0)+2\int_0^{t\wedge T_n}e^{\lambda s}\psi'(r_s)\,dW_s.$$
Taking expectations on both sides of the inequality above leads to
$$\mathbb{E}\big[e^{\lambda(t\wedge T_n)}\psi(r_{t\wedge T_n})\big]\le\psi(r_0).$$
Since the coupling process $(X_t,Y_t)_{t\ge0}$ is non-explosive, we have $T_n\uparrow T$ a.s. as $n\to\infty$, where $T$ is the coupling time. Thus, by Fatou's lemma, letting $n\to\infty$ in the above inequality gives
$$\mathbb{E}\big[e^{\lambda(t\wedge T)}\psi(r_{t\wedge T})\big]\le\psi(r_0).\qquad(3.2)$$
Thanks to our convention that $Y_t=X_t$ for $t\ge T$, we have $r_t=0$ for all $t\ge T$. Therefore,
$$\mathbb{E}\big[e^{\lambda(t\wedge T)}\psi(r_{t\wedge T})\big]=\mathbb{E}\big[e^{\lambda t}\psi(r_t)\mathbf 1_{\{T>t\}}\big]=\mathbb{E}\big[e^{\lambda t}\psi(r_t)\big].$$
Combining this with (3.2), we arrive at $\mathbb{E}\psi(r_t)\le\psi(r_0)e^{-\lambda t}$. That is,
$$\mathbb{E}\psi\big(|\sigma^{-1}(X_t-Y_t)|\big)\le\psi\big(|\sigma^{-1}(x-y)|\big)e^{-\lambda t}.\qquad(3.3)$$
If $|\sigma^{-1}(x-y)|\le\eta$, then for any $p\ge1$ and $t>0$, we deduce from (2.8), (3.3) and (2.7) that
$$\mathbb{E}\big(|\sigma^{-1}(X_t-Y_t)|^p\big)\le C\,\mathbb{E}\psi\big(|\sigma^{-1}(X_t-Y_t)|\big)\le Ce^{-\lambda t}|\sigma^{-1}(x-y)|.\qquad(3.4)$$
It is clear that $C_0^{-1}|z|\le|\sigma^{-1}z|\le C_0|z|$ for all $z\in\mathbb{R}^d$ and some constant $C_0>1$. Therefore, if $|x-y|\le\eta/C_0$, then for any $p\ge1$ there is a constant $C>0$ such that
$$\mathbb{E}|X_t-Y_t|^p\le Ce^{-\lambda t}|x-y|,\quad t>0,$$
which implies that for any $p\ge1$ and $x,y\in\mathbb{R}^d$ with $|x-y|\le\eta/C_0$,
$$W_p(\delta_xP_t,\delta_yP_t)\le C^{1/p}e^{-\lambda t/p}|x-y|^{1/p}.\qquad(3.5)$$
Now for any $x,y\in\mathbb{R}^d$ with $|x-y|>\eta/C_0$, take $n:=[C_0|x-y|/\eta]+1\ge2$. We have
$$\frac n2\le n-1\le\frac{C_0|x-y|}{\eta}\le n.\qquad(3.6)$$
Set $x_i=x+i(y-x)/n$ for $i=0,1,\ldots,n$. Then $x_0=x$ and $x_n=y$; moreover, (3.6) implies $|x_{i-1}-x_i|=|x-y|/n\le\eta/C_0$ for all $i=1,2,\ldots,n$. Therefore, for all $p\ge1$, by the triangle inequality and (3.5),
$$W_p(\delta_xP_t,\delta_yP_t)\le\sum_{i=1}^nW_p(\delta_{x_{i-1}}P_t,\delta_{x_i}P_t)\le C^{1/p}e^{-\lambda t/p}\sum_{i=1}^n|x_{i-1}-x_i|^{1/p}\le C^{1/p}e^{-\lambda t/p}\,n\,(\eta/C_0)^{1/p}\le C'e^{-\lambda t/p}|x-y|,$$
where in the last inequality we have used $n\le2C_0|x-y|/\eta$, so that one can take $C'=2C^{1/p}C_0^{1-1/p}\eta^{1/p-1}$. The proof of Theorem 1.3 is completed.

Next, we turn to Theorem 1.4.

Proof of Theorem 1.4. Since we assume $\eta>0$, condition (1.8) implies that (1.6) holds with $c$ replaced by $c\eta^{\theta-1}$. Then we can directly apply Theorem 1.3 to conclude that there exists a constant $\lambda$ (given in Theorem 1.3 with $c$ replaced by $c\eta^{\theta-1}$) such that for all $p\ge1$ and $x,y\in\mathbb{R}^d$,
$$W_p(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t/p}\big(|x-y|^{1/p}\vee|x-y|\big).\qquad(3.7)$$
To complete the proof, we only need to consider the case that $x,y\in\mathbb{R}^d$ with $|\sigma^{-1}(x-y)|>\eta$ and $t>0$. In this case we construct the coupling $(X_t,Y_t)_{t\ge0}$ as follows:
$$dY_t=\begin{cases}\sigma\,dB_t+b(Y_t)\,dt,&0\le t<T_\eta,\\\sigma(\mathrm{Id}-2e_te_t^*)\,dB_t+b(Y_t)\,dt,&T_\eta\le t<T,\end{cases}$$
where $T_\eta=\inf\{t>0:|\sigma^{-1}(X_t-Y_t)|=\eta\}$ and $T=\inf\{t>0:X_t=Y_t\}$ is the coupling time. For $t\ge T$, we still set $Y_t=X_t$. Therefore, the difference process $(Z_t)_{t\ge0}=(X_t-Y_t)_{t\ge0}$ satisfies
$$dZ_t=\big(b(X_t)-b(Y_t)\big)\,dt,\quad t<T_\eta.$$
As a result,
$$d|\sigma^{-1}Z_t|^2=2\big\langle\sigma^{-1}Z_t,\sigma^{-1}(b(X_t)-b(Y_t))\big\rangle\,dt.$$
Still denoting $r_t=|\sigma^{-1}Z_t|$, we get
$$dr_t\le\kappa(r_t)\,dt\le-cr_t^\theta\,dt,\quad t<T_\eta,$$
and so
$$T_\eta\le\frac{1}{c(1-\theta)}\Big(|\sigma^{-1}(x-y)|^{1-\theta}-\eta^{1-\theta}\Big)\le\frac{\eta^{1-\theta}}{c(\theta-1)}=:t_0\qquad(3.8)$$
since $\theta>1$. Hence, for $x,y\in\mathbb{R}^d$ with $|\sigma^{-1}(x-y)|>\eta$, $p\ge1$ and $t>t_0$, we have
$$\mathbb{E}|\sigma^{-1}(X_t-Y_t)|^p=\mathbb{E}\Big[\mathbb{E}^{(X_{T_\eta},Y_{T_\eta})}|\sigma^{-1}(X_{t-T_\eta}-Y_{t-T_\eta})|^p\Big]\le C\,\mathbb{E}\Big[|\sigma^{-1}(X_{T_\eta}-Y_{T_\eta})|e^{-\lambda(t-T_\eta)}\Big]\le C\eta e^{\lambda t_0}e^{-\lambda t},$$
where in the first inequality we have used (3.4), and the last inequality follows from (3.8). In particular, for all $x,y$ with $|\sigma^{-1}(x-y)|>\eta$ and all $t>t_0$, we have
$$\mathbb{E}|X_t-Y_t|^p\le Ce^{-\lambda t},$$
and so $W_p(\delta_xP_t,\delta_yP_t)\le C^{1/p}e^{-\lambda t/p}$. Combining all the conclusions above, we complete the proof of Theorem 1.4.

Finally, we prove Proposition 1.5.

Proof of Proposition 1.5. By the definition of $\kappa$, for all $x,y\in\mathbb{R}^d$ with $|\sigma^{-1}(x-y)|=r_0$, we have
$$\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x)-b(y))\big\rangle}{|\sigma^{-1}(x-y)|}\le-c.\qquad(3.9)$$
For any fixed $x,y\in\mathbb{R}^d$ with $r=|\sigma^{-1}(x-y)|$ large enough, let $n=[r/r_0]$ be the integer part of $r/r_0$. Denote $x_0=x$ and $x_{n+1}=y$. We can find $n$ points $\{x_1,x_2,\ldots,x_n\}$ on the line segment linking $x$ to $y$ such that $|\sigma^{-1}(x_{i-1}-x_i)|=r_0$ for $i=1,2,\ldots,n$ and $|\sigma^{-1}(x_n-x_{n+1})|=|\sigma^{-1}(x_n-y)|\le r_0$. Since all these points lie on the same segment, the unit vectors $\sigma^{-1}(x_{i-1}-x_i)/|\sigma^{-1}(x_{i-1}-x_i)|$ all coincide with $\sigma^{-1}(x-y)/|\sigma^{-1}(x-y)|$. Then
$$\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x)-b(y))\big\rangle}{|\sigma^{-1}(x-y)|}=\sum_{i=1}^n\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x_{i-1})-b(x_i))\big\rangle}{|\sigma^{-1}(x-y)|}+\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x_n)-b(y))\big\rangle}{|\sigma^{-1}(x-y)|}$$
$$=\sum_{i=1}^n\frac{\big\langle\sigma^{-1}(x_{i-1}-x_i),\sigma^{-1}(b(x_{i-1})-b(x_i))\big\rangle}{|\sigma^{-1}(x_{i-1}-x_i)|}+\frac{\big\langle\sigma^{-1}(x_n-y),\sigma^{-1}(b(x_n)-b(y))\big\rangle}{|\sigma^{-1}(x_n-y)|}.$$
By (3.9) and our assumption on $b$,
$$\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x)-b(y))\big\rangle}{|\sigma^{-1}(x-y)|}\le-cn+\delta.$$
Next, since $r/r_0\le n+1$, we have $-cr/r_0\ge-cn-c$, i.e. $-cn\le c-cr/r_0$. Therefore
$$\frac{\big\langle\sigma^{-1}(x-y),\sigma^{-1}(b(x)-b(y))\big\rangle}{|\sigma^{-1}(x-y)|}\le\delta+2c-\frac{cr}{r_0}$$
for all $x,y\in\mathbb{R}^d$ with $r=|\sigma^{-1}(x-y)|$. As a result, the definition of $\kappa(r)$ leads to
$$\kappa(r)\le\delta+2c-\frac{cr}{r_0}\le-\frac{c}{2r_0}\,r\quad\text{for all }r\ge\frac{2r_0(\delta+2c)}{c},$$
and so (1.5) holds with the new constant $c/(2r_0)$.

Proof of Example 1.6. (1) Since $\sigma=\mathrm{Id}$, the supremum in the definition of $\kappa(r)$ is taken over all $x,y\in\mathbb{R}^d$ with $|x-y|=r$. Thus, to verify $\kappa(r)\ge0$ for $r>0$ large enough, it suffices to consider the case that $x,y$ are restricted to one of the coordinate axes with $r=|x-y|$ large enough; that is, we can assume the dimension is 1. Then
$$V(x)=-(1+x^2)^{\delta/2},\quad\delta\in(0,1),\ x\in\mathbb{R}.$$
Now the result follows immediately from the fact that $V'(x)$ is strictly increasing when $|x|$ is large enough.
Indeed, we have
$$V''(x)=\delta(1+x^2)^{\delta/2-2}\big[(1-\delta)x^2-1\big]>0\quad\text{for }|x|>(1-\delta)^{-1/2}.$$
On the other hand, it is easy to see that with the choices of $\sigma$ and $b$ above, the semigroup $(P_t)_{t\ge0}$ is symmetric with respect to the probability measure $\mu(dx)=Z_V^{-1}e^{2V(x)}\,dx$, where $Z_V$ is the normalizing constant. Then, according to (the proof of) Corollary 1.10, $\mu(dx)$ fulfills the Poincaré inequality (1.15) if (1.7) is satisfied with $p=1$; however, this is impossible, see e.g. [27, Example 4.3.1 (3)].

(2) In this case, $V(x)=-(1+|x|^2)^{\delta/2}$ with $\delta\in[1,2)$, and $V$ is strictly concave on $\mathbb{R}^d$. Indeed, for all $1\le i,j\le d$,
$$\frac{\partial^2V}{\partial x_i\partial x_j}(x)=\delta(1+|x|^2)^{\delta/2-2}\big[(2-\delta)x_ix_j-\delta_{ij}(1+|x|^2)\big],\quad x\in\mathbb{R}^d.$$
Therefore, for any $z\in\mathbb{R}^d$ with $z\ne0$,
$$\sum_{i,j=1}^d\frac{\partial^2V}{\partial x_i\partial x_j}(x)z_iz_j=\delta(1+|x|^2)^{\delta/2-2}\sum_{i,j=1}^d\big[(2-\delta)x_ix_j-\delta_{ij}(1+|x|^2)\big]z_iz_j=\delta(1+|x|^2)^{\delta/2-2}\big[(2-\delta)\langle x,z\rangle^2-(1+|x|^2)|z|^2\big]\le-\delta(1+|x|^2)^{\delta/2-2}|z|^2<0,$$
which implies that $V$ is strictly concave. Hence $\kappa(r)\le0$ for all $r\ge0$. On the other hand, to show that $\kappa(r)\ge0$ for all $r\ge0$, as in the proof of (1), we simply look at the one-dimensional case:
$$V(x)=-(1+x^2)^{\delta/2},\quad\delta\in[1,2),\ x\in\mathbb{R}.$$
For any $r>0$ and $x>0$,
$$\kappa(r)\ge V'(x+r)-V'(x)\ge r\inf_{x\le s\le x+r}V''(s)=r\inf_{x\le s\le x+r}\Big(-\delta\big[(\delta-1)s^2+1\big](1+s^2)^{\delta/2-2}\Big),$$
which implies that
$$\kappa(r)\ge r\lim_{x\to\infty}\inf_{x\le s\le x+r}\Big(-\delta\big[(\delta-1)s^2+1\big](1+s^2)^{\delta/2-2}\Big)=0.$$
Therefore, $\kappa(r)=0$ for all $r>0$.

Since $V(x)=-(1+|x|^2)^{\delta/2}$ with $\delta\in[1,2)$ satisfies $\kappa(r)=0$ for all $r\ge0$, we have for any $x,y\in\mathbb{R}^d$,
$$\langle b(x)-b(y),x-y\rangle\le0.$$
Then the assertion (1.11) follows immediately from (the proof of) Theorem 1.1, by simply using the synchronous coupling; see e.g. [2, p. 2432].

Finally we prove the algebraic convergence rate (1.12). For this, we mainly follow [9, Section 5] or [5, Section 7.2] (see also [21, Theorem 1.1]). By (2.3),
$$dr_t\le2\,dW_t,\quad t<T,$$
where $T$ is the coupling time of the coupling process $(X_t,Y_t)_{t\ge0}$, and $(W_t)_{t\ge0}$ is the same one-dimensional Brownian motion as in (2.2). Hence,
$$r_t\le|x-y|+2W_t,\quad t<T.$$
Let $\tau_z:=\inf\{t>0:W_t=z\}$. Then $T\le\tau_{-|x-y|/2}$. Denote $W_t^*=\inf_{0\le s\le t}W_s$, which has the same law as $-|W_t|$. Thus for any $t>0$,
$$\mathbb{P}(r_t>0)=\mathbb{P}(T>t)\le\mathbb{P}\big(\tau_{-|x-y|/2}>t\big)=\mathbb{P}\big(W_t^*>-|x-y|/2\big)=\mathbb{P}\big(-|W_t|>-|x-y|/2\big)=2\int_0^{|x-y|/2}\frac{1}{\sqrt{2\pi t}}e^{-s^2/(2t)}\,ds\le\frac{|x-y|}{\sqrt{2\pi t}}.$$
Hence, for any $f\in C_b(\mathbb{R}^d)$ with $\|f\|_\infty\le1$, we have
$$\big|\mathbb{E}(f(X_t)-f(Y_t))\big|=\big|\mathbb{E}\big[(f(X_t)-f(Y_t))\mathbf 1_{\{r_t>0\}}\big]\big|\le2\,\mathbb{P}(r_t>0)\le\sqrt{\frac{2}{\pi t}}\,|x-y|.$$
In particular, by the definition of the total variation distance,
$$d_{TV}(\delta_xP_t,\delta_yP_t)=\sup\Big\{\big|\mathbb{E}(f(X_t)-f(Y_t))\big|:f\in C_b(\mathbb{R}^d),\ \|f\|_\infty\le1\Big\}\le\sqrt{\frac{2}{\pi t}}\,|x-y|.$$
The proof is complete.

Finally we present the proofs of the two corollaries of Theorem 1.3.
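The bound $\mathbb{P}(T>t)\le|x-y|/\sqrt{2\pi t}$ used for (1.12) can be illustrated by simulation. The Python sketch below (NumPy assumed) simulates the reflection coupling in dimension one for the drift of Example 1.6 with $\delta=1.5$: with $\sigma=1$, reflection simply drives $Y$ by $-dB$, and the coupling time is detected as the first Euler step at which $X-Y$ changes sign. The step size, sample size, seed and starting points are arbitrary choices; discrete monitoring misses some crossings and so slightly overestimates $\mathbb{P}(T>t)$, which makes this a rough illustration rather than a check of the proof.

```python
import numpy as np


def drift(x, delta=1.5):
    """b = V' for V(x) = -(1 + x^2)^(delta/2), in dimension one."""
    return -delta * x * (1.0 + x * x) ** (delta / 2.0 - 1.0)


def survival_probability(x0, y0, t, dt=1e-3, n_paths=20000, seed=0):
    """Monte Carlo estimate of P(T > t) for the reflection coupling in d = 1.

    With sigma = 1 in one dimension, reflection means that Y is driven by -dB.
    Coupling is recorded at the first time step where X - Y changes sign (or
    hits 0); from then on the path counts as coupled.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    y = np.full(n_paths, float(y0))
    sign0 = np.sign(x0 - y0)
    uncoupled = np.ones(n_paths, dtype=bool)
    for _ in range(int(round(t / dt))):
        db = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + drift(x) * dt + db
        y = y + drift(y) * dt - db  # reflected noise
        uncoupled &= np.sign(x - y) == sign0
    return float(uncoupled.mean())


x0, y0, t = 1.0, 0.0, 0.25
estimate = survival_probability(x0, y0, t)
bound = abs(x0 - y0) / np.sqrt(2.0 * np.pi * t)
print(f"P(T > t) ~ {estimate:.3f}, bound |x - y| / sqrt(2 pi t) = {bound:.3f}")
```

Because the potential is concave, the drift of $X-Y$ points towards $0$, so the simulated survival probability should sit below the purely Brownian value, and hence below the bound.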
Proof of Corollary . . Recall that for all $p\in[1,\infty)$, $\mathscr{P}_p(\mathbb{R}^d)$ is the space of all probability measures $\nu$ on $(\mathbb{R}^d,\mathscr{B}(\mathbb{R}^d))$ satisfying $\int|z|^p\,\nu(\mathrm{d}z)<\infty$. Note that we assume the solution $(X_t)_{t\ge0}$ to (1.1) has finite moments of all orders; in particular, for any $x\in\mathbb{R}^d$, $t>0$ and $p\ge1$, $\delta_xP_t\in\mathscr{P}_p(\mathbb{R}^d)$. According to (1.6) and Theorem 1.3, there is a constant $\lambda>0$ such that for all $p\in[1,\infty)$ and any $\nu_1,\nu_2\in\mathscr{P}_p(\mathbb{R}^d)$,
$$W_p(\nu_1P_t,\nu_2P_t)\le C_pe^{-\lambda t}W_{\phi_p}(\nu_1,\nu_2),\quad t>0, \tag{4.1}$$
where $C_p$ is a positive constant. In particular, for any $\nu_1,\nu_2\in\mathscr{P}_1(\mathbb{R}^d)$,
$$W_1(\nu_1P_t,\nu_2P_t)\le C_1e^{-\lambda t}W_1(\nu_1,\nu_2),\quad t>0.$$
Let $t_0>0$ be such that $C_1e^{-\lambda t_0}<1$. Then the map $\nu\mapsto\nu P_{t_0}$ is a contraction on the complete metric space $(\mathscr{P}_1(\mathbb{R}^d),W_1)$. Hence, by the Banach fixed point theorem, there exists a unique probability measure $\mu_{t_0}$ such that $\mu_{t_0}P_{t_0}=\mu_{t_0}$. Let $\mu:=t_0^{-1}\int_0^{t_0}\mu_{t_0}P_s\,\mathrm{d}s$. For $t\in[0,t_0]$, the semigroup property gives $\mu P_t=t_0^{-1}\int_t^{t_0+t}\mu_{t_0}P_s\,\mathrm{d}s$, and since $\mu_{t_0}P_s=\mu_{t_0}P_{t_0}P_{s-t_0}=\mu_{t_0}P_{s-t_0}$ for $s\ge t_0$, this equals $\mu$. Thus $\mu P_t=\mu$ for all $t\in[0,t_0]$, and so for all $t\in[0,\infty)$. Therefore, $\mu$ is an invariant probability measure for the semigroup $(P_t)_{t\ge0}$. Moreover, for any $\nu\in\mathscr{P}_1(\mathbb{R}^d)$ and $t>0$,
$$W_1(\nu P_t,\mu)=W_1(\nu P_t,\mu P_t)\le C_1e^{-\lambda t}W_1(\nu,\mu).$$
This inequality also yields the uniqueness of the invariant measure.

On the other hand, since $b$ is locally bounded and satisfies (1.6), it follows from [20, Theorem 4.3 (ii)] that the unique invariant measure satisfies $\mu\in\bigcap_{p\ge1}\mathscr{P}_p(\mathbb{R}^d)$. Now, replacing $\nu_2$ with $\mu$ in (4.1), we arrive at
$$W_p(\nu_1P_t,\mu)=W_p(\nu_1P_t,\mu P_t)\le C_pe^{-\lambda t}W_{\phi_p}(\nu_1,\mu),\quad t>0,$$
for all $\nu_1\in\mathscr{P}_p(\mathbb{R}^d)$. The proof is complete.

Proof of Corollary . . Let $(P_t)_{t\ge0}$ be the semigroup generated by $L=\Delta-\nabla U\cdot\nabla$. Then $(P_t)_{t\ge0}$ is symmetric with respect to the probability measure $\mu$. Since the Hessian matrix satisfies $\mathrm{Hess}(U)\ge-K$, we deduce that $\kappa(r)\le Kr/2$ for all $r>0$. Moreover, replacing $b$ by $-\nabla U$ in the definition of $\kappa(r)$, we deduce from (1.14) that $\kappa(L)<0$. According to Proposition 1.5 and Theorem 1.3, there exist two positive constants $C$ and $\lambda$ such that for all $t>0$ and $x,y\in\mathbb{R}^d$,
$$W_1(\delta_xP_t,\delta_yP_t)\le Ce^{-\lambda t}|x-y|.$$
This implies (see e.g. [6, Theorem 5.10] or [24, Theorem 5.10]) that
$$\|P_tf\|_{\mathrm{Lip}}\le Ce^{-\lambda t}\|f\|_{\mathrm{Lip}} \tag{4.2}$$
holds for any $t>0$ and any Lipschitz function $f$, where $\|f\|_{\mathrm{Lip}}$ denotes the Lipschitz semi-norm with respect to the Euclidean norm $|\cdot|$.

On the other hand, for any $f\in C_c^\infty(\mathbb{R}^d)$, by (4.2), we have
$$\begin{aligned}
\mathrm{Var}_\mu(f)&=\mu(f^2)-\mu(f)^2=-\int_{\mathbb{R}^d}\int_0^\infty\partial_t(P_tf)^2\,\mathrm{d}t\,\mathrm{d}\mu=-2\int_0^\infty\int_{\mathbb{R}^d}P_tf\,LP_tf\,\mathrm{d}\mu\,\mathrm{d}t\\
&=2\int_0^\infty\int_{\mathbb{R}^d}|\nabla P_tf|^2\,\mathrm{d}\mu\,\mathrm{d}t\le2\int_0^\infty\|P_tf\|_{\mathrm{Lip}}^2\,\mathrm{d}t\le\frac{C^2}{\lambda}\|f\|_{\mathrm{Lip}}^2.
\end{aligned}$$
Replacing $f$ with $P_tf$ in the inequality above, we arrive at
$$\mathrm{Var}_\mu(P_tf)\le\frac{C^2}{\lambda}\|P_tf\|_{\mathrm{Lip}}^2\le\frac{C^4e^{-2\lambda t}}{\lambda}\|f\|_{\mathrm{Lip}}^2.$$
Next, we follow the proof of [23, Lemma 2.2] to show that the inequality above yields the desired Poincaré inequality. Indeed, let $f$ be such a function with $\mu(f)=0$ and $\mu(f^2)=1$. By the spectral representation theorem,
$$\|P_tf\|_{L^2(\mu)}^2=\int_0^\infty e^{-2ut}\,\mathrm{d}(E_uf,f)\ge\Big[\int_0^\infty e^{-2us}\,\mathrm{d}(E_uf,f)\Big]^{t/s}=\|P_sf\|_{L^2(\mu)}^{2t/s},\quad t\ge s,$$
where $(E_u)_{u\ge0}$ is the spectral family of $-L$ in $L^2(\mu)$, and the inequality follows from Jensen's inequality, since $\mathrm{d}(E_uf,f)$ is a probability measure and $r\mapsto r^{t/s}$ is convex. Thus, as $\mu(P_tf)=\mu(f)=0$,
$$\|P_sf\|_{L^2(\mu)}^2\le\big[\mathrm{Var}_\mu(P_tf)\big]^{s/t}\le\Big[\frac{C^4e^{-2\lambda t}}{\lambda}\|f\|_{\mathrm{Lip}}^2\Big]^{s/t}=\Big(\frac{C^4}{\lambda}\Big)^{s/t}\|f\|_{\mathrm{Lip}}^{2s/t}e^{-2\lambda s}.$$
Letting $t\to\infty$, we get $\|P_sf\|_{L^2(\mu)}^2\le e^{-2\lambda s}$, which is equivalent to the desired Poincaré inequality, see e.g. [27, Theorem 1.1.1].

Acknowledgements.
The authors are grateful to the referee for his valuable comments, which helped to improve the quality of the paper.

References

[1] Bakry, D., Cattiaux, P. and Guillin, A.: Rate of convergence for ergodic continuous Markov processes: Lyapunov versus Poincaré. J. Funct. Anal. (2008), no. 3, 727–759.
[2] Bolley, F., Gentil, I. and Guillin, A.: Convergence to equilibrium in Wasserstein distance for Fokker–Planck equations. J. Funct. Anal. (2012), no. 8, 2430–2457.
[3] Bolley, F., Gentil, I. and Guillin, A.: Uniform convergence to equilibrium for granular media. Arch. Ration. Mech. Anal. (2013), no. 2, 429–445.
[4] Butkovsky, O.: Subgeometric rates of convergence of Markov processes in Wasserstein metric. Ann. Appl. Probab. (2014), no. 2, 526–552.
[5] Cattiaux, P. and Guillin, A.: Semi log-concave Markov diffusions. Séminaire de Probabilités XLVI, Lecture Notes in Mathematics, Springer Verlag, 2015, 231–292.
[6] Chen, M.-F.: From Markov Chains to Non-Equilibrium Particle Systems. Second edition. World Scientific Publishing Co., Inc., River Edge, NJ, 2004.
[7] Chen, M.-F.: Eigenvalues, Inequalities, and Ergodic Theory. Probability and its Applications (New York). Springer–Verlag London, Ltd., London, 2005.
[8] Chen, M.-F.: Basic estimates of stability rate for one-dimensional diffusions. Probability Approximations and Beyond, Lecture Notes in Statistics, 2012, 75–99.
[9] Chen, M.-F. and Li, S.: Coupling methods for multidimensional diffusion processes. Ann. Probab. (1989), no. 1, 151–177.
[10] Chen, M.-F. and Wang, F.-Y.: Estimation of spectral gap for elliptic operators. Trans. Amer. Math. Soc. (1997), no. 3, 1239–1267.
[11] Dümbgen, L.: Bounding standard Gaussian tail probabilities, arXiv:1012.2063v3.
[12] Eberle, A.: Reflection coupling and Wasserstein contractivity without convexity. C. R. Math. Acad. Sci. Paris (2011), no. 19–20, 1101–1104.
[13] Eberle, A.: Reflection couplings and contraction rates for diffusions, arXiv:1305.1233v3.
[14] Flandoli, F., Gubinelli, M. and Priola, E.: Well-posedness of the transport equation by stochastic perturbation. Invent. Math. (2010), no. 1, 1–53.
[15] Fort, G. and Roberts, G.O.: Subgeometric ergodicity of strong Markov processes. Ann. Appl. Probab. (2005), no. 2, 1565–1589.
[16] Hairer, M. and Mattingly, J.C.: Spectral gaps in Wasserstein distances and the 2D stochastic Navier–Stokes equations. Ann. Probab. (2008), no. 6, 2050–2091.
[17] Hairer, M., Mattingly, J.C. and Scheutzow, M.: Asymptotic coupling and a general form of Harris's theorem with applications to stochastic delay equations. Probab. Theory Related Fields (2011), no. 1–2, 223–259.
[18] Lindvall, T. and Rogers, L.: Coupling of multidimensional diffusions by reflection. Ann. Probab. (1986), no. 3, 860–872.
[19] Meyn, S.P. and Tweedie, R.L.: Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer–Verlag London, Ltd., London, 1993.
[20] Meyn, S.P. and Tweedie, R.L.: Stability of Markovian processes (III): Foster–Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. (1993), no. 3, 518–548.
[21] Priola, E. and Wang, F.-Y.: Gradient estimates for diffusion semigroups with singular coefficients. J. Funct. Anal. (2006), no. 1, 244–264.
[22] von Renesse, M. and Sturm, K.: Transport inequalities, gradient estimates, entropy, and Ricci curvature. Comm. Pure Appl. Math. (2005), no. 7, 923–940.
[23] Röckner, M. and Wang, F.-Y.: Weak Poincaré inequalities and L²-convergence rates of Markov semigroups. J. Funct. Anal. (2001), no. 2, 564–603.
[24] Villani, C.: Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 338. Springer–Verlag, Berlin, 2009.
[25] Wang, F.-Y.: Existence of the spectral gap for elliptic operators. Ark. Mat. (1999), no. 2, 395–407.
[26] Wang, F.-Y.: Functional inequalities for empty essential spectrum. J. Funct. Anal. (2000), no. 1, 219–245.
[27] Wang, F.-Y.: Functional Inequalities, Markov Semigroups and Spectral Theory.