Approximation of heavy-tailed distributions via stable-driven SDEs

Abstract

Constructions of numerous approximate sampling algorithms are based on the well-known fact that certain Gibbs measures are stationary distributions of ergodic stochastic differential equations (SDEs) driven by the Brownian motion. However, for some heavy-tailed distributions it can be shown that the associated SDE is not exponentially ergodic and that related sampling algorithms may perform poorly. A natural idea that has recently been explored in the machine learning literature in this context is to make use of stochastic processes with heavy tails instead of the Brownian motion. In this paper we provide a rigorous theoretical framework for studying the problem of approximating heavy-tailed distributions via ergodic SDEs driven by symmetric (rotationally invariant) α-stable processes.


arXiv [math.PR], Jul

Lu-Jing Huang, Mateusz B. Majka, Jian Wang

Keywords: stochastic differential equations, symmetric α-stable processes, invariant measures, heavy-tailed distributions, approximate sampling, fractional Langevin Monte Carlo.

MSC 2020:

1. Introduction

Suppose we are given a probability distribution $\mu$ on $\mathbb{R}^d$ defined via
$$\mu(dx) = Z^{-1}\exp(-V(x))\,dx, \tag{1.1}$$
where $V:\mathbb{R}^d\to\mathbb{R}$ is the potential and $Z := \int_{\mathbb{R}^d}\exp(-V(x))\,dx$ is the normalizing constant. The goal in approximate sampling is to generate a sequence of probability measures $(\mu_k)_{k\in\mathbb{N}}$ such that for sufficiently large $k$ the measure $\mu_k$ constitutes a good approximation of $\mu$. This can be achieved, e.g., by utilizing a stochastic process with the unique stationary distribution $\mu$. If we can show that this process is exponentially ergodic, then we can use it to construct an algorithm for approximate sampling from $\mu$ that, under some assumptions on $V$ in (1.1), converges exponentially fast regardless of its initial condition.

A commonly used example of such a process is the solution $(X_t)_{t\ge0}$ to the (overdamped) Langevin SDE
$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t, \tag{1.2}$$
where $(B_t)_{t\ge0}$ is the standard Brownian motion in $\mathbb{R}^d$. If the potential $V$ is sufficiently regular, it can be easily shown that $\mu$ given by (1.1) is a stationary distribution of $(X_t)_{t\ge0}$. Moreover, there are many results on the exponential ergodicity of (1.2) under relatively weak dissipativity conditions on $V$; see e.g. [17] and the references therein for approaches based on Lyapunov-type drift conditions, the monographs [1, 4, 40] for methods based on functional inequalities, and [4, 40] for probabilistic coupling techniques (in particular, [12, 13] for a recent study on this topic).

There are numerous sampling algorithms in the literature that are based on Euler discretizations of (1.2), cf. [14, 26] and the references therein. The analysis of their performance is often carried out by bounding the discretization error between the Euler scheme and the SDE, and then by directly employing ergodicity results for SDEs, see e.g. [8, 9, 11, 30].
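For concreteness, the Euler discretization of (1.2) mentioned above (often called the unadjusted Langevin algorithm) can be sketched in one dimension as follows. This is a minimal illustration, not code from any of the cited works. The function name `ula_chain`, the step size and the Gaussian test potential $V(x)=x^2/2$ are assumptions chosen only for the example.

```python
import math
import random


def ula_chain(grad_v, x0, step, n_steps, rng=None):
    """Euler scheme for dX_t = -V'(X_t) dt + sqrt(2) dB_t in one dimension.

    grad_v: callable returning V'(x); step: time step h > 0.
    Returns the whole path [X_0, X_1, ..., X_n].
    """
    if rng is None:
        rng = random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        # The Brownian increment over a step of length h is N(0, h), scaled by sqrt(2).
        x = x - step * grad_v(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Example: V(x) = x^2 / 2, so mu is the standard Gaussian.
path = ula_chain(lambda x: x, x0=3.0, step=0.1, n_steps=50_000, rng=random.Random(1))
burned = path[1_000:]  # discard the transient from the initial condition
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"mean={mean:.3f}, var={var:.3f}")
```

For this Gaussian target the chain is an AR(1) process $X_{k+1}=(1-h)X_k+\sqrt{2h}\,\xi_k$, whose stationary variance is $1/(1-h/2)$ rather than $1$. This is a simple instance of the discretization bias that has to be controlled in addition to the ergodicity of the SDE.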
Hence the analysis of convergence of the SDE is an important first step towards evaluating the performance of such algorithms, and one usually cannot expect fast convergence of the algorithm without fast convergence of the associated SDE; see [34] (with some possible exceptions discussed in [15]).

L.-J. Huang: College of Mathematics and Informatics, Fujian Normal University, 350007 Fuzhou, P.R. China.
M. B. Majka: School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS, UK.
J. Wang: College of Mathematics and Informatics & Fujian Key Laboratory of Mathematical Analysis and Applications (FJKLMAA) & Center for Applied Mathematics of Fujian Province (FJNU), Fujian Normal University, 350007 Fuzhou, P.R. China.

However, in [34] (see Theorem 2.4 and Section 2.3 therein) it has been shown that the solution to (1.2) may fail to be exponentially ergodic if the distribution $\mu$ defined in (1.1) is heavy-tailed. Indeed, it is known that the Langevin SDE (1.2) has the generator $Lf:=\Delta f-\nabla V\cdot\nabla f$, which is a symmetric operator on $L^2(\mathbb{R}^d;\mu)$. It is also known that the Poincaré inequality for $L$ (which is equivalent to the exponential ergodicity of the SDE (1.2)) implies exponential tails of $\mu$; see [40, Theorems 1.1.1 and 1.2.5]. However, for heavy-tailed $\mu$, one can only expect weak Poincaré inequalities, which indicates that the solution to (1.2) converges only at a polynomial or subexponential rate; see [40, Chapter 4] for more details.

A very natural question to ask in this context is whether, instead of (1.2), one could use SDEs driven by other stochastic processes, with tails better suited for the task of approximating heavy-tailed $\mu$. The first steps in that direction have been taken in [36, 31] (see also [37, 44] for further extensions). The idea there is based on the fact that $\mu$ given by (1.1) can be shown to be a stationary distribution of
$$dX_t = b(X_t)\,dt + dZ_t, \tag{1.3}$$
where $(Z_t)_{t\ge0}$ is the symmetric (rotationally invariant) $\alpha$-stable process in $\mathbb{R}^d$ with $d\ge1$ and $\alpha\in(1,2)$, and the drift $b(x)$ is given by
$$b(x) = -C_{d,2-\alpha}\,e^{V(x)}\int_{\mathbb{R}^d}\frac{e^{-V(y)}\nabla V(y)}{|x-y|^{d-(2-\alpha)}}\,dy, \tag{1.4}$$
where the potential $V\in C^1(\mathbb{R}^d)$ is such that $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$, and $C_{d,\alpha} := \Gamma((d-\alpha)/2)/\big(2^\alpha\pi^{d/2}\Gamma(\alpha/2)\big)$.
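To make (1.4) concrete, the following sketch evaluates the drift numerically in the simplest case $d=1$, $\alpha\in(1,2)$, where the kernel is $|x-y|^{-(\alpha-1)}$. It is an illustration under stated assumptions. It is not the procedure used in [36], which approximated (1.4) via a series representation (see Remark 1.5). Here the substitution $y=x\pm s^{1/(2-\alpha)}$ removes the singularity at $y=x$, the integral is truncated at $|y-x|\le$ `y_max`, and Simpson's rule with `n` panels is applied. The function names and the test potential $V(y)=\beta\log(1+y^2)$ with $\beta=1.5$ are hypothetical choices made for the example.

```python
import math


def fractional_drift_1d(x, alpha, v, dv, y_max=1e3, n=20_000):
    """Approximate the drift b(x) of (1.4) for d = 1 and alpha in (1, 2).

    v, dv: callables for the potential V and its derivative V'.
    y_max: truncation of the integral to |y - x| <= y_max.
    n: (even) number of Simpson panels in the transformed variable s.
    """
    if not 1.0 < alpha < 2.0:
        raise ValueError("this sketch of (1.4) with d = 1 needs alpha in (1, 2)")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of panels")

    def g(y):
        return math.exp(-v(y)) * dv(y)  # e^{-V(y)} V'(y)

    a = 2.0 - alpha  # order of the Riesz potential, so the kernel is |x - y|^(a - 1)
    # C_{1,a} = Gamma((1 - a)/2) / (2^a * sqrt(pi) * Gamma(a/2))
    const = math.gamma((1.0 - a) / 2.0) / (2.0 ** a * math.sqrt(math.pi) * math.gamma(a / 2.0))
    # With |y - x| = s^(1/a), dy = (1/a) s^(1/a - 1) ds exactly cancels the kernel,
    # leaving the smooth integrand (g(x + s^(1/a)) + g(x - s^(1/a))) / a.
    s_max = y_max ** a
    h = s_max / n
    total = 0.0
    for k in range(n + 1):
        u = (k * h) ** (1.0 / a)
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * (g(x + u) + g(x - u))
    integral = total * h / (3.0 * a)
    return -const * math.exp(v(x)) * integral


BETA = 1.5  # illustrative: V(y) = BETA * log(1 + y^2), so e^{-V} has polynomial tails


def potential(y):
    return BETA * math.log1p(y * y)


def potential_grad(y):
    return 2.0 * BETA * y / (1.0 + y * y)


for x in (-2.0, 0.0, 2.0):
    print(x, fractional_drift_1d(x, 1.5, potential, potential_grad))
```

For this symmetric potential the computed drift is odd in $x$, vanishes at the origin, and points towards the origin. The cost of one drift evaluation (a full quadrature) already hints at why simulating (1.3) with this drift is nontrivial.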
Hence, if the SDE (1.3) is exponentially ergodic, one could use an algorithm based on its discretization to obtain a new alternative way of approximating $\mu$ (possibly faster than algorithms based on (1.2) if $\mu$ is heavy-tailed). The authors of [36, 31] called their approach Fractional Langevin Monte Carlo due to a possible interpretation of the drift (1.4) in terms of the Riesz potential, which is an inverse operator to the fractional Laplacian; see e.g. [21, Section 2.7] and the references therein.

There are, however, several challenges to this approach, related both to verifying theoretical properties of the SDE (1.3) and to finding its appropriate discrete-time counterpart for use in simulations. In the present paper we focus on the former, in response to some questions that were left unanswered in [36, 31]. Indeed, the exponential ergodicity of (1.3) has been checked in [36, 31] only under some very special assumptions that are difficult to verify. As we will see in Section 2, the drift $b(x)$ defined by (1.4) seems to be in general only locally $(2-\alpha)$-Hölder continuous, while in the setting of [36, 31] it is assumed to be Lipschitz continuous and differentiable. Moreover, the authors of [31] assume that $b(x)$ satisfies a contractivity at infinity condition $\langle b(x)-b(y),x-y\rangle\le-K|x-y|^2$ for all $x,y\in\mathbb{R}^d$ such that $|x-y|>R$, with some constants $K,R>0$ (cf. [31, Assumption (H5) and Proposition 1]), which also seems to be unverifiable in the general case. The lack of all these properties of $b(x)$ makes it impossible to prove the exponential ergodicity of (1.3) by utilizing results from the existing literature (see e.g. [22] for some recent developments in this topic). Furthermore, because of the unusual form of (1.4), it is not even immediately clear whether (1.3) has a unique, non-explosive strong solution, which also has not been verified in [36, 31].
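A discretization of (1.3) in the spirit of Fractional Langevin Monte Carlo can be sketched as an Euler scheme with $\alpha$-stable increments. The sketch below is an illustration under explicit assumptions. It is not the scheme of [36, 31], and no convergence guarantee is claimed. It works in $d=1$ and takes the drift as an arbitrary callable, because evaluating (1.4) or (1.5) exactly is itself nontrivial. The increments are drawn with the Chambers–Mallows–Stuck method and normalized so that $\mathbb{E}e^{i\xi Z_t}=e^{-t|\xi|^\alpha}$, which matches the generator $-(-\Delta)^{\alpha/2}$. For $\alpha=2$ this normalization gives $Z_t=\sqrt2 B_t$, consistent with (1.2). The names `symmetric_stable` and `stable_euler` and the linear placeholder drift are hypothetical.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """Draw X with E[exp(i*xi*X)] = exp(-|xi|**alpha), 0 < alpha <= 2 (Chambers-Mallows-Stuck)."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against the (probability-zero) value W = 0
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # standard Cauchy
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def stable_euler(drift, x0, alpha, step, n_steps, rng=None):
    """Euler scheme X_{k+1} = X_k + h*b(X_k) + h**(1/alpha)*xi_k for dX = b(X) dt + dZ (d = 1)."""
    if rng is None:
        rng = random.Random()
    scale = step ** (1.0 / alpha)  # Z_{t+h} - Z_t has the law of h^(1/alpha) * Z_1
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = x + step * drift(x) + scale * symmetric_stable(alpha, rng)
        path.append(x)
    return path


# Illustration only: a placeholder linear drift, not the fractional drift (1.4)/(1.5).
path = stable_euler(lambda x: -x, x0=0.0, alpha=1.5, step=0.01, n_steps=1_000,
                    rng=random.Random(0))
```

As discussed in Remark 1.5, classical Euler convergence results rely on Lipschitz-type drifts, so a scheme of this form would need a separate analysis when the drift is only Hölder continuous.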
Finally, due to the non-differentiability of $b(x)$, the proof that $\mu$ given by (1.1) is the unique invariant probability measure for (1.3) cannot be as straightforward as in [36, Theorem 1.1] or [44, Theorem 1.1]. In the present paper we fill all these gaps by carefully deriving appropriate bounds on (1.4), and by proving all the properties of (1.3) mentioned above in a rigorous way. In particular, we study the drift term $b(x)$ defined by (1.4) for all $d>2-\alpha$ (not only for the case of $d\ge1$ and $\alpha\in(1,2)$), and we define a new drift term to treat the case of $d\le2-\alpha$. To this end, we will use the notion of the fractional Laplace operator (see e.g. [2, 3, 21] and the references therein), which is defined for all $f\in C_b^2(\mathbb{R}^d)$ by
$$-(-\Delta)^{\alpha/2}f(x) := c_{d,\alpha}\lim_{\varepsilon\to0}\int_{\{|y-x|>\varepsilon\}}\frac{f(y)-f(x)}{|y-x|^{d+\alpha}}\,dy,$$
where
$$c_{d,\alpha} := \frac{2^\alpha\,\Gamma((d+\alpha)/2)}{\pi^{d/2}\,|\Gamma(-\alpha/2)|} = \frac{\alpha\,2^{\alpha-1}\,\Gamma((d+\alpha)/2)}{\pi^{d/2}\,\Gamma(1-\alpha/2)}.$$
See e.g. [2, formulas (1.3) and (1.35)] or [21, Definition 2.5], and note that $c_{d,\alpha}=|C_{d,-\alpha}|$. Then, in order to cover the case of $d\le2-\alpha$, i.e., $d=1$ and $\alpha\in(0,1]$, we will work with the drift
$$b(x) = -e^{V(x)}\int_{-\infty}^{x}(-\Delta)^{\alpha/2}e^{-V}(u)\,du,\qquad x\in\mathbb{R}. \tag{1.5}$$
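The two expressions for $c_{d,\alpha}$ and the identity $c_{d,\alpha}=|C_{d,-\alpha}|$ are easy to check numerically. The snippet below is only a sanity check. The helper names are illustrative, and evaluating the formula for $C_{d,\alpha}$ at the negative index $-\alpha$ is purely formal, since the Riesz-potential interpretation requires $0<\alpha<d$.

```python
import math


def riesz_constant(d, a):
    """C_{d,a} = Gamma((d - a)/2) / (2^a pi^(d/2) Gamma(a/2)), evaluated formally for real a."""
    return math.gamma((d - a) / 2) / (2 ** a * math.pi ** (d / 2) * math.gamma(a / 2))


def frac_laplacian_constant(d, alpha):
    """c_{d,alpha} = alpha 2^(alpha-1) Gamma((d+alpha)/2) / (pi^(d/2) Gamma(1 - alpha/2))."""
    return (alpha * 2 ** (alpha - 1) * math.gamma((d + alpha) / 2)
            / (math.pi ** (d / 2) * math.gamma(1 - alpha / 2)))


for d in (1, 2, 3):
    for alpha in (0.5, 1.0, 1.5):
        c = frac_laplacian_constant(d, alpha)
        # First expression: 2^alpha Gamma((d+alpha)/2) / (pi^(d/2) |Gamma(-alpha/2)|)
        c_alt = 2 ** alpha * math.gamma((d + alpha) / 2) / (
            math.pi ** (d / 2) * abs(math.gamma(-alpha / 2)))
        assert math.isclose(c, c_alt, rel_tol=1e-12)
        assert math.isclose(c, abs(riesz_constant(d, -alpha)), rel_tol=1e-12)

# For d = 1 and alpha = 1 the constant equals 1/pi (the kernel of the 1-d half-Laplacian).
print(frac_laplacian_constant(1, 1.0), 1 / math.pi)
```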

Everywhere in this paper, we will be concerned with the SDE (1.3) driven by a symmetric $\alpha$-stable process $(Z_t)_{t\ge0}$ on $\mathbb{R}^d$ with $\alpha\in(0,2)$, where the drift term $b(x)$ is defined by (1.4) when $d>2-\alpha$, and by (1.5) when $d\le2-\alpha$. We will refer to $b(x)$ as the fractional drift in both cases. We will comment on some possible approaches to the problem of discretization of (1.3) in Remark 1.5. However, our focus in this paper is the analysis of the SDE (1.3), and we leave a more detailed discussion of discrete-time algorithms for future work.

For our main result, we require that the following assumption on the potential $V$ is satisfied.

Assumption (A). $V$ is a radial function on $\mathbb{R}^d$ (and hence, by a slight abuse of notation, we write $V(x)=V(|x|)$ for all $x\in\mathbb{R}^d$) such that
$$\limsup_{r\to\infty}\big[e^{-V(r)}r^{d+\alpha}\big]<\infty, \tag{1.6}$$
and one of the following two conditions is satisfied:
(i) when $d>2-\alpha$: $V\in C^1(\mathbb{R}^d)$, $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$,
$$r_0 := \sup\{r>0: V'(r)\le0\}<\infty, \tag{1.7}$$
and
$$\int_0^\infty e^{-V(r)}|V'(r)|\,r^d\,dr<\infty,\qquad \int_0^\infty e^{-V(r)}V'(r)\,r^d\,dr>0; \tag{1.8}$$
(ii) when $d\le2-\alpha$: $V\in C^2(\mathbb{R})$, $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$,
$$\limsup_{x\to\infty}\big[x^3e^{-V(x)}|V'(x)^2-V''(x)|\big]<\infty\quad\text{and}\quad\liminf_{x\to\infty}\big[x^3e^{-V(x)}\big(V'(x)^2-V''(x)\big)\big]\ge0.$$

We have the following result.

Theorem 1.1.

Under Assumption (A), the SDE (1.3) with the fractional drift $b(x)$ given by (1.4) when $d>2-\alpha$, and by (1.5) when $d\le2-\alpha$, has a unique non-explosive strong solution $X:=(X_t)_{t\ge0}$ such that the process $X$ is exponentially ergodic with the unique invariant probability measure $\mu$ given by (1.1). More explicitly, for any $\beta\in[0,\alpha)$, there is a constant $\lambda>0$ such that for any $X_0\sim\mu_0$ with finite $\beta$-moment and any $t>0$,
$$\|\mathcal{L}(X_t)-\mu\|_{\mathrm{Var},\mathcal{V}} := \sup_{|f|\le\mathcal{V}}\Big|\int_{\mathbb{R}^d}\mathbb{E}^x f(X_t)\,\mu_0(dx)-\mu(f)\Big|\le C(\mu_0)e^{-\lambda t},$$
where $\mathcal{V}(x)=(1+|x|)^\beta$, $C(\mu_0)$ is a positive constant, and $\mathcal{L}(X_t)$ denotes the distribution of $X_t$ for every $t>0$.

Note that the weighted total variation distance $\|\cdot\|_{\mathrm{Var},\mathcal{V}}$ from Theorem 1.1 dominates both the standard total variation distance and the $L^\beta$-Wasserstein distance (see e.g. [13, Remark 2.3]). Therefore we have the following immediate corollary.

Corollary 1.2.

Under Assumption (A), the process $X:=(X_t)_{t\ge0}$ solving (1.3) is exponentially ergodic with the unique invariant probability measure $\mu$ given by (1.1) in the total variation norm for all $d\ge1$ and $\alpha\in(0,2)$, and in the $L^1$-Wasserstein distance when $d\ge1$ and $\alpha\in(1,2)$.

Let us make some comments on Assumption (A) and Theorem 1.1, as well as on the fractional drifts defined by (1.4) when $d>2-\alpha$ and by (1.5) when $d\le2-\alpha$. The most important conclusion from Theorem 1.1 is that the SDE (1.3) with $\alpha$-stable noise is exponentially ergodic for a large class of potentials for which the corresponding SDE (1.2) with Brownian noise is not.

Remark 1.3.

Theorem 1.1 is concerned with rotationally symmetric measures $\mu$ (since $V$ is a radial function on $\mathbb{R}^d$). Condition (1.6) is a relatively weak condition that we need in order to prove the exponential ergodicity of the process $X$. Indeed, it seems to be optimal, as indicated by the exponential ergodicity of Ornstein–Uhlenbeck processes driven by symmetric $\alpha$-stable processes, cf. [24, 41]. It is satisfied, for example, by all potentials $V(x)=(1+|x|^2)^\beta$ for any $\beta>0$, by $V(x)=\log^\beta(1+|x|^2)$ for any $\beta>1$, as well as by $V(x)=\beta\log(1+|x|^2)$ for any $\beta\ge(d+\alpha)/2$. We remark that it has been shown in [34] that for the latter two large classes of potentials, as well as for the potentials $V(x)=(1+|x|^2)^\beta$ with $\beta<1/2$, the SDE (1.2) driven by the Brownian motion is not exponentially ergodic. It is also easy to see that assumption (ii) for $d\le2-\alpha$, as well as the first condition in (1.8) for $d>2-\alpha$, are satisfied for all the potentials above. Moreover, when $d>2-\alpha$, we also require condition (1.7), which means that the measure $\mu$ is log-concave at infinity. The most restrictive condition is the second condition in (1.8). This is essentially an assumption about sufficiently heavy tails of $\mu$ in relation to its mass in the region where $V'\le0$, i.e., where $\mu$ is not log-concave. In other words, if $r_0$ is not too large and if $\mu$ has heavy tails, then $\int_{r_0}^\infty e^{-V(r)}V'(r)\,r^d\,dr$ can be large enough so that the second condition in (1.8) holds. Obviously, if $\mu$ is log-concave everywhere, then the second condition in (1.8) is always satisfied.

Remark 1.4.

Let us informally discuss how the form of the fractional drifts given by (1.4) and (1.5) is motivated by the requirement that the associated SDE (1.3) has an invariant probability measure given by (1.1). Suppose first that $d>2-\alpha$. Note that the generator of the process $X$ solving the SDE (1.3) is $Lf=-(-\Delta)^{\alpha/2}f+b\cdot\nabla f$. Hence, informally, its dual operator is given by $L^*f=-(-\Delta)^{\alpha/2}f-\operatorname{div}(bf)$; see Remark 3.4. Roughly speaking, the (unnormalized) density function $e^{-V(x)}$ of the invariant probability measure (1.1) solves $L^*u=0$; that is, $\operatorname{div}(be^{-V})=-(-\Delta)^{\alpha/2}e^{-V}$. If we write $-(-\Delta)^{\alpha/2}e^{-V}=\Delta[(-\Delta)^{-(1-\alpha/2)}e^{-V}]=\operatorname{div}\nabla[(-\Delta)^{-(1-\alpha/2)}e^{-V}]$, then a natural choice for the drift is $b(x)=e^{V(x)}\nabla(-\Delta)^{-(1-\alpha/2)}e^{-V}(x)$, which is equivalent to (1.4); see the discussion at the beginning of Subsection 2.1. When $d\le2-\alpha$, $(-\Delta)^{-(1-\alpha/2)}$ is not well defined, but we can informally write $\nabla(-\Delta)^{-(1-\alpha/2)}=\nabla\Delta^{-1}[-(-\Delta)^{\alpha/2}]$ and understand $\nabla\Delta^{-1}$ as an integral operator (in $d=1$, the map $u\mapsto\int_{-\infty}^x u(s)\,ds$). With this in mind, we can see the intuition behind formula (1.5). A fully rigorous proof that the probability measure given by (1.1) is invariant for (1.3) will be given in Proposition 3.3.

Remark 1.5.

As we will see in the sequel, the drift term $b(x)$ defined by (1.4) when $d\ge2$ and $\alpha\in(0,1)$, or by (1.5) when $d\le2-\alpha$, belongs to $C^1(\mathbb{R}^d)$. However, when $d>2-\alpha$ and $\alpha\in[1,2)$, $b(x)$ defined by (1.4) seems to be only Hölder continuous; cf. Lemma 2.2. This may lead to some issues when one wants to consider discretizations of (1.3) in the latter case. When $d=1$ and $\alpha\in(1,2)$, some numerical experiments were carried out in [36] by employing an Euler discretization of (1.3) that involved approximating the drift (1.4) via a series representation from [32]; see Section 4 and formula (7) in [36]. However, in order to rigorously analyse the convergence of the discretized (1.3) in this case, one cannot rely on classical results for Euler discretizations that utilize the Lipschitz property of the drift. Nor can one rely on results based on taming such as [10, 20], where the one-sided Lipschitz property is required. Nevertheless, there has been some recent work [18, 29] on discretizations of Lévy-driven SDEs with bounded Hölder continuous drifts that could be applicable in our setting after an extension to the unbounded case (cf. Lemma 2.2 below for a proof of the local Hölder property of $b(x)$ given by (1.4)). This, however, falls beyond the scope of the present paper and will be considered in a future project.

The remaining part of this paper is organised as follows. In Section 2, we obtain some explicit estimates for the fractional drift given by (1.4) when $d>2-\alpha$ and by (1.5) when $d\le2-\alpha$, under Assumption (A). In particular, under a mild additional assumption, we get that $\langle b(x),x\rangle\asymp-\frac{e^{V(x)}}{|x|^{d+\alpha}}|x|^2$ for $|x|$ large enough. We also show that the fractional drift term is locally $(2-\alpha)$-Hölder continuous when $\alpha\in(1,2)$, locally $(1-\varepsilon)$-Hölder continuous for any $\varepsilon\in(0,1)$ when $\alpha=1$, and belongs to $C^1(\mathbb{R}^d)$ when $\alpha\in(0,1)$.
Section 3 is devoted to properties of the SDE (1.3) with the fractional drift terms. We prove that the SDE (1.3) with these drifts has a unique strong solution, and show that $\mu$ given by (1.1) is the unique invariant measure for (1.3). Finally, we conclude by proving Theorem 1.1.

2. Properties of the fractional drift

2.1. The case of $d>2-\alpha$. In this subsection, we always assume that $d\ge1$ and $\alpha\in(0,2)$ with $d>2-\alpha$. Let $V\in C^1(\mathbb{R}^d)$ be such that $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$. We first note that for the drift term $b(x)$ defined by (1.4), it holds that
$$b(x)=e^{V(x)}\nabla\big((-\Delta)^{-(1-\alpha/2)}e^{-V}\big)(x), \tag{2.1}$$
where $(-\Delta)^{-(1-\alpha/2)}$ is the Green operator corresponding to the symmetric (rotationally invariant) $(2-\alpha)$-stable process on $\mathbb{R}^d$, cf. [2, 21] and the references therein. Since $d>2-\alpha$, the symmetric $(2-\alpha)$-stable process is transient on $\mathbb{R}^d$, and so $(-\Delta)^{-(1-\alpha/2)}$ is well defined; moreover,
$$(-\Delta)^{-(1-\alpha/2)}f(x)=C_{d,2-\alpha}\int_{\mathbb{R}^d}\frac{f(y)}{|x-y|^{d-(2-\alpha)}}\,dy,\qquad f\in L^1(\mathbb{R}^d;dx),$$
see [21, Definition 2.11]. Indeed, because $V\in C^1(\mathbb{R}^d)$ and $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$, by the dominated convergence theorem, for any $x\in\mathbb{R}^d$,
$$\begin{aligned}\nabla\big((-\Delta)^{-(1-\alpha/2)}e^{-V}\big)(x)&=C_{d,2-\alpha}\nabla\Big[\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|\cdot-y|^{d-(2-\alpha)}}\,dy\Big](x)=C_{d,2-\alpha}\nabla\Big[\int_{\mathbb{R}^d}\frac{e^{-V(\cdot-z)}}{|z|^{d-(2-\alpha)}}\,dz\Big](x)\\&=-C_{d,2-\alpha}\int_{\mathbb{R}^d}\frac{e^{-V(x-z)}\nabla V(x-z)}{|z|^{d-(2-\alpha)}}\,dz=-C_{d,2-\alpha}\int_{\mathbb{R}^d}\frac{e^{-V(y)}\nabla V(y)}{|x-y|^{d-(2-\alpha)}}\,dy.\end{aligned}\tag{2.2}$$

Remark 2.1.

When $\alpha=2$, by (2.1) the drift term $b(x)$ becomes $-\nabla V(x)$. Moreover, $Z_t$ becomes $\sqrt2B_t$, and hence the SDE (1.3) reduces to (1.2).

Recall that for any $\theta\ge0$, the Hölder–Zygmund space $\mathcal{C}^\theta_b(\mathbb{R}^d)$ is defined by
$$\mathcal{C}^\theta_b(\mathbb{R}^d)=\Big\{f\in C_b(\mathbb{R}^d):\ \|f\|_{\mathcal{C}^\theta_b(\mathbb{R}^d)}:=\|f\|_\infty+\sup_{x\in\mathbb{R}^d,\,h\ne0}\frac{|\Delta_h^{[\theta]+1}f(x)|}{|h|^\theta}<\infty\Big\},$$
where $\Delta_hf(x)=f(x+h)-f(x)$ and $\Delta_h^jf(x)=\Delta_h(\Delta_h^{j-1}f)(x)$, $j\ge2$. Note that when $\theta\in(0,\infty)\setminus\mathbb{Z}_+$, $\mathcal{C}^\theta_b(\mathbb{R}^d)$ coincides with the classical Hölder space $C^\theta_b(\mathbb{R}^d)$ equipped with the norm
$$\|f\|_{C^\theta_b(\mathbb{R}^d)}:=\|f\|_\infty+\sum_{j=1}^{[\theta]}\sum_{\beta\in\mathbb{Z}_0^d,\,|\beta|=j}\|\partial^\beta f\|_\infty+\max_{\beta\in\mathbb{Z}_0^d,\,|\beta|=[\theta]}\sup_{x\ne y}\frac{|\partial^\beta f(x)-\partial^\beta f(y)|}{|x-y|^{\theta-[\theta]}},$$
where $\mathbb{Z}_+=\{1,2,\dots\}$, $\mathbb{Z}_0=\mathbb{Z}_+\cup\{0\}$, and $|\beta|=|\beta_1|+\dots+|\beta_d|$ for $\beta=(\beta_1,\dots,\beta_d)$; see [39, Theorem 1 in Section 2.7.2, p. 201]. However, when $\theta\in\mathbb{Z}_+$, the Hölder–Zygmund space $\mathcal{C}^\theta_b(\mathbb{R}^d)$ is strictly larger than $C^\theta_b(\mathbb{R}^d)$. In particular, when $\theta=1$, $\mathcal{C}^1_b(\mathbb{R}^d)$ is strictly larger than the space of bounded Lipschitz continuous functions (see [38, Example in Section 4.3.1, p. 148]), which is, in turn, strictly larger than $C^1_b(\mathbb{R}^d)$. Note also that $\mathcal{C}^1_b(\mathbb{R}^d)\subset C^{1-\varepsilon}_b(\mathbb{R}^d)$ for any $\varepsilon\in(0,1)$.

We have the following statement.

Lemma 2.2.

Assume that $V\in C^1(\mathbb{R}^d)$ is such that $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$. Then the drift term $b(x)$ defined by (1.4) is locally $(2-\alpha)$-Hölder continuous when $\alpha\in(1,2)$, is locally $(1-\varepsilon)$-Hölder continuous for any $\varepsilon\in(0,1)$ when $\alpha=1$, and is in $C^1(\mathbb{R}^d)$ when $\alpha\in(0,1)$.

Proof. Suppose first that $\alpha\in(1,2)$. By $V\in C^1(\mathbb{R}^d)$ and $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$, it is easy to see that $b(x)$ defined by (1.4) is locally bounded. Since $V\in C^1(\mathbb{R}^d)$, by (2.2), to prove the desired assertion it suffices to verify that $(-\Delta)^{-(1-\alpha/2)}f\in\mathcal{C}^{2-\alpha}_b(\mathbb{R}^d)$ for all $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$. Indeed, let $p(t,x,y)=p(t,x-y)$ and $(P_t)_{t\ge0}$ be the transition density function and the semigroup of the symmetric $(2-\alpha)$-stable process, respectively. It is known that there is a constant $c_1>0$ such that
$$\|\nabla P_tf\|_\infty\le c_1t^{-1/(2-\alpha)}\|f\|_\infty,\qquad t>0,\ f\in B_b(\mathbb{R}^d),$$
which is equivalent to saying that there is a constant $c_1>0$ such that for all $t>0$,
$$\int_{\mathbb{R}^d}|\nabla p(t,\cdot)(x)|\,dx\le c_1t^{-1/(2-\alpha)}; \tag{2.3}$$
see [35, Example 1.5 and Theorem 3.2] or [18, Lemma 4.1 and the proof of Corollary 2.5]. Recall that, for any $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$,
$$(-\Delta)^{-(1-\alpha/2)}f(x)=C_{d,2-\alpha}\int_{\mathbb{R}^d}\frac{f(y)}{|x-y|^{d-(2-\alpha)}}\,dy=\int_{\mathbb{R}^d}f(y)\int_0^\infty p(t,x-y)\,dt\,dy=\int_0^\infty\int_{\mathbb{R}^d}f(y)p(t,x-y)\,dy\,dt.$$
Thus, when $\alpha\in(1,2)$, for any $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$ and $x,h\in\mathbb{R}^d$,
$$\begin{aligned}&|(-\Delta)^{-(1-\alpha/2)}f(x)-(-\Delta)^{-(1-\alpha/2)}f(x+h)|\le\|f\|_\infty\int_0^\infty\int_{\mathbb{R}^d}|p(t,x-y)-p(t,x+h-y)|\,dy\,dt\\&\le\|f\|_\infty\int_0^{|h|^{2-\alpha}}\int_{\mathbb{R}^d}\big(p(t,x-y)+p(t,x+h-y)\big)\,dy\,dt+\|f\|_\infty|h|\int_{|h|^{2-\alpha}}^\infty\int_0^1\int_{\mathbb{R}^d}|\nabla p(t,x+\eta h-y)|\,dy\,d\eta\,dt\\&\le2\|f\|_\infty|h|^{2-\alpha}+c_1\|f\|_\infty|h|\int_{|h|^{2-\alpha}}^\infty t^{-1/(2-\alpha)}\,dt\le c_2\|f\|_\infty|h|^{2-\alpha},\end{aligned}$$
where in the last inequality we used the fact that $2-\alpha\in(0,1)$ due to $\alpha\in(1,2)$, so that $t^{-1/(2-\alpha)}$ is integrable at infinity. In particular, for any $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$, $(-\Delta)^{-(1-\alpha/2)}f\in\mathcal{C}^{2-\alpha}_b(\mathbb{R}^d)=C^{2-\alpha}_b(\mathbb{R}^d)$.

Next, we consider the case of $\alpha\in(0,1]$. According to (2.3) and [18, Lemma 4.1(3)] as well as the iterating procedure, there is a constant $c_3>0$ such that for all $t>0$,
$$\int_{\mathbb{R}^d}|\nabla^2p(t,\cdot)(x)|\,dx\le c_3t^{-2/(2-\alpha)}.$$
Then, for any $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$ and $x,h\in\mathbb{R}^d$,
$$\begin{aligned}|\Delta_h^2(-\Delta)^{-(1-\alpha/2)}f(x)|&=|(-\Delta)^{-(1-\alpha/2)}f(x+2h)-2(-\Delta)^{-(1-\alpha/2)}f(x+h)+(-\Delta)^{-(1-\alpha/2)}f(x)|\\&\le\|f\|_\infty\int_0^\infty\int_{\mathbb{R}^d}|p(t,x+2h-y)-2p(t,x+h-y)+p(t,x-y)|\,dy\,dt\\&\le\|f\|_\infty\int_0^{|h|^{2-\alpha}}\int_{\mathbb{R}^d}\big(p(t,x+2h-y)+2p(t,x+h-y)+p(t,x-y)\big)\,dy\,dt\\&\quad+\|f\|_\infty|h|^2\int_{|h|^{2-\alpha}}^\infty\int_0^1(1-\eta)\sum_{\pm}\int_{\mathbb{R}^d}|\nabla^2p(t,x+h\pm\eta h-y)|\,dy\,d\eta\,dt\\&\le4\|f\|_\infty|h|^{2-\alpha}+c_3\|f\|_\infty|h|^2\int_{|h|^{2-\alpha}}^\infty t^{-2/(2-\alpha)}\,dt\le c_4\|f\|_\infty|h|^{2-\alpha},\end{aligned}$$
where in the second inequality we used the Taylor formula around $x+h$. Hence, $(-\Delta)^{-(1-\alpha/2)}f\in\mathcal{C}^{2-\alpha}_b(\mathbb{R}^d)$, thanks to the fact that $(-\Delta)^{-(1-\alpha/2)}f$ is bounded for any $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$. Since $\mathcal{C}^{2-\alpha}_b(\mathbb{R}^d)=C^{2-\alpha}_b(\mathbb{R}^d)\subset C^1_b(\mathbb{R}^d)$ for $\alpha\in(0,1)$ and $\mathcal{C}^1_b(\mathbb{R}^d)\subset C^{1-\varepsilon}_b(\mathbb{R}^d)$ for $\alpha=1$, the proof is completed. □

Remark 2.3.

From expression (1.4), one may expect that the drift term $b(x)$ does not belong to $C^1(\mathbb{R}^d)$ when $\alpha\in(1,2)$. Informally, since the integral
$$\int_{\mathbb{R}^d}\frac{|f(y)|}{|x-y|^{d-(2-\alpha)+1}}\,dy$$
may diverge for $f\in L^1(\mathbb{R}^d;dx)\cap B_b(\mathbb{R}^d)$ when $\alpha\in(1,2)$ (the exponent $d-1+\alpha$ then exceeds $d$), we cannot take the derivative inside the integral in (1.4).

In the rest of this part, we will further assume that $V$ is a radial function. We will present some explicit estimates for the drift term $b(x)$ defined by (1.4), i.e.,
$$b(x)=-C_{d,2-\alpha}e^{V(x)}\int_{\mathbb{R}^d}\frac{e^{-V(y)}\nabla V(y)}{|x-y|^{d-(2-\alpha)}}\,dy=-C_{d,2-\alpha}e^{V(|x|)}\int_{\mathbb{R}^d}\frac{e^{-V(|y|)}V'(|y|)\,y}{|y|\,|x-y|^{d-(2-\alpha)}}\,dy.$$
In particular, it holds that $b(x)=-b(-x)$ and $b(0)=0$, i.e., $b(x)$ is an anti-symmetric function on $\mathbb{R}^d$. With a slight abuse of notation, in the following we write $V(x)=V(|x|)$ for all $x\in\mathbb{R}^d$.

Lemma 2.4.

Let $V(x)=V(|x|)$ for all $x\in\mathbb{R}^d$ be such that $V\in C^1(\mathbb{R}^d)$ and $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$. Suppose that $r_0:=\sup\{r>0:V'(r)\le0\}<\infty$,
$$\int_0^\infty e^{-V(r)}|V'(r)|\,r^d\,dr<\infty \tag{2.4}$$
and
$$\int_0^\infty e^{-V(r)}V'(r)\,r^d\,dr>0. \tag{2.5}$$
Then there exist constants $c_1,c_2>0$ and $r_1>0$ such that for all $x\in\mathbb{R}^d$,
$$\langle x,b(x)\rangle\le c_1\mathbf{1}_{\{|x|\le r_1\}}-c_2\frac{e^{V(|x|)}}{(1+|x|)^{d+\alpha}}|x|^2\mathbf{1}_{\{|x|>r_1\}}. \tag{2.6}$$

Proof.

For any $x\in\mathbb{R}^d$, by changing the variables, we find that
$$\begin{aligned}C_{d,2-\alpha}^{-1}\langle x,b(x)\rangle&=-e^{V(|x|)}\int_{\mathbb{R}^d}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle}{|y|\,|x-y|^{d-(2-\alpha)}}\,dy\\&=-e^{V(|x|)}\int_{\{\langle x,y\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle}{|y|}\Big(\frac{1}{|x-y|^{d-(2-\alpha)}}-\frac{1}{|x+y|^{d-(2-\alpha)}}\Big)dy\\&=-e^{V(|x|)}\int_{\{V'(|y|)\le0,\,\langle x,y\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle}{|y|}\Big(\frac{1}{|x-y|^{d-(2-\alpha)}}-\frac{1}{|x+y|^{d-(2-\alpha)}}\Big)dy\\&\quad-e^{V(|x|)}\int_{\{V'(|y|)\ge0,\,\langle x,y\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle}{|y|}\Big(\frac{1}{|x-y|^{d-(2-\alpha)}}-\frac{1}{|x+y|^{d-(2-\alpha)}}\Big)dy\\&=:J_1+J_2.\end{aligned}$$
Note that, for any $x,y\in\mathbb{R}^d$, we have
$$\frac{1}{|x-y|^{d-(2-\alpha)}}-\frac{1}{|x+y|^{d-(2-\alpha)}}=\big(|x|^2+|y|^2-2\langle x,y\rangle\big)^{-(d+\alpha-2)/2}-\big(|x|^2+|y|^2+2\langle x,y\rangle\big)^{-(d+\alpha-2)/2},$$
and that for the function $\psi(r):=r^{-(d+\alpha-2)/2}$, we have
$$\psi(r-\delta)-\psi(r+\delta)\le-2\delta\psi'(r-\delta),\qquad0\le\delta\le r,$$
thanks to $\psi''\ge0$ and the mean value theorem. Hence, taking $r=|x|^2+|y|^2$ and $\delta=2\langle x,y\rangle\ge0$, and using rotation invariance to take $x=|x|e_1$ in the fourth line, we get
$$\begin{aligned}J_1&\le-2(d+\alpha-2)e^{V(|x|)}\int_{\{V'(|y|)\le0,\,\langle x,y\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle^2}{|y|\,|x-y|^{d+\alpha}}\,dy\\&\le-2(d+\alpha-2)e^{V(|x|)}\int_{\{V'(|y|)\le0,\,\langle x,y\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle^2}{|y|\,\big||x|-|y|\big|^{d+\alpha}}\,dy\\&=-(d+\alpha-2)e^{V(|x|)}\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)\frac{\langle y,x\rangle^2}{|y|\,\big||x|-|y|\big|^{d+\alpha}}\,dy\\&=-(d+\alpha-2)e^{V(|x|)}|x|^2\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)\frac{y_1^2}{|y|\,\big||x|-|y|\big|^{d+\alpha}}\,dy\\&=-\frac{d+\alpha-2}{d}e^{V(|x|)}|x|^2\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)\frac{|y|}{\big||x|-|y|\big|^{d+\alpha}}\,dy.\end{aligned}$$
Since $r_0:=\sup\{r>0:V'(r)\le0\}<\infty$, for any $C>1$ and any $x\in\mathbb{R}^d$ with $|x|\ge Cr_0$, we have
$$J_1\le-\frac{d+\alpha-2}{d}(1-C^{-1})^{-d-\alpha}e^{V(|x|)}\frac{|x|^2}{|x|^{d+\alpha}}\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy.$$
On the other hand, for any $x,y\in\mathbb{R}^d$ with $\langle x,y\rangle\ge0$,
$$\frac{1}{|x-y|^{d-(2-\alpha)}}-\frac{1}{|x+y|^{d-(2-\alpha)}}\ge2(d+\alpha-2)\big(|x|^2+|y|^2\big)^{-(d+\alpha)/2}\langle x,y\rangle. \tag{2.7}$$
Here we used the fact that for the function $\psi(r)=r^{-(d+\alpha-2)/2}$, it holds that
$$\psi(r-\delta)-\psi(r+\delta)\ge-2\psi'(r)\delta,\qquad0\le\delta\le r,$$
thanks to the mean value theorem again and the fact that $\psi'''\le0$. Combining (2.7) with the fact that $V'(r)\ge0$ for all $r\ge r_0$, we get that for any $a>0$ and any $x\in\mathbb{R}^d$,
$$\begin{aligned}J_2&\le-2(d+\alpha-2)e^{V(|x|)}\int_{\{V'(|y|)\ge0,\,\langle y,x\rangle\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle x,y\rangle^2}{|y|\,(|x|^2+|y|^2)^{(d+\alpha)/2}}\,dy\\&=-(d+\alpha-2)e^{V(|x|)}\int_{\{V'(|y|)\ge0\}}e^{-V(|y|)}V'(|y|)\frac{\langle x,y\rangle^2}{|y|\,(|x|^2+|y|^2)^{(d+\alpha)/2}}\,dy\\&=-(d+\alpha-2)e^{V(|x|)}|x|^2\int_{\{V'(|y|)\ge0\}}e^{-V(|y|)}V'(|y|)\frac{y_1^2}{|y|\,(|x|^2+|y|^2)^{(d+\alpha)/2}}\,dy\\&=-\frac{d+\alpha-2}{d}e^{V(|x|)}|x|^2\int_{\{V'(|y|)\ge0\}}e^{-V(|y|)}V'(|y|)\frac{|y|}{(|x|^2+|y|^2)^{(d+\alpha)/2}}\,dy\\&\le-\frac{d+\alpha-2}{d}e^{V(|x|)}|x|^2\int_{\{V'(|y|)\ge0,\,|y|\le a|x|\}}e^{-V(|y|)}V'(|y|)\frac{|y|}{(|x|^2+|y|^2)^{(d+\alpha)/2}}\,dy\\&\le-\frac{d+\alpha-2}{d}(1+a^2)^{-(d+\alpha)/2}e^{V(|x|)}\frac{|x|^2}{|x|^{d+\alpha}}\int_{\{V'(|y|)\ge0,\,|y|\le a|x|\}}e^{-V(|y|)}V'(|y|)|y|\,dy.\end{aligned}$$
According to both estimates above for $J_1$ and $J_2$, we find that for any $x\in\mathbb{R}^d$ with $|x|\ge Cr_0$,
$$\begin{aligned}C_{d,2-\alpha}^{-1}\langle x,b(x)\rangle\le&-(1-C^{-1})^{-d-\alpha}\Big(\frac{d+\alpha-2}{d}\Big)e^{V(|x|)}\frac{|x|^2}{|x|^{d+\alpha}}\\&\times\Big[(1+a^2)^{-(d+\alpha)/2}(1-C^{-1})^{d+\alpha}\int_{\{V'(|y|)\ge0,\,|y|\le a|x|\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy\Big].\end{aligned}$$
Note that, under (2.5),
$$\int_{\mathbb{R}^d}e^{-V(|y|)}V'(|y|)|y|\,dy>0.$$
Then, by (2.4), there is a constant $R>r_0$ such that
$$\int_{\{|y|\le R\}}e^{-V(|y|)}V'(|y|)|y|\,dy>0.$$
This implies that
$$\int_{\{V'(|y|)\ge0,\,|y|\le R\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy>0, \tag{2.8}$$
where we used the facts that $r_0:=\sup\{r>0:V'(r)\le0\}<\infty$ and $r_0<R$. Furthermore, by (2.8), we can choose $\varepsilon\in(0,1)$ small enough so that
$$M:=(1-\varepsilon)\int_{\{V'(|y|)\ge0,\,|y|\le R\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy>0.$$
Now, for these fixed $R$ and $\varepsilon$, we choose $a>0$ small enough and then $C>1$ large enough such that
$$(1+a^2)^{-(d+\alpha)/2}(1-C^{-1})^{d+\alpha}\ge1-\varepsilon,\qquad aCr_0\ge R.$$
Then, for any $x\in\mathbb{R}^d$ with $|x|\ge Cr_0$,
$$\begin{aligned}&(1+a^2)^{-(d+\alpha)/2}(1-C^{-1})^{d+\alpha}\int_{\{V'(|y|)\ge0,\,|y|\le a|x|\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy\\&\ge(1-\varepsilon)\int_{\{V'(|y|)\ge0,\,|y|\le aCr_0\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy\\&\ge(1-\varepsilon)\int_{\{V'(|y|)\ge0,\,|y|\le R\}}e^{-V(|y|)}V'(|y|)|y|\,dy+\int_{\{V'(|y|)\le0\}}e^{-V(|y|)}V'(|y|)|y|\,dy=M>0,\end{aligned}$$
and so
$$\langle x,b(x)\rangle\le-C_{d,2-\alpha}M(1-C^{-1})^{-d-\alpha}\Big(\frac{d+\alpha-2}{d}\Big)e^{V(|x|)}\frac{|x|^2}{|x|^{d+\alpha}}. \tag{2.9}$$
Furthermore, by $V\in C^1(\mathbb{R}^d)$ and $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$, $b(x)$ is locally bounded; see Lemma 2.2. Then, for any $l>0$ and any $x\in\mathbb{R}^d$ with $|x|\le l$, one can find a constant $C(l)>0$ such that $|b(x)|\le C(l)$, and so
$$\langle x,b(x)\rangle\le|x||b(x)|\le lC(l). \tag{2.10}$$
Therefore, by (2.9) and (2.10), we can choose the constants $c_1,c_2>0$ and $r_1>0$ so that (2.6) holds. □

The following statement indicates that the estimate (2.6) for $|x|$ large enough is indeed optimal, under a mild additional assumption.

Lemma 2.5.

Let $V(x)=V(|x|)$ for all $x\in\mathbb{R}^d$ be such that $V\in C^1(\mathbb{R}^d)$, $e^{-V}|\nabla V|\in L^1(\mathbb{R}^d;dx)\cap C_b(\mathbb{R}^d)$, and (2.4) is satisfied. If
$$\limsup_{r\to\infty}\big[e^{-V(r)}V'(r)\,r^{d+1}\big]<\infty, \tag{2.11}$$
then there exists a constant $c>0$ such that for all $x\in\mathbb{R}^d$,
$$|b(x)|\le c\,\frac{e^{V(|x|)}}{(1+|x|)^{d+\alpha-1}}.$$

Proof.

For convenience, we set $\tilde b(x)=C_{d,2-\alpha}^{-1}e^{-V(|x|)}b(x)$. Then, for any $x\in\mathbb{R}^d$,
$$|\tilde b(x)|^2=\sum_{i=1}^d\Big(\int_{\mathbb{R}^d}e^{-V(|y|)}V'(|y|)\frac{y_i}{|y|\,|x-y|^{d-(2-\alpha)}}\,dy\Big)^2=:\sum_{i=1}^dI_i.$$
For fixed $i$, assume that $x_i\ge0$. Reflecting $y$ in the hyperplane $\{y_i=0\}$ (which turns $|x-y|^2$ into $|x-y|^2+4x_iy_i$), and noting that the bracket below lies in $[0,|x-y|^{-(d+\alpha-2)}]$ because $x_iy_i\ge0$ and $d+\alpha-2>0$, we obtain
$$\begin{aligned}I_i&=\Big(\int_{\{y_i>0\}}e^{-V(|y|)}V'(|y|)\frac{y_i}{|y|}\Big[\frac{1}{|x-y|^{d+\alpha-2}}-\frac{1}{(|x-y|^2+4x_iy_i)^{(d+\alpha-2)/2}}\Big]dy\Big)^2\\&\le3\Big(\int_{\{|x-y|\le|x|/2\}}e^{-V(|y|)}|V'(|y|)|\frac{|y_i|}{|y|}\frac{1}{|x-y|^{d+\alpha-2}}\,dy\Big)^2+3\Big(\int_{\{|x-y|\ge2|x|\}}e^{-V(|y|)}|V'(|y|)|\frac{|y_i|}{|y|}\frac{1}{|x-y|^{d+\alpha-2}}\,dy\Big)^2\\&\quad+3\Big(\int_{\{y_i>0,\,|x|/2\le|x-y|\le2|x|\}}e^{-V(|y|)}|V'(|y|)|\frac{y_i}{|y|}\Big[\frac{1}{|x-y|^{d+\alpha-2}}-\frac{1}{(|x-y|^2+4x_iy_i)^{(d+\alpha-2)/2}}\Big]dy\Big)^2\\&=:3I_{i1}+3I_{i2}+3I_{i3}.\end{aligned}$$
When $x_i<0$, similarly we have
$$\begin{aligned}
I_i&=\Big(\int_{\{y_i<0\}}e^{-V(|y|)}V'(|y|)\frac{y_i}{|y|}\Big[\frac{1}{|x-y|^{d+\alpha-2}}-\frac{1}{(|x-y|^2+4x_iy_i)^{(d+\alpha-2)/2}}\Big]dy\Big)^2\\
&\le\Big(\int_{\{y_i<0\}}e^{-V(|y|)}|V'(|y|)|\frac{|y_i|}{|y|}\Big[\frac{1}{|x-y|^{d+\alpha-2}}-\frac{1}{(|x-y|^2+4x_iy_i)^{(d+\alpha-2)/2}}\Big]dy\Big)^2\\
&\le6I_{i1}+6I_{i2}+3\Big(\int_{\{y_i<0,\,|x|/2\le|x-y|\le2|x|\}}e^{-V(|y|)}|V'(|y|)|\frac{|y_i|}{|y|}\Big[\frac{1}{|x-y|^{d+\alpha-2}}-\frac{1}{(|x-y|^2+4x_iy_i)^{(d+\alpha-2)/2}}\Big]dy\Big)^2\\
&=:6I_{i1}+6I_{i2}+3\tilde I_{i3}.
\end{aligned}$$
Next, we estimate the above terms one by one. For $I_{i1}$, we have that for $x\in\mathbb{R}^d$ with $|x|$ large enough,
$$\begin{aligned}
\Big(\sum_{i=1}^dI_{i1}\Big)^{1/2}&\le\sqrt d\int_{\{|x-y|\le|x|/2\}}\frac{e^{-V(|y|)}|V'(|y|)|}{|x-y|^{d-(2-\alpha)}}\,dy\\
&\le\sqrt d\sup_{\{|x-y|\le|x|/2\}}\big\{e^{-V(|y|)}|V'(|y|)|\big\}\int_{\{|x-y|\le|x|/2\}}\frac{dy}{|x-y|^{d-(2-\alpha)}}\\
&\le c_1|x|^{2-\alpha}\sup_{|y|\ge|x|/2}\big\{e^{-V(|y|)}|V'(|y|)|\big\}\\
&\le\frac{c_12^{d+1}}{|x|^{d+\alpha-1}}\sup_{|y|\ge|x|/2}\big\{e^{-V(|y|)}|V'(|y|)||y|^{d+1}\big\}\le\frac{c_2}{|x|^{d+\alpha-1}},
\end{aligned}$$
where the last inequality follows from (2.11). For $I_{i2}$, we have that for $x\in\mathbb{R}^d$ with $|x|$ large enough,
$$\begin{aligned}
\Big(\sum_{i=1}^dI_{i2}\Big)^{1/2}&\le\sqrt d\int_{\{|x-y|\ge2|x|\}}\frac{e^{-V(|y|)}|V'(|y|)|}{|x-y|^{d-(2-\alpha)}}\,dy\le\frac{\sqrt d\,2^{2-d-\alpha}}{|x|^{d-(2-\alpha)}}\int_{\{|y|\ge|x|\}}e^{-V(|y|)}|V'(|y|)|\,dy\\
&\le\frac{c_3}{|x|^{d-(2-\alpha)}}\int_{\{|y|\ge|x|\}}|y|^{-d-1}\,dy\le\frac{c_4}{|x|^{d+\alpha-1}},
\end{aligned}$$
where in the third inequality we used (2.11) again. To estimate $I_{i3}$, define
$$f(r)=\frac{1}{(|x-y|^2+r)^{(d+\alpha-2)/2}},\quad r\ge0.$$
By the Lagrange mean value theorem, for any $y\in\mathbb{R}^d$ with $|x|/2\le|x-y|\le2|x|$ and $y_i>0$, and any $x_i\ge0$, there exists $\theta_i\in[0,4x_iy_i]$ such that
$$f(0)-f(4x_iy_i)=-4x_iy_if'(\theta_i)=\frac{d+\alpha-2}{2}\cdot\frac{4x_iy_i}{(|x-y|^2+\theta_i)^{(d+\alpha)/2}}\le\frac{2(d+\alpha-2)|x|y_i}{|x-y|^{d+\alpha}}\le\frac{c_5y_i}{|x|^{d+\alpha-1}}.$$
Note that $f(0)-f(4x_iy_i)\ge0$ always holds. Therefore, for all $x\in\mathbb{R}^d\setminus\{0\}$, according to (2.4),
$$\begin{aligned}
I_{i3}&\le\frac{c_6}{|x|^{2(d+\alpha-1)}}\Big(\int_{\{y_i>0,\,|x|/2\le|x-y|\le2|x|\}}e^{-V(|y|)}|V'(|y|)||y|\,dy\Big)^2\le\frac{c_6}{|x|^{2(d+\alpha-1)}}\Big(\int_{\{|x|/2\le|x-y|\le2|x|\}}e^{-V(|y|)}|V'(|y|)||y|\,dy\Big)^2\\
&\le\frac{c_6}{|x|^{2(d+\alpha-1)}}\Big(\int_{\{|y|\le3|x|\}}e^{-V(|y|)}|V'(|y|)||y|\,dy\Big)^2\le\frac{c_7}{|x|^{2(d+\alpha-1)}},
\end{aligned}$$
and so
$$\Big(\sum_{i=1}^dI_{i3}\Big)^{1/2}\le\frac{c_8}{|x|^{d+\alpha-1}}.$$
Similarly, we can prove that for all $x\in\mathbb{R}^d\setminus\{0\}$,
$$\Big(\sum_{i=1}^d\tilde I_{i3}\Big)^{1/2}\le\frac{c_8}{|x|^{d+\alpha-1}}.$$
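The two elementary facts behind the estimate of $I_{i3}$, namely the reflection identity $|x-y^*|^2=|x-y|^2+4x_iy_i$ (where $y^*$ denotes $y$ with its $i$-th coordinate negated) and the mean-value bound $0\le f(0)-f(4x_iy_i)\le 2(d+\alpha-2)|x|y_i/|x-y|^{d+\alpha}$ on the annulus $|x|/2\le|x-y|\le2|x|$ with $x_i\ge0$, $y_i>0$, can be sanity-checked numerically. The Python sketch below samples random configurations; the sampling box, the trial count and the floating-point tolerances are arbitrary illustrative choices, and the check is of course not a substitute for the proof.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def check_reflection_and_mvt(trials=20000, seed=0):
    """Randomly test the reflection identity and the mean-value bound used for I_{i3}.

    Returns how many samples fell into the annulus |x|/2 <= |x-y| <= 2|x| with
    x_i >= 0 and y_i > 0, where the mean-value bound is asserted.
    """
    rng = random.Random(seed)
    in_annulus = 0
    for _ in range(trials):
        d = rng.choice([1, 2, 3])
        alpha = rng.uniform(0.05, 1.95)
        k = d + alpha - 2  # exponent of the kernel |x-y|^{-(d+alpha-2)}
        if k <= 0:  # the estimate is for the case d > 2 - alpha
            continue
        x = [rng.uniform(-3.0, 3.0) for _ in range(d)]
        y = [rng.uniform(-3.0, 3.0) for _ in range(d)]
        i = rng.randrange(d)
        y_star = list(y)
        y_star[i] = -y_star[i]  # reflection in the i-th coordinate
        r2 = sum((a - b) ** 2 for a, b in zip(x, y))
        r2_star = sum((a - b) ** 2 for a, b in zip(x, y_star))
        # Reflection identity: |x - y*|^2 = |x - y|^2 + 4 x_i y_i.
        assert math.isclose(r2_star, r2 + 4 * x[i] * y[i], rel_tol=1e-9, abs_tol=1e-9)

        r, nx = math.sqrt(r2), norm(x)
        if not (x[i] >= 0 and y[i] > 0 and nx / 2 <= r <= 2 * nx):
            continue
        in_annulus += 1
        f = lambda s: (r2 + s) ** (-k / 2)  # f(s) = (|x-y|^2 + s)^{-(d+alpha-2)/2}
        gap = f(0.0) - f(4 * x[i] * y[i])
        bound = 2 * k * nx * y[i] / r ** (d + alpha)
        assert gap >= -1e-12
        assert gap <= bound * (1 + 1e-9) + 1e-12
        # Using |x - y| >= |x|/2 turns the bound into c * y_i / |x|^{d+alpha-1}.
        assert bound <= 2 * k * 2 ** (d + alpha) * y[i] / nx ** (d + alpha - 1) * (1 + 1e-9)
    return in_annulus


print(check_reflection_and_mvt())
```

The last assertion corresponds to the final step $2(d+\alpha-2)|x|y_i/|x-y|^{d+\alpha}\le c\,y_i/|x|^{d+\alpha-1}$, with the explicit choice $c=2^{d+\alpha+1}(d+\alpha-2)$.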

Combining all the estimates above, we obtain that there exists a constant $c>0$ such that for all $x\in\mathbb{R}^d$ with $|x|\ge1$ large enough,
$$|\tilde b(x)|\le\frac{c}{|x|^{d+\alpha-1}};\quad\text{that is,}\quad|b(x)|\le\frac{ce^{V(|x|)}}{|x|^{d+\alpha-1}}.$$
The proof is completed, since $b$ is locally bounded. ∎

Remark 2.6. (1) If condition (2.11) is strengthened to $\limsup_{r\to\infty}[e^{-V(r)}V'(r)r^{d+1}]=0$, then both terms $\big(\sum_{i=1}^dI_{i1}\big)^{1/2}$ and $\big(\sum_{i=1}^dI_{i2}\big)^{1/2}$ are $o\big(|x|^{-(d+\alpha-1)}\big)$ as $|x|\to\infty$. Hence the remaining term $\big(\sum_{i=1}^dI_{i3}\big)^{1/2}$ or $\big(\sum_{i=1}^d\tilde I_{i3}\big)^{1/2}$ plays the leading role in the estimates above. (2) Under the assumptions of Lemmas 2.4 and 2.5, it holds that, for $|x|$ large enough,
$$\langle x,b(x)\rangle\asymp-\frac{e^{V(|x|)}}{(1+|x|)^{d+\alpha}}|x|^2.$$

The case of $d\le2-\alpha$.
In this part, we consider the case $d\le2-\alpha$, i.e., $d=1$ and $0<\alpha\le1$. Let $V\in C^2(\mathbb{R})$ be such that $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$, and let $b(x)$ be defined by (1.5). We first show the following.

Lemma 2.7.

Let $V\in C^2(\mathbb{R})$ be such that $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$. If
$$\limsup_{|x|\to\infty}\big[|x|^3e^{-V(x)}|V'(x)^2-V''(x)|\big]<\infty,\tag{2.12}$$
then $b(x)$ given by (1.5) is well defined.
Proof. Since $e^{-V}\in C_b^2(\mathbb{R})$, we know that $-(-\Delta)^{\alpha/2}e^{-V}\in C_b(\mathbb{R})$, and so $-(-\Delta)^{\alpha/2}e^{-V}$ is locally integrable on $\mathbb{R}$. Next, we estimate $(-\Delta)^{\alpha/2}e^{-V}(x)$ for $x<0$ with $|x|$ large enough. For such $x$,
$$\begin{aligned}
-(-\Delta)^{\alpha/2}e^{-V}(x)&=\int_{\mathbb{R}}\big(e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\mathbf{1}_{\{|z|\le1\}}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&=\int_{\{|z|<-x/2\}}\big(e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz+\int_{\{|z|\ge-x/2\}}\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&=:I_1(x)+I_2(x).
\end{aligned}$$
Since
$$|I_2(x)|\le\int_{\{|z|\ge-x/2\}}e^{-V(x+z)}\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz+\int_{\{|z|\ge-x/2\}}e^{-V(x)}\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\le c_1\big(|x|^{-1-\alpha}+e^{-V(x)}\big),$$
by $e^{-V}\in L^1(\mathbb{R};dx)$ we know that $\int_{-\infty}^{-r}|I_2(x)|\,dx<\infty$ for $r>0$ large enough. On the other hand, by the mean value theorem,
$$\begin{aligned}
|I_1(x)|&\le\int_{\{|z|<-x/2\}}\big|e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\big|\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&\le c_{1,\alpha}\Big[\sup_{3x/2\le u\le x/2}\big|e^{-V(u)}(V'(u)^2-V''(u))\big|\Big]\int_0^{-x/2}z^{1-\alpha}\,dz\\
&\le c_2(-x)^{2-\alpha}\sup_{3x/2\le u\le x/2}\big[e^{-V(u)}|V'(u)^2-V''(u)|\big]\\
&\le c_3(-x)^{-1-\alpha}\sup_{3x/2\le u\le x/2}\big[|u|^3e^{-V(u)}|V'(u)^2-V''(u)|\big]\le c_4(-x)^{-1-\alpha},
\end{aligned}\tag{2.13}$$
where in the last inequality we used (2.12). Analogous estimates hold for $x>0$ large enough, and hence we arrive at the desired assertion. ∎

Remark 2.8.

From the proof above, we can see that under the assumptions of Lemma 2.7,
$$\int_{\mathbb{R}}\big|(-\Delta)^{\alpha/2}e^{-V}(u)\big|\,du<\infty,\tag{2.14}$$
and hence $\int_x^\infty(-\Delta)^{\alpha/2}e^{-V}(u)\,du$ is also well defined for any $x\in\mathbb{R}$. In the following, we always assume that (2.12) holds. We further suppose that $V(x)=V(-x)$ for all $x\in\mathbb{R}$. Then we claim the following.

Lemma 2.9.

Let $V\in C^2(\mathbb{R})$ be such that $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$. Suppose that (2.12) holds and that $V(x)=V(-x)$ for all $x\in\mathbb{R}$. Then $b(x)$ given by (1.5) is an anti-symmetric function on $\mathbb{R}$ (i.e., $b(x)=-b(-x)$ for all $x\in\mathbb{R}$) such that
$$b(x)=\begin{cases}e^{V(x)}\displaystyle\int_x^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz,&x\ge0,\\[2mm]-e^{V(x)}\displaystyle\int_{-x}^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz,&x<0.\end{cases}\tag{2.15}$$
In particular, $b(0)=0$. Moreover, $b\in C^1(\mathbb{R})$ and is locally bounded.
Proof. As mentioned in Remark 2.8, under the assumptions of this lemma we have (2.14). We will show that this yields
$$-\int_{\mathbb{R}}(-\Delta)^{\alpha/2}e^{-V}(u)\,du=0,\tag{2.16}$$
and hence
$$b(x)=-e^{V(x)}\int_{-\infty}^x(-\Delta)^{\alpha/2}e^{-V}(u)\,du=e^{V(x)}\int_x^\infty(-\Delta)^{\alpha/2}e^{-V}(u)\,du,\quad x\ge0.$$
Indeed, for any $\varepsilon\in(0,1)$ and any $x\in\mathbb{R}$,
$$\begin{aligned}
\Big|\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(y)}-e^{-V(x)}}{|y-x|^{1+\alpha}}\,dy\Big|&=\Big|\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\mathbf{1}_{\{|z|\le1\}}\big)\frac{dz}{|z|^{1+\alpha}}\Big|\\
&\le\int_{\{|z|\le1\}}\big|e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\big|\frac{dz}{|z|^{1+\alpha}}+\Big|\int_{\{|z|>1\}}\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{dz}{|z|^{1+\alpha}}\Big|\\
&\le\big\|[e^{-V}]''\big\|_\infty\int_{\{|z|\le1\}}\frac{|z|^2}{|z|^{1+\alpha}}\,dz+2\|e^{-V}\|_\infty\int_{\{|z|>1\}}\frac{dz}{|z|^{1+\alpha}}\le c_1<\infty.
\end{aligned}$$
On the other hand, for any $\varepsilon\in(0,1)$ and any $x\in\mathbb{R}$ with $|x|$ large enough,
$$\begin{aligned}
\Big|\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(y)}-e^{-V(x)}}{|y-x|^{1+\alpha}}\,dy\Big|&=\Big|\int_{\{|z|\ge\varepsilon\}}\frac{e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\mathbf{1}_{\{|z|\le|x|/2\}}}{|z|^{1+\alpha}}\,dz\Big|\\
&\le\int_{\{|z|\le|x|/2\}}\frac{\big|e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\big|}{|z|^{1+\alpha}}\,dz+\int_{\{|z|>|x|/2\}}\frac{e^{-V(x+z)}}{|z|^{1+\alpha}}\,dz+e^{-V(x)}\int_{\{|z|>|x|/2\}}\frac{dz}{|z|^{1+\alpha}}\\
&\le c_2|x|^{-(1+\alpha)}+\frac{2^{1+\alpha}}{|x|^{1+\alpha}}\int_{\mathbb{R}}e^{-V(z)}\,dz+c_3e^{-V(x)},
\end{aligned}$$
where the first term in the last inequality follows from (2.12) and the argument for (2.13). Hence there is a constant $c_4>0$ such that for all $x\in\mathbb{R}$,
$$\sup_{\varepsilon\in(0,1)}\Big|\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(y)}-e^{-V(x)}}{|y-x|^{1+\alpha}}\,dy\Big|\le c_4\big((1+|x|)^{-1-\alpha}+e^{-V(x)}\big).$$
Therefore, by the dominated convergence theorem and by changing the order of integration, we find that
$$\begin{aligned}
-\int_{\mathbb{R}}(-\Delta)^{\alpha/2}e^{-V}(x)\,dx&=c_{1,\alpha}\int_{\mathbb{R}}\lim_{\varepsilon\to0}\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(y)}-e^{-V(x)}}{|y-x|^{1+\alpha}}\,dy\,dx=c_{1,\alpha}\lim_{\varepsilon\to0}\int_{\mathbb{R}}\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(y)}-e^{-V(x)}}{|y-x|^{1+\alpha}}\,dy\,dx\\
&=-c_{1,\alpha}\lim_{\varepsilon\to0}\int_{\mathbb{R}}\int_{\{|y-x|\ge\varepsilon\}}\frac{e^{-V(x)}-e^{-V(y)}}{|y-x|^{1+\alpha}}\,dx\,dy=-c_{1,\alpha}\int_{\mathbb{R}}\lim_{\varepsilon\to0}\int_{\{|x-y|\ge\varepsilon\}}\frac{e^{-V(x)}-e^{-V(y)}}{|x-y|^{1+\alpha}}\,dx\,dy\\
&=\int_{\mathbb{R}}(-\Delta)^{\alpha/2}e^{-V}(y)\,dy.
\end{aligned}$$
Since the left-hand side is the negative of the right-hand side, both vanish, which proves (2.16). On the other hand,
$$\begin{aligned}
-\int_{-\infty}^0(-\Delta)^{\alpha/2}e^{-V}(u)\,du&=\int_{-\infty}^0\int_{\mathbb{R}}\Big(e^{-V(u+z)}-e^{-V(u)}-\big[e^{-V}\big]'(u)z\mathbf{1}_{\{|z|\le1\}}\Big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_{-\infty}^0\lim_{\varepsilon\to0}\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(u+z)}-e^{-V(u)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_0^\infty\lim_{\varepsilon\to0}\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(-u+z)}-e^{-V(-u)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_0^\infty\lim_{\varepsilon\to0}\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(-u+z)}-e^{-V(u)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_0^\infty\lim_{\varepsilon\to0}\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(-u-z)}-e^{-V(u)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_0^\infty\lim_{\varepsilon\to0}\int_{\{|z|\ge\varepsilon\}}\big(e^{-V(u+z)}-e^{-V(u)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du\\
&=\int_0^\infty\int_{\mathbb{R}}\Big(e^{-V(u+z)}-e^{-V(u)}-\big[e^{-V}\big]'(u)z\mathbf{1}_{\{|z|\le1\}}\Big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\,du=-\int_0^\infty(-\Delta)^{\alpha/2}e^{-V}(u)\,du,
\end{aligned}\tag{2.17}$$
where in the third and the fifth equalities we changed variables ($u\mapsto-u$ and $z\mapsto-z$, respectively), and the fourth and the sixth equalities follow from the symmetry $V(x)=V(-x)$ for all $x\in\mathbb{R}$. Combining (2.16) with (2.17), we have
$$\int_0^\infty(-\Delta)^{\alpha/2}e^{-V}(u)\,du=0,\tag{2.18}$$
and so $b(0)=0$. Furthermore, by (2.18) and the identity $(-\Delta)^{\alpha/2}e^{-V}(x)=(-\Delta)^{\alpha/2}e^{-V}(-x)$ for all $x\in\mathbb{R}$ (which is also due to the symmetry $V(x)=V(-x)$), we get that for any $x<0$,
$$b(x)=-e^{V(x)}\int_{-\infty}^x(-\Delta)^{\alpha/2}e^{-V}(u)\,du=-e^{V(x)}\int_{-x}^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz.$$
The desired assertion (2.15) follows. As we mentioned in the proof of Lemma 2.7, since $e^{-V}\in C_b^2(\mathbb{R})$, we have $(-\Delta)^{\alpha/2}e^{-V}\in C_b(\mathbb{R})$. By (2.18), we can easily see that $b\in C^1(\mathbb{R})$ and is locally bounded. ∎

The following statement is analogous to Lemma 2.4.

Lemma 2.10.

Let $V\in C^2(\mathbb{R})$ be a symmetric function on $\mathbb{R}$ such that $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$ and (2.12) holds. Suppose that
$$\lim_{x\to\infty}xe^{-V(x)}=0\tag{2.19}$$
and
$$\liminf_{x\to\infty}\big[x^3e^{-V(x)}(V'(x)^2-V''(x))\big]\ge0.\tag{2.20}$$
Then there exist constants $c_1,c_2>0$ and $r_0>0$ such that for all $x\in\mathbb{R}$,
$$xb(x)\le c_1\mathbf{1}_{\{|x|\le r_0\}}-c_2\frac{e^{V(x)}}{|x|^{1+\alpha}}|x|^2\mathbf{1}_{\{|x|>r_0\}}.$$
Proof.

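Since $b$ is anti-symmetric, we only need to consider $x\ge0$. According to Lemma 2.9, $b\in C^1(\mathbb{R})$ and is therefore locally bounded. Hence, in order to prove the desired assertion, it is sufficient to verify that there exists a constant $c>0$ such that for $x>0$ large enough
$$-(-\Delta)^{\alpha/2}e^{-V}(x)\ge\frac{c}{x^{1+\alpha}};\tag{2.21}$$
indeed, by (2.15), (2.21) gives $b(x)=-e^{V(x)}\int_x^\infty\big[-(-\Delta)^{\alpha/2}e^{-V}(z)\big]dz\le-\frac{c}{\alpha}e^{V(x)}x^{-\alpha}$ for $x$ large enough. To this end, for $x>0$ we write
$$\begin{aligned}
-(-\Delta)^{\alpha/2}e^{-V}(x)&=\int_{\mathbb{R}}\big(e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\mathbf{1}_{\{|z|\le1\}}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&=\int_{\{|z|<x/2\}}\big(e^{-V(x+z)}-e^{-V(x)}+e^{-V(x)}V'(x)z\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz+\int_{\{|z|\ge x/2\}}\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&=:I_1(x)+I_2(x).
\end{aligned}$$
For $x>0$ large enough, we have
$$\begin{aligned}
I_2(x)&=\int_{x/2}^\infty\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz+\int_{-x}^{-x/2}\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz+\int_{-\infty}^{-x}\big(e^{-V(x+z)}-e^{-V(x)}\big)\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\\
&\ge\Big(-e^{-V(x)}\int_{x/2}^\infty\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\Big)+\Big(\frac{c_{1,\alpha}}{x^{1+\alpha}}\int_0^{x/2}e^{-V(z)}\,dz-e^{-V(x)}\int_{x/2}^x\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\Big)+\Big(-e^{-V(x)}\int_x^\infty\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\Big)\\
&=\frac{c_{1,\alpha}}{x^{1+\alpha}}\int_0^{x/2}e^{-V(z)}\,dz-2e^{-V(x)}\int_{x/2}^\infty\frac{c_{1,\alpha}}{|z|^{1+\alpha}}\,dz\ge\frac{c_1}{x^{1+\alpha}}-\frac{c_2e^{-V(x)}}{x^\alpha}.
\end{aligned}$$
On the other hand, by the Taylor theorem (recall $[e^{-V}]''=e^{-V}(V'^2-V'')$), for $x>0$ large enough,
$$I_1(x)\ge\frac{c_{1,\alpha}}{2}\inf_{x/2\le z\le3x/2}\big[e^{-V(z)}(V'(z)^2-V''(z))\big]\int_{\{|z|<x/2\}}\frac{|z|^2}{|z|^{1+\alpha}}\,dz=\frac{c_3}{x^{1+\alpha}}\Big[x^3\inf_{x/2\le z\le3x/2}\big[e^{-V(z)}(V'(z)^2-V''(z))\big]\Big],$$
with $c_3=c_{1,\alpha}2^{\alpha-2}/(2-\alpha)$.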

Hence, for $x>0$ large enough,
$$-(-\Delta)^{\alpha/2}e^{-V}(x)\ge\frac{c_1}{x^{1+\alpha}}-\frac{c_2xe^{-V(x)}}{x^{1+\alpha}}+\frac{c_3}{x^{1+\alpha}}\Big[x^3\inf_{x/2\le z\le3x/2}\big[e^{-V(z)}(V'(z)^2-V''(z))\big]\Big].$$
This along with (2.19) and (2.20) yields (2.21). The proof is completed. ∎
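The proofs of Lemmas 2.7 and 2.10 work directly with the Lévy-integral form of the fractional Laplacian in $d=1$. As a numerical illustration, the Python sketch below evaluates it in the symmetrized form $(-\Delta)^{\alpha/2}f(x)=-c_{1,\alpha}\int_0^\infty\big(f(x+z)+f(x-z)-2f(x)\big)z^{-1-\alpha}\,dz$. It assumes the standard normalization $c_{1,\alpha}=\alpha2^{\alpha-1}\Gamma((1+\alpha)/2)/(\sqrt\pi\,\Gamma(1-\alpha/2))$ (symbol $|\xi|^\alpha$), that $f$ is smooth near $x$, and that $f(x\pm z)$ is negligible for $z>R$; the cut-offs `delta`, `R` and the grid size `n` are illustrative choices, not values from the paper. Near $z=0$ the integrand is replaced by its second-order Taylor term, which mirrors the splitting into $I_1$ and $I_2$ above. For the Gaussian $f(x)=e^{-x^2/2}$, the Fourier transform gives $(-\Delta)^{\alpha/2}f(0)=2^{\alpha/2}\Gamma((1+\alpha)/2)/\sqrt\pi$, which serves as a check.

```python
import math


def c1(alpha):
    """c_{1,alpha} for the 1-d symmetric alpha-stable Levy measure (symbol |xi|^alpha)."""
    return alpha * 2 ** (alpha - 1) * math.gamma((1 + alpha) / 2) / (
        math.sqrt(math.pi) * math.gamma(1 - alpha / 2))


def frac_laplacian_1d(f, x, alpha, delta=1e-3, R=1e3, n=4000):
    """Approximate (-Delta)^{alpha/2} f(x), alpha in (0, 2), by quadrature.

    Assumes f is smooth near x and f(x +/- z) is negligible for z > R; n must be even.
    """
    fx = f(x)
    g = lambda z: f(x + z) + f(x - z) - 2.0 * fx
    # (0, delta]: g(z) ~ f''(x) z^2, with f''(x) from a central difference.
    f2 = g(delta) / delta ** 2
    near = f2 * delta ** (2 - alpha) / (2 - alpha)
    # [delta, R]: substitute z = e^s, so dz / z^{1+alpha} = e^{-alpha s} ds; composite Simpson.
    a, b = math.log(delta), math.log(R)
    h = (b - a) / n
    total = 0.0
    for j in range(n + 1):
        s = a + j * h
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += w * g(math.exp(s)) * math.exp(-alpha * s)
    middle = total * h / 3
    # (R, infinity): only the -2 f(x) part of g survives, by assumption.
    tail = -2.0 * fx * R ** (-alpha) / alpha
    return -c1(alpha) * (near + middle + tail)


gauss = lambda t: math.exp(-t * t / 2)
for alpha in (0.5, 1.0, 1.5):
    exact = 2 ** (alpha / 2) * math.gamma((1 + alpha) / 2) / math.sqrt(math.pi)
    print(alpha, frac_laplacian_1d(gauss, 0.0, alpha), exact)
```

The logarithmic substitution keeps the quadrature nodes dense near $z=0$, where the kernel $z^{-1-\alpha}$ is steep, while still reaching large $z$ cheaply; for heavy-tailed $f$ such as the densities considered in this paper, `R` would need to be taken much larger for the neglected tail to be small.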

Lemma 2.11.

Let $V\in C^2(\mathbb{R})$ be a symmetric function on $\mathbb{R}$ such that $e^{-V}\in L^1(\mathbb{R};dx)\cap C_b^2(\mathbb{R})$ and (2.12) holds. If (2.19) is satisfied, then there exist constants $c_3>0$ and $r_0>0$ such that for all $x\in\mathbb{R}$ with $|x|\ge r_0$,
$$b(x)\ge-c_3\frac{e^{V(x)}}{|x|^\alpha}.$$
Proof.

The assertion follows from the claim that there exists a constant $c_4>0$ such that for $x>0$ large enough
$$-(-\Delta)^{\alpha/2}e^{-V}(x)\le\frac{c_4}{x^{1+\alpha}}.\tag{2.22}$$
For (2.22), one can follow the argument for (2.21). In particular, under (2.12) it holds that
$$\limsup_{x\to\infty}\big[x^3e^{-V(x)}(V'(x)^2-V''(x))\big]<\infty.\tag{2.23}$$
Then we can deduce that $I_1(x)\le c_5/x^{1+\alpha}$ by applying (2.23) instead of (2.20). The details are omitted here. ∎

Remark 2.12.

Under the assumptions of Lemma 2.10, for $|x|$ large enough,
$$xb(x)\asymp-\frac{e^{V(x)}}{(1+|x|)^{1+\alpha}}|x|^2.$$

3. Properties of the SDE with the fractional drift

In this section, we consider the following stochastic differential equation (SDE)
$$dX_t=b(X_t)\,dt+dZ_t,\tag{3.1}$$
where $(Z_t)_{t\ge0}$ is a symmetric (rotationally invariant) $\alpha$-stable process on $\mathbb{R}^d$ with $\alpha\in(0,2)$ and $d\ge1$, and $b(x)$ is defined by (1.4) when $d>2-\alpha$ and by (1.5) when $d\le2-\alpha$. Everywhere below, we assume that Assumption (A) is satisfied.

Suppose first that $d>2-\alpha$. According to Lemmas 2.2 and 2.4, the drift $b$ defined by (1.4) satisfies $b\in C^\beta(\mathbb{R}^d)$ with $\beta=2-\alpha$ when $\alpha\in(1,2)$, $\beta=1-\varepsilon$ for any $\varepsilon>0$ when $\alpha=1$, and $\beta=1$ when $\alpha\in(0,1)$ (in particular, for every $\alpha\in(0,2)$ we have $b\in C^\beta(\mathbb{R}^d)$ for some $\beta\in(0,1]$ with $\beta>1-\alpha/2$), and
$$\langle b(x),x\rangle\le K(1+|x|^2),\quad x\in\mathbb{R}^d,\tag{3.2}$$
for some constant $K>0$, where $C^\beta(\mathbb{R}^d)$ denotes the set of locally $\beta$-Hölder continuous functions from $\mathbb{R}^d$ to $\mathbb{R}^d$ for $\beta\in(0,1]$. Suppose now that $d\le2-\alpha$. Then, by Lemmas 2.9 and 2.10, the drift $b$ defined by (1.5) belongs to $C^1(\mathbb{R})$ and satisfies (3.2) as well. Here we used the fact that (2.19) holds under condition (1.6), so that under Assumption (A) all the conditions required in Lemmas 2.9 and 2.10 are satisfied. Consequently, for all $d\ge1$ and $\alpha\in(0,2)$, equation (3.1) has a unique non-explosive strong solution $(X_t)_{t\ge0}$, which is a strong Markov process with the generator
$$Lf(x)=-(-\Delta)^{\alpha/2}f(x)+\langle b(x),\nabla f(x)\rangle,\quad f\in C_b^2(\mathbb{R}^d).$$
For the case $d>2-\alpha$, the reader is referred to [43, Theorem 2.4 and Lemma 7.1], while for $d\le2-\alpha$ one can directly apply e.g. [25, Theorem 1.1], since $b\in C^1(\mathbb{R})$ implies that $b$ satisfies a local Lipschitz condition.
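To get a feel for (3.1) in $d=1$, one can simulate it with the Euler scheme $X_{k+1}=X_k+h\,b(X_k)+h^{1/\alpha}S_k$, where the $S_k$ are i.i.d. standard symmetric $\alpha$-stable variables with $\mathbb{E}e^{i\xi S_k}=e^{-|\xi|^\alpha}$. The Python sketch below generates them with the Chambers–Mallows–Stuck method and assumes that $Z$ is normalized so that its generator is $-(-\Delta)^{\alpha/2}$, in which case $Z_{t+h}-Z_t$ has the law of $h^{1/\alpha}S$. The drift is a user-supplied placeholder; the linear drift $b(x)=-x$ used below is hypothetical and is not the fractional drift (1.4) or (1.5), whose numerical evaluation is a separate matter. Step size, horizon and seeds are arbitrary. This is only an illustration of the kind of discretization used in Fractional Langevin Monte Carlo, not an algorithm analysed in this paper.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """Draw S with E[exp(i*xi*S)] = exp(-|xi|^alpha), alpha in (0, 2] (Chambers-Mallows-Stuck)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-300)  # guard against an exact zero
    if alpha == 1.0:
        return math.tan(u)  # standard Cauchy
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))


def euler_stable(drift, x0, alpha, h, n_steps, seed=0):
    """Euler scheme X_{k+1} = X_k + h*drift(X_k) + h^{1/alpha} S_k for dX = b(X) dt + dZ, d = 1."""
    rng = random.Random(seed)
    scale = h ** (1 / alpha)  # Z_{t+h} - Z_t has the law of h^{1/alpha} S
    x, path = x0, [x0]
    for _ in range(n_steps):
        x = x + h * drift(x) + scale * symmetric_stable(alpha, rng)
        path.append(x)
    return path


# Usage with a hypothetical linear drift b(x) = -x (not the drift of (1.4)/(1.5)).
path = euler_stable(lambda x: -x, x0=0.0, alpha=1.5, h=0.01, n_steps=10_000, seed=1)
print(len(path), path[-1])
```

For $\alpha=2$ the sampler returns $N(0,2)$ variables and the scheme reduces to an Euler discretization of $dX_t=-X_t\,dt+\sqrt2\,dB_t$, whose stationary law is $N(0,1)$; this gives a simple consistency check of the scaling $h^{1/\alpha}$.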
Alternatively, for any $d\ge1$ and $\alpha\in(0,2)$, we can first apply [33, Theorem 1.1] or [6, Corollary 1.4(i)] (with $b\in C_b^\beta(\mathbb{R}^d)$, i.e., with $b$ being globally $\beta$-Hölder continuous) to get a unique local strong solution, and then use the additional global one-sided linear growth condition (3.2) to obtain the unique non-explosive strong solution; see the proof of [16, Theorem 1] or [25, Theorem 1.1].

In the following, we prove rigorously that (1.1) is indeed the unique invariant measure of the process $(X_t)_{t\ge0}$ defined as the solution to (1.3) with the drift $b$ defined by (1.4) or (1.5). We begin with the following simple lemma.

Lemma 3.1.

Under Assumption (A), for any $\beta\in(0,\alpha)$, there are constants $C_1,C_2>0$ such that for all $x\in\mathbb{R}^d$,
$$L\mathcal{V}(x)\le C_1-C_2\frac{e^{V(x)}}{(1+|x|)^{d+\alpha}}\mathcal{V}(x),\tag{3.3}$$
where $\mathcal{V}(x)=(1+|x|^2)^{\beta/2}$.
Proof. According to Lemmas 2.4 and 2.10, under Assumption (A) there are constants $\lambda_1,\lambda_2>0$ such that for all $x\in\mathbb{R}^d$,
$$\langle x,b(x)\rangle\le\lambda_1-\lambda_2U(x)|x|^2,\tag{3.4}$$
where $U(x)=e^{V(x)}/(1+|x|)^{d+\alpha}$. Here, we used again the fact that (2.19) holds under condition (1.6).
Recall that $c_{d,\alpha}/|z|^{d+\alpha}$ is the density of the Lévy measure of the symmetric $\alpha$-stable process. Since $\nabla\mathcal{V}(x)=\beta(1+|x|^2)^{(\beta-2)/2}x$ and $\|\nabla^2\mathcal{V}\|_\infty\le\beta(2-\beta/2)$, we find that for all $x\in\mathbb{R}^d$ and $l\ge1$,
$$\begin{aligned}
L\mathcal{V}(x)&=\beta(1+|x|^2)^{(\beta-2)/2}\langle x,b(x)\rangle+\int_{\{|z|\le l\}}\big(\mathcal{V}(x+z)-\mathcal{V}(x)-\langle\nabla\mathcal{V}(x),z\rangle\big)\frac{c_{d,\alpha}}{|z|^{d+\alpha}}\,dz+\int_{\{|z|>l\}}\big(\mathcal{V}(x+z)-\mathcal{V}(x)\big)\frac{c_{d,\alpha}}{|z|^{d+\alpha}}\,dz\\
&\le\beta(1+|x|^2)^{(\beta-2)/2}\langle x,b(x)\rangle+\frac{\beta}{2}\Big(2-\frac{\beta}{2}\Big)c_{d,\alpha}\int_{\{|z|\le l\}}\frac{|z|^2}{|z|^{d+\alpha}}\,dz+\int_{\{|z|>l\}}\big[(1+2|x|^2)^{\beta/2}+(2|z|^2)^{\beta/2}\big]\frac{c_{d,\alpha}}{|z|^{d+\alpha}}\,dz\\
&\le\beta(1+|x|^2)^{(\beta-2)/2}\big(\lambda_1-\lambda_2U(x)|x|^2\big)+c_1l^{2-\alpha}+c_2l^{-\alpha}(1+2|x|^2)^{\beta/2}+c_3,
\end{aligned}$$
where the constants $c_i$ ($1\le i\le3$) are independent of $l$ and $x\in\mathbb{R}^d$. Here, in the equality above we used the fact that $\int_{\{1\le|z|\le l\}}z\frac{c_{d,\alpha}}{|z|^{d+\alpha}}\,dz=0$; the first inequality follows from the mean value theorem and the fact that $\mathcal{V}(x+z)\le(1+2|x|^2+2|z|^2)^{\beta/2}\le(1+2|x|^2)^{\beta/2}+(2|z|^2)^{\beta/2}$; and in the last inequality we used (3.4) and the facts that
$$\int_{\{|z|\le l\}}\frac{dz}{|z|^{d+\alpha-2}}\le c_4l^{2-\alpha},\qquad\int_{\{|z|>l\}}\frac{dz}{|z|^{d+\alpha}}\le c_5l^{-\alpha}\qquad\text{and}\qquad\int_{\{|z|\ge1\}}\frac{dz}{|z|^{d+\alpha-\beta}}<\infty,\quad\beta\in[0,\alpha).$$
From the right-hand side of the inequality above, we see that $L\mathcal{V}$ is locally bounded, and for $|x|$ large enough,
$$L\mathcal{V}(x)\le-\frac{\lambda_2\beta}{2}U(x)|x|^\beta+c_6l^{2-\alpha}+4c_2l^{-\alpha}|x|^\beta,$$
which is bounded from above by a negative multiple of $U(x)|x|^\beta$ by choosing $|x|\gg l\gg1$. Then (3.3) follows. ∎

We also need the following statement.

Lemma 3.2.

Let $(X_t)_{t\ge0}$ be the unique strong solution to the SDE (3.1) with $b$ defined by (1.4) when $d>2-\alpha$ and by (1.5) when $d\le2-\alpha$, and suppose that Assumption (A) is satisfied. Then:
(i) the process $(X_t)_{t\ge0}$ is strong Feller and Lebesgue irreducible;
(ii) the transition probability function of the process $(X_t)_{t\ge0}$ is absolutely continuous with respect to the Lebesgue measure.
In particular, the process has a unique invariant probability measure $\mu(dx)=\rho(x)\,dx$, where $\rho(x)>0$ for all $x\in\mathbb{R}^d$.
Proof. For simplicity, we only consider the case $d>2-\alpha$, since the case $d\le2-\alpha$ can be proved similarly and more easily.
(i) For any $n\ge1$, let
$$b_n(x)=-C_{d,2-\alpha}e^{V(x)\wedge K(n)}\int_{\mathbb{R}^d}\frac{e^{-V(y)}\nabla V(y)}{|x-y|^{d-(2-\alpha)}}\,dy,$$
where $K(n)=1+\sup_{|x|\le n}|V(x)|$. Then, according to the proof of Lemma 2.2, the function $x\mapsto\int_{\mathbb{R}^d}\frac{e^{-V(y)}\nabla V(y)}{|x-y|^{d-(2-\alpha)}}\,dy$ is bounded, and it is globally $(2-\alpha)$-Hölder continuous when $\alpha\in(1,2)$, globally $(1-\varepsilon)$-Hölder continuous for any $\varepsilon>0$ when $\alpha=1$, and belongs to $C^1(\mathbb{R}^d)$ when $\alpha\in(0,1)$; hence $b_n$ also shares these properties. Consider the SDE
$$dX^{(n)}_t=b_n(X^{(n)}_t)\,dt+dZ_t.\tag{3.5}$$
It follows from [33, Theorem 1.1] or [6, Corollary 1.4(i)] that the SDE (3.5) has a unique strong solution, which will be denoted by $X^{(n)}:=(X^{(n)}_t)_{t\ge0}$. Note that the infinitesimal generator of the process $X^{(n)}$ is given by
$$L^{(n)}f(x)=\langle b_n(x),\nabla f(x)\rangle-(-\Delta)^{\alpha/2}f(x),\quad f\in C_b^2(\mathbb{R}^d).$$
Hence, according to [7, Theorem 1.5] for $\alpha\in(1,2)$, [42, Theorem 1.1] for $\alpha=1$ and [19, Theorem 2.2] for $\alpha\in(0,1)$, the process $X^{(n)}$ has a continuous and strictly positive transition density function, which implies that $X^{(n)}$ is strong Feller (i.e., for any $f\in B_b(\mathbb{R}^d)$ and $t>0$, $x\mapsto P^{(n)}_tf(x):=\mathbb{E}_xf(X^{(n)}_t)$ is continuous) and Lebesgue irreducible (i.e., for any $t>0$ and any open set $O\in\mathscr{B}(\mathbb{R}^d)$ with $\mathrm{Leb}(O)>0$, $\mathbb{P}_x(X^{(n)}_t\in O)>0$). Here and in what follows, we assume that $X$ and $X^{(n)}$ are defined on the same probability space $(\Omega,\mathscr{F},\mathbb{P})$, and we write $\mathbb{P}_x(\cdot)=\mathbb{P}(\cdot\,|X_0=x)$ or $\mathbb{P}_x(\cdot)=\mathbb{P}(\cdot\,|X^{(n)}_0=x)$ when no confusion can arise. Since $b_n(x)=b(x)$ for all $|x|\le n$, the law of $X_{t\wedge\tau_n}$ is the same as the law of $X^{(n)}_{t\wedge\tau_n}$ for any $t>0$, where $\tau_n:=\inf\{t>0:|X_t|\ge n\}$.

Now, let $(P_t)_{t\ge0}$ be the semigroup of the process $X$. For any $f\in B_b(\mathbb{R}^d)$, $x\in\mathbb{R}^d$ and any sequence $\{x_k\}_{k\ge1}\subseteq\mathbb{R}^d$ such that $x_k\to x$ as $k\to\infty$, set $x_0:=x$, choose $n$ large enough so that $\{x_k\}_{k\ge0}\subset B(0,n)$, and then find that
$$\begin{aligned}
|P_tf(x_k)-P_tf(x)|&=|\mathbb{E}_{x_k}f(X_t)-\mathbb{E}_xf(X_t)|\\
&\le\big|\mathbb{E}_{x_k}\big(f(X_t)\mathbf{1}_{\{t<\tau_n\}}\big)-\mathbb{E}_x\big(f(X_t)\mathbf{1}_{\{t<\tau_n\}}\big)\big|+\|f\|_\infty\big(\mathbb{P}_{x_k}(\tau_n\le t)+\mathbb{P}_x(\tau_n\le t)\big)\\
&=\big|\mathbb{E}_{x_k}\big(f(X^{(n)}_t)\mathbf{1}_{\{t<\tau_n\}}\big)-\mathbb{E}_x\big(f(X^{(n)}_t)\mathbf{1}_{\{t<\tau_n\}}\big)\big|+\|f\|_\infty\big(\mathbb{P}_{x_k}(\tau_n\le t)+\mathbb{P}_x(\tau_n\le t)\big)\\
&\le\big|\mathbb{E}_{x_k}f(X^{(n)}_t)-\mathbb{E}_xf(X^{(n)}_t)\big|+2\|f\|_\infty\big(\mathbb{P}_{x_k}(\tau_n\le t)+\mathbb{P}_x(\tau_n\le t)\big)\\
&\le\big|P^{(n)}_tf(x_k)-P^{(n)}_tf(x)\big|+4\|f\|_\infty\sup_{k\ge0}\mathbb{P}_{x_k}(\tau_n\le t).
\end{aligned}\tag{3.6}$$
Note that, combining Lemma 3.1 with the standard argument (see, for example, the proof of [28, Theorem 2.1]), we can see that for any $k\ge0$ and $t>0$,
$$\mathbb{P}_{x_k}(\tau_n\le t)=\mathbb{P}_{x_k}\Big(\max_{s\in[0,t]}|X_s|\ge n\Big)=\mathbb{P}_{x_k}\Big(\max_{s\in[0,t]}(1+|X_s|^2)^{\beta/2}\ge(1+n^2)^{\beta/2}\Big)\le\frac{c_1(1+|x_k|^2)^{\beta/2}}{(1+n^2)^{\beta/2}}.$$
Since $x_k\to x$ as $k\to\infty$, without loss of generality we may and will assume that $x_k\in B(x,1)$ for all $k$. Hence $\lim_{n\to\infty}\sup_{k\ge0}\mathbb{P}_{x_k}(\tau_n\le t)=0$. Letting $k\to\infty$ and then $n\to\infty$ in (3.6), we obtain $\lim_{k\to\infty}|P_tf(x_k)-P_tf(x)|=0$. Hence, for any $f\in B_b(\mathbb{R}^d)$ and $t>0$, $P_tf$ is a continuous function, i.e., the process $X$ is strong Feller.

For any $x\in\mathbb{R}^d$, $t>0$ and open set $O\in\mathscr{B}(\mathbb{R}^d)$ with $\mathrm{Leb}(O)>0$, choosing $n$ large enough such that $\mathrm{Leb}(O\cap B(0,n))>0$, we have
$$\mathbb{P}_x(X_t\in O)\ge\mathbb{P}_x(X_t\in O,\tau_n>t)=\mathbb{P}_x\big(X^{(n)}_t\in O\cap B(0,n),\tau_n>t\big).$$
According to (the proof of) [5, Corollary 3.6], the Dirichlet heat kernel of the process $X^{(n)}$ is positive everywhere, and so the right-hand side of the inequality above is positive. (Even though the setting of [5] is restricted to $d\ge2$, the proof of [5, Corollary 3.6] is based on the global heat kernel estimates and the Lévy system for $X^{(n)}$, both of which are available for $d=1$ too, and so [5, Corollary 3.6] holds true for all $d\ge1$.) Hence $\mathbb{P}_x(X_t\in O)>0$, and thus the process $X$ is Lebesgue irreducible. Therefore, all compact sets are petite for $X$ (cf. [27, Theorem 4.1(i)]), and hence the existence of the invariant probability measure $\mu$ follows from (3.3), while the uniqueness is a direct consequence of the strong Feller property and irreducibility; see [28, Theorems 5.1 and 5.2].

(ii) As we already established in the first part of the proof, according to [7, Theorem 1.5], for any $t>0$ the law of $X^{(n)}_t$ is absolutely continuous with respect to the Lebesgue measure. We claim that the law of $X_t$ is also absolutely continuous with respect to the Lebesgue measure. Indeed, for any set $O\in\mathscr{B}(\mathbb{R}^d)$ with $\mathrm{Leb}(O)=0$, any $t>0$, $x\in\mathbb{R}^d$ and $n$ large enough,
$$\begin{aligned}
\mathbb{P}_x(X_t\in O)&=\mathbb{P}_x(X_t\in O,\tau_n>t)+\mathbb{P}_x(X_t\in O,\tau_n\le t)\\
&=\mathbb{P}_x(X^{(n)}_t\in O,\tau_n>t)+\mathbb{P}_x(X_t\in O,\tau_n\le t)\\
&\le\mathbb{P}_x(X^{(n)}_t\in O)+2\mathbb{P}_x(\tau_n\le t)=2\mathbb{P}_x(\tau_n\le t).
\end{aligned}$$
As mentioned above, for any $x\in\mathbb{R}^d$ and $t>0$, $\mathbb{P}_x(\tau_n\le t)\to0$ as $n\to\infty$. Hence $\mathbb{P}_x(X_t\in O)=0$ for any $x\in\mathbb{R}^d$ and $t>0$.
Let $P(t,x,\cdot)$ be the transition function of the process $X$. By the argument for the Lebesgue irreducibility above, we know that $P(t,x,\cdot)$ and the Lebesgue measure are equivalent, so that $P(t,x,A)=\int_Ap(t,x,y)\,dy$ for any $A\in\mathscr{B}(\mathbb{R}^d)$, and $p(t,x,y)$ can be chosen to be strictly positive everywhere on $\mathbb{R}^d\times\mathbb{R}^d$ for any fixed $t>0$. Hence, for the invariant probability measure $\mu$, since $\mu(A)=\int_{\mathbb{R}^d}P(t,x,A)\,\mu(dx)$ for $A\in\mathscr{B}(\mathbb{R}^d)$ and $t>0$, $\mu$ is also absolutely continuous with respect to the Lebesgue measure, and the associated density function can be chosen to be strictly positive everywhere. ∎

Proposition 3.3.

Let $X:=(X_t)_{t\ge0}$ be the unique strong solution to the SDE (1.3) with $b$ defined by (1.4) when $d>2-\alpha$ and by (1.5) when $d\le2-\alpha$, and suppose that Assumption (A) is satisfied. Then $\mu(dx):=Z^{-1}e^{-V(x)}\,dx$ with $Z=\int_{\mathbb{R}^d}e^{-V(x)}\,dx$ is the unique invariant probability measure of the process $X$.
Proof. Recall that the infinitesimal generator of the process $(X_t)_{t\ge0}$ is given by $Lf(x)=-(-\Delta)^{\alpha/2}f(x)+\langle b(x),\nabla f(x)\rangle$. Let $D(L)$ be the domain of this operator with respect to the norm $\|\cdot\|_\infty$. If $\mu$ is an invariant measure for $(P_t)_{t\ge0}$, then for any $f\in D(L)$,
$$\mu(Lf)=\mu\Big(\lim_{t\to0}\frac{P_tf-f}{t}\Big)=\lim_{t\to0}\frac{\mu(P_tf)-\mu(f)}{t}=0.\tag{3.7}$$
In fact, (3.7) is equivalent to $\mu$ being an invariant probability measure of the process $X$, and this remains true if $D(L)$ is replaced by a core; see e.g. [23, Theorem 3.37].
According to [7, Theorem 1.5], $C_b^2(\mathbb{R}^d)$ is contained in the domain of the infinitesimal generator of the process $X^{(n)}$ given by the SDE (3.5). Then, by the localization argument used in the proof of the strong Feller property above, we can check that $C_c^\infty(\mathbb{R}^d)\subset D(L)$. In the following, we take $\mu(dx):=Z^{-1}e^{-V(x)}\,dx$ with $Z=\int_{\mathbb{R}^d}e^{-V(x)}\,dx$, and verify that $\mu(Lf)=0$ for any $f\in C_c^\infty(\mathbb{R}^d)$.
Let us first suppose that $d>2-\alpha$. Then, for $b$ defined by (1.4) and for any $f\in C_c^\infty(\mathbb{R}^d)$,
$$\begin{aligned}
\int_{\mathbb{R}^d}Lf(x)e^{-V(x)}\,dx&=-\int_{\mathbb{R}^d}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx+\int_{\mathbb{R}^d}e^{-V(x)}\langle\nabla f(x),b(x)\rangle\,dx\\
&=-\int_{\mathbb{R}^d}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx+C_{d,2-\alpha}\int_{\mathbb{R}^d}\Big\langle\nabla f(x),\nabla\Big[\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|\cdot-y|^{d-(2-\alpha)}}\,dy\Big](x)\Big\rangle\,dx.
\end{aligned}\tag{3.8}$$
On the other hand, integrating by parts, we find that for any $f\in C_c^\infty(\mathbb{R}^d)$,
$$\begin{aligned}
C_{d,2-\alpha}\int_{\mathbb{R}^d}\Big\langle\nabla f(x),\nabla\Big[\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|\cdot-y|^{d-(2-\alpha)}}\,dy\Big](x)\Big\rangle\,dx&=C_{d,2-\alpha}\int_{\mathbb{R}^d}(-\Delta)f(x)\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|x-y|^{d-(2-\alpha)}}\,dy\,dx\\
&=C_{d,2-\alpha}\int_{\mathbb{R}^d}(-\Delta)^{1-\alpha/2}\big[(-\Delta)^{\alpha/2}f\big](x)\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|x-y|^{d-(2-\alpha)}}\,dy\,dx\\
&=C_{d,2-\alpha}\int_{\mathbb{R}^d}(-\Delta)^{\alpha/2}f(x)\cdot(-\Delta)^{1-\alpha/2}\Big[\int_{\mathbb{R}^d}\frac{e^{-V(y)}}{|\cdot-y|^{d-(2-\alpha)}}\,dy\Big](x)\,dx\\
&=\int_{\mathbb{R}^d}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx,
\end{aligned}$$
where in the second equality we used the fact that $(-\Delta)=(-\Delta)^{1-\alpha/2}(-\Delta)^{\alpha/2}$ (which can be checked by standard Fourier analysis), the third equality follows from the symmetry of $(-\Delta)^{1-\alpha/2}$ on $L^2(\mathbb{R}^d;dx)$, and in the fourth equality we used the fact that $C_{d,2-\alpha}/|x-y|^{d-(2-\alpha)}$ is the Green function of the symmetric $(2-\alpha)$-stable process, and hence for all $x\in\mathbb{R}^d$,
$$(-\Delta)^{1-\alpha/2}\Big[\int_{\mathbb{R}^d}\frac{C_{d,2-\alpha}e^{-V(y)}}{|\cdot-y|^{d-(2-\alpha)}}\,dy\Big](x)=e^{-V(x)},$$
cf. [21, Proposition 7.2]. The equality above together with (3.8) yields $\int_{\mathbb{R}^d}Lf(x)e^{-V(x)}\,dx=0$, and so the desired assertion follows.
Now, we consider the case $d\le2-\alpha$, i.e., $d=1$ and $\alpha\in(0,1]$. For $b$ defined by (1.5), using (2.15), we have for any $f\in C_c^\infty(\mathbb{R})$,
$$\begin{aligned}
\int_{\mathbb{R}}Lf(x)e^{-V(x)}\,dx&=-\int_{\mathbb{R}}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx+\int_0^\infty f'(x)\int_x^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz\,dx-\int_{-\infty}^0f'(x)\int_{-x}^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz\,dx\\
&=-\int_{\mathbb{R}}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx+\int_0^\infty f(x)(-\Delta)^{\alpha/2}e^{-V}(x)\,dx+\int_{-\infty}^0f(x)(-\Delta)^{\alpha/2}e^{-V}(-x)\,dx\\
&=-\int_{\mathbb{R}}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx+\int_{\mathbb{R}}e^{-V(x)}(-\Delta)^{\alpha/2}f(x)\,dx=0,
\end{aligned}$$
where in the second equality we integrated by parts and used the fact that $\int_0^\infty(-\Delta)^{\alpha/2}e^{-V}(z)\,dz=0$ (cf. (2.18)), and the third equality follows from the fact that $(-\Delta)^{\alpha/2}e^{-V}(-x)=(-\Delta)^{\alpha/2}e^{-V}(x)$ for all $x\le0$, due to the symmetry of $V$, together with the symmetry of $(-\Delta)^{\alpha/2}$ on $L^2(\mathbb{R};dx)$.
Therefore, according to both conclusions above and Lemma 3.2, we have proved that $\mu(dx):=Z^{-1}e^{-V(x)}\,dx$ is the unique invariant probability measure of the process $X$. ∎

Remark 3.4.

When $d>2-\alpha$, by some elementary calculations, the dual of the operator $L$ on $L^2(\mathbb{R}^d;dx)$ is given by
$$L^*f(x)=-(-\Delta)^{\alpha/2}f(x)-\langle b(x),\nabla f(x)\rangle-\mathrm{div}\,b(x)\,f(x)=-(-\Delta)^{\alpha/2}f(x)-\mathrm{div}(bf)(x).$$
Arguing informally, we have
$$\mathrm{div}(be^{-V})(x)=\mathrm{div}\big[\nabla(-\Delta)^{-(1-\alpha/2)}e^{-V}\big](x)=\big[\Delta(-\Delta)^{-(1-\alpha/2)}e^{-V}\big](x)=-(-\Delta)(-\Delta)^{-(1-\alpha/2)}e^{-V}(x)=-(-\Delta)^{\alpha/2}e^{-V}(x),$$
and so, by (2.1), $L^*e^{-V}(x)=0$ for $x\in\mathbb{R}^d$, which would imply the infinitesimal invariance of $\mu$ given by (1.1) for the process $(X_t)_{t\ge0}$ defined by (1.3); cf. the proof of [36, Theorem 1.1]. However, since we do not know whether $\nabla(-\Delta)^{-(1-\alpha/2)}e^{-V}$ belongs to $C^1(\mathbb{R}^d)$ when $\alpha\in(1,2)$ (cf. Remark 2.3), $\mathrm{div}\,\nabla(-\Delta)^{-(1-\alpha/2)}e^{-V}$ may not be well defined. Hence the argument above is only informal, and in order to prove rigorously that $\mu$ is the unique invariant measure of $(X_t)_{t\ge0}$, it is necessary to argue as in the proof of Proposition 3.3. Using Lemma 3.2 and Proposition 3.3, we can now easily prove Theorem 1.1.
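The Green-function identity used in the proof of Proposition 3.3 and in Remark 3.4 is the Riesz potential representation $(-\Delta)^{-s/2}g(x)=C_{d,s}\int_{\mathbb{R}^d}g(y)|x-y|^{-(d-s)}\,dy$ for $0<s<d$, here with $s=2-\alpha$. Assuming the standard normalization (Fourier symbol $|\xi|^{s}$), the constant is the classical $C_{d,s}=\Gamma((d-s)/2)/(2^s\pi^{d/2}\Gamma(s/2))$; that the paper's $C_{d,2-\alpha}$ is normalized exactly this way is an assumption of this sketch. The short Python snippet below evaluates it and compares with two textbook special cases, the Newtonian kernel $1/(4\pi|x|)$ in $d=3$ and the kernel $1/(2\pi|x|)$ of $(-\Delta)^{-1/2}$ in $d=2$.

```python
import math


def riesz_constant(d, s):
    """C_{d,s} in (-Delta)^{-s/2} g(x) = C_{d,s} * int g(y) |x - y|^{-(d - s)} dy, for 0 < s < d."""
    if not 0 < s < d:
        raise ValueError("need 0 < s < d")
    return math.gamma((d - s) / 2) / (2 ** s * math.pi ** (d / 2) * math.gamma(s / 2))


print(riesz_constant(3, 2), 1 / (4 * math.pi))  # Newtonian potential in R^3
print(riesz_constant(2, 1), 1 / (2 * math.pi))  # (-Delta)^{-1/2} in R^2
# Hypothetical usage for the drift constant C_{d,2-alpha} with d = 1, alpha = 1.5 (so d > 2 - alpha).
print(riesz_constant(1, 2 - 1.5))
```

The guard `0 < s < d` reflects the condition $d>2-\alpha$ under which the kernel $|x-y|^{-(d-(2-\alpha))}$ in (1.4) is locally integrable and decays at infinity; for $d\le2-\alpha$ the paper instead uses the one-dimensional formula (1.5).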

Proof of Theorem 1.1. From Lemma 3.2, we know that the process $X:=(X_t)_{t\ge0}$ obtained as the unique solution to the SDE (3.1) is strong Feller and irreducible. Hence, due to [27, Theorem 4.1(i)], all compact sets are petite for $X$. Moreover, according to Lemma 3.1, we have the Lyapunov condition (3.3). As a consequence, [28, Theorem 6.1] applies, and so there is a constant $\lambda>0$ such that for any $x\in\mathbb{R}^d$ and $t>0$,
$$\|P(t,x,\cdot)-\mu\|_{\mathrm{Var},\mathcal{V}}\le C(x)\mathcal{V}(x)e^{-\lambda t},$$
where $\mathcal{V}(x)=(1+|x|^2)^{\beta/2}$ with $\beta\in(0,\alpha)$, $C(x)$ is a non-negative and locally bounded function on $\mathbb{R}^d$, and $\mu$ is the unique invariant probability measure of $X$. Finally, from Proposition 3.3 we know that $\mu$ is given by (1.1), and the proof is concluded. ∎

Acknowledgement.

Mateusz B. Majka would like to thank Aleksandar Mijatović for discussions regarding Fractional Langevin Monte Carlo, and Jian Wang would like to thank Professor Renming Song and Dr. Longjie Xie for helpful comments on heat kernel estimates for SDEs with Lévy jumps. The research of Lu-Jing Huang is supported by the National Natural Science Foundation of China (No. 11901096). A part of this work was completed while Mateusz B. Majka was affiliated to the University of Warwick and supported by the EPSRC grant no. EP/P003818/1. The research of Jian Wang is supported by the National Natural Science Foundation of China (No. 11831014), the Program for Probability and Statistics: Theory and Application (No. IRTL1704), and the Program for Innovative Research Team in Science and Technology in Fujian Province University (IRTSTFJ).

References
[1] Bakry, D., Gentil, I. and Ledoux, M.: Analysis and Geometry of Markov Diffusion Operators, Grundlehren der Mathematischen Wissenschaften, Springer, Cham, 2014.
[2] Bogdan, K., Byczkowski, T., Kulczycki, T., Ryznar, M., Song, R. and Vondraček, Z.: Potential Analysis of Stable Processes and its Extensions, Lecture Notes in Mathematics, Springer-Verlag, Berlin, 2009.
[3] Bucur, C. and Valdinoci, E.: Nonlocal Diffusion and Applications, Lecture Notes of the Unione Matematica Italiana, Springer, Bologna, 2016.
[4] Chen, M.-F.: Eigenvalues, Inequalities, and Ergodic Theory, Springer-Verlag, London, 2005.
[5] Chen, Z.-Q., Kim, P. and Song, R.: Dirichlet heat kernel estimates for fractional Laplacian with gradient perturbation, Ann. Probab. (2012), 2483–2538.
[6] Chen, Z.-Q., Song, R. and Zhang, X.: Stochastic flows for Lévy processes with Hölder drifts, Revista Matemática Iberoamericana (2018), 1755–1788.
[7] Chen, Z.-Q. and Zhang, X.: Heat kernels for time-dependent non-symmetric stable-like operators, J. Math. Anal. Appl. (2018), 1–21.
[8] Cheng, X., Chatterji, N.S., Abbasi-Yadkori, Y., Bartlett, P.L. and Jordan, M.I.: Sharp convergence rates for Langevin dynamics in the nonconvex setting, arXiv:1805.01648.
[9] Dalalyan, A.S.: Theoretical guarantees for approximate sampling from smooth and log-concave densities, J. R. Stat. Soc. Ser. B Stat. Methodol. (2017), 651–676.
[10] Dareiotis, K., Kumar, C. and Sabanis, S.: On tamed Euler approximations of SDEs driven by Lévy noise with applications to delay equations, SIAM J. Numer. Anal. (2016), 1840–1872.
[11] Durmus, A. and Moulines, E.: Nonasymptotic convergence analysis for the unadjusted Langevin algorithm, Ann. Appl. Probab. (2017), 1551–1587.
[12] Eberle, A.: Reflection couplings and contraction rates for diffusions, Probab. Theory Related Fields (2016), 851–886.
[13] Eberle, A., Guillin, A. and Zimmer, R.: Quantitative Harris-type theorems for diffusions and McKean-Vlasov processes, Trans. Amer. Math. Soc. (2019), 7135–7173.
[14] Eberle, A. and Majka, M.B.: Quantitative contraction rates for Markov chains on general state spaces, Electron. J. Probab. (2019), Paper no. 26, 36 pages.
[15] Erdogdu, M.A. and Hosseinzadeh, R.: On the convergence of Langevin Monte Carlo: the interplay between tail growth and smoothness, arXiv:2005.13097.
[16] Gyöngy, I. and Krylov, N.V.: On stochastic equations with respect to semimartingales. I, Stochastics (1980/81), 1–21.
[17] Khasminskii, R.: Stochastic Stability of Differential Equations, 2nd ed., Stochastic Modelling and Applied Probability, Springer, Heidelberg, 2012.
[18] Kühn, F. and Schilling, R.L.: Strong convergence of the Euler–Maruyama approximation for a class of Lévy-driven SDEs, Stoch. Proc. Appl. (2019), 2654–2680.
[19] Kulik, A.M.: On weak uniqueness and distributional properties of a solution to an SDE with α-stable noise, Stoch. Proc. Appl. (2019), 473–506.
[20] Kumar, C. and Sabanis, S.: On explicit approximations for Lévy driven SDEs with super-linear diffusion coefficients, Electron. J. Probab. (2017), Paper no. 73, 19 pages.
[21] Kwaśnicki, M.: Ten equivalent definitions of the fractional Laplace operator, Fract. Calc. Appl. Anal. (2017), 7–51.
[22] Liang, M., Majka, M.B. and Wang, J.: Exponential ergodicity for SDEs and McKean-Vlasov processes with Lévy noise, arXiv:1901.11125.
[23] Liggett, T.M.: Continuous Time Markov Processes. An Introduction, Graduate Studies in Mathematics, Amer. Math. Soc., Providence, RI, 2010.
[24] Masuda, H.: On multidimensional Ornstein-Uhlenbeck processes driven by a general Lévy process, Bernoulli (2004), 97–120.
[25] Majka, M.B.: A note on existence of global solutions and invariant measures for jump SDEs with locally one-sided Lipschitz drift, Probab. Math. Statist. (2020), 37–55.
[26] Majka, M.B., Mijatović, A. and Szpruch, L.: Non-asymptotic bounds for sampling algorithms without log-concavity, Ann. Appl. Probab., in press, 2020.
[27] Meyn, S.P. and Tweedie, R.L.: Stability of Markovian processes II: Continuous-time processes and sampled chains, Adv. Appl. Probab. (1993), 487–517.
[28] Meyn, S.P. and Tweedie, R.L.: Stability of Markovian processes III: Foster–Lyapunov criteria for continuous time processes, Adv. Appl. Probab. (1993), 518–548.
[29] Mikulevičius, R. and Xu, F.: On the rate of convergence of strong Euler approximation for SDEs driven by Lévy processes, Stochastics (2018), 569–604.
[30] Mou, W., Flammarion, N., Wainwright, M.J. and Bartlett, P.L.: Improved bounds for discretization of Langevin diffusions: near-optimal rates without convexity, arXiv:1907.11331.
[31] Nguyen, T.H., Şimşekli, U. and Richard, G.: Non-asymptotic analysis of Fractional Langevin Monte Carlo for non-convex optimization. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 4810–4819, Long Beach, California, USA, 09–15 Jun 2019. PMLR.
[32] Ortigueira, M.D.: Riesz potential operators and inverses via fractional centred derivatives, Int. J. Math. Math. Sci. (2006), 12 pages.
[33] Priola, E.: Pathwise uniqueness for singular SDEs driven by stable processes, Osaka Journal of Mathematics (2012), 421–447.
[34] Roberts, G.O. and Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations, Bernoulli (1996), 341–363.
[35] Schilling, R.L., Sztonyk, P. and Wang, J.: Coupling property and gradient estimates of Lévy processes via the symbol, Bernoulli (2012), 1128–1149.
[36] Şimşekli, U.: Fractional Langevin Monte Carlo: Exploring Lévy driven stochastic differential equations for Markov chain Monte Carlo. In Precup, D. and Teh, Y.W., editors, Proceedings of the International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 3200–3209, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.
[37] Şimşekli, U., Zhu, L., Teh, Y.W. and Gürbüzbalaban, M.: Fractional underdamped Langevin dynamics: retargeting SGD with momentum under heavy-tailed gradient noise, arXiv:2002.05685.
[38] Stein, E.M.:

Singular Integrals and Diﬀerentiability Properties of Functions , Princeton Univ. Press, 1970.[39] Triebel, H.:

Interpolation Theory, Function Spaces, Diﬀerential Operators , North-Holland Pub. Co, 1978.[40] Wang, F.-Y.:

Functional Inequalities, Markov Processes and Spectral Theory , Science Press, Beijing, 2005.[41] Wang, F.-Y. and Wang, J.: Functional inequalities for stable-like Dirichlet forms,

J. Theor. Probab. , (2015),423–448.[42] Xie, L. and Zhang, X.: Heat kernel estimates for critical fractional diﬀusion operator, Studia Math. , (2014),221–263.[43] Xie, L. and Zhang, X.: Ergodicity of stochastic diﬀerential equations with jumps and singular coeﬃcients, Ann.Inst. H. Poincaré Probab. Statist. , (2020), 175–229.[44] Ye, N. and Zhu, Z.: Stochastic fractional Hamiltonian Monte Carlo. In Lang, J. editor, Proceedings of the th In-ternational Joint Conference on Artiﬁcial Intelligenceth In-ternational Joint Conference on Artiﬁcial Intelligence