On weak uniqueness and distributional properties of a solution to an SDE with α-stable noise

Abstract

For an SDE driven by a rotationally invariant α-stable noise we prove weak uniqueness of the solution under the balance condition α + γ > 1, where γ denotes the Hölder index of the drift coefficient. We prove existence and continuity of the transition probability density of the corresponding Markov process and give a representation of this density with an explicitly given "principal part" and a "residual part" which possesses an upper bound. A similar representation is also provided for the derivative of the transition probability density w.r.t. the time variable.


Alexei Kulik ∗

arXiv [math.PR], Oct

Keywords:

SDE, martingale problem, transition probability density, parametrix method, approximate fundamental solution, approximate harmonic function.

MSC 2010:

Primary: 60J35. Secondary: 60J75, 35S05, 35S10, 47G30.

∗ Institute of Mathematics, NAS of Ukraine, 3, Tereshchenkivska str., 01601 Kiev, Ukraine.

In this paper we study the SDE

$$dX_t = b(X_t)\,dt + \sigma(X_{t-})\,dZ^{(\alpha)}_t, \tag{1.1}$$

driven by a symmetric $\alpha$-stable process $Z^{(\alpha)}$ in $\mathbb{R}^d$. If $\alpha = 2$, i.e. $Z^{(\alpha)}$ is a Brownian motion, it is well known that for (1.1) to have a unique weak solution it is sufficient that $b$ is measurable and locally bounded and $\sigma$ is continuous and non-degenerate; see [SV79], Chapter 7. In particular, it is "common knowledge" that an SDE with Hölder continuous coefficients and a non-degenerate diffusion coefficient possesses a unique weak solution which, in addition, defines a time-homogeneous Markov process. Heuristically, this means that the Brownian noise has a regularizing effect: though the deterministic dynamics which corresponds to the drift part of (1.1) may fail to be well defined, adding a non-degenerate diffusion part makes the entire stochastic dynamics well determined. One can expect that the same "stochastic regularization" effect should also appear in systems with more general Lévy noises. However, even in the particular (but important) case of an SDE driven by an $\alpha$-stable noise, substantially new phenomena may appear. If $\alpha < 1$ and the Hölder index $\gamma$ of the drift coefficient $b(x)$ is positive but small, equation (1.1) may even fail to possess the weak uniqueness property. A natural example dates back to the paper [TTW74], where in the second part of Theorem 3.2 it is shown that for a one-dimensional SDE (1.1) with $\sigma(x) \equiv 1$, $b(x) = |x|^{\gamma}\,\mathrm{sign}\,x$, the minimal and the maximal (weak) solutions to (1.1) are different as soon as

$$\alpha + \gamma < 1. \tag{1.2}$$

This example illustrates well the fact that an SDE driven by an $\alpha$-stable process requires some specific tools, when compared with the diffusive one, to provide its solvability and to analyze the properties of the solution.
The first step in this direction was made in [TTW74], where the first part of Theorem 3.2 states the weak uniqueness for a one-dimensional SDE (1.1) with $\sigma \equiv 1$ and a $\gamma$-Hölder continuous non-decreasing drift coefficient under the additional condition

$$\alpha + \gamma > 1, \tag{1.3}$$

which in what follows we call the balance condition. The method of proof in [TTW74] relies strongly on the fact that $X$ is a gradient perturbation of $Z^{(\alpha)}$, and apparently cannot be applied in a general setting.

In this paper we propose a method which makes it possible both to prove the weak uniqueness of the solution to (1.1) and to analyze the properties of the transition probability density of the corresponding Markov process. Our standing assumptions are that the coefficient $\sigma$ is non-degenerate, $b, \sigma$ are Hölder continuous, and the balance condition (1.3) holds. We note that our weak uniqueness result is quite sharp, since the only "gap" between (1.2) and the balance condition (1.3) is the "critical case" $\alpha + \gamma = 1$. We postpone the study of the critical case to further research; our conjecture is that the weak uniqueness in this case still holds true. An important reference concerning the transition probability density is the recent paper [DF13], where under the balance condition and the assumption of the weak uniqueness of the solution it was shown that the solution to SDE (1.1) possesses a distribution density and this density belongs to a certain Besov space. The class of Lévy noises allowed in [DF13] is wider than our $\alpha$-stable one; on the other hand, in this particularly important case we give much more detailed information about the transition probability density, especially about its small time behavior. We also note that the weak uniqueness in the "super-critical" case $\alpha < 1$ is a non-trivial property which cannot be derived from the sufficient conditions available in the field so far; we postpone the detailed discussion to Section 2.3 below.

Let us explain the heuristics behind the balance condition.
First, we note that for $\alpha \ge 1$ this condition holds true for any Hölder continuous $b$, which makes this case similar to the diffusive one. This can be vaguely interpreted as follows: in this case, the stochastic part of the equation dominates the drift part. For $\alpha < 1$ such a domination fails, and the balance condition (1.3) reflects the necessity to balance the partial regularity of the drift against the regularization properties of the noise. Namely, if $b$ is $\gamma$-Hölder continuous, the ODE which corresponds to the deterministic part in (1.1) may fail to have the uniqueness property, but it is still possible to bound the distance at time $t$ between any two solutions to the Cauchy problem with the same initial condition; see Section 3.2 below. This bound has the form $Ct^{1/(1-\gamma)}$, and the natural scale for the $\alpha$-stable process is $t^{1/\alpha}$. The balance condition (1.3) is equivalent to $1/(1-\gamma) > 1/\alpha$; that is, $Ct^{1/(1-\gamma)} \ll t^{1/\alpha}$, $t \to 0$. Therefore (1.3) is essentially the condition for the noise to be "intensive enough" to make negligible the "analytical uncertainty" caused by the deterministic part of the equation.

Both our proof of the weak uniqueness and our estimates for the transition probability density are based on an analytical construction, which interprets the transition probability density as a fundamental solution for the parabolic Cauchy problem associated with the (formal) generator of the Markov process $X$ defined by (1.1), and provides this solution by means of a certain version of the parametrix method. Some part of this construction was developed recently in [KK14]. In the modification of the parametrix method for the super-critical stable SDEs, proposed in [KK14], Case C, the "zero order approximation" of the unknown fundamental solution combines the heat kernel for the stable part with the deterministic flow which corresponds to the drift term. The drift term was supposed therein to be Lipschitz continuous, hence the respective deterministic flow was well defined.
In the current paper we finalize this construction and develop the proper substitute for the deterministic flow term, which is well defined for a Hölder continuous drift coefficient and still makes the entire parametrix construction operational. We also clarify the structure of the law of the solution to (1.1) in small time. Namely, we show that the principal part of the probability density of $X_t$ with $X_0 = x$ can be chosen as the law of

$$\widetilde X_{t,x} = \upsilon_t(x) + \sigma(x) Z^{(\alpha)}_t, \tag{1.4}$$

where $\upsilon_t(x)$ denotes some solution to the Cauchy problem for the ODE which corresponds to the deterministic part of the initial equation.

The paper is organized as follows. In Section 2 we formulate the main results of the paper and discuss the related results available in the literature. Section 3 is a preliminary one for the proofs, and contains an outline of the parametrix method and constructions and estimates for approximate solutions to ODEs with Hölder continuous coefficients. In Section 4 the parametrix construction is specified, and the weak uniqueness of the solution to (1.1) and estimates for the transition probability density for this solution are proved. In Section 5 we evaluate the properties of the derivative $\partial_t p_t(x, y)$.

In this section we formulate the main results of the paper. The proofs are postponed to the rest of the paper.

Throughout the paper we use the following notation. By $C_\infty(\mathbb{R}^d)$ we denote the class of continuous functions vanishing at infinity; clearly, $C_\infty(\mathbb{R}^d)$ is a Banach space with respect to the sup-norm $\|\cdot\|_\infty$. By $C^k_\infty(\mathbb{R}^d)$, $k \ge 1$, (respectively, $C^k_b(\mathbb{R}^d)$) we denote the class of $k$ times continuously differentiable functions, vanishing at infinity (respectively, bounded) together with their derivatives. Where this does not cause misunderstanding we omit $\mathbb{R}^d$ in the above notation; e.g. we often write $C_\infty$ instead of $C_\infty(\mathbb{R}^d)$. As usual, $a \wedge b := \min(a, b)$, $a \vee b := \max(a, b)$. By $|\cdot|$ we denote both the modulus of a real number and the Euclidean norm of a vector. By $c$ and $C$ we denote positive constants, the value of which may vary from place to place. The relation $f \asymp g$ means that

$$cg \le f \le Cg.$$

By $\Gamma(\cdot)$ we denote the Euler Gamma-function. We write $L_x f(x, y)$ when we need to emphasize that an operator $L$ acts on a function $f(x, y)$ with respect to the variable $x$. We use the following notation for space and time-space convolutions of functions, respectively:

$$(f * g)_t(x, y) := \int_{\mathbb{R}^d} f_t(x, z)\, g_t(z, y)\,dz, \qquad (f \circledast g)_t(x, y) := \int_0^t \int_{\mathbb{R}^d} f_{t-s}(x, z)\, g_s(z, y)\,dz\,ds.$$

In what follows we specify the conditions on the objects involved in (1.1), which we assume to hold true throughout the entire paper. Since our aim is to explain the main results and the methodology of the proofs in the most transparent way, we do not strive for the most general conditions possible. At the end of this subsection, a list of possible extensions is briefly discussed.

Let $Z^{(\alpha)}$, $\alpha \in (0, 2)$, be a Lévy process in $\mathbb{R}^d$ with

$$E e^{i(\xi, Z^{(\alpha)}_t)} = e^{-t|\xi|^{\alpha}}, \quad \xi \in \mathbb{R}^d;$$

that is, a rotationally invariant $\alpha$-stable process. It is well known that the generator of the $C_\infty$-semigroup associated with $Z^{(\alpha)}$ is an extension of the operator

$$L^{(\alpha)} f(x) = \mathrm{P.V.} \int_{\mathbb{R}^d} \big( f(x + u) - f(x) \big) \frac{c_{\alpha,d}}{|u|^{d+\alpha}}\,du, \quad f \in C^2_\infty, \tag{2.1}$$

where $c_{\alpha,d}$ is a constant which we do not need to specify here. The operator $L^{(\alpha)}$ is also called a fractional Laplacian, and is denoted by $-(-\Delta)^{\alpha/2}$.

By $g^{(\alpha)}(x)$ we denote the distribution density of the variable $Z^{(\alpha)}_1$. Note that $L^{(\alpha)}$ is a homogeneous operator of order $\alpha$ and the process $Z^{(\alpha)}$ is self-similar: for any $c > 0$, the process

$$c^{-1/\alpha} Z^{(\alpha)}_{ct}, \quad t \ge 0,$$

has the same law as $Z^{(\alpha)}$. Consequently, the transition probability density of $Z^{(\alpha)}$ equals $t^{-d/\alpha} g^{(\alpha)}\big(t^{-1/\alpha}(y - x)\big)$.

The drift coefficient $b : \mathbb{R}^d \to \mathbb{R}^d$ is assumed to be bounded and Hölder continuous with the index $\gamma \in (0, 1]$:

$$|b(x)| \le C, \quad |b(x) - b(y)| \le C|x - y|^{\gamma}, \quad x, y \in \mathbb{R}^d. \tag{2.2}$$

The coefficient $\sigma$ is assumed to be scalar-valued. We denote by $a(x) = |\sigma(x)|^{\alpha}$ the jump intensity coefficient. We assume this coefficient to be bounded, separated from zero, and Hölder continuous with some index $\eta \in (0, 1]$, i.e.

$$c \le a(x) \le C, \quad |a(x) - a(y)| \le C|x - y|^{\eta}, \quad x, y \in \mathbb{R}^d. \tag{2.3}$$

Finally, consider the Cauchy problem for the ODE

$$d\upsilon_t = b(\upsilon_t)\,dt, \quad \upsilon_0 = x. \tag{2.4}$$

By the Peano theorem, this problem has a solution, but if $\gamma < 1$ such a solution may fail to be unique. Denote by $\Upsilon(x)$ the set of all such solutions.

As we have already mentioned, most of the above assumptions can be weakened. For the weak uniqueness result, the respective conditions on the coefficients need to hold only locally. The coefficient $\sigma$ can be taken matrix-valued, and instead of the rotationally invariant stable noise one can consider a symmetric stable noise such that its spectral measure has a density w.r.t. the surface measure on the unit sphere in $\mathbb{R}^d$, which is bounded and bounded away from 0. A thorough proof of such an extension would require an extended version of Proposition 4.1 below; we do not discuss these technical issues here, and plan to treat them in full generality in separate research.
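To make the non-uniqueness in (2.4) concrete, the following minimal sketch uses the one-dimensional drift $b(x) = |x|^{\gamma}\,\mathrm{sign}\,x$ from the [TTW74] example in the Introduction. This drift is unbounded, so it matches assumption (2.2) only locally; the function names are hypothetical and introduced here for illustration only. Starting from $x = 0$, both $\upsilon_t \equiv 0$ and $\upsilon_t = ((1-\gamma)t)^{1/(1-\gamma)}$ solve (2.4), and the code compares their separation with the stable scale $t^{1/\alpha}$.

```python
import math


def drift(x: float, gamma: float) -> float:
    """Model drift b(x) = |x|^gamma * sign(x) (illustrative, locally Hoelder)."""
    return math.copysign(abs(x) ** gamma, x)


def maximal_solution(t: float, gamma: float) -> float:
    """Nonzero solution of dv/dt = b(v), v(0) = 0, valid for gamma in (0, 1)."""
    return ((1.0 - gamma) * t) ** (1.0 / (1.0 - gamma))


def ode_residual(t: float, gamma: float, h: float = 1e-6) -> float:
    """Central-difference value of dv/dt - b(v) along the nonzero solution."""
    dv = (maximal_solution(t + h, gamma) - maximal_solution(t - h, gamma)) / (2.0 * h)
    return dv - drift(maximal_solution(t, gamma), gamma)


def spread_to_noise_ratio(t: float, alpha: float, gamma: float) -> float:
    """Separation of the two solutions divided by the stable scale t^(1/alpha)."""
    return maximal_solution(t, gamma) / t ** (1.0 / alpha)


if __name__ == "__main__":
    gamma = 0.5
    # The zero function is a solution because b(0) = 0.
    print("b(0) =", drift(0.0, gamma))
    # The nonzero function also satisfies the ODE (residual close to zero).
    print("residual at t=1:", ode_residual(1.0, gamma))
    # Under the balance condition the ratio shrinks as t -> 0; otherwise it grows.
    for alpha in (0.8, 0.4):
        ratios = [spread_to_noise_ratio(t, alpha, gamma) for t in (1e-2, 1e-4)]
        print(f"alpha={alpha}, alpha+gamma={alpha + gamma}: ratios {ratios}")
```

In this sketch the ratio behaves like $t^{1/(1-\gamma) - 1/\alpha}$ up to a constant, so it tends to zero as $t \to 0$ exactly when $\alpha + \gamma > 1$, in line with the heuristic explanation of the balance condition (1.3).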
In all the results stated below we assume the coefficients $b$, $a = |\sigma|^{\alpha}$ to satisfy (2.2), (2.3) and the balance condition (1.3) to hold true. Our first main result concerns the weak uniqueness of the solution to (1.1) and the basic properties of the corresponding Markov process.

Theorem 2.1.

Equation (1.1) possesses a unique weak solution $X_t$, $t \ge 0$, and this solution is a Markov process. This process is a Feller one; that is, it generates a strongly continuous semigroup $P_t$, $t \ge 0$, in $C_\infty$:

$$P_t f(x) = E_x f(X_t), \quad t \ge 0, \quad f \in C_\infty.$$

The generator $(A, D(A))$ of this semigroup is an extension of the operator $(L, C^2_\infty)$ defined by

$$Lf(x) = \big( b(x), \nabla f(x) \big) + a(x) L^{(\alpha)} f(x), \quad f \in C^2_\infty. \tag{2.5}$$

Our second main result concerns the properties of the transition probability density of the process $X$.

Theorem 2.2.

I. The Markov process $X$ has a transition probability density $p_t(x, y)$, i.e.

$$P_t f(x) = \int_{\mathbb{R}^d} p_t(x, y) f(y)\,dy, \quad f \in C_\infty, \quad t > 0. \tag{2.6}$$

This density is a continuous function of $(t, x, y) \in (0, \infty) \times \mathbb{R}^d \times \mathbb{R}^d$.

II. Fix a solution $\upsilon_\cdot(x) \in \Upsilon(x)$ of (2.4) and denote

$$\widetilde p_t(x, y) = \frac{1}{t^{d/\alpha} a^{d/\alpha}(x)}\, g^{(\alpha)}\left( \frac{y - \upsilon_t(x)}{t^{1/\alpha} a^{1/\alpha}(x)} \right). \tag{2.7}$$

Denote by $\widetilde r_t(x, y)$ the residue term in the decomposition

$$p_t(x, y) = \widetilde p_t(x, y) + \widetilde r_t(x, y), \tag{2.8}$$

and put $\delta = 1 - 1/\alpha + \gamma/\alpha$ (which is positive by the balance condition). Then for any $\chi \in (0, \alpha \wedge \eta)$ and $T > 0$, the following estimate for the residue term holds true:

$$|\widetilde r_t(x, y)| \le C \big( t^{\zeta} + |y - \upsilon_t(x)|^{\chi} \wedge 1 \big)\, \widetilde p_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d, \tag{2.9}$$

where

$$\zeta = \min\left\{ \delta, \chi, \frac{\chi}{\alpha} \right\}.$$

The transition probability density $p_t(x, y)$ itself possesses the following two-sided estimate: for any $T > 0$, there exist positive $c, C$ such that

$$c\, \widetilde p_t(x, y) \le p_t(x, y) \le C\, \widetilde p_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d. \tag{2.10}$$

The constants $c, C$ above do not depend on the choice of $\upsilon_\cdot(x)$; that is, the estimates (2.9) and (2.10) are uniform over the class $\Upsilon(x)$.

Remark 2.1. It follows directly from (2.9) and the formula for $\widetilde p_t(x, y)$ that

$$\int_{\mathbb{R}^d} |\widetilde r_t(x, y)|\,dy \le Ct^{\zeta}.$$

Hence the residue term in (2.8) is indeed negligible, and $\widetilde p_t(x, y)$ represents the principal part of $p_t(x, y)$ as $t \to 0$. Note that the residue term is negligible in the integral sense but not uniformly, since the right hand side term in (2.9) is comparable to $\widetilde p_t(x, y)$ when $|y - \upsilon_t(x)| \ge 1$.

Remark 2.2. The decomposition (2.8) will be obtained in two steps: first, by means of the parametrix method we will construct a similar representation with another principal part $p^0_t(x, y)$ (see (4.1) below); second, we will "fine tune" this representation in order to re-arrange the principal part.
The choice of the principal part $\widetilde p_t(x, \cdot)$ in (2.8) has the clear advantage that it has a good stochastic interpretation as the distribution density of $\widetilde X_{t,x}$ defined by (1.4).

On the other hand, the set of solutions $\Upsilon(x)$ is implicit and may have a complicated structure. Hence it might be useful for further applications to have a representation of $p_t(x, y)$ in a form similar to (2.8), but with $\upsilon_t(x)$ changed to a more explicit term. Here we briefly outline two possibilities to arrange such a representation. First, consider a sequence of Picard-type approximations

$$\upsilon_{0,t}(x) \equiv x, \quad \upsilon_{k,t}(x) = x + \int_0^t b(\upsilon_{k-1,s}(x))\,ds, \quad k \ge 1, \quad t \ge 0. \tag{2.11}$$

Though such a procedure now typically fails to give a successful approximation for a solution, it can still be used in a comprehensive representation of the transition probability density $p_t(x, y)$. Namely, denote

$$\rho_k = 1 + \gamma + \dots + \gamma^{k} - \frac{1}{\alpha}, \quad k \ge 0,$$

and assume that for a given $k$

$$\rho_k > 0. \tag{2.12}$$

We will show in Section 4.2 below that $p_t(x, y)$ has a representation similar to (2.8) with $\widetilde p_t(x, y)$ replaced by

$$\widehat p_t(x, y) = \frac{1}{t^{d/\alpha} a^{d/\alpha}(x)}\, g^{(\alpha)}\left( \frac{y - \upsilon_{k,t}(x)}{t^{1/\alpha} a^{1/\alpha}(x)} \right)$$

and the corresponding residue term $\widehat r_t(x, y)$ satisfying an analogue of (2.9) with $\widetilde p_t(x, y)$ changed to $\widehat p_t(x, y)$ and $\zeta$ changed to $\min\{\zeta, \rho_k/(1 - \gamma)\}$.

Note that for any $k \ge 0$ condition (2.12) is strictly stronger than the balance condition (1.3), and under (1.3) there exists $k$ such that (2.12) holds. We note that the heuristic explanation of the balance condition given in the Introduction corresponds well to condition (2.12): if (2.12) holds, then the "approximation error" for $\upsilon_{k,t}(x)$ is $Ct^{1/\alpha + \rho_k} \ll t^{1/\alpha}$, $t \to 0$ (see Section 3.2 below), hence the noise is "intensive enough" to negate this error.
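The relation between the balance condition (1.3) and condition (2.12) can be explored numerically. The sketch below assumes the form $\rho_k = 1 + \gamma + \dots + \gamma^k - 1/\alpha$, which is the one consistent with the error bound $Ct^{1/\alpha + \rho_k}$ in (3.29), and $\delta = 1 - 1/\alpha + \gamma/\alpha$ from Theorem 2.2; the function names are hypothetical.

```python
def balance_holds(alpha: float, gamma: float) -> bool:
    """Balance condition (1.3): alpha + gamma > 1."""
    return alpha + gamma > 1.0


def delta_exponent(alpha: float, gamma: float) -> float:
    """delta = 1 - 1/alpha + gamma/alpha; positive iff the balance condition holds."""
    return 1.0 - 1.0 / alpha + gamma / alpha


def rho(k: int, alpha: float, gamma: float) -> float:
    """rho_k = 1 + gamma + ... + gamma^k - 1/alpha (assumed form, see lead-in)."""
    return sum(gamma ** j for j in range(k + 1)) - 1.0 / alpha


def smallest_picard_order(alpha: float, gamma: float, k_max: int = 10_000) -> int:
    """Smallest k with rho_k > 0, i.e. the first Picard order satisfying (2.12)."""
    if not balance_holds(alpha, gamma):
        raise ValueError("balance condition alpha + gamma > 1 fails")
    for k in range(k_max + 1):
        if rho(k, alpha, gamma) > 0.0:
            return k
    raise RuntimeError("no admissible k found up to k_max")


if __name__ == "__main__":
    for alpha, gamma in [(1.5, 0.3), (0.8, 0.5), (0.6, 0.5)]:
        k = smallest_picard_order(alpha, gamma)
        print(f"alpha={alpha}, gamma={gamma}: delta={delta_exponent(alpha, gamma):.4f}, "
              f"smallest k with rho_k > 0 is {k}")
```

Since the partial sums $1 + \gamma + \dots + \gamma^k$ increase to $1/(1-\gamma)$, the search terminates precisely under the balance condition; the closer $\alpha + \gamma$ is to 1, the more Picard iterations are needed.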
We mention that conditions (2.12) with $k = 0$, $k = 1$ are exactly the assumptions imposed in [KK14], Case A and Case B, respectively.

Another possible option is to take an approximate solution $\upsilon^0_t(x)$ instead of $\upsilon_t(x)$, see Section 3.2 below; in this case $\zeta$ in the analogue of (2.9) remains unchanged.

In the last main result of this paper, we establish the properties of the derivative of the transition probability density w.r.t. the time variable. The aim of this theorem is two-fold. On the one hand, the estimates on the derivative $\partial_t p_t(x, y)$ have natural applications in stochastic approximation problems, e.g. [GL08], [KMN14], [GK14]. On the other hand, the proof of Theorem 2.3 below clarifies the methodology: we will see below that the particular choice of the "zero approximation term" to $p_t(x, y)$ in the parametrix method is a subtle question, which can be solved in various ways. Several such ways may be successful if we are only interested in the assertions of Theorem 2.1 and Theorem 2.2; see Section 4.4 below. Considering in addition the derivative $\partial_t$ actually shows that most of these choices are not suitable for the purposes of studying sensitivities of this density. Here we consider the sensitivity w.r.t. $t$ only, but we expect that similar methods can be applied to other types of sensitivities (e.g. w.r.t. $x, y$ or w.r.t. additional parameters involved in the coefficients). This is the subject of our forthcoming research.

Theorem 2.3.

The function $p_t(x, y)$ possesses a continuous derivative $\partial_t p_t(x, y)$ on the set $(0, \infty) \times \mathbb{R}^d \times \mathbb{R}^d$. Denote by $\widetilde R_t(x, y)$ the residue term in the representation

$$\partial_t p_t(x, y) = \partial_t \widetilde p_t(x, y) + \widetilde R_t(x, y), \tag{2.13}$$

where $\widetilde p_t(x, y)$ is given by (2.7), and denote $\alpha' = \min\{1, \alpha\}$. Then for any $T > 0$ and $\chi \in (0, \alpha \wedge \eta)$ the following estimates for the principal and the residue terms in (2.13) hold true:

$$|\partial_t \widetilde p_t(x, y)| \le Ct^{-1/\alpha'}\, \widetilde p_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d,$$

$$|\widetilde R_t(x, y)| \le Ct^{-1/\alpha'} \big( t^{\zeta} + |y - \upsilon_t(x)|^{\chi} \wedge 1 \big)\, \widetilde p_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d.$$

As a corollary, the derivative $\partial_t p_t(x, y)$ itself possesses the estimate

$$|\partial_t p_t(x, y)| \le Ct^{-1/\alpha'}\, \widetilde p_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d. \tag{2.14}$$

We mention that if the original equation does not contain the drift ($b(x) \equiv 0$) then in the above estimates $\alpha'$ can be changed to 1. We also remark that the statement of Theorem 2.3 is even stronger than the one formulated in Theorem 2.6 of [KK14], Case C, under the stronger assumption that $b$ is Lipschitz continuous. Such an improvement is a result of a well chosen "zero approximation term" in the parametrix construction.

The weak uniqueness problem for Lévy driven SDEs is closely related to the well-posedness of the martingale problem for integro-differential operators of a certain type. A large group of papers in that direction is available, e.g. [Ba88], [Ko84a], [Ko84b], [MP92a]–[MP12]; this list is far from being complete. The weak uniqueness results available to the author, in various forms, rely on a typical assumption which vaguely means that the "jump part" of the operator dominates the entire operator (which, from the analytical point of view, is a kind of sectorial condition). For the formal generator (2.5) with $\alpha < 1$ and non-zero drift, this condition fails and the SDE (1.1) in that case is far away from the domain where the available weak uniqueness results are applicable. Of course, if $\alpha \in (1, 2)$ the weak uniqueness follows easily from the available results, e.g.
[Ba88].

An interesting counterpart to our weak uniqueness result is contained in the recent preprint [CSZ15], where the strong uniqueness for an SDE of the form (1.1) with Hölder continuous drift coefficient is proved under the following assumption which looks similar to the balance condition (1.3): $\alpha + \gamma/2 > 1$. However, the results of [CSZ15] are not fully comparable with ours because therein $\sigma(x) \equiv 1$.

Our proof of the weak uniqueness is based on the parametrix construction of an (approximate) fundamental solution to a Cauchy problem for $\partial_t - L$. The argument is insensitive w.r.t. the structure of the model, and is actually based on the fact that, as soon as the parametrix construction is completed, one can construct a large family of approximate harmonic functions for the operators $\partial_t - L$, $\partial_t + L$; see Section 4.3. We feel that the idea behind this argument is close to the one developed in the diffusion setting in [BP09] and extended in [M11], [HM14], though the particular form of the argument is different. We mention also that using properly chosen approximate harmonic functions one can extend the classical argument based on the Positive Maximum Principle for $L$ in order to get the positivity and other properties of the semigroup $P_t$, $t \ge 0$; see Section 4 in [KK14]. In the current setting this "analytical" argument applies as well, but in order to explain all available possibilities we give another proof. This "probabilistic" proof is shorter but less explicit and requires more preliminaries about Markov processes being solutions to a martingale problem.

For the background of the parametrix construction in the classical diffusive setting, we refer to the monograph by Friedman [Fr64]; see also the original paper by E. Levi [Le1907] and the paper by W. Feller [Fe36]. This construction was extended to equations with pseudo-differential operators in [ED81], [Ko89] and [Ko00]; see also the reference list and an extensive overview in the monograph [EIK04].
The list of subsequent and related publications is large, and we cannot discuss it here in detail. Let us only mention three recent papers: [KM11], where the discrete-time analogue of the parametrix construction for the Euler scheme for stable-driven SDEs was developed; [CZ13], where two-sided estimates, more precise than those in [Ko89], were obtained; and [BK14], where the probabilistic interpretation of the parametrix construction and its application to Monte-Carlo simulation were developed.

In all the references listed above it is required that either the stability index $\alpha$ is $> 1$, or the gradient term is not involved in the equation: this is the same "sectorial type" assumption which was mentioned above. If $\alpha < 1$, because of the lack of domination of the "jump part" of the generator, the proper construction of the "zero approximation term" should involve an additional correction which corresponds to the drift; two versions of such a construction were proposed in [KK14], Case B and Case C. This effect is similar in nature to the one revealed in [M11], where a chain of equations is considered in which only the last equation contains the diffusive term, and which consequently corresponds to a degenerate diffusion. Because of the degeneracy, the diffusive term therein also lacks the domination property, and this motivates extra correction terms in the parametrix construction. We mention also the recent preprints [HM14], where a chain of equations driven by a stable process is studied in a similar manner, and [H15], where an SDE driven by a tempered $\alpha$-stable process with $\alpha < 1$ and possibly singular spectral measure is considered. In all these references the drift coefficient is assumed to be Lipschitz continuous, which makes the corresponding ODEs for the correcting terms well solvable.
The only exception is Case B in [KK14]; however, therein a condition stronger than the balance condition (1.3) is imposed on the Hölder index of $b$.

For brevity, we omit a discussion here and refer to [KK14] for other related topics: the heuristics for the choice of the zero-order term in the parametrix expansion, the related papers which concern SDEs with singular drift terms ([Po94], [PP95], [BJ07], [KS14]), the related results of [KS14] and [CW13] on weak uniqueness for gradient perturbations of stable generators, and the large group of results, focused on the construction of a semigroup for a Markov process with a given symbol rather than of the transition probability density $p_t(x, y)$, which relies on the symbolic calculus approach for the parametrix construction ([Ja94], [Ja96], [Ho98a], [Ho98b], [B05], [Bo08], [Ku81], [Ts74], [Iw77]).

In this section we outline the parametrix method, which is our key tool for proving the main statements given in Section 2.2. We also develop an auxiliary construction and derive some results about approximate solutions to ODEs with Hölder continuous coefficients, which will be used in the subsequent proofs.

If $X$ is a Markov process solution to (1.1), by the Itô formula one can naturally expect that its generator $(A, D(A))$ is an extension of $(L, C^2_\infty)$; the operator $L$ is defined in (2.5). Then one can try to seek the unknown transition probability density of $X$ assuming it is a fundamental solution to the parabolic Cauchy problem associated with $L$. Recall that a function $p_t(x, y)$ is said to be a fundamental solution to the Cauchy problem for an operator

$$\partial_t - L, \tag{3.1}$$

if for $t > 0$ it is differentiable in $t$, belongs to the domain of $L$ as a function of $x$, and satisfies

$$\big( \partial_t - L_x \big) p_t(x, y) = 0, \quad t > 0, \quad x, y \in \mathbb{R}^d, \tag{3.2}$$

$$p_t(x, \cdot) \to \delta_x, \quad t \to 0+, \quad x \in \mathbb{R}^d; \tag{3.3}$$

see [Ja02, Def. 2.7.12] in the case of a general pseudo-differential operator, which is the generalization of the corresponding definition (cf.
[Fr64], for example) in the parabolic/elliptic setting.

A classical tool for constructing fundamental solutions is the parametrix method; below we explain the version of this method which is used in the sequel. Fix some function $p^0_t(x, y)$, which will be considered as a "zero order approximation" to the unknown fundamental solution $p_t(x, y)$. Denote by $r_t(x, y)$ the residue term with respect to this approximation:

$$p_t(x, y) = p^0_t(x, y) + r_t(x, y). \tag{3.4}$$

If $p^0_t(x, y)$ belongs to $C^1$ and $C^2_\infty$ in the variables $t$ and $x$, respectively, we can put

$$\Phi_t(x, y) := -\big( \partial_t - L_x \big) p^0_t(x, y), \quad t > 0, \quad x, y \in \mathbb{R}^d. \tag{3.5}$$

Because $p_t(x, y)$ is supposed to be the fundamental solution for the operator (3.1) and $A$ extends $L$, we have

$$\big( \partial_t - L_x \big) r_t(x, y) = \Phi_t(x, y).$$

If $p^0_t(x, y)$ satisfies an analogue of (3.3), then a formal solution to this equation can be given in terms of the unknown fundamental solution $p_t(x, y)$, and then using (3.4) we get the following equation for $r_t(x, y)$:

$$r_t(x, y) = (p \circledast \Phi)_t(x, y) = (p^0 \circledast \Phi)_t(x, y) + (r \circledast \Phi)_t(x, y).$$

The formal solution to this equation is given by the convolution

$$r_t(x, y) = (p^0 \circledast \Psi)_t(x, y), \tag{3.6}$$

where $\Psi$ is the sum of $\circledast$-convolution powers of $\Phi$:

$$\Psi_t(x, y) = \sum_{k \ge 1} \Phi^{\circledast k}_t(x, y). \tag{3.7}$$

If the series (3.7) converges and the convolution (3.6) is well defined, we obtain the required function $p_t(x, y)$ in the form

$$p_t(x, y) = p^0_t(x, y) + \sum_{k \ge 1} (p^0 \circledast \Phi^{\circledast k})_t(x, y). \tag{3.8}$$

Clearly, the above argument is so far purely formal; to make it rigorous, we need to prove that the parametrix construction is feasible, i.e. that the sum in the r.h.s. of (3.8) is well defined, and then to associate $p_t(x, y)$ with the initial operator $L$. This last step is far from being trivial, hence we postpone its discussion to Section 4.3.
Here we just mention that the $p_t(x, y)$ which we actually obtain is not a fundamental solution in the classical sense described above. However, it can still be interpreted as an (approximate) fundamental solution in a sense which is completely sufficient for all our purposes. The first step is more direct, and we give here a generic calculation which we use to analyse the convolution powers involved in (3.7) in a unified way.

We say that a non-negative kernel $\{H_t(x, y),\ t > 0,\ x, y \in \mathbb{R}^d\}$ has a sub-convolution property if for every $T > 0$ there exists a constant $C_{H,T} > 0$ such that

$$(H_{t-s} * H_s)(x, y) \le C_{H,T} H_t(x, y), \quad t \in (0, T], \quad s \in (0, t), \quad x, y \in \mathbb{R}^d. \tag{3.9}$$

For the following general estimate we refer to [KK14], Lemma 3.2 (the estimate (3.15) below slightly differs from (3.29) in [KK14], but its proof is completely the same).

Lemma 3.1.

Suppose that the function $\Phi_t(x, y)$ satisfies

$$\big| \Phi_t(x, y) \big| \le C_{\Phi,T} \big( t^{-1+\delta_1} H^1_t(x, y) + t^{-1+\delta_2} H^2_t(x, y) \big), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d, \tag{3.10}$$

with some $\delta_1, \delta_2 \in (0, 1)$ and some non-negative kernels $H^1_t(x, y)$, $H^2_t(x, y)$. Assume also that the kernels $H^i_t(x, y)$, $i = 1, 2$, have the sub-convolution property with constant $C_{H,T}$, and

$$H^1_t(x, y) \ge H^2_t(x, y). \tag{3.11}$$

Then for any $t \in (0, T]$, $x, y \in \mathbb{R}^d$, we have

a)

$$\big| \Phi^{\circledast k}_t(x, y) \big| \le \frac{C_1 C_2^k}{\Gamma(k\zeta)}\, t^{-1+(k-1)\zeta} \big( t^{\delta_1} H^1_t(x, y) + t^{\delta_2} H^2_t(x, y) \big), \quad k \ge 1, \tag{3.12}$$

where

$$C_1 = (3C_{H,T})^{-1}, \quad C_2 = 3 C_{\Phi,T} C_{H,T} \Gamma(\zeta), \quad \text{and} \quad \zeta = \delta_1 \wedge \delta_2; \tag{3.13}$$

b) the series $\sum_{k=1}^{\infty} \Phi^{\circledast k}_t(x, y)$ is absolutely convergent and

$$\Big| \sum_{k=1}^{\infty} \Phi^{\circledast k}_t(x, y) \Big| \le C \big( t^{-1+\delta_1} H^1_t(x, y) + t^{-1+\delta_2} H^2_t(x, y) \big); \tag{3.14}$$

c)

$$\Big| \Big( H^1 \circledast \sum_{k=1}^{\infty} \Phi^{\circledast k} \Big)_t(x, y) \Big| \le C \big( t^{\delta_1} H^1_t(x, y) + t^{\delta_2} H^2_t(x, y) \big). \tag{3.15}$$

Hence to make the parametrix construction feasible it is sufficient to choose the zero order approximation $p^0_t(x, y)$ in such a way that the corresponding function $\Phi$, defined by (3.5), satisfies (3.10) and

$$p^0_t(x, y) \le C_T H^1_t(x, y), \quad t \in (0, T], \quad x, y \in \mathbb{R}^d. \tag{3.16}$$

If the kernels $H^1$, $H^2$ have the sub-convolution property and satisfy (3.11), it will follow from Lemma 3.1 that all the convolution powers in (3.7) are well defined, the series converges, and the residue term (i.e. the sum of the series in the right hand side of (3.8)) possesses an upper bound of the form

$$|r_t(x, y)| \le C \big( t^{\delta_1} H^1_t(x, y) + t^{\delta_2} H^2_t(x, y) \big). \tag{3.17}$$

3.2 Approximate solutions to ODEs with Hölder continuous coefficients

The proper choice of the zero order approximation $p^0_t(x, y)$ in the construction outlined above is a delicate point.
In [KK14], in the case of a Lipschitz continuous drift coefficient $b$, this approximation was chosen in the form

$$p^0_t(x, y) = \frac{1}{t^{d/\alpha} a^{d/\alpha}(y)}\, g^{(\alpha)}\left( \frac{\theta_t(y) - x}{t^{1/\alpha} a^{1/\alpha}(y)} \right), \tag{3.18}$$

with $\theta_t(y)$ being the unique solution to the Cauchy problem

$$d\theta_t = -b(\theta_t)\,dt, \quad \theta_0 = y. \tag{3.19}$$

We will use a similar $p^0_t(x, y)$ in the current setting, where $b$ is only assumed to be Hölder continuous. Because now (3.19) may have multiple solutions, this makes it necessary to choose the "correcting term" $\theta_t(y)$ more carefully. In the current section we provide an auxiliary construction and give some results which will be used in such a choice and in the subsequent analysis of the series (3.8).

Consider the following "mollified" family generated by the drift coefficient $b$:

$$b(t, x) = (2\pi)^{-d/2} t^{-d/\alpha} \int_{\mathbb{R}^d} b(z)\, e^{-|z - x|^2 / (2t^{2/\alpha})}\,dz, \quad t \ge 0$$

(with the convention $b(0, x) = b(x)$). Because $b$ is $\gamma$-Hölder continuous, we have

$$|b(x) - b(t, x)| \le Ct^{-d/\alpha} \int_{\mathbb{R}^d} |z - x|^{\gamma}\, e^{-|z - x|^2 / (2t^{2/\alpha})}\,dz \le Ct^{\gamma/\alpha}. \tag{3.20}$$

On the other hand, for positive $t$ the respective $b(t, \cdot)$ is smooth and

$$\nabla b(t, x) = (2\pi)^{-d/2} t^{-d/\alpha} \int_{\mathbb{R}^d} b(z) \left( \frac{z - x}{t^{2/\alpha}} \right) e^{-|z - x|^2 / (2t^{2/\alpha})}\,dz = (2\pi)^{-d/2} t^{-d/\alpha} \int_{\mathbb{R}^d} \big( b(z) - b(x) \big) \left( \frac{z - x}{t^{2/\alpha}} \right) e^{-|z - x|^2 / (2t^{2/\alpha})}\,dz.$$

Hence

$$|\nabla b(t, x)| \le Ct^{-d/\alpha - 2/\alpha} \int_{\mathbb{R}^d} |z - x|^{1+\gamma}\, e^{-|z - x|^2 / (2t^{2/\alpha})}\,dz \le Ct^{\gamma/\alpha - 1/\alpha}. \tag{3.21}$$

In particular, $b(t, \cdot)$ satisfies the Lipschitz condition with the Lipschitz constant $L(t) = Ct^{\gamma/\alpha - 1/\alpha}$. Observe that by the balance condition (1.3),

$$\int_0^T L(t)\,dt < \infty, \quad T > 0. \tag{3.22}$$

Then for any fixed $s \in \mathbb{R}$ the following Cauchy problems have unique solutions:

$$\frac{d}{dt} \upsilon_t = b(|t - s|, \upsilon_t), \quad \upsilon_0 = x,$$

$$\frac{d}{dt} \theta_t = -b(|t - s|, \theta_t), \quad \theta_0 = y.$$
We denote these solutions by $\upsilon^s_t(x)$ and $\theta^s_t(y)$, respectively.

For any $\upsilon_\cdot(x)\in\Upsilon(x)$ and $0\le s\le t\le T$ we have
\[
\upsilon_t(x)-\upsilon^s_t(x)=\int_0^t\Big(b(\upsilon_r(x))-b(|r-s|,\upsilon^s_r(x))\Big)\,dr
=\int_0^t\Big(b(|r-s|,\upsilon_r(x))-b(|r-s|,\upsilon^s_r(x))\Big)\,dr+h^s_t(x),
\]
where by (3.20)
\[
|h^s_t(x)|=\left|\int_0^t\Big(b(\upsilon_r(x))-b(|r-s|,\upsilon_r(x))\Big)\,dr\right|\le Ct^{1+\gamma/\alpha}.
\]
Then by (3.22) and the Gronwall inequality,
\[
|\upsilon_t(x)-\upsilon^s_t(x)|\le Ct^{1+\gamma/\alpha}=Ct^{1/\alpha+\delta}, \tag{3.23}
\]
see statement II of Theorem 2.2, where $\delta$ is defined. This means that $\upsilon^s_t(x)$ approximates the value $\upsilon_t(x)$ for any solution $\upsilon_\cdot(x)\in\Upsilon(x)$ with the accuracy $Ct^{1/\alpha+\delta}$. Clearly, it follows from (3.23) that for any $t\in[0,T]$, $s,r\in[0,t]$
\[
|\upsilon^s_t(x)-\upsilon^r_t(x)|\le Ct^{1/\alpha+\delta}. \tag{3.24}
\]
Similarly, we have for any $t\in[0,T]$, $s,r\in[0,t]$
\[
|\theta^s_t(x)-\theta^r_t(x)|\le Ct^{1/\alpha+\delta}. \tag{3.25}
\]

To explain the heuristics behind the following lemma, assume for a while that $b$ is Lipschitz continuous. Then the usual argument based on the uniqueness properties for the corresponding ODEs shows that $\{\upsilon_t\}$ and $\{\theta_t\}$ are mutually inverse flows of solutions to (2.4) and (3.19), respectively. Moreover, each $\upsilon_t,\theta_t:\mathbb{R}^d\to\mathbb{R}^d$ is a mapping which differs from the identity by a function which satisfies the Lipschitz condition with the constant $Ct$, hence for any $t\in[0,T]$, $s\in[0,t]$
\[
e^{-Ct}|x-\theta_t(y)|\le|\upsilon_{t-s}(x)-\theta_s(y)|\le e^{Ct}|x-\theta_t(y)|. \tag{3.26}
\]
Exactly this inequality was used in [KK14] as a key ingredient in the proof of the sub-convolution property of the kernels associated with $p^0_t(x,y)$ defined by (3.18). In the current setting, we will use the following analogue of this inequality for the family of approximate solutions to (2.4) and (3.19).

Lemma 3.2.

For any $T>0$, there exists $C>0$ such that for any $t\in[0,T]$, $s\in[0,t]$
\[
e^{-Ct^{\delta}}|x-\theta^0_t(y)|-Ct^{1/\alpha+\delta}\le|\upsilon^{t-s}_{t-s}(x)-\theta^0_s(y)|\le e^{Ct^{\delta}}|x-\theta^0_t(y)|+Ct^{1/\alpha+\delta}. \tag{3.27}
\]

Proof. It follows from (3.21) and the fact that $L(t)$ is locally integrable that $\upsilon^r_t(x)$ is differentiable in $x$, and its derivative satisfies the linear ODE
\[
\frac{d}{dt}\big(\nabla\upsilon^r_t(x)\big)=(\nabla b)(|t-r|,\upsilon^r_t(x))\big(\nabla\upsilon^r_t(x)\big),\quad \nabla\upsilon^r_0(x)=I_{\mathbb{R}^d}.
\]
This implies that for $t\le T$, $r\in[0,t]$
\[
|\nabla\upsilon^r_t(x)-I_{\mathbb{R}^d}|\le C\int_0^t|s-r|^{\gamma/\alpha-1/\alpha}\,ds\le Ct^{\gamma/\alpha-1/\alpha+1}=Ct^{\delta},
\]
and therefore for any $x,x'$
\[
e^{-Ct^{\delta}}|x-x'|\le|\upsilon^r_t(x)-\upsilon^r_t(x')|\le e^{Ct^{\delta}}|x-x'|.
\]
For fixed $s,t$, apply the above inequality with $t-s$ in place of both $t$ and $r$:
\[
e^{-Ct^{\delta}}|x-x'|\le|\upsilon^{t-s}_{t-s}(x)-\upsilon^{t-s}_{t-s}(x')|\le e^{Ct^{\delta}}|x-x'|. \tag{3.28}
\]
Next, consider the approximate solution $f(\cdot):=\upsilon^{t-s}_{\cdot}(\theta^s_t(y))$. It is easy to see that $g(\cdot)=f(t-\cdot)$ satisfies
\[
g'(\tau)=-b(|\tau-s|,g(\tau)),\quad g(t)=\theta^s_t(y).
\]
The function $\theta^s_\cdot(y)$ satisfies the same Cauchy problem, and because the solution to this problem is unique we have
\[
\upsilon^{t-s}_{t-s}\big(\theta^s_t(y)\big)=\theta^s_s(y).
\]
Taking $x'=\theta^s_t(y)$ in (3.28) and recalling that by (3.25) we have
\[
|\theta^0_t(y)-\theta^s_t(y)|\le Ct^{1/\alpha+\delta},\quad|\theta^0_s(y)-\theta^s_s(y)|\le Ct^{1/\alpha+\delta},
\]
we complete the proof of (3.27).

Finally, we briefly discuss the Picard-type iteration procedure (2.11) for the ODE (2.4); see the notation introduced in Remark 2.2. Define
\[
\upsilon_{0,t}(x)\equiv x,\quad\upsilon_{k,t}(x)=x+\int_0^tb(\upsilon_{k-1,s}(x))\,ds,\quad k\ge 1,\ t\ge 0.
\]
For any $\upsilon_\cdot(x)\in\Upsilon(x)$, we have $|\upsilon_{0,t}(x)-\upsilon_t(x)|\le Ct$ and
\[
|\upsilon_{k,t}(x)-\upsilon_t(x)|\le C\int_0^t|\upsilon_{k-1,s}(x)-\upsilon_s(x)|^{\gamma}\,ds,\quad k\ge 1.
\]
Then it is easy to show by induction that
\[
|\upsilon_{k,t}(x)-\upsilon_t(x)|\le Ct^{1+\gamma+\dots+\gamma^k}=Ct^{1/\alpha+\rho_k}. \tag{3.29}
\]

4 Proofs of Theorem 2.1 and Theorem 2.2

Our first step in the proof of Theorem 2.1 and Theorem 2.2 is to specify the parametrix construction outlined before; that is, to choose $p^0_t(x,y)$ and to prove that the sum of the series (3.8) is well defined. We denote $\kappa_t=\theta^0_t$ and put
\[
p^0_t(x,y)=\frac{1}{t^{d/\alpha}a^{d/\alpha}(y)}\,g^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right). \tag{4.1}
\]
That is, the structure of $p^0_t(x,y)$ is similar to the one proposed in (3.18), but the exact solution to (3.19) therein is replaced by the approximate solution $\theta^s_t$ with $s=0$. Such a choice of $p^0_t(x,y)$ is not the only possible one, and it is far from evident which choice is the best. We discuss this subtle point in Section 4.4.

Below we evaluate the kernel $\Phi$ and give an upper bound for it. The calculations here are similar to those made in Section 3.1 of [KK14]. In order to make the exposition self-sufficient, we explain the key points of the main argument, referring for technicalities to [KK14].

First, we formulate the auxiliary statements we use in the calculation. For the proofs we refer to Appendix A of [KK14]. Denote for $\lambda>0$
\[
G^{(\lambda)}(x)=\big(|x|\vee 1\big)^{-d-\lambda},\quad x\in\mathbb{R}^d. \tag{4.2}
\]

Proposition 4.1.

1. For any $\lambda>0$, $c>0$ there exists $C>0$ such that
\[
G^{(\lambda)}(cx)\le CG^{(\lambda)}(x). \tag{4.3}
\]

2. For any $\lambda_1>\lambda_2$,
\[
G^{(\lambda_1)}(x)\le G^{(\lambda_2)}(x). \tag{4.4}
\]

3. For any $\varepsilon\in(0,\lambda)$,
\[
|x|^{\varepsilon}G^{(\lambda)}(x)\le CG^{(\lambda-\varepsilon)}(x). \tag{4.5}
\]

4. For any $\alpha\in(0,2)$,
\[
g^{(\alpha)}(x)\asymp G^{(\alpha)}(x), \tag{4.6}
\]
\[
\big|(\nabla g^{(\alpha)})(x)\big|\le CG^{(\alpha+1)}(x), \tag{4.7}
\]
\[
\big|(L^{(\alpha)}g^{(\alpha)})(x)\big|\le CG^{(\alpha)}(x), \tag{4.8}
\]
\[
\big|(\nabla L^{(\alpha)}g^{(\alpha)})(x)\big|\le CG^{(\alpha+1)}(x), \tag{4.9}
\]
\[
\big|(\nabla^2g^{(\alpha)})(x)\big|\le CG^{(\alpha+2)}(x). \tag{4.10}
\]

Denote
\[
Q^{(\lambda)}_t(x,y):=\left(\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|^{\lambda}\wedge t^{-\lambda/\alpha}\right)\frac{1}{t^{d/\alpha}}\,G^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right),\quad\lambda\in[0,\alpha). \tag{4.11}
\]

Lemma 4.1. Let $\chi\in(0,\alpha\wedge\eta)$, $T>0$. Then
\[
|\Phi_t(x,y)|\le C\Big(t^{-1+\chi/\alpha}Q^{(\chi)}_t(x,y)+\big(t^{-1+\chi}+t^{-1+\delta}\big)Q^{(0)}_t(x,y)\Big),\quad t\in(0,T],\ x,y\in\mathbb{R}^d. \tag{4.12}
\]

Proof.

To improve the readability, here and below we assume that $T>0$ is fixed and, if it is not stated otherwise, in any formula containing $t$, $x$, or $y$ we assume $t\in(0,T]$, $x\in\mathbb{R}^d$, $y\in\mathbb{R}^d$.

The operators $\nabla$ and $L^{(\alpha)}$ are homogeneous with respective orders $1$ and $\alpha$. From the identity
\[
\big(\partial_t-L^{(\alpha)}\big)\Big[t^{-d/\alpha}g^{(\alpha)}\big(t^{-1/\alpha}x\big)\Big]=0,
\]
we derive
\[
\partial_tp^0_t(x,y)=\left[\frac{a(y)}{a^{d/\alpha+1}(y)\,t^{d/\alpha+1}}\big(L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{w}{a^{1/\alpha}(y)t^{1/\alpha}}\right)+\left(\partial_t\kappa_t(y),\ \frac{1}{a^{(d+1)/\alpha}(y)\,t^{(d+1)/\alpha}}\big(\nabla g^{(\alpha)}\big)\left(\frac{w}{a^{1/\alpha}(y)t^{1/\alpha}}\right)\right)\right]_{w=\kappa_t(y)-x}.
\]
On the other hand,
\[
L_xp^0_t(x,y)=\left[\frac{a(x)}{a^{d/\alpha+1}(y)\,t^{d/\alpha+1}}\big(L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{w}{a^{1/\alpha}(y)t^{1/\alpha}}\right)-\left(b(x),\ \frac{1}{a^{(d+1)/\alpha}(y)\,t^{(d+1)/\alpha}}\big(\nabla g^{(\alpha)}\big)\left(\frac{w}{a^{1/\alpha}(y)t^{1/\alpha}}\right)\right)\right]_{w=\kappa_t(y)-x}.
\]
Because $\partial_t\kappa_t(y)=-b(t,\kappa_t(y))$, we finally get
\[
\begin{aligned}
\Phi_t(x,y)&=\big(L_x-\partial_t\big)p^0_t(x,y)\\
&=\frac{a(x)-a(y)}{t^{d/\alpha+1}a^{d/\alpha+1}(y)}\big(L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)
+\frac{1}{t^{(d+1)/\alpha}a^{(d+1)/\alpha}(y)}\left(b(t,\kappa_t(y))-b(x),\ \big(\nabla g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)\right)\\
&=:\Phi^1_t(x,y)+\Phi^2_t(x,y).
\end{aligned} \tag{4.13}
\]
We estimate $\Phi^1_t(x,y)$, $\Phi^2_t(x,y)$ separately. Because $a$ is $\eta$-Hölder continuous and bounded, we have for any $\chi\le\eta$
\[
|a(x)-a(y)|\le C\big(|x-y|^{\chi}\wedge 1\big).
\]
Since $b$ is bounded, each $b(t,\cdot)$ is bounded by the same constant, and therefore $|\kappa_t(y)-y|\le ct$. Then by the elementary inequality $|u+v|^{\chi}\wedge 1\le C\big(|u|^{\chi}\wedge 1+|v|^{\chi}\wedge 1\big)$ we finally get
\[
|a(x)-a(y)|\le c\big(|\kappa_t(y)-x|^{\chi}\wedge 1\big)+ct^{\chi}. \tag{4.14}
\]
Then by (4.3), (4.4), and (4.8)
\[
\begin{aligned}
|\Phi^1_t(x,y)|&\le Ct^{-1+\chi/\alpha}\left(\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|^{\chi}\wedge t^{-\chi/\alpha}\right)\frac{1}{t^{d/\alpha}}G^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right)+Ct^{-1+\chi}\frac{1}{t^{d/\alpha}}G^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right)\\
&=Ct^{-1+\chi/\alpha}Q^{(\chi)}_t(x,y)+Ct^{-1+\chi}Q^{(0)}_t(x,y).
\end{aligned}
\]
Next, recall that $b(t,x)$ is Lipschitz continuous in $x$ with the constant $L(t)=Ct^{\gamma/\alpha-1/\alpha}=Ct^{-1+\delta}$, and (3.20) holds. Then
\[
|b(t,\kappa_t(y))-b(x)|\le C\Big(t^{-1+\delta}|\kappa_t(y)-x|+t^{\gamma/\alpha}\Big)=Ct^{-1+1/\alpha+\delta}\left(\frac{|\kappa_t(y)-x|}{t^{1/\alpha}}+1\right). \tag{4.15}
\]
Observe that by (4.3), (4.5), and (4.7) we have
\[
\left(\frac{|\kappa_t(y)-x|}{t^{1/\alpha}}+1\right)\left|\big(\nabla g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)\right|\le CG^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right). \tag{4.16}
\]
This gives
\[
|\Phi^2_t(x,y)|\le Ct^{-1+\delta}Q^{(0)}_t(x,y).
\]
Combining the above estimates for $\Phi^1_t(x,y)$, $\Phi^2_t(x,y)$, we complete the proof.

To estimate the convolution powers $\Phi^{\circledast k}_t(x,y)$, $k\ge 1$, inductively, we slightly modify the above upper bound for $\Phi$. For $\lambda\in[0,\alpha)$, define
\[
H^{(\lambda)}_t(x,y):=\left(\left(\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|^{\lambda}\vee 1\right)\wedge t^{-\lambda/\alpha}\right)\frac{1}{t^{d/\alpha}}\,G^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right). \tag{4.17}
\]
Clearly, $Q^{(\lambda)}_t(x,y)\le H^{(\lambda)}_t(x,y)$, and therefore a (weaker) analogue of (4.12) with $Q^{(\chi)}_t(x,y)$, $Q^{(0)}_t(x,y)$ replaced by $H^{(\chi)}_t(x,y)$, $H^{(0)}_t(x,y)$ holds true. The advantage of this weaker estimate is that the kernels $H^{(\chi)}_t(x,y)$, $H^{(0)}_t(x,y)$ have the sub-convolution property, and therefore we can use Lemma 3.1.

Lemma 4.2.

For every $\lambda\in[0,\alpha)$, the kernel $H^{(\lambda)}_t(x,y)$ has the sub- and super-convolution properties.

Proof. We prove the sub-convolution property only; the proof of the super-convolution property is completely analogous and is omitted. Denote
\[
K^{(\lambda)}_t(x,y)=\left(\left(\left|\frac{y-x}{t^{1/\alpha}}\right|^{\lambda}\vee 1\right)\wedge t^{-\lambda/\alpha}\right)\frac{1}{t^{d/\alpha}}\,G^{(\alpha)}\left(\frac{y-x}{t^{1/\alpha}}\right),
\]
then $K^{(\lambda)}_t(x,y)$ has the sub-convolution property, see Proposition 3.3 in [KK14], Case A. We have
\[
H^{(\lambda)}_t(x,y)=K^{(\lambda)}_t(x,\theta^0_t(y)).
\]
Note that $K^{(\lambda)}_t(x,y)$ is a piece-wise power type function of $|x-y|/t^{1/\alpha}$. Then one can easily derive from (3.27) that, for a given $T>0$, there exist positive constants $C_1,C_2$ such that for any $t\in[0,T]$, $s\in[0,t]$
\[
C_1H^{(\lambda)}_t(x,y)\le K^{(\lambda)}_t\big(\upsilon^{t-s}_{t-s}(x),\theta^0_s(y)\big)\le C_2H^{(\lambda)}_t(x,y). \tag{4.18}
\]
Now, in order to obtain the required property of $H^{(\lambda)}_t(x,y)$, we first apply the left hand side inequality in (4.18) with $t'=t-s$ and $s'=0$, then the sub-convolution property of $K^{(\lambda)}_t(x,y)$, and then the right hand side inequality in (4.18):
\[
\big(H^{(\lambda)}_{t-s}*H^{(\lambda)}_s\big)(x,y)\le C\int_{\mathbb{R}^d}K^{(\lambda)}_{t-s}\big(\upsilon^{t-s}_{t-s}(x),z\big)K^{(\lambda)}_s\big(z,\theta^0_s(y)\big)\,dz\le CK^{(\lambda)}_t\big(\upsilon^{t-s}_{t-s}(x),\theta^0_s(y)\big)\le CH^{(\lambda)}_t(x,y).
\]

Now all the conditions of Lemma 3.1 are satisfied with
\[
H^1=H^{(\chi)},\quad H^2=H^{(0)},\quad\delta_1=\chi/\alpha,\quad\delta_2=\delta\wedge\chi.
\]
In addition, we have (3.16), hence by Lemma 3.1 the convolution powers $\Phi^{\circledast k}$, $k\ge 1$, are well defined and the series in the right hand side of (3.8) converges absolutely. Moreover, for the residue term $r_t(x,y)=p_t(x,y)-p^0_t(x,y)$ we have the upper bound (3.17) with
\[
H^1_t(x,y)=H^{(\chi)}_t(x,y),\quad H^2_t(x,y)=H^{(0)}_t(x,y),\quad\delta_1=\frac{\chi}{\alpha},\quad\delta_2=\chi\wedge\delta.
\]
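As a quick consistency check, the parameters above can be read off from (4.12) together with $Q^{(\lambda)}_t\le H^{(\lambda)}_t$. The display below is only a bookkeeping sketch: the constant absorbed into $C$ in the first line depends on $T$, and this step is our own elementary observation rather than a statement taken from the source.

```latex
\begin{aligned}
t^{-1+\chi}+t^{-1+\delta} &\le C\,t^{-1+\chi\wedge\delta}, \qquad t\in(0,T],\\[2pt]
|\Phi_t(x,y)| &\le C\Big(t^{-1+\delta_1}H^{(\chi)}_t(x,y)+t^{-1+\delta_2}H^{(0)}_t(x,y)\Big),
\qquad \delta_1=\frac{\chi}{\alpha},\quad \delta_2=\chi\wedge\delta,\\[2pt]
\zeta &= \delta_1\wedge\delta_2=\min\Big\{\delta,\ \chi,\ \frac{\chi}{\alpha}\Big\}.
\end{aligned}
```

The second line is exactly the hypothesis (3.10) of Lemma 3.1, and the resulting exponent $\zeta$ from (3.13) is the same index that appears in the estimate (4.19).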
We remark that this upper bound actually gives the following estimate for $r_t(x,y)$, similar to (2.9):
\[
|r_t(x,y)|\le C\Big(t^{\zeta}+|y-\upsilon_t(x)|^{\chi}\wedge 1\Big)p^0_t(x,y),\quad\zeta=\min\Big\{\delta,\chi,\frac{\chi}{\alpha}\Big\}. \tag{4.19}
\]
Indeed, by (4.6) we have directly
\[
H^{(0)}_t(x,y)\le Cp^0_t(x,y). \tag{4.20}
\]
On the other hand,
\[
H^{(\chi)}_t(x,y)=\left(\left(\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|^{\chi}\vee 1\right)\wedge t^{-\chi/\alpha}\right)H^{(0)}_t(x,y)\le\left(\left(\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|^{\chi}+1\right)\wedge t^{-\chi/\alpha}\right)H^{(0)}_t(x,y),
\]
hence
\[
t^{\chi/\alpha}H^{(\chi)}_t(x,y)\le\Big(\big(|\kappa_t(y)-x|^{\chi}+t^{\chi/\alpha}\big)\wedge 1\Big)H^{(0)}_t(x,y)\le\Big(t^{\chi/\alpha}+|\kappa_t(y)-x|^{\chi}\wedge 1\Big)H^{(0)}_t(x,y).
\]
Combined with (4.20) and (3.17), this provides (4.19).

The estimates which we have just obtained actually yield that the series in the right hand side of (3.8) converges uniformly for $t\in[0,T]$, $x,y\in\mathbb{R}^d$ for every $T$. Because $p^0_t(x,y)$ and $\Phi_t(x,y)$ are continuous functions of $(t,x,y)$ which possess explicit upper bounds, it is a routine calculation to show that each term $p^0\circledast\Phi^{\circledast k}$, $k\ge 1$, is continuous in $(t,x,y)$; see e.g. [KK14], Section 3.3. This proves the assertion of Theorem 2.1 on the continuity of $p_t(x,y)$. The same argument actually shows that the identity (2.6) defines a family $P_t$, $t>0$, of bounded linear operators on $C_\infty$, which is strongly continuous; that is,
\[
\|P_tf-P_sf\|_\infty\to 0,\quad t\to s,\quad s>0,\ f\in C_\infty.
\]

4.2 “Fine tuning” of the decomposition of $p_t(x,y)$

We have obtained $p_t(x,y)$ in the form (3.4) with $p^0_t(x,y)$ defined by (4.1). Now we re-arrange this decomposition to the form (2.8) and prove the estimate (2.9) for the respective residue term.

First, we note that (3.23) with $s=t$ and (3.27) with $s=0$ yield
\[
e^{-Ct^{\delta}}|x-\kappa_t(y)|-Ct^{1/\alpha+\delta}\le|\upsilon_t(x)-y|\le e^{Ct^{\delta}}|x-\kappa_t(y)|+Ct^{1/\alpha+\delta}. \tag{4.21}
\]
Next, we obtain the following property of the function $g^{(\alpha)}$.

Lemma 4.3.

For any $V>0$ there exists a constant $C_V$ such that for any $v\in[0,V]$ and any $x,y\in\mathbb{R}^d$ such that
\[
e^{-v}|x|-v\le|y|\le e^{v}|x|+v, \tag{4.22}
\]
the following inequality holds:
\[
e^{-C_Vv}g^{(\alpha)}(x)\le g^{(\alpha)}(y)\le e^{C_Vv}g^{(\alpha)}(x). \tag{4.23}
\]

Proof.

We note that $g^{(\alpha)}$ is rotationally invariant, hence without loss of generality we can restrict ourselves to the case where $x$ and $y$ have the same direction. We have
\[
\frac{g^{(\alpha)}(x)}{g^{(\alpha)}(y)}=\exp\left\{\int_0^1\Big(\big(\nabla\log g^{(\alpha)}\big)(sx+(1-s)y),\ x-y\Big)\,ds\right\}.
\]
By (4.6), (4.7), and (4.5),
\[
\big|\nabla\log g^{(\alpha)}(x)\big|=\left|\frac{\nabla g^{(\alpha)}(x)}{g^{(\alpha)}(x)}\right|\le C\,\frac{1}{1+|x|},
\]
hence
\[
\left|\int_0^1\Big(\big(\nabla\log g^{(\alpha)}\big)(sx+(1-s)y),\ x-y\Big)\,ds\right|\le C\,\frac{|x-y|}{1+\min_{s\in[0,1]}|sx+(1-s)y|}.
\]
Because $x,y$ have the same direction, it follows from (4.22) that
\[
|x-y|\le C\left((e^{v}-1)\min_{s\in[0,1]}|sx+(1-s)y|+v\right).
\]
Together with the previous inequality, this yields
\[
\left|\int_0^1\Big(\big(\nabla\log g^{(\alpha)}\big)(sx+(1-s)y),\ x-y\Big)\,ds\right|\le C_Vv,
\]
which completes the proof of the required statement.

As a corollary of (4.21) and (4.23), we get
\[
e^{-Ct^{\delta}}\tilde p_t(x,y)\le p^0_t(x,y)\le e^{Ct^{\delta}}\tilde p_t(x,y),\quad t\in(0,T], \tag{4.24}
\]
where the constant $C$ depends on $T$ only. This yields that the residue term $r_t(x,y)$ satisfies an analogue of (4.19) with $p^0_t(x,y)$ in the right hand side replaced by $\tilde p_t(x,y)$. This also yields that
\[
|p^0_t(x,y)-\tilde p_t(x,y)|\le Ct^{\delta}\tilde p_t(x,y),\quad t\in(0,T].
\]
Together with the above analogue of (4.19), this completes the proof of (2.9).

The same strategy can be used in order to prove the two-sided estimate (2.10) for $p_t(x,y)$. First, we prove a similar two-sided estimate with $p^0_t(x,y)$ instead of $\tilde p_t(x,y)$: the upper bound is straightforward, and the proof of the lower bound is completely analogous to the proof of Theorem 2.5 in [KK14] and thus is omitted. Then (2.10) follows by (4.24).

The same argument can also be applied to provide the representations of $p_t(x,y)$ outlined in Remark 2.2. Namely, by (3.24) and (3.27),
\[
e^{-Ct^{\delta}}|x-\kappa_t(y)|-Ct^{1/\alpha+\delta}\le|\upsilon^0_t(x)-y|\le e^{Ct^{\delta}}|x-\kappa_t(y)|+Ct^{1/\alpha+\delta}.
\]
This yields an analogue of (2.9) when $\upsilon_t(x)$ in the definition of $\tilde p_t(x,y)$ is replaced by $\upsilon^0_t(x)$. On the other hand, (3.27) and (3.29) yield
\[
e^{-Ct^{\delta}}|x-\kappa_t(y)|-Ct^{1/\alpha+\delta}-Ct^{1/\alpha+\rho_k}\le|\upsilon_{k,t}(x)-y|\le e^{Ct^{\delta}}|x-\kappa_t(y)|+Ct^{1/\alpha+\delta}+Ct^{1/\alpha+\rho_k},
\]
which provides the required error bound in the case where $\upsilon_t(x)$ in the definition of $\tilde p_t(x,y)$ is replaced by $\upsilon_{k,t}(x)$.

Finally, we mention that $\tilde p_t(x,y)$ is the distribution density of $\tilde X_{t,x}$ defined by (1.4), and $\tilde X_{t,x}$ converges weakly to $x$ when $t\to 0$.
Combined with (2.9), this means that if we extend the family of operators $P_t$, $t>0$ (see (2.6)) by the natural convention that $P_0$ is the identity operator in $C_\infty$, we get a strongly continuous family $P_t$, $t\ge 0$.

Here we outline the construction of the approximate fundamental solution, introduced in [KK14], which shows that $p_t(x,y)$ constructed above indeed solves the Cauchy problem (3.2), (3.3) in a certain approximate sense. The main difficulty here is caused by the following. Because $D(L)=C^2_\infty$, to apply $L_x$ to $p_t(x,y)$ one should first specify $\nabla_xp_t(x,y)$, $\nabla^2_{xx}p_t(x,y)$. The function $p_t(x,y)$ is presented in (3.8), and the naive argument would be to apply $\nabla_x$, $\nabla^2_{xx}$ to each term in this representation. The functions $\nabla_xp^0_t(x,y)$, $\nabla^2_{xx}p^0_t(x,y)$ are explicit and well defined; however, in general they exhibit strongly singular behavior (e.g. (4.1) in [KK14]) which makes it impossible to apply, say, $\nabla^2_{xx}$ to any integral term in (3.8). Motivated by this observation, we introduce the family of functions $\{p_{t,\varepsilon}(x,y),\ \varepsilon>0\}$,
\[
p_{t,\varepsilon}(x,y)=p^0_{t+\varepsilon}(x,y)+\int_0^t\int_{\mathbb{R}^d}p^0_{t-s+\varepsilon}(x,z)\Psi_s(z,y)\,dz\,ds, \tag{4.25}
\]
and define
\[
P_{t,\varepsilon}f(x):=\int_{\mathbb{R}^d}p_{t,\varepsilon}(x,y)f(y)\,dy,\quad t\ge 0,\ x\in\mathbb{R}^d,\ f\in C_\infty(\mathbb{R}^d). \tag{4.26}
\]
The additional time shift by positive $\varepsilon$ removes the singularity at the point $s=t$ and resolves the difficulty outlined above. Namely, we have the following properties, which easily follow from the explicit representation for $p^0_t(x,y)$ and the parametrix estimates for the kernels $\Phi^{\circledast k}_t(x,y)$; see [KK14], Lemma 4.1 for the detailed proof, which actually does not rely on the specific properties of the model.

Lemma 4.4.

1. For every $f\in C_\infty(\mathbb{R}^d)$, $\varepsilon>0$ the function $P_{t,\varepsilon}f(x)$ belongs to $C^1$ as a function of $t$, to $C^2_\infty$ as a function of $x$, and the functions $\partial_tP_{t,\varepsilon}f(x)$, $L_xP_{t,\varepsilon}f(x)$ are continuous w.r.t. $(t,x)$.

2. For every $f\in C_\infty(\mathbb{R}^d)$, $T>0$,
\[
\|P_{t,\varepsilon}f-P_tf\|_\infty\to 0,\quad\varepsilon\to 0, \tag{4.27}
\]
uniformly in $t\in[0,T]$, and for every $\varepsilon>0$
\[
P_{t,\varepsilon}f(x)\to 0,\quad|x|\to\infty \tag{4.28}
\]
uniformly in $t\in[0,T]$.

The following lemma shows that the family $p_{t,\varepsilon}(x,y)$, $\varepsilon>0$, satisfies (3.2), (3.3) in a (weak) approximate sense. This is the reason for us to call this family an approximate fundamental solution to (3.2), (3.3).

Denote
\[
\Delta_{t,\varepsilon}f(x)=\big(\partial_t-L_x\big)P_{t,\varepsilon}f(x),\quad f\in C_\infty(\mathbb{R}^d). \tag{4.29}
\]

Lemma 4.5.

For any $f\in C_\infty(\mathbb{R}^d)$ we have

1.
\[
\Delta_{t,\varepsilon}f(x)\to 0,\quad\varepsilon\to 0, \tag{4.30}
\]
uniformly in $(t,x)\in[\tau,T]\times\mathbb{R}^d$ for any $\tau>0$, $T>\tau$;

2.
\[
\lim_{t,\varepsilon\to 0+}\|P_{t,\varepsilon}f-f\|_\infty=0.
\]

The second assertion easily follows from the formula (4.25), the estimates on $\Psi$ from Section 4.1, and the facts that $\tilde p_t(x,y)\to\delta_x(y)$, $t\to 0$, and $p^0_t(x,y)-\tilde p_t(x,y)$ possesses an estimate similar to (2.9). For the first assertion, we refer to the proof of Lemma 5.2 in [KK14], which is not specific with respect to the model and relies only on the smoothness properties of $p^0_t(x,y)$, $\Phi_t(x,y)$, $\Psi_t(x,y)$, the estimates for these functions, and the fact that the function $\Psi$ satisfies the integral equation
\[
\Phi_t(x,y)=\Psi_t(x,y)-\int_0^t\int_{\mathbb{R}^d}\Phi_{t-s}(x,z)\Psi_s(z,y)\,dz\,ds.
\]
All these ingredients are already available in the current setting due to the parametrix construction of Section 4.1.

Corollary 4.1.

Let $f\in C_\infty$, $T>0$ be fixed, and define for $t\in[0,T]$, $x\in\mathbb{R}^d$, $\varepsilon>0$
\[
g(t,x)=P_tf(x),\quad g_\varepsilon(t,x)=P_{t,\varepsilon}f(x),\quad g^T(t,x)=P_{T-t}f(x),\quad g^T_\varepsilon(t,x)=P_{T-t,\varepsilon}f(x).
\]
Then

• for every $\varepsilon>0$, the functions $\partial_tg_\varepsilon(t,x)$, $L_xg_\varepsilon(t,x)$, $\partial_tg^T_\varepsilon(t,x)$, $L_xg^T_\varepsilon(t,x)$ are well defined, continuous and bounded on $[0,T]\times\mathbb{R}^d$;

• for any $\tau\in(0,T)$,
\[
(\partial_t-L_x)g_\varepsilon(t,x)\to 0,\quad(\partial_t+L_x)g^T_\varepsilon(t,x)\to 0,\quad\varepsilon\to 0,
\]
uniformly w.r.t. $x\in\mathbb{R}^d$ and $t\in[\tau,T]$ or $t\in[0,T-\tau]$, respectively;

• $g_\varepsilon(t,x)\to g(t,x)$, $g^T_\varepsilon(t,x)\to g^T(t,x)$, $\varepsilon\to 0$, uniformly w.r.t. $t\in[0,T]$, $x\in\mathbb{R}^d$.

This corollary illustrates well the heuristics which motivates the notion of the approximate fundamental solution. If we were able to prove that $p_t(x,y)$ is smooth enough and is a classical fundamental solution, we would typically be able to prove that the functions $g(t,x)$, $g^T(t,x)$ defined above are harmonic functions for the operators $\partial_t-L$ and $\partial_t+L$, respectively. Proving the required smoothness, even if possible, is a tough problem which would require a more delicate analysis of the structure of the initial equation. On the other hand, using the basic parametrix structure only, we are able to approximate these functions by some families in such a way that the respective operators $\partial_t-L$ and $\partial_t+L$ on these families vanish, in a sense. This makes it possible to treat the functions $g(t,x)$, $g^T(t,x)$ as approximate harmonic functions, which appears to be quite fruitful. A good illustration of this point is given by the simple proof of the weak uniqueness of the solution to (1.1), which we explain further on.

Let $X$ be any weak solution to (1.1); without loss of generality we can assume it has trajectories in the space of càdlàg functions $D(\mathbb{R}^+,\mathbb{R}^d)$. It follows by the Itô formula (e.g. [IW81], Chapter II) that for any $f\in C^2_\infty$ the process
\[
f(X_t)-\int_0^tLf(X_s)\,ds \tag{4.31}
\]
is a martingale; that is, the distribution of $X$ in $D(\mathbb{R}^+,\mathbb{R}^d)$ is a solution to the martingale problem $(L,C^2_\infty)$.
Hence to prove the weak uniqueness it is sufficient to show that this martingale problem is well posed in $D(\mathbb{R}^+,\mathbb{R}^d)$, which means that for any probability measure $\mu$ on $\mathbb{R}^d$ there exists at most one measure $P$ on $D(\mathbb{R}^+,\mathbb{R}^d)$ such that $P(X_0\in du)=\mu(du)$ and for any $f\in C^2_\infty$ the process (4.31) is a martingale w.r.t. $P$. By Corollary 4.4.3 in [EK86], to do this it is sufficient to prove that for any two such measures the corresponding one-dimensional projections coincide. Below we fix an arbitrary solution $P$ of the martingale problem $(L,C^2_\infty)$ in $D(\mathbb{R}^+,\mathbb{R}^d)$ with $P(X_0\in du)=\mu(du)$ and specify its one-dimensional projections. In what follows, we denote by $E_P$ the expectation w.r.t. $P$.

First, we mention that $P$ corresponds to a stochastically continuous process. Indeed, for any pair of open sets $U,V$ such that their closures are compact and disjoint, there exists a function $f\in C^2_\infty$ such that $f(x)=0$, $x\in U$, and $f(x)=1$, $x\in V$. Then for every $s\ge 0$ we have
\[
P(X_s\in U,\ X_t\in V)\le E_P\big(f(X_t)-f(X_s)\big)=E_P\int_s^tLf(X_r)\,dr\to 0,\quad t\to s+.
\]
On the other hand, since $P$ is a measure on a Polish space $D(\mathbb{R}^+,\mathbb{R}^d)$, it is tight, and in particular for any $\varepsilon>0$, $T>0$ there exists a compact set $K_{\varepsilon,T}\subset\mathbb{R}^d$ such that
\[
P(X_s\in K_{\varepsilon,T})\ge 1-\varepsilon,\quad s\in[0,T];
\]
see [EK86], Section 3.5. Combining these two observations we easily get the required stochastic continuity. Now we can apply assertion (a) of Lemma 4.3.4 in [EK86] and get that for any function $g(t,x)$, $t\in[0,T]$, $x\in\mathbb{R}^d$, which is differentiable w.r.t. $t$, belongs to $D(L)=C^2_\infty$ w.r.t. $x$, and has continuous $\partial_tg(t,x)$, $L_xg(t,x)$, the following process is a martingale w.r.t. the measure $P$:
\[
g(t,X_t)-\int_0^t\Big(\partial_sg(s,X_s)+L_xg(s,X_s)\Big)\,ds.
\]
Now all the preliminaries are complete, and we present the cornerstone of the proof. Fix any $f\in C_\infty$ and $T>0$, and consider the function $g^T(t,x)=P_{T-t}f(x)$.
If we knew that this function is harmonic and $\partial_tg^T(t,x)$, $L_xg^T(t,x)$ are continuous, this would imply directly
\[
E_Pf(X_T)\Big(=E_Pg^T(T,X_T)=E_Pg^T(0,X_0)\Big)=E_PP_Tf(X_0).
\]
We avoid proving that $g^T(t,x)$ is smooth by using its interpretation as an approximate harmonic function, explained above. The argument remains essentially the same. Namely, we take the family $g^T_\varepsilon(t,x)$, $\varepsilon>0$, defined in Corollary 4.1 and observe that for every $\tau\in(0,T)$
\[
E_PP_{\tau,\varepsilon}f(X_{T-\tau})-E_PP_{T,\varepsilon}f(X_0)=E_Pg^T_\varepsilon(T-\tau,X_{T-\tau})-E_Pg^T_\varepsilon(0,X_0)=E_P\int_0^{T-\tau}(\partial_t+L_x)g^T_\varepsilon(t,X_t)\,dt\to 0,\quad\varepsilon\to 0.
\]
This yields
\[
E_PP_\tau f(X_{T-\tau})=E_PP_Tf(X_0),\quad\tau\in(0,T).
\]
Since $X$ is stochastically continuous and $P_t$, $t\ge 0$, is a strongly continuous family of operators, we can take $\tau\to 0$ and get
\[
E_Pf(X_T)=E_PP_Tf(X_0). \tag{4.32}
\]
Since $T>0$, $f\in C_\infty$ in the above identity are arbitrary and $X_0$ has a prescribed law $\mu$ w.r.t. $P$, this uniquely defines the one-dimensional distributions of $P$ and hence completes the proof of the fact that the martingale problem $(L,C^2_\infty)$ in $D(\mathbb{R}^+,\mathbb{R}^d)$ is well posed.

The only assertion left unproved in our main Theorem 2.1 and Theorem 2.2 is that the weak solution to (1.1) exists, and it is a Markov process with the strongly continuous semigroup in $C_\infty$ defined by (2.6).

The existence of the weak solution follows by the standard approximation argument: approximate $b,\sigma$ by smooth $b_n,\sigma_n$, $n\ge 1$, consider the corresponding strong solutions $X_n$, $n\ge 1$, and prove that (a) the sequence $(X_n,Z^{(\alpha)})$ is weakly compact in $D(\mathbb{R}^+,\mathbb{R}^d\times\mathbb{R}^d)$; (b) any weak limit point of this sequence is a weak solution to (1.1). We refer to Section 5 in [KK14] for a detailed exposition of this argument in a similar setting.

We have just proved that the solution to the martingale problem $(L,C^2_\infty)$ in $D(\mathbb{R}^+,\mathbb{R}^d\times\mathbb{R}^d)$ both exists and is unique, and corresponds to the unique weak solution to (1.1).
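For later reference, the uniqueness half of the argument can be condensed into two displays. This is only a schematic summary in the notation above, not an additional claim: the first line uses the stochastic continuity of $X$ together with the strong continuity of $P_t$ at $t=0$, and the second uses the uniform convergence $P_{t,\varepsilon}f\to P_tf$ and the approximate harmonicity of $g^T_\varepsilon$.

```latex
\begin{aligned}
E_P f(X_T) &= \lim_{\tau\to 0+} E_P P_\tau f(X_{T-\tau}),\\[2pt]
E_P P_\tau f(X_{T-\tau})
 &= \lim_{\varepsilon\to 0} E_P P_{\tau,\varepsilon} f(X_{T-\tau})
  = \lim_{\varepsilon\to 0} E_P P_{T,\varepsilon} f(X_0)
  = E_P P_T f(X_0)
  = \int_{\mathbb{R}^d} P_T f(x)\,\mu(dx), \qquad \tau\in(0,T).
\end{aligned}
```

The right-hand side depends only on $\mu$, $T$ and $f$, which is why the one-dimensional distributions of any solution $P$ with initial law $\mu$ are determined.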
Now we use the general fact that if a martingale problem in $D(\mathbb{R}^+,\mathbb{R}^d)$ is well posed, then (under a technical measurability assumption, see Theorem 4.2.4 (c), [EK86], which now easily follows by Theorem 4.2.6, [EK86]) its solution $X$ is a strong Markov process; in particular, for every $s<t$
\[
P\big(X_t\in A\,\big|\,\mathcal{F}_s\big)=P_{X_s}(X_{t-s}\in A),\quad A\in\mathcal{B}(\mathbb{R}^d); \tag{4.33}
\]
here we denote by $\mathcal{F}_s$, $s\ge 0$, the natural filtration on $D(\mathbb{R}^+,\mathbb{R}^d)$ and by $P_x$, $x\in\mathbb{R}^d$, the solution to the martingale problem with the initial distribution $\mu(du)=\delta_x(du)$. Denote $P_t(x,dy)=p_t(x,y)\,dy$, $t>0$, $P_0(x,dy)=\delta_x(dy)$, $x\in\mathbb{R}^d$, and observe that by (4.32) and (4.33) applied to $P$ with $P(X_0=x)=1$, for any $f\in C_\infty$:
\[
\int_{\mathbb{R}^d}f(y)P_t(x,dy)=E_Pf(X_t)=E_P\int_{\mathbb{R}^d}f(y)P_{t-s}(X_s,dy)=\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}f(y)P_{t-s}(z,dy)P_s(x,dz).
\]
This proves the Chapman-Kolmogorov equality for $P_t(x,dy)$, $t\ge 0$. Hence $P_t$, $t\ge 0$, is a semigroup of operators in $C_\infty$; recall that we have already proved that it is strongly continuous. Applying (4.32) and (4.33) once again we derive that $P_t(x,dy)$, $t\ge 0$, is the transition probability for $X$, and respectively $P_t$, $t\ge 0$, is the (Feller) semigroup corresponding to this Markov process, and in particular each operator $P_t$, $t\ge 0$, is non-negative. Because $P_x(D(\mathbb{R}^+,\mathbb{R}^d\times\mathbb{R}^d))=1$, $x\in\mathbb{R}^d$, this semigroup is conservative.

$p^0_t(x,y)$

Now when the proof is complete and its structure is clearly visible, we can discuss some other possibilities for the choice of the “zero order approximation” in (3.4). One such choice is (4.1) with $\kappa_t(y)$ equal to an exact solution to (3.19). Under such a choice the formula (4.13) for $\Phi$ becomes even simpler: $b(t,\kappa_t(y))$ changes to $b(\kappa_t(y))$. All the subsequent calculations remain literally the same, with the definition of the kernels $H^{(\lambda)}_t(x,y)$ respectively changed.
A subtle technical point here is that for a given $y$ the exact solution to (3.19) is not unique, and we have to take care to choose the function $\kappa_t(y)$ in a measurable way. This difficulty however is not substantial and can be resolved by using a proper version of the Kuratowski and Ryll-Nardzewski measurable selection theorem, e.g. [SV79], Theorem 12.1.10. Another non-trivial part here is to prove the sub-convolution property for the modified kernels $H^{(\lambda)}_t(x,y)$, but this can be done in the same way as for the original kernels; here one should use an easy consequence of (3.27), namely that the same two-sided inequality holds true for any pair of exact solutions to (2.4) and (3.19).

Similarly, for $k$ such that (2.12) holds one can take $\kappa_t(y)$ equal to the $k$-th Picard type approximation $\theta_{k,t}(y)$ for (3.19). Under such a choice, the measurability issues do not arise, and the same calculation as above can be made, with extra error terms which come from the difference between $\theta_{k,t}(y)$ and $\theta_t(y)$. Due to these terms the index $\zeta$ in the upper bound for the corresponding error term will be changed to $\min\{\zeta,\rho_k/(1-\gamma)\}$.

Hence, there is no visible difference between these two possibilities and the one we used above, if one aims only at the basic properties of the solution to (1.1) (weak existence and uniqueness and the estimates for the transition probability density itself). The situation changes drastically when the sensitivity of the transition probability density is studied: in Remark 5.1 below we indicate the point which makes our choice of $\kappa_t(y)$ apparently the only possible one to use in the study of the derivative $\partial_tp_t(x,y)$.

The proof repeats the general lines of the proof of Theorem 2.6 [KK14].
Hence here we only outline the main steps of the proof and explain in detail the key point which makes it possible to improve the statement of Theorem 2.6 [KK14]; that is, to cover the case of $b(x)$ which is not Lipschitz continuous. We omit all the technicalities for which it is possible to refer to analogous parts of the proof of Theorem 2.6 [KK14].

First, it is a direct calculation to show that $p^0_t(x,y)$ defined by (4.1) is continuously differentiable w.r.t. $t\in(0,\infty)$ and
\[
|\partial_tp^0_t(x,y)|\le Ct^{-1/\alpha'}H^{(0)}_t(x,y),\quad t\in(0,T],\ x,y\in\mathbb{R}^d,
\]
for any $T>0$; cf. Proposition 4.1 [KK14]. Our goal is to extend these properties of $p^0_t(x,y)$ to similar properties of $p_t(x,y)$. We have the integral representation
\[
p_t(x,y)=p^0_t(x,y)+\int_0^t\int_{\mathbb{R}^d}p^0_{t-s}(x,z)\Psi_s(z,y)\,dz\,ds, \tag{5.1}
\]
which is just another form of (3.8); cf. (3.7). Because of the non-integrable singularity $(t-s)^{-1/\alpha'}$ of the upper bound for the function $\partial_tp^0_{t-s}(x,z)$, we cannot take $\partial_t$ at the right hand side of (5.1) directly. Instead, we use the following standard trick (e.g. the proof of Theorem 2.3 in [KM02]). Rewrite (5.1) in the following way:
\[
p_t(x,y)=p^0_t(x,y)+\int_0^{t/2}\int_{\mathbb{R}^d}p^0_{t-s}(x,z)\Psi_s(z,y)\,dz\,ds+\int_0^{t/2}\int_{\mathbb{R}^d}p^0_s(x,z)\Psi_{t-s}(z,y)\,dz\,ds, \tag{5.2}
\]
then we avoid the singularity at $s=t$ related to $p^0_t(x,y)$, but instead we have to establish the differential properties of $\Psi$ with respect to $t$. This can be done in the same fashion as the basic parametrix method. Namely, we will study the differential properties of $\Phi$ and then extend them to the $k$-fold convolutions $\Phi^{\circledast k}$, $k\ge 1$, and their sum $\Psi$. We explain in detail the first step, which contains the most subtle point. In what follows we use the notation from Section 4.1.

Lemma 5.1.

The function $\Phi_t(x,y)$ defined by (3.5) has a continuous derivative $\partial_t\Phi_t(x,y)$ on $(0,\infty)\times\mathbb{R}^d\times\mathbb{R}^d$. In addition, for any $\chi\in(0,\eta\wedge\alpha)$ and $T>0$
\[
|\partial_t\Phi_t(x,y)|\le Ct^{-1/\alpha'}\Big(t^{-1+\chi/\alpha}Q^{(\chi)}_t(x,y)+\big(t^{-1+\chi}+t^{-1+\delta}\big)Q^{(0)}_t(x,y)\Big),\quad t\in(0,T],\ x,y\in\mathbb{R}^d. \tag{5.3}
\]

Proof. We take $\partial_t$ separately for $\Phi^1_t(x,y)$ and $\Phi^2_t(x,y)$; see (4.13). Obviously, both these derivatives are well defined and continuous. Next, we have
\[
\begin{aligned}
|\partial_t\Phi^1_t(x,y)|\le C\big|a(x)-a(y)\big|\bigg\{&\frac{1}{t^{d/\alpha+2}}\left|\big(L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)\right|
+\frac{1}{t^{d/\alpha+1+1/\alpha}}\left|\big(\nabla L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)\right||\partial_t\kappa_t(y)|\\
&+\frac{1}{t^{d/\alpha+2}}\left|\big(\nabla L^{(\alpha)}g^{(\alpha)}\big)\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\right)\right|\left|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right|\bigg\}.
\end{aligned}
\]
Now the estimates are similar to those which we used for $\Phi^1_t(x,y)$ itself. Recall that $\partial_t\kappa_t(y)=-b(t,\kappa_t(y))$, which is bounded. Then by (4.3)–(4.5), (4.8), (4.9), and (4.14) we get
\[
|\partial_t\Phi^1_t(x,y)|\le C\big|a(y)-a(x)\big|\left(\frac{1}{t^{d/\alpha+2}}+\frac{1}{t^{d/\alpha+1+1/\alpha}}\right)G^{(\alpha)}\left(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\right)\le Ct^{-1/\alpha'}\Big(t^{-1+\chi/\alpha}Q^{(\chi)}_t(x,y)+t^{-1+\chi}Q^{(0)}_t(x,y)\Big).
\]
Next,
$$\begin{aligned}|\partial_t\Phi^2_t(x,y)|&\le Ct^{-d/\alpha-1/\alpha-1}\big|b(t,\kappa_t(y))-b(x)\big|\,\Big|(\nabla g^{(\alpha)})\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\Big)\Big|\\&\quad+Ct^{-d/\alpha-1/\alpha-1}\big|b(t,\kappa_t(y))-b(x)\big|\,\Big|(\nabla^2g^{(\alpha)})\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\Big)\Big|\,\Big|\frac{\kappa_t(y)-x}{t^{1/\alpha}}\Big|\\&\quad+Ct^{-d/\alpha-2/\alpha}\big|b(t,\kappa_t(y))-b(x)\big|\,\Big|(\nabla^2g^{(\alpha)})\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\Big)\Big|\,|\partial_t\kappa_t(y)|\\&\quad+Ct^{-d/\alpha-1/\alpha}\Big|(\nabla g^{(\alpha)})\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\Big)\Big|\,\Big|\partial_t\big(b(t,\kappa_t(y))\big)\Big|=:\sum_{j=1}^{4}\Upsilon^j_t(x,y).\end{aligned}$$
By (4.15) and (4.16), we have
$$\Upsilon^1_t(x,y)\le Ct^{-2+\delta}Q^{(0)}_t(x,y).$$
Next, similarly to (4.16) we have
$$\Big(\frac{|\kappa_t(y)-x|}{t^{1/\alpha}}+1\Big)\Big|(\nabla^2g^{(\alpha)})\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}a^{1/\alpha}(y)}\Big)\Big|\le CG^{(\alpha)}\Big(\frac{\kappa_t(y)-x}{t^{1/\alpha}}\Big),$$
see Proposition 4.1. Then using (4.15) and the fact that $\partial_t\kappa_t(y)$ is bounded, we get
$$\Upsilon^2_t(x,y)\le Ct^{-2+\delta}Q^{(0)}_t(x,y),\qquad\Upsilon^3_t(x,y)\le Ct^{-1-1/\alpha+\delta}Q^{(0)}_t(x,y).$$
Finally, we have similarly to (3.21)
$$|\partial_tb(t,x)|\le Ct^{-d/\alpha}\int_{\mathbb{R}^d}|z-x|^{\gamma}\Big(\frac{|z-x|^2}{t^{2/\alpha+1}}+\frac1t\Big)e^{-|z-x|^2/t^{2/\alpha}}\,dz\le Ct^{\gamma/\alpha-1}=Ct^{-2+\delta+1/\alpha}.$$
Since
$$\partial_t\big(b(t,\kappa_t(y))\big)=(\partial_tb)(t,\kappa_t(y))+\big(\nabla b(t,\kappa_t(y)),\partial_t\kappa_t(y)\big)$$
and $\partial_t\kappa_t(y)$ is bounded, we get by (3.21)
$$\Big|\partial_t\big(b(t,\kappa_t(y))\big)\Big|\le C\big(t^{-2+\delta+1/\alpha}+t^{-1+\delta}\big).$$
Then by (4.4), (4.7),
$$\Upsilon^4_t(x,y)\le C\big(t^{-2+\delta}+t^{-1+\delta-1/\alpha}\big)Q^{(0)}_t(x,y).$$
Summarizing the above estimates for $\Upsilon^j_t(x,y)$, $j=1,\dots,4$, we finally get
$$|\partial_t\Phi^2_t(x,y)|\le Ct^{-1/\alpha'}t^{-1+\delta}Q^{(0)}_t(x,y).$$
Combined with the estimate for $\partial_t\Phi^1_t(x,y)$ obtained above, this proves (5.3).

Remark. Now we can emphasize the only, but important, point which makes our current choice of the term $\kappa_t(y)$ in the “zero order approximation” (4.1) better than all the others discussed in Section 4.4 above. In the calculation which corresponds to $\Upsilon^j_t(x,y)$, the following term arises:
$$\big(\nabla b(t,\kappa_t(y)),\partial_t\kappa_t(y)\big).$$
If we take $\kappa_t(y)$ equal to an exact solution to (3.19), a similar term has the form
$$\big(\nabla b(\kappa_t(y)),\partial_t\kappa_t(y)\big),$$
which is ill-defined because $b$ is assumed to be Hölder continuous only. The same problem arises if one takes $\kappa_t(y)$ equal to $\theta_{k,t}(y)$, the $k$-th Picard type iteration for (3.19). This illustrates the point at which the “mollifying of the drift coefficient” procedure, adopted in our construction, is particularly useful: it improves the smoothness property of the drift coefficient without loss of the accuracy of the method.

Now the rest of the proof is completely similar to the proof of Theorem 2.6 [KK14], and we just outline the main steps. Clearly, we can weaken (5.3) by changing $Q^{(\chi)}$, $Q^{(0)}$ therein to $H^{(\chi)}$, $H^{(0)}$. We re-arrange the formula for the $k$-fold convolution power of $\Phi$ in the form similar to (5.2):
$$\Phi^{\circledast k}_t(x,y)=\int_0^{t/2}\int_{\mathbb{R}^d}\Phi^{\circledast(k-1)}_{t-s}(x,z)\Phi_s(z,y)\,dz\,ds+\int_0^{t/2}\int_{\mathbb{R}^d}\Phi^{\circledast(k-1)}_s(x,z)\Phi_{t-s}(z,y)\,dz\,ds.$$
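The re-arrangement of a time convolution over $[0,t/2]$ can be checked by a short computation. The following is only a sketch: $f$ and $g$ are generic placeholder kernels (not notation from the paper), the convolution is assumed to have the time-space form appearing in (5.1), and differentiation under the integral sign is assumed to be justified. Suppose
$$(f\circledast g)_t(x,y)=\int_0^t\int_{\mathbb{R}^d}f_{t-s}(x,z)g_s(z,y)\,dz\,ds.$$
Splitting the time integral at $s=t/2$ and substituting $r=t-s$ in the part over $(t/2,t)$ gives
$$\int_{t/2}^{t}\int_{\mathbb{R}^d}f_{t-s}(x,z)g_s(z,y)\,dz\,ds=\int_0^{t/2}\int_{\mathbb{R}^d}f_r(x,z)g_{t-r}(z,y)\,dz\,dr.$$
In both resulting integrals the kernel that carries the time argument $t-s$ is evaluated at a time not smaller than $t/2$, so applying $\partial_t$ to it does not produce a singularity at the endpoint of integration. Formally, by the Leibniz rule,
$$\partial_t\int_0^{t/2}F(t,s)\,ds=\frac12F\Big(t,\frac t2\Big)+\int_0^{t/2}\partial_tF(t,s)\,ds,$$
so that, besides the integrals containing $\partial_tf_{t-s}$ and $\partial_tg_{t-s}$, each of the two parts contributes the boundary term $\frac12\int_{\mathbb{R}^d}f_{t/2}(x,z)g_{t/2}(z,y)\,dz$.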
Then by induction we show that each $\Phi^{\circledast k}_t(x,y)$ is continuously differentiable in $t$ and for each $T>0$ there exist $C_1$, $C_2$ such that for $t\in(0,T]$
$$|\partial_t\Phi^{\circledast k}_t(x,y)|\le\frac{C_1C_2^k}{\Gamma(k\zeta)}t^{-1/\alpha'}t^{(k-1)\zeta}\Big(t^{-1+\chi/\alpha}H^{(\chi)}_t(x,y)+(t^{-1+\chi}+t^{-1+\delta})H^{(0)}_t(x,y)\Big),\quad k\ge1.$$
Hence the sum $\Psi$ of these convolution powers possesses the same properties: it has a continuous derivative in $t$ and
$$|\partial_t\Psi_t(x,y)|\le Ct^{-1/\alpha'}\Big(t^{-1+\chi/\alpha}H^{(\chi)}_t(x,y)+(t^{-1+\chi}+t^{-1+\delta})H^{(0)}_t(x,y)\Big),\quad t\in(0,T].$$
This allows us to take a derivative $\partial_t$ at the right hand side of (5.2), and to prove that there exists a continuous derivative $\partial_tp_t(x,y)$ with
$$|\partial_tp_t(x,y)-\partial_tp^0_t(x,y)|\le Ct^{-1/\alpha'}\Big(t^{\chi/\alpha}(H^{(\chi)}\circledast H)_t(x,y)+(t^{\chi}+t^{\delta})(H^{(0)}\circledast H)_t(x,y)\Big).$$
Because $H^{(\chi)}\ge H^{(0)}$ and $H^{(\chi)}$, $H^{(0)}$ have the sub-convolution property, we then get
$$|\partial_tp_t(x,y)-\partial_tp^0_t(x,y)|\le Ct^{-1/\alpha'}\Big(t^{\chi/\alpha}H^{(\chi)}_t(x,y)+(t^{\chi}+t^{\delta})H_t(x,y)\Big),$$
which after a simple re-arrangement yields an estimate on $|\partial_tp_t(x,y)-\partial_tp^0_t(x,y)|$ similar to the estimate for $\widetilde R_t(x,y)$ in the assertion of the theorem, with $\widetilde p_t(x,y)$ in the right hand side replaced by $p^0_t(x,y)$. Repeating the “fine tuning” argument from Section 4.2, we complete the proof.

Acknowledgement.

The author thanks R. Schilling, A. Kohatsu-Higa, and K. Bogdan for inspiring discussions and helpful remarks, and A. Pilipenko for bringing the paper [TTW74] to his attention. The DFG Grant Schi 419/8-1 and the joint DFFD-RFFI project No. 09-01-14 are gratefully acknowledged.

References

[BK14] V. Bally, A. Kohatsu-Higa. A probabilistic interpretation of the parametrix method. Ann. Appl. Probab., No. 6 (2015), 3095–3138.
[Ba88] R. F. Bass. Uniqueness in Law for Pure Jump Markov Processes. Probab. Th. Rel. Fields (1988), 271–287.
[BP09] R. F. Bass, E. Perkins. A new technique for proving uniqueness for martingale problems. Astérisque No. 327 (2009), 47–53.
[BJ07] K. Bogdan, T. Jakubowski. Estimates of heat kernel of fractional Laplacian perturbed by gradient operators. Comm. Math. Phys. (2007), 179–198.
[B05] B. Böttcher. A parametrix construction for the fundamental solution of the evolution equation associated with a pseudo-differential operator generating a Markov process. Math. Nachr. (2005), 1235–1241.
[Bo08] B. Böttcher. Construction of time inhomogeneous Markov processes via evolution equations using pseudo-differential operators. J. London Math. Soc. (2008), 605–621.
[CSZ15] Z.-Q. Chen, R. Song, X. Zhang. Stochastic flows for Lévy process with Hölder drifts. Preprint 2015. Available at http://arxiv.org/pdf/1501.04758.pdf
[CW13] Z.-Q. Chen, L. Wang. Uniqueness of stable processes with drift. Preprint 2013. Available at http://arxiv.org/pdf/1309.6414v1.pdf
[CZ13] Z.-Q. Chen, X. Zhang. Heat kernels and analyticity of non-symmetric jump diffusion semigroups. Available at http://arxiv.org/abs/1306.5015
[DF13] A. Debussche, N. Fournier. Existence of densities for stable-like driven SDE’s with Hölder continuous coefficients. J. Funct. Anal. 264 (8) (2013), 1757–1778.
[ED81] Ja. M. Drin’, S. D. Eidelman. Construction and investigation of classical fundamental solution of the Cauchy problem for uniformly parabolic pseudo-differential equations. (Russian) Mat. Issled. (1981), 18–33.
[EIK04] S. D. Eidelman, S. D. Ivasyshen, A. N. Kochubei. Analytic Methods in the Theory of Differential and Pseudo-Differential Equations of Parabolic Type. Birkhäuser, Basel, 2004.
[EK86] S. N. Ethier, T. G. Kurtz. Markov Processes: Characterization and Convergence. Wiley, New York, 1986.
[Fe36] W. Feller. Zur Theorie der stochastischen Prozesse. (Existenz- und Eindeutigkeitssätze). Math. Ann. (1936), 113–160.
[Fr64] A. Friedman. Partial differential equations of parabolic type. Prentice-Hall, New York, 1964.
[GK14] Iu. Ganychenko, A. Kulik. Rates of approximation of nonsmooth integral-type functionals of Markov processes. Modern Stochastics: Theory and Applications (2014), 117–126.
[GL08] E. Gobet, C. Labart. Sharp estimates for the convergence of the density of the Euler scheme in small time. Elect. Comm. in Probab. (2008), 352–363.
[Ho94] W. Hoh. The martingale problem for a class of pseudo differential operators. Math. Ann. (1994), 121–147.
[Ho95] W. Hoh. Pseudodifferential operators with negative definite symbols and the martingale problem. Stoch. Stoch. Rep. 55 (3-4) (1995), 225–252.
[Ho98a] W. Hoh. Pseudo differential operators generating Markov processes. Habilitationsschrift, Bielefeld, 1998.
[Ho98b] W. Hoh. A symbolic calculus for pseudo differential operators generating Feller semigroups. Osaka Math. J. (1998), 789–820.
[H15] L. Huang. Density estimates for SDEs driven by tempered stable processes. Preprint 2015. Available at http://arXiv:1504.04183
[HM14] L. Huang, S. Menozzi. A parametrix approach for some degenerate stable driven SDEs. Preprint 2014. Available at http://arXiv:1402.3997
[IW81] N. Ikeda, S. Watanabe. Stochastic Differential Equations and Diffusion Processes. Kodansha, Tokyo, 1981.
[Iw77] Ch. Iwasaki (Tsutsumi). The fundamental solution for pseudo-differential operators of parabolic type. Osaka Math. J. (1977), 569–592.
[Ja94] N. Jacob. A class of Feller semigroups generated by pseudo differential operators. Math. Z. (1994), 151–166.
[Ja96] N. Jacob. Pseudo-differential operators and Markov processes. Akademie Verlag, 1996.
[Ja01] N. Jacob. Pseudo differential operators and Markov processes, I: Fourier analysis and semigroups. Imperial College Press, London, 2001.
[Ja02] N. Jacob. Pseudo differential operators and Markov processes, II: Generators and their potential theory. Imperial College Press, London, 2002.
[Ja05] N. Jacob. Pseudo differential operators and Markov processes, III: Markov Processes and Applications. Imperial College Press, London, 2005.
[KS14] P. Kim, R. Song. Stable process with singular drift. Stoch. Proc. Appl. (2014), 2479–2516.
[KK14] V. Knopova, A. Kulik. Parametrix construction of the transition probability density of the solution to an SDE driven by α-stable noise. Preprint 2014. Available at http://arxiv.org/abs/1412.8732.
[Ko89] A. N. Kochubei. Parabolic pseudo-differential equations, hypersingular integrals and Markov processes. Math. USSR Izvestija (1989), 233–259.
[KMN14] A. Kohatsu-Higa, A. Makhlouf, H. L. Ngo. Approximations of non-smooth integral type functionals of one dimensional diffusion processes. Stochastic Processes and their Applications (2014), issue 5, 1881–1909.
[Ko00] V. N. Kolokoltsov. Symmetric Stable Laws and Stable-like Jump-Diffusions. Proc. London Math. Soc. (2000), 725–768.
[Ko84a] T. Komatsu. On the martingale problem for generators of stable processes with perturbations. Markov processes associated with certain integro-differential operators. Osaka J. Math. (1984), 113–132.
[Ko84b] T. Komatsu. Pseudo-differential operators and Markov processes. J. Math. Soc. Japan (1984), 387–418.
[KM02] V. Konakov, E. Mammen. Edgeworth type expansions for Euler schemes for stochastic differential equations. Monte Carlo Methods Appl. 8 (3) (2002), 271–285.
[KM11] V. Konakov, S. Menozzi. Weak error for stable driven SDEs: expansion of the densities. Journal of Theoretical Probability, No. 2 (2011), 454–478.
[Ku81] H. Kumano-go. Pseudo-differential operators. MIT Press, Cambridge, Mass., 1981.
[Le1907] E. E. Levi. Sulle equazioni lineari totalmente ellittiche alle derivate parziali. Rend. del Circ. Mat. 24 (1) (1907), 275–317.
[M11] S. Menozzi. Parametrix techniques and martingale problems for some degenerate Kolmogorov equations. Electronic Communications in Probability (2011), 234–250.
[MP92a] R. Mikulevicius, H. Pragarauskas. On the martingale problem associated with non-degenerate Lévy operators. Lith. Math. J. 32 (3) (1992), 297–311.
[MP92b] R. Mikulevicius, H. Pragarauskas. On the Cauchy problem for certain integro-differential operators in Sobolev and Hölder spaces. Lith. Math. J. 32 (2) (1992), 238–264.
[MP12] R. Mikulevicius, H. Pragarauskas. On the Cauchy problem for integro-differential operators in Hölder classes and the uniqueness of the martingale problem. Potential Anal. (2014), 539–563.
[Po94] N. I. Portenko. Some perturbations of drift-type for symmetric stable processes. Random Oper. Stoch. Equ. (1994), 211–224.
[PP95] S. I. Podolynny, N. I. Portenko. On multidimensional stable processes with locally unbounded drift. Random Oper. Stoch. Equ. (1995), 113–124.
[SV79] D. W. Stroock, S. R. S. Varadhan. Multidimensional diffusion processes. Springer, Berlin, 1979.
[TTW74] H. Tanaka, M. Tsuchiya, S. Watanabe. Perturbation of drift-type for Lévy processes. J. Math. Kyoto Univ. (1974), 73–92.
[Ts74] Ch. Tsutsumi. The fundamental solution for a degenerate parabolic pseudo-differential operator. Proc. Japan Acad. 50