On inference for fractional differential equations
ALEXANDRA CHRONOPOULOU AND SAMY TINDEL
Abstract.
Based on Malliavin calculus tools and approximation results, we show how to compute a maximum likelihood type estimator for a rather general differential equation driven by a fractional Brownian motion with Hurst parameter $H > 1/2$. Rates of convergence for the approximation task are provided, and numerical experiments show that our procedure leads to good results in terms of estimation.

1. Introduction
In this introduction, we first try to motivate our problem and outline our results. We also argue that only a part of the question can be dealt with in a single paper. We briefly sketch a possible program for the remaining tasks in a second part of the introduction.

1.1. Motivations and outline of the results.
The inference problem for diffusion processes is now fairly well understood. In particular, during the last two decades, several advances have allowed to tackle the problem of inference based on discretely observed diffusions [10, 36, 40], which is of special practical interest. More specifically, consider a family of stochastic differential equations of the form
$$Y_t = a + \int_0^t \mu(Y_s;\theta)\,ds + \sum_{l=1}^{d} \int_0^t \sigma_l(Y_s;\theta)\,dB^l_s, \qquad t\in[0,T], \quad (1)$$
where $a\in\mathbb{R}^m$, $\mu(\cdot\,;\theta):\mathbb{R}^m\to\mathbb{R}^m$ and $\sigma(\cdot\,;\theta):\mathbb{R}^m\to\mathbb{R}^{m,d}$ are smooth enough functions, $B$ is a $d$-dimensional Brownian motion and $\theta$ is a parameter varying in a subset $\Theta\subset\mathbb{R}^q$. If one wishes to identify $\theta$ from a set of discrete observations of $Y$, most of the methods which can be found in the literature are based on (or are closely linked to) the maximum likelihood principle. Indeed, if $B$ is a Brownian motion and $Y$ is observed at some equally distant instants $t_i = i\tau$ for $i=0,\ldots,n$, then the log-likelihood of a sample $(Y_{t_0},\ldots,Y_{t_n})$ can be expressed as
$$\ell_n(\theta) = \sum_{i=1}^{n} \ln\big( p(\tau, Y_{t_{i-1}}, Y_{t_i};\theta) \big), \quad (2)$$

Date: June 12, 2018.
2010 Mathematics Subject Classification.
Primary 60H35; Secondary 60H07, 60H10, 65C30, 62M09.
Key words and phrases.
Fractional Brownian motion, Stochastic differential equations, Malliavin calculus, Inference for stochastic processes.

S. Tindel is partially supported by the ANR grant ECRU. Both authors are part of the BIGS (Biology, Genetics and Statistics) team at INRIA Nancy.

where $p$ stands for the transition semi-group of the diffusion $Y$. If $Y$ enjoys some ergodic properties, with invariant measure $\nu_\theta$ under $P_\theta$, then we get
$$\text{a.s.-}\lim_{n\to\infty} \frac{1}{n}\,\ell_n(\theta) = E_{\theta_0}\big[\ln p(\tau, Z_0, Z_1;\theta)\big] =: J_{\theta_0}(\theta), \quad (3)$$
where $Z_0\sim\nu_{\theta_0}$ and $\mathcal{L}(Z_1\,|\,Z_0) = p(\tau, Z_0,\cdot\,;\theta_0)$. Furthermore, it can be shown in a general context that $\theta\mapsto J_{\theta_0}(\theta)$ admits a maximum at $\theta=\theta_0$. This opens the way to a MLE analysis which is similar to the one performed in the case of i.i.d. observations, at least theoretically.

However, in many interesting cases, the transition semi-group $p$ is not amenable to explicit computations, and thus expression (2) has to be approximated in some sense. The most common approach, advocated for instance in [36], is based on a linearization of each $p(\tau, Y_{t_{i-1}}, Y_{t_i};\theta)$, which transforms it into a Gaussian density $\mathcal{N}\big(Y_{t_{i-1}} + \mu(Y_{t_{i-1}};\theta)\,\tau,\ \sigma\sigma^*(Y_{t_{i-1}};\theta)\,\tau\big)$. This linearization procedure is equivalent to the approximation of equation (1) by an Euler (first order) numerical scheme. Refinements of this procedure, based on Milstein type discretizations, are proposed in [10].

Some special situations can be treated differently (and often more efficiently): for instance, in case of a constant diffusion coefficient, the continuous time likelihood can be computed explicitly by means of Girsanov’s theorem. When the dimension of the driving Brownian motion $B$ is $d=1$, one can also apply Itô’s formula in order to go back to an equation with constant diffusion coefficient, or use the Doss–Sussmann representation of solutions to (1).
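As an illustration of the linearization just described, here is a minimal sketch of the Euler-type Gaussian approximation of the log-likelihood (2), for a hypothetical one-dimensional model with mean-reverting drift $\mu(y;\theta)=-\theta y$ and constant diffusion; the function name and the toy data are our own choices, not the paper's.

```python
import math

def euler_log_likelihood(y, tau, mu, sigma2, theta):
    """Pseudo log-likelihood (2) with each transition density replaced by the
    Gaussian N(y_{i-1} + mu(y_{i-1}; theta) * tau, sigma2 * tau) coming from a
    first-order Euler linearization of the diffusion."""
    ll = 0.0
    for y_prev, y_next in zip(y[:-1], y[1:]):
        mean = y_prev + mu(y_prev, theta) * tau
        var = sigma2 * tau
        ll += -0.5 * math.log(2 * math.pi * var) - (y_next - mean) ** 2 / (2 * var)
    return ll

# Toy example: hypothetical drift mu(y; theta) = -theta * y, unit diffusion.
obs = [1.0, 0.9, 0.85, 0.7, 0.68]
ll_true = euler_log_likelihood(obs, 0.1, lambda y, th: -th * y, 1.0, 1.0)
```

One would then maximize `ll_true` over `theta` with any scalar optimizer; this is the Brownian baseline against which the fractional case below is harder.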
Let us also mention that statistical inference for SDEs driven by Lévy processes is currently intensively investigated, with financial motivations in mind.

The current article is concerned with the estimation problem for equations of the form (1), when the driving process $B$ is a fractional Brownian motion. Let us recall that a fractional Brownian motion $B$ with Hurst parameter $H\in(0,1)$, defined on a complete probability space $(\Omega,\mathcal{F},P)$, is a $d$-dimensional centered Gaussian process. Its law is thus characterized by its covariance function, which is given by
$$E\big[B^i_t B^j_s\big] = \frac{1}{2}\big(t^{2H} + s^{2H} - |t-s|^{2H}\big)\,\mathbf{1}_{(i=j)}, \qquad s,t\in\mathbb{R}_+.$$
The variance of the increments of $B$ is then given by
$$E\big[\big(B^i_t - B^i_s\big)^2\big] = |t-s|^{2H}, \qquad s,t\in\mathbb{R}_+,\quad i=1,\ldots,d,$$
and this implies that almost surely the fBm paths are $\gamma$-Hölder continuous for any $\gamma<H$. Furthermore, for $H=1/2$, fBm coincides with the usual Brownian motion, converting the family $\{B^H;\ H\in(0,1)\}$ into the most natural generalization of this classical process.

In the last decade, some important advances have allowed to solve [33, 43] and understand [19, 34] differential systems driven by fBm for $H\in(1/2,1)$. The rough paths machinery also allows to handle fBm with $H\in(1/3,1/2]$, as nicely explained in [11, 14, 27, 29]. However, the irregular situation $H\in(1/3,1/2]$ is not amenable to useful moment estimates for the solution $Y$ to (1) together with its Jacobian (that is, the derivative with respect to the initial condition). This is why we concentrate, in the sequel, on the simpler case $H>1/2$ for our estimation problem. In any case, many real world noisy systems are currently modeled by equations like (1) driven by fBm, and this is particularly present in the Biophysics literature, as assessed by [25, 35], or for Finance oriented applications as in [5, 13, 20, 21, 39, 42].
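Since the law of fBm is characterized by the covariance above, a path on a finite grid can be sampled exactly via a Cholesky factorization of that covariance matrix; the following sketch (function name and parameters are ours, not the paper's) does exactly this.

```python
import numpy as np

def fbm_cholesky(n, H, T=1.0, rng=None):
    """Sample one fBm path at times t_k = k*T/n, k = 1..n, by factorizing the
    covariance R_H(t, s) = (t^{2H} + s^{2H} - |t - s|^{2H}) / 2."""
    rng = np.random.default_rng(rng)
    t = np.linspace(T / n, T, n)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(cov)          # exact Gaussian sampling on the grid
    return t, L @ rng.standard_normal(n)
```

As a sanity check, the empirical variance of $B_T$ over many sampled paths should be close to $T^{2H}$, in agreement with the increment variance formula above.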
This leads to a demand for rigorous estimation procedures for SDEs driven by fractional Brownian motion, which is the object of our paper.

Concerns about the inference problem for fractional diffusion processes started a decade ago with the analysis of fractional Ornstein-Uhlenbeck processes [23]. A more recent representative set of references on the topic includes [37, 41]. More specifically, [41] handles the case of a one-dimensional equation of the form
$$Y_t = a + \theta \int_0^t \mu(Y_s)\,ds + B_t, \qquad t\in[0,T], \quad (4)$$
where $\mu$ is regular enough, and where $B$ is a fBm with $H\in(0,1)$. The simple dependence on the parameter $\theta$ and the fact that an additive noise is considered enable the use of Girsanov’s transform in order to get an exact expression for the MLE. Convergence of the estimator is then obtained through an extensive use of Malliavin calculus.

As far as [37] is concerned, it is focused on the case of a polynomial equation, for which the exact moments of the solution can be computed. The estimator then relies on a generalization of the moment method, which tries to fit empirical moments of the solution with their theoretical value. The range of application of this method is however confined to specific situations, for the following reasons:

• It assumes that $N$ independent runs of equation (1) can be obtained, which is usually not the case.
• It hinges on multiple integral computations, which are time consuming and are avoided in most numerical schemes.

As can be seen from this brief review, parameter estimation for rough equations is still in its infancy. We shall also argue that it is a hard problem. Indeed, if one wishes to transpose the MLE methods used for diffusion processes to the fBm context, an equivalent of the log-likelihood functions (2) should first be produced. But the covariance structure of $B$ is quite complex, and the attempts to put the law of $Y$ defined by (1) into a semigroup setting are cumbersome, as illustrated by [1, 17, 31].
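Equation (4) is straightforward to simulate once an fBm path is available, e.g. with a first order Euler scheme. The sketch below (our own toy setup, not code from the paper) samples the driving fBm exactly on the grid by Cholesky factorization and then iterates the scheme.

```python
import numpy as np

def simulate_eq4(a, theta, mu, H, n, T=1.0, rng=None):
    """Euler scheme for Y_t = a + theta * int_0^t mu(Y_s) ds + B_t, where B is
    a one-dimensional fBm sampled exactly on the grid t_k = k*T/n."""
    rng = np.random.default_rng(rng)
    dt = T / n
    t = np.linspace(dt, T, n)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    B = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    dB = np.diff(np.concatenate(([0.0], B)))   # fBm increments on the grid
    Y = np.empty(n + 1)
    Y[0] = a
    for k in range(n):
        Y[k + 1] = Y[k] + theta * mu(Y[k]) * dt + dB[k]
    return Y
```

Such simulated paths are the raw material for any Monte Carlo based approximation of likelihood-type quantities in the fractional setting.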
We have thus decided to consider a highly simplified version of the log-likelihood. Namely, still assuming that $Y$ is observed at a discrete set of instants $0 < t_1 < \cdots < t_n < \infty$, set
$$\ell_n(\theta) = \sum_{i=1}^{n} \ln\big( f(t_i, Y_{t_i};\theta) \big), \quad (5)$$
where we suppose that under $P_\theta$ the random variable $Y_{t_i}$ admits a density $z\mapsto f(t_i,z;\theta)$. Notice that in case of an elliptic diffusion coefficient $\sigma$ the density $f(t_i,\cdot\,;\theta)$ is strictly positive, and thus expression (5) makes sense by a straightforward application of [11, Proposition 19.6]. However, the successful replication of the strategy implemented for Brownian diffusions (that we have tried to summarize above) relies on some highly nontrivial questions: existence of an invariant measure for equation (1), rate of convergence to this invariant measure, convergence of expressions like (5), characterization of the limit in terms of $\theta$ as in (3), to mention just a few. We shall come back to these considerations in the next section, but let us insist at this point on the fact that all those questions would fit into a research program over several years.

Our aim in this paper is in a sense simpler: we assume that quantities like (5) are meaningful for estimation purposes. Then we implement a method which enables us to compute $\ell_n(\theta)$ and optimize it in $\theta$, thus producing a pseudo MLE estimator. We focused first on this specific aspect of the problem for the following reasons:

(1) From a statistical point of view, it is obviously essential to obtain a computationally efficient estimation procedure. This will allow us for instance to evaluate numerically the accuracy of our method.

(2) The procedure itself is nontrivial, and requires the use of advanced stochastic analysis tools: probabilistic representation of the density, Malliavin type integration by parts, Stratonovich-Skorohod correction terms, and discretization of systems of pathwise stochastic differential equations, for instance.

We have thus decided to tackle the implementation issues first. If it turns out to be satisfying, we shall then try to proceed to a full justification of our method.

Let us also mention that it might not be clear to the reader that $\ell_n(\theta)$ can be meaningful in terms of statistical estimation, since it only involves evaluations at single points $Y_{t_i}$. However, our numerical experiments indicate that this quantity behaves nicely for our purposes. Moreover, it will become clear from the forthcoming computations that our methodology can be extended to handle quantities like
$$\tilde\ell_n(\theta) := \sum_{i=1}^{n} \ln\big( f(t_i, t_{i+1}, Y_{t_i}, Y_{t_{i+1}};\theta) \big),$$
where $f(s,t,x,z;\theta)$ stands for the density of the couple $(Y_s, Y_t)$. This kind of pseudo log-likelihood is obviously closer in spirit to the diffusion case. Densities of tuples could also be considered at the price of technical complications.

Let us now try to give a flavor of the kind of result we shall obtain in this article, in a very loose form:

Theorem 1.1.
Consider equation (1) driven by a $d$-dimensional fractional Brownian motion $B$ with Hurst parameter $H>1/2$. Assume $\mu$ and $\sigma$ are smooth enough coefficients, and that $\sigma\sigma^*$ is strictly elliptic. For a sequence of times $t_1 < \cdots < t_n < \infty$, let $y_{t_i}$, $i=1,\ldots,n$, be the corresponding observations. Then:

(i) The gradient of the log-likelihood function admits the following probabilistic representation: $\nabla_l \ell_n(\theta) = \sum_{i=1}^n V_i(\theta)/W_i(\theta)$, with
$$W_i(\theta) = E\Big[ \mathbf{1}_{(Y_{t_i}(\theta) > y_{t_i})}\, H_{(1,\ldots,m)}\big( Y_{t_i}(\theta) \big) \Big], \quad (6)$$
where $H_{(j_1,\ldots,j_n)}(Y_{t_i}(\theta))$ is an expression involving Malliavin derivatives and Skorohod integrals of $Y(\theta)$. A similar expression is also available for $V_i(\theta)$.

(ii) A computational procedure is constructed in order to obtain $H_{(1,\ldots,m)}(Y_{t_i}(\theta))$ in a suitable way.

(iii) When $Y_t$ is replaced by its Euler scheme approximation with step $T/M$, and expected values in (6) are approximated thanks to $N$ Monte Carlo steps, we show that
• $N$ can be chosen as a function of $M$ in an optimal way (see Proposition 4.7).
• The corresponding approximation of $\nabla_l \ell_n(\theta)$ converges to the real one with rate $n^{-(2\gamma-1)}$ for any $1/2 < \gamma < H$.

All those results are stated in a more rigorous way in the remainder of the article. Here is how our article is structured: we give some preliminaries and notations on Young and Malliavin calculus for fractional Brownian motion in Section 2. The probabilistic representation for the log-likelihood is given in Section 3. Discretization procedures are designed in Section 4, and finally numerical examples are given in Section 5.
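To make the Euler-plus-Monte-Carlo approximation of item (iii) concrete, the following naive one-dimensional sketch evaluates the pseudo log-likelihood (5), replacing the Malliavin-calculus representation of the density used in the paper by a simple Gaussian kernel density estimate over $N$ simulated paths. All names, coefficients and parameter choices here are ours, purely for illustration.

```python
import numpy as np

def pseudo_loglik(obs_times, obs_vals, theta, a, mu, sigma, H,
                  M=200, N=500, T=1.0, bandwidth=0.1, rng=None):
    """Naive Monte Carlo evaluation of (5): simulate N Euler paths with step
    T/M of dY = mu(Y; theta) dt + sigma(Y; theta) dB (B an fBm sampled exactly
    on the grid), then estimate each marginal density f(t_i, .; theta) with a
    Gaussian kernel over the simulated values."""
    rng = np.random.default_rng(rng)
    dt = T / M
    t = np.linspace(dt, T, M)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(cov)
    idx = [int(round(s / dt)) for s in obs_times]   # grid index of each t_i
    samples = np.empty((N, len(idx)))
    for run in range(N):
        dB = np.diff(np.concatenate(([0.0], L @ rng.standard_normal(M))))
        path = np.empty(M + 1)
        path[0] = a
        for k in range(M):
            path[k + 1] = (path[k] + mu(path[k], theta) * dt
                           + sigma(path[k], theta) * dB[k])
        samples[run] = path[idx]
    ll = 0.0
    for j, y in enumerate(obs_vals):
        dens = np.mean(np.exp(-(samples[:, j] - y) ** 2
                              / (2 * bandwidth ** 2))) \
               / (np.sqrt(2 * np.pi) * bandwidth)
        ll += np.log(dens + 1e-300)
    return ll
```

This crude density estimate is only a stand-in: the paper's point is precisely that the Malliavin-based representation of $f(t_i,\cdot\,;\theta)$ behaves much better than kernel smoothing for this purpose.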
Remaining open problems.
We emphasized above the fact that only a part of the problem at stake was going to be solved in the current article. We now briefly sketch the remaining tasks to be treated.

The most important obstacle in order to fully justify our methodology is to get a suitable convergence theorem for $\ell_n(\theta)/n$, where $\ell_n(\theta)$ is defined by (5). In a natural way, this should be based on some strong ergodicity properties for $Y_t$. After a glance at the literature on ergodicity for fractional systems, one can distinguish two cases:

(i) When $\sigma(\cdot\,;\theta)$ is constant, the convergence of $\mathcal{L}(Y_t)$ as $t\to\infty$ is established in [15], with an explicit (presumably non optimal) polynomial rate of convergence.

(ii) For a general smooth and elliptic coefficient $\sigma$, only the uniqueness of the invariant measure is shown in [17], with an interesting extension to the hypoelliptic case in [18]. Nothing is known about the convergence of $\mathcal{L}(Y_t)$, not to mention rates.

This brief review already indicates that the convergence to invariant measures is still quite mysterious for fractional differential equations, at least for a non constant coefficient $\sigma$. Moreover, recall that if $\nu(\theta)$ stands for the invariant measure corresponding to the system with coefficients $\mu(\cdot\,;\theta)$, $\sigma(\cdot\,;\theta)$, we also wish to retrieve some information on the dependence $\theta\mapsto\nu(\theta)$ (see [16] for some partial results in this direction).

Let us mention another concrete problem: even in the case of a constant $\sigma$, the convergence of $\mathcal{L}(Y_t)$ to an invariant measure $\nu(\theta)$ is proven in [15] in the total variation sense. In terms of the density $p(t,x;\theta)$ of $Y_t$, it means that $p(t,\cdot\,;\theta)$ converges to the density of $\nu$ in the $L^1$ topology.
However, in order to get a limit for $\ell_n(\theta)/n$, one expects to use at least a convergence in some Sobolev space $W^{\alpha,p}$ for $\alpha, p$ large enough.

One possibility in order to get this sharper convergence is to first bound the density $p(t,\cdot\,;\theta)$ in another Sobolev space $W^{\alpha',p'}$ and then to use interpolation theory. It thus seems sufficient to obtain Gaussian bounds on $p(t,\cdot\,;\theta)$, uniformly in $t$. In the case of Brownian diffusions, these Gaussian bounds are obtained by analytic tools, thanks to the Markov property. This method being obviously not available for systems driven by fBm, a possible inspiration is contained in the upper Gaussian bounds for the stochastic wave equation which can be found in [6]. The latter technical results stem from an intensive use of Malliavin calculus, which should also be invoked in our case; notice the recent efforts [2, 3] in this direction.

Finally, let us mention that it seems possible to produce some reasonable convergent parametric estimators for equations driven by fBm in a rather general context. Among the methods which can be adapted from the diffusion case with the current stochastic analysis techniques, let us mention the least squares estimator of [22], as well as the local asymptotic normality property shown in [12]. However, it seems obvious that the road to a complete picture of parameter estimation for stochastic equations driven by fBm is still hard and long. We hope to complete it in some subsequent communications.

2. Preliminaries and notations
As mentioned in the introduction, we are concerned with equations driven by a $d$-dimensional fractional Brownian motion $B$. We recall here some basic facts about the way to solve those equations, and some Malliavin calculus tools which will be needed later on. Let us first introduce some general notation for Hölder type spaces:

Notation 2.1.
We will denote by $C^\alpha(V)$ the set of $V$-valued $\alpha$-Hölder functions, for any $\alpha\in(0,1)$, and by $C_b^n(U;V)$ the set of $n$ times differentiable functions, bounded together with all their derivatives, from $U$ to $V$. In the previous notation, $U$ and $V$ stand for two finite dimensional vector spaces. The state space $V$ can be omitted for notational sake when its value is non ambiguous. When we want to stress the fact that we are working on a finite interval $[0,T]$, we write $C^\alpha_T(V)$ for the space of $\alpha$-Hölder functions $f$ from $[0,T]$ to $V$. The corresponding Hölder norms shall be denoted by $\|f\|_{\alpha,T}$.

2.1. Differential equations driven by fBm.
Recall that the equation we are interested in is of the form (1). Before stating the assumptions on our coefficients we need an additional notation:
Notation 2.2.
For $n,p\ge 1$, a function $f\in C^p(\mathbb{R}^n;\mathbb{R})$ and any tuple $(i_1,\ldots,i_p)\in\{1,\ldots,n\}^p$, we write $\partial_{i_1\ldots i_p} f$ for $\frac{\partial^p f}{\partial x_{i_1}\cdots\partial x_{i_p}}$. Similarly, consider a function $g_\theta\in C^p(\Theta;\mathbb{R}^n)$, for $n,p\ge 1$ and a vector of parameters $\theta\in\Theta\subset\mathbb{R}^q$. For any tuple $(i_1,\ldots,i_p)\in\{1,\ldots,q\}^p$, we write $\nabla_{i_1\ldots i_p}\, g^i_\theta$ for $\frac{\partial^p g^i_\theta}{\partial\theta_{i_1}\cdots\partial\theta_{i_p}}$, where $i=1,\ldots,n$.

Using this notation, we work under the following set of assumptions:
Hypothesis 2.3.
For any $\theta\in\Theta$, we assume that $\mu(\cdot\,;\theta):\mathbb{R}^m\to\mathbb{R}^m$ and $\sigma(\cdot\,;\theta):\mathbb{R}^m\to\mathbb{R}^{m,d}$ are $C_b^3$ coefficients. Furthermore, we have
$$\sup_{\theta\in\Theta}\ \sum_{l=0}^{2}\ \sum_{1\le i_1,\ldots,i_l\le q} \big\|\nabla^l_{i_1\cdots i_l}\,\mu(\cdot\,;\theta)\big\|_\infty + \big\|\nabla^l_{i_1\cdots i_l}\,\sigma(\cdot\,;\theta)\big\|_\infty < \infty.$$

When equation (1) is driven by a fBm with Hurst parameter
$H>1/2$, it can be solved, thanks to a fixed point argument, with the stochastic integral interpreted in the (pathwise) Young sense (see e.g. [14]). Let us recall that Young’s integral can be defined in the following way:

Proposition 2.4.
Let $f\in C^\gamma$, $g\in C^\kappa$ with $\gamma+\kappa>1$, and $0\le s\le t\le T$. Then the integral $\int_s^t g_\xi\,df_\xi$ is well-defined as a limit of Riemann sums along partitions of $[s,t]$. Moreover, the following estimate is fulfilled:
$$\Big|\int_s^t g_\xi\,df_\xi\Big| \le C\,\|f\|_\gamma\,\|g\|_\kappa\,|t-s|^\gamma, \quad (7)$$
where the constant $C$ only depends on $\gamma$ and $\kappa$. A sharper estimate is also available:
$$\Big|\int_s^t g_\xi\,df_\xi\Big| \le |g_s|\,\|f\|_\gamma\,|t-s|^\gamma + c_{\gamma,\kappa}\,\|f\|_\gamma\,\|g\|_\kappa\,|t-s|^{\gamma+\kappa}. \quad (8)$$

With this definition in mind and under Hypothesis 2.3, we can solve our differential system of interest, and the following moment bounds are proven in [11, 19]:

Proposition 2.5.
Consider a fBm $B$ with Hurst parameter $H>1/2$. Then:

(1) Under Hypothesis 2.3, equation (1) driven by $B$ admits a unique $\beta$-Hölder continuous solution $Y$, for any $\beta<H$.

(2) Furthermore, $\|Y\|_{T,\beta} \le |a| + c_{f,T}\,\|B\|_{\beta,T}^{1/\beta}$.

(3) If we denote by $Y^a$ the solution to (1) with initial condition $a$, then $\|Y^b - Y^a\|_{T,\beta} \le |b-a|\,\exp\big(c_{f,T}\,\|B\|_{\beta,T}^{1/\beta}\big)$.

(4) If we only assume that $f$ has linear growth, with $\nabla f$, $\nabla^2 f$ bounded, the following estimate holds true: $\sup_{t\in[0,T]} |Y_t| \le (1+|a|)\,\exp\big(c_{f,T}\,\|B\|_{\beta,T}^{1/\beta}\big)$.

Remark 2.6. The framework of fractional integrals is used in [19] in order to define integrals with respect to $B$. It is however easily seen to be equivalent to the Young setting we have chosen to work with.

Some differential calculus rules for processes controlled by fBm will also be useful in the sequel:

Proposition 2.7.
Let $B$ be a $d$-dimensional fBm with Hurst parameter $H>1/2$. Consider $a,\hat a\in\mathbb{R}$, $b,\hat b\in C^\alpha_T(\mathbb{R}^d)$ with $\alpha+H>1$, and $c,\hat c\in C_T(\mathbb{R})$ (all these assumptions are understood in the almost sure sense). Define two processes $z,\hat z$ on $[0,T]$ by
$$z_t = a + \sum_{j=1}^d \int_0^t b^j_u\,dB^j_u + \int_0^t c_u\,du, \qquad \hat z_t = \hat a + \sum_{j=1}^d \int_0^t \hat b^j_u\,dB^j_u + \int_0^t \hat c_u\,du.$$
Then for $t\in[0,T]$, one can decompose the product $z_t\hat z_t$ into
$$z_t\hat z_t = a\hat a + \sum_{j=1}^d \int_0^t \big[\hat z_u\, b^j_u + z_u\,\hat b^j_u\big]\,dB^j_u + \int_0^t \big[z_u\,\hat c_u + \hat z_u\, c_u\big]\,du,$$
where all the integrals with respect to $B$ are understood in the Young sense.

The proof of this elementary and classical result is omitted here. See [28, Proposition 2.8] for the proof of a similar rule.
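Both the Riemann-sum definition of the Young integral (Proposition 2.4) and the product rule of Proposition 2.7 can be checked numerically on a smooth toy case; the grid and integrands below are our own illustrative choices.

```python
import numpy as np

def young_integral(t, g, f):
    """Left-point Riemann sum approximation of the Young integral int g df
    over the grid t: sum_k g(t_k) * (f(t_{k+1}) - f(t_k))."""
    return float(np.sum(g[:-1] * np.diff(f)))

# Smooth sanity check: int_0^1 t d(t^2) = int_0^1 2 t^2 dt = 2/3.
t = np.linspace(0.0, 1.0, 20001)
approx = young_integral(t, t, t**2)

# Product rule of Proposition 2.7 with z_t = int_0^t u dF_u and F_t = t^2:
# z_t^2 should equal 2 * int_0^t z_u u dF_u, with no Ito-type correction.
z = np.cumsum(np.concatenate(([0.0], t[:-1] * np.diff(t**2))))
lhs = z[-1] ** 2
rhs = young_integral(t, 2.0 * z * t, t**2)
```

The absence of a quadratic-variation correction in `rhs` is exactly what distinguishes the Young/pathwise calculus used here from Itô calculus.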
2.2. Malliavin calculus techniques.
Our representation of the density for the solution to (1) obviously relies on Malliavin calculus tools that we now proceed to recall. As already mentioned in the introduction, on a finite interval $[0,T]$ and for some fixed $H\in(1/2,1)$, we consider $(\Omega,\mathcal{F},P)$ the canonical probability space associated with a fractional Brownian motion with Hurst parameter $H$. That is, $\Omega = C([0,T];\mathbb{R}^d)$ is the Banach space of continuous functions vanishing at $0$ equipped with the supremum norm, $\mathcal{F}$ is the Borel sigma-algebra and $P$ is the unique probability measure on $\Omega$ such that the canonical process $B=\{B_t,\ t\in[0,T]\}$ is a $d$-dimensional fractional Brownian motion with Hurst parameter $H$. Recall that this means that $B$ has $d$ independent coordinates, each one being a centered Gaussian process with covariance
$$R_H(t,s) = \frac{1}{2}\big(s^{2H} + t^{2H} - |t-s|^{2H}\big).$$

2.2.1. Functional spaces.
Let $\mathcal{E}$ be the space of $d$-dimensional elementary functions on $[0,T]$:
$$\mathcal{E} = \Big\{ f=(f^1,\ldots,f^d);\ f^j = \sum_{i=0}^{n_j-1} a^j_i\,\mathbf{1}_{[t^j_i,\,t^j_{i+1})},\ \ 0 = t^j_0 < t^j_1 < \cdots < t^j_{n_j-1} < t^j_{n_j} = T,\ \ j=1,\ldots,d \Big\}. \quad (9)$$
We call $\mathcal{H}$ the completion of $\mathcal{E}$ with respect to the semi-inner product
$$\langle f,g\rangle_{\mathcal{H}} = \sum_{i=1}^d \langle f^i, g^i\rangle_{\mathcal{H}},
\qquad\text{where}\quad \langle \mathbf{1}_{[0,t]}, \mathbf{1}_{[0,s]}\rangle_{\mathcal{H}} := R(s,t),\quad s,t\in[0,T].$$
Then one constructs an isometry $K^*_H:\mathcal{H}\to L^2([0,T];\mathbb{R}^d)$ such that
$$K^*_H\big(\mathbf{1}_{[0,t_1]},\ldots,\mathbf{1}_{[0,t_d]}\big) = \big(\mathbf{1}_{[0,t_1]}\, K_H(t_1,\cdot),\ldots,\mathbf{1}_{[0,t_d]}\, K_H(t_d,\cdot)\big),$$
where the kernel $K_H$ is given, for some constant $c_H$, by
$$K_H(t,s) = c_H\, s^{1/2-H} \int_s^t (u-s)^{H-3/2}\, u^{H-1/2}\, du,$$
and verifies $R_H(t,s) = \int_0^{s\wedge t} K_H(t,r)\,K_H(s,r)\,dr$. Moreover, let us observe that $K^*_H$ can be represented in the following form: for $\varphi=(\varphi^1,\ldots,\varphi^d)\in\mathcal{H}$, we have $K^*_H\varphi = (K^*_H\varphi^1,\ldots,K^*_H\varphi^d)$, where
$$[K^*_H\varphi^i]_t = \int_t^T \varphi^i_r\,\partial_r K_H(r,t)\,dr.$$

2.2.2. Malliavin derivatives.
Let us start by defining the Wiener integral with respect to $B$: for any element $f$ in $\mathcal{E}$ whose expression is given as in (9), we define the Wiener integral of $f$ with respect to $B$ as
$$B(f) := \sum_{j=1}^d \sum_{i=0}^{n_j-1} a^j_i\,\big(B^j_{t^j_{i+1}} - B^j_{t^j_i}\big).$$
We also denote this integral as $\int_0^T f_t\,dB_t$, since it coincides with a pathwise integral with respect to $B$.
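The isometry underlying this Wiener integral can be illustrated numerically: for $f=\mathbf{1}_{[0,s)}$ one simply has $B(f)=B_s$, so the relation $E[B(f)B(g)]=\langle f,g\rangle_{\mathcal{H}}$ reduces to the covariance $R_H(s,t)$. A small Monte Carlo sketch (the helper below is ours, sampling fBm by Cholesky):

```python
import numpy as np

def fbm_path(t, H, rng):
    """One fBm sample on the (sorted, positive) time grid t, via Cholesky."""
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    return np.linalg.cholesky(cov) @ rng.standard_normal(len(t))

# E[B(1_{[0,s)}) B(1_{[0,t)})] = E[B_s B_t] should match R_H(s, t).
H, s, t_ = 0.7, 0.4, 0.9
rng = np.random.default_rng(42)
grid = np.array([s, t_])
prods = [float(np.prod(fbm_path(grid, H, rng))) for _ in range(4000)]
emp = float(np.mean(prods))
exact = 0.5 * (s ** (2 * H) + t_ ** (2 * H) - abs(t_ - s) ** (2 * H))
```

With a few thousand samples, `emp` matches `exact` up to Monte Carlo error, which is the discrete face of the isometry $E[B(f)B(g)]=\langle f,g\rangle_{\mathcal{H}}$.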
For $\theta:\mathbb{R}\to\mathbb{R}$ and $j\in\{1,\ldots,d\}$, denote by $\theta^{[j]}$ the function with values in $\mathbb{R}^d$ having all coordinates equal to zero except the $j$-th coordinate, which equals $\theta$. It is readily seen that
$$E\Big[ B\big(\mathbf{1}^{[j]}_{[0,s)}\big)\, B\big(\mathbf{1}^{[k]}_{[0,t)}\big) \Big] = \delta_{j,k}\, R(s,t).$$
This definition can be extended by linearity and closure to elements of $\mathcal{H}$, and we obtain the relation
$$E[B(f)\,B(g)] = \langle f,g\rangle_{\mathcal{H}},$$
valid for any couple of elements $f,g\in\mathcal{H}$. In particular, $B(\cdot)$ defines an isometric map from $\mathcal{H}$ into a subspace of $L^2(\Omega)$.

We can now proceed to the definition of Malliavin derivatives. With Notation 2.2 in hand, let us consider $\mathcal{S}$, the family of smooth functionals $F$ of the form
$$F = f(B(h_1),\ldots,B(h_n)), \quad (10)$$
where $h_1,\ldots,h_n\in\mathcal{H}$, $n\ge 1$, and $f$ is a smooth function with polynomial growth, together with all its derivatives. The Malliavin derivative of such a functional $F$ is the $\mathcal{H}$-valued random variable defined by
$$DF = \sum_{i=1}^n \partial_i f(B(h_1),\ldots,B(h_n))\,h_i.$$
For all $p>1$, it is known that the operator $D$ is closable from $L^p(\Omega)$ into $L^p(\Omega;\mathcal{H})$ (see e.g. [32, Section 1]). We still denote by $D$ the closure of this operator, whose domain is usually denoted by $\mathbb{D}^{1,p}$ and is defined as the completion of $\mathcal{S}$ with respect to the norm
$$\|F\|_{1,p} := \big( E(|F|^p) + E(\|DF\|^p_{\mathcal{H}}) \big)^{1/p}.$$
It should also be noticed that partial Malliavin derivatives with respect to each component $B^j$ of $B$ will be invoked: they are defined, for a functional $F$ of the form (10) and $j=1,\ldots,d$, as
$$D^j F = \sum_{i=1}^n \partial_i f(B(h_1),\ldots,B(h_n))\,h_i^{[j]},$$
and then extended by closure arguments again. We refer to [32, Section 1] for the definition of higher derivatives and of the Sobolev spaces $\mathbb{D}^{k,p}$ for $k>1$. Another essential object related to those derivatives is the so-called Malliavin matrix of an $\mathbb{R}^m$-valued random variable $F\in\mathbb{D}^{1,2}$, defined by
$$\gamma_F = \Big( \big\langle DF^i,\, DF^j \big\rangle_{\mathcal{H}} \Big)_{1\le i,j\le m}. \quad (11)$$

2.2.3. Skorohod integrals.
We will denote by $\delta$ the adjoint of the operator $D$ (also referred to as the divergence operator). This operator is closed and its domain, denoted by $\mathrm{Dom}(\delta)$, is the set of $\mathcal{H}$-valued square integrable random variables $u\in L^2(\Omega;\mathcal{H})$ such that
$$\big| E[\langle DF, u\rangle_{\mathcal{H}}] \big| \le C\,\|F\|_{L^2(\Omega)}, \quad\text{for all } F\in\mathbb{D}^{1,2},$$
where $C$ is some constant depending on $u$. Moreover, for $u\in\mathrm{Dom}(\delta)$, $\delta(u)$ is the element of $L^2(\Omega)$ characterized by the duality relationship
$$E[F\,\delta(u)] = E[\langle DF, u\rangle_{\mathcal{H}}], \quad\text{for any } F\in\mathbb{D}^{1,2}. \quad (12)$$
The quantity $\delta(u)$ is usually called the Skorohod integral of the process $u$.

Skorohod integrals are obviously analytic objects, not suitable for easy numerical implementations. However, they can be related to the Young type integrals introduced in Proposition 2.4. For this, we need to define another functional space as follows:

Notation 2.8.
We call $|\mathcal{H}|$ the space of measurable functions $\varphi:[0,T]\to\mathbb{R}^d$ such that
$$\|\varphi\|^2_{|\mathcal{H}|} := c_H \int_0^T\!\!\int_0^T |\varphi_r|\,|\varphi_u|\,|r-u|^{2H-2}\,dr\,du < +\infty,$$
where $c_H = H(2H-1)$, and we denote by $\langle\cdot,\cdot\rangle_{|\mathcal{H}|}$ the associated inner product. We also write $\mathbb{D}^{k,p}(|\mathcal{H}|)$ for the space of $\mathbb{D}^{k,p}$ functionals with values in $|\mathcal{H}|$.

The following proposition is then a slight extension of [32, Proposition 5.2.3]:
Proposition 2.9.
Let $\{u^{ij}_t,\ t\in[0,T]\}$, for $i=1,\ldots,m$ and $j=1,\ldots,d$, be a stochastic process in $\mathbb{D}^{1,2}(|\mathcal{H}|)$ such that
$$\sum_{j=1}^d \int_0^T\!\!\int_0^T \big| D^j_s u^{ij}_t \big|\,|t-s|^{2H-2}\,ds\,dt < +\infty \quad a.s. \quad (13)$$
We also assume that, almost surely, $u$ has $\beta$-Hölder paths with $\beta+H>1$. Then the Young integral $\sum_{j=1}^d \int_0^T u^{ij}_t\,dB^j_t$ exists and, for all $i=1,\ldots,m$, can be written as
$$\sum_{j=1}^d \int_0^T u^{ij}_t\,dB^j_t = \delta(u^i) + c_H \sum_{j=1}^d \int_0^T\!\!\int_0^T D^j_s u^{ij}_t\,|t-s|^{2H-2}\,ds\,dt,$$
where $\delta(u^i)$ stands for the Skorohod integral of $u^i$.

3. Probabilistic expression for the log-likelihood
Recall that we are focusing on equation (1) driven by a $d$-dimensional fBm $B$, and that we have chosen to use expression (5) as a substitute for the log-likelihood function. We have thus reduced the initial maximization problem to the solution of $\nabla_l \ell_n(\theta) = 0$. This will be performed numerically by means of a root approximation algorithm.

Observe first that in order to define (5), the density of $Y_t(\theta)$ must exist for any $t>0$. Let us thus recall the classical setting (given in [19]) under which $Y_t$ admits a smooth density:

Hypothesis 3.1.
Let $\mu$ and $\sigma$ be coefficients satisfying Hypothesis 2.3. For $\xi\in\mathbb{R}^m$ and $\theta\in\Theta$, set $\alpha(\xi;\theta) = \sigma(\xi;\theta)\,\sigma^*(\xi;\theta)$. Then we assume that:

(i) For any $k\ge 1$ and $j_1,\ldots,j_k\in\{1,\ldots,m\}$ we have
$$\sup_{\theta\in\Theta}\ \sum_{l=0}^{2}\ \sum_{1\le p_1,\ldots,p_l\le q} \big\|\nabla^l_{p_1\cdots p_l}\,\partial^k_{j_1,\ldots,j_k}\,\mu(\cdot\,;\theta)\big\|_\infty + \big\|\nabla^l_{p_1\cdots p_l}\,\partial^k_{j_1,\ldots,j_k}\,\sigma(\cdot\,;\theta)\big\|_\infty \le c_k,$$
for a strictly positive constant $c_k$.

(ii) There exists a strictly positive constant $\varepsilon$ such that $\langle \alpha(\xi;\theta)\,\eta,\, \eta\rangle_{\mathbb{R}^m} \ge \varepsilon\,|\eta|^2_{\mathbb{R}^m}$ for any couple of vectors $\eta,\xi\in\mathbb{R}^m$, uniformly in $\theta\in\Theta$.

Then the density result for $Y_t$ can be read as follows:
Theorem 3.2.
Consider the stochastic differential equation (1) with initial condition $a\in\mathbb{R}^m$. Assume Hypothesis 3.1 is satisfied. Then, for any $t>0$ and $\theta\in\Theta$, the law of $Y_t(\theta)$ admits a $C^\infty$ density, denoted by $f(t,\cdot\,;\theta)$, with respect to Lebesgue’s measure.

In the sequel, we shall suppose that the density $f(t,\cdot\,;\theta)$ exists without further mention, the aim of this section being to produce a probabilistic representation of $f(t,\cdot\,;\theta)$ for computational purposes. To this aim, we shall first give the equations governing the Malliavin derivatives of the processes $Y(\theta)$ and $\nabla Y(\theta)$, and then use a stochastic analysis formula in order to represent our log-likelihood. We separate these tasks in two different subsections.

3.1. Some Malliavin derivatives.
This section is devoted to a series of preliminary lemmas which will enable us to formulate our probabilistic representation of $f(t,\cdot\,;\theta)$. Let us first introduce a notation which will prevail until the end of the paper:

Notation 3.3.
For a set of indices or coordinates $(k_1,\ldots,k_r)$ of length $r\ge 1$ and $1\le j\le r$, we denote by $(k_1,\ldots,\check k_j,\ldots,k_r)$ the set of indices or coordinates of length $r-1$ where $k_j$ has been omitted.

We now give a general expression for the higher order derivatives of $Y_t$, borrowed from [34].

Lemma 3.4.
Assume Hypotheses 2.3 and 3.1 hold true. For $n\ge 1$ and $(i_1,\ldots,i_n)\in\{1,\ldots,d\}^n$, denote by $D^{i_1,\ldots,i_n} Y^i_t(\theta)$ the $n$th Malliavin derivative of $Y^i_t(\theta)$ with respect to the coordinates $B^{i_1},\ldots,B^{i_n}$ of $B$. Then $D^{i_1,\ldots,i_n} Y^i_t(\theta)$, considered as an element of $\mathcal{H}^{\otimes n}$, satisfies the following linear equation: for $t\ge r_1\vee\cdots\vee r_n$,
$$D^{i_1,\ldots,i_n}_{r_1,\ldots,r_n} Y^i_t(\theta) = \sum_{p=1}^n \alpha^i_{i_p,\, i_1,\ldots,\check\imath_p,\ldots,i_n}(r_p;\, r_1,\ldots,\check r_p,\ldots,r_n;\theta) + \int_{r_1\vee\cdots\vee r_n}^t \beta^i_{i_1,\ldots,i_n}(s;\, r_1,\ldots,r_n;\theta)\,ds + \sum_{l=1}^d \int_{r_1\vee\cdots\vee r_n}^t \alpha^i_{l,\, i_1,\ldots,i_n}(s;\, r_1,\ldots,r_n;\theta)\,dB^l_s, \quad (14)$$
where
$$\alpha^i_{j,\, i_1,\ldots,i_n}(s;\, r_1,\ldots,r_n;\theta) = \sum_{I_1\cup\cdots\cup I_\nu} \sum_{k_1,\ldots,k_\nu=1}^m \partial^\nu_{k_1\ldots k_\nu}\sigma^{ij}(Y_s(\theta);\theta)\; D^{i(I_1)}_{r(I_1)} Y^{k_1}_s(\theta) \cdots D^{i(I_\nu)}_{r(I_\nu)} Y^{k_\nu}_s(\theta),$$
$$\beta^i_{i_1,\ldots,i_n}(s;\, r_1,\ldots,r_n;\theta) = \sum_{I_1\cup\cdots\cup I_\nu} \sum_{k_1,\ldots,k_\nu=1}^m \partial^\nu_{k_1\ldots k_\nu}\mu^{i}(Y_s(\theta);\theta)\; D^{i(I_1)}_{r(I_1)} Y^{k_1}_s(\theta) \cdots D^{i(I_\nu)}_{r(I_\nu)} Y^{k_\nu}_s(\theta).$$
In the expressions above, the first sums are extended to the set of all partitions $I_1,\ldots,I_\nu$ of $\{1,\ldots,n\}$, and for any subset $K=\{i_1,\ldots,i_\eta\}$ of $\{1,\ldots,n\}$ we write $D^{i(K)}_{r(K)}$ for the derivative operator $D^{i_1,\ldots,i_\eta}_{r_1,\ldots,r_\eta}$. Notice that $D^{i_1,\ldots,i_n}_{r_1,\ldots,r_n} Y^i_t(\theta) = 0$ whenever $t < r_1\vee\cdots\vee r_n$.

The formulas above might seem intricate. The following example illustrates their use in a simple enough situation:
Example 3.5. The second order derivative $D^{i_1,i_2}_{r_1,r_2} Y_t(\theta)$ can be computed as
$$D^{i_1,i_2}_{r_1,r_2} Y_t(\theta) = \alpha_{1,2}(r_1,r_2;\theta) + \alpha_{2,1}(r_2,r_1;\theta) + \int_{r_1\vee r_2}^t \beta_{1,2}(s,r_1,r_2;\theta)\,ds + \sum_{l=1}^d \int_{r_1\vee r_2}^t \alpha_{l,1,2}(s,r_1,r_2;\theta)\,dB^l_s,$$
where
$$\alpha_{1,2}(r_1,r_2;\theta) = \sum_{k=1}^m \partial_k\sigma^{i_1}(Y_{r_1}(\theta);\theta)\, D^{i_2}_{r_2} Y^k_{r_1}(\theta), \qquad \alpha_{2,1}(r_2,r_1;\theta) = \sum_{k=1}^m \partial_k\sigma^{i_2}(Y_{r_2}(\theta);\theta)\, D^{i_1}_{r_1} Y^k_{r_2}(\theta),$$
and
$$\beta_{1,2}(s,r_1,r_2;\theta) = \partial_k\mu(Y_s(\theta);\theta)\, D^{i_1,i_2}_{r_1,r_2} Y^k_s(\theta) + \partial^2_{k_1 k_2}\mu(Y_s(\theta);\theta)\, D^{i_1}_{r_1} Y^{k_1}_s(\theta)\, D^{i_2}_{r_2} Y^{k_2}_s(\theta),$$
$$\alpha_{l,1,2}(s,r_1,r_2;\theta) = \partial_k\sigma^{l}(Y_s(\theta);\theta)\, D^{i_1,i_2}_{r_1,r_2} Y^k_s(\theta) + \partial^2_{k_1 k_2}\sigma^{l}(Y_s(\theta);\theta)\, D^{i_1}_{r_1} Y^{k_1}_s(\theta)\, D^{i_2}_{r_2} Y^{k_2}_s(\theta),$$
where we have used the convention of summation over repeated indices.

Our formula for the log-likelihood will also involve some derivatives of the process $Y(\theta)$ with respect to the parameter $\theta$. The existence of this derivative is assessed below:

Proposition 3.6.
Under the same hypotheses as for Lemma 3.4, the random variable $Y^i_t(\theta)$ is a smooth function of $\theta$ for any $t\ge 0$. We denote by $\nabla_l Y^i_t(\theta)$ the derivative of $Y^i_t(\theta)$ with respect to the $l$th element of the vector of parameters $\theta$. This process satisfies the following SDE:
$$\nabla_l Y^i_t(\theta) = \int_0^t \big[\partial_k\mu^i(Y_u(\theta);\theta)\,\nabla_l Y^k_u(\theta) + \nabla_l\mu^i(Y_u(\theta);\theta)\big]\,du + \sum_{j=1}^d \int_0^t \big[\partial_k\sigma^{ij}(Y_u(\theta);\theta)\,\nabla_l Y^k_u(\theta) + \nabla_l\sigma^{ij}(Y_u(\theta);\theta)\big]\,dB^j_u,$$
with the convention of summation over the repeated index $k$.

Proof.
The proof goes exactly along the same lines as for [34, Proposition 4], and the details are left to the reader. □
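The sensitivity equation of Proposition 3.6 can be checked pathwise by finite differences: solve (1) twice with the same driving path and nearby parameters, and compare with the Euler discretization of the equation for $\nabla_l Y(\theta)$. The sketch below uses hypothetical one-dimensional coefficients $\mu(y;\theta)=-\theta y$ and $\sigma(y;\theta)=1+0.1\sin y$ (our choices, not the paper's).

```python
import numpy as np

def euler_with_sensitivity(a, theta, H, M=400, T=1.0, rng=None):
    """Euler scheme for dY = mu(Y;theta) dt + sigma(Y) dB with the toy
    coefficients mu(y;theta) = -theta*y and sigma(y) = 1 + 0.1*sin(y)
    (hypothetical), solved jointly with the sensitivity equation of
    Proposition 3.6 for Z = dY/dtheta:
        dZ = (d_y mu * Z + d_theta mu) dt + d_y sigma * Z dB."""
    rng = np.random.default_rng(rng)
    dt = T / M
    t = np.linspace(dt, T, M)
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    dB = np.diff(np.concatenate(([0.0],
                                 np.linalg.cholesky(cov)
                                 @ rng.standard_normal(M))))
    Y, Z = a, 0.0
    for k in range(M):
        Y_new = Y + (-theta * Y) * dt + (1 + 0.1 * np.sin(Y)) * dB[k]
        # d_y mu = -theta, d_theta mu = -Y, d_y sigma = 0.1*cos(Y)
        Z = Z + (-theta * Z - Y) * dt + 0.1 * np.cos(Y) * Z * dB[k]
        Y = Y_new
    return Y, Z
```

Fixing the seed reproduces the same driving increments for both parameter values, so `(Y(theta+eps) - Y(theta)) / eps` should agree with `Z` up to a term of order `eps`.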
We shall also need some equations governing the Malliavin derivatives of $\nabla_l Y(\theta)$. This is the aim of the following lemma:

Lemma 3.7.
For any $l\in\{1,\dots,q\}$ and $n\ge 1$, the process $\nabla_l D^{i_1,\dots,i_n} Y(\theta)$ is $n$-times differentiable in the Malliavin calculus sense. Moreover, taking up the notations of Lemma 3.4, the process $\nabla_l D^{i_1,\dots,i_n} Y^i_t(\theta)$ satisfies the following linear equation: for $t \ge r_1\vee\cdots\vee r_n$,

$$\nabla_l D^{i_1,\dots,i_n}_{r_1,\dots,r_n} Y^i_t(\theta) = \sum_{p=1}^n \hat\alpha^{i,l}_{i_p,\, i_1,\dots,\check\imath_p,\dots,i_n}(r_p;\, r_1,\dots,\check r_p,\dots,r_n;\theta) + \int_{r_1\vee\cdots\vee r_n}^t \hat\beta^{i,l}_{i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta)\, ds + \sum_{j=1}^d \int_{r_1\vee\cdots\vee r_n}^t \hat\alpha^{i,l}_{j,\, i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta)\, dB^j_s,$$

where $\hat\alpha^{i,l}_{j,\, i_1,\dots,i_n} = \nabla_l \alpha^i_{j,\, i_1,\dots,i_n}$ and $\hat\beta^{i,l}_{i_1,\dots,i_n} = \nabla_l \beta^i_{i_1,\dots,i_n}$. More specifically, $\hat\beta^{i,p}_{i_1,\dots,i_n}$ is defined recursively by

$$\hat\beta^{i,p}_{i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta) = \sum_{I_1\cup\cdots\cup I_\nu} \sum_{k_1,\dots,k_\nu=1}^m \Big\{ \nabla_p\big[\partial^\nu_{k_1\dots k_\nu}\mu^i(Y_s(\theta);\theta)\big]\, D^{i(I_1)}_{r(I_1)} Y^{k_1}_s(\theta)\cdots D^{i(I_\nu)}_{r(I_\nu)} Y^{k_\nu}_s(\theta) + \partial^\nu_{k_1\dots k_\nu}\mu^i(Y_s(\theta);\theta)\, \sum_{p'=1}^\nu \nabla_p D^{i(I_{p'})}_{r(I_{p'})} Y^{k_{p'}}_s(\theta)\, \prod_{q\ne p'} D^{i(I_q)}_{r(I_q)} Y^{k_q}_s(\theta) \Big\},$$

where we have set $\nabla_p\big[\partial^\nu_{k_1\dots k_\nu}\mu^i(Y_s(\theta);\theta)\big] = \nabla_p\,\partial^\nu_{k_1\dots k_\nu}\mu^i(Y_s(\theta);\theta) + \partial\,\partial^\nu_{k_1\dots k_\nu}\mu^i(Y_s(\theta);\theta)\,\nabla_p Y_s(\theta)$. Notice that the same kind of equation (omitted here for the sake of conciseness) holds true for the coefficients $\hat\alpha^{i,l}_{j,\, i_1,\dots,i_n}$.

The next object we need for our computations is the inverse of the Malliavin matrix $\gamma_{Y_t(\theta)}$ of $Y_t(\theta)$. Recall that according to (11), the Malliavin matrix of $Y_t(\theta)$ is defined by
$$\gamma_t(\theta) := \gamma_{Y_t(\theta)} = \Big( \big\langle D_\cdot Y^i_t(\theta),\, D_\cdot Y^j_t(\theta) \big\rangle_{\mathcal H} \Big)_{1\le i,j\le m}, \quad (15)$$
where we have set $\gamma_t(\theta) := \gamma_{Y_t(\theta)}$ for notational sake in the computations below. We shall now compute $\gamma_t^{-1}(\theta)$ as the solution to an SDE: Proposition 3.8.
The matrix-valued process $\gamma_t^{-1}(\theta)$ is the unique solution to the following linear equation in $\eta$:

$$\eta_t(\theta) = \tilde\alpha^{-1}(Y_t(\theta);\theta) - \sum_{l=1}^d \int_0^t \big[ \eta_u(\theta)\,\tilde\alpha_l(Y_u(\theta);\theta) + \tilde\alpha_l^T(Y_u(\theta);\theta)\,\eta_u(\theta) \big]\, dB^l_u - \int_0^t \big[ \eta_u(\theta)\,\tilde\beta(Y_u(\theta);\theta) + \tilde\beta^T(Y_u(\theta);\theta)\,\eta_u(\theta) \big]\, du, \quad (16)$$

with
$$\tilde\alpha^{ii'}(Y_t(\theta);\theta) = c_H \sum_{j=1}^d \int_0^t\int_0^t \sigma_{ij}(Y_r(\theta);\theta)\,\sigma_{i'j}(Y_{r'}(\theta);\theta)\, |r-r'|^{2H-2}\, dr\, dr', \qquad i,i' = 1,\dots,m,$$

and where the other coefficients $\tilde\alpha_l$ and $\tilde\beta$ are defined by
$$\tilde\alpha_l(Y_u(\theta);\theta) = \big( \partial_k\sigma_{i'l}(Y_u(\theta);\theta) \big)_{1\le i',k\le m} \quad\text{and}\quad \tilde\beta(Y_u(\theta);\theta) = \big( \partial_k\mu^{i'}(Y_u(\theta);\theta) \big)_{1\le i',k\le m}. \quad (17)$$

Proof.
The proof of this fact is an adaptation of [19, Theorem 7] to the case of an SDE with drift. We include it here for the sake of completeness, and we drop the dependence of $Y$ on $\theta$ for notational sake in the computations below.

Let us start by invoking Proposition 2.7 and equation (14) in order to compute the product of two first-order Malliavin derivatives:

$$D^j_r Y^i_t\, D^j_{r'} Y^{i'}_t = \sigma_{ij}(Y_r)\,\sigma_{i'j}(Y_{r'}) + \sum_{k=1}^m \Bigg\{ \sum_{l=1}^d \int_{r\vee r'}^t \Big[ \partial_k\sigma_{il}(Y_u)\, D^j_{r'}Y^{i'}_u\, D^j_r Y^k_u + \partial_k\sigma_{i'l}(Y_u)\, D^j_r Y^i_u\, D^j_{r'}Y^k_u \Big]\, dB^l_u + \int_{r\vee r'}^t \Big[ \partial_k\mu^i(Y_u)\, D^j_{r'}Y^{i'}_u\, D^j_r Y^k_u + \partial_k\mu^{i'}(Y_u)\, D^j_r Y^i_u\, D^j_{r'}Y^k_u \Big]\, du \Bigg\}. \quad (18)$$

Moreover, recall that $\gamma_t$ is defined by (15). Thus the covariance matrix becomes

$$\gamma^{ii'}_t = \sum_{j=1}^d \big\langle D^j Y^i_t,\, D^j Y^{i'}_t \big\rangle_{\mathcal H} = c_H \sum_{j=1}^d \int_0^t\int_0^t D^j_r Y^i_t(\theta)\, D^j_{r'} Y^{i'}_t(\theta)\, |r-r'|^{2H-2}\, dr\, dr'.$$

Plugging (18) into this relation, we end up with the following equation for $\gamma^{ii'}$:

$$\gamma^{ii'}_t = \tilde\alpha^{ii'} + \sum_{l=1}^d \int_0^t \sum_{k=1}^m \Big[ \partial_k\sigma_{il}(Y_u)\,\gamma^{i'k}_u + \partial_k\sigma_{i'l}(Y_u)\,\gamma^{ik}_u \Big]\, dB^l_u + \int_0^t \sum_{k=1}^m \Big[ \partial_k\mu^i(Y_u)\,\gamma^{i'k}_u + \partial_k\mu^{i'}(Y_u)\,\gamma^{ik}_u \Big]\, du.$$

Using our notation (17) and matrix product rules, we obtain that $\gamma_t$ is the solution to

$$\gamma_t = \tilde\alpha + \sum_{l=1}^d \int_0^t \big( \tilde\alpha_l(Y_u)\,\gamma_u + \gamma_u\,\tilde\alpha_l^T(Y_u) \big)\, dB^l_u + \int_0^t \big( \tilde\beta(Y_u)\,\gamma_u + \gamma_u\,\tilde\beta^T(Y_u) \big)\, du.$$

Consider now $\eta$ the solution to (16). Applying again Proposition 2.7, it is readily checked that $\gamma_t\,\eta_t = \mathrm{Id}$ for any $t\in[0,T]$, which ends the proof. □

Remark 3.9. Gathering equation (16) and Proposition 2.5, it is easily seen that for any $t>0$ and $\theta\in\Theta$, $Y_t(\theta)$ is a non-degenerate random variable in the sense given at [32, Definition 2.1.2]: we have $(\det\gamma_t)^{-1} \in L^p(\Omega)$ for any $p\ge 1$.

Now that we have derived an equation for $\eta = \gamma^{-1}$, an equation for the Malliavin derivative of $\eta$ is also available: Proposition 3.10.
For any $l\in\{1,\dots,q\}$ and $n\ge 1$, the process $\eta_t = \gamma_t^{-1}$ is $n$-times differentiable in the Malliavin calculus sense. Moreover, the process $D^{i_1,\dots,i_n}\eta_t$ satisfies the following equation: for $t\ge r_1\vee\cdots\vee r_n$,

$$D^{i_1,\dots,i_n}_{r_1,\dots,r_n}\eta^{ij}_t(\theta) = -\sum_{k_1+k_2+k_3=n} \big( D^{(k_1)}\tilde\alpha^{-1}\, D^{(k_2)}\tilde\alpha\, D^{(k_3)}\tilde\alpha^{-1} \big)^{ij} - \sum_{\ell=1}^d \int_{r_1\vee\cdots\vee r_n}^t C^{ij}_{\ell,\, i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta)\, dB^\ell_s - \int_{r_1\vee\cdots\vee r_n}^t A^{ij}_{i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta)\, ds,$$

where the first sum runs over the splittings of the derivative indices $(i_1,r_1),\dots,(i_n,r_n)$ into three groups of respective sizes $k_1, k_2, k_3$ with $k_1+k_2+k_3 = n$, $D^{(k_p)}$ denoting the corresponding Malliavin derivatives, and where

$$A^{ij}_{i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta) = \sum_{I_0\cup I_1\cup\cdots\cup I_\nu} \sum_{k_1,\dots,k_\nu=1}^m \sum_{k=1}^m \Big\{ \partial^\nu_{k_1,\dots,k_\nu}\big(\tilde\beta(Y_s(\theta);\theta)\big)_{kj}\, D^{i(I_0)}_{r(I_0)}\eta^{ik}_s(\theta)\, D^{i(I_1)}_{r(I_1)} Y^{k_1}_s(\theta)\cdots D^{i(I_\nu)}_{r(I_\nu)} Y^{k_\nu}_s(\theta) + \partial^\nu_{k_1,\dots,k_\nu}\big(\tilde\beta(Y_s(\theta);\theta)\big)_{ik}\, D^{i(I_0)}_{r(I_0)}\eta^{kj}_s(\theta)\, D^{i(I_1)}_{r(I_1)} Y^{k_1}_s(\theta)\cdots D^{i(I_\nu)}_{r(I_\nu)} Y^{k_\nu}_s(\theta) \Big\},$$

the first sum being extended over the partitions $I_0\cup I_1\cup\cdots\cup I_\nu$ of $\{1,\dots,n\}$. The same kind of equation holds for $C^{ij}_{\ell,\, i_1,\dots,i_n}(s;\, r_1,\dots,r_n;\theta)$, with the coefficient $\tilde\beta$ replaced by $\tilde\alpha_\ell$.

Proof. The proof of this proposition is based on Lemma 3.4 and the fact that
$$\frac{dA_\lambda^{-1}}{d\lambda} = -A_\lambda^{-1}\,\frac{dA_\lambda}{d\lambda}\,A_\lambda^{-1}. \qquad \square$$

Finally, one can also differentiate $\eta$ with respect to our standing parameter $\theta$, which yields: Lemma 3.11.
The derivative of the inverse of the Malliavin matrix $\eta_t$ with respect to $\theta$ satisfies the following SDE:

$$\nabla_l\eta_t(\theta) = \nabla_l\tilde\alpha^{-1} - \sum_{\ell=1}^d \int_0^t \Big\{ \nabla_l\eta_u(\theta)\,\tilde\alpha_\ell(Y_u(\theta);\theta) + \eta_u(\theta)\,\nabla_l\big[\tilde\alpha_\ell(Y_u(\theta);\theta)\big] + \nabla_l\big[\tilde\alpha_\ell^T(Y_u(\theta);\theta)\big]\,\eta_u(\theta) + \tilde\alpha_\ell^T(Y_u(\theta);\theta)\,\nabla_l\eta_u(\theta) \Big\}\, dB^\ell_u - \int_0^t \Big\{ \nabla_l\eta_u(\theta)\,\tilde\beta(Y_u(\theta);\theta) + \eta_u(\theta)\,\nabla_l\big[\tilde\beta(Y_u(\theta);\theta)\big] + \nabla_l\big[\tilde\beta^T(Y_u(\theta);\theta)\big]\,\eta_u(\theta) + \tilde\beta^T(Y_u(\theta);\theta)\,\nabla_l\eta_u(\theta) \Big\}\, du,$$

where $\nabla_l[\tilde\beta(Y_u)] = \partial\tilde\beta(Y_u)\,\nabla_l Y_u + \nabla_l\tilde\beta(Y_u)$ and $\nabla_l[\tilde\alpha_\ell(Y_u)] = \partial\tilde\alpha_\ell(Y_u)\,\nabla_l Y_u + \nabla_l\tilde\alpha_\ell(Y_u)$.

Probabilistic representation of the likelihood.
We have chosen to represent the log-likelihood of our sample by means of the following formula, borrowed from the stochastic analysis literature:
Proposition 3.12.
Let $F$ be an $\mathbb{R}^m$-valued non-degenerate random variable (see Remark 3.9 for references on this concept), and let $f$ be the density of $F$. For $n\ge 1$ and $(j_1,\dots,j_n)\in\{1,\dots,m\}^n$, let $H_{(j_1,\dots,j_n)}(F)$ be defined recursively by $H_{(j_1)}(F) = \sum_{j=1}^m \delta\big( (\gamma_F^{-1})^{j_1 j}\, DF^j \big)$ and

$$H_{(j_1,\dots,j_n)}(F) = \sum_{j=1}^m \delta\Big( \big(\gamma_F^{-1}\big)^{j_n j}\, DF^j\, H_{(j_1,\dots,j_{n-1})}(F) \Big), \quad (19)$$

where the Skorohod operator $\delta$ is defined at Section 2.2.3. Then one can write

$$f(x) = E\big[ \mathbf{1}_{(F>x)}\, H_{(1,\dots,m)}(F) \big] = E\big[ (F-x)_+\, H_{(1,\dots,m,1,\dots,m)}(F) \big], \quad (20)$$

where $\mathbf{1}_{(F>x)} := \prod_{i=1}^m \mathbf{1}_{(F^i>x^i)}$ and $(F-x)_+ := \prod_{i=1}^m (F^i-x^i)_+$.

Proof. The first formula is a direct application of [32, Proposition 2.1.5]. The second one is obtained along the same lines, integrating by parts $m$ additional times with respect to the first one. □

The formula above can obviously be applied to $Y_t(\theta)$ for any strictly positive $t$, since we have noticed at Remark 3.9 that $Y_t(\theta)$ is a non-degenerate random variable. However, the expression of $H_{(j_1,\dots,j_n)}(Y_t(\theta))$ given by (19) is written in terms of Skorohod integrals, which are not amenable to numerical computations. We will thus recast this expression in terms of Young integrals plus some correction terms: Proposition 3.13.
Under Hypotheses 2.3 and 3.1, let us define $Q^{pji}_{st} := (\gamma_s^{-1})^{pj}\, D^i_s Y^j_t(\theta)$ for $0\le s<t\le T$, $p,j\in\{1,\dots,m\}$ and $i\in\{1,\dots,d\}$. Consider $p\in\{1,\dots,m\}$ and a real-valued random variable $G$ which is smooth in the Malliavin calculus sense. Set

$$U^p(G) = \sum_{j=1}^m\sum_{i=1}^d G\int_0^t Q^{pji}_{st}\, dB^i_s - c_H \sum_{j=1}^m\sum_{i=1}^d \int_0^t\int_0^t D^i_s\big[ G\, Q^{pji}_{rt} \big]\, |r-s|^{2H-2}\, dr\, ds, \quad (21)$$

where the integral with respect to $B$ is understood in the Young sense. Then the quantities $H_{(j_1,\dots,j_n)}(Y_t(\theta))$ defined at Proposition 3.12 can be expressed as

$$H_{(j_1,\dots,j_n)}(Y_t(\theta)) = \sum_{j=1}^m U^{j_n}\circ\cdots\circ U^{j_1}\big( Y^j_t(\theta) \big). \quad (22)$$

Proof.
It is an immediate consequence of Proposition 2.9, since we have noticed in Remark 3.9 that $Y_t(\theta)$ is a non-degenerate random variable. □

The previous proposition is still not sufficient to guarantee an effective computation of the log-likelihood. Indeed, the right-hand side of (21) contains terms of the form $D_s[G\, Q^{pji}_{rt}]$, which should be given in a more explicit form. This is the content of our next proposition. Proposition 3.14.
Set $K_{j_1\dots j_n} := H_{(j_1,\dots,j_n)}(Y_t(\theta))$. Then the term $D_s[K_{j_1\dots j_n} Q^{pji}_{rt}]$ in (21) can be computed inductively as follows:

(i) We have $D_s[K_{j_1\dots j_n} Q^{pji}_{rt}] = D_s K_{j_1\dots j_n}\, Q^{pji}_{rt} + K_{j_1\dots j_n}\, D_s Q^{pji}_{rt}$, and $D_s Q^{pji}_{rt}$ is computed by invoking Proposition 3.10 for the derivative of $\gamma_t^{-1}$ and Lemma 3.4 for the derivative of $Y_t(\theta)$. We are thus left with the computation of $D_s K_{j_1\dots j_n}$.

(ii) Assume now that we can compute $n-r$ Malliavin derivatives of $K_{j_1\dots j_r}$. Notice that this condition is met for $r=0$, since $Y_t(\theta)$ itself can be differentiated $n$ times in an explicit way according to Lemma 3.4 again. Then for any $j_1,\dots,j_{r+1}$ and $k\le n-r-1$, the quantity $K_{j_1\dots j_{r+1}}$ can be differentiated $k$ times, with a Malliavin derivative given by

$$D^{i_1,\dots,i_k}_{\rho_1\dots\rho_k} K_{j_1\dots j_{r+1}} = \sum_{\ell=1}^k D^{i_1,\dots,\check\imath_\ell,\dots,i_k}_{\rho_1\dots\check\rho_\ell\dots\rho_k}\big( K_{j_1\dots j_r}\, Q^{pji_\ell}_{\rho_\ell t} \big) + \sum_{j=1}^d \int_0^t D^{i_1,\dots,i_k}_{\rho_1\dots\rho_k}\big( K_{j_1\dots j_r}\, Q^{pji}_{st} \big)\, dB^j_s - c_H \int_0^t\int_0^t D^{k+1}_{r_1\rho_1\dots\rho_k}\big( K_{j_1\dots j_r}\, Q^{pji}_{r_2 t} \big)\, |r_1-r_2|^{2H-2}\, dr_1\, dr_2. \quad (23)$$

Proof.
We focus on the induction step (ii), the other one being straightforward: for a smooth random variable $W$, one easily gets by induction that

$$D^{i_1,\dots,i_p}_{r_1\dots r_p}\delta(W) = \sum_{\ell=1}^p D^{i_1,\dots,\check\imath_\ell,\dots,i_p}_{r_1\dots\check r_\ell\dots r_p} W_{r_\ell} + \delta\big( D^{i_1,\dots,i_p}_{r_1\dots r_p} W \big). \quad (24)$$

Suppose we know the $n-r$ Malliavin derivatives of $U^{j_r}\circ\cdots\circ U^{j_1}(F) := K_{j_1\dots j_r}$. Recall moreover that $K_{j_1\dots j_{r+1}} = U^{j_{r+1}}(K_{j_r\dots j_1}) = \delta(K_{j_r\dots j_1}\, Q_{\cdot t})$. Applying directly relation (24) we thus get, for $k\le m-1$:

$$D^{i_1,\dots,i_k}_{\rho_1\dots\rho_k}\delta(K_{j_r\dots j_1}\, Q_{\cdot t}) = \sum_{\ell=1}^k D^{i_1,\dots,\check\imath_\ell,\dots,i_k}_{\rho_1\dots\check\rho_\ell\dots\rho_k}\big( K_{j_r\dots j_1}\, Q_{\rho_\ell t} \big) + \delta\big( D^{i_1,\dots,i_k}_{\rho_1\dots\rho_k}( K_{j_r\dots j_1}\, Q_{\cdot t}) \big).$$

Our formula (23) is now obtained by applying Proposition 2.9 to the Skorohod integral $\delta\big( D^k_{\rho_1\dots\rho_k}(K_{j_r\dots j_1}\, Q_{\cdot t}) \big)$ above. □

Example 3.15. As an illustration of the proposition above, we compute $U^1\circ U^1(F)$ for $F = Y^i_t$, $i\in\{1,\dots,m\}$, and our $d$-dimensional fBm $B$.

Write first $U^1(Y^i_t) = \delta\big( Y^i_t\, (\gamma^{-1})^{1j}\, D^j Y^i_t \big)$, and since this quantity has to be expressed in a suitable way for numerical approximations, we have

$$U^1(Y^i_t) = \sum_{j=1}^d Y^i_t \int_0^t Q^{1ij}_{ut}\, dB^j_u - c_H \sum_{j=1}^d \int_0^t\int_0^t D^j_{u_1}\big[ Y^i_t\, Q^{1ij}_{u_2 t} \big]\, |u_1-u_2|^{2H-2}\, du_1\, du_2,$$

where $Q$ is defined at Proposition 3.13 and where the first integral on the right-hand side is understood in the Young sense. In order to compute the second one, we have to compute Malliavin derivatives. This is done through Lemma 3.4 for $Y$ and Proposition 3.10 for $Q$. We now have to differentiate $U^1(Y^i_t)$: the derivation rules for Skorohod integrals immediately yield

$$D^{j_1}_{u_1}\big[ U^1(Y^i_t) \big] = \sum_{j=1}^d Y^i_t\, Q^{1ij}_{u_1 t} + \sum_{j=1}^d \delta\big( D^{j_1}_{u_1} Y^i_t\, Q^{1ij}_{\cdot t} \big).$$
Once again, the Skorohod integral above is not suitable for numerical approximations. Write thus

$$D^{j_1}_{u_1}\big[ U^1(Y^i_t) \big] = \sum_{j=1}^d Y^i_t\, Q^{1ij}_{u_1 t} + \sum_{j=1}^d \int_0^t D^{j_1}_{u_1}\big[ Y^i_t\, Q^{1ij}_{rt} \big]\, dB^j_r - c_H \sum_{j=1}^d \int_0^t\int_0^t D^j_{u_2} D^{j_1}_{u_1}\big[ Y^i_t\, Q^{1ij}_{u_2 t} \big]\, |u_1-u_2|^{2H-2}\, du_1\, du_2,$$

and compute the Malliavin derivatives of the products $YQ$ thanks to Lemma 3.4 for $Y$ and Proposition 3.10 for $Q$. Once this is done, just write

$$U^1\big( U^1(Y^i_t) \big) = \delta\big( U^1(Y^i_t)\, Q^{1i}_{\cdot t} \big) = \sum_{j=1}^d U^1(Y^i_t) \int_0^t Q^{1ij}_{ut}\, dB^j_u - c_H \sum_{j=1}^d \int_0^t\int_0^t D^j_{u_1}\big[ U^1(Y^i_t)\, Q^{1ij}_{u_2 t} \big]\, |u_1-u_2|^{2H-2}\, du_1\, du_2.$$

In order to give our formula for the derivative of the log-likelihood, we still need to compute the derivative with respect to $\theta$ of $H_{(j_1,\dots,j_n)}(Y_t(\theta))$. For this we state the following lemma: Lemma 3.16.
The derivative with respect to $\theta$ of $U^p(Y_t(\theta))$ can be written as

$$\nabla_l U^p(Y^i_t(\theta)) = \sum_{j=1}^d \Big[ \nabla_l Y^i_t(\theta)\int_0^t Q^{pij}_{st}(\theta)\, dB^j_s + Y^i_t(\theta)\int_0^t \nabla_l\big[ Q^{pij}_{st}(\theta) \big]\, dB^j_s \Big] - c_H \sum_{j=1}^d \int_0^t\int_0^t \nabla_l\big[ D^j_s Y^i_t(\theta)\, Q^{pij}_{rt}(\theta) \big]\, |r-s|^{2H-2}\, dr\, ds,$$

where $\nabla_l Y^i_t(\theta)$ is computed according to Proposition 3.6 and $\nabla_l[D^j_s Y^i_t]$ is given by Lemma 3.7. As far as $\nabla_l[Q^{pj}_{st}(\theta)]$ is concerned, it is obtained through the following equation:

$$\nabla_l\big[ Q^{pj}_{st}(\theta) \big] = \nabla_l\eta^{pj}_s(\theta)\, D_s Y^j_t(\theta) + \eta^{pj}_s(\theta)\, \nabla_l\big[ D_s Y^j_t(\theta) \big],$$

where the expression for $\nabla_l\eta^{pj}_s(\theta)$ is a consequence of Lemma 3.11.

We are now ready to state our probabilistic expression for the gradient of the log-likelihood function (5).
Theorem 3.17.
Assume Hypotheses 2.3 and 3.1 hold true. Let $y_{t_i}$, $i=1,\dots,n$, be the observation arriving at time $t_i$, and let $Y_{t_i}$ be the solution to the SDE (1) at time $t_i$. Then the gradient of the log-likelihood function admits the following probabilistic representation: $\nabla_l\ell_n(\theta) = \sum_{i=1}^n V_i(\theta)/W_i(\theta)$, with

$$W_i(\theta) = E\Big[ \mathbf{1}_{(Y_{t_i}(\theta)>y_{t_i})}\, H_{(1,\dots,m)}\big( Y_{t_i}(\theta) \big) \Big] \quad (25)$$

and

$$V_i(\theta) = E\Big[ \nabla_l Y_{t_i}(\theta)\, \mathbf{1}_{(Y_{t_i}(\theta)>y_{t_i})}\, H_{(1,\dots,m,1,\dots,m)}\big( Y_{t_i}(\theta) \big) + \big( Y_{t_i}(\theta) - y_{t_i} \big)_+\, \nabla_l H_{(1,\dots,m,1,\dots,m)}\big( Y_{t_i}(\theta) \big) \Big], \quad (26)$$

where (i) $H_{(j_1,\dots,j_n)}(Y_{t_i}(\theta))$ is given recursively by (22) and computed at Proposition 3.14; (ii) $\nabla_l Y_{t_i}(\theta)$ is given by Proposition 3.6; (iii) $\nabla_l H_{(1,\dots,m,1,\dots,m)}$ is obtained by applying Lemma 3.16.

Proof. Recall that under Hypotheses 2.3 and 3.1, $Y_t(\theta)$ admits a $C^\infty$ density $f(t,\cdot;\theta)$ for any $t>0$ and $\theta\in\Theta$. Moreover, we have defined $\ell_n(\theta)$ as $\ell_n(\theta) = \sum_{i=1}^n \ln(f(t_i,y_{t_i};\theta))$. Thus

$$\nabla_l\ell_n(\theta) = \sum_{i=1}^n \frac{\nabla_l f(t_i,y_{t_i};\theta)}{f(t_i,y_{t_i};\theta)} := \sum_{i=1}^n \frac{V_i(\theta)}{W_i(\theta)}.$$

Now $W_i(\theta)$ can be expressed as (25) by a direct application of the first relation in (20). As far as $V_i(\theta)$ is concerned, write $f(t_i,y_{t_i};\theta) = E\big[ (Y_{t_i}(\theta)-y_{t_i})_+\, H_{(1,\dots,m,1,\dots,m)}(Y_{t_i}(\theta)) \big]$, according to the second relation in (20). By standard arguments, one is allowed to differentiate this expression inside the expectation, which directly yields (26). □

4. Discretization of the log-likelihood
The expression of the log-likelihood gradient that we derived in Theorem 3.17 is a ratio of two expectations which do not admit explicit formulas, even in the one-dimensional case. In addition, our goal is to find the root of this non-explicit expression, the ML estimator, which is an even harder task. To solve this problem in practice we first use a stochastic approximation algorithm in order to find the root of $\nabla_l\ell_n(\theta)$. In each iteration of the algorithm we compute the value of the expression using Monte Carlo (MC) simulations. For each Monte Carlo simulation, since no exact way of simulating the kernels of the expectation is available, we use an Euler approximation scheme. More specifically, we simulate by Euler approximation terms such as $Y_t$ and $DY_t$, which are solutions to fractional stochastic differential equations.

Therefore, our approach involves three types of error in the computation of the MLE: the error of the stochastic approximation algorithm, the Monte Carlo error, and the discretization bias introduced by the Euler approximation of the stochastic differential equations. Our aim here is to combine the Monte Carlo and Euler approximations in an optimal way in order to get a global error bound for the computation of $\nabla_l\ell_n(\theta)$.

4.1. Pathwise convergence of the Euler scheme.
The Euler scheme is the main source of error in our computations. There is always a trade-off between the number of Euler steps and the number of simulations, but what is usually computationally costly is the number of Euler steps. This is even worse when we deal with fractional SDEs, since the rate of convergence depends on $H$: the closer the value of $H$ is to $1/2$, the more steps are required for the simulation.

In this section, we compute the magnitude of the discretization error we introduce. We measure the bias of the Euler scheme via the root mean square error; that is, we want to estimate the quantity $\sup_{\tau\in[0,T]} \big( E|Y_\tau(\theta) - \bar Y^M_\tau(\theta)|^2 \big)^{1/2}$, where $Y_\tau(\theta)$ is the solution to the SDE (1) and $\bar Y^M_\tau(\theta)$ is the Euler approximation of $Y_\tau(\theta)$ given on the grid $\{\tau_k;\, k\le M\}$ by

$$\bar Y^M_{\tau_{k+1}}(\theta) = \bar Y^M_{\tau_k}(\theta) + \mu\big( \bar Y^M_{\tau_k}(\theta);\theta \big)(\tau_{k+1}-\tau_k) + \sum_{j=1}^d \sigma_j\big( \bar Y^M_{\tau_k}(\theta);\theta \big)\, \delta B^{M,j}_{\tau_k\tau_{k+1}}, \quad (27)$$

in which we denote $\delta B^{M,j}_{\tau_k\tau_{k+1}} = B^{M,j}_{\tau_{k+1}} - B^{M,j}_{\tau_k}$ and $\tau_k = kT/M$ for $k=0,\dots,M-1$. Notice that these estimates can be found in [8, 11, 30]. We include their proof here because it is simple enough, and also because it can be easily generalized to the case of a linear equation. This latter case is of special interest to us, since it corresponds to Malliavin derivatives, and is not included in the aforementioned references. Notation 4.1.
For simplicity, in this section we write $Y := Y(\theta)$. Proposition 4.2.
Let $T>0$ and recall that $\bar Y^M$ is defined by equation (27). Then there exists a random variable $C_T$ with finite $L^p$ moments such that for all $\gamma<H$ with $H>1/2$ we have

$$\| Y - \bar Y^M \|_{\gamma,T} \le C_T\, M^{-(2\gamma-1)}. \quad (28)$$

Consequently, we obtain that the MSE is of order $O\big( M^{-(2\gamma-1)} \big)$.

Proof. In order to prove (28) we apply techniques of classical numerical analysis for the flow of an ordinary differential equation driven by a smooth path. Namely, the exact flow of (1) is given by $\Phi(y;s,t) := Y_t$, where $Y_t$ is the unique solution of (1) when $t\in[s,T]$ and the initial condition is $Y_s = y$. Introduce also the numerical flow

$$\Psi(y;\tau_k,\tau_{k+1}) := y + \mu(y)(\tau_{k+1}-\tau_k) + \sum_{j=1}^d \sigma_j(y)\, \delta B^{M,j}_{\tau_k\tau_{k+1}}, \quad (29)$$

where $\tau_k = kT/M$, $k=0,\dots,M-1$. Thus we can write

$$\bar Y^M_{\tau_{k+1}} = \Psi\big( \bar Y^M_{\tau_k};\, \tau_k, \tau_{k+1} \big), \quad k=0,\dots,M-1, \qquad \bar Y^M_0 = a.$$

For $q>k$ we also have $\Psi(y;\tau_k,\tau_q) := \Psi(\cdot;\tau_{q-1},\tau_q)\circ\Psi(\cdot;\tau_{q-2},\tau_{q-1})\circ\cdots\circ\Psi(y;\tau_k,\tau_{k+1})$. The one-step error computes as

$$r_k = \Phi(y;\tau_k,\tau_{k+1}) - \Psi(y;\tau_k,\tau_{k+1}) = \int_{\tau_k}^{\tau_{k+1}} \big[ \mu(Y_s)-\mu(y) \big]\, ds + \int_{\tau_k}^{\tau_{k+1}} \big[ \sigma(Y_s)-\sigma(y) \big]\, dB_s. \quad (30)$$
Furthermore, since $Y\in C^\gamma$ and $B\in C^\gamma$ for $\gamma>1/2$, using (7) we have

$$\Big| \int_{\tau_k}^{\tau_{k+1}} \big[ \sigma(Y_s)-\sigma(y) \big]\, dB_s \Big| \le c_\gamma\, \|\partial\sigma\|_\infty\, \|Y\|_\gamma\, \|B\|_\gamma\, \Big|\frac{T}{M}\Big|^{2\gamma} \le c_{\gamma,\sigma}\, \|\partial\sigma\|_\infty\, \|B\|_\gamma^{1/\gamma}\, \|B\|_\gamma\, \Big|\frac{T}{M}\Big|^{2\gamma},$$

where we used the fact that $\|Y\|_\gamma \le c_\sigma\, \|B\|_\gamma^{1/\gamma}$ (see Proposition 2.5). Similarly, for the drift part we have

$$\Big| \int_{\tau_k}^{\tau_{k+1}} \big[ \mu(Y_s)-\mu(y) \big]\, ds \Big| \le c_\gamma\, \|\partial\mu\|_\infty\, \|Y\|_\gamma\, \Big|\frac{T}{M}\Big|^{\gamma+1} \le c_{\gamma,\mu}\, \|\partial\mu\|_\infty\, \|B\|_\gamma^{1/\gamma}\, \Big|\frac{T}{M}\Big|^{\gamma+1}.$$

Therefore the one-step error (30) satisfies

$$|r_k| \le c_{\mu,\sigma}\, \|B\|_\gamma^{1/\gamma}\, \Big|\frac{T}{M}\Big|^{2\gamma}. \quad (31)$$

Now we can write the classical decomposition of the error in terms of the exact and numerical flows. Since $\bar Y^M_{\tau_k} = \Phi(\bar Y^M_{\tau_k};\tau_k,\tau_k)$ and $Y_{\tau_k} = \Phi(\bar Y^M_{\tau_0};\tau_0,\tau_k)$ we have

$$\bar Y^M_{\tau_q} - Y_{\tau_q} = \Phi\big( \bar Y^M_{\tau_q};\tau_q,\tau_q \big) - \Phi\big( \bar Y^M_{\tau_0};\tau_0,\tau_q \big) = \sum_{k=0}^{q-1} \Big( \Phi\big( \bar Y^M_{\tau_{k+1}};\tau_{k+1},\tau_q \big) - \Phi\big( \bar Y^M_{\tau_k};\tau_k,\tau_q \big) \Big). \quad (32)$$

Since $\Phi\big( \bar Y^M_{\tau_k};\tau_k,\tau_q \big) = \Phi\big( \Phi(\bar Y^M_{\tau_k};\tau_k,\tau_{k+1});\, \tau_{k+1},\tau_q \big)$, we obtain

$$\Big| \Phi\big( \bar Y^M_{\tau_k};\tau_k,\tau_q \big) - \Phi\big( \bar Y^M_{\tau_{k+1}};\tau_{k+1},\tau_q \big) \Big| = \Big| \Phi\big( \Phi(\bar Y^M_{\tau_k};\tau_k,\tau_{k+1});\, \tau_{k+1},\tau_q \big) - \Phi\big( \bar Y^M_{\tau_{k+1}};\tau_{k+1},\tau_q \big) \Big| \le C_T(\|B\|_\gamma)\, \big| \Phi\big( \bar Y^M_{\tau_k};\tau_k,\tau_{k+1} \big) - \bar Y^M_{\tau_{k+1}} \big|,$$

where we have used the fact that $|\Phi(\alpha;t,s) - \Phi(\beta;t,s)| \le C_T(\|B\|_\gamma)\, |\alpha-\beta|$, with $C_T$ a subexponential function (see Proposition 2.5 again).
Moreover, owing to relation (31),

$$\big| \Phi\big( \bar Y^M_{\tau_k};\tau_k,\tau_{k+1} \big) - \bar Y^M_{\tau_{k+1}} \big| = |r_k| \le c_{\mu,\sigma}\, \|B\|_\gamma^{1/\gamma}\, \Big|\frac{T}{M}\Big|^{2\gamma}. \quad (33)$$

Therefore, replacing (33) in (32), for any $q\le M$ we obtain

$$\big| \bar Y^M_{\tau_q} - Y_{\tau_q} \big| \le c_{\mu,\sigma}\, \|B\|_\gamma^{1/\gamma}\, \sum_{k=0}^{q-1} \Big|\frac{T}{M}\Big|^{2\gamma}.$$

Let us push this analysis forward to Hölder-type norms on the grid $0\le\tau_0<\cdots<\tau_M = T$. For $q\ge p$ we have

$$\delta\big( Y - \bar Y^M \big)_{\tau_p\tau_q} = \big( \Phi(Y_{\tau_p};\tau_p,\tau_q) - Y_{\tau_p} \big) - \big( \Psi(\bar Y^M_{\tau_p};\tau_p,\tau_q) - \bar Y^M_{\tau_p} \big) = \Big( \big( \Phi(Y_{\tau_p};\tau_p,\tau_q) - \Phi(\bar Y^M_{\tau_p};\tau_p,\tau_q) \big) - \big( Y_{\tau_p} - \bar Y^M_{\tau_p} \big) \Big) - \big( \Psi(\bar Y^M_{\tau_p};\tau_p,\tau_q) - \Phi(\bar Y^M_{\tau_p};\tau_p,\tau_q) \big).$$

Similarly to the computations leading to (33), we obtain
$$\big| \Psi\big( \bar Y^M_{\tau_p};\tau_p,\tau_q \big) - \Phi\big( \bar Y^M_{\tau_p};\tau_p,\tau_q \big) \big| \le c_{\mu,\sigma}\, \|B\|_\gamma^{1/\gamma}\, \sum_{k=p}^{q-1} \Big|\frac{T}{M}\Big|^{2\gamma}.$$

Moreover, owing to Proposition 2.5 part (2), observe that

$$\frac{ \big| \big( \Phi(Y_{\tau_p};\tau_p,\tau_q) - \Phi(\bar Y^M_{\tau_p};\tau_p,\tau_q) \big) - \big( Y_{\tau_p} - \bar Y^M_{\tau_p} \big) \big| }{ |\tau_q - \tau_p|^\gamma } \le c(\|B\|_\gamma)\, \big| Y_{\tau_p} - \bar Y^M_{\tau_p} \big|.$$

Consequently, we have that for $0\le p<q\le M$,

$$\big| \delta\big( Y - \bar Y^M \big)_{\tau_p\tau_q} \big| \le c'\big( \|B\|_\gamma^{1/\gamma} \big) \Big\{ \sum_{k=p}^{q-1} \Big|\frac{T}{M}\Big|^{2\gamma} + |\tau_q-\tau_p|^\gamma \sum_{k=0}^{q} \Big|\frac{T}{M}\Big|^{2\gamma} \Big\},$$

which easily yields

$$\sup_{\substack{p,q=0,1,\dots,M-1 \\ p\ne q}} \frac{ \big| \delta\big( Y - \bar Y^M \big)_{\tau_p\tau_q} \big| }{ |\tau_p-\tau_q|^\gamma } \le c(\|B\|_\gamma)\, M^{-(2\gamma-1)}.$$

By "lifting" this error estimate to $[0,T]$, and since $|t-s|\le T/M$ between neighboring grid points,

$$\| Y - \bar Y^M \|_{\gamma,\infty,T} \le C\, M^{-(2\gamma-1)}, \quad (34)$$

which concludes the first part of the proof. Regarding the order of the mean square error, it suffices to note that the constant $C$ has finite $L^p$ moments. □

As mentioned before, an elaboration of Proposition 4.2 is needed in the sequel. Indeed, in the expression of the log-likelihood in Theorem 3.17 we need to discretize more complicated quantities related to the underlying process, such as (14) or (16). To this aim, let us notice first that all those equations can be written in the following generic form:

$$Z_t = \alpha + \int_0^t \xi^1_u Z_u\, du + \sum_{j=1}^d \int_0^t \xi^{2,j}_u Z_u\, dB^j_u, \quad (35)$$

where $\xi^1, \xi^2$ are stochastic processes with bounded moments of any order. The corresponding Euler discretization is

$$\bar Z^M_{\tau_{k+1}} = \bar Z^M_{\tau_k} + \xi^1_{\tau_k}\, \bar Z^M_{\tau_k}\, (\tau_{k+1}-\tau_k) + \sum_{j=1}^d \xi^{2,j}_{\tau_k}\, \bar Z^M_{\tau_k}\, \delta B^{j,M}_{\tau_k\tau_{k+1}}, \quad (36)$$

and we first give an approximation result in this general context: Proposition 4.3.
Let $T>0$, and consider the $\mathbb{R}^q$-valued solution $Z$ to equation (35), where $\alpha\in\mathbb{R}^q$, $\xi^1, \xi^{2,j}\in\mathbb{R}^{q,q}$, and suppose that $\|\xi^1\|_\gamma$ and $\|\xi^{2,j}\|_\gamma$ belong to $L^p(\Omega)$ for any value of $p\ge 1$. Let $\bar Z^M$ be defined by equation (36). Then there exists a random variable $C'_T$ with finite $L^p$ moments such that for all $\gamma<H$ with $H>1/2$ we have

$$\| Z - \bar Z^M \|_{\gamma,T} \le C'_T\, M^{-(2\gamma-1)}. \quad (37)$$

Consequently, we obtain that the mean square error is of order $O\big( M^{-(2\gamma-1)} \big)$.

Proof. We follow a similar approach as in the previous proposition. Thus the exact flow is equal to $\Phi(\zeta;s,t) := Z_t$, where $Z_t$ is the unique solution of equation (35) when $t\in[s,T]$ and the initial condition is $Z_s = \zeta$. Consider also the numerical flow

$$\Psi(\zeta;\tau_k,\tau_{k+1}) := \zeta + \xi^1_{\tau_k}\,\zeta\,(\tau_{k+1}-\tau_k) + \sum_{j=1}^d \xi^{2,j}_{\tau_k}\,\zeta\,\delta B^{j,M}_{\tau_k\tau_{k+1}},$$

where $\tau_k = kT/M$, $k=0,\dots,M-1$. Thus we have

$$\bar Z^M_{\tau_{k+1}} = \Psi\big( \bar Z^M_{\tau_k};\tau_k,\tau_{k+1} \big), \quad k=0,\dots,M-1, \qquad \bar Z^M_0 = \alpha.$$

In this case the one-step error can be written as

$$r_k = \Phi(\zeta;\tau_k,\tau_{k+1}) - \Psi(\zeta;\tau_k,\tau_{k+1}) = \int_{\tau_k}^{\tau_{k+1}} \xi^1_u\, (Z_u-\zeta)\, du + \sum_{j=1}^d \int_{\tau_k}^{\tau_{k+1}} \xi^{2,j}_u\, (Z_u-\zeta)\, dB^j_u.$$

We now treat each term separately. Using the fact that $\|Z\|_\gamma \le \exp\big( c\,\|B\|_\gamma^{1/\gamma} \big)$, which is recalled at Proposition 2.5 point (4) in a slightly different context, we have

$$\Big| \int_{\tau_k}^{\tau_{k+1}} \xi^2_s\, (Z_s-\zeta)\, dB_s \Big| \le c_\gamma\, \| Z\xi^2 \|_\gamma\, \|B\|_\gamma\, \Big|\frac{T}{M}\Big|^{2\gamma} \le c_\gamma\, \exp\big( c\,\|B\|_\gamma^{1/\gamma} \big)\, \|B\|_\gamma\, \Big|\frac{T}{M}\Big|^{2\gamma}.$$

Similarly, we also have

$$\Big| \int_{\tau_k}^{\tau_{k+1}} \xi^1_s\, (Z_s-\zeta)\, ds \Big| \le c_\gamma\, \| Z\xi^1 \|_\gamma\, \Big|\frac{T}{M}\Big|^{\gamma+1} \le c_\gamma\, \exp\big( c\,\|B\|_\gamma^{1/\gamma} \big)\, \Big|\frac{T}{M}\Big|^{\gamma+1}.$$
Therefore, the one-step error satisfies the inequality

$$|r_k| \le c_\gamma\, \exp\big( c\,\|B\|_\gamma^{1/\gamma} \big)\, \|B\|_\gamma\, \Big|\frac{T}{M}\Big|^{2\gamma}.$$

Along the same lines as for Proposition 4.2, the decomposition of the error in terms of the exact and numerical flows becomes

$$\bar Z^M_{\tau_q} - Z_{\tau_q} = \Phi\big( \bar Z^M_{\tau_q};\tau_q,\tau_q \big) - \Phi\big( \bar Z^M_{\tau_0};\tau_0,\tau_q \big) = \sum_{k=0}^{q-1} \Big( \Phi\big( \bar Z^M_{\tau_{k+1}};\tau_{k+1},\tau_q \big) - \Phi\big( \bar Z^M_{\tau_k};\tau_k,\tau_q \big) \Big),$$

and the same inequalities allowing to go from (32) to (33) yield

$$\big| \bar Z^M_{\tau_q} - Z_{\tau_q} \big| \le c_\gamma\, M^{-(2\gamma-1)}.$$

The claim of the proposition now follows as in Proposition 4.2. □
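As a minimal sanity check of the generic linear form (35), note that for a hypothetical fractional Ornstein–Uhlenbeck drift $\mu(y;\theta) = -\theta y$ with constant diffusion $\sigma$, the first Malliavin derivative satisfies $D_r Y_t = \sigma - \theta\int_r^t D_r Y_u\, du$ for $t\ge r$, i.e. equation (35) with $\xi^1 \equiv -\theta$ and $\xi^2 \equiv 0$, whose exact solution is $\sigma e^{-\theta(t-r)}$. The sketch below applies the Euler scheme (36) to this example (all values illustrative; since the equation is deterministic here, the observed order is the classical $M^{-1}$ rather than the pathwise $M^{-(2\gamma-1)}$):

```python
import numpy as np

def euler_linear(alpha, xi1, xi2, dB, dt):
    """Scalar version of the Euler scheme (36) for the linear equation (35):
    Z_{k+1} = Z_k + xi1_k Z_k dt + xi2_k Z_k dB_k."""
    z = alpha
    out = [z]
    for k in range(len(dB)):
        z = z + xi1[k] * z * dt + xi2[k] * z * dB[k]
        out.append(z)
    return np.array(out)

# Malliavin derivative D_0 Y_t of the hypothetical fOU model, sigma = 1:
# xi1 = -theta, xi2 = 0, exact solution exp(-theta * t).
theta, T = 2.0, 1.0
errors = []
for M in (50, 100, 200):
    dt = T / M
    z = euler_linear(1.0, -theta * np.ones(M), np.zeros(M), np.zeros(M), dt)
    exact = np.exp(-theta * dt * np.arange(M + 1))
    errors.append(np.max(np.abs(z - exact)))

# empirical order from the coarsest and finest grids (expected ~ 1 here)
order = np.log2(errors[0] / errors[2]) / 2
```

Doubling $M$ roughly halves the error in this deterministic case, which is the pattern the proposition predicts once the stochastic terms are switched off.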
We now use the previous proposition in order to approximate the kernels of the expectations in $\nabla_l\ell_n(\theta)$. Let us first introduce the following notation: Notation 4.4.
Let $W_i(\theta)$, $V_i(\theta)$ be as in (25) and (26) respectively, and define $w_i(\theta)$ and $v_i(\theta)$ as

$$w_i(\theta) = \mathbf{1}_{(Y_{t_i}(\theta)>y_{t_i})}\, H_{(1,\dots,m)}\big( Y_{t_i}(\theta) \big), \quad (38)$$
$$v_i(\theta) = \nabla_l Y_{t_i}(\theta)\, \mathbf{1}_{(Y_{t_i}(\theta)>y_{t_i})}\, H_{(1,\dots,m,1,\dots,m)}\big( Y_{t_i}(\theta) \big) + \big( Y_{t_i}(\theta)-y_{t_i} \big)_+\, \nabla_l H_{(1,\dots,m,1,\dots,m)}\big( Y_{t_i}(\theta) \big). \quad (39)$$

Let also $\bar w^M_i$ and $\bar v^M_i$ be the Euler-discretized versions of (38) and (39) obtained through Proposition 4.3, and set $\bar W^M_i(\theta) = E[\bar w^M_i]$ and $\bar V^M_i(\theta) = E[\bar v^M_i]$.

Our convergence result for $\nabla_l\ell_n(\theta)$ can be read as follows: Theorem 4.5.
Recall from Theorem 3.17 that $\nabla_l\ell_n(\theta)$ can be decomposed as $\nabla_l\ell_n(\theta) = \sum_{i=1}^n V_i(\theta)/W_i(\theta)$. Then the following approximation result holds true:

$$\big| V_i(\theta) - \bar V^M_i(\theta) \big| + \big| W_i(\theta) - \bar W^M_i(\theta) \big| \le c\, M^{-(2\gamma-1)},$$

for a strictly positive constant $c$.

Proof. We focus on the bound for $| V_i(\theta) - \bar V^M_i(\theta) |$, the other one being very similar. Applying Proposition 4.3 to the particular case of the equations governing Malliavin derivatives, we easily get $\| v - \bar v^M \|_{\gamma,T} \le C\, M^{-(2\gamma-1)}$ for an integrable random variable $C$. The proof is now easily finished by invoking the inequality $| V_i(\theta) - \bar V^M_i(\theta) | \le E\big[ \| v - \bar v^M \|_{\gamma,T} \big]$. □
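The discretization rate can also be probed empirically. For $H>1/2$ the Young-sense equation $dY_t = Y_t\, dB_t$, $Y_0 = 1$, has the explicit solution $Y_t = \exp(B_t)$, so the strong Euler error is directly measurable. The sketch below (Cholesky-simulated fBm with $H = 0.8$; grid and sample sizes are illustrative, not from the paper) exhibits the error decay predicted by estimates of the type (28):

```python
import numpy as np

def euler_rms_error(M, hurst, n_mc=100, seed=0):
    """RMS error at t = 1 of the Euler scheme for dY = Y dB, Y_0 = 1,
    whose Young-sense solution is exp(B_t) when H > 1/2."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, M + 1) / M
    s, u = np.meshgrid(t, t)
    # Cholesky factor of the fBm covariance, computed once per grid size
    L = np.linalg.cholesky(
        0.5 * (s**(2 * hurst) + u**(2 * hurst) - np.abs(s - u)**(2 * hurst)))
    errs = np.empty(n_mc)
    for i in range(n_mc):
        B = np.concatenate(([0.0], L @ rng.standard_normal(M)))
        y = np.prod(1.0 + np.diff(B))      # Euler: Y_{k+1} = Y_k (1 + dB_k)
        errs[i] = (y - np.exp(B[-1]))**2   # compare with the exact solution
    return np.sqrt(errs.mean())

err_coarse = euler_rms_error(16, hurst=0.8)
err_fine = euler_rms_error(512, hurst=0.8)
```

With $H = 0.8$ the predicted decay is roughly $M^{-0.6}$, so refining the grid from 16 to 512 steps should shrink the RMS error by close to an order of magnitude.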
Remark 4.6. We have given two separate approximations for $V_i(\theta)$ and $W_i(\theta)$. In order to fully estimate $\big( V_i(\theta)/W_i(\theta) \big) - \big( \bar V^M_i(\theta)/\bar W^M_i(\theta) \big)$, one should also prove that $W_i(\theta)$ is bounded away from 0. This requires a lower bound for densities of differential equations driven by fractional Brownian motion, which is out of the scope of the current article.

4.2. Efficiency of the Monte Carlo simulation.
In this section we aim to study the computational tradeoff between the length of a time step in the Euler discretization (i.e. $1/M$) and the number of Monte Carlo simulations of the sample path (i.e. $N$). In order to do so we consider $\bar w^M_i$ and $\bar v^M_i$ as above. Recall that, given $t$ units of computer time, the Monte Carlo estimators for $W_i(\theta)$ and $V_i(\theta)$ can be written as

$$\frac{1}{c_1(t,M)}\sum_{k=1}^{c_1(t,M)} \bar w^M_{i,k}, \qquad \frac{1}{c_2(t,M)}\sum_{k=1}^{c_2(t,M)} \bar v^M_{i,k},$$

where $\{\bar w^M_{i,\ell};\,\ell\ge 1\}$ (resp. $\{\bar v^M_{i,\ell};\,\ell\ge 1\}$) is a sequence of i.i.d. copies of $\bar w^M_i$ (resp. of $\bar v^M_i$), and $c_1(t,M)$, $c_2(t,M)$ are the maximal numbers of runs one is allowed to consider within $t$ units of computer time. Using the result of [10] we can state the following proposition: Proposition 4.7.
Let $N$ be the number of Monte Carlo simulations and $M$ the number of steps of the Euler scheme. Then the optimal tradeoff between $N$ and $M$ for computing $W_i(\theta)$ (and similarly $V_i(\theta)$) is $N \asymp M^{2(2\gamma-1)}$, for all $1/2<\gamma<H$ and $\tilde\gamma = Tm(d+1)$, where $T$ is the time horizon, $m$ the dimension of the observed process and $d$ the dimension of the noise process.

Proof. We discuss the proof only for $W_i$; following exactly the same steps we can obtain the same result for $V_i$. We only need to check that our process $w$ satisfies the conditions of Theorem 1 in [10]:

(i) We can easily see that the discretized $\bar w^M_{t_i}$ converges uniformly to $w_{t_i}$.

(ii) In addition, we have bounded moments of $w_{t_i}$, thus $E[\bar W^M_{t_i}] \to E[w_{t_i}]$.

(iii) From Theorem 4.5 we have that the rate of convergence of the Euler scheme for $\bar w^M_{t_i}$ is $M^{-(2\gamma-1)}$, for $1/2<\gamma<H$.

(iv) The computer time required to generate $\bar w^M_{t_i}$ is given by $\tau(1/M)$, which satisfies $\tau(1/M) = Tm(d+1)\,M = \tilde\gamma M$, where $T$ is the length of the time period, $m$ is the dimension of the SDE, $d$ is the dimension of the fBm and $M$ is the number of Euler steps.

By applying Theorem 1 of [10], the optimal rule is to choose the number of Monte Carlo simulations and the number of Euler steps such that the asymptotic error is minimized. Therefore, for $t$ the total budget of computer time, as $t$ increases the Euler step should converge to zero so that $M \asymp (t/\tilde\gamma)^{1/(2(2\gamma-1)+1)}$, or equivalently $t \asymp \tilde\gamma\, M^{2(2\gamma-1)+1}$. But the number of operations needed for the whole Monte Carlo procedure is of order $\tilde\gamma M N$. Thus we finally obtain $N \asymp t/(\tilde\gamma M) \asymp M^{2(2\gamma-1)}$. □

4.3. Discretization of the score function.
Consider the following discretized version of the score function $\nabla_l\ell_n(\theta)$:

$$\hat\nabla_l\ell_n(\theta) = \sum_{i=1}^n \frac{\hat V_i}{\hat W_i} := \sum_{i=1}^n \frac{ \frac{1}{N}\sum_{k=1}^N \bar v^M_{i,k} }{ \frac{1}{N}\sum_{k=1}^N \bar w^M_{i,k} }, \quad (40)$$

where $\bar w^M_{i,1}, \bar w^M_{i,2}, \dots$ and $\bar v^M_{i,1}, \bar v^M_{i,2}, \dots$ are i.i.d. copies of $\bar w^M_i$ and $\bar v^M_i$ respectively. Our aim in this section is to give a global bound for the mean square error obtained by approximating $\nabla_l\ell_n(\theta)$ by $\hat\nabla_l\ell_n(\theta)$. Proposition 4.8.
The discretized score function $\hat\nabla_l\ell_n(\theta)$ converges to the continuous score function $\nabla_l\ell_n(\theta)$ with a rate of convergence of order $M^{-(2\gamma-1)}$, where $1/2<\gamma<H$ and $M$ is the number of Euler steps used in the discretization.

Proof. We discuss the idea of the proof for the $W_i$ term first:

$$E\big( \hat W_i - W_i \big)^2 = E\bigg( \frac{1}{N}\sum_{k=1}^N \bar w^M_{i,k} - E[w_i(\theta)] \bigg)^2 = E\bigg( \frac{1}{N}\sum_{k=1}^N \bar w^M_{i,k} - \frac{1}{N}\sum_{k=1}^N w_{i,k} + \frac{1}{N}\sum_{k=1}^N w_{i,k} - E[w_i(\theta)] \bigg)^2.$$

Thanks now to the independence between Monte Carlo runs, we get

$$E\big( \hat W_i - W_i \big)^2 \le \frac{2}{N}\sum_{k=1}^N E\big( \bar w^M_{i,k} - w_{i,k} \big)^2 + 2\,E\bigg( \frac{1}{N}\sum_{k=1}^N w_{i,k} - E[w_i(\theta)] \bigg)^2 = \frac{2}{N}\sum_{k=1}^N (\text{Euler MSE})^2 + 2\,(\text{Monte Carlo MSE})^2 \asymp \big( M^{-(2\gamma-1)} \big)^2 + \frac{1}{N},$$

and thus
$$\mathrm{MSE}\big(\hat W_i - W_i\big) \asymp \sqrt{(M^{-\gamma})^2 + \frac{1}{N}}.$$
Now, if we use Proposition 4.7, i.e. $N \asymp M^{2\gamma}$ for all $1/2 < \gamma < H$ (with $\tilde\gamma = T\, m\, (d+1)$, where $T$ is the time horizon, $m$ the dimension of the observed process and $d$ the dimension of the noise process), we have
$$\mathrm{MSE}\big(\hat W_i - W_i\big) \asymp \sqrt{M^{-2\gamma} + M^{-2\gamma}} \asymp M^{-\gamma},$$
since the two terms under the square root are then of the same order. Following the same procedure, we can show that $\mathrm{MSE}(\hat V_i - V_i) \asymp M^{-\gamma}$, and thus the claim of the proposition follows easily. □

Remark. In Proposition 4.8 the rate of convergence is independent of the dimension of the problem, i.e. it is independent of the parameter $\tilde\gamma = T\, m\, (d+1)$.

5. Numerical Examples
In this section our aim is to investigate the practical performance of the suggested maximum likelihood method. We study the one-dimensional fractional Ornstein-Uhlenbeck process, a linear two-dimensional system of fractional SDEs, and then some real data given by a financial time series. Before presenting our results, we first discuss some technical issues raised by the algorithmic implementation of our method.

The goal is to find the root of the quantity $\nabla_l \ell_n(\theta)$ with respect to $\theta$. We can divide this procedure into two parts. The first part consists in computing the root of the log-likelihood derivative using a stochastic approximation algorithm. This is a stochastic optimization technique first introduced by Robbins and Monro (1951), which is used when only noisy observations of the function are available. It is appropriate in our case, since we want to solve $\nabla_l \ell_n(\theta) = 0$, where $\nabla_l \ell_n(\theta)$ is given by Theorem 3.17 and has to be approximated by $\hat\nabla_l \ell_n(\theta)$. Thus, the recursive procedure is of the following form:
$$\hat\theta_{k+1} = \hat\theta_k - a_k\, \hat\nabla_l \ell_n(\hat\theta_k), \qquad (41)$$
where $\hat\nabla_l \ell_n(\hat\theta_k)$ is the estimate of $\nabla_l \ell_n$ at the $k$-th iteration based on the observations, and $(a_k)$ is a sequence of real numbers such that $\sum_{k=1}^\infty a_k = \infty$ and $\sum_{k=1}^\infty a_k^2 < \infty$. Under appropriate conditions (see for example [4]), the iteration in (41) converges to $\theta$ almost surely. The step sizes satisfy $a_k > 0$, and the way we choose them can be found in [26].

The second part consists of the computation of $\hat\nabla_l \ell_n(\hat\theta_k)$ at each step of the stochastic approximation algorithm. Thus, for a given value of $\theta_k$ (the one computed at the $k$-th iteration) we want to compute $\hat\nabla_l \ell_n(\theta_k)$ when we are given $n$ discrete observations of the process: $y_{t_i}$, $i = 1, \dots, n$. Here, we describe the main idea of the algorithm we use for only one step. Thus, assume that we are at $[t_{i-1}, t_i]$, and at time $t_i$ we obtain the $i$-th observation.
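The stochastic approximation loop (41) can be sketched as follows. The noisy score here is a hypothetical stand-in (a linear function plus Gaussian noise) for the estimator $\hat\nabla_l \ell_n$; only the Robbins-Monro recursion itself is the point:

```python
import random

def robbins_monro(noisy_score, theta0, n_iter=2000, a0=1.0):
    """Stochastic approximation: theta_{k+1} = theta_k - a_k * score(theta_k),
    with a_k = a0 / (k + 1) so that sum a_k = inf and sum a_k^2 < inf."""
    theta = theta0
    for k in range(n_iter):
        theta -= (a0 / (k + 1)) * noisy_score(theta)
    return theta

# Hypothetical noisy score with root theta* = 2: a linear function plus noise.
random.seed(0)
noisy_score = lambda th: (th - 2.0) + random.gauss(0.0, 0.1)
theta_hat = robbins_monro(noisy_score, theta0=0.0)
```

The harmonic step sizes are one admissible choice satisfying the two summability conditions above; [26] discusses the tuning of such sequences in detail.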
We want to compute $W_i(\theta)$ and $V_i(\theta)$ according to expressions (25) and (26) respectively. To compute the expectations we use simple Monte Carlo simulations. Therefore, we discretize the time interval into $M$ steps $t_{i-1} = s_0 < s_1 < \cdots < s_M = t_i$. From each simulated path (apart from that of the fBm) we only need to keep the terminal value, which is the value of the process at time $t_i$. The algorithm is the following:

(1) Simulate the values of the fBm in the interval $[t_{i-1}, t_i]$ using, for example, the circulant matrix method (any simulation technique, preferably an exact one, can be used).
(2) Using the simulated values from step 1 and an Euler scheme for the SDE (1), simulate the value of the process at time $t_i$: for $k = 1, \dots, M$,
$$\bar Y^M_{s_k} = \bar Y^M_{s_{k-1}} + \mu\big(\bar Y^M_{s_{k-1}}\big)(s_k - s_{k-1}) + \sum_{j=1}^d \sigma^{(j)}\big(\bar Y^M_{s_{k-1}}\big)\big(B^{(j)}_{s_k} - B^{(j)}_{s_{k-1}}\big).$$
(3) Using step 2 and the observation at time $t_i$, compute the indicator function $\mathbf{1}_{(Y_{t_i}(\theta) > y_{t_i})}$.
(4) Using step 1 and an Euler scheme, simulate $D_{t_i} Y_{i\tau}$, as given in Lemma 3.3 for $n = 1$ (first Malliavin derivative).
(5) Using step 1 and an Euler scheme, simulate $\eta^{kj}_{t_i}$, $k, j = 1, \dots, m$, as given in Proposition 3.7.
(6) Use steps 4 and 5 to compute $Q^{pj}_{s t_i}$, $p \in \{1, \dots, m\}$, $j \in \{1, \dots, d\}$, as defined in Propositions 3.12 and 3.14.
(7) Simulate the Malliavin derivative of the product $D_s[Y_t\, Q^{pj}_{rt}]$.
(8) Using the previous steps, together with numerical integration for the double integral and for the stochastic integral, compute $U_p(Y_{t_i}(\theta))$ as defined in Proposition 3.12.
(9) Recursively compute $H_{(1,\dots,m)}(Y_{t_i}(\theta))$ as given in (19).
(10) Combine steps 3 and 9 to obtain the kernel $W_i(\theta)$.
(11) Repeat steps 1 through 10 $N$ times and average these values to obtain an estimate of the expectation $W_i(\theta)$.

Using a similar procedure we can obtain an estimate of the expectation $V_i(\theta)$. Finally, for each
$i = 1, \dots, n$ we compute $V_i(\theta)/W_i(\theta)$ and sum over $i$ to obtain the desired value of the log-likelihood derivative at $\theta_k$.

We have completed the study of our numerical approximation of the log-likelihood, and are now ready for the analysis of some numerical examples.

5.1. Fractional Ornstein-Uhlenbeck process.
Though our method can be applied to highly nonlinear contexts, we focus here on some linear situations, which allow easier comparisons with existing methods or exact computations. Let us first study the one-dimensional fractional Ornstein-Uhlenbeck process, i.e.
$$dY_t = -\lambda Y_t\, dt + dB_t, \qquad (42)$$
whose solution is given by $Y_t(\lambda) = \int_0^t e^{-\lambda(t-s)}\, dB_s$ (notice the existence of an explicit solution here). In this case our methodology is quite simplified. The derivative of the log-likelihood can be written as follows:
$$\partial_\lambda \ell(\lambda; y) = \sum_{i=1}^n \frac{E\Big[\partial_\lambda Y_t(\lambda)\, \mathbf{1}_{(Y_t(\lambda) > y)}\, H_{(1,1)}(\lambda) + \big(Y_t(\lambda) - y\big)_+\, \partial_\lambda H_{(1,1)}(\lambda)\Big]}{E\Big[\mathbf{1}_{(Y_t(\lambda) > y)}\, H_{(1)}\big(Y_t(\lambda)\big)\Big]}.$$
The Malliavin derivative of $Y_t(\lambda)$ satisfies the following ODE:
$$D_s Y_t(\lambda) = 1 - \lambda \int_s^t D_s Y_u(\lambda)\, du,$$
with solution $D_s Y_t(\lambda) = e^{-\lambda(t-s)}\, \mathbf{1}_{\{s \le t\}}$. The corresponding norm is
$$\|D_\cdot Y_t(\lambda)\|^2 = c_H \int_0^t\!\!\int_0^t e^{-\lambda(u+v)}\, |u-v|^{2H-2}\, du\, dv.$$
The higher order derivatives of $Y_t(\lambda)$ are equal to zero. Therefore,
$$H_{(1)}\big(Y_t(\lambda)\big) = \frac{1}{\|D_\cdot Y_t(\lambda)\|^2} \int_0^t e^{-\lambda u}\, dB_u,$$
and thus
$$H_{(1,1)}(\lambda) = \frac{1}{\|D_\cdot Y_t(\lambda)\|^4} \int_0^t\!\!\int_0^t e^{-\lambda(u+v)}\, dB_u\, dB_v - c_H\, \|D_\cdot Y_t(\lambda)\|^{-2}.$$
The derivative with respect to the unknown parameter $\lambda$ satisfies
$$\partial_\lambda Y_t(\lambda) = -\int_0^t Y_s(\lambda)\, ds - \lambda \int_0^t \partial_\lambda Y_s(\lambda)\, ds,$$
with solution $\partial_\lambda Y_t(\lambda) = -\int_0^t (t-s)\, e^{-\lambda(t-s)}\, dB_s$. The last term we need to compute is:
$$\partial_\lambda H_{(1,1)}(\lambda) = \frac{1}{\|D_\cdot Y_t(\lambda)\|^4} \bigg[ \|D_\cdot Y_t(\lambda)\|^2 \int_0^t\!\!\int_0^t -(u+v)\, e^{-\lambda(u+v)}\, dB_u\, dB_v - c_H\, \|D_\cdot Y_t(\lambda)\|^2 \int_0^t\!\!\int_0^t -(u+v)\, e^{-\lambda(u+v)}\, |u-v|^{2H-2}\, du\, dv \bigg] - c_H\, \frac{\int_0^t\!\!\int_0^t -(u+v)\, e^{-\lambda(u+v)}\, |u-v|^{2H-2}\, du\, dv}{\|D_\cdot Y_t(\lambda)\|^4}.$$
Now, we compute the MLE following the algorithm described above. The results we obtained are summarized in the following table:

True λ | MLE λ̂ | Standard Error
0.5 | 0.497 | 0.00369
4 | 3.861 | 0.00127
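The simulation step underlying the estimation above (exact fBm increments plus an Euler scheme for (42)) can be sketched as follows; for brevity this sketch draws the fBm increments by a Cholesky factorization of their covariance rather than the circulant matrix method, and the parameter values are illustrative:

```python
import numpy as np

def fbm_increments(n, h, H, rng):
    """Exact fBm increments on a grid of mesh h, drawn via a Cholesky
    factorization of the increment covariance (the paper prefers the
    circulant matrix method; Cholesky is used here only for brevity)."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    cov = 0.5 * h**(2 * H) * ((d + 1)**(2 * H) + np.abs(d - 1)**(2 * H) - 2 * d**(2 * H))
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def euler_fou(lam, H, T=1.0, M=500, y0=0.0, rng=None):
    """Euler scheme for dY = -lam * Y dt + dB^H; returns the terminal value."""
    if rng is None:
        rng = np.random.default_rng(0)
    h = T / M
    dB = fbm_increments(M, h, H, rng)
    y = y0
    for k in range(M):
        y = y - lam * y * h + dB[k]  # one Euler step of (42)
    return y

# One simulated terminal value, with illustrative parameters lam = 0.5, H = 0.6.
y_T = euler_fou(lam=0.5, H=0.6)
```

Averaging many independent terminal values produced this way is precisely the Monte Carlo step used for the expectations $W_i(\theta)$ and $V_i(\theta)$.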
Remark. The value of $H$ used for the simulation of the process is 0.6. The number of observations is $n = 50$, the number of Euler steps is $M = 500$, the number of stochastic approximation steps is $K = 50$, and the number of Monte Carlo simulations is $N = 500$.

5.2. Two-dimensional fractional SDE.
In this section we study the following system of fractional OU processes:
$$dY^{(1)}_t = -\alpha\, Y^{(2)}_t\, dt + \beta\, dB^{(1)}_t, \qquad dY^{(2)}_t = -\beta\, Y^{(1)}_t\, dt + \beta\, dB^{(2)}_t. \qquad (43)$$
In this case, the computations are more involved even though the SDEs are linear functions of $Y$. Furthermore, the parameter we want to estimate is two-dimensional as well ($\theta = (\alpha, \beta)^T$), which complicates the optimization procedure. Therefore, instead of computing only one derivative, we need to compute the derivatives with respect to both $\alpha$ and $\beta$, and then solve the system of two equations
$$\nabla_\alpha \ell(\alpha, \beta; y) = 0, \qquad \nabla_\beta \ell(\alpha, \beta; y) = 0,$$
where
$$\nabla_l \ell(\alpha, \beta; y) = \sum_{i=1}^n \Big( E\big[\mathbf{1}_{(Y_t(\alpha,\beta) > y)}\, H_{(1,2)}\big(Y_t(\alpha, \beta)\big)\big] \Big)^{-1} \times \Big\{ E\big[ \nabla_l Y_t(\alpha,\beta)\, \mathbf{1}_{(Y_t(\alpha,\beta) > y)}\, H_{(1,2,1,2)}(\alpha, \beta) + \big(Y_t(\alpha, \beta) - y\big)_+\, \nabla_l H_{(1,2,1,2)}(\alpha, \beta) \big] \Big\},$$
and $l = \alpha$ or $\beta$. The Malliavin derivative of $Y_t$ computes as follows:
$$D_s Y^{(1)}_t = \beta - \alpha \int_s^t D_s Y^{(2)}_u\, du, \qquad D_s Y^{(2)}_t = \beta - \beta \int_s^t D_s Y^{(1)}_u\, du.$$
The covariance matrix $\gamma_t$ is given by $\big( \langle D_\cdot Y^i_t, D_\cdot Y^j_t \rangle \big)_{1 \le i,j \le 2}$. The inverse of the covariance matrix satisfies the following SDE:
$$\gamma^{-1}_t = -\int_0^t \big[ \gamma^{-1}_u M + M^T \gamma^{-1}_u \big]\, du, \qquad \text{where } M = \begin{pmatrix} 0 & \alpha \\ \beta & 0 \end{pmatrix}.$$
Now, it remains to compute the quantities $H_{(1,2)}$ and $H_{(1,2,1,2)}$. This can be done using the recursive formulas in Proposition 3.12, but we need to keep in mind that the higher order derivatives of $Y$ are equal to zero, so the formulas simplify. Indeed,
$$H_{(1)}(Y_t) = \sum_{j=1}^2 \bigg[ Y_t \int_0^t (\gamma^{-1}_s)_{1j}\, D_s Y^j_t\, dB^j_s - c_H \int_0^t\!\!\int_0^t D_s Y^j_t\, Q^{1j}_{rt}\, |r-s|^{2H-2}\, dr\, ds \bigg].$$
Moreover, we can easily see that
$$H_{(1,2)}(Y_t) = H_{(1)}(Y_t) \int_0^t Q_{st}\, dB_s - c_H \int_0^t\!\!\int_0^t D_s H_{(1)}(Y_t)\, Q_{rt}\, |r-s|^{2H-2}\, dr\, ds,$$
$$H_{(1,2,1,2)}(Y_t) = H_{(1,2,1)}(Y_t) \int_0^t Q_{st}\, dB_s - c_H \int_0^t\!\!\int_0^t D_s H_{(1,2,1)}(Y_t)\, Q_{rt}\, |r-s|^{2H-2}\, dr\, ds.$$
Of course, recall that $Q^{pj}_{st} = (\gamma^{-1}_s)_{pj}\, D_s Y^j_t$. In practice, these quantities are computed recursively.
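As in the one-dimensional case, the system (43) can be simulated with an Euler scheme driven by two independent fBms; a minimal sketch, again drawing the fBm increments via a Cholesky factorization (an assumption for brevity, any exact method works) and with illustrative parameter values:

```python
import numpy as np

def fbm_increments(n, h, H, rng):
    # Exact fBm increments via a Cholesky factorization of the increment
    # covariance (an assumption for brevity; any exact method works).
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    cov = 0.5 * h**(2 * H) * ((d + 1)**(2 * H) + np.abs(d - 1)**(2 * H) - 2 * d**(2 * H))
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def euler_system(alpha, beta, H=0.6, T=1.0, M=500, rng=None):
    """Euler scheme for system (43): dY1 = -alpha*Y2 dt + beta*dB1,
    dY2 = -beta*Y1 dt + beta*dB2, with independent fBm drivers."""
    if rng is None:
        rng = np.random.default_rng(1)
    h = T / M
    dB1 = fbm_increments(M, h, H, rng)
    dB2 = fbm_increments(M, h, H, rng)
    y1 = y2 = 0.0
    for k in range(M):
        # simultaneous update of both coordinates
        y1, y2 = (y1 - alpha * y2 * h + beta * dB1[k],
                  y2 - beta * y1 * h + beta * dB2[k])
    return y1, y2

y1_T, y2_T = euler_system(alpha=1.0, beta=0.5)
```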
The last step is to compute the derivative of $H_{(1,2,1,2)}(Y_t)$ with respect to $\alpha$ and $\beta$, which in this case is not as complicated, and then to compute the MLEs using the algorithm discussed in the previous section. The table below summarizes our results, and we have plotted the corresponding histograms in Figure 1.

Parameter | True Value | MLE | Standard Error
α |
β |

Remark. The value of $H$ used for the simulation of the process is 0.6. The number of observations is $n = 50$, the number of Euler steps is $M = 500$, the number of stochastic approximation steps is $K = 50$, and the number of Monte Carlo simulations is $N = 500$.

5.3. Application to financial data.
One of the most popular applications of fractional SDEs is in finance. Hu and Oksendal [20] introduced the fractional Black-Scholes model in order to account for inconsistencies of the existing models in practice. More specifically, the stock price is described therein by a fractional geometric Brownian motion with Hurst parameter $1/2 < H < 1$. The choice of this model is based on empirical studies that displayed the presence of long-range dependence in stock prices, for example in Willinger, Taqqu and Teverovsky [42].
[Figure: histograms of the estimators; left panel, drift parameter $\alpha$; right panel, diffusion parameter $\beta$; density on the vertical axis.]
Figure 1.
Empirical distribution of the estimators for $\alpha$ and $\beta$.

However, the presence of fractional Brownian motion in the model allows for arbitrage in the general setting. It has been shown that arbitrage opportunities can be avoided in a number of ways; for example, the reader can refer to Rogers [39], Dasgupta and Kallianpur [7], and Cheridito [5]. We choose to model the stock price as follows:
$$dS_t = \mu S_t\, dt + \sigma\, dB_t, \qquad (44)$$
where $B$ is a fractional Brownian motion with Hurst index $1/2 < H < 1$. For this SDE (as well as for a more general class of fractional SDEs) Guasoni [13] proved that there is no arbitrage when transaction costs are present.

Our goal is to estimate the unknown parameters $\mu$ and $\sigma$ based on daily observations of the S&P 500 index (data from June 2010 until December 2010). Since the Hurst parameter is piecewise constant, we divide the data into three groups (of 50 daily observations each) and we compute for each one the Hurst index using the Rescaled-Range (R/S) statistic. We obtain that for the first group of data $\hat H = 0.$, for the second $\hat H = 0.$, and for the third one $\hat H = 0.$. For the different groups, we apply our maximum likelihood approach in order to estimate $\mu$ and $\sigma$. The estimates are summarized in the following table:

Estim. Parameters | Group 1: $\hat H = 0.$ | Group 2: $\hat H = 0.$ | Group 3: $\hat H = 0.$
$\hat\mu$ |
$\hat\sigma$ |

Remark. The volatility is expressed in yearly units. In addition, during this period of time the historical volatility is around 0.38, which is coherent with our own estimation.
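The R/S computation used above can be sketched as follows; the window sizes and the log-log regression are standard but illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def rs_hurst(x, windows=(8, 16, 32, 64, 128)):
    """Rescaled-range (R/S) estimate of the Hurst index of an increment
    series x: regress log mean(R/S) on log window size."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for w in (w for w in windows if w <= len(x) // 2):
        rs_vals = []
        for start in range(0, len(x) - w + 1, w):
            chunk = x[start:start + w]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative deviations
            r, s = dev.max() - dev.min(), chunk.std()
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(w))
        log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)  # the slope estimates H
    return slope

# Sanity check on iid Gaussian noise, for which H should be near 1/2
# (the R/S statistic is known to be upward biased in small samples).
rng = np.random.default_rng(0)
H_hat = rs_hurst(rng.standard_normal(4000))
```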
References

[1] F. Baudoin, L. Coutin: Operators associated with a stochastic differential equation driven by fractional Brownian motions. Stoch. Proc. Appl. (2007), no. 5, 550-574.
[2] F. Baudoin, C. Ouyang: Gradient bounds for solutions of stochastic differential equations driven by fractional Brownian motions. Arxiv preprint (2011).
[3] F. Baudoin, C. Ouyang, S. Tindel: Gaussian bounds for the density of solutions of stochastic differential equations driven by fractional Brownian motions. In preparation.
[4] J. R. Blum: Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics (1954), no. 4, 737-744.
[5] P. Cheridito: Arbitrage in fractional Brownian motion models. Finance Stoch. (2003), no. 4, 533-553.
[6] R. Dalang, E. Nualart: Potential theory for hyperbolic SPDEs. Ann. Probab. (2004), no. 3A, 2099-2148.
[7] A. Dasgupta, G. Kallianpur: Arbitrage opportunities for a class of Gladyshev processes. Appl. Math. Optim. (2000), no. 3, 377-385.
[8] A. Deya, A. Neuenkirch, S. Tindel: A Milstein-type scheme without Lévy area terms for SDEs driven by fractional Brownian motion. Arxiv preprint (2010). To appear in Ann. Inst. Henri Poincaré Probab. Stat.
[9] D. Duffie, P. Glynn: Efficient Monte Carlo simulation of security prices. Ann. Appl. Probab. (1995), no. 4, 897-905.
[10] G. Durham, A. Gallant: Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. J. Bus. Econom. Statist. (2002), no. 3, 297-338.
[11] P. Friz, N. Victoir: Multidimensional stochastic processes as rough paths. Theory and applications. Cambridge University Press, 2010.
[12] E. Gobet: Local asymptotic mixed normality property for elliptic diffusion: a Malliavin calculus approach. Bernoulli (2001), no. 6, 899-912.
[13] P. Guasoni: No arbitrage under transaction costs, with fractional Brownian motion and beyond. Mathematical Finance (2006), no. 3, 569-582.
[14] M. Gubinelli: Controlling rough paths. J. Funct. Anal. (2004), 86-140.
[15] M. Hairer: Ergodicity of stochastic differential equations driven by fractional Brownian motion. Ann. Probab. (2005), no. 2, 703-758.
[16] M. Hairer, A. Majda: A simple framework to justify linear response theory. Nonlinearity (2010), no. 4, 909-922.
[17] M. Hairer, A. Ohashi: Ergodicity theory of SDEs with extrinsic memory. Ann. Probab. (2007), no. 5, 1950-1977.
[18] M. Hairer, N. S. Pillai: Ergodicity of hypoelliptic SDEs driven by fractional Brownian motion. Arxiv preprint (2009).
[19] Y. Hu, D. Nualart: Differential equations driven by Hölder continuous functions of order greater than 1/2. Abel Symp. (2007), 349-413.
[20] Y. Hu, B. Oksendal: Fractional white noise calculus and applications to finance. Infinite dimensional analysis, quantum probability and related topics (2003), no. 1, 1-32.
[21] Y. Hu, B. Oksendal, A. Sulem: Optimal consumption and portfolio in a Black-Scholes market driven by fractional Brownian motion. Infinite dimensional analysis, quantum probability and related topics (2003), no. 4, 519-536.
[22] R. Kasonga: Maximum likelihood theory for large interacting systems. SIAM J. Appl. Math. (1990), no. 3, 865-875.
[23] M. Kleptsyna, A. Le Breton: Statistical analysis of the fractional Ornstein-Uhlenbeck type process. Stat. Inference Stoch. Process. (2002), no. 3, 229-248.
[24] A. Kohatsu-Higa: Lower bounds for densities of uniformly elliptic random variables on Wiener space. Probab. Theory Related Fields (2003), no. 3, 421-457.
[25] S. Kou, X. Sunney-Xie: Generalized Langevin equation with fractional Gaussian noise: subdiffusion within a single protein molecule. Phys. Rev. Lett. (2004), no. 18.
[26] H. J. Kushner, G. G. Yin: Stochastic approximation algorithms and applications. Springer-Verlag, 1997.
[27] A. Lejay: An introduction to rough paths. Séminaire de probabilités 37, Lecture Notes in Mathematics (2003), 1-59.
[28] J. León, S. Tindel: Malliavin calculus for fractional delay equations. Arxiv preprint (2009).
[29] T. Lyons, Z. Qian: System control and rough paths. Oxford University Press, 2002.
[30] Y. Mishura, G. Shevchenko: The rate of convergence for Euler approximations of solutions of stochastic differential equations driven by fractional Brownian motion. Stochastics (2008), no. 5, 489-511.
[31] A. Neuenkirch, I. Nourdin, A. Rößler, S. Tindel: Trees and asymptotic developments for fractional diffusion processes. Ann. Inst. Henri Poincaré Probab. Stat. (2009), no. 1, 157-174.
[32] D. Nualart: Malliavin Calculus and Related Topics. Springer-Verlag, 2006.
[33] D. Nualart, A. Răşcanu: Differential equations driven by fractional Brownian motion. Collect. Math. (2002), no. 1, 55-81.
[34] D. Nualart, B. Saussereau: Malliavin calculus for stochastic differential equations driven by a fractional Brownian motion. Stochastic Process. Appl. (2009), no. 2, 391-409.
[35] D. Odde, E. Tanaka, S. Hawkins, H. Buettner: Stochastic dynamics of the nerve growth cone and its microtubules during neurite outgrowth. Biotechnology and Bioengineering (1996), no. 4, 452-461.
[36] A. Pedersen: Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes. Bernoulli (1995), no. 3, 257-279.
[37] A. Papavasiliou, C. Ladroue: Parameter estimation for rough differential equations. Arxiv preprint (2009).
[38] Ll. Quer, D. Nualart: Gaussian density estimates for solutions to quasi-linear stochastic partial differential equations. Stochastic Process. Appl. (2009), no. 11, 3914-3938.
[39] L. C. G. Rogers: Arbitrage with fractional Brownian motion. Math. Finance (1997), no. 1, 95-105.
[40] M. Sorensen: Parametric inference for discretely sampled stochastic differential equations. In: Andersen, T. G., Davis, R. A., Kreiss, J.-P., Mikosch, T. (eds.): Handbook of Financial Time Series. Springer (2009), 531-553.
[41] C. Tudor, F. Viens: Statistical aspects of the fractional stochastic calculus. Ann. Statist. (2007), no. 3, 1183-1212.
[42] W. Willinger, M. S. Taqqu, V. Teverovsky: Stock market prices and long-range dependence. Finance Stoch. (1999), no. 1, 1-13.
[43] M. Zähle: Integration with respect to fractal functions and stochastic calculus I. Probab. Theory Relat. Fields (1998), 333-374.

Alexandra Chronopoulou, Samy Tindel: